Arjun Guha

Verified

Northeastern University · Software Engineering

Active 2005–2026

h-index: 33

Citations: 3.9k

Papers: 128 (52 in the last 5 years)

Funding: $1.9M



About

Arjun Guha is an associate professor in the Khoury College of Computer Sciences at Northeastern University, based in Boston. His research focuses on programming languages, with particular interest in security and reliability problems in web programming, systems, and robotics. Guha uses tools and techniques from programming languages to address these issues, and one of his recent projects aims to make serverless computing more cost-effective, reliable, and applicable. He is a member of the Programming Research Laboratory. Prior to joining Northeastern, Guha was an associate professor at the University of Massachusetts Amherst and a postdoctoral research associate at Cornell University. His work has received several awards, including an OOPSLA Most Influential Paper Award, a PLDI Distinguished Paper Award, and a PACT Best Paper Award. In his free time, Guha enjoys running, cooking, and reading.

Research topics

Artificial Intelligence
Computer Science
World Wide Web
Machine Learning
Operating system
Programming language
Software engineering
Theoretical computer science

Selected publications

Learning Reasoning World Models for Parallel Code
arXiv (Cornell University) · 2026-04-22
Preprint · Open access
Large language models have shown remarkable ability in serial code generation, but they still struggle with parallel code for which training data is comparatively scarce. A common remedy is to use coding agents that interact with external tools, but tool calls can be costly and sometimes impractical, e.g., for partially written code. We propose Parallel-Code World Models (PCWMs), reasoning LLMs that aim to predict tool outcomes directly from parallel source code. To train PCWMs, we design a novel exploration and data generation pipeline that samples diverse parallel-coding problems and candidate implementations across multiple domains, then executes them via tools to record data races and performance profiles. From these, we synthesize hindsight reasoning traces that causally connect source code to observed tool outcomes. Fine-tuning on the resulting data yields noticeable gains, with a 7B-parameter world model improving from 64.3% to 72.8% accuracy in race-outcome prediction, while an 8B-parameter model improves in a performance profiling task from 49.3% to 58.6% accuracy. Furthermore, when open-weight models were tasked with fixing data races, world-model feedback improved their race-fixing rates relative to self-feedback by 2.7%-9.1% using our 7B-parameter world model and by 6.1%-11.1% using our 14B-parameter world model. Our results suggest that reasoning models have the potential to serve as practical substitutes for external tool calls in parallel-coding agents.
Steering Code LLMs with Activation Directions for Language and Library Control
arXiv (Cornell University) · 2026-03-24
Preprint · Open access
Code LLMs often default to particular programming languages and libraries under neutral prompts. We investigate whether these preferences are encoded as approximately linear directions in activation space that can be manipulated at inference time. Using a difference-in-means method, we estimate layer-wise steering vectors for five language/library pairs and add them to model hidden states during generation. Across three open-weight code LLMs, these interventions substantially increase generation toward the target ecosystem under neutral prompts and often remain effective even when prompts explicitly request the opposite choice. Steering strength varies by model and target, with common ecosystems easier to induce than rarer alternatives, and overly strong interventions can reduce output quality. Overall, our results suggest that code-style preferences in LLMs are partly represented by compact, steerable structure in activation space.
Understanding How CodeLLMs (Mis)Predict Types with Activation Steering
2025-01-01
Article · Open access · Senior author
ReasoningWeekly: A General Knowledge and Verbal Reasoning Challenge for Large Language Models
2025-01-01
Article · Open access · Senior author
Zixuan Wu, Francesca Lucchetti, Aleksander Boruch-Gruszecki, Jingmiao Zhao, Carolyn Jane Anderson, Joydeep Biswas, Federico Cassano, Arjun Guha. Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics. 2025.
ReasoningWeekly: A General Knowledge and Verbal Reasoning Challenge for Large Language Models
ArXiv.org · 2025-02-03 · 1 citation
Preprint · Open access · Senior author
Existing benchmarks for frontier models often test specialized, "PhD-level" knowledge that is difficult for non-experts to grasp. In contrast, we present a benchmark with 613 problems based on the NPR Sunday Puzzle Challenge that requires only general knowledge. Our benchmark is challenging for both humans and models; however correct solutions are easy to verify, and models' mistakes are easy to spot. As LLMs are more widely deployed in society, we believe it is useful to develop benchmarks for frontier models that humans can understand without the need for deep domain expertise. Our work reveals capability gaps that are not evident in existing benchmarks: OpenAI o1 significantly outperforms other reasoning models on our benchmark, despite being on par with other models when tested on benchmarks that test specialized knowledge. Furthermore, our analysis of reasoning outputs uncovers new kinds of failures. DeepSeek R1, for instance, often concedes with "I give up" before providing an answer that it knows is wrong. R1 can also be remarkably "uncertain" in its output and in rare cases, it does not "finish thinking," which suggests the need for techniques to "wrap up" before the context window limit is reached. We also quantify the effectiveness of reasoning longer to identify the point beyond which more reasoning is unlikely to improve accuracy on our benchmark.
Bridging the Gap Between Binary and Source Based Package Management in Spack
2025-11-12 · 1 citation
Article · Open access
Binary package managers install software quickly but they limit configurability due to rigid ABI requirements that ensure compatibility between binaries. Source package managers provide flexibility in building software, but compilation can be slow. For example, installing an HPC code with a new MPI implementation may result in a full rebuild. Spack, a widely deployed, HPC-focused package manager, can use source and pre-compiled binaries, but lacks a binary compatibility model, so it cannot mix binaries not built together. We present splicing, an extension to Spack that models binary compatibility between packages and allows seamless mixing of source and binary distributions. Splicing augments Spack’s packaging language and dependency resolution engine to reuse compatible binaries but maintains the flexibility of source builds. It incurs minimal installation-time overhead and allows rapid installation from binaries, even for ABI-sensitive dependencies like MPI that would otherwise require many rebuilds.
Substance Beats Style: Why Beginning Students Fail to Code with LLMs
2025-01-01 · 2 citations
Article · Open access
Francesca Lucchetti, Zixuan Wu, Arjun Guha, Molly Q Feldman, Carolyn Jane Anderson. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2025.
Leveraging AI for Productive and Trustworthy HPC Software: Challenges and Research Directions
Lecture Notes in Computer Science · 2025-11-23 · 1 citation
Book chapter · Open access

Recent grants

SHF: Small: A Language-based Approach to Faster and Safer Serverless Computing
NSF · $457k · 2020–2024
NeTS: Large: Collaborative Research: Programmable Inter-Domain Observation and Control
NSF · $692k · 2014–2019
Collaborative Research: FMitF: Track I: Game Theoretic Updates for Network and Cloud Functions
NSF · $295k · 2020–2021
Collaborative Research: FMitF: Track I: Game Theoretic Updates for Network and Cloud Functions
NSF · $295k · 2020–2024
Collaborative Research: SHF: Small: Interactive Synthesis and Repair For Robot Programs
NSF · $194k · 2020–2024

Frequent coauthors

Shriram Krishnamurthi
55 shared
Joe Gibbs Politz
University of California, San Diego
20 shared
Joydeep Biswas
The University of Texas at Austin
19 shared
Abhinav Jangda
Microsoft (United States)
17 shared
Carolyn Jane Anderson
17 shared
Federico Cassano
Northeastern University
15 shared
Nate Foster
Cornell University
13 shared
Donald Pinckney
Northeastern University
12 shared

Labs

Khoury College of Computer Sciences · PI

Education

Ph.D., Computer Science
University of California, Los Angeles
2007
M.S., Computer Science
University of California, Los Angeles
2003
B.S., Computer Science
University of California, Los Angeles
2001

Awards & honors

OOPSLA Most Influential Paper Award
PLDI Distinguished Paper Award
PACT Best Paper Award
Distinguished Paper Award (2019)
