Resume-aware faculty matching

Find professors who actually fit you

Upload your resume. Four AI agents analyze your background, rank the faculty who fit, inspect their recent research, and help you draft outreach — grounded in their actual work, not templates.

Free to start · No credit card · Cancel anytime
David Bau

Professor of the Practice, Khoury College of Computer Sciences, Affiliate Appointment · Verified

Northeastern University · Artificial Intelligence and Data Science

Active 1994–2026

h-index: 31
Citations: 13.0k
Papers: 125 · 74 last 5y

About

David Bau is an assistant professor in the Khoury College of Computer Sciences at Northeastern University in Boston. His research focuses on human-computer interaction and machine learning. Before joining Northeastern, he worked as a software engineer at Google, BEA, and Crossgain. His work has appeared at venues including CVPR, NeurIPS, ICCV, ECCV, and SIGGRAPH. Outside of research, he enjoys astronomy and puzzle collecting.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Machine Learning
  • Programming language
  • Natural Language Processing
  • Psychology
  • Econometrics
  • Cognitive psychology
  • Computer vision

Selected publications

  • Distilling Diversity and Control in Diffusion Models

    2026-03-06

    article · Open access · Senior author

    Distilled diffusion models generate images in far fewer timesteps but suffer from reduced sample diversity when generating multiple outputs from the same prompt. To understand this phenomenon, we first investigate whether distillation damages concept representations by examining if the required diversity is properly learned. Surprisingly, distilled models retain the base model’s representational structure: control mechanisms like Concept Sliders and LoRAs transfer seamlessly without retraining, and Slider-Space analysis reveals distilled models possess variational directions needed for diversity yet fail to activate them. This redirects our investigation to understanding how the generation dynamics differ between base and distilled models. Using $\hat{\mathbf{x}}_0$ trajectory visualization, we discover distilled models commit to their final image structure almost immediately at the first timestep, while base models distribute structural decisions across many steps. To test whether this first-step commitment causes the diversity loss, we introduce diversity distillation, a hybrid approach using the base model for only the first critical timestep before switching to the distilled model. This single intervention restores sample diversity while maintaining computational efficiency. We provide both causal validation and theoretical support showing why the very first timestep concentrates the diversity bottleneck in distilled models. Our code and data are available at distillation.baulab.info

  • Elucidating Mechanisms of Demographic Bias in LLMs for Healthcare

    arXiv (Cornell University) · 2025-02-18

    preprint · Open access

    We know from prior work that LLMs encode social biases, and that this manifests in clinical tasks. In this work we adopt tools from mechanistic interpretability to unveil sociodemographic representations and biases within LLMs in the context of healthcare. Specifically, we ask: Can we identify activations within LLMs that encode sociodemographic information (e.g., gender, race)? We find that gender information is highly localized in MLP layers and can be reliably manipulated at inference time via patching. Such interventions can surgically alter generated clinical vignettes for specific conditions, and also influence downstream clinical predictions which correlate with gender, e.g., patient risk of depression. We find that representation of patient race is somewhat more distributed, but can also be intervened upon, to a degree. To our knowledge, this is the first application of mechanistic interpretability methods to LLMs for healthcare.

  • Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy and Research

    SSRN Electronic Journal · 2025-01-01

    preprint · Open access

  • Elucidating Mechanisms of Demographic Bias in LLMs for Healthcare

    2025-01-01

    article · Open access

    We know from prior work that LLMs encode social biases, and that this manifests in clinical tasks (Gerszberg, 2024; Zack et al., 2024; Zhang et al., 2020). In this work we adopt tools from mechanistic interpretability to unveil sociodemographic representations and biases within LLMs in the context of healthcare. Specifically, we ask: Can we identify activations within LLMs that encode sociodemographic information (e.g., gender, race)? We find that, in three open-weight LLMs, gender information is highly localized in MLP layers and can be reliably manipulated at inference time via patching. Such interventions can surgically alter generated clinical vignettes for specific conditions, and also influence downstream clinical predictions which correlate with gender, e.g., patient risk of depression. We find that representation of patient race is somewhat more distributed, but can also be intervened upon, to a degree. To our knowledge, this is the first application of mechanistic interpretability methods to LLMs for healthcare.

  • LLMs Encode Harmfulness and Refusal Separately

    ArXiv.org · 2025-07-16

    preprint · Open access

    LLMs are trained to refuse harmful instructions, but do they truly understand harmfulness beyond just refusing? Prior work has shown that LLMs' refusal behaviors can be mediated by a one-dimensional subspace, i.e., a refusal direction. In this work, we identify a new dimension to analyze safety mechanisms in LLMs, i.e., harmfulness, which is encoded internally as a separate concept from refusal. There exists a harmfulness direction that is distinct from the refusal direction. As causal evidence, steering along the harmfulness direction can lead LLMs to interpret harmless instructions as harmful, but steering along the refusal direction tends to elicit refusal responses directly without reversing the model's judgment on harmfulness. Furthermore, using our identified harmfulness concept, we find that certain jailbreak methods work by reducing the refusal signals without reversing the model's internal belief of harmfulness. We also find that adversarially finetuning models to accept harmful instructions has minimal impact on the model's internal belief of harmfulness. These insights lead to a practical safety application: The model's latent harmfulness representation can serve as an intrinsic safeguard (Latent Guard) for detecting unsafe inputs and reducing over-refusals that is robust to finetuning attacks. For instance, our Latent Guard achieves performance comparable to or better than Llama Guard 3 8B, a dedicated finetuned safeguard model, across different jailbreak methods. Our findings suggest that LLMs' internal understanding of harmfulness is more robust than their refusal decision to diverse input instructions, offering a new perspective to study AI safety.

  • MIB: A Mechanistic Interpretability Benchmark

    UvA-DARE (University of Amsterdam) · 2025-04-17

    preprint · Open access

    How can we know whether new mechanistic interpretability methods achieve real improvements? In pursuit of lasting evaluation standards, we propose MIB, a Mechanistic Interpretability Benchmark, with two tracks spanning four tasks and five models. MIB favors methods that precisely and concisely recover relevant causal pathways or causal variables in neural language models. The circuit localization track compares methods that locate the model components - and connections between them - most important for performing a task (e.g., attribution patching or information flow routes). The causal variable localization track compares methods that featurize a hidden vector, e.g., sparse autoencoders (SAEs) or distributed alignment search (DAS), and align those features to a task-relevant causal variable. Using MIB, we find that attribution and mask optimization methods perform best on circuit localization. For causal variable localization, we find that the supervised DAS method performs best, while SAE features are not better than neurons, i.e., non-featurized hidden vectors. These findings illustrate that MIB enables meaningful comparisons, and increases our confidence that there has been real progress in the field.

  • SliderSpace: Decomposing the Visual Capabilities of Diffusion Models

    2025-10-19 · 1 citation

    preprint · Open access

    We present SliderSpace, a framework for automatically decomposing the visual capabilities of diffusion models into controllable and human-understandable directions. Unlike existing control methods that require a user to specify attributes for each edit direction individually, SliderSpace discovers multiple interpretable and diverse directions simultaneously from a single text prompt. Each direction is trained as a low-rank adaptor, enabling compositional control and the discovery of surprising possibilities in the model's latent space. Through extensive experiments on state-of-the-art diffusion models, we demonstrate SliderSpace's effectiveness across three applications: concept decomposition, artistic style exploration, and diversity enhancement. Our quantitative evaluation shows that SliderSpace-discovered directions decompose the visual structure of model's knowledge effectively, offering insights into the latent capabilities encoded within diffusion models. User studies further validate that our method produces more diverse and useful variations compared to baselines. Our code, data and trained weights are available at https://sliderspace.baulab.info

  • Leveraging AI for Productive and Trustworthy HPC Software: Challenges and Research Directions

    Lecture Notes in Computer Science · 2025-11-23 · 1 citation

    book chapter · Open access

  • Language Models use Lookbacks to Track Beliefs

    ArXiv.org · 2025-05-20

    preprint · Open access

    How do language models (LMs) represent characters' beliefs, especially when those beliefs may differ from reality? This question lies at the heart of understanding the Theory of Mind (ToM) capabilities of LMs. We analyze LMs' ability to reason about characters' beliefs using causal mediation and abstraction. We construct a dataset, CausalToM, consisting of simple stories where two characters independently change the state of two objects, potentially unaware of each other's actions. Our investigation uncovers a pervasive algorithmic pattern that we call a lookback mechanism, which enables the LM to recall important information when it becomes necessary. The LM binds each character-object-state triple together by co-locating their reference information, represented as Ordering IDs (OIs), in low-rank subspaces of the state token's residual stream. When asked about a character's beliefs regarding the state of an object, the binding lookback retrieves the correct state OI and then the answer lookback retrieves the corresponding state token. When we introduce text specifying that one character is (not) visible to the other, we find that the LM first generates a visibility ID encoding the relation between the observing and the observed character OIs. In a visibility lookback, this ID is used to retrieve information about the observed character and update the observing character's beliefs. Our work provides insights into belief tracking mechanisms, taking a step toward reverse-engineering ToM reasoning in LMs.

  • The Quest for the Right Mediator: Surveying Mechanistic Interpretability for NLP Through the Lens of Causal Mediation Analysis

    Computational Linguistics · 2025-09-22

    article · Open access

    Interpretability provides a toolset for understanding how and why language models behave in certain ways. However, there is little unity in the field: Most studies use ad-hoc evaluations and do not share theoretical foundations, making it difficult to measure progress and compare the pros and cons of different techniques. Furthermore, while mechanistic understanding is frequently discussed, the basic causal units underlying these mechanisms are often not explicitly defined. In this article, we propose a perspective on interpretability research grounded in causal mediation analysis. Specifically, we describe the history and current state of interpretability taxonomized according to the types of causal units (mediators) utilized, as well as methods used to search over mediators. We discuss the pros and cons of each mediator, providing insights as to when particular kinds of mediators and search methods are most appropriate. We argue that this framing yields a more cohesive narrative of the field and helps researchers select appropriate methods based on their research objective. Our analysis yields actionable recommendations for future work, including the discovery of new mediators and the development of standardized evaluations tailored to these goals.

Frequent coauthors

  • Antonio Torralba

    48 shared
  • Jun-Yan Zhu

    25 shared
  • Bolei Zhou

    19 shared
  • Gary Hachfeld

    16 shared
  • Robert Holcomb

    University of Rochester

    16 shared
  • William J. Craig

    University of Edinburgh

    16 shared
  • Joanna Materzyńska

    13 shared
  • Hendrik Strobelt

    12 shared

Education

  • Ph.D., Computer Science

    Massachusetts Institute of Technology

    2016
  • M.S., Computer Science

    Massachusetts Institute of Technology

    2012
  • B.S., Electrical Engineering and Computer Science

    Massachusetts Institute of Technology

    2011