Jessica Hullman

Ginni Rometty Professor of Computer Science · Verified

Northwestern University · Computer Science

Active through 2026

h-index: 29
Citations: 3.8k
Papers: 143 · 91 last 5y
Funding: $1.0M

About

Jessica Hullman is a faculty member in the Department of Computer Science at Northwestern University, located in the McCormick School of Engineering. Her research focuses on challenges and limitations that arise when people theorize and draw inductive inferences from data. She explores how to best align data-driven interfaces and summaries with human reasoning capabilities, examining the role of interactive analysis across different stages of a statistical workflow. Her work involves evaluating data interfaces, developing tools to support reasoning under uncertainty, and understanding how these tools can be applied in domains such as strategic games and privacy. Hullman approaches these problems by drawing on formal models of rational inference to compare and propose solutions.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Political Science
  • Econometrics
  • Mathematics
  • Data science
  • Statistics
  • Demography
  • Economics
  • Psychology
  • Engineering
  • Knowledge management
  • Human–computer interaction
  • Microeconomics
  • Management science

Selected publications

  • What’s a multiverse good for anyway?

    2026-02-04

    article · Open access

    Multiverse analysis has become a fairly popular approach, as indicated by the present special issue on the matter. Here, we take one step back and ask why one would conduct a multiverse analysis in the first place. We discuss various ways in which a multiverse may be employed – as a tool for reflection and critique, as a persuasive tool, as a serious inferential tool – as well as potential problems that arise depending on the specific purpose. For example, it fails as a persuasive tool when researchers disagree about which variations should be included in the analysis, and it fails as a serious inferential tool when the included analyses do not target a coherent estimand. Then, we take yet another step back and ask what the multiverse discourse has been good for and whether any broader lessons can be drawn. Ultimately, we conclude that the multiverse does remain a valuable tool; however, we urge against taking it too seriously.

  • ComplLLM: Fine-tuning LLMs to Discover Complementary Signals for Decision-making

    ArXiv.org · 2026-01-01

    article · Open access · Senior author

    Multi-agent decision pipelines can outperform single agent workflows when complementarity holds, i.e., different agents bring unique information to the table to inform a final decision. We propose ComplLLM, a post-training framework based on decision theory that fine-tunes a decision-assistant LLM using complementary information as reward to output signals that complement existing agent decisions. We validate ComplLLM on synthetic and real-world tasks involving domain experts, demonstrating how the approach recovers known complementary information and produces plausible explanations of complementary signals to support downstream decision-makers.

  • ComplLLM: Fine-tuning LLMs to Discover Complementary Signals for Decision-making

    arXiv (Cornell University) · 2026-02-23

    preprint · Open access · Senior author

  • Hypothesizing an effect size by considering individual variation

    arXiv (Cornell University) · 2026-04-09

    article · Open access · Senior author

    When designing and evaluating an experiment or observational study, it is useful to have a realistic hypothesis regarding the average treatment effect. We present an approach to conceptualizing this average by first considering a distribution of effects. We demonstrate with examples in medicine, economics, and psychology.

  • This human study did not involve human subjects: Validating LLM simulations as behavioral evidence

    arXiv (Cornell University) · 2026-02-17

    article · Open access · 1st author · Corresponding

    A growing literature uses large language models (LLMs) as synthetic participants to generate cost-effective and nearly instantaneous responses in social science experiments. However, there is limited guidance on when such simulations support valid inference about human behavior. We contrast two strategies for obtaining valid estimates of causal effects and clarify the assumptions under which each is suitable for exploratory versus confirmatory research. Heuristic approaches seek to establish that simulated and observed human behavior are interchangeable through prompt engineering, model fine-tuning, and other repair strategies designed to reduce LLM-induced inaccuracies. While useful for many exploratory tasks, heuristic approaches lack the formal statistical guarantees typically required for confirmatory research. In contrast, statistical calibration combines auxiliary human data with statistical adjustments to account for discrepancies between observed and simulated responses. Under explicit assumptions, statistical calibration preserves validity and provides more precise estimates of causal effects at lower cost than experiments that rely solely on human participants. Yet the potential of both approaches depends on how well LLMs approximate the relevant populations. We consider what opportunities are overlooked when researchers focus myopically on substituting LLMs for human participants in a study.

  • What’s a multiverse good for anyway?

    PsyArXiv (OSF Preprints) · 2026-02-04

    preprint · Open access · Senior author

  • Can AI Do Strategy? A Dialogue and Debate

    Strategy Science · 2026-01-01 · 1 citation

    article

  • Characterizing Photorealism and Artifacts in Diffusion Model-Generated Images

    ArXiv.org · 2025-02-17 · 1 citation

    preprint · Open access

    Diffusion model-generated images can appear indistinguishable from authentic photographs, but these images often contain artifacts and implausibilities that reveal their AI-generated provenance. Given the challenge to public trust in media posed by photorealistic AI-generated images, we conducted a large-scale experiment measuring human detection accuracy on 450 diffusion-model generated images and 149 real images. Based on collecting 749,828 observations and 34,675 comments from 50,444 participants, we find that scene complexity of an image, artifact types within an image, display time of an image, and human curation of AI-generated images all play significant roles in how accurately people distinguish real from AI-generated images. Additionally, we propose a taxonomy characterizing artifacts often appearing in images generated by diffusion models. Our empirical observations and taxonomy offer nuanced insights into the capabilities and limitations of diffusion models to generate photorealistic images in 2024.
