Emma Brunskill

Professor of Computer Science · Verified

Stanford University · Symbolic Systems

Active 2006–2026

h-index: 40
Citations: 7.4k
Papers: 266 (90 in the last 5 years)
Funding: $975k

About

Emma Brunskill is an Associate Professor of Computer Science at Stanford University and a Faculty Affiliate at the Institute for Human-Centered Artificial Intelligence (HAI). She holds a PhD in Computer Science from the Massachusetts Institute of Technology, obtained in 2009. Her research focuses on artificial intelligence, human-centered artificial intelligence, and learning. Brunskill has received several honors and awards, including the Alumni Impact Award from the University of Washington's Computer Science department in 2020, the Young Investigator Award from the Office of Naval Research in 2015, a CAREER Award from the NSF in 2014, and a Faculty Fellowship from Microsoft in 2012.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Political Science
  • Machine Learning
  • Data science
  • Engineering
  • Engineering ethics
  • Law
  • Management science
  • Economics
  • Environmental economics
  • Psychology

Selected publications

  • Bloom: Designing for LLM-Augmented Behavior Change Interactions

    2026-04-13 · 1 citation

    article · Open access

    Large language models (LLMs) offer novel opportunities to support health behavior change, yet existing work has narrowly focused on text-only interactions. Building on decades of HCI research on effective behavior change interactions, we present Bloom, an application for physical activity promotion that integrates an LLM-based health coaching chatbot with existing design strategies and UI elements. As part of Bloom’s development, we conducted a redteaming evaluation and contribute a safety benchmark dataset. In a four-week randomized field study (N=54) comparing Bloom to a no-LLM control, we observed important shifts in psychological outcomes: participants in the LLM condition reported stronger beliefs that activity was beneficial, greater enjoyment, and more self-compassion. Both conditions significantly increased physical activity levels, doubling the proportion of participants meeting recommended weekly guidelines, though descriptively, we observed no advantage for the LLM condition in short-term physical activity levels. Instead, our findings suggest that LLMs may be more effective at shifting mindsets that precede longer-term behavior change.

  • Computer-assisted learning in the real world: How Khan Academy influences student math learning

    Proceedings of the National Academy of Sciences · 2026-01-02

    article · Open access · Corresponding author

    Computer-assisted learning (CAL) offers an affordable way to implement a mastery learning approach in the classroom. However, while experimental research suggests CAL can enhance student outcomes, such findings often rely on experimental conditions not easily replicated in ordinary classroom settings (e.g., opt-in participation, extensive training and support, and high CAL usage targets). To assess the real-world impact of CAL, we draw on a large three-year panel of administrative data covering over 200,000 students in school districts that licensed Khan Academy's Measures of Academic Progress accelerator, a program designed to support math learning. To identify causal effects, we exploit within-teacher and within-school changes in average classroom CAL practice time, a strategy that yields precise, policy-relevant estimates even at modest usage levels. We find that a classroom with 6.6 h of annual Khan Academy practice (about 11 min per week) experiences a [Formula: see text]0.031 SD gain in math test score performance compared to no practice. For classrooms with higher usage levels, we find approximately linear gains, with projected effects rising to [Formula: see text]0.085 SD at the recommended 30 min per week. Higher-achieving students benefit most, in part because they spend more time on CAL and progress through more skills than lower-performing peers. Teachers might reduce achievement gaps and boost overall gains by encouraging more productive use of the platform (focused on skill mastery), especially among struggling students.

  • Trading off rewards and errors in multi-armed bandits

    arXiv (Cornell University) · 2026-05-01 · 11 citations

    preprint · Open access

    In multi-armed bandits, the most-explored arms are the most informative, while reward maximization typically pulls only the best arm. We study the tradeoff between identifying arm means accurately and accumulating reward, and present an algorithm with regret guarantees that interpolates between the two objectives. We provide both upper and lower bounds and validate empirically.

  • Can LLM-Simulated Practice and Feedback Upskill Human Counselors? A Randomized Study with 90+ Novice Counselors

    2026-04-13 · 2 citations

    article · Open access

    The growing demand for accessible mental health support requires training more counselors, yet existing approaches remain resource-intensive and difficult to scale. LLMs can realistically simulate patients and generate actionable feedback for training, but their actual impact on novice counselor skill development remains unknown. We developed an LLM-simulated practice and feedback system and conducted a randomized study with 94 novice counselors, comparing practice alone versus practice with feedback. We evaluated behavioral performance, self-efficacy, and qualitative reflections. Results showed the practice-and-feedback group improved in client-centered microskills (reflections, questions), while the practice-alone group showed no improvements. For empathy, the practice-alone group declined over time and performed significantly worse than the feedback group. Qualitative interviews reinforced these findings: feedback helped participants adopt a client-centered listening approach, while practice-alone participants remained solution-oriented. These results suggest LLM-based training systems can promote effective skill development, and combining simulated practice with structured feedback is critical for meaningful improvement.

  • Automated reminders reduce incarceration for missed court dates: Evidence from a text message experiment

    Science Advances · 2025-10-01 · 1 citation

    article · Open access

    Millions of Americans must attend mandatory court dates every year. To boost appearance rates, jurisdictions nationwide are increasingly turning to automated reminders. However, previous research paints an incomplete picture of their effectiveness; in particular, there has been little work assessing the impact of reminders on downstream arrests and incarceration. In partnership with the Santa Clara County Public Defender's Office, we randomly assigned 5709 public defender clients to either receive automated text message reminders (treatment) or not receive reminders (control). We found that reminders reduced warrants issued for missed court dates by ~20%, with 12.1% of clients in control issued a warrant compared to 9.7% of clients in treatment. Further, we found that incarceration from missed court dates dropped by a similar amount, from 6.6% in control to 5.2% in treatment. The effectiveness of reminders bolsters the theory that lapses in memory or comprehension contribute to missed court appearances.

  • Exploring the Benefit of Customizing Feedback Interventions For Educators and Students With Offline Contextual Multi-Armed Bandits

    2025-02-21 · 1 citation

    article · Open access

  • Cost-Aware Near-Optimal Policy Learning

    Proceedings of the AAAI Conference on Artificial Intelligence · 2025-04-11

    article · Open access · Senior author

    It is often of interest to learn a context-sensitive decision policy, such as in contextual multi-armed bandit processes. To quantify the efficiency of a machine learning algorithm for such settings, probably approximately correct (PAC) bounds, which bound the number of samples required, or cumulative regret guarantees, are typically used. However, real-world settings often have limited resources for experimentation, and decisions/interventions may differ in the amount of resources required (e.g., money or time). Therefore, it is of interest to consider how to design an experiment strategy that reduces the experimental budget needed to learn a near-optimal contextual policy. Unlike reinforcement learning or bandit approaches that embed costs into the reward function, we focus on reducing resource use in learning a near-optimal policy without resource constraints. We introduce two resource-aware algorithms for the contextual bandit setting and prove their soundness. Simulations based on real-world datasets demonstrate that our algorithms significantly reduce the resources needed to learn a near-optimal decision policy compared to previous resource-unaware methods.

  • GPTCoach: Towards LLM-Based Physical Activity Coaching

    2025-04-24 · 32 citations

    article

  • Repairing Reward Functions with Feedback to Mitigate Reward Hacking

    arXiv (Cornell University) · 2025-10-14

    preprint · Open access · Senior author

    Human-designed reward functions for reinforcement learning (RL) agents are frequently misaligned with the humans' true, unobservable objectives, and thus act only as proxies. Optimizing for a misspecified proxy reward function often induces reward hacking, resulting in a policy misaligned with the human's true objectives. An alternative is to perform RL from human feedback, which involves learning a reward function from scratch by collecting human preferences over pairs of trajectories. However, building such datasets is costly. To address the limitations of both approaches, we propose Preference-Based Reward Repair (PBRR): an automated iterative framework that repairs a human-specified proxy reward function by learning an additive, transition-dependent correction term from preferences. A manually specified reward function can yield policies that are highly suboptimal under the ground-truth objective, yet corrections on only a few transitions may suffice to recover optimal performance. To identify and correct for those transitions, PBRR uses a targeted exploration strategy and a new preference-learning objective. We prove in tabular domains PBRR has a cumulative regret that matches, up to constants, that of prior preference-based RL methods. In addition, on a suite of reward-hacking benchmarks, PBRR consistently outperforms baselines that learn a reward function from scratch from preferences or modify the proxy reward function using other approaches, requiring substantially fewer preferences to learn high performing policies.

  • Assessing the Quality of AI-Generated Exams: A Large-Scale Field Study

    ArXiv.org · 2025-08-09

    preprint · Open access

    While large language models (LLMs) challenge conventional methods of teaching and learning, they present an exciting opportunity to improve efficiency and scale high-quality instruction. One promising application is the generation of customized exams, tailored to specific course content. There has been significant recent excitement on automatically generating questions using artificial intelligence, but also comparatively little work evaluating the psychometric quality of these items in real-world educational settings. Filling this gap is an important step toward understanding generative AI's role in effective test design. In this study, we introduce and evaluate an iterative refinement strategy for question generation, repeatedly producing, assessing, and improving questions through cycles of LLM-generated critique and revision. We evaluate the quality of these AI-generated questions in a large-scale field study involving 91 classes -- covering computer science, mathematics, chemistry, and more -- in dozens of colleges across the United States, comprising nearly 1700 students. Our analysis, based on item response theory (IRT), suggests that for students in our sample the AI-generated questions performed comparably to expert-created questions designed for standardized exams. Our results illustrate the power of AI to make high-quality assessments more readily available, benefiting both teachers and students.

Education

  • Ph.D., Computer Science

    Massachusetts Institute of Technology

    2009

  • B.A., Computer Science

    University of Cambridge

    2002

Awards & honors

  • Alumni Impact Award, University of Washington Computer Scien…
  • Young Investigator Award, Office of Naval Research (2015)
  • CAREER Award, NSF (2014)
  • Faculty Fellowship, Microsoft (2012)