Emily B. Fox

Amazon Professor of Machine Learning

Stanford University · Statistics

Active 1984–2026

h-index: 36
Citations: 7.6k
Papers: 197 (59 in the last 5 years)
Funding: $1.3M

About

Her research focuses on advancing machine learning methods for applications in health and biology. Particular interests are in health sensing and wearable technologies, multimodal biological data (microscopy, omics), and neuroimaging data. Methodologically, her work emphasizes sequence modeling, Bayesian approaches, and generative modeling.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Machine Learning
  • Medicine
  • Business
  • Demographic economics
  • Telecommunications
  • Virology
  • Environmental health
  • Neuroscience
  • Economics
  • Psychology

Selected publications

  • Vascular waveform analysis using Bayesian pulse deconvolution

    bioRxiv (Cold Spring Harbor Laboratory) · 2026-02-11

    article · Open access · Senior author

    Vascular waveforms, which measure bulk flow in blood vessels, are widely used to measure vital signs, diagnose conditions, and predict long-term health outcomes. Analyzing vascular waveforms depends on three fundamentally interdependent tasks: signal filtering, pulse timing detection, and pulse shape extraction. We hypothesized that Bayesian pulse deconvolution can achieve improved performance on all three tasks by solving them jointly. This method uses an analytical, generative model of vascular waveforms with priors informed by physical and biological domain knowledge. In simulations, Bayesian pulse deconvolution achieves better performance on all tasks compared with existing algorithms: 90% reduction of median filtering error, 60% reduction in pulse timing error, and 85% reduction in shape extraction error. The advantages in simulations extend to human recordings of photoplethysmography waveforms. Taking real time-synchronized electrocardiogram R-R intervals as a proxy ground truth, Bayesian pulse deconvolution achieves 40% lower pulse interval estimation error (RMSE = 5.1 ms) compared with typical algorithms (RMSE = 8.3 ms, p = 1e-10). By extracting more accurate and informative insights from vascular waveforms, Bayesian pulse deconvolution could advance a wide array of health technologies that rely on interpreting signals from blood vessels.

  • Forget, Then Recall: Learnable Compression and Selective Unfolding via Gist Sparse Attention

    arXiv (Cornell University) · 2026-04-22

    preprint · Open access · Senior author

    Scaling large language models to long contexts is challenging due to the quadratic computational cost of full attention. Mitigation approaches include KV-cache selection or compression techniques. We instead provide an effective and end-to-end learnable bridge between the two without requiring architecture modification. In particular, our key insight is that interleaved gist compression tokens -- which provide a learnable summary of sets of raw tokens -- can serve as routing signals for sparse attention. Building on this, we introduce selective unfolding via GSA, which first compresses the context into gist tokens, then selects the most relevant gists, and subsequently restores the corresponding raw chunks for detailed attention. This yields a simple coarse-to-fine mechanism that combines compact global representations with targeted access to fine-grained evidence. We further incorporate this process directly into training in an end-to-end fashion, avoiding the need for external retrieval modules. In addition, we extend the framework hierarchically via recursive gist-of-gist construction, enabling multi-resolution context access with logarithmic per-step decoding complexity. Empirical results on LongBench and RAG benchmarks demonstrate that our method consistently outperforms other compression baselines as well as inference-time sparse attention methods across compression ratios from 8× to 32×. The code is available at: https://github.com/yuzhenmao/gist-sparse-attention/

  • parkersruth/bayesian_pulse_deconvolution: v1.0.0-preprint

    Open MIND · 2026-02-10

    other · Senior author

    Preprint version release

  • parkersruth/bayesian_pulse_deconvolution: v1.0.0-preprint

    Zenodo (CERN European Organization for Nuclear Research) · 2026-02-10

    other · Open access · Senior author

    Preprint version release

  • Pressure, What Pressure? Sycophancy Disentanglement in Language Models via Reward Decomposition

    arXiv (Cornell University) · 2026-04-07

    article · Open access · Senior author

    Large language models exhibit sycophancy, the tendency to shift their stated positions toward perceived user preferences or authority cues regardless of evidence. Standard alignment methods fail to correct this because scalar reward models conflate two distinct failure modes into a single signal: pressure capitulation, where the model changes a correct answer under social pressure, and evidence blindness, where the model ignores the provided context entirely. We operationalise sycophancy through formal definitions of pressure independence and evidence responsiveness, serving as a working framework for disentangled training rather than a definitive characterisation of the phenomenon. We propose the first approach to sycophancy reduction via reward decomposition, introducing a multi-component Group Relative Policy Optimisation (GRPO) reward that decomposes the training signal into five terms: pressure resistance, context fidelity, position consistency, agreement suppression, and factual correctness. We train using a contrastive dataset pairing pressure-free baselines with pressured variants across three authority levels and two opposing evidence contexts. Across five base models, our two-phase pipeline consistently reduces sycophancy on all metric axes, with ablations confirming that each reward term governs an independent behavioural dimension. The learned resistance to pressure generalises beyond our training methodology and prompt structure, reducing answer-priming sycophancy by up to 17 points on SycophancyEval despite the absence of such pressure forms during training.

  • How Well Do Multimodal Models Reason on ECG Signals?

    Open MIND · 2026-02-27

    preprint

    While multimodal large language models offer a promising solution to the "black box" nature of health AI by generating interpretable reasoning traces, verifying the validity of these traces remains a critical challenge. Existing evaluation methods are either unscalable, relying on manual clinician review, or superficial, utilizing proxy metrics (e.g. QA) that fail to capture the semantic correctness of clinical logic. In this work, we introduce a reproducible framework for evaluating reasoning in ECG signals. We propose decomposing reasoning into two distinct components: (i) Perception, the accurate identification of patterns within the raw signal, and (ii) Deduction, the logical application of domain knowledge to those patterns. To evaluate Perception, we employ an agentic framework that generates code to empirically verify the temporal structures described in the reasoning trace. To evaluate Deduction, we measure the alignment of the model's logic against a structured database of established clinical criteria in a retrieval-based approach. This dual-verification method enables the scalable assessment of "true" reasoning capabilities.

  • Continuous-Utility Direct Preference Optimization

    arXiv (Cornell University) · 2026-01-31

    article · Open access · Senior author

    Large language model reasoning is often treated as a monolithic capability, relying on binary preference supervision that fails to capture partial progress or fine-grained reasoning quality. We introduce Continuous Utility Direct Preference Optimization (CU-DPO), a framework that aligns models to a portfolio of prompt-based cognitive strategies by replacing binary labels with continuous scores that capture fine-grained reasoning quality. We prove that learning with K strategies yields a Theta(K log K) improvement in sample complexity over binary preferences, and that DPO converges to the entropy-regularized utility-maximizing policy. To exploit this signal, we propose a two-stage training pipeline: (i) strategy selection, which optimizes the model to choose the best strategy for a given problem via best-vs-all comparisons, and (ii) execution refinement, which trains the model to correctly execute the selected strategy using margin-stratified pairs. On mathematical reasoning benchmarks, CU-DPO improves strategy selection accuracy from 35-46 percent to 68-78 percent across seven base models, yielding consistent downstream reasoning gains of up to 6.6 points on in-distribution datasets with effective transfer to out-of-distribution tasks.

  • Neural Garbage Collection: Learning to Forget while Learning to Reason

    arXiv (Cornell University) · 2026-04-20

    article · Open access

    Chain-of-thought reasoning has driven striking advances in language model capability, yet every reasoning step grows the KV cache, creating a bottleneck to scaling this paradigm further. Current approaches manage these constraints on the model's behalf using hand-designed criteria. A more scalable approach would let end-to-end learning subsume this design choice entirely, following a broader pattern in deep learning. After all, if a model can learn to reason, why can't it learn to forget? We introduce Neural Garbage Collection (NGC), in which a language model learns to forget while learning to reason, trained end-to-end from outcome-based task reward alone. As the model reasons, it periodically pauses, decides which KV cache entries to evict, and continues to reason conditioned on the remaining cache. By treating tokens in a chain-of-thought and cache-eviction decisions as discrete actions sampled from the language model, we can use reinforcement learning to jointly optimize how the model reasons and how it manages its own memory: what the model evicts shapes what it remembers, what it remembers shapes its reasoning, and the correctness of that reasoning determines its reward. Crucially, the model learns this behavior entirely from a single learning signal - the outcome-based task reward - without supervised fine-tuning or proxy objectives. On Countdown, AMC, and AIME tasks, NGC maintains strong accuracy relative to the full-cache upper bound at 2-3x peak KV cache size compression and substantially outperforms eviction baselines. Our results are a first step towards a broader vision where end-to-end optimization drives both capability and efficiency in language models.

  • BALAR: A Bayesian Agentic Loop for Active Reasoning

    arXiv (Cornell University) · 2026-05-06

    preprint · Open access · Senior author

    Large language models increasingly operate in interactive settings where solving a task requires multiple rounds of information exchange with a user. However, most current systems treat dialogue reactively and lack a principled mechanism to reason about what information is missing and which question should be asked next. We propose BALAR (Bayesian Agentic Loop for Active Reasoning), a task-agnostic outer-loop algorithm that requires no fine-tuning and enables structured multi-turn interaction between an LLM agent and a user. BALAR maintains a structured belief over latent states, selects clarifying questions by maximizing expected mutual information, and dynamically expands its state representation when the current one proves insufficient. We evaluate BALAR on three diverse benchmarks: AR-Bench-DC (detective cases), AR-Bench-SP (thinking puzzles), and iCraft-MD (clinical diagnosis). BALAR significantly outperforms all baselines across all three benchmarks, with 14.6% higher accuracy on AR-Bench-DC, 38.5% on AR-Bench-SP, and 30.5% on iCraft-MD.
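The BALAR abstract describes selecting clarifying questions by maximizing expected mutual information over a belief on latent states. The sketch below illustrates that general idea on a toy discrete problem. It is not the authors' implementation: the function names, the dictionary-based belief, and the answer likelihoods are all hypothetical, and the state-expansion step mentioned in the abstract is not covered.

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a discrete distribution {outcome: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def update_belief(belief, likelihood, answer):
    """Bayes update: posterior(state) is proportional to prior * p(answer | state)."""
    joint = {s: belief[s] * likelihood[s].get(answer, 0.0) for s in belief}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("answer has zero probability under the current belief")
    return {s: v / total for s, v in joint.items()}

def expected_information_gain(belief, likelihood):
    """Mutual information I(S; A) = H(S) - sum_a p(a) * H(S | A = a).

    belief:     {state: p(state)}
    likelihood: {state: {answer: p(answer | state)}} for one question
    """
    answers = {a for dist in likelihood.values() for a in dist}
    gain = entropy(belief)
    for a in answers:
        p_a = sum(belief[s] * likelihood[s].get(a, 0.0) for s in belief)
        if p_a > 0:
            gain -= p_a * entropy(update_belief(belief, likelihood, a))
    return gain

def select_question(belief, questions):
    """Return the question id whose answer is expected to be most informative."""
    return max(questions, key=lambda q: expected_information_gain(belief, questions[q]))

# Toy example (hypothetical): four equally likely latent states.
belief = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
questions = {
    # Splits the states in half.
    "is_it_A_or_B": {"A": {"yes": 1.0}, "B": {"yes": 1.0},
                     "C": {"no": 1.0}, "D": {"no": 1.0}},
    # Singles out only one state.
    "is_it_A": {"A": {"yes": 1.0}, "B": {"no": 1.0},
                "C": {"no": 1.0}, "D": {"no": 1.0}},
}
best = select_question(belief, questions)
posterior = update_belief(belief, questions[best], "yes")
```

In this toy setup the half-splitting question is preferred because its answer always removes one bit of uncertainty, while the single-state question usually leaves three candidates open.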

Education

  • Other, Electrical Engineering

    MIT Department of EECS

    2004
  • Other

    MIT Department of EECS

    2005
  • Ph.D., Electrical Engineering & Computer Science

    MIT Department of EECS

    2009
  • Other

    Duke University, Department of Statistical Science

    2011

Awards & honors

  • Presidential Early Career Award for Scientists and Engineers…
  • Sloan Research Fellowship
  • ONR Young Investigator award
  • NSF CAREER award
  • Leonard J. Savage Thesis Award in Applied Methodology