Emily B. Fox

Amazon Professor of Machine Learning

Stanford University · Statistics

Active 1984–2026

h-index: 36

Citations: 7.6k

Papers: 197 (59 in the last 5 years)

Funding: $1.3M

About

Her research focuses on advancing machine learning methods for applications in health and biology. Particular interests are in health sensing and wearable technologies, multimodal biological data (microscopy, omics), and neuroimaging data. Methodologically, her work emphasizes sequence modeling, Bayesian approaches, and generative modeling.

Research topics

Computer Science
Artificial Intelligence
Machine Learning
Medicine
Business
Demographic economics
Telecommunications
Virology
Environmental health
Neuroscience
Economics
Psychology

Selected publications

Vascular waveform analysis using Bayesian pulse deconvolution
bioRxiv (Cold Spring Harbor Laboratory) · 2026-02-11
article · Open access · Senior author
Vascular waveforms, which measure bulk flow in blood vessels, are widely used to measure vital signs, diagnose conditions, and predict long-term health outcomes. Analyzing vascular waveforms depends on three fundamentally interdependent tasks: signal filtering, pulse timing detection, and pulse shape extraction. We hypothesized that Bayesian pulse deconvolution can achieve improved performance on all three tasks by solving them jointly. This method uses an analytical, generative model of vascular waveforms with priors informed by physical and biological domain knowledge. In simulations, Bayesian pulse deconvolution achieves better performance on all tasks compared with existing algorithms: 90% reduction in median filtering error, 60% reduction in pulse timing error, and 85% reduction in shape extraction error. The advantages in simulations extend to human recordings of photoplethysmography waveforms. Taking real time-synchronized electrocardiogram R-R intervals as a proxy ground truth, Bayesian pulse deconvolution achieves 40% lower pulse interval estimation error (RMSE = 5.1 ms) compared with typical algorithms (RMSE = 8.3 ms, p = 1e-10). By extracting more accurate and informative insights from vascular waveforms, Bayesian pulse deconvolution could advance a wide array of health technologies that rely on interpreting signals from blood vessels.
Forget, Then Recall: Learnable Compression and Selective Unfolding via Gist Sparse Attention
arXiv (Cornell University) · 2026-04-22
preprint · Open access · Senior author
Scaling large language models to long contexts is challenging due to the quadratic computational cost of full attention. Mitigation approaches include KV-cache selection or compression techniques. We instead provide an effective and end-to-end learnable bridge between the two without requiring architecture modification. In particular, our key insight is that interleaved gist compression tokens, which provide a learnable summary of sets of raw tokens, can serve as routing signals for sparse attention. Building on this, we introduce selective unfolding via Gist Sparse Attention (GSA), which first compresses the context into gist tokens, then selects the most relevant gists, and subsequently restores the corresponding raw chunks for detailed attention. This yields a simple coarse-to-fine mechanism that combines compact global representations with targeted access to fine-grained evidence. We further incorporate this process directly into training in an end-to-end fashion, avoiding the need for external retrieval modules. In addition, we extend the framework hierarchically via recursive gist-of-gist construction, enabling multi-resolution context access with logarithmic per-step decoding complexity. Empirical results on LongBench and RAG benchmarks demonstrate that our method consistently outperforms other compression baselines as well as inference-time sparse attention methods across compression ratios from 8× to 32×. The code is available at: https://github.com/yuzhenmao/gist-sparse-attention/
parkersruth/bayesian_pulse_deconvolution: v1.0.0-preprint
Open MIND · 2026-02-10
other · Senior author
Preprint version release
parkersruth/bayesian_pulse_deconvolution: v1.0.0-preprint
Zenodo (CERN European Organization for Nuclear Research) · 2026-02-10
other · Open access · Senior author
Preprint version release
Pressure, What Pressure? Sycophancy Disentanglement in Language Models via Reward Decomposition
arXiv (Cornell University) · 2026-04-07
article · Open access · Senior author
Large language models exhibit sycophancy, the tendency to shift their stated positions toward perceived user preferences or authority cues regardless of evidence. Standard alignment methods fail to correct this because scalar reward models conflate two distinct failure modes into a single signal: pressure capitulation, where the model changes a correct answer under social pressure, and evidence blindness, where the model ignores the provided context entirely. We operationalise sycophancy through formal definitions of pressure independence and evidence responsiveness, serving as a working framework for disentangled training rather than a definitive characterisation of the phenomenon. We propose the first approach to sycophancy reduction via reward decomposition, introducing a multi-component Group Relative Policy Optimisation (GRPO) reward that decomposes the training signal into five terms: pressure resistance, context fidelity, position consistency, agreement suppression, and factual correctness. We train using a contrastive dataset pairing pressure-free baselines with pressured variants across three authority levels and two opposing evidence contexts. Across five base models, our two-phase pipeline consistently reduces sycophancy on all metric axes, with ablations confirming that each reward term governs an independent behavioural dimension. The learned resistance to pressure generalises beyond our training methodology and prompt structure, reducing answer-priming sycophancy by up to 17 points on SycophancyEval despite the absence of such pressure forms during training.
How Well Do Multimodal Models Reason on ECG Signals?
Open MIND · 2026-02-27
preprint
While multimodal large language models offer a promising solution to the "black box" nature of health AI by generating interpretable reasoning traces, verifying the validity of these traces remains a critical challenge. Existing evaluation methods are either unscalable, relying on manual clinician review, or superficial, utilizing proxy metrics (e.g., QA) that fail to capture the semantic correctness of clinical logic. In this work, we introduce a reproducible framework for evaluating reasoning in ECG signals. We propose decomposing reasoning into two distinct components: (i) Perception, the accurate identification of patterns within the raw signal, and (ii) Deduction, the logical application of domain knowledge to those patterns. To evaluate Perception, we employ an agentic framework that generates code to empirically verify the temporal structures described in the reasoning trace. To evaluate Deduction, we measure the alignment of the model's logic against a structured database of established clinical criteria in a retrieval-based approach. This dual-verification method enables the scalable assessment of "true" reasoning capabilities.
BALAR: A Bayesian Agentic Loop for Active Reasoning
ArXiv.org · 2026-05-06
article · Open access · Senior author
Large language models increasingly operate in interactive settings where solving a task requires multiple rounds of information exchange with a user. However, most current systems treat dialogue reactively and lack a principled mechanism to reason about what information is missing and which question should be asked next. We propose BALAR (Bayesian Agentic Loop for Active Reasoning), a task-agnostic outer-loop algorithm that requires no fine-tuning and enables structured multi-turn interaction between an LLM agent and a user. BALAR maintains a structured belief over latent states, selects clarifying questions by maximizing expected mutual information, and dynamically expands its state representation when the current one proves insufficient. We evaluate BALAR on three diverse benchmarks: AR-Bench-DC (detective cases), AR-Bench-SP (thinking puzzles), and iCraft-MD (clinical diagnosis). BALAR significantly outperforms all baselines across all three benchmarks, with 14.6% higher accuracy on AR-Bench-DC, 38.5% on AR-Bench-SP, and 30.5% on iCraft-MD.
Continuous-Utility Direct Preference Optimization
arXiv (Cornell University) · 2026-01-31
article · Open access · Senior author
Large language model reasoning is often treated as a monolithic capability, relying on binary preference supervision that fails to capture partial progress or fine-grained reasoning quality. We introduce Continuous Utility Direct Preference Optimization (CU-DPO), a framework that aligns models to a portfolio of prompt-based cognitive strategies by replacing binary labels with continuous scores that capture fine-grained reasoning quality. We prove that learning with K strategies yields a Theta(K log K) improvement in sample complexity over binary preferences, and that DPO converges to the entropy-regularized utility-maximizing policy. To exploit this signal, we propose a two-stage training pipeline: (i) strategy selection, which optimizes the model to choose the best strategy for a given problem via best-vs-all comparisons, and (ii) execution refinement, which trains the model to correctly execute the selected strategy using margin-stratified pairs. On mathematical reasoning benchmarks, CU-DPO improves strategy selection accuracy from 35-46 percent to 68-78 percent across seven base models, yielding consistent downstream reasoning gains of up to 6.6 points on in-distribution datasets with effective transfer to out-of-distribution tasks.
Neural Garbage Collection: Learning to Forget while Learning to Reason
arXiv (Cornell University) · 2026-04-20
article · Open access
Chain-of-thought reasoning has driven striking advances in language model capability, yet every reasoning step grows the KV cache, creating a bottleneck to scaling this paradigm further. Current approaches manage these constraints on the model's behalf using hand-designed criteria. A more scalable approach would let end-to-end learning subsume this design choice entirely, following a broader pattern in deep learning. After all, if a model can learn to reason, why can't it learn to forget? We introduce Neural Garbage Collection (NGC), in which a language model learns to forget while learning to reason, trained end-to-end from outcome-based task reward alone. As the model reasons, it periodically pauses, decides which KV cache entries to evict, and continues to reason conditioned on the remaining cache. By treating tokens in a chain-of-thought and cache-eviction decisions as discrete actions sampled from the language model, we can use reinforcement learning to jointly optimize how the model reasons and how it manages its own memory: what the model evicts shapes what it remembers, what it remembers shapes its reasoning, and the correctness of that reasoning determines its reward. Crucially, the model learns this behavior entirely from a single learning signal - the outcome-based task reward - without supervised fine-tuning or proxy objectives. On Countdown, AMC, and AIME tasks, NGC maintains strong accuracy relative to the full-cache upper bound at 2-3x peak KV cache size compression and substantially outperforms eviction baselines. Our results are a first step towards a broader vision where end-to-end optimization drives both capability and efficiency in language models.

Recent grants

CAREER: Scaling up Modeling and Statistical Inference for Massive Collections of Time Series
NSF · $549k · 2014–2021
PostDoctoral Research Fellowship
NSF · $135k · 2009–2013
CAREER: Exploiting Topology in Graph Algorithm Design
NSF · $587k · 2020–2025

Frequent coauthors

Nicholas J. Foti
Apple (United States)
41 shared
Michael I. Jordan
31 shared
Erik B. Sudderth
30 shared
Alan S. Willsky
30 shared
Alex Tank
University of Washington
19 shared
Andrew C. Miller
14 shared
Carlos Guestrin
13 shared
Ali Shojaie
13 shared

Labs

Institute for Computational & Mathematical Engineering · PI

Education

Other, Electrical Engineering
MIT Department of EECS
2004
Other
MIT Department of EECS
2005
Ph.D., Electrical Engineering & Computer Science
MIT Department of EECS
2009
Other
Duke University, Department of Statistical Science
2011

Awards & honors

Presidential Early Career Award for Scientists and Engineers…
Sloan Research Fellowship
ONR Young Investigator award
NSF CAREER award
Leonard J. Savage Thesis Award in Applied Methodology
