
About
Her research focuses on advancing machine learning methods for applications in health and biology. Particular interests are in health sensing and wearable technologies, multimodal biological data (microscopy, omics), and neuroimaging data. Methodologically, her work emphasizes sequence modeling, Bayesian approaches, and generative modeling.
Research topics
- Computer Science
- Artificial Intelligence
- Machine Learning
- Medicine
- Business
- Demographic economics
- Telecommunications
- Virology
- Environmental health
- Neuroscience
- Economics
- Psychology
Selected publications
Vascular waveform analysis using Bayesian pulse deconvolution
bioRxiv (Cold Spring Harbor Laboratory) · 2026-02-11
article · Open access · Senior author
Vascular waveforms, which measure bulk flow in blood vessels, are widely used to measure vital signs, diagnose conditions, and predict long-term health outcomes. Analyzing vascular waveforms depends on three fundamentally interdependent tasks: signal filtering, pulse timing detection, and pulse shape extraction. We hypothesized that Bayesian pulse deconvolution can achieve improved performance on all three tasks by solving them jointly. This method uses an analytical, generative model of vascular waveforms with priors informed by physical and biological domain knowledge. In simulations, Bayesian pulse deconvolution achieves better performance on all tasks compared with existing algorithms: 90% reduction of median filtering error, 60% reduction in pulse timing error, and 85% reduction in shape extraction error. The advantages in simulations extend to human recordings of photoplethysmography waveforms. Taking real time-synchronized electrocardiogram R-R intervals as a proxy ground truth, Bayesian pulse deconvolution achieves 40% lower pulse interval estimation error (RMSE = 5.1 ms) compared with typical algorithms (RMSE = 8.3 ms, p = 1e-10). By extracting more accurate and informative insights from vascular waveforms, Bayesian pulse deconvolution could advance a wide array of health technologies that rely on interpreting signals from blood vessels.
Forget, Then Recall: Learnable Compression and Selective Unfolding via Gist Sparse Attention
arXiv (Cornell University) · 2026-04-22
preprint · Open access · Senior author
Scaling large language models to long contexts is challenging due to the quadratic computational cost of full attention. Mitigation approaches include KV-cache selection or compression techniques. We instead provide an effective and end-to-end learnable bridge between the two without requiring architecture modification. In particular, our key insight is that interleaved gist compression tokens -- which provide a learnable summary of sets of raw tokens -- can serve as routing signals for sparse attention. Building on this, we introduce selective unfolding via GSA, which first compresses the context into gist tokens, then selects the most relevant gists, and subsequently restores the corresponding raw chunks for detailed attention. This yields a simple coarse-to-fine mechanism that combines compact global representations with targeted access to fine-grained evidence. We further incorporate this process directly into training in an end-to-end fashion, avoiding the need for external retrieval modules. In addition, we extend the framework hierarchically via recursive gist-of-gist construction, enabling multi-resolution context access with logarithmic per-step decoding complexity. Empirical results on LongBench and RAG benchmarks demonstrate that our method consistently outperforms other compression baselines as well as inference-time sparse attention methods across compression ratios from $8\times$ to $32\times$. The code is available at: https://github.com/yuzhenmao/gist-sparse-attention/
parkersruth/bayesian_pulse_deconvolution: v1.0.0-preprint
Open MIND · 2026-02-10
other · Senior author
Preprint version release
parkersruth/bayesian_pulse_deconvolution: v1.0.0-preprint
Zenodo (CERN European Organization for Nuclear Research) · 2026-02-10
other · Open access · Senior author
Preprint version release
Pressure, What Pressure? Sycophancy Disentanglement in Language Models via Reward Decomposition
arXiv (Cornell University) · 2026-04-07
article · Open access · Senior author
Large language models exhibit sycophancy, the tendency to shift their stated positions toward perceived user preferences or authority cues regardless of evidence. Standard alignment methods fail to correct this because scalar reward models conflate two distinct failure modes into a single signal: pressure capitulation, where the model changes a correct answer under social pressure, and evidence blindness, where the model ignores the provided context entirely. We operationalise sycophancy through formal definitions of pressure independence and evidence responsiveness, serving as a working framework for disentangled training rather than a definitive characterisation of the phenomenon. We propose the first approach to sycophancy reduction via reward decomposition, introducing a multi-component Group Relative Policy Optimisation (GRPO) reward that decomposes the training signal into five terms: pressure resistance, context fidelity, position consistency, agreement suppression, and factual correctness. We train using a contrastive dataset pairing pressure-free baselines with pressured variants across three authority levels and two opposing evidence contexts. Across five base models, our two-phase pipeline consistently reduces sycophancy on all metric axes, with ablations confirming that each reward term governs an independent behavioural dimension. The learned resistance to pressure generalises beyond our training methodology and prompt structure, reducing answer-priming sycophancy by up to 17 points on SycophancyEval despite the absence of such pressure forms during training.
How Well Do Multimodal Models Reason on ECG Signals?
Open MIND · 2026-02-27
preprint
While multimodal large language models offer a promising solution to the "black box" nature of health AI by generating interpretable reasoning traces, verifying the validity of these traces remains a critical challenge. Existing evaluation methods are either unscalable, relying on manual clinician review, or superficial, utilizing proxy metrics (e.g., QA) that fail to capture the semantic correctness of clinical logic. In this work, we introduce a reproducible framework for evaluating reasoning in ECG signals. We propose decomposing reasoning into two distinct components: (i) Perception, the accurate identification of patterns within the raw signal, and (ii) Deduction, the logical application of domain knowledge to those patterns. To evaluate Perception, we employ an agentic framework that generates code to empirically verify the temporal structures described in the reasoning trace. To evaluate Deduction, we measure the alignment of the model's logic against a structured database of established clinical criteria in a retrieval-based approach. This dual-verification method enables the scalable assessment of "true" reasoning capabilities.
BALAR: A Bayesian Agentic Loop for Active Reasoning
ArXiv.org · 2026-05-06
article · Open access · Senior author
Large language models increasingly operate in interactive settings where solving a task requires multiple rounds of information exchange with a user. However, most current systems treat dialogue reactively and lack a principled mechanism to reason about what information is missing and which question should be asked next. We propose BALAR (Bayesian Agentic Loop for Active Reasoning), a task-agnostic outer-loop algorithm that requires no fine-tuning and enables structured multi-turn interaction between an LLM agent and a user. BALAR maintains a structured belief over latent states, selects clarifying questions by maximizing expected mutual information, and dynamically expands its state representation when the current one proves insufficient. We evaluate BALAR on three diverse benchmarks: AR-Bench-DC (detective cases), AR-Bench-SP (thinking puzzles), and iCraft-MD (clinical diagnosis). BALAR significantly outperforms all baselines across all three benchmarks, with $14.6\%$ higher accuracy on AR-Bench-DC, $38.5\%$ on AR-Bench-SP, and $30.5\%$ on iCraft-MD.
Continuous-Utility Direct Preference Optimization
arXiv (Cornell University) · 2026-01-31
article · Open access · Senior author
Large language model reasoning is often treated as a monolithic capability, relying on binary preference supervision that fails to capture partial progress or fine-grained reasoning quality. We introduce Continuous Utility Direct Preference Optimization (CU-DPO), a framework that aligns models to a portfolio of prompt-based cognitive strategies by replacing binary labels with continuous scores that capture fine-grained reasoning quality. We prove that learning with $K$ strategies yields a $\Theta(K \log K)$ improvement in sample complexity over binary preferences, and that DPO converges to the entropy-regularized utility-maximizing policy. To exploit this signal, we propose a two-stage training pipeline: (i) strategy selection, which optimizes the model to choose the best strategy for a given problem via best-vs-all comparisons, and (ii) execution refinement, which trains the model to correctly execute the selected strategy using margin-stratified pairs. On mathematical reasoning benchmarks, CU-DPO improves strategy selection accuracy from 35-46 percent to 68-78 percent across seven base models, yielding consistent downstream reasoning gains of up to 6.6 points on in-distribution datasets with effective transfer to out-of-distribution tasks.
Neural Garbage Collection: Learning to Forget while Learning to Reason
arXiv (Cornell University) · 2026-04-20
article · Open access
Chain-of-thought reasoning has driven striking advances in language model capability, yet every reasoning step grows the KV cache, creating a bottleneck to scaling this paradigm further. Current approaches manage these constraints on the model's behalf using hand-designed criteria. A more scalable approach would let end-to-end learning subsume this design choice entirely, following a broader pattern in deep learning. After all, if a model can learn to reason, why can't it learn to forget? We introduce Neural Garbage Collection (NGC), in which a language model learns to forget while learning to reason, trained end-to-end from outcome-based task reward alone. As the model reasons, it periodically pauses, decides which KV cache entries to evict, and continues to reason conditioned on the remaining cache. By treating tokens in a chain-of-thought and cache-eviction decisions as discrete actions sampled from the language model, we can use reinforcement learning to jointly optimize how the model reasons and how it manages its own memory: what the model evicts shapes what it remembers, what it remembers shapes its reasoning, and the correctness of that reasoning determines its reward. Crucially, the model learns this behavior entirely from a single learning signal, the outcome-based task reward, without supervised fine-tuning or proxy objectives. On Countdown, AMC, and AIME tasks, NGC maintains strong accuracy relative to the full-cache upper bound at 2-3x peak KV cache size compression and substantially outperforms eviction baselines. Our results are a first step towards a broader vision where end-to-end optimization drives both capability and efficiency in language models.
BALAR: A Bayesian Agentic Loop for Active Reasoning
arXiv (Cornell University) · 2026-05-06
preprint · Open access · Senior author
Recent grants
CAREER: Scaling up Modeling and Statistical Inference for Massive Collections of Time Series
NSF · $549k · 2014–2021
PostDoctoral Research Fellowship
NSF · $135k · 2009–2013
CAREER: Exploiting Topology in Graph Algorithm Design
NSF · $587k · 2020–2025
Frequent coauthors
- 41 shared
Nicholas J. Foti
Apple (United States)
- 31 shared
Michael I. Jordan
- 30 shared
Erik B. Sudderth
- 30 shared
Alan S. Willsky
- 19 shared
Alex Tank
University of Washington
- 14 shared
Andrew C. Miller
- 13 shared
Carlos Guestrin
- 13 shared
Ali Shojaie
Education
- 2004
Other, Electrical Engineering
MIT Department of EECS
- 2005
Other
MIT Department of EECS
- 2009
Ph.D., Electrical Engineering & Computer Science
MIT Department of EECS
- 2011
Other
Duke University, Department of Statistical Science
Awards & honors
- Presidential Early Career Award for Scientists and Engineers…
- Sloan Research Fellowship
- ONR Young Investigator award
- NSF CAREER award
- Leonard J. Savage Thesis Award in Applied Methodology