
About
Emmanuel J. Candès holds the Barnum-Simons Chair in Mathematics and Statistics at Stanford University and is Professor of Mathematics and Statistics, Professor of Electrical Engineering (by courtesy), and Co-chair of the Data Science Institute. Research interests include compressive sensing, mathematical signal processing, computational harmonic analysis, statistics, scientific computing, and applications to the imaging sciences and inverse problems. Other topics of recent interest include theoretical computer science, mathematical optimization, and information theory.
Research topics
- Computer Science
- Biology
- Computational biology
- Machine Learning
- Data Mining
- Statistics
- Mathematics
- Genetics
- Econometrics
Selected publications
Towards Execution-Grounded Automated AI Research
arXiv (Cornell University) · 2026-01-20
preprint · Open access
Automated AI research holds great potential to accelerate scientific discovery. However, current LLMs often generate plausible-looking but ineffective ideas. Execution grounding may help, but it is unclear whether automated execution is feasible and whether LLMs can learn from the execution feedback. To investigate these, we first build an automated executor to implement ideas and launch large-scale parallel GPU experiments to verify their effectiveness. We then convert two realistic research problems - LLM pre-training and post-training - into execution environments and demonstrate that our automated executor can implement a large fraction of the ideas sampled from frontier LLMs. We analyze two methods to learn from the execution feedback: evolutionary search and reinforcement learning. Execution-guided evolutionary search is sample-efficient: it finds a method that significantly outperforms the GRPO baseline (69.4% vs 48.0%) on post-training, and finds a pre-training recipe that outperforms the nanoGPT baseline (19.7 minutes vs 35.9 minutes) on pre-training, all within just ten search epochs. Frontier LLMs often generate meaningful algorithmic ideas during search, but they tend to saturate early and only occasionally exhibit scaling trends. Reinforcement learning from execution reward, on the other hand, suffers from mode collapse. It successfully improves the average reward of the ideator model but not the upper-bound, due to models converging on simple ideas. We thoroughly analyze the executed ideas and training dynamics to facilitate future efforts towards execution-grounded automated AI research.
TxConformal: Controlling False Discoveries in AI-Driven Therapeutic Discovery
bioRxiv (Cold Spring Harbor Laboratory) · 2026-04-30
article · Senior author
Artificial Intelligence (AI) is transforming therapeutic discovery by scoring a large set of promising candidates and prioritizing a shortlist for further investigation. Quantifying the reliability of AI scores and preventing false positives among selected candidates is key to the efficiency of the discovery process. Conformal prediction (CP) has emerged as a popular tool for guiding such prioritization, especially via the conformal selection framework to control false discovery rates (FDR) in selecting top-ranked candidates under distributional shift [1, 2]. However, deploying these advances in real-world therapeutic discovery remains challenging: distribution shifts are difficult to quantify and correct in high-dimensional biomedical data, and practical workflows often require flexible error metrics. Here, we present TxConformal, a general framework for trustworthy decision making when building shortlists using AI scores. TxConformal adjusts for distribution shift by balancing the hidden representations in AI models and then provides confidence measures for true discoveries of target biological properties. These confidence measures, interpretable as p-values, can be used in conjunction with statistical multiple testing procedures to derive selection decisions with limited false positives or to estimate the errors in given selection decisions. TxConformal controls the false positive rate in six real-world tasks spanning various therapeutic discovery stages, modalities, and AI models with realistic data splits. When selecting promising combinatorial genetic perturbations, TxConformal nearly halves false-positive selections compared to baseline methods, substantially reducing unnecessary experimental costs by tens of thousands of dollars. When selecting stable protein structures under mutant shifts, TxConformal identifies about 10 times more proteins than baseline methods at stringent thresholds when running at a target FDR level of 10%, recovering over 90% of valuable candidates that baseline methods miss due to unaccounted distribution shifts. Furthermore, we demonstrate that TxConformal robustly supports various alternative error metrics suitable for resource-constrained settings. Finally, in a prospective fixed-budget virtual screening campaign for novel antibiotic discovery, TxConformal predicted false positives in close agreement with experimental outcomes, with substantial improvements over simple baselines.
Single-Asset Adaptive Leveraged Volatility Control
SSRN Electronic Journal · 2026-01-01
preprint · Open access
Thematic Investing: A Risk-based Perspective
SSRN Electronic Journal · 2025-01-01
preprint · Open access · 1st author · Corresponding
Characterizing the Training-Conditional Coverage of Full Conformal Inference in High Dimensions
ArXiv.org · 2025-02-27
preprint · Open access · Senior author
We study the coverage properties of full conformal regression in the proportional asymptotic regime where the ratio of the dimension and the sample size converges to a constant. In this setting, existing theory tells us only that full conformal inference is unbiased, in the sense that its average coverage lies at the desired level when marginalized over both the new test point and the training data. Considerably less is known about the behaviour of these methods conditional on the training set. As a result, the exact benefits of full conformal inference over much simpler alternative methods are unclear. This paper investigates the behaviour of full conformal inference and natural uncorrected alternatives for a broad class of $L_2$-regularized linear regression models. We show that in the proportional asymptotic regime the training-conditional coverage of full conformal inference concentrates at the target value. On the other hand, simple alternatives that directly compare test and training residuals realize constant undercoverage bias. While these results demonstrate the necessity of full conformal in correcting for high-dimensional overfitting, we also show that this same methodology is redundant for the related task of tuning the regularization level. In particular, we show that full conformal inference still yields asymptotically valid coverage when the regularization level is selected using only the training set, without consideration of the test point. Simulations show that our asymptotic approximations are accurate in finite samples and can be readily extended to other popular full conformal variants, such as full conformal quantile regression and the LASSO, that do not directly meet our assumptions.
Thematic Investing: A Risk-Based Perspective
Financial Analysts Journal · 2025-08-01 · 3 citations
article · 1st author · Corresponding
Correcting the Coverage Bias of Quantile Regression
ArXiv.org · 2025-11-02
preprint · Open access · Senior author
We develop a collection of methods for adjusting the predictions of quantile regression to ensure coverage. Our methods are model agnostic and can be used to correct for high-dimensional overfitting bias with only minimal assumptions. Theoretical results show that the estimates we develop are consistent and facilitate accurate calibration in the proportional asymptotic regime where the ratio of the dimension of the data and the sample size converges to a constant. This is further confirmed by experiments on both simulated and real data. A key component of our work is a new connection between the leave-one-out coverage and the fitted values of variables appearing in a dual formulation of the quantile regression problem. This facilitates the use of cross-validation in a variety of settings at significantly reduced computational costs.
IEEE Transactions on Information Theory · 2025-01-15 · 2 citations
article
The practice of deep learning has shown that neural networks generalize remarkably well even with an extreme number of learned parameters. This appears to contradict traditional statistical wisdom, in which a trade-off between model complexity and fit to the data is essential. We aim to address this discrepancy by adopting a convex optimization and sparse recovery perspective. We consider the training and generalization properties of two-layer ReLU networks with standard weight decay regularization. Under certain regularity assumptions on the data, we show that ReLU networks with an arbitrary number of parameters learn only simple models that explain the data. This is analogous to the recovery of the sparsest linear model in compressed sensing. For ReLU networks and their variants with skip connections or normalization layers, we present isometry conditions that ensure the exact recovery of planted neurons. For randomly generated data, we show the existence of a phase transition in recovering planted neural network models, which is easy to describe: whenever the ratio between the number of samples and the dimension exceeds a numerical threshold, the recovery succeeds with high probability; otherwise, it fails with high probability. Surprisingly, ReLU networks learn simple and sparse models that generalize well even when the labels are noisy. The phase transition phenomenon is confirmed through numerical experiments.
Automated Hypothesis Validation with Agentic Sequential Falsifications
ArXiv.org · 2025-02-14 · 4 citations
preprint · Open access
Hypotheses are central to information acquisition, decision-making, and discovery. However, many real-world hypotheses are abstract, high-level statements that are difficult to validate directly. This challenge is further intensified by the rise of hypothesis generation from Large Language Models (LLMs), which are prone to hallucination and produce hypotheses in volumes that make manual validation impractical. Here we propose Popper, an agentic framework for rigorous automated validation of free-form hypotheses. Guided by Karl Popper's principle of falsification, Popper validates a hypothesis using LLM agents that design and execute falsification experiments targeting its measurable implications. A novel sequential testing framework ensures strict Type-I error control while actively gathering evidence from diverse observations, whether drawn from existing data or newly conducted procedures. We demonstrate Popper on six domains including biology, economics, and sociology. Popper delivers robust error control, high power, and scalability. Furthermore, compared to human scientists, Popper achieved comparable performance in validating complex biological hypotheses while reducing time tenfold, providing a scalable, rigorous solution for hypothesis validation.
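Several abstracts above mention confidence measures that are interpretable as p-values and are then passed to a multiple testing procedure to limit false discoveries. As a rough illustration of that general idea only, the sketch below computes split-conformal p-values from held-out calibration scores and applies the Benjamini-Hochberg procedure. It is not the TxConformal method or any author's implementation: the function names, the convention that larger scores count as stronger evidence against the null, and the toy numbers are all assumptions made for this example.

```python
import numpy as np


def conformal_p_values(calib_scores, test_scores):
    """Split-conformal p-values (illustrative helper, hypothetical name).

    Assumes larger scores are more unusual relative to the calibration set.
    p_j = (1 + #{i : calib_i >= test_j}) / (n + 1)
    """
    calib = np.sort(np.asarray(calib_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    n = calib.size
    # searchsorted(side="left") counts calibration scores strictly below each
    # test score, so n minus that count is the number that are >= it.
    n_greater_equal = n - np.searchsorted(calib, test, side="left")
    return (1.0 + n_greater_equal) / (n + 1.0)


def benjamini_hochberg(p_values, q=0.1):
    """Boolean mask of hypotheses rejected by Benjamini-Hochberg at level q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    selected = np.zeros(m, dtype=bool)
    if passed.any():
        # Reject every hypothesis up to the largest rank that passes.
        k = int(np.max(np.nonzero(passed)[0]))
        selected[order[: k + 1]] = True
    return selected


if __name__ == "__main__":
    # Toy data (made up for illustration): calibration scores from "null"
    # examples, then a mix of null-like and shifted test candidates.
    rng = np.random.default_rng(0)
    calib = rng.normal(size=500)
    test = np.concatenate([rng.normal(size=20), rng.normal(loc=4.0, size=5)])
    p = conformal_p_values(calib, test)
    print("selected indices:", np.nonzero(benjamini_hochberg(p, q=0.1))[0])
```

The `+1` terms in the p-value keep it valid in finite samples when calibration and test scores are exchangeable; under distribution shift, which is the setting the abstract emphasizes, a plain version like this one would need additional correction.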
Frequent coauthors
- Rina Foygel Barber · 43 shared
- Ery Arias-Castro, University of California, San Diego · 37 shared
- Chiara Sabatti, Stanford University · 29 shared
- Weijie Su · 26 shared
- Małgorzata Bogdan, University of Wrocław · 23 shared
- David L. Donoho, Stanford University · 21 shared
- Terence Tao, University of California, Los Angeles · 20 shared
- Matteo Sesia, University of Southern California · 20 shared
Education
- 1995 · B.S., Mathematics, California Institute of Technology
- 1996 · M.S., Mathematics, California Institute of Technology
- 1999 · Ph.D., Mathematics, California Institute of Technology
Awards & honors
- 2020 Princess of Asturias Award for Technical and Scientific…
- 2017 MacArthur Fellow
- 2021 IEEE Jack S. Kilby Signal Processing Medal