Stephan Mandt
Associate Professor and HPI Co-Director · University of California, Irvine · Computer Science
Active 2010–2026
About
Stephan Mandt is an Associate Professor of Computer Science and Statistics at the University of California, Irvine. He previously led the machine learning group at Disney Research in Pittsburgh and Los Angeles and held postdoctoral positions at Princeton and Columbia University. Stephan holds a Ph.D. in Theoretical Physics from the University of Cologne, where he received the German National Merit Scholarship. His research is supported by NSF, DARPA, IARPA, DOE, Disney, Intel, and Qualcomm. His work focuses on deep generative models such as variational autoencoders and diffusion models, uncertainty quantification in neural networks, neural data compression, and the application of machine learning in physics, chemistry, and climate science. He is actively involved in the academic community as an Action Editor of the Journal of Machine Learning Research and Transactions on Machine Learning Research, and he regularly serves as an Area Chair for major conferences including NeurIPS, ICML, AAAI, and ICLR. Stephan is also the Program Chair for AISTATS 2024 and will serve as General Chair for AISTATS 2025.
Research topics
- Machine Learning
- Data Mining
- Artificial Intelligence
- Computer Science
- Physics
- Thermodynamics
- Chemistry
- Biological system
- Mathematics
- Physical chemistry
- Chromatography
Selected publications
2026-02-10
report · Open access
The past two decades have witnessed natural disasters and extreme weather events that affect millions of people. At the same time, the data volume from high-resolution climate models, satellite, in-situ and ground-based measurements has substantially increased to petabyte scales. These new and readily accessible datasets create the previously missing pipeline required for scientific machine learning (ML) and therefore new opportunities for improved understanding and prediction capability of climate extreme events. This project developed a deep latent variable model framework to discover physically meaningful hidden structures from high-dimensional, spatiotemporal climate extreme data.
Thermodynamically consistent machine learning model for excess Gibbs energy
Nature Communications · 2026-04-14
article · Open access
The excess Gibbs energy plays a central role in chemical engineering and chemistry, providing a basis for modeling thermodynamic properties of liquid mixtures. Predicting the excess Gibbs energy of multi-component mixtures solely from molecular structures is a long-standing challenge. We address this challenge with HANNA, a flexible machine learning model for excess Gibbs energy that integrates physical laws as hard constraints, guaranteeing thermodynamically consistent predictions. HANNA is trained on experimental data for vapor-liquid equilibria, liquid-liquid equilibria, activity coefficients at infinite dilution, and excess enthalpies in binary mixtures. The end-to-end training on liquid-liquid equilibrium data is facilitated by a surrogate solver. A geometric projection method enables robust extrapolations to multi-component mixtures. We demonstrate that HANNA delivers accurate predictions, while providing a substantially broader domain of applicability than state-of-the-art benchmark methods. The trained model and corresponding code are openly available, and an interactive interface is provided on our website, MLPROP.
Style Transfer for High-Fidelity Time Series Augmentation
Communications in computer and information science · 2026-01-01
book-chapter
UMAMI: Unifying Masked Autoregressive Models and Deterministic Rendering for View Synthesis
ArXiv.org · 2025-12-23
article · Open access · Senior author
Novel view synthesis (NVS) seeks to render photorealistic, 3D-consistent images of a scene from unseen camera poses given only a sparse set of posed views. Existing deterministic networks render observed regions quickly but blur unobserved areas, whereas stochastic diffusion-based methods hallucinate plausible content yet incur heavy training- and inference-time costs. In this paper, we propose a hybrid framework that unifies the strengths of both paradigms. A bidirectional transformer encodes multi-view image tokens and Plücker-ray embeddings, producing a shared latent representation. Two lightweight heads then act on this representation: (i) a feed-forward regression head that renders pixels where geometry is well constrained, and (ii) a masked autoregressive diffusion head that completes occluded or unseen regions. The entire model is trained end-to-end with joint photometric and diffusion losses, without handcrafted 3D inductive biases, enabling scalability across diverse scenes. Experiments demonstrate that our method attains state-of-the-art image quality while reducing rendering time by an order of magnitude compared with fully generative baselines.
Formally Exploring Time-Series Anomaly Detection Evaluation Metrics
ArXiv.org · 2025-10-20
preprint · Open access
Undetected anomalies in time series can trigger catastrophic failures in safety-critical systems, such as chemical plant explosions or power grid outages. Although many detection methods have been proposed, their performance remains unclear because current metrics capture only narrow aspects of the task and often yield misleading results. We address this issue by introducing verifiable properties that formalize essential requirements for evaluating time-series anomaly detection. These properties enable a theoretical framework that supports principled evaluations and reliable comparisons. Analyzing 37 widely used metrics, we show that most satisfy only a few properties, and none satisfy all, explaining persistent inconsistencies in prior results. To close this gap, we propose LARM, a flexible metric that provably satisfies all properties, and extend it to ALARM, an advanced variant meeting stricter requirements.
Parallel Token Prediction for Language Models
Open MIND · 2025-12-24
preprint · Senior author
Autoregressive decoding in language models is inherently slow, generating only one token per forward pass. We propose Parallel Token Prediction (PTP), a general-purpose framework for predicting multiple tokens in a single model call. PTP moves the source of randomness from post-hoc sampling to random input variables, making future tokens deterministic functions of those inputs and thus jointly predictable in a single forward pass. We prove that a single PTP call can represent arbitrary dependencies between tokens. PTP is trained by distilling an existing model or through inverse autoregressive training without a teacher. Experimentally, PTP achieves a 2.4x speedup on a diverse-task speculative decoding benchmark. We provide code and checkpoints at https://github.com/mandt-lab/ptp.
On the Effect of Regularization on Nonparametric Mean-Variance Regression
ArXiv.org · 2025-11-27
preprint · Open access · Senior author
Uncertainty quantification is vital for decision-making and risk assessment in machine learning. Mean-variance regression models, which predict both a mean and residual noise for each data point, provide a simple approach to uncertainty quantification. However, overparameterized mean-variance models struggle with signal-to-noise ambiguity, deciding whether prediction targets should be attributed to signal (mean) or noise (variance). At one extreme, models fit all training targets perfectly with zero residual noise, while at the other, they provide constant, uninformative predictions and explain the targets as noise. We observe a sharp phase transition between these extremes, driven by model regularization. Empirical studies with varying regularization levels illustrate this transition, revealing substantial variability across repeated runs. To explain this behavior, we develop a statistical field theory framework, which captures the observed phase transition in alignment with experimental results. This analysis reduces the regularization hyperparameter search space from two dimensions to one, significantly lowering computational costs. Experiments on UCI datasets and the large-scale ClimSim dataset demonstrate robust calibration performance, effectively quantifying predictive uncertainty.
Generative AI and Foundation Models
Oxford University Press eBooks · 2025-11-19
book-chapter · Senior author
AI has a long history, but only recently has Generative AI risen to prominence as a form that makes notable progress on some of the long-standing goals of the field. Whereas classic machine learning models typically predict a category label or a number, a Generative AI model may produce an entire text, a photorealistic picture, or other complex outputs. This chapter provides a brief introduction to the techniques that enable this, starting from basic ideas such as learning from training data and building foundation models. The chapter explains how large language models operate internally, introducing the transformer neural network architecture. It also provides a high-level explanation of common generative models for images and further kinds of data. Finally, it concludes with a discussion of agentic AI and possible future developments.
Parallel Token Prediction for Language Models
arXiv (Cornell University) · 2025-12-24
article · Open access · Senior author
UMAMI: Unifying Masked Autoregressive Models and Deterministic Rendering for View Synthesis
arXiv (Cornell University) · 2025-12-23
preprint · Open access · Senior author
Recent grants
CAREER: Variational Inference for Resource-Efficient Learning
NSF · $446k · 2021–2026
RI: Small: Deep Variational Data Compression
NSF · $425k · 2020–2025
Frequent coauthors
- Robert Bamler · 48 shared
- Fabian Jirasek · 34 shared
- Marius Kloft · University of Koblenz and Landau · 31 shared
- Griffin Mooers · University of California, Irvine · 15 shared
- Yibo Yang · Southwest Petroleum University · 15 shared
- David M. Blei · 14 shared
- Hans Hasse · 14 shared
- Padhraic Smyth · 13 shared
Education
- 2012 · Ph.D., Theoretical Physics · University of Cologne
- 2008 · B.S. and M.S., Physics and Mathematics · University of Cologne
Awards & honors
- German National Merit Scholarship
- NSF CAREER Award
- UCI ICS Mid-Career Excellence in Research Award
- German Research Foundation's Mercator Fellowship
- Kavli Fellow of the U.S. National Academy of Sciences