
Padhraic Smyth
Distinguished Professor, Director of UCI's Data Science Initiative, Vice Chair of Computing, and HPI Co-Director · University of California, Irvine · Computer Science
Active through 2025
About
Padhraic Smyth is a Distinguished Professor at the University of California, Irvine, and associate director of UCI's Center for Machine Learning and Intelligent Systems. He was appointed the inaugural Hasso Plattner Endowed Chair in Artificial Intelligence. His research develops theory and algorithms for machine learning, with a particular emphasis on statistical methods. His recognitions include the Best Paper Award at AISTATS 2026 for collaborative work on a deep generative model for forecasting temporal point processes.
Research topics
- Computer Science
- Data science
- World Wide Web
- Data Mining
- Artificial Intelligence
- Human–computer interaction
- Geography
- Earth science
- Ecology
- Oceanography
- Environmental science
- Climatology
- Multimedia
- Biology
- Geology
- Atmospheric sciences
Selected publications
Understanding Gender Bias in AI-Generated Product Descriptions
2025-06-23 · 1 citation
preprint · Open access
While gender bias in large language models (LLMs) has been extensively studied in many domains, uses of LLMs in e-commerce remain largely unexamined and may reveal novel forms of algorithmic bias and harm. Our work investigates this space, developing data-driven taxonomic categories of gender bias in the context of product description generation, which we situate with respect to existing general-purpose harms taxonomies. We illustrate how AI-generated product descriptions can uniquely surface gender biases in ways that require specialized detection and mitigation approaches. Further, we quantitatively analyze issues corresponding to our taxonomic categories in two models used for this task (GPT-3.5 and an e-commerce-specific LLM), demonstrating that these forms of bias commonly occur in practice. Our results illuminate unique, under-explored dimensions of gender bias, such as assumptions about clothing size, stereotypical bias in which product features are advertised, and differences in the use of persuasive language. These insights contribute to our understanding of three types of AI harms identified by current frameworks: exclusionary norms, stereotyping, and performance disparities, particularly in the context of e-commerce.
Semantic Probabilistic Control of Language Models
ArXiv.org · 2025-05-04
preprint · Open access
Semantic control entails steering LM generations towards satisfying subtle non-lexical constraints, e.g., toxicity, sentiment, or politeness, attributes that can be captured by a sequence-level verifier. It can thus be viewed as sampling from the LM distribution conditioned on the target attribute, a computationally intractable problem due to the non-decomposable nature of the verifier. Existing approaches to LM control either deal only with syntactic constraints, which cannot capture the aforementioned attributes, or rely on sampling to explore the conditional LM distribution, an ineffective estimator for low-probability events. In this work, we leverage a verifier's gradient information to efficiently reason over all generations that satisfy the target attribute, enabling precise steering of LM generations by reweighting the next-token distribution. Starting from an initial sample, we create a local LM distribution favoring semantically similar sentences. This approximation enables the tractable computation of an expected sentence embedding. We use this expected embedding, informed by the verifier's evaluation at the initial sample, to estimate the probability of satisfying the constraint, which directly informs the update to the next-token distribution. We evaluated the effectiveness of our approach in controlling the toxicity, sentiment, and topic adherence of LMs, yielding generations that satisfy the constraint with high probability (>95%) without degrading their quality.
Technical Report: Towards Unified Diffusion Models for Multi-Model Climate Emulation at Scale
ArXiv.org · 2025-11-28
preprint · Open access
Large ensembles of climate projections are essential for characterizing uncertainty in future climate and extreme weather events, yet the computational cost of numerical climate models limits ensemble sizes to a small number of realizations per model. We present a unified conditional diffusion model that dramatically reduces this computational barrier by learning shared distributional patterns across multiple Coupled Model Intercomparison Project Phase 6 (CMIP6) models and emission scenarios. Rather than training separate emulators for each model-scenario combination, our approach captures the common statistical structures underlying nine CMIP6 models, generating daily temperature maps with global coverage for historical and future periods. This unified framework enables: (i) efficient probabilistic sampling for comprehensive uncertainty quantification across models and scenarios; (ii) rapid generation of large ensembles that would be computationally intractable with traditional climate models; and (iii) variance-reduced treatment-effect analysis via fixed-seed generation that disentangles forced climate responses from internal variability. Evaluations on held-out models demonstrate reliable generalization to unseen future climates, enabling rapid exploration of different emission pathways.
Bayesian Evaluation of Large Language Model Behavior
ArXiv.org · 2025-11-04
preprint · Open access · Senior author
It is increasingly important to evaluate how text generation systems based on large language models (LLMs) behave, such as their tendency to produce harmful output or their sensitivity to adversarial inputs. Such evaluations often rely on a curated benchmark set of input prompts provided to the LLM, where the output for each prompt may be assessed in a binary fashion (e.g., harmful/non-harmful or does not leak/leaks sensitive information), and the aggregation of binary scores is used to evaluate the LLM. However, existing approaches to evaluation often neglect statistical uncertainty quantification. With an applied statistics audience in mind, we provide background on LLM text generation and evaluation, and then describe a Bayesian approach for quantifying uncertainty in binary evaluation metrics. We focus in particular on uncertainty that is induced by the probabilistic text generation strategies typically deployed in LLM-based systems. We present two case studies applying this approach: 1) evaluating refusal rates on a benchmark of adversarial inputs designed to elicit harmful responses, and 2) evaluating pairwise preferences of one LLM over another on a benchmark of open-ended interactive dialogue examples. We demonstrate how the Bayesian approach can provide useful uncertainty quantification about the behavior of LLM-based systems.
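As a rough illustration of uncertainty quantification for a binary metric such as a refusal rate, the sketch below uses a standard conjugate Beta-Binomial model. It is a simplification, not the paper's method: it assumes all binary outcomes are independent Bernoulli trials with a uniform Beta(1, 1) prior, and it ignores the per-prompt variability from sampling-based decoding that the paper focuses on. The function names and the counts in the usage line are hypothetical.

```python
import random


def beta_posterior(successes: int, trials: int,
                   alpha0: float = 1.0, beta0: float = 1.0) -> tuple[float, float]:
    """Conjugate update: Beta(alpha0, beta0) prior + Binomial data -> Beta posterior."""
    if trials < 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return alpha0 + successes, beta0 + (trials - successes)


def credible_interval(a: float, b: float, level: float = 0.95,
                      n_samples: int = 20_000, seed: int = 0) -> tuple[float, float]:
    """Equal-tailed credible interval estimated from Monte Carlo draws of Beta(a, b)."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must be in (0, 1)")
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    draws = sorted(rng.betavariate(a, b) for _ in range(n_samples))
    lo_idx = int((1.0 - level) / 2.0 * n_samples)
    hi_idx = int((1.0 + level) / 2.0 * n_samples) - 1
    return draws[lo_idx], draws[hi_idx]


# Hypothetical example: 180 refusals observed on 200 adversarial prompts.
a, b = beta_posterior(successes=180, trials=200)
posterior_mean = a / (a + b)
low, high = credible_interval(a, b)
print(f"posterior mean {posterior_mean:.3f}, 95% interval ({low:.3f}, {high:.3f})")
```

Reporting the interval alongside the point estimate shows how tightly the data constrain the rate; with 200 trials, the interval in this example still spans several percentage points.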
Bayesian Inference for Correlated Human Experts and Classifiers
ArXiv.org · 2025-06-05
preprint · Open access · Senior author
Applications of machine learning often involve making predictions based on both model outputs and the opinions of human experts. In this context, we investigate the problem of querying experts for class label predictions, using as few human queries as possible, and leveraging the class probability estimates of pre-trained classifiers. We develop a general Bayesian framework for this problem, modeling expert correlation via a joint latent representation, enabling simulation-based inference about the utility of additional expert queries, as well as inference of posterior distributions over unobserved expert labels. We apply our approach to two real-world medical classification problems, as well as to CIFAR-10H and ImageNet-16H, demonstrating substantial reductions, relative to baselines, in the cost of querying human experts while maintaining high prediction accuracy.
2025-03-15
preprint · Open access
Uncertainty quantification is an important component of satellite-derived precipitation products, yet most current methodologies cannot provide such estimates. Here we use a generative diffusion model to produce probabilistic ensembles of precipitation intensity maps at 1-hour, 5-km resolution, conditional on infrared and microwave radiometric measurements from the GOES and DMSP satellites. The model is trained with merged ground radar and gauge data over the southeastern United States. We show that the generated precipitation maps accurately reproduce the magnitude and location of precipitation features, as well as the spatial autocovariance and higher-order statistics of the gauge-radar reference fields, over a range of scales. The 128-member ensemble is evaluated to assess whether it provides an accurate estimate of the precipitation uncertainty. We show that, on average, the spectral coherence between any two ensemble members is approximately the same as that between any ensemble member and the ground reference, indicating that the ensemble dispersion is a proper measure of the estimation uncertainty across a range of scales. We also evaluate how well the ensemble reproduces the probability of exceeding a given intensity threshold, from the 5-km generation resolution up to an 80-km aggregation scale, and find close agreement. Finally, we pursue generalization of the model to unseen domains by applying the trained model to the western United States, and we discuss the challenges and opportunities of this generalization.
Improving Metacognition and Uncertainty Communication in Language Models
ArXiv.org · 2025-09-30
preprint · Open access · Senior author
Large language models (LLMs) are increasingly used in decision-making contexts, but when they present answers without signaling low confidence, users may unknowingly act on erroneous outputs. Prior work shows that LLMs maintain internal uncertainty signals, yet their expressed confidence is often miscalibrated and poorly discriminates between correct and incorrect answers. We investigate whether supervised fine-tuning can improve models' ability to communicate uncertainty and whether such improvements generalize across tasks and domains. We fine-tune LLMs on datasets spanning general knowledge, mathematics, and open-ended trivia, and evaluate two metacognitive tasks: (1) single-question confidence estimation, where the model assigns a numeric certainty to its answer, and (2) pairwise confidence comparison, where the model selects which of two questions it is more likely to answer correctly. We assess generalization to unseen domains, including medical and legal reasoning. Results show that fine-tuning improves calibration (alignment between stated confidence and accuracy) and discrimination (higher confidence for correct than for incorrect responses) within and across domains. However, the gains are task-specific: training on single-question calibration does not transfer to pairwise comparison, and vice versa. Multitask fine-tuning yields broader gains, lowering calibration error and strengthening discrimination in out-of-domain evaluations. This suggests that uncertainty communication in LLMs is trainable but requires multitask training to generalize effectively.
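Calibration, in the sense used above, compares stated confidence with accuracy. One common summary is the binned expected calibration error (ECE) with equal-width confidence bins. The sketch below assumes that definition, which may differ from the exact metric the paper reports; the function name, bin count, and example inputs are illustrative.

```python
def expected_calibration_error(confidences: list[float], correct: list[bool],
                               n_bins: int = 10) -> float:
    """Binned ECE: weighted mean of |average confidence - accuracy| over equal-width bins."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need equally sized, non-empty inputs")
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidences must lie in [0, 1]")
        # Map confidence to a bin index; conf == 1.0 falls into the top bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, is_correct))

    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Each bin contributes its calibration gap, weighted by its share of the data.
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece


# Hypothetical answers: all stated at 95% confidence, but only half are correct.
print(expected_calibration_error([0.95, 0.95, 0.95, 0.95], [True, True, False, False]))
```

ECE captures only calibration; discrimination (whether confidence is higher for correct than for incorrect answers) needs a separate measure, such as the area under the ROC curve of confidence scores.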
Efficient Inference for Coupled Hidden Markov Models in Continuous Time and Discrete Space
arXiv (Cornell University) · 2025-10-14
preprint · Open access · Senior author
Systems of interacting continuous-time Markov chains are a powerful model class, but inference is typically intractable in high-dimensional settings. Auxiliary information, such as noisy observations, is typically only available at discrete times, and incorporating it via a Doob's $h$-transform gives rise to an intractable posterior process that requires approximation. We introduce Latent Interacting Particle Systems, a model class parameterizing the generator of each Markov chain in the system. Our inference method involves estimating look-ahead functions (twist potentials) that anticipate future information, for which we introduce an efficient parameterization. We incorporate this approximation in a twisted Sequential Monte Carlo sampling scheme. We demonstrate the effectiveness of our approach on a challenging posterior inference task for a latent SIRS model on a graph, and on a neural model for wildfire spread dynamics trained on real data.
What large language models know and what people think they know
Nature Machine Intelligence · 2025-01-21 · 107 citations
article · Open access · Senior author
As artificial intelligence systems, particularly large language models (LLMs), become increasingly integrated into decision-making processes, the ability to trust their outputs is crucial. To earn human trust, LLMs must be well calibrated such that they can accurately assess and communicate the likelihood of their predictions being correct. Whereas recent work has focused on LLMs’ internal confidence, less is understood about how effectively they convey uncertainty to users. Here we explore the calibration gap, which refers to the difference between human confidence in LLM-generated answers and the models’ actual confidence, and the discrimination gap, which reflects how well humans and models can distinguish between correct and incorrect answers. Our experiments with multiple-choice and short-answer questions reveal that users tend to overestimate the accuracy of LLM responses when provided with default explanations. Moreover, longer explanations increased user confidence, even when the extra length did not improve answer accuracy. By adjusting LLM explanations to better reflect the models’ internal confidence, both the calibration gap and the discrimination gap narrowed, significantly improving user perception of LLM accuracy. These findings underscore the importance of accurate uncertainty communication and highlight the effect of explanation length in influencing user trust in artificial-intelligence-assisted decision-making environments.
JANET: Joint Adaptive predictioN-region Estimation for Time-series
Machine Learning · 2025-06-23 · 1 citation
article · Open access
Conformal prediction provides machine learning models with prediction sets that offer theoretical guarantees, but the underlying assumption of exchangeability limits its applicability to time series data. Furthermore, existing approaches struggle to handle multi-step ahead prediction tasks, where uncertainty estimates across multiple future time points are crucial. We propose JANET (Joint Adaptive predictioN-region Estimation for Time-series), a novel framework for constructing conformal prediction regions that are valid for both univariate and multivariate time series. JANET generalises the inductive conformal framework and efficiently produces joint prediction regions with controlled K-familywise error rates, enabling flexible adaptation to specific application needs. Our empirical evaluation demonstrates JANET’s superior performance in multi-step prediction tasks across diverse time series datasets, highlighting its potential for reliable and interpretable uncertainty quantification in sequential data.
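JANET builds adaptive joint regions with K-familywise error control; the sketch below shows only the simpler underlying idea of split (inductive) conformal prediction for a multi-step forecast. It assumes exchangeable calibration trajectories, the very assumption the abstract notes is restrictive for time series, and it uses the maximum absolute error across the forecast horizon as the nonconformity score. This yields a rectangular joint region that covers every horizon at once with probability at least 1 - alpha. The function names and data are hypothetical, and the point forecasts are assumed to come from some pre-trained model.

```python
import math


def conformal_quantile(scores: list[float], alpha: float) -> float:
    """Finite-sample-corrected (1 - alpha) quantile used by split conformal prediction."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the order statistic to use
    if k > n:
        return math.inf  # too few calibration points for this alpha: region is unbounded
    return sorted(scores)[k - 1]


def joint_prediction_region(cal_true: list[list[float]], cal_pred: list[list[float]],
                            test_pred: list[float],
                            alpha: float = 0.1) -> list[tuple[float, float]]:
    """Per-horizon intervals that jointly cover the whole future path w.p. >= 1 - alpha."""
    # Nonconformity score of a calibration trajectory: its worst error over the horizon.
    scores = [max(abs(y - p) for y, p in zip(ys, ps))
              for ys, ps in zip(cal_true, cal_pred)]
    q = conformal_quantile(scores, alpha)
    return [(p - q, p + q) for p in test_pred]


# Hypothetical 2-step forecasts for 19 calibration series and one test series.
cal_true = [[0.0, 0.0] for _ in range(19)]
cal_pred = [[float(i), 0.5] for i in range(1, 20)]
print(joint_prediction_region(cal_true, cal_pred, test_pred=[10.0, 20.0]))
```

A single shared width across all horizons is the main limitation of this sketch; relaxing the requirement that every horizon be covered, as JANET's K-familywise error control does, is one way to obtain tighter regions.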
Recent grants
RI: Medium: Assessment of Machine Learning Algorithms in the Wild
NSF · $1.2M · 2019–2025
NIH · $1.3M · 2021–2026
Data Mining of Digital Behaviour
NSF · $2.2M · 2001–2010
Statistical Data Mining of Time-Dependent Data with Applications in Geoscience and Biology
NSF · $567k · 2004–2009
NRT-DESE: Team Science for Integrative Graduate Training in Data Science and Physical Science
NSF · $3.0M · 2016–2021
Frequent coauthors
- Yang Chen · Nanyang Technological University · 200 shared
- Mark Steyvers · 35 shared
- Efi Foufoula‐Georgiou · University of California, Irvine · 35 shared
- James T. Randerson · University of California, Irvine · 31 shared
- Usama M. Fayyad · Northeastern University · 25 shared
- R.M. Goodman · 24 shared
- Alexander Ihler · University of California, Irvine · 22 shared
- Heikki Mannila · Aalto University · 21 shared
Education
- 1988 · PhD, Electrical Engineering · California Institute of Technology
- 1984 · Bachelor of Engineering, Electronic Engineering · National University of Ireland (NUIG)
Awards & honors
- Hasso Plattner Endowed Chair in Artificial Intelligence
- 2023 INNS Dennis Gabor Award