
Aarti Singh
Professor · Carnegie Mellon University · Machine Learning Department
Active 2006–2025
About
Aarti Singh is the FORE Systems Professor in the Machine Learning Department at Carnegie Mellon University and serves as the Director of the NSF AI Institute for Societal Decision Making. Her research focuses on developing principled algorithms for collecting and analyzing large-scale, heterogeneous, and potentially corrupted data across scientific and social disciplines. She emphasizes statistically and computationally efficient interactive machine learning algorithms that support higher-level decision making in both autonomous and human-in-the-loop settings. Her group investigates theory and methods for feedback-driven learning, including active sampling, stochastic optimization, bandits, and reinforcement learning, with applications in scientific fields such as materials science and cosmology. She also studies human factors in decision making, designing algorithms that model and leverage human feedback, biases, and memory effects, with applications in peer review and preference modeling. Her work addresses challenges in human-centered AI, including understanding and integrating human feedback into decision-making algorithms, and she has contributed to the theory of deep learning and to methods that leverage structure in data.
Research topics
- Computer Science
- Mathematical optimization
- Mathematics
- Political Science
- Combinatorics
- Applied mathematics
- Statistics
- Medical education
- Psychology
- Mathematical analysis
- Medicine
- Social psychology
- Algorithm
Selected publications
Optimistic Algorithms for Adaptive Estimation of the Average Treatment Effect
ArXiv.org · 2025-02-07
preprint · Open access · Senior author
Estimation and inference for the Average Treatment Effect (ATE) is a cornerstone of causal inference and often serves as the foundation for developing procedures for more complicated settings. Although traditionally analyzed in a batch setting, recent advances in martingale theory have paved the way for adaptive methods that can enhance the power of downstream inference. Despite these advances, progress in understanding and developing adaptive algorithms remains in its early stages. Existing work either focuses on asymptotic analyses that overlook exploration-exploitation tradeoffs relevant in finite-sample regimes or relies on simpler but suboptimal estimators. In this work, we address these limitations by studying adaptive sampling procedures that take advantage of the asymptotically optimal Augmented Inverse Probability Weighting (AIPW) estimator. Our analysis uncovers challenges obscured by asymptotic approaches and introduces a novel algorithmic design principle reminiscent of optimism in multiarmed bandits. This principled approach enables our algorithm to achieve significant theoretical and empirical gains compared to prior methods. Our findings mark a step forward in advancing adaptive causal inference methods in theory and practice.
Occupational Mediation of Intergenerational Income Mobility
SSRN Electronic Journal · 2025-01-01
preprint · Open access
Optimal Macroeconomic Policies in a Heterogeneous World
IMF Economic Review · 2024-09-01 · 1 citation
article
The Importance of Online Data: Understanding Preference Fine-tuning via Coverage
arXiv (Cornell University) · 2024-06-03
preprint · Open access
Learning from human preference data has emerged as the dominant paradigm for fine-tuning large language models (LLMs). The two most common families of techniques -- online reinforcement learning (RL) such as Proximal Policy Optimization (PPO) and offline contrastive methods such as Direct Preference Optimization (DPO) -- were positioned as equivalent in prior work due to the fact that both have to start from the same offline preference dataset. To further expand our theoretical understanding of the similarities and differences between online and offline techniques for preference fine-tuning, we conduct a rigorous analysis through the lens of dataset coverage, a concept that captures how the training data covers the test distribution and is widely used in RL. We prove that a global coverage condition is both necessary and sufficient for offline contrastive methods to converge to the optimal policy, but a weaker partial coverage condition suffices for online RL methods. This separation provides one explanation of why online RL methods can perform better than offline methods, especially when the offline preference data is not diverse enough. Finally, motivated by our preceding theoretical observations, we derive a hybrid preference optimization (HyPO) algorithm that uses offline data for contrastive-based preference optimization and online data for KL regularization. Theoretically and empirically, we demonstrate that HyPO is more performant than its pure offline counterpart DPO, while still preserving its computation and memory efficiency.
Specifying and Solving Robust Empirical Risk Minimization Problems Using CVXPY
Journal of Optimization Theory and Applications · 2024-08-04 · 1 citation
article
Data-driven Design of Randomized Control Trials with Guaranteed Treatment Effects
arXiv (Cornell University) · 2024-10-15
preprint · Open access
Randomized controlled trials (RCTs) can be used to generate guarantees on treatment effects. However, RCTs often spend unnecessary resources exploring sub-optimal treatments, which can reduce the power of treatment guarantees. To address these concerns, we develop a two-stage RCT where, first, in a data-driven screening stage, we prune low-impact treatments, while in the second stage, we develop high-probability lower bounds on the treatment effect. Unlike existing adaptive RCT frameworks, our method is simple enough to be implemented in scenarios with limited adaptivity. We derive optimal designs for two-stage RCTs and demonstrate how we can implement such designs through sample splitting. Empirically, we demonstrate that two-stage designs improve upon single-stage approaches, especially in scenarios where domain knowledge is available in the form of a prior. Our work is thus a simple yet effective method to estimate high-probability certificates for high-performing treatment effects in an RCT.
The Importance of Online Data: Understanding Preference Fine-tuning via Coverage
2024-01-01
article
Adaptation to Misspecified Kernel Regularity in Kernelised Bandits
arXiv (Cornell University) · 2023-04-26
preprint · Open access · Senior author
In continuum-armed bandit problems where the underlying function resides in a reproducing kernel Hilbert space (RKHS), namely, the kernelised bandit problems, an important open problem remains of how well learning algorithms can adapt if the regularity of the associated kernel function is unknown. In this work, we study adaptivity to the regularity of translation-invariant kernels, which is characterized by the decay rate of the Fourier transformation of the kernel, in the bandit setting. We derive an adaptivity lower bound, proving that it is impossible to simultaneously achieve optimal cumulative regret in a pair of RKHSs with different regularities. To verify the tightness of this lower bound, we show that an existing bandit model selection algorithm applied with minimax non-adaptive kernelised bandit algorithms matches the lower bound in dependence of $T$, the total number of steps, except for log factors. By filling in the regret bounds for adaptivity between RKHSs, we connect the statistical difficulty for adaptivity in continuum-armed bandits in three fundamental types of function spaces: RKHS, Sobolev space, and Hölder space.
Weighted Tallying Bandits: Overcoming Intractability via Repeated Exposure Optimality
arXiv (Cornell University) · 2023-05-04
preprint · Open access · Senior author
In recommender system or crowdsourcing applications of online learning, a human's preferences or abilities are often a function of the algorithm's recent actions. Motivated by this, a significant line of work has formalized settings where an action's loss is a function of the number of times that action was recently played in the prior $m$ timesteps, where $m$ corresponds to a bound on human memory capacity. To more faithfully capture decay of human memory with time, we introduce the Weighted Tallying Bandit (WTB), which generalizes this setting by requiring that an action's loss is a function of a weighted summation of the number of times that arm was played in the last $m$ timesteps. This WTB setting is intractable without further assumptions. So we study it under Repeated Exposure Optimality (REO), a condition motivated by the literature on human physiology, which requires the existence of an action that when repetitively played will eventually yield smaller loss than any other sequence of actions. We study the minimization of the complete policy regret (CPR), which is the strongest notion of regret, in WTB under REO. Since $m$ is typically unknown, we assume we only have access to an upper bound $M$ on $m$. We show that for problems with $K$ actions and horizon $T$, a simple modification of the successive elimination algorithm has $O \left( \sqrt{KT} + (m+M)K \right)$ CPR. Interestingly, up to an additive (in lieu of multiplicative) factor in $(m+M)K$, this recovers the classical guarantee for the simpler stochastic multi-armed bandit with traditional regret. We additionally show that in our setting, any algorithm will suffer additive CPR of $\Omega\left( mK + M \right)$, demonstrating our result is nearly optimal. Our algorithm is computationally efficient, and we experimentally demonstrate its practicality and superiority over natural baselines.
Recent grants
III: Small: Spectral Methods for Active Clustering and Bi-Clustering
NSF · $389k · 2011–2014
NSF · $500k · 2013–2018
BIGDATA: Mid-Scale: DA: Distribution-based machine learning for high dimensional datasets
NSF · $1.0M · 2013–2016
Frequent coauthors
- Neri Merhav · Technion – Israel Institute of Technology · 216 shared
- Tom Richardson · Rutgers, The State University of New Jersey · 216 shared
- Andrew Communications · University of Toronto · 216 shared
- Alexander Barg · 216 shared
- Neelam Khinvasara · Institute of Electrical and Electronics Engineers · 216 shared
- Stephen Welby · Institute of Electrical and Electronics Engineers · 216 shared
- Shannon Theory · University of Toronto · 216 shared
- Jean‐François Chamberland · 216 shared
Labs
Not provided