Aarti Singh

Professor · Verified

Carnegie Mellon University · Machine Learning Department

Active 2006–2025

h-index: 31

Citations: 3.3k

Papers21226 last 5y

Funding: $1.9M



About

Aarti Singh is the FORE Systems Professor in the Machine Learning Department at Carnegie Mellon University and serves as the Director of the NSF AI Institute for Societal Decision Making. Her research focuses on developing principled algorithms for collecting and analyzing large-scale, heterogeneous, and potentially corrupted data across scientific and social disciplines. She emphasizes statistically and computationally efficient interactive machine learning algorithms that support higher-level decision making, with the aim of advancing scientific and social discovery in both autonomous and human-in-the-loop settings. Her group investigates theory and methods for feedback-driven learning, including active sampling, stochastic optimization, bandits, and reinforcement learning, with applications in scientific fields such as materials science and cosmology. She also studies human factors in decision making, designing algorithms that model and leverage human feedback, biases, and memory effects, with applications in peer review and preference modeling. Her work addresses challenges in human-centered AI, including understanding and integrating human feedback into decision-making algorithms, and she has also contributed to the theory of deep learning and to methods that leverage structure in data.


Research topics

Computer Science
Mathematical optimization
Mathematics
Political Science
Combinatorics
Applied mathematics
Statistics
Medical education
Psychology
Mathematical analysis
Medicine
Social psychology
Algorithm

Selected publications

Optimistic Algorithms for Adaptive Estimation of the Average Treatment Effect
ArXiv.org · 2025-02-07
preprint · Open access · Senior author
Estimation and inference for the Average Treatment Effect (ATE) is a cornerstone of causal inference and often serves as the foundation for developing procedures for more complicated settings. Although traditionally analyzed in a batch setting, recent advances in martingale theory have paved the way for adaptive methods that can enhance the power of downstream inference. Despite these advances, progress in understanding and developing adaptive algorithms remains in its early stages. Existing work either focuses on asymptotic analyses that overlook exploration-exploitation tradeoffs relevant in finite-sample regimes or relies on simpler but suboptimal estimators. In this work, we address these limitations by studying adaptive sampling procedures that take advantage of the asymptotically optimal Augmented Inverse Probability Weighting (AIPW) estimator. Our analysis uncovers challenges obscured by asymptotic approaches and introduces a novel algorithmic design principle reminiscent of optimism in multi-armed bandits. This principled approach enables our algorithm to achieve significant theoretical and empirical gains compared to prior methods. Our findings mark a step forward in advancing adaptive causal inference methods in theory and practice.
Publisher OA PDF DOI
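For reference, the AIPW estimator named in this abstract can be computed from fitted nuisance models as below. This is a minimal sketch of the standard, non-adaptive AIPW point estimate, assuming the propensity scores and outcome-regression predictions are already available; the function name `aipw_ate` is hypothetical, and the code does not implement the paper's optimistic adaptive sampling rule.

```python
import numpy as np


def aipw_ate(y, a, e_hat, mu1_hat, mu0_hat):
    """Standard AIPW point estimate of the average treatment effect.

    y: observed outcomes; a: binary treatment indicators (0/1);
    e_hat: propensity scores P(A=1 | X) for each unit (assumed given);
    mu1_hat, mu0_hat: outcome-model predictions E[Y | X, A=1] and E[Y | X, A=0].
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=float)
    e_hat = np.asarray(e_hat, dtype=float)
    mu1_hat = np.asarray(mu1_hat, dtype=float)
    mu0_hat = np.asarray(mu0_hat, dtype=float)
    # Per-unit score: regression difference plus inverse-propensity-weighted
    # residual corrections for the treated and control arms.
    psi = (mu1_hat - mu0_hat
           + a * (y - mu1_hat) / e_hat
           - (1.0 - a) * (y - mu0_hat) / (1.0 - e_hat))
    return psi.mean()
```

With a correct propensity (0.5, as in a randomized design), the estimate stays consistent even if the outcome predictions are set to zero, which illustrates the double robustness that makes AIPW a natural base estimator.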
Occupational Mediation of Intergenerational Income Mobility
SSRN Electronic Journal · 2025-01-01
preprint · Open access
Publisher DOI
Optimal Macroeconomic Policies in a Heterogeneous World
IMF Economic Review · 2024-09-01 · 1 citation
article
Publisher DOI
The Importance of Online Data: Understanding Preference Fine-tuning via Coverage
arXiv (Cornell University) · 2024-06-03
preprint · Open access
Learning from human preference data has emerged as the dominant paradigm for fine-tuning large language models (LLMs). The two most common families of techniques -- online reinforcement learning (RL) such as Proximal Policy Optimization (PPO) and offline contrastive methods such as Direct Preference Optimization (DPO) -- were positioned as equivalent in prior work due to the fact that both have to start from the same offline preference dataset. To further expand our theoretical understanding of the similarities and differences between online and offline techniques for preference fine-tuning, we conduct a rigorous analysis through the lens of dataset coverage, a concept that captures how the training data covers the test distribution and is widely used in RL. We prove that a global coverage condition is both necessary and sufficient for offline contrastive methods to converge to the optimal policy, but a weaker partial coverage condition suffices for online RL methods. This separation provides one explanation of why online RL methods can perform better than offline methods, especially when the offline preference data is not diverse enough. Finally, motivated by our preceding theoretical observations, we derive a hybrid preference optimization (HyPO) algorithm that uses offline data for contrastive-based preference optimization and online data for KL regularization. Theoretically and empirically, we demonstrate that HyPO is more performant than its pure offline counterpart DPO, while still preserving its computation and memory efficiency.
Publisher OA PDF DOI
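The offline contrastive family discussed here is typified by DPO. Below is a minimal sketch of the standard per-pair DPO loss, written with plain floats; the function name `dpo_loss` and the default `beta=0.1` are illustrative assumptions, and the online KL-regularization term that HyPO adds is not shown because the abstract does not give its exact form.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * difference of log-ratios).

    Each argument is a sequence log-probability log pi(y | x), either under the
    policy being trained or under the frozen reference policy.
    """
    # Implicit reward margin between the preferred and the rejected response.
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch on the sign to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference, the margin is zero and the loss is log 2; the loss shrinks as the policy raises the relative likelihood of the preferred response.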
Specifying and Solving Robust Empirical Risk Minimization Problems Using CVXPY
Journal of Optimization Theory and Applications · 2024-08-04 · 1 citation
article
Publisher DOI
Data-driven Design of Randomized Control Trials with Guaranteed Treatment Effects
arXiv (Cornell University) · 2024-10-15
preprint · Open access
Randomized controlled trials (RCTs) can be used to generate guarantees on treatment effects. However, RCTs often spend unnecessary resources exploring sub-optimal treatments, which can reduce the power of treatment guarantees. To address these concerns, we develop a two-stage RCT in which a first, data-driven screening stage prunes low-impact treatments and a second stage develops high-probability lower bounds on the treatment effect. Unlike existing adaptive RCT frameworks, our method is simple enough to be implemented in scenarios with limited adaptivity. We derive optimal designs for two-stage RCTs and demonstrate how such designs can be implemented through sample splitting. Empirically, we demonstrate that two-stage designs improve upon single-stage approaches, especially in scenarios where domain knowledge is available in the form of a prior. Our work is thus a simple yet effective method to estimate high-probability certificates for the effects of high-performing treatments in an RCT.
Publisher OA PDF DOI
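The screen-then-certify structure can be illustrated with a toy simulation. This is a hypothetical sketch, not the paper's optimal design: it assumes outcomes in [0, 1], splits the budget by hand into `n_screen` and `n_confirm`, and certifies the mean outcome of the selected arm with a one-sided Hoeffding bound rather than an effect relative to a control arm. The function name `two_stage_rct` is also hypothetical.

```python
import math
import random


def two_stage_rct(sample_outcome, n_treatments, n_screen, n_confirm, delta=0.05):
    """Hypothetical two-stage design: screen treatments, then certify one.

    sample_outcome(k) draws one outcome in [0, 1] for treatment k.
    Returns the selected treatment and a (1 - delta) lower confidence bound
    on its mean outcome.
    """
    # Stage 1: data-driven screening keeps the treatment with the best empirical mean.
    means = [sum(sample_outcome(k) for _ in range(n_screen)) / n_screen
             for k in range(n_treatments)]
    best = max(range(n_treatments), key=lambda k: means[k])
    # Stage 2: fresh samples (sample splitting), so selection does not bias the bound.
    confirm = [sample_outcome(best) for _ in range(n_confirm)]
    mean = sum(confirm) / n_confirm
    # One-sided Hoeffding bound for [0, 1]-valued outcomes.
    lower = mean - math.sqrt(math.log(1.0 / delta) / (2.0 * n_confirm))
    return best, lower
```

Reusing the stage-1 samples in stage 2 would make the bound optimistic, because the selected arm is, by construction, the one whose early samples happened to look best.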
The Importance of Online Data: Understanding Preference Fine-tuning via Coverage
2024-01-01
article
Publisher DOI
Adaptation to Misspecified Kernel Regularity in Kernelised Bandits
arXiv (Cornell University) · 2023-04-26
preprint · Open access · Senior author
In continuum-armed bandit problems where the underlying function resides in a reproducing kernel Hilbert space (RKHS), namely, the kernelised bandit problems, an important open problem is how well learning algorithms can adapt if the regularity of the associated kernel function is unknown. In this work, we study adaptivity to the regularity of translation-invariant kernels, which is characterized by the decay rate of the Fourier transform of the kernel, in the bandit setting. We derive an adaptivity lower bound, proving that it is impossible to simultaneously achieve optimal cumulative regret in a pair of RKHSs with different regularities. To verify the tightness of this lower bound, we show that an existing bandit model selection algorithm, applied with minimax non-adaptive kernelised bandit algorithms, matches the lower bound in its dependence on $T$, the total number of steps, up to log factors. By filling in the regret bounds for adaptivity between RKHSs, we connect the statistical difficulty of adaptivity in continuum-armed bandits across three fundamental types of function spaces: RKHS, Sobolev space, and Hölder space.
Publisher OA PDF DOI
Weighted Tallying Bandits: Overcoming Intractability via Repeated Exposure Optimality
arXiv (Cornell University) · 2023-05-04
preprint · Open access · Senior author
In recommender system or crowdsourcing applications of online learning, a human's preferences or abilities are often a function of the algorithm's recent actions. Motivated by this, a significant line of work has formalized settings where an action's loss is a function of the number of times that action was played in the prior $m$ timesteps, where $m$ corresponds to a bound on human memory capacity. To more faithfully capture the decay of human memory over time, we introduce the Weighted Tallying Bandit (WTB), which generalizes this setting by requiring that an action's loss is a function of a \emph{weighted} summation of the number of times that arm was played in the last $m$ timesteps. This WTB setting is intractable without further assumptions, so we study it under Repeated Exposure Optimality (REO), a condition motivated by the literature on human physiology, which requires the existence of an action that, when played repeatedly, will eventually yield smaller loss than any other sequence of actions. We study the minimization of the complete policy regret (CPR), which is the strongest notion of regret, in WTB under REO. Since $m$ is typically unknown, we assume we only have access to an upper bound $M$ on $m$. We show that for problems with $K$ actions and horizon $T$, a simple modification of the successive elimination algorithm has $O\left( \sqrt{KT} + (m+M)K \right)$ CPR. Interestingly, up to an additive (in lieu of multiplicative) factor in $(m+M)K$, this recovers the classical guarantee for the simpler stochastic multi-armed bandit with traditional regret. We additionally show that in our setting, any algorithm will suffer additive CPR of $\Omega\left( mK + M \right)$, demonstrating that our result is nearly optimal. Our algorithm is computationally efficient, and we experimentally demonstrate its practicality and superiority over natural baselines.
Publisher OA PDF DOI
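Since the WTB algorithm is described as a simple modification of successive elimination, the classical baseline is sketched below for the standard stochastic bandit, where rewards depend only on the arm and there are no memory effects. It does not implement the WTB/REO modification. It uses rewards to be maximized rather than losses, the confidence radius is one common choice (an assumption here), and `successive_elimination` is a hypothetical name.

```python
import math
import random


def successive_elimination(pull, n_arms, horizon, delta=0.05):
    """Classical successive elimination for a stochastic K-armed bandit.

    pull(k) returns a reward in [0, 1]. Returns the committed arm and the
    number of pulls spent on each arm (summing to the horizon).
    """
    active = list(range(n_arms))
    sums = [0.0] * n_arms
    counts = [0] * n_arms
    t = 0
    # Exploration rounds: every surviving arm is played once per round.
    while len(active) > 1 and t + len(active) <= horizon:
        for k in active:
            sums[k] += pull(k)
            counts[k] += 1
        t += len(active)
        n = counts[active[0]]  # all surviving arms share the same pull count
        # Hoeffding radius with a union bound over arms and rounds.
        rad = math.sqrt(math.log(4 * n_arms * n * n / delta) / (2 * n))
        means = {k: sums[k] / n for k in active}
        best_lcb = max(means.values()) - rad
        # Drop arms whose upper confidence bound is below the best lower bound.
        active = [k for k in active if means[k] + rad >= best_lcb]
    # Exploitation: spend the remaining budget on the empirically best survivor.
    best = max(active, key=lambda k: sums[k] / counts[k] if counts[k] else 0.0)
    for _ in range(horizon - t):
        sums[best] += pull(best)
        counts[best] += 1
    return best, counts
```

Arms whose gap to the best arm is large are eliminated after few rounds, which is what yields the $\sqrt{KT}$-type regret that the abstract's CPR bound recovers up to the additive $(m+M)K$ term.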

Recent grants

III: Small: Spectral Methods for Active Clustering and Bi-Clustering
NSF · $389k · 2011–2014
CAREER: Distilling information structure from big and dirty data: Efficient learning of clusters and graphs in modern datasets
NSF · $500k · 2013–2018
BIGDATA: Mid-Scale: DA: Distribution-based machine learning for high dimensional datasets
NSF · $1.0M · 2013–2016

Frequent coauthors

Neri Merhav
Technion – Israel Institute of Technology
216 shared
Tom Richardson
Rutgers, The State University of New Jersey
216 shared
Andrew Communications
University of Toronto
216 shared
Alexander Barg
216 shared
Neelam Khinvasara
Institute of Electrical and Electronics Engineers
216 shared
Stephen Welby
Institute of Electrical and Electronics Engineers
216 shared
Shannon Theory
University of Toronto
216 shared
Jean‐François Chamberland
216 shared

Labs

Aarti Singh's Lab · PI
Not provided
