
Sivaraman Balakrishnan
Professor · Carnegie Mellon University · Machine Learning Department
Active 2002–2026
About
Sivaraman Balakrishnan is a Professor with a joint appointment in the Department of Statistics and Data Science and in the Machine Learning Department at Carnegie Mellon University. His research interests are broadly in statistical machine learning and algorithmic statistics, spanning areas such as statistics, optimization, machine learning, and information theory. His recent research topics include Robust Statistics and Domain Adaptation, Minimax Hypothesis Testing, Assumption-Light Inference, Causal Inference, Statistical Optimal Transport, Non-Parametric Statistics, Ranking, Crowdsourcing, Learning from Comparison Data, Convex and Non-Convex Optimization, Clustering, and Topological Data Analysis. Prior to re-joining CMU, he was a postdoctoral researcher in the Department of Statistics at UC Berkeley, working under Martin Wainwright and Bin Yu. He earned his Ph.D. in Computer Science from Carnegie Mellon University’s Language Technologies Institute, where he worked with Jaime Carbonell. He has also spent time as a long-term visitor at the Simons Institute at UC Berkeley and on sabbatical in the Department of Statistics at UC Berkeley. He serves as an Associate Editor for JASA and JRSSB, and has served on the editorial boards of other prominent journals. His research has been supported by awards from Amazon, Google, and the NSF.
Research topics
- Mathematics
- Statistics
- Mathematical analysis
- Computer Science
- Algorithm
- Mathematical optimization
- Data Mining
- Discrete mathematics
- Combinatorics
- Applied mathematics
- Mathematical economics
- Econometrics
Selected publications
Double cross-fit doubly robust estimators: Beyond series regression
Journal of the Royal Statistical Society Series B (Statistical Methodology) · 2026-03-28
article
Double cross-fit doubly robust (DCDR) estimators, which train nuisance function estimators on separate samples, are effective new estimators for causal functionals. We establish several novel theoretical results for them, building on recent work. We provide a structure-agnostic error analysis, which holds with generic nuisance functions and estimators. Then, we propose $\sqrt{n}$-consistent DCDR estimators with undersmoothed local polynomial regression and k-Nearest Neighbours and a minimax rate-optimal DCDR estimator with undersmoothed kernel regression. Finally, we demonstrate inference is possible even in the non-root-n regime with a central limit theorem for an undersmoothed DCDR estimator. We reinforce our theoretical results with simulation experiments.
Statistical Inference for Optimal Transport Maps: Recent Advances and Perspectives
ArXiv.org · 2025-06-23
preprint · Open access · 1st author · Corresponding
In many applications of optimal transport (OT), the object of primary interest is the optimal transport map. This map rearranges mass from one probability distribution to another in the most efficient way possible by minimizing a specified cost. In this paper we review recent advances in estimating and developing limit theorems for the OT map, using samples from the underlying distributions. We also review parallel lines of work that establish similar results for special cases and variants of the basic OT setup. We conclude with a discussion of key directions for future research with the goal of providing practitioners with reliable inferential tools.
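As a rough illustration of what an OT map does, the sketch below covers only the simplest case: two one-dimensional samples of equal size under squared-distance cost, where the optimal matching pairs the sorted source points with the sorted target points. The function names (`empirical_ot_map_1d`, `transport_cost`, `evaluate_map`) are hypothetical, and the linear interpolation used to evaluate the map at new points is an assumption made for this example, not a method taken from the paper.

```python
from bisect import bisect_right


def empirical_ot_map_1d(source, target):
    """Match two equal-size 1-D samples optimally under squared-distance cost.

    In one dimension the optimal matching is monotone: the i-th smallest
    source point is sent to the i-th smallest target point.
    """
    if len(source) != len(target) or len(source) == 0:
        raise ValueError("expected two non-empty samples of equal size")
    return sorted(source), sorted(target)


def transport_cost(xs, ys):
    """Average squared displacement of the matching xs[i] -> ys[i]."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def evaluate_map(xs, ys, x_new):
    """Extend the matching to a new point by linear interpolation.

    Assumption for this sketch: values outside the sample range are clamped
    to the nearest endpoint.
    """
    if x_new <= xs[0]:
        return ys[0]
    if x_new >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x_new)  # xs[i - 1] <= x_new < xs[i]
    weight = (x_new - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + weight * (ys[i] - ys[i - 1])


xs, ys = empirical_ot_map_1d([3.0, 1.0, 2.0], [10.0, 30.0, 20.0])
print(xs, ys)
print(transport_cost(xs, ys))
print(evaluate_map(xs, ys, 2.5))
```

In higher dimensions sorting no longer applies, and the matching has to be computed by solving an assignment or linear programming problem.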
Stability Bounds for Smooth Optimal Transport Maps and their Statistical Implications
arXiv (Cornell University) · 2025-02-17
preprint · Open access · 1st author · Corresponding
We study estimators of the optimal transport (OT) map between two probability distributions. We focus on plugin estimators derived from the OT map between estimates of the underlying distributions. We develop novel stability bounds for OT maps which generalize those in past work, and allow us to reduce the problem of optimally estimating the transport map to that of optimally estimating densities in the Wasserstein distance. In contrast, past work provided a partial connection between these problems and relied on regularity theory for the Monge-Ampere equation to bridge the gap, a step which required unnatural assumptions to obtain sharp guarantees. We also provide some new insights into the connections between stability bounds which arise in the analysis of plugin estimators and growth bounds for the semi-dual functional which arise in the analysis of Brenier potential-based estimators of the transport map. We illustrate the applicability of our new stability bounds by revisiting the smooth setting studied by Manole et al., analyzing two of their estimators under more general conditions. Critically, our bounds do not require smoothness or boundedness assumptions on the underlying measures. As an illustrative application, we develop and analyze a novel tuning parameter-free estimator for the OT map between two strongly log-concave distributions.
ArXiv.org · 2025-10-23
preprint · Open access · Senior author
Many scientific applications involve testing theories that are only partially specified. This task often amounts to testing the goodness-of-fit of a candidate distribution while allowing for reasonable deviations from it. The tolerant testing framework provides a systematic way of constructing such tests. Rather than testing the simple null hypothesis that data was drawn from a candidate distribution, a tolerant test assesses whether the data is consistent with any distribution that lies within a given neighborhood of the candidate. As this neighborhood grows, the tolerance to misspecification increases, while the power of the test decreases. In this work, we characterize the information-theoretic trade-off between the size of the neighborhood and the power of the test, in several canonical models. On the one hand, we characterize the optimal trade-off for tolerant testing in the Gaussian sequence model, under deviations measured in both smooth and non-smooth norms. On the other hand, we study nonparametric analogues of this problem in smooth regression and density models. Along the way, we establish the sub-optimality of the classical chi-squared statistic for tolerant testing, and study simple alternative hypothesis tests.
Robust universal inference for misspecified models
Biometrika · 2025-11-12 · 1 citation
article · Open access
In statistical inference, it is rarely realistic to assume that the hypothesized statistical model is well specified; consequently, it is important to understand the effects of misspecification on inferential procedures. When the hypothesized statistical model is misspecified, the natural target of inference is a projection of the data-generating distribution onto the model. We present a general method for constructing valid confidence sets for such projections, under weak regularity conditions, despite possible model misspecification. Our method builds upon the universal inference method and is based on inverting a family of split-sample tests of relative fit. We study settings in which our method yields either exact or approximate, finite-sample valid confidence sets for various projection distributions. We examine the rates at which the resulting confidence sets shrink around their target of inference and complement these results with a simulation study and a study of causal discovery using a linear causal model with the CausalEffectPairs dataset.
Conservative inference for counterfactuals
Journal of Causal Inference · 2025-01-01 · 2 citations
article · Open access · 1st author · Corresponding
In causal inference, the joint law of a set of counterfactual random variables is generally not identified. But many interesting quantities are functions of the joint distribution. For example, the individual treatment effect is a difference of counterfactuals and any functional of this difference such as the variance, the quantiles and density, all depend on this joint distribution. For binary treatments, many researchers have found identifiable bounds on these quantities. We extend this idea to continuous treatments. We show that a conservative version of the joint law – corresponding to the smallest treatment effect – is identified. The notion of “conservative” depends on how we choose to measure the causal effect and we consider a few such measures. Finding this law uses recent results from optimal transport theory. Under this conservative law we can bound causal effects and we may construct inferences for each individual’s counterfactual dose-response curve. Intuitively, this is the flattest counterfactual curve for each subject that is consistent with the distribution of the observables. If the outcome is univariate then, under mild conditions, this curve is simply the quantile function of the counterfactual distribution that passes through the observed point. This curve corresponds to a nonparametric rank preserving structural model.
The Annals of Statistics · 2025-12-01
article
Panprediction: Optimal Predictions for Any Downstream Task and Loss
ArXiv.org · 2025-10-31
preprint · Open access · 1st author · Corresponding
Supervised learning is classically formulated as training a model to minimize a fixed loss function over a fixed distribution, or task. However, an emerging paradigm instead views model training as extracting enough information from data so that the model can be used to minimize many losses on many downstream tasks. We formalize a mathematical framework for this paradigm, which we call panprediction, and study its statistical complexity. Formally, panprediction generalizes omniprediction and sits upstream from multi-group learning, which respectively focus on predictions that generalize to many downstream losses or many downstream tasks, but not both. Concretely, we design algorithms that learn deterministic and randomized panpredictors with $\tilde{O}(1/\varepsilon^3)$ and $\tilde{O}(1/\varepsilon^2)$ samples, respectively. Our results demonstrate that under mild assumptions, simultaneously minimizing infinitely many losses on infinitely many tasks can be as statistically easy as minimizing one loss on one task. Along the way, we improve the best known sample complexity guarantee of deterministic omniprediction by a factor of $1/\varepsilon$, and match all other known sample complexity guarantees of omniprediction and multi-group learning. Our key technical ingredient is a nearly lossless reduction from panprediction to a statistically efficient notion of calibration, called step calibration.
Nearly minimax optimal Wasserstein conditional independence testing
Information and Inference: A Journal of the IMA · 2024-09-20
article · Senior author
This paper is concerned with minimax conditional independence testing. In contrast to some previous works on the topic, which use the total variation distance to separate the null from the alternative, here we use the Wasserstein distance. In addition, we impose Wasserstein smoothness conditions that on bounded domains are weaker than the corresponding total variation smoothness imposed, for instance, by Neykov et al. (2021, Ann. Statist., 49, 2151–2177). This added flexibility expands the distributions that are allowed under the null and the alternative to include distributions that may contain point masses for instance. We characterize the optimal rate of the critical radius of testing up to logarithmic factors. Our test statistic that nearly achieves the optimal critical radius is novel, and can be thought of as a weighted multi-resolution version of the $U$-statistic studied by Neykov et al. (2021, Ann. Statist., 49, 2151–2177).
Plugin estimation of smooth optimal transport maps
The Annals of Statistics · 2024-06-01 · 31 citations
article
We analyze a number of natural estimators for the optimal transport map between two distributions and show that they are minimax optimal. We adopt the plugin approach: our estimators are simply optimal couplings between measures derived from our observations, appropriately extended so that they define functions on $\mathbb{R}^d$. When the underlying map is assumed to be Lipschitz, we show that computing the optimal coupling between the empirical measures, and extending it using linear smoothers, already gives a minimax optimal estimator. When the underlying map enjoys higher regularity, we show that the optimal coupling between appropriate nonparametric density estimates yields faster rates. Our work also provides new bounds on the risk of corresponding plugin estimators for the quadratic Wasserstein distance, and we show how this problem relates to that of estimating optimal transport maps using stability arguments for smooth and strongly convex Brenier potentials. As an application of our results, we derive central limit theorems for plugin estimators of the squared Wasserstein distance, which are centered at their population counterpart when the underlying distributions have sufficiently smooth densities. In contrast to known central limit theorems for empirical estimators, this result easily lends itself to statistical inference for the quadratic Wasserstein distance.
Recent grants
High-dimensional Clustering: Theory and Methods
NSF · $380k · 2017–2021
Foundations of High-Dimensional and Nonparametric Hypothesis Testing
NSF · $250k · 2021–2024
Frequent coauthors
- Larry Wasserman (Carnegie Mellon University) · 42 shared
- Aarti Singh · 41 shared
- Martin J. Wainwright (Massachusetts Institute of Technology) · 25 shared
- Nihar B. Shah · 20 shared
- Zachary C. Lipton · 19 shared
- Alessandro Rinaldo · 18 shared
- Edward H. Kennedy · 18 shared
- Saurabh Garg (University of Tasmania) · 17 shared
Education
Ph.D., Computer Science
Carnegie Mellon
Other
Department of Statistics, UC Berkeley
Awards & honors
- Amazon Research Award (2021)
- Google Research Scholar Award (2021)
- NSF (CCF-1763734, DMS-1713003, DMS-2113684 and DMS-2310632)