
Matias D. Cattaneo
Professor of Operations Research and Financial Engineering · Princeton University · Philosophy
Active 2007–2026
About
Matias D. Cattaneo is a Professor of Operations Research and Financial Engineering at Princeton University and an Amazon Scholar. He studies the mathematical foundations of data science at the intersection of econometrics, statistics, machine learning, and artificial intelligence. His research develops statistical and computational methods for the social, behavioral, and biomedical sciences, with emphasis on program evaluation and causal inference. Matias was awarded a 2026 Guggenheim Fellowship in Data Science. He is an elected Member of the International Statistical Institute and an elected Fellow of the American Statistical Association, the Institute of Mathematical Statistics, and the International Association for Applied Econometrics. His research has also been recognized through multiple paper awards, journal distinctions, invited lectures, and highly cited publications. Matias earned a Ph.D. in Economics and an M.A. in Statistics from the University of California, Berkeley, a Master in Economics from Universidad Torcuato Di Tella, and a Licentiate in Economics from Universidad de Buenos Aires. Originally from Buenos Aires, Argentina, he is married to Rocio Titiunik, and they have two daughters, Lucero (Lulu) and Maite.
Research topics
- Political Science
- Statistics
- Applied mathematics
- Mathematics
Selected publications
The Effect of Mini-Batch Noise on the Implicit Bias of Adam
ArXiv.org · 2026-02-02
article · Open access · 1st author · Corresponding
With limited high-quality data and growing compute, multi-epoch training is gaining back its importance across sub-areas of deep learning. Adam(W), versions of which are go-to optimizers for many tasks such as next token prediction, has two momentum hyperparameters $(β_1, β_2)$ controlling memory and one very important hyperparameter, batch size, controlling (in particular) the amount of mini-batch noise. We introduce a theoretical framework to understand how mini-batch noise influences the implicit bias of memory in Adam (depending on $β_1$, $β_2$) towards sharper or flatter regions of the loss landscape, which is commonly observed to correlate with the generalization gap in multi-epoch training. We find that in the case of large batch sizes, higher $β_2$ increases the magnitude of anti-regularization by memory (hurting generalization), but as the batch size becomes smaller, the dependence of (anti-)regularization on $β_2$ is reversed. A similar monotonicity shift (in the opposite direction) happens in $β_1$. In particular, the commonly used "default" pair $(β_1, β_2) = (0.9, 0.999)$ is a good choice if batches are small; for larger batches, in many settings moving $β_1$ closer to $β_2$ is much better in terms of validation accuracy in multi-epoch training. Moreover, our theoretical derivations connect the scale of the batch size at which the shift happens to the scale of the critical batch size. We illustrate this effect in experiments with small-scale data in the about-to-overfit regime.
The Effect of Mini-Batch Noise on the Implicit Bias of Adam
Open MIND · 2026-02-02
preprint · 1st author · Corresponding
Sharp anti-concentration inequalities for extremum statistics via copulas
Bernoulli · 2026-04-29
preprint · Open access · 1st author · Corresponding
We derive sharp upper and lower bounds for the pointwise concentration function of the maximum statistic of $d$ identically distributed real-valued random variables. Our first main result places no restrictions either on the common marginal law of the samples or on the copula describing their joint distribution. We show that, in general, strictly sublinear dependence of the concentration function on the dimension $d$ is not possible. We then introduce a new class of copulas, namely those with a convex diagonal section, and demonstrate that restricting to this class yields a sharper upper bound on the concentration function. This allows us to establish several new dimension-independent and poly-logarithmic-in-$d$ anti-concentration inequalities for a variety of marginal distributions under mild dependence assumptions. Our theory improves upon the best known results in certain special cases. Applications to high-dimensional statistical inference are presented, including a specific example pertaining to Gaussian mixture approximations for factor models, for which our main results lead to superior distributional guarantees.
Inference with Mondrian random forests
Journal of the Royal Statistical Society Series B (Statistical Methodology) · 2025-11-20 · 1 citation
article · 1st author · Corresponding
Random forests are popular methods for regression and classification analysis, and many different variants have been proposed in recent years. One interesting example is the Mondrian random forest, in which the underlying constituent trees are constructed via a Mondrian process. We give precise bias and variance characterizations, along with a Berry–Esseen-type central limit theorem, for the Mondrian random forest regression estimator. By combining these results with a carefully crafted debiasing approach and an accurate variance estimator, we present valid statistical inference methods for the unknown regression function. These methods come with explicit error bounds in terms of the sample size, tree complexity parameter, and number of trees in the forest, and include coverage error rates for feasible confidence interval estimators. Our debiasing procedure for the Mondrian random forest also allows it to achieve the minimax-optimal point estimation convergence rate in mean squared error for multivariate β-Hölder regression functions, for all β>0, provided that the underlying tuning parameters are chosen appropriately. Efficient and implementable algorithms are devised for both batch and online learning settings, and we study the computational complexity of different Mondrian random forest implementations. Finally, simulations with synthetic data validate our theory and methodology, demonstrating their excellent finite-sample properties.
Boundary Discontinuity Designs: Theory and Practice
arXiv (Cornell University) · 2025-11-09
preprint · Open access · 1st author · Corresponding
The boundary discontinuity (BD) design is a non-experimental method for identifying causal effects that exploits a thresholding rule based on a bivariate score and a boundary curve. This widely used method generalizes the univariate regression discontinuity design but introduces unique challenges arising from its multidimensional nature. We synthesize over 80 empirical papers that use the BD design, tracing the method's application from its formative stages to its implementation in modern research. We also overview ongoing theoretical and methodological research on identification, estimation, and inference for BD designs employing local polynomial regression, and offer recommendations for practice.
The Honest Truth About Causal Trees: Accuracy Limits for Heterogeneous Treatment Effect Estimation
ArXiv.org · 2025-09-14
preprint · Open access · 1st author · Corresponding
Recursive decision trees are widely used to estimate heterogeneous causal treatment effects in experimental and observational studies. These methods are typically implemented using CART-type recursive partitioning and are often viewed as adaptive procedures capable of discovering treatment effect heterogeneity in high-dimensional settings. We study causal tree estimators based on adaptive recursive partitioning and establish lower bounds on their estimation accuracy. Under basic conditions, we show that causal trees constructed via standard CART-type splitting rules cannot achieve polynomial-in-$n$ convergence rates in the uniform norm (where $n$ denotes the sample size). The underlying mechanism is that greedy recursive partitioning selects highly imbalanced splits with non-vanishing probability, producing terminal nodes containing very few observations and leading to large estimation variance. We further show that sample splitting ("honesty") yields at most negligible improvements in convergence rates. As a consequence, causal tree estimators may converge arbitrarily slowly and can even be inconsistent in some settings. Our results also clarify the role of balanced partition assumptions in existing theoretical guarantees for causal forests and related ensemble methods. The analysis develops new probabilistic tools for studying adaptive recursive partitioning procedures, including non-asymptotic approximations for suprema of partial sums and Gaussian processes. As a technical by-product, we also identify and correct an error in Eicker (1979).
Treatment Effect Heterogeneity in Regression Discontinuity Designs
ArXiv.org · 2025-03-17
preprint · Open access
Empirical studies using Regression Discontinuity (RD) designs often explore heterogeneous treatment effects based on pretreatment covariates. However, the lack of formal statistical methods has led to the widespread use of ad hoc approaches in applications. Motivated by common empirical practice, we develop a unified, theoretically grounded framework for RD heterogeneity analysis. We show that a fully interacted local linear (in functional parameters) model effectively captures heterogeneity while still being tractable and interpretable in applications. The model structure holds without loss of generality for discrete covariates, while for continuous covariates our proposed (local functional linear-in-parameters) model can be potentially restrictive, but it nonetheless naturally matches standard empirical practice and offers a causal interpretation for RD applications. We establish principled bandwidth selection and robust bias-corrected inference methods to analyze heterogeneous treatment effects and test group differences. We provide companion software to facilitate implementation of our results. An empirical application illustrates the practical relevance of our methods.
rd2d: Estimation and Inference for Boundary Discontinuity Designs
2025-05-14
dataset · Open access · 1st author · Corresponding
Provides estimation and inference procedures for boundary regression discontinuity (RD) designs using local polynomial methods, based on either bivariate coordinates or distance-based approaches. Methods are developed in Cattaneo, Titiunik, and Yu (2025) <https://mdcattaneo.github.io/papers/Cattaneo-Titiunik-Yu_2025_BoundaryRD.pdf>.
Yurinskii’s coupling for martingales
The Annals of Statistics · 2025-10-01 · 1 citation
article · 1st author · Corresponding
Yurinskii’s coupling is a popular theoretical tool for nonasymptotic distributional analysis in mathematical statistics and applied probability, offering a Gaussian strong approximation with an explicit error bound under easily verifiable conditions. Originally stated in ℓ2-norm for sums of independent random vectors, it has recently been extended both to the ℓp-norm, for 1≤p≤∞, and to vector-valued martingales in ℓ2-norm, under some strong conditions. We present as our main result a Yurinskii coupling for approximate martingales in ℓp-norm, under substantially weaker conditions than those previously imposed. Our formulation further allows for the coupling variable to follow a more general Gaussian mixture distribution, and we provide a novel third-order coupling method, which gives tighter approximations in certain settings. We specialize our main result to mixingales, martingales, and independent data, and derive uniform Gaussian mixture strong approximations for martingale empirical processes. Applications to nonparametric partitioning-based and local polynomial regression procedures are provided, alongside central limit theorems for high-dimensional martingale vectors.
lpcde: Estimation and Inference for Local Polynomial Conditional Density Estimators
2025-01-01
article · Open access · 1st author · Corresponding
Recent grants
Conference: Statistical Foundations of Data Science and their Applications
NSF · $25k · 2023–2024
Statistical Methods for Ultrahigh-dimensional Biomedical Data
NIH · $293k · 2006–2023
Partitioning-Based Learning Methods for Treatment Effect Estimation and Inference
NSF · $453k · 2023–2026
Collaborative Research: Robust Inference for Kernel Smoothing and Related Problems
NSF · $285k · 2020–2023
Nonparametric Estimation and Inference with Network Data
NSF · $350k · 2022–2025
Frequent coauthors
- 154 shared
Michael Jansson
University of California, Berkeley
- 65 shared
Rocío Titiunik
- 47 shared
Max H. Farrell
- 43 shared
Richard K. Crump
Federal Reserve Bank of New York
- 41 shared
Sebastián Calónico
Columbia University
- 24 shared
Xinwei Ma
Pennsylvania State University
- 24 shared
Yingjie Feng
- 17 shared
Max Farrell
University of California, Berkeley
Labs
Research spans econometrics, statistics, machine learning, artificial intelligence, and causal inference.
Education
- 2008
Doctor of Philosophy, Economics
University of California, Berkeley
- 2005
Master of Arts, Statistics
University of California, Berkeley
- 2003
Master of Arts (Economics)
Universidad Torcuato Di Tella
- 2000
Licentiate in Economics
Universidad de Buenos Aires
Awards & honors
- 2026 Guggenheim Fellowship in Data Science
- Elected Member of the International Statistical Institute
- Fellow of the American Statistical Association
- Fellow of the Institute of Mathematical Statistics
- Fellow of the International Association for Applied Economet…