Gérard Ben Arous

Silver Professor of Mathematics; Director, Courant Institute

New York University · Computer Science and Engineering

Active 1983–2026

h-index: 53

Citations: 8.9k

Papers: 247 (36 in the last 5 years)

Funding: $720k


About

Gérard Ben Arous is a Professor of Mathematics at the Courant Institute of Mathematical Sciences, New York University, which he joined in 2002. He also serves as Director of the Courant Institute and Vice Provost for Science and Engineering Development. A native of France, he studied mathematics at École Normale Supérieure and earned his PhD from the University of Paris VII in 1981. He has held positions at the University of Paris-Sud (Orsay), École Normale Supérieure, and the Swiss Federal Institute of Technology in Lausanne, where he held the Chair of Stochastic Modeling. He has also headed departments of Mathematics and Computer Science and founded the Bernoulli Center, a mathematics research institute in Lausanne. He is the managing editor of the journal Probability Theory and Related Fields, alongside Amir Dembo of Stanford.

His research focuses on probability theory and its applications, including stochastic analysis, large deviations, random media, and random matrices, as well as their connections to partial differential equations, dynamical systems, and physics, particularly the statistical mechanics of disordered media. His main interests are the time evolution of complex systems, the universal aspects of their long-term behavior, and the mechanisms of aging related to complexity and disorder.

He is a Fellow of the Institute of Mathematical Statistics and an elected member of the International Statistical Institute. He has been a plenary speaker at the European Congress of Mathematics and an invited speaker at the International Congress of Mathematicians. His awards include the senior Lady Davis Fellowship, the Rollo Davidson Prize, and the Montyon Prize from the French Academy of Sciences.

Research topics

Computer Science
Artificial Intelligence
Mathematics
Physics
Statistical physics
Mathematical analysis
Geometry
Algorithm
Quantum mechanics
Thermodynamics
Combinatorics
Mechanics
Pure mathematics
Applied mathematics
Statistics

Selected publications

Permutation Recovery of Spikes in Noisy High-Dimensional Tensor Estimation
2026-01-01
Preprint · Open access · 1st author · Corresponding author
Spectral alignment of stochastic gradient descent for high-dimensional classification tasks
The Annals of Applied Probability · 2025-08-01
Article · 1st author · Corresponding author
We rigorously study the relation between the training dynamics via stochastic gradient descent (SGD) and the spectra of empirical Hessian and gradient matrices. We prove that in two canonical classification tasks for multiclass high-dimensional mixtures and either 1 or 2-layer neural networks, both the SGD trajectory and emergent outlier eigenspaces of the Hessian and gradient matrices align with a common low-dimensional subspace. Moreover, in multilayer settings this alignment occurs per layer, with the final layer’s outlier eigenspace evolving over the course of training, and exhibiting rank deficiency when the SGD converges to sub-optimal classifiers. This establishes some of the rich predictions that have arisen from extensive numerical studies in the last decade about the spectra of Hessian and information matrices over the course of training in overparametrized networks.
Scaling limit for the random walk on critical lattice trees
ArXiv.org · 2025-03-28
Preprint · Open access · 1st author · Corresponding author
We prove a scaling limit theorem for the simple random walk on critical lattice trees in $\mathbb{Z}^d$, for $d\geq 8$. The scaling limit is the Brownian motion on the Integrated Super-Brownian Excursion (BISE), which is the same one that we have identified earlier for other simpler models of anomalous diffusion on critical graphs in large enough dimension. The proof of this theorem is based on a combination of the tools of lace expansion (contained in the articles [CFHP] and [CFHP2]) and a new and general convergence theorem.
Local geometry of high-dimensional mixture models: Effective spectral theory and dynamical transitions
ArXiv.org · 2025-02-21
Preprint · Open access · 1st author · Corresponding author
We study the local geometry of empirical risks in high dimensions via the spectral theory of their Hessian and information matrices. We focus on settings where the data, $(Y_\ell)_{\ell =1}^n \in \mathbb{R}^d$, are i.i.d. draws of a $k$-Gaussian mixture model, and the loss depends on the projection of the data into a fixed number of vectors, namely $\mathbf{x}^\top Y$, where $\mathbf{x}\in \mathbb{R}^{d\times C}$ are the parameters, and $C$ need not equal $k$. This setting captures a broad class of problems such as classification by one and two-layer networks and regression on multi-index models. We provide exact formulas for the limits of the empirical spectral distribution and outlier eigenvalues and eigenvectors of such matrices in the proportional asymptotics limit, where the number of samples and dimension $n,d\to\infty$ and $n/d=\phi\in (0,\infty)$. These limits depend on the parameters $\mathbf{x}$ only through the summary statistic of the $(C+k)\times (C+k)$ Gram matrix of the parameters and class means, $\mathbf{G} = (\mathbf{x},\boldsymbol{\mu})^\top(\mathbf{x},\boldsymbol{\mu})$. It is known that under general conditions, when $\mathbf{x}$ is trained by online stochastic gradient descent, the evolution of these same summary statistics along training converges to the solution of an autonomous system of ODEs, called the effective dynamics. This enables us to connect the training dynamics to the spectral theory of these matrices generated with test data. We demonstrate our general results by analyzing the effective spectrum along the effective dynamics in the case of multi-class logistic regression. In this setting, the empirical Hessian and information matrices have substantially different spectra, each with their own static and even dynamical spectral transitions.
Stochastic gradient descent in high dimensions for multi-spiked tensor PCA
arXiv (Cornell University) · 2024-10-23
Preprint · Open access · 1st author · Corresponding author
We study the high-dimensional dynamics of online stochastic gradient descent (SGD) for the multi-spiked tensor model. This multi-index model arises from the tensor principal component analysis (PCA) problem with multiple spikes, where the goal is to estimate $r$ unknown signal vectors within the $N$-dimensional unit sphere through maximum likelihood estimation from noisy observations of a $p$-tensor. We determine the number of samples and the conditions on the signal-to-noise ratios (SNRs) required to efficiently recover the unknown spikes from natural random initializations. We show that full recovery of all spikes is possible provided the number of samples scales as $N^{p-2}$, matching the algorithmic threshold identified in the rank-one case [Ben Arous, Gheissari, Jagannath 2020, 2021]. Our results are obtained through a detailed analysis of a low-dimensional system that describes the evolution of the correlations between the estimators and the spikes, while controlling the noise in the dynamics. We find that the spikes are recovered sequentially in a process we term "sequential elimination": once a correlation exceeds a critical threshold, all correlations sharing a row or column index become sufficiently small, allowing the next correlation to grow and become macroscopic. The order in which correlations become macroscopic depends on their initial values and the corresponding SNRs, leading to either exact recovery or recovery of a permutation of the spikes. In the matrix case, when $p=2$, if the SNRs are sufficiently separated, we achieve exact recovery of the spikes, whereas equal SNRs lead to recovery of the subspace spanned by them.
Langevin dynamics for high-dimensional optimization: the case of multi-spiked tensor PCA
arXiv (Cornell University) · 2024-08-12
Preprint · Open access · 1st author · Corresponding author
We study nonconvex optimization in high dimensions through Langevin dynamics, focusing on the multi-spiked tensor PCA problem. This tensor estimation problem involves recovering $r$ hidden signal vectors (spikes) from noisy Gaussian tensor observations using maximum likelihood estimation. We study the number of samples required for Langevin dynamics to efficiently recover the spikes and determine the necessary separation condition on the signal-to-noise ratios (SNRs) for exact recovery, distinguishing the cases $p \ge 3$ and $p=2$, where $p$ denotes the order of the tensor. In particular, we show that the sample complexity required for recovering the spike associated with the largest SNR matches the well-known algorithmic threshold for the single-spike case, while this threshold degrades when recovering all $r$ spikes. As a key step, we provide a detailed characterization of the trajectory and interactions of low-dimensional projections that capture the high-dimensional dynamics.
The Larkin Mass and Replica Symmetry Breaking in the Elastic Manifold
arXiv (Cornell University) · 2024-10-29
Preprint · Open access · 1st author · Corresponding author
This is the second of a series of three papers about the Elastic Manifold model. This classical model proposes a rich picture due to the competition between the inherent disorder and the smoothing effect of elasticity. In this paper, we analyze our variational formula for the free energy obtained in our first companion paper [16]. We show that this variational formula may be simplified to one which is solved by a unique saddle point. We show that this saddle point may be solved for in terms of the corresponding critical point equation. Moreover, its terms may be interpreted in terms of natural statistics of the model: namely the overlap distribution and effective radius of the model at a given site. Using this characterization, we obtain a complete characterization of the replica symmetry breaking phase. From this we are able to confirm a number of physical predictions about this boundary, namely those involving the Larkin mass [6, 53, 54], an important critical mass for the system. The zero-temperature Larkin mass has recently been shown to be the topological trivialization threshold, following work of Fyodorov and Le Doussal [37, 38], made rigorous by the first author, Bourgade and McKenna [12, 13].
The Free Energy of the Elastic Manifold
arXiv (Cornell University) · 2024-10-24
Preprint · Open access · 1st author · Corresponding author
This is the first of a series of three papers about the Elastic Manifold model. This classical model proposes a rich picture due to the competition between the inherent disorder and the smoothing effect of elasticity. In this paper, we prove a Parisi formula, i.e. we compute the asymptotic quenched free energy and show it is given by the solution to a certain variational problem. This work comes after a long and distinguished line of work in the Physics literature, going back to the 1980s, including the foundational work by Daniel Fisher [29], Marc Mézard and Giorgio Parisi [50, 51], and more recently by Yan Fyodorov and Pierre Le Doussal [34, 35]. Even though the mathematical study of Spin Glasses has seen deep progress in recent years, after the celebrated work by Michel Talagrand [67, 68], the Elastic Manifold model has been studied from a mathematical perspective only recently and at zero temperature. The annealed topological complexity has been computed by the first author with Paul Bourgade and Benjamin McKenna [15, 16]. Here we begin the study of this model at positive temperature by computing the quenched free energy. We obtain our Parisi formula by first applying Laplace's method to reduce the question to a related new family of spherical Spin Glass models with an elastic interaction. The upper bound is then obtained through an interpolation argument initially developed by Francesco Guerra [42] for the study of Spin Glasses. The lower bound follows by adapting the cavity method along the lines explored by Wei-Kuo Chen [23] and the multi-species synchronization method of Dmitry Panchenko [55]. In our next papers [19, 20] we will analyze the consequences of this Parisi formula.
Backbone scaling limits for random walks on random critical trees
Annales de l'Institut Henri Poincaré, Probabilités et Statistiques · 2024-07-31
Article · 1st author · Corresponding author
We prove the existence of the scaling limit for the projection onto the backbone of the random walk on the incipient infinite cluster (IIC) of critical percolation. We also consider the case of the invasion percolation cluster on a regular tree. We study these projected walks as randomly trapped random walks (as defined in (Ann. Probab. 43 (2015) 2405–2457)). We can describe these scaling limits as spatially subordinated Brownian motions.
Landscape complexity beyond invariance and the elastic manifold
Communications on Pure and Applied Mathematics · 2023-09-14 · 16 citations
Article · 1st author
This paper characterizes the annealed, topological complexity (both of total critical points and of local minima) of the elastic manifold. This classical model of a disordered elastic system captures point configurations with self‐interactions in a random medium. We establish the simple versus glassy phase diagram in the model parameters, with these phases separated by a physical boundary known as the Larkin mass, confirming formulas of Fyodorov and Le Doussal. One essential, dynamical, step of the proof also applies to a general signal‐to‐noise model of soft spins in an anisotropic well, for which we prove a negative‐second‐moment threshold distinguishing positive from zero complexity. A universal near‐critical behavior appears within this phase portrait, namely quadratic near‐critical vanishing of the complexity of total critical points, and cubic near‐critical vanishing of the complexity of local minima. These two models serve as a paradigm of complexity calculations for Gaussian landscapes exhibiting few distributional symmetries, that is, beyond the invariant setting. The two main inputs for the proof are determinant asymptotics for non‐invariant random matrices from our companion paper (Ben Arous, Bourgade, McKenna 2022), and the atypical convexity and integrability of the limiting variational problems.

Recent grants

Slow Dynamics in Random Media
NSF · $300k · 2008–2012
Random Matrices, Complexity and Slow Dynamics in Random Media
NSF · $420k · 2012–2016

Frequent coauthors

Alice Guionnet
Unité de Mathématiques Pures et Appliquées
32 shared
Alexander Fribergh
Université de Montréal
32 shared
Paul Bourgade
24 shared
Jiří Černý
21 shared
Alan Hammond
20 shared
Aukosh Jagannath
University of Waterloo
19 shared
Reza Gheissari
18 shared
Nina Gantert
17 shared

Labs

NYU Courant Mathematics Department · PI

Education

Ph.D.
University of Paris VII
1981
Other
École Normale Supérieure
Other
University of Paris-Sud (Orsay)
Other, Chair of Stochastic Modeling
Swiss Federal Institute of Technology in Lausanne

Awards & honors

Senior Lady Davis Fellowship (Israel)
Rollo Davidson Prize (Imperial College, London)
Montyon Prize (French Academy of Sciences)
