
Ery Arias-Castro
Professor · University of California, San Diego · Mathematics
Active 1970–2025
About
Ery Arias-Castro received his Ph.D. in Statistics from Stanford University in 2004. Following his doctoral studies, he held postdoctoral positions at the Institute for Pure and Applied Mathematics (IPAM), where he participated in the program on Multiscale Geometry and Analysis in High Dimensions, and at the Mathematical Sciences Research Institute (MSRI), where he engaged in the program on Mathematical, Computational and Statistical Aspects of Image Analysis. He joined the faculty of the Department of Mathematics at UCSD in 2005. His research interests encompass high-dimensional statistics, machine learning, spatial statistics, image processing, and applied probability.
Research topics
- Computer Science
- Mathematics
- Artificial Intelligence
- Statistics
- Mathematical analysis
- Combinatorics
- Data science
- Discrete mathematics
- Algorithm
- Theoretical computer science
Selected publications
Sparse anomaly detection across referentials: A rank-based higher criticism approach
The Annals of Statistics · 2025-04-01 · 1 citation
article · Open access · Senior author
Detecting anomalies in large sets of observations is crucial in various applications, such as epidemiological studies, gene expression studies, and systems monitoring. We consider settings where the units of interest result in multiple independent observations from potentially distinct referentials. Scan statistics and related methods are commonly used in such settings, but rely on stringent modeling assumptions for proper calibration. We instead propose a rank-based variant of the higher criticism statistic that only requires independent observations originating from ordered spaces. We show under what conditions the resulting methodology is able to detect the presence of anomalies. These conditions are stated in a general, nonparametric manner, and depend solely on the probabilities of anomalous observations exceeding nominal observations. The analysis requires a refined understanding of the distribution of the ranks under the presence of anomalies, and in particular of the rank-induced dependencies. The methodology is robust against heavy-tailed distributions through the use of ranks. Within the exponential family and a family of convolutional models, we analytically quantify the asymptotic performance of our methodology and the performance of the oracle, and show the difference is small for many common models. Simulations confirm these results. We show the applicability of the methodology through an analysis of quality control data of a pharmaceutical manufacturing process.
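For orientation, here is a minimal sketch of the classical higher criticism statistic computed from p-values. It is not the rank-based variant proposed in the paper above; the function name `higher_criticism`, the truncation parameter `alpha0`, and the toy p-values are illustrative assumptions.

```python
import math

def higher_criticism(pvalues, alpha0=0.5):
    """Classical higher criticism from p-values (illustrative sketch).

    Returns the maximum, over the smallest alpha0 * n p-values, of
    sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))).
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("need at least one p-value")
    sorted_p = sorted(pvalues)
    upper = max(1, int(alpha0 * n))  # only scan the smallest p-values
    best = float("-inf")
    for i in range(1, upper + 1):
        p = sorted_p[i - 1]
        if p <= 0.0 or p >= 1.0:
            continue  # the standardization is undefined at 0 and 1
        z = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, z)
    return best

# Toy data: evenly spread p-values, then the same set with three tiny ones.
null_p = [(i + 0.5) / 20 for i in range(20)]
anomalous_p = [1e-4, 2e-4, 3e-4] + null_p[3:]
print(higher_criticism(null_p), higher_criticism(anomalous_p))
```

A few unusually small p-values inflate the statistic sharply, which is the behavior that makes it suited to sparse anomaly detection.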
Stability of Sequential Lateration and of Stress Minimization in the Presence of Noise
SIAM Journal on Mathematics of Data Science · 2025-07-15 · 1 citation
article · Open access · 1st author · Corresponding
SIAM Journal on Mathematics of Data Science · 2025-09-04
article · Open access · 1st author · Corresponding
The Annals of Statistics · 2025-04-01 · 1 citation
article · 1st author · Corresponding
We adapt concepts, methodology, and theory originally developed in the areas of multidimensional scaling and dimensionality reduction for Euclidean data to be applicable to distributional data. We focus on classical scaling and Isomap, prototypical methods that have played important roles in these areas, and showcase their use in the context of distributional data analysis. In the process, we highlight the crucial role that the ambient metric plays.
Minimax Optimality of Classical Scaling Under General Noise Conditions
ArXiv.org · 2025-02-02
preprint · Open access · Senior author
We establish the consistency of classical scaling under a broad class of noise models, encompassing many commonly studied cases in the literature. Our approach requires only finite fourth moments of the noise, significantly weakening standard assumptions. We derive convergence rates for classical scaling and establish matching minimax lower bounds, demonstrating that classical scaling achieves minimax optimality in recovering the true configuration even when the input dissimilarities are corrupted by noise.
Cluster and then Embed: A Modular Approach for Visualization
ArXiv.org · 2025-08-27
preprint · Open access
Dimensionality reduction methods such as t-SNE and UMAP are popular methods for visualizing data with a potential (latent) clustered structure. They are known to group data points at the same time as they embed them, resulting in visualizations with well-separated clusters that preserve local information well. However, t-SNE and UMAP also tend to distort the global geometry of the underlying data. We propose a more transparent, modular approach consisting of first clustering the data, then embedding each cluster, and finally aligning the clusters to obtain a global embedding. We demonstrate this approach on several synthetic and real-world datasets and show that it is competitive with existing methods, while being much more transparent.
Clustering by hill-climbing: Consistency results
The Annals of Statistics · 2025-12-01
article · 1st author · Corresponding
We consider several hill-climbing approaches to clustering as formulated by Fukunaga and Hostetler (IEEE Trans. Inf. Theory IT-21 (1975) 32–40) in the 1970s. We study both continuous-space and discrete-space (i.e., medoid) variants and establish their consistency.
K-means and Gaussian mixture modeling with a separation constraint
Communications in Statistics - Simulation and Computation · 2024-05-21
article · Senior author
We consider the problem of clustering with K-means and Gaussian mixture models with a constraint on the separation between the centers in the context of real-valued data. We first propose a dynamic programming approach to solving the K-means problem with a separation constraint on the centers, building on Wang and Song (2011). In the context of fitting a Gaussian mixture model, we then propose an EM algorithm that incorporates such a constraint. A separation constraint can help regularize the output of a clustering algorithm, and we provide both simulated and real data examples to illustrate this point.
The coreness and h-index of random geometric graphs
Latin American Journal of Probability and Mathematical Statistics · 2024-01-01
article · Open access
In network analysis, a measure of node centrality provides a scale indicating how central a node is within a network. The coreness is a popular notion of centrality that accounts for the maximal smallest degree of a subgraph containing a given node. In this paper, we study the coreness of random geometric graphs and show that, with an increasing number of nodes and properly chosen connectivity radius, the coreness converges to a new object, that we call the continuum coreness. In the process, we show that other popular notions of centrality measures, namely the H-index and its iterates, also converge under the same setting to new limiting objects.
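The two centrality notions named in this abstract can be illustrated on a small finite graph. The sketch below is not taken from the paper: it computes coreness with the standard peeling procedure and iterates the neighborhood H-index starting from the degrees. The adjacency-dict representation, the function names, and the toy graph are assumptions made for illustration.

```python
def coreness(adj):
    """Coreness of every node via repeated removal of a minimum-degree node."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=degree.get)  # node of smallest remaining degree
        k = max(k, degree[v])               # core level never decreases
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core

def h_index(values):
    """Largest h such that at least h of the values are >= h."""
    h = 0
    for rank, val in enumerate(sorted(values, reverse=True), start=1):
        if val >= rank:
            h = rank
        else:
            break
    return h

def iterated_h_index(adj):
    """Start from degrees and apply the neighborhood H-index until stable."""
    current = {v: len(nbrs) for v, nbrs in adj.items()}
    while True:
        updated = {v: h_index([current[u] for u in adj[v]]) for v in adj}
        if updated == current:
            return current
        current = updated

# Toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to node 2.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(coreness(graph))
print(iterated_h_index(graph))
```

On this toy graph both procedures assign 2 to the triangle nodes and 1 to the pendant node, consistent with the known fact that on a finite graph the H-index iterates started from the degrees stabilize at the coreness.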
Graph Max Shift: A Hill-Climbing Method for Graph Clustering
arXiv (Cornell University) · 2024-11-27
preprint · Open access · 1st author · Corresponding
We present a method for graph clustering that is analogous to gradient ascent methods previously proposed for clustering points in space. The algorithm, which can be viewed as a max-degree hill-climbing procedure on the graph, iteratively moves each node to a neighboring node of highest degree. We show that, when applied to a random geometric graph whose nodes correspond to data drawn i.i.d. from a density with Morse regularity, the method is asymptotically consistent. Here, consistency is in the sense of Fukunaga and Hostetler, meaning, with respect to the partition of the support of the density defined by the basins of attraction of the density gradient flow.
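The max-degree hill-climbing idea described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: a node stops moving once no neighbor has strictly higher degree, ties are broken by smallest node label, and nodes that end at the same terminal node form one cluster. The function name and the toy graph are hypothetical.

```python
def graph_max_shift(adj):
    """Map each node to the terminal node reached by max-degree hill-climbing."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}

    def step(v):
        if not adj[v]:
            return v  # isolated node stays put
        # Highest-degree neighbor; ties broken by smallest label (assumption).
        best = min(adj[v], key=lambda u: (-degree[u], u))
        # Move only on a strict degree increase, so every path terminates.
        return best if degree[best] > degree[v] else v

    labels = {}
    for v in adj:
        current = v
        while True:
            nxt = step(current)
            if nxt == current:
                break
            current = nxt
        labels[v] = current
    return labels

# Toy graph: two stars (centers 0 and 4) joined by an edge between leaves 3 and 5.
graph = {
    0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 5],
    4: [5, 6, 7], 5: [3, 4], 6: [4], 7: [4],
}
print(graph_max_shift(graph))
```

On the toy graph, nodes 0 to 3 climb to center 0 and nodes 4 to 7 climb to center 4, giving two clusters.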
Recent grants
Stable and Robust Graph Embedding, and Related Problems
NSF · $140k · 2019–2022
Theory and practice of nonparametric detection
NSF · $120k · 2006–2011
ATD: Detection of Clusters in Spatial Data and Images
NSF · $885k · 2012–2017
Some problems in geometric data analysis
NSF · $200k · 2015–2019
Collaborative Research: Multi-manifold data modeling: theory, algorithms and applications
NSF · $110k · 2009–2014
Frequent coauthors
- Emmanuel J. Candès · 37 shared
- Bruno Pelletier · Institut de recherche mathématique de Rennes · 23 shared
- David L. Donoho · Stanford University · 20 shared
- Nicolas Verzélen · Mathématiques, Informatique et Statistique pour l'Environnement et l'Agronomie · 19 shared
- Gábor Lugosi · 17 shared
- Arnaud Durand · Institut de Mathématiques de Jussieu-Paris Rive Gauche · 16 shared
- Xiaoming Huo · 15 shared
- Clément Berenfeld · 14 shared
Awards & honors
- Hellman Fellowship