Ery Arias-Castro

· Professor

University of California, San Diego · Mathematics

Active 1970–2025

h-index 36

Citations 4.9k

Papers 195 · 47 last 5y

Funding $1.5M

About

Ery Arias-Castro received his Ph.D. in Statistics from Stanford University in 2004. Following his doctoral studies, he held postdoctoral positions at the Institute for Pure and Applied Mathematics (IPAM), where he participated in the program on Multiscale Geometry and Analysis in High Dimensions, and at the Mathematical Sciences Research Institute (MSRI), where he engaged in the program on Mathematical, Computational and Statistical Aspects of Image Analysis. He joined the faculty of the Department of Mathematics at UCSD in 2005. His research interests encompass high-dimensional statistics, machine learning, spatial statistics, image processing, and applied probability.

Research topics

Computer Science
Mathematics
Artificial Intelligence
Statistics
Mathematical analysis
Combinatorics
Data science
Discrete mathematics
Algorithm
Theoretical computer science

Selected publications

Sparse anomaly detection across referentials: A rank-based higher criticism approach
The Annals of Statistics · 2025-04-01 · 1 citation
article · Open access · Senior author
Detecting anomalies in large sets of observations is crucial in various applications, such as epidemiological studies, gene expression studies, and systems monitoring. We consider settings where the units of interest result in multiple independent observations from potentially distinct referentials. Scan statistics and related methods are commonly used in such settings, but rely on stringent modeling assumptions for proper calibration. We instead propose a rank-based variant of the higher criticism statistic that only requires independent observations originating from ordered spaces. We show under what conditions the resulting methodology is able to detect the presence of anomalies. These conditions are stated in a general, nonparametric manner, and depend solely on the probabilities of anomalous observations exceeding nominal observations. The analysis requires a refined understanding of the distribution of the ranks under the presence of anomalies, and in particular of the rank-induced dependencies. The methodology is robust against heavy-tailed distributions through the use of ranks. Within the exponential family and a family of convolutional models, we analytically quantify the asymptotic performance of our methodology and the performance of the oracle, and show the difference is small for many common models. Simulations confirm these results. We show the applicability of the methodology through an analysis of quality control data of a pharmaceutical manufacturing process.
Stability of Sequential Lateration and of Stress Minimization in the Presence of Noise
SIAM Journal on Mathematics of Data Science · 2025-07-15 · 1 citation
article · Open access · 1st author · Corresponding
Theoretical Foundations of Ordinal Multidimensional Scaling, Including Internal Unfolding and External Unfolding
SIAM Journal on Mathematics of Data Science · 2025-09-04
article · Open access · 1st author · Corresponding
Embedding distributional data
The Annals of Statistics · 2025-04-01 · 1 citation
article · 1st author · Corresponding
We adapt concepts, methodology, and theory originally developed in the areas of multidimensional scaling and dimensionality reduction for Euclidean data to be applicable to distributional data. We focus on classical scaling and Isomap—prototypical methods that have played important roles in these areas—and showcase their use in the context of distributional data analysis. In the process, we highlight the crucial role that the ambient metric plays.
Minimax Optimality of Classical Scaling Under General Noise Conditions
ArXiv.org · 2025-02-02
preprint · Open access · Senior author
We establish the consistency of classical scaling under a broad class of noise models, encompassing many commonly studied cases in the literature. Our approach requires only finite fourth moments of the noise, significantly weakening standard assumptions. We derive convergence rates for classical scaling and establish matching minimax lower bounds, demonstrating that classical scaling achieves minimax optimality in recovering the true configuration even when the input dissimilarities are corrupted by noise.
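For readers unfamiliar with classical scaling, the following is a minimal sketch of the textbook procedure (double centering of squared dissimilarities followed by an eigendecomposition), restricted to a one-dimensional embedding so that it needs only the Python standard library. It is not taken from the paper: the function name `classical_scaling_1d`, the power-iteration shortcut, and the starting vector are assumptions made for illustration.

```python
import math


def classical_scaling_1d(sq_dists, iters=200):
    """Embed n items on a line from an n x n symmetric matrix of squared dissimilarities.

    Illustrative sketch only: keeps a single embedding dimension and uses
    power iteration, which assumes the leading eigenvalue of the centered
    matrix is also the largest in magnitude.
    """
    n = len(sq_dists)
    row_mean = [sum(row) / n for row in sq_dists]
    grand_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J D J with J = I - (1/n) * ones.
    # The input is assumed symmetric, so column means equal row means.
    gram = [
        [-0.5 * (sq_dists[i][j] - row_mean[i] - row_mean[j] + grand_mean)
         for j in range(n)]
        for i in range(n)
    ]
    # Power iteration from a fixed, non-constant start (hypothetical choice).
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n  # degenerate input: all items coincide
        v = [x / norm for x in w]
    bv = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * bv[i] for i in range(n))
    # Coordinates are the unit eigenvector scaled by sqrt(eigenvalue).
    scale = math.sqrt(max(eigenvalue, 0.0))
    return [scale * x for x in v]


# Squared distances between points located at 0, 1 and 3 on a line.
coords = classical_scaling_1d([[0, 1, 9], [1, 0, 4], [9, 4, 0]])
print([round(abs(coords[i] - coords[j]), 6) for i, j in [(0, 1), (0, 2), (1, 2)]])
```

The recovered coordinates are determined only up to translation and reflection, so the example compares pairwise distances rather than the coordinates themselves.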
Cluster and then Embed: A Modular Approach for Visualization
ArXiv.org · 2025-08-27
preprint · Open access
Dimensionality reduction methods such as t-SNE and UMAP are popular methods for visualizing data with a potential (latent) clustered structure. They are known to group data points at the same time as they embed them, resulting in visualizations with well-separated clusters that preserve local information well. However, t-SNE and UMAP also tend to distort the global geometry of the underlying data. We propose a more transparent, modular approach consisting of first clustering the data, then embedding each cluster, and finally aligning the clusters to obtain a global embedding. We demonstrate this approach on several synthetic and real-world datasets and show that it is competitive with existing methods, while being much more transparent.
Clustering by hill-climbing: Consistency results
The Annals of Statistics · 2025-12-01
article · 1st author · Corresponding
We consider several hill-climbing approaches to clustering as formulated by Fukunaga and Hostetler (IEEE Trans. Inf. Theory IT-21 (1975) 32–40) in the 1970s. We study both continuous-space and discrete-space (i.e., medoid) variants and establish their consistency.
K-means and Gaussian mixture modeling with a separation constraint
Communications in Statistics - Simulation and Computation · 2024-05-21
article · Senior author
We consider the problem of clustering with K-means and Gaussian mixture models with a constraint on the separation between the centers in the context of real-valued data. We first propose a dynamic programming approach to solving the K-means problem with a separation constraint on the centers, building on Wang and Song (2011). In the context of fitting a Gaussian mixture model, we then propose an EM algorithm that incorporates such a constraint. A separation constraint can help regularize the output of a clustering algorithm, and we provide both simulated and real data examples to illustrate this point.
The coreness and h-index of random geometric graphs
Latin American Journal of Probability and Mathematical Statistics · 2024-01-01
article · Open access
In network analysis, a measure of node centrality provides a scale indicating how central a node is within a network. The coreness is a popular notion of centrality that accounts for the maximal smallest degree of a subgraph containing a given node. In this paper, we study the coreness of random geometric graphs and show that, with an increasing number of nodes and properly chosen connectivity radius, the coreness converges to a new object that we call the continuum coreness. In the process, we show that other popular notions of centrality measures, namely the H-index and its iterates, also converge under the same setting to new limiting objects.
Graph Max Shift: A Hill-Climbing Method for Graph Clustering
arXiv (Cornell University) · 2024-11-27
preprint · Open access · 1st author · Corresponding
We present a method for graph clustering that is analogous to gradient ascent methods previously proposed for clustering points in space. The algorithm, which can be viewed as a max-degree hill-climbing procedure on the graph, iteratively moves each node to a neighboring node of highest degree. We show that, when applied to a random geometric graph whose nodes correspond to data drawn i.i.d. from a density with Morse regularity, the method is asymptotically consistent. Here, consistency is in the sense of Fukunaga and Hostetler, meaning, with respect to the partition of the support of the density defined by the basins of attraction of the density gradient flow.
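The max-degree hill-climbing idea in this abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function name `graph_max_shift`, the adjacency-dictionary input, the choice to let a node stay put when no neighbor has a higher degree, and the tie-breaking by node identifier are all hypothetical details that the abstract does not specify.

```python
def graph_max_shift(adjacency):
    """Label each node by the endpoint of its max-degree hill-climbing path.

    `adjacency` maps each node to an iterable of its neighbors (undirected,
    no self-loops). Assumptions for this sketch: a node competes with its own
    neighbors, and ties in degree are broken by the larger node identifier,
    which makes every move strictly increasing and guarantees termination.
    """
    degree = {v: len(list(nbrs)) for v, nbrs in adjacency.items()}

    def step(v):
        # Highest-degree node in the closed neighborhood of v.
        candidates = [v, *adjacency[v]]
        return max(candidates, key=lambda u: (degree[u], u))

    labels = {}
    for v in adjacency:
        current = v
        while True:
            nxt = step(current)
            if nxt == current:
                break  # local maximum of degree reached
            current = nxt
        labels[v] = current
    return labels


# Two small groups (hubs 0 and 4) joined by the bridge 3-7.
example = {
    0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0, 7],
    4: [5, 6, 7], 5: [4, 6], 6: [4, 5], 7: [4, 3],
}
print(graph_max_shift(example))
# → {0: 0, 1: 0, 2: 0, 3: 0, 4: 4, 5: 4, 6: 4, 7: 4}
```

Nodes sharing a label form one cluster, so the two hubs act as the modes toward which the remaining nodes climb.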

Recent grants

Stable and Robust Graph Embedding, and Related Problems
NSF · $140k · 2019–2022
Theory and practice of nonparametric detection
NSF · $120k · 2006–2011
ATD: Detection of Clusters in Spatial Data and Images
NSF · $885k · 2012–2017
Some problems in geometric data analysis
NSF · $200k · 2015–2019
Collaborative Research: Multi-manifold data modeling: theory, algorithms and applications
NSF · $110k · 2009–2014

Frequent coauthors

Emmanuel J. Candès
37 shared
Bruno Pelletier
Institut de recherche mathématique de Rennes
23 shared
David L. Donoho
Stanford University
20 shared
Nicolas Verzélen
Mathématiques, Informatique et Statistique pour l'Environnement et l'Agronomie
19 shared
Gábor Lugosi
17 shared
Arnaud Durand
Institut de Mathématiques de Jussieu-Paris Rive Gauche
16 shared
Xiaoming Huo
15 shared
Clément Berenfeld
14 shared

Awards & honors

Hellman Fellowship
