Alexandr Andoni

Associate Professor · Verified

Columbia University · Computer Science

Active 2003–2026

h-index: 33

Citations: 5.8k

Papers: 165 (21 in the last 5 years)

Funding: $800k

Research topics

Computer Science
Artificial Intelligence
Mathematics
Algorithm
Combinatorics
Geometry
Discrete mathematics

Selected publications

Nearly Optimal Attention Coresets
arXiv (Cornell University) · 2026-05-07
preprint · Open access
We consider the problem of estimating the Attention mechanism in small space, and prove the existence of coresets for it of nearly optimal size. Specifically, we show that for any set of unit-norm keys and values $(K,V)$ in $\mathbb{R}^d$, there exists a subset $(K',V')$ of size at most $O(\sqrt{d}\, e^{\rho+o(\rho)}/\varepsilon)$ such that \[ \left\| \operatorname{Attn}(q,K,V)- \operatorname{Attn}(q,K',V') \right\| \le \varepsilon \] simultaneously for all queries whose norm is bounded by $\rho$. This outperforms the best known results for this problem. We also offer an improved lower bound showing that $\varepsilon$-coresets must have size $\Omega(\sqrt{d}\, e^{\rho}/\varepsilon)$.
Efficient Algorithms for Adversarially Robust Approximate Nearest Neighbor Search
arXiv (Cornell University) · 2026-01-01
preprint · Open access · 1st author · Corresponding
We study the Approximate Nearest Neighbor (ANN) problem under a powerful adaptive adversary that controls both the dataset and a sequence of $Q$ queries. Primarily, for the high-dimensional regime of $d = ω(\sqrt{Q})$, we introduce a sequence of algorithms with progressively stronger guarantees. We first establish a novel connection between adaptive security and \textit{fairness}, leveraging fair ANN search to hide internal randomness from the adversary with information-theoretic guarantees. To achieve data-independent performance, we then reduce the search problem to a robust decision primitive, solved using a differentially private mechanism on a Locality-Sensitive Hashing (LSH) data structure. This approach, however, faces an inherent $\sqrt{n}$ query time barrier. To break the barrier, we propose a novel concentric-annuli LSH construction that synthesizes these fairness and differential privacy techniques. The analysis introduces a new method for robustly releasing timing information from the underlying algorithm instances and, as a corollary, also improves existing results for fair ANN. In addition, for the low-dimensional regime $d = O(\sqrt{Q})$, we propose specialized algorithms that provide a strong ``for-all'' guarantee: correctness on \textit{every} possible query with high probability. We introduce novel metric covering constructions that simplify and improve prior approaches for ANN in Hamming and $\ell_p$ spaces.
Embeddings into Similarity Measures for Nearest Neighbor Search
2025-12-14
article · 1st author · Corresponding
We introduce the notion of metric embeddings into a similarity measure over $\mathbb{R}_{+}^{m}$, such as the weighted Jaccard coefficient. We develop average embeddings into such similarity measures for a number of metric spaces, with (appropriately defined) distortion that is smaller than the best possible or known distortion of embedding into $\ell_{1}$ or $\ell_{2}$ spaces (bi-Lipschitz or average). We complement our embeddings with a new algorithm for Approximate Nearest Neighbor Search (ANNS) that leverages such an embedding in a black-box fashion. Combining these results, we obtain new efficient algorithms for ANNS under the following two classic metrics, achieving an exponential improvement over longstanding prior work:
- Edit distance over length-$k$ strings: $\operatorname{poly}(\log k)$ approximation;
- $\ell_{p}$ over $\mathbb{R}^{d}$, for $p > 2$: $O(\log p)$ approximation (known to be asymptotically optimal in relevant models of computation).
A Framework for Building Data Structures from Communication Protocols
2025-06-15
preprint · Open access · 1st author · Corresponding
We present a general framework for designing efficient data structures for high-dimensional pattern-matching problems ($\exists \;? i\in[n], f(x_i,y)=1$) through communication models in which $f(x,y)$ admits sublinear communication protocols with exponentially-small error. Specifically, we reduce the data structure problem to the Unambiguous Arthur-Merlin (UAM) communication complexity of $f(x,y)$ under product distributions. We apply our framework to the Partial Match problem (a.k.a. matching with wildcards), whose underlying communication problem is sparse set-disjointness. When the database consists of $n$ points in dimension $d$, and the number of $\star$'s in the query is at most $w = c\log n \;(\ll d)$, the fastest known linear-space data structure (Cole, Gottlieb and Lewenstein, STOC'04) had query time $t \approx 2^w = n^c$, which is nontrivial only when $c<1$. By contrast, our framework produces a data structure with query time $n^{1-1/(c \log^2 c)}$ and space close to linear. To achieve this, we develop a one-sided $\varepsilon$-error communication protocol for Set-Disjointness under product distributions with $\tilde{\Theta}(\sqrt{d\log(1/\varepsilon)})$ complexity, improving on the classical result of Babai, Frankl and Simon (FOCS'86). Building on this protocol, we show that the Unambiguous AM communication complexity of $w$-Sparse Set-Disjointness with $\varepsilon$-error under product distributions is $\tilde{O}(\sqrt{w \log(1/\varepsilon)})$, independent of the ambient dimension $d$, which is crucial for the Partial Match result. Our framework sheds further light on the power of data-dependent data structures, which is instrumental for reducing to the (much easier) case of product distributions.
Faster Algorithms for Average-Case Orthogonal Vectors and Closest Pair Problems
Society for Industrial and Applied Mathematics eBooks · 2025-01-01
book-chapter
We study the average-case version of the Orthogonal Vectors problem, in which one is given as input $n$ vectors from $\{0,1\}^d$ which are chosen randomly so that each coordinate is $1$ independently with probability $p$. Kane and Williams [ITCS 2019] showed how to solve this problem in time $O(n^{2 - δ_p})$ for a constant $δ_p > 0$ that depends only on $p$. However, it was previously unclear how to solve the problem faster in the hardest parameter regime where $p$ may depend on $d$.
Statistical-Computational Trade-offs for Density Estimation
arXiv (Cornell University) · 2024-10-30
preprint · Open access
We study the density estimation problem defined as follows: given $k$ distributions $p_1, \ldots, p_k$ over a discrete domain $[n]$, as well as a collection of samples chosen from a ``query'' distribution $q$ over $[n]$, output $p_i$ that is ``close'' to $q$. Recently, Aamand et al. (2023) gave the first and only known result that achieves sublinear bounds in {\em both} the sampling complexity and the query time while preserving polynomial data structure space. However, their improvement over linear samples and time is only by subpolynomial factors. Our main result is a lower bound showing that, for a broad class of data structures, their bounds cannot be significantly improved. In particular, if an algorithm uses $O(n/\log^c k)$ samples for some constant $c>0$ and polynomial space, then the query time of the data structure must be at least $k^{1-O(1)/\log \log k}$, i.e., close to linear in the number of distributions $k$. This is a novel \emph{statistical-computational} trade-off for density estimation, demonstrating that any data structure must use close to a linear number of samples or take close to linear query time. The lower bound holds even in the realizable case where $q=p_i$ for some $i$, and when the distributions are flat (specifically, all distributions are uniform over half of the domain $[n]$). We also give a simple data structure for our lower bound instance with asymptotically matching upper bounds. Experiments show that the data structure is quite efficient in practice.
Faster Algorithms for Average-Case Orthogonal Vectors and Closest Pair Problems
arXiv (Cornell University) · 2024-10-29
preprint · Open access
We study the average-case version of the Orthogonal Vectors problem, in which one is given as input $n$ vectors from $\{0,1\}^d$ which are chosen randomly so that each coordinate is $1$ independently with probability $p$. Kane and Williams [ITCS 2019] showed how to solve this problem in time $O(n^{2 - δ_p})$ for a constant $δ_p > 0$ that depends only on $p$. However, it was previously unclear how to solve the problem faster in the hardest parameter regime where $p$ may depend on $d$. The best prior algorithm was the best worst-case algorithm by Abboud, Williams and Yu [SODA 2014], which in dimension $d = c \cdot \log n$, solves the problem in time $n^{2 - Ω(1/\log c)}$. In this paper, we give a new algorithm which improves this to $n^{2 - Ω(\log\log c /\log c)}$ in the average case for any parameter $p$. As in the prior work, our algorithm uses the polynomial method. We make use of a very simple polynomial over the reals, and use a new method to analyze its performance based on computing how its value degrades as the input vectors get farther from orthogonal. To demonstrate the generality of our approach, we also solve the average-case version of the closest pair problem in the same running time.
Statistical-Computational Trade-offs for Density Estimation
2024-01-01
article

Recent grants

AF:Small: Nearest Neighbor Search in High Dimensional Spaces
NSF · $450k · 2016–2020
AF:Small: Data-Dependent Algorithms for High-Dimensional Data
NSF · $350k · 2020–2024

Frequent coauthors

Robert Krauthgamer
Weizmann Institute of Science
42 shared
Piotr Indyk
Massachusetts Institute of Technology
31 shared
Ilya Razenshteyn
31 shared
Huy L. Nguyễn
23 shared
Krzysztof Onak
17 shared
Negev Shekel Nosatzki
Columbia University
12 shared
Aleksandar Nikolov
University of Toronto
10 shared
Peilin Zhong
Google (United States)
10 shared
