
Christopher Musco
VerifiedNew York University · Computer Science
Active 2014–2026
About
I am an Institute Associate Professor of Computer Science and Engineering at New York University. My students and I are part of NYU's Theoretical Computer Science Group and the VIDA Center. Our research is on the design and analysis of efficient algorithms for problems across science and engineering. We combine ideas from theoretical computer science with tools from computational mathematics. I am also interested in algorithmic aspects of modern machine learning, like fast inference and retrieval.
Research topics
- Computer Science
- Mathematics
- Information Retrieval
- Artificial Intelligence
- Data Mining
- Internet privacy
- Quantum mechanics
- Optics
- Applied mathematics
- Theoretical computer science
- Computer graphics (images)
- Human–computer interaction
- Algorithm
- Programming language
- Combinatorics
- Physics
- Computer vision
- World Wide Web
Selected publications
An Exact Algorithm for the Unanimous Vote Problem
Society for Industrial and Applied Mathematics eBooks · 2026-01-01
book-chapterConsider \(n\) independent, biased coins, each with a known probability of heads. Presented with an ordering of these coins, flip (i.e., toss) each coin once, in that order, until we have observed both a head and a tail, or flipped all coins. The Unanimous Vote problem asks us to find the ordering that minimizes the expected number of flips. Gkenosis et al. [GGHK18] gave a polynomial-time \(\phi\)-approximation algorithm for this problem, where \(\phi \approx 1.618\) is the golden ratio. They left open whether the problem was \(\mathsf {NP}\)-hard. We answer this question by giving an exact algorithm that runs in time \(O(n \log n)\). The Unanimous Vote problem is an instance of the more general Stochastic Boolean Function Evaluation problem: it thus becomes one of the only such problems known to be solvable in polynomial time. Our proof uses simple interchange arguments to show that the optimal ordering must be close to the ordering produced by a natural greedy algorithm. Beyond our main result, we compare the optimal ordering with the best adaptive strategy, proving a tight adaptivity gap of \(1.2 \pm o(1)\) for the Unanimous Vote problem.
Does block size matter in randomized block Krylov low-rank approximation?
Society for Industrial and Applied Mathematics eBooks · 2026-01-01 · 1 citations
book-chapterWe study the problem of computing a rank-\(k\) approximation of a matrix using randomized block Krylov iteration. Prior work has shown that, for block size \(b = 1\) or \(b = k\), a \((1+\varepsilon)\)-factor approximation to the best rank-\(k\) approximation can be obtained after \(\tilde{O}(k/\sqrt{\varepsilon})\) matrix-vector products with the target matrix. On the other hand, when \(b\) is between \(1\) and \(k\), the best known bound on the number of matrix-vector products scales with \(b(k-b)\), which could be as large as \(O(k^2)\). Nevertheless, in practice, the performance of block Krylov methods is often optimized by choosing a block size \(1 \ll b \ll k\). We address this theory-practice gap by proving that randomized block Krylov iteration produces a \((1+\varepsilon)\)-factor approximate rank-\(k\) approximation using \(\tilde{O}(k/\sqrt{\varepsilon})\) matrix-vector products for any block size \(1 \le b \le k\). Our analysis relies on new bounds for the minimum singular value of a random block Krylov matrix, which may be of independent interest. Similar bounds are central to recent breakthroughs on faster algorithms for sparse linear systems [Peng & Vempala 2021; SODA 2021; Nie, STOC 2022].
Fixed-Sparsity Matrix Approximation from Matrix-Vector Products
SIAM Journal on Matrix Analysis and Applications · 2026-04-06
preprintOpen accessSenior authorWe study the problem of approximating a matrix $\mathbf{A}$ with a matrix that has a fixed sparsity pattern (e.g., diagonal, banded, etc.), when $\mathbf{A}$ is accessed only by matrix-vector products. We describe a simple randomized algorithm that returns an approximation with the given sparsity pattern with Frobenius-norm error at most $(1+\varepsilon)$ times the best possible error. When each row of the desired sparsity pattern has at most $s$ nonzero entries, this algorithm requires $O(s/\varepsilon)$ non-adaptive matrix-vector products with $\mathbf{A}$. We also prove a matching lower-bound, showing that, for any sparsity pattern with $Θ(s)$ nonzeros per row and column, any algorithm achieving $(1+ε)$ approximation requires $Ω(s/\varepsilon)$ matrix-vector products in the worst case. We thus resolve the matrix-vector product query complexity of the problem up to constant factors, even for the well-studied case of diagonal approximation, for which no previous lower bounds were known.
Randomized block-Krylov subspace methods for low-rank approximation of matrix functions
Linear Algebra and its Applications · 2026-03-30
articleSenior authorCorrespondingEfficiently Constructing Sparse Navigable Graphs
Society for Industrial and Applied Mathematics eBooks · 2026-01-01
book-chapterGraph-based nearest neighbor search methods have seen a surge of popularity in recent years, offering state-of-the-art performance across a wide variety of applications. Central to these methods is the task of constructing a sparse navigable search graph for a given dataset endowed with a distance function. Unfortunately, doing so is computationally expensive, so heuristics are universally used in practice.
Quasi-optimal Hierarchically Semi-separable Matrix Approximation
SIAM Journal on Matrix Analysis and Applications · 2026-05-08
articleOpen accessWe present a randomized algorithm for producing a quasi-optimal hierarchically semi-separable (HSS) approximation to an $N\times N$ matrix $A$ using only matrix-vector products with $A$ and $A^T$. We prove that, using $O(k \log(N/k))$ matrix-vector products and ${O}(N k^2 \log(N/k))$ additional runtime, the algorithm returns an HSS matrix $B$ with rank-$k$ blocks whose expected Frobenius norm error $\mathbb{E}[\|A - B\|_F^2]$ is at most $O(\log(N/k))$ times worse than the best possible approximation error by an HSS rank-$k$ matrix. In fact, the algorithm we analyze in a simple modification of an empirically effective method proposed by [Levitt & Martinsson, SISC 2024]. As a stepping stone towards our main result, we prove two results that are of independent interest: a similar guarantee for a variant of the algorithm which accesses $A$'s entries directly, and explicit error bounds for near-optimal subspace approximation using projection-cost-preserving sketches. To the best of our knowledge, our analysis constitutes the first polynomial-time quasi-optimality result for HSS matrix approximation, both in the explicit access model and the matrix-vector product query model.
Matrix Product Sketching via Coordinated Sampling
ArXiv.org · 2025-01-29
preprintOpen accessSenior authorWe revisit the well-studied problem of approximating a matrix product, $\mathbf{A}^T\mathbf{B}$, based on small space sketches $\mathcal{S}(\mathbf{A})$ and $\mathcal{S}(\mathbf{B})$ of $\mathbf{A} \in \R^{n \times d}$ and $\mathbf{B}\in \R^{n \times m}$. We are interested in the setting where the sketches must be computed independently of each other, except for the use of a shared random seed. We prove that, when $\mathbf{A}$ and $\mathbf{B}$ are sparse, methods based on \emph{coordinated random sampling} can outperform classical linear sketching approaches, like Johnson-Lindenstrauss Projection or CountSketch. For example, to obtain Frobenius norm error $ε\|\mathbf{A}\|_F\|\mathbf{B}\|_F$, coordinated sampling requires sketches of size $O(s/ε^2)$ when $\mathbf{A}$ and $\mathbf{B}$ have at most $s \leq d,m$ non-zeros per row. In contrast, linear sketching leads to sketches of size $O(d/ε^2)$ and $O(m/ε^2)$ for $\mathbf{A}$ and $\mathbf{B}$. We empirically evaluate our approach on two applications: 1) distributed linear regression in databases, a problem motivated by tasks like dataset discovery and augmentation, and 2) approximating attention matrices in transformer-based language models. In both cases, our sampling algorithms yield an order of magnitude improvement over linear sketching.
An Exact Algorithm for the Unanimous Vote Problem
ArXiv.org · 2025-10-19
preprintOpen accessConsider $n$ independent, biased coins, each with a known probability of heads. Presented with an ordering of these coins, flip (i.e., toss) each coin once, in that order, until we have observed both a *head* and a *tail*, or flipped all coins. The Unanimous Vote problem asks us to find the ordering that minimizes the expected number of flips. Gkenosis et al. [arXiv:1806.10660] gave a polynomial-time $ϕ$-approximation algorithm for this problem, where $ϕ\approx 1.618$ is the golden ratio. They left open whether the problem was NP-hard. We answer this question by giving an exact algorithm that runs in time $O(n \log n)$. The Unanimous Vote problem is an instance of the more general Stochastic Boolean Function Evaluation problem: it thus becomes one of the only such problems known to be solvable in polynomial time. Our proof uses simple interchange arguments to show that the optimal ordering must be close to the ordering produced by a natural greedy algorithm. Beyond our main result, we compare the optimal ordering with the best adaptive strategy, proving a tight adaptivity gap of $1.2\pm o(1)$ for the Unanimous Vote problem.
Faster Linear Systems and Matrix Norm Approximation via Multi-level Sketched Preconditioning
Society for Industrial and Applied Mathematics eBooks · 2025-01-01
book-chapterWe present a new class of preconditioned iterative methods for solving linear systems of the form Ax = b. Our methods are based on constructing a low-rank Nyström approximation to A using sparse random matrix sketching. This approximation is used to construct a preconditioner, which itself is inverted quickly using additional levels of random sketching and preconditioning.
Query Efficient Structured Matrix Learning
ArXiv.org · 2025-07-25
preprintOpen accessWe study the problem of learning a structured approximation (low-rank, sparse, banded, etc.) to an unknown matrix $A$ given access to matrix-vector product (matvec) queries of the form $x \rightarrow Ax$ and $x \rightarrow A^Tx$. This problem is of central importance to algorithms across scientific computing and machine learning, with applications to fast multiplication and inversion for structured matrices, building preconditioners for first-order optimization, and as a model for differential operator learning. Prior work focuses on obtaining query complexity upper and lower bounds for learning specific structured matrix families that commonly arise in applications. We initiate the study of the problem in greater generality, aiming to understand the query complexity of learning approximations from general matrix families. Our main result focuses on finding a near-optimal approximation to $A$ from any finite-sized family of matrices, $\mathcal{F}$. Standard results from matrix sketching show that $O(\log|\mathcal{F}|)$ matvec queries suffice in this setting. This bound can also be achieved, and is optimal, for vector-matrix-vector queries of the form $x,y\rightarrow x^TAy$, which have been widely studied in work on rank-$1$ matrix sensing. Surprisingly, we show that, in the matvec model, it is possible to obtain a nearly quadratic improvement in complexity, to $\tilde{O}(\sqrt{\log|\mathcal{F}|})$. Further, we prove that this bound is tight up to log-log factors.Via covering number arguments, our result extends to well-studied infinite families. As an example, we establish that a near-optimal approximation from any \emph{linear matrix family} of dimension $q$ can be learned with $\tilde{O}(\sqrt{q})$ matvec queries, improving on an $O(q)$ bound achievable via sketching techniques and vector-matrix-vector queries.
Recent grants
CAREER: Fast Linear Algebra: Algorithms and Fundamental Limits
NSF · $571k · 2021–2027
CAREER: Frontiers in Matrix Sketching
NSF · $563k · 2021–2027
Frequent coauthors
- 79 shared
Cameron Musco
University of Massachusetts Amherst
- 14 shared
David P. Woodruff
Carnegie Mellon University
- 12 shared
Aaron Sidford
- 11 shared
Raphael A. Meyer
New York University
- 11 shared
Michael Kapralov
- 10 shared
Michael B. Cohen
Amherst College
- 8 shared
Aécio Santos
- 8 shared
Juliana Freire
New York University
Labs
Theoretical Computer Science at NYUPI
Applying mathematical tools to a variety of disciplines in computer science
Education
Ph.D., Computer Science
Massachusetts Institute of Technology (MIT)
B.S., Applied Math and Computer Science
Yale University
Awards & honors
- Outstanding Paper Award Honorable Mention
- Selected as spotlight paper
- Resume-aware match score
- Save to shortlist
- AI-drafted outreach
See your match with Christopher Musco
PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.
- Free to start
- No credit card
- 30-second signup