Resume-aware faculty matching

Find professors who actually fit you

Upload your resume. Four AI agents analyze your background, rank the faculty who fit, inspect their recent research, and help you draft outreach — grounded in their actual work, not templates.

Free to startNo credit cardCancel anytime
Top matches Balanced preset
Dr. Sarah Chen
Stanford · Interpretability · NLP
91
Dr. Marcus Holloway
MIT · Robotics · RL
84
Dr. Aisha Okonkwo
CMU · Fairness · HCI
82
Nova · Professor Researcher · re-ranking top 20…
Christopher Musco

Christopher Musco

Verified

New York University · Computer Science

Active 2014–2026

h-index19
Citations1.7k
Papers12770 last 5y
Funding$1.1M2 active
See your match with Christopher Musco — sign in to PhdFit.Sign in

About

I am an Institute Associate Professor of Computer Science and Engineering at New York University. My students and I are part of NYU's Theoretical Computer Science Group and the VIDA Center. Our research is on the design and analysis of efficient algorithms for problems across science and engineering. We combine ideas from theoretical computer science with tools from computational mathematics. I am also interested in algorithmic aspects of modern machine learning, like fast inference and retrieval.

Research topics

  • Computer Science
  • Mathematics
  • Information Retrieval
  • Artificial Intelligence
  • Data Mining
  • Internet privacy
  • Quantum mechanics
  • Optics
  • Applied mathematics
  • Theoretical computer science
  • Computer graphics (images)
  • Human–computer interaction
  • Algorithm
  • Programming language
  • Combinatorics
  • Physics
  • Computer vision
  • World Wide Web

Selected publications

  • An Exact Algorithm for the Unanimous Vote Problem

    Society for Industrial and Applied Mathematics eBooks · 2026-01-01

    book-chapter

    Consider \(n\) independent, biased coins, each with a known probability of heads. Presented with an ordering of these coins, flip (i.e., toss) each coin once, in that order, until we have observed both a head and a tail, or flipped all coins. The Unanimous Vote problem asks us to find the ordering that minimizes the expected number of flips. Gkenosis et al. [GGHK18] gave a polynomial-time \(\phi\)-approximation algorithm for this problem, where \(\phi \approx 1.618\) is the golden ratio. They left open whether the problem was \(\mathsf {NP}\)-hard. We answer this question by giving an exact algorithm that runs in time \(O(n \log n)\). The Unanimous Vote problem is an instance of the more general Stochastic Boolean Function Evaluation problem: it thus becomes one of the only such problems known to be solvable in polynomial time. Our proof uses simple interchange arguments to show that the optimal ordering must be close to the ordering produced by a natural greedy algorithm. Beyond our main result, we compare the optimal ordering with the best adaptive strategy, proving a tight adaptivity gap of \(1.2 \pm o(1)\) for the Unanimous Vote problem.

  • Does block size matter in randomized block Krylov low-rank approximation?

    Society for Industrial and Applied Mathematics eBooks · 2026-01-01 · 1 citations

    book-chapter

    We study the problem of computing a rank-\(k\) approximation of a matrix using randomized block Krylov iteration. Prior work has shown that, for block size \(b = 1\) or \(b = k\), a \((1+\varepsilon)\)-factor approximation to the best rank-\(k\) approximation can be obtained after \(\tilde{O}(k/\sqrt{\varepsilon})\) matrix-vector products with the target matrix. On the other hand, when \(b\) is between \(1\) and \(k\), the best known bound on the number of matrix-vector products scales with \(b(k-b)\), which could be as large as \(O(k^2)\). Nevertheless, in practice, the performance of block Krylov methods is often optimized by choosing a block size \(1 \ll b \ll k\). We address this theory-practice gap by proving that randomized block Krylov iteration produces a \((1+\varepsilon)\)-factor approximate rank-\(k\) approximation using \(\tilde{O}(k/\sqrt{\varepsilon})\) matrix-vector products for any block size \(1 \le b \le k\). Our analysis relies on new bounds for the minimum singular value of a random block Krylov matrix, which may be of independent interest. Similar bounds are central to recent breakthroughs on faster algorithms for sparse linear systems [Peng & Vempala 2021; SODA 2021; Nie, STOC 2022].

  • Fixed-Sparsity Matrix Approximation from Matrix-Vector Products

    SIAM Journal on Matrix Analysis and Applications · 2026-04-06

    preprintOpen accessSenior author

    We study the problem of approximating a matrix $\mathbf{A}$ with a matrix that has a fixed sparsity pattern (e.g., diagonal, banded, etc.), when $\mathbf{A}$ is accessed only by matrix-vector products. We describe a simple randomized algorithm that returns an approximation with the given sparsity pattern with Frobenius-norm error at most $(1+\varepsilon)$ times the best possible error. When each row of the desired sparsity pattern has at most $s$ nonzero entries, this algorithm requires $O(s/\varepsilon)$ non-adaptive matrix-vector products with $\mathbf{A}$. We also prove a matching lower-bound, showing that, for any sparsity pattern with $Θ(s)$ nonzeros per row and column, any algorithm achieving $(1+ε)$ approximation requires $Ω(s/\varepsilon)$ matrix-vector products in the worst case. We thus resolve the matrix-vector product query complexity of the problem up to constant factors, even for the well-studied case of diagonal approximation, for which no previous lower bounds were known.

  • Randomized block-Krylov subspace methods for low-rank approximation of matrix functions

    Linear Algebra and its Applications · 2026-03-30

    articleSenior authorCorresponding
  • Efficiently Constructing Sparse Navigable Graphs

    Society for Industrial and Applied Mathematics eBooks · 2026-01-01

    book-chapter

    Graph-based nearest neighbor search methods have seen a surge of popularity in recent years, offering state-of-the-art performance across a wide variety of applications. Central to these methods is the task of constructing a sparse navigable search graph for a given dataset endowed with a distance function. Unfortunately, doing so is computationally expensive, so heuristics are universally used in practice.

  • Quasi-optimal Hierarchically Semi-separable Matrix Approximation

    SIAM Journal on Matrix Analysis and Applications · 2026-05-08

    articleOpen access

    We present a randomized algorithm for producing a quasi-optimal hierarchically semi-separable (HSS) approximation to an $N\times N$ matrix $A$ using only matrix-vector products with $A$ and $A^T$. We prove that, using $O(k \log(N/k))$ matrix-vector products and ${O}(N k^2 \log(N/k))$ additional runtime, the algorithm returns an HSS matrix $B$ with rank-$k$ blocks whose expected Frobenius norm error $\mathbb{E}[\|A - B\|_F^2]$ is at most $O(\log(N/k))$ times worse than the best possible approximation error by an HSS rank-$k$ matrix. In fact, the algorithm we analyze in a simple modification of an empirically effective method proposed by [Levitt & Martinsson, SISC 2024]. As a stepping stone towards our main result, we prove two results that are of independent interest: a similar guarantee for a variant of the algorithm which accesses $A$'s entries directly, and explicit error bounds for near-optimal subspace approximation using projection-cost-preserving sketches. To the best of our knowledge, our analysis constitutes the first polynomial-time quasi-optimality result for HSS matrix approximation, both in the explicit access model and the matrix-vector product query model.

  • Matrix Product Sketching via Coordinated Sampling

    ArXiv.org · 2025-01-29

    preprintOpen accessSenior author

    We revisit the well-studied problem of approximating a matrix product, $\mathbf{A}^T\mathbf{B}$, based on small space sketches $\mathcal{S}(\mathbf{A})$ and $\mathcal{S}(\mathbf{B})$ of $\mathbf{A} \in \R^{n \times d}$ and $\mathbf{B}\in \R^{n \times m}$. We are interested in the setting where the sketches must be computed independently of each other, except for the use of a shared random seed. We prove that, when $\mathbf{A}$ and $\mathbf{B}$ are sparse, methods based on \emph{coordinated random sampling} can outperform classical linear sketching approaches, like Johnson-Lindenstrauss Projection or CountSketch. For example, to obtain Frobenius norm error $ε\|\mathbf{A}\|_F\|\mathbf{B}\|_F$, coordinated sampling requires sketches of size $O(s/ε^2)$ when $\mathbf{A}$ and $\mathbf{B}$ have at most $s \leq d,m$ non-zeros per row. In contrast, linear sketching leads to sketches of size $O(d/ε^2)$ and $O(m/ε^2)$ for $\mathbf{A}$ and $\mathbf{B}$. We empirically evaluate our approach on two applications: 1) distributed linear regression in databases, a problem motivated by tasks like dataset discovery and augmentation, and 2) approximating attention matrices in transformer-based language models. In both cases, our sampling algorithms yield an order of magnitude improvement over linear sketching.

  • An Exact Algorithm for the Unanimous Vote Problem

    ArXiv.org · 2025-10-19

    preprintOpen access

    Consider $n$ independent, biased coins, each with a known probability of heads. Presented with an ordering of these coins, flip (i.e., toss) each coin once, in that order, until we have observed both a *head* and a *tail*, or flipped all coins. The Unanimous Vote problem asks us to find the ordering that minimizes the expected number of flips. Gkenosis et al. [arXiv:1806.10660] gave a polynomial-time $ϕ$-approximation algorithm for this problem, where $ϕ\approx 1.618$ is the golden ratio. They left open whether the problem was NP-hard. We answer this question by giving an exact algorithm that runs in time $O(n \log n)$. The Unanimous Vote problem is an instance of the more general Stochastic Boolean Function Evaluation problem: it thus becomes one of the only such problems known to be solvable in polynomial time. Our proof uses simple interchange arguments to show that the optimal ordering must be close to the ordering produced by a natural greedy algorithm. Beyond our main result, we compare the optimal ordering with the best adaptive strategy, proving a tight adaptivity gap of $1.2\pm o(1)$ for the Unanimous Vote problem.

  • Faster Linear Systems and Matrix Norm Approximation via Multi-level Sketched Preconditioning

    Society for Industrial and Applied Mathematics eBooks · 2025-01-01

    book-chapter

    We present a new class of preconditioned iterative methods for solving linear systems of the form Ax = b. Our methods are based on constructing a low-rank Nyström approximation to A using sparse random matrix sketching. This approximation is used to construct a preconditioner, which itself is inverted quickly using additional levels of random sketching and preconditioning.

  • Query Efficient Structured Matrix Learning

    ArXiv.org · 2025-07-25

    preprintOpen access

    We study the problem of learning a structured approximation (low-rank, sparse, banded, etc.) to an unknown matrix $A$ given access to matrix-vector product (matvec) queries of the form $x \rightarrow Ax$ and $x \rightarrow A^Tx$. This problem is of central importance to algorithms across scientific computing and machine learning, with applications to fast multiplication and inversion for structured matrices, building preconditioners for first-order optimization, and as a model for differential operator learning. Prior work focuses on obtaining query complexity upper and lower bounds for learning specific structured matrix families that commonly arise in applications. We initiate the study of the problem in greater generality, aiming to understand the query complexity of learning approximations from general matrix families. Our main result focuses on finding a near-optimal approximation to $A$ from any finite-sized family of matrices, $\mathcal{F}$. Standard results from matrix sketching show that $O(\log|\mathcal{F}|)$ matvec queries suffice in this setting. This bound can also be achieved, and is optimal, for vector-matrix-vector queries of the form $x,y\rightarrow x^TAy$, which have been widely studied in work on rank-$1$ matrix sensing. Surprisingly, we show that, in the matvec model, it is possible to obtain a nearly quadratic improvement in complexity, to $\tilde{O}(\sqrt{\log|\mathcal{F}|})$. Further, we prove that this bound is tight up to log-log factors.Via covering number arguments, our result extends to well-studied infinite families. As an example, we establish that a near-optimal approximation from any \emph{linear matrix family} of dimension $q$ can be learned with $\tilde{O}(\sqrt{q})$ matvec queries, improving on an $O(q)$ bound achievable via sketching techniques and vector-matrix-vector queries.

Recent grants

Frequent coauthors

Labs

Education

  • Ph.D., Computer Science

    Massachusetts Institute of Technology (MIT)

  • B.S., Applied Math and Computer Science

    Yale University

Awards & honors

  • Outstanding Paper Award Honorable Mention
  • Selected as spotlight paper
  • Resume-aware match score
  • Save to shortlist
  • AI-drafted outreach

See your match with Christopher Musco

PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.

  • Free to start
  • No credit card
  • 30-second signup