Martin Farach-Colton

Leonard J. Shustek Professor of Computer Science and Engineering · Verified

New York University · Department of Computer Science

Active 1989–2026

h-index: 55
Citations: 12.5k
Papers: 323 (69 in the last 5 years)
Funding: $2.6M

About

Martín Farach-Colton is the Leonard J. Shustek Professor of Computer Science and the Chair of the Department of Computer Science and Engineering at NYU Tandon School of Engineering. His research interests include the theory and practice of data structures for storage systems, graph algorithms, and streaming algorithms. He has contributed to applying mathematical principles to the development of algorithms and data structures, with a focus on storage and graph-related problems. Dr. Farach-Colton holds a B.S. in Mathematics and Chemistry from the University of South Carolina, an M.D. from The Johns Hopkins School of Medicine, and a Ph.D. in Computer Science from the University of Maryland. His work has been recognized through various honors, and he is actively involved in research through the Algorithms and Foundations Group at NYU Tandon.

Research topics

  • Computer Science
  • Algorithm
  • Mathematics
  • Theoretical computer science
  • Operating system
  • Parallel computing
  • Combinatorics
  • Discrete mathematics
  • Business
  • Programming language
  • Telecommunications
  • Computer hardware
  • Computer vision
  • Computer network

Selected publications

  • Writes Wrought Right, and Other Adventures in File System Optimization

    UNC Libraries · 2026-04-09

    article · Open access

    File systems that employ write-optimized dictionaries (WODs) can perform random-writes, metadata updates, and recursive directory traversals orders of magnitude faster than conventional file systems. However, previous WOD-based file systems have not obtained all of these performance gains without sacrificing performance on other operations, such as file deletion, file or directory renaming, or sequential writes. Using three techniques, late-binding journaling, zoning, and range deletion, we show that there is no fundamental trade-off in write-optimization. These dramatic improvements can be retained while matching conventional file systems on all other operations. BetrFS 0.2 delivers order-of-magnitude better performance than conventional file systems on directory scans and small random writes and matches the performance of conventional file systems on rename, delete, and sequential I/O. For example, BetrFS 0.2 performs directory scans 2.2× faster, and small random writes over two orders of magnitude faster, than the fastest conventional file system. But unlike BetrFS 0.1, it renames and deletes files commensurate with conventional file systems and performs large sequential I/O at nearly disk bandwidth. The performance benefits of these techniques extend to applications as well. BetrFS 0.2 continues to outperform conventional file systems on many applications, such as rsync, git-diff, and tar, but improves git-clone performance by 35% over BetrFS 0.1, yielding performance comparable to other file systems.

  • History-Independent Dynamic Partitioning with Applications to B-Trees, Skip Lists and Fusion Trees

    ACM Transactions on Database Systems · 2026-04-25

    article · Open access

    A data structure is history independent if its internal representation reveals nothing about the history of operations beyond what can be determined from the current contents of the data structure. History independence is typically viewed as a security or privacy guarantee, with the intent being to minimize risks incurred by a security breach or audit. Despite widespread advances in history independence, there is an important data-structural primitive that previous work has been unable to replace with an equivalent history-independent alternative: dynamic partitioning. In dynamic partitioning, we are given a dynamic set S of ordered elements and a size-parameter B, and the objective is to maintain a partition of S into ordered groups, each of size Θ(B). Dynamic partitioning is important throughout computer science, with applications to B-tree rebalancing, write-optimized dictionaries, log-structured merge trees, other external-memory indexes, geometric and spatial data structures, cache-oblivious data structures, and order-maintenance data structures. The lack of a history-independent dynamic-partitioning primitive has meant that designers of history-independent data structures have had to resort to complex alternatives. In this paper, we achieve history-independent dynamic partitioning. Our algorithm runs asymptotically optimally against an oblivious adversary, processing each insert/delete with O(1) operations in expectation and O(B log N / log log N) with high probability in set size N. We also use our dynamic partitioning scheme to build a history-independent B-tree, history-independent fusion tree, and external-memory skip list.

  • Optimal Bounds for Open Addressing Without Reordering

    arXiv (Cornell University) · 2025-01-04

    preprint · Open access · 1st author · Corresponding

    In this paper, we revisit one of the simplest problems in data structures: the task of inserting elements into an open-addressed hash table so that elements can later be retrieved with as few probes as possible. We show that, even without reordering elements over time, it is possible to construct a hash table that achieves far better expected search complexities (both amortized and worst-case) than were previously thought possible. Along the way, we disprove the central conjecture left by Yao in his seminal paper "Uniform Hashing is Optimal". All of our results come with matching lower bounds.

  • The Case for External Graph Sketching

    Society for Industrial and Applied Mathematics eBooks · 2025-01-01

    book-chapter · Open access

    Algorithms in the data stream model use O(polylog(N)) space to compute some property of an input of size N, and many of these algorithms are implemented and used in practice. However, sketching algorithms in the graph semi-streaming model use O(V polylog(V)) space for a V-vertex graph, and the fact that implementations of these algorithms are not used in the academic literature or in industrial applications may be because this space requirement is too large for RAM on today's hardware.

  • Time To Replace Your Filter: How Maplets Simplify System Design

    ArXiv.org · 2025-10-07

    preprint · Open access

    Filters such as Bloom, quotient, and cuckoo filters are fundamental building blocks providing space-efficient approximate set membership testing. However, many applications need to associate small values with keys, functionality that filters do not provide. This mismatch forces complex workarounds that degrade performance. We argue that maplets (space-efficient data structures for approximate key-value mappings) are the right abstraction. A maplet provides the same space benefits as filters while natively supporting key-value associations with one-sided error guarantees. Through detailed case studies of SplinterDB (LSM-based key-value store), Squeakr (k-mer counter), and Mantis (genomic sequence search), we identify the common patterns and demonstrate how a unified maplet abstraction can lead to simpler designs and better performance. We conclude that applications benefit from defaulting to maplets rather than filters across domains including databases, computational biology, and networking.

  • The Case for External Graph Sketching

    ArXiv.org · 2025-04-24

    preprint · Open access

    Algorithms in the data stream model use $O(polylog(N))$ space to compute some property of an input of size $N$, and many of these algorithms are implemented and used in practice. However, sketching algorithms in the graph semi-streaming model use $O(V polylog(V))$ space for a $V$-vertex graph, and the fact that implementations of these algorithms are not used in the academic literature or in industrial applications may be because this space requirement is too large for RAM on today's hardware. In this paper we introduce the external semi-streaming model, which addresses the aspects of the semi-streaming model that limit its practical impact. In this model, the input is in the form of a stream and $O(V polylog(V))$ space is available, but most of that space is accessible only via block I/O operations as in the external memory model. The goal in the external semi-streaming model is to simultaneously achieve small space and low I/O cost. We present a general transformation from any vertex-based sketch algorithm to one which has a low sketching cost in the new model. We prove that this automatic transformation is tight or nearly (up to a $O(\log(V))$ factor) tight via an I/O lower bound for the task of sketching the input stream. Using this transformation and other techniques, we present external semi-streaming algorithms for connectivity, bipartiteness testing, $(1+ε)$-approximating MST weight, testing $k$-edge connectivity, $(1+ε)$-approximating the minimum cut of a graph, computing $ε$-cut sparsifiers, and approximating the density of the densest subgraph. These algorithms all use $O(V poly(\log(V), ε^{-1}, k))$ space. For many of these problems, our external semi-streaming algorithms outperform the state-of-the-art algorithms in both the sketching and external-memory models.

  • Efficiently Constructing Sparse Navigable Graphs

    ArXiv.org · 2025-07-17

    preprint · Open access

    Graph-based nearest neighbor search methods have seen a surge of popularity in recent years, offering state-of-the-art performance across a wide variety of applications. Central to these methods is the task of constructing a sparse navigable search graph for a given dataset endowed with a distance function. Unfortunately, doing so is computationally expensive, so heuristics are universally used in practice. In this work, we initiate the study of fast algorithms with provable guarantees for search graph construction. For a dataset with $n$ data points, the problem of constructing an optimally sparse navigable graph can be framed as $n$ separate but highly correlated minimum set cover instances. This yields a naive $O(n^3)$ time greedy algorithm that returns a navigable graph whose sparsity is at most $O(\log n)$ higher than optimal. We improve significantly on this baseline, taking advantage of correlation between the set cover instances to leverage techniques from streaming and sublinear-time set cover algorithms. By also introducing problem-specific pre-processing techniques, we obtain an $\tilde{O}(n^2)$ time algorithm for constructing an $O(\log n)$-approximate sparsest navigable graph under any distance function. The runtime of our method is optimal up to logarithmic factors under the Strong Exponential Time Hypothesis via a reduction from Monochromatic Closest Pair. Moreover, we prove that, as with general set cover, obtaining better than an $O(\log n)$-approximation is NP-hard, despite the significant additional structure present in the navigable graph problem. Finally, we show that our approach can also beat cubic time for the closely related and practically important problems of constructing $α$-shortcut reachable and $τ$-monotonic graphs, which are also used for nearest neighbor search. For such graphs, we obtain $\tilde{O}(n^{2.5})$ time or better algorithms.

  • History-Independent Concurrent Hash Tables

    ArXiv.org · 2025-03-26

    preprint · Open access

    A history-independent data structure does not reveal the history of operations applied to it, only its current logical state, even if its internal state is examined. This paper studies history-independent concurrent dictionaries, in particular, hash tables, and establishes inherent bounds on their space requirements. This paper shows that there is a lock-free history-independent concurrent hash table, in which each memory cell stores two elements and two bits, based on Robin Hood hashing. Our implementation is linearizable, and uses the shared memory primitive LL/SC. The expected amortized step complexity of the hash table is $O(c)$, where $c$ is an upper bound on the number of concurrent operations that access the same element, assuming the hash table is not overpopulated. We complement this positive result by showing that even if we have only two concurrent processes, no history-independent concurrent dictionary that supports sets of any size, with wait-free membership queries and obstruction-free insertions and deletions, can store only two elements of the set and a constant number of bits in each memory cell. This holds even if the step complexity of operations on the dictionary is unbounded.

  • Don't Melt Your Cache: Low-Associativity with Heat-Sink

    2025-07-16

    article

    Perhaps the most influential result in the theory of caches is the following theorem due to Sleator and Tarjan: With O(1) resource augmentation, the basic LRU eviction policy is guaranteed to be O(1)-competitive with the optimal offline policy.

  • History-Independent Dynamic Partitioning: Operation-Order Privacy in Ordered Data Structures

    ACM SIGMOD Record · 2025-04-28

    article

    A data structure is history independent if its internal representation reveals nothing about the history of operations beyond what can be determined from the current contents of the data structure. History independence is typically viewed as a security or privacy guarantee, with the intent being to minimize risks incurred by a security breach or audit. Despite widespread advances in history independence, there is an important data-structural primitive that previous work has been unable to replace with an equivalent history-independent alternative: dynamic partitioning. In dynamic partitioning, we are given a dynamic set S of ordered elements and a size-parameter B, and the objective is to maintain a partition of S into ordered groups, each of size Θ(B). Dynamic partitioning is important throughout computer science, with applications to B-tree rebalancing, write-optimized dictionaries, log-structured merge trees, other external-memory indexes, geometric and spatial data structures, cache-oblivious data structures, and order-maintenance data structures. The lack of a history-independent dynamic-partitioning primitive has meant that designers of history-independent data structures have had to resort to complex alternatives. In this paper, we achieve history-independent dynamic partitioning. Our algorithm runs asymptotically optimally against an oblivious adversary, processing each insert/delete with O(1) operations in expectation and O(B log N / log log N) with high probability in set size N.

Awards & honors

  • Leonard J. Shustek Professor of Computer Science