
Ken Birman
N. Rama Rao Professor of Computer Science · Cornell University · Computer Science
Active 1979–2025
About
Professor Ken Birman is the N. Rama Rao Professor of Computer Science at Cornell University, with a distinguished career focused on cloud computing, distributed systems, and fault-tolerant software. His recent research centers on Cascade and Vortex, innovative systems designed to overcome data movement bottlenecks in modern AI and machine learning applications. Cascade achieves significant speedups—often 5x to 100x faster than other platforms—by running AI or ML code directly where data is stored, using scheduling and planned placement to collocate computation and data. This approach departs from traditional cloud computing models, addressing privacy concerns and bandwidth limitations by operating close to data sources such as hospitals, factories, or airplanes. Vortex extends Cascade to support retrieval-augmented generation (RAG) large language models and vector databases, although this work is still in progress. Cascade is flexible, supporting multiple storage abstractions including POSIX file systems, key-value stores, and pub-sub systems, and aims to be compatible with popular AI frameworks like PyTorch and TensorFlow.
Research topics
- Computer Science
- Distributed computing
- Computer network
- Operating system
- Computer Security
- Engineering
- Database
Selected publications
Keep Your Friends Close: Leveraging Affinity Groups to Accelerate AI Inference Workflows
2025-08-28
article · Senior author
AI inference workflows are typically structured as a pipeline or graph of AI programs triggered by events. As events occur, the AIs perform inference or classification tasks under time pressure to respond or take some action. Standard techniques that reduce latency in other streaming settings (such as caching and optimization-driven scheduling) are of limited value because AI data access patterns (models, databases) change depending on the triggering event: a significant departure from traditional streaming. In this work, we propose a novel affinity grouping mechanism that makes it easier for developers to express application-specific data access correlations, enabling coordinated management of data objects in server clusters hosting streaming inference tasks. Our proposals are thus complementary to other approaches such as caching and scheduling. Experiments confirm the limitations of standard techniques, while showing that the proposed mechanism is able to maintain significantly lower latency as workload and scale-out increase, and yet requires only minor code changes.
Applied Sciences · 2025-07-24 · 6 citations
article · Open access
This study extends our prior blockchain-based traceability framework, WEave, for application to a furniture supply chain scenario, while using the original multi-tier apparel supply chain as an anchoring use case. We integrate circular economy principles such as product reuse, recycling traceability, and full lifecycle transparency to bolster sustainability and resilience in supply chains by enabling data-driven accountability and tracking for closed-loop resource flows. The enhanced approach can track post-consumer returns, use of recycled materials, and second-life goods, all represented using a closed-loop supply chain topology. We describe the extended network architecture and smart contract logic needed to capture circular lifecycle events, while proposing new metrics for evaluating lifecycle traceability and reuse auditability. To validate the extended framework, we outline simulation experiments that incorporate circular flows and cross-industry scenarios. Results from these simulations indicate improved transparency on recycled content, audit trails for returned products, and acceptable performance overhead when scaling to different product domains. Finally, we offer conclusions and recommendations for implementing WEave functionality into real-world settings consistent with the goals of digital, resilient, and sustainable supply chains.
Passing the Baton: High Throughput Distributed Disk-Based Vector Search with BatANN
arXiv (Cornell University) · 2025-12-10
preprint · Open access · Senior author
Vector search underpins modern information-retrieval systems, including retrieval-augmented generation (RAG) pipelines and search engines over unstructured text and images. As datasets scale to billions of vectors, disk-based vector search has emerged as a practical solution. However, looking to the future, we must anticipate datasets too large for any single server and throughput demands that exceed the limits of locally attached SSDs. We present BatANN, a distributed disk-based approximate nearest neighbor (ANN) system that retains the logarithmic search efficiency of a single global graph while achieving near-linear throughput scaling in the number of servers. Our core innovation is that when accessing a neighborhood which is stored on another machine, we send the full state of the query to the other machine to continue executing there for improved locality. On 1B-point datasets at 0.95 recall using 10 servers, BatANN achieves 3.5-5.59x the throughput of the scatter-gather baseline and 1.44-2.09x the throughput of DistributedANN, while maintaining mean latency below 3 ms. Moreover, we get these results on standard TCP. To our knowledge, BatANN is the first open-source distributed disk-based vector search system to operate over a single global graph.
Diagnosing and Resolving Cloud Platform Instability with Multi-modal RAG LLMs
2025-03-30 · 1 citation
preprint · Open access · Senior author
Today's cloud-hosted applications and services are complex systems, and a performance or functional instability can have dozens or hundreds of potential root causes. Our hypothesis is that by combining the pattern matching capabilities of modern AI tools with a natural multi-modal RAG LLM interface, problem identification and resolution can be simplified. ARCA is a new multi-modal RAG LLM system that targets this domain. Step-wise evaluations show that ARCA outperforms state-of-the-art alternatives.
Procedia Computer Science · 2025-01-01 · 4 citations
article · Open access
Buyer-driven commodity chains are characterized by commercial relationships between buyers and sellers that may obscure accountability due to complexity, thereby undermining sustainability efforts. Conventional methods to trace production, including ineffective human-led audits, risk reorienting global corporate governance towards the interests of private business and away from social benefit by limiting the role of objective data in the process. This study examines the relevant features of private, permissioned blockchain for addressing the transparency challenge by demonstrating the efficacy of our proposed framework against a simulation of a real-world multi-tier apparel supply chain. The simulation integrates a set of functional and operational requirements achieved through a combination of programmable smart contracts and underlying blockchain architecture. We then evaluate the framework both qualitatively and quantitatively before discussing the limitations of our work.
Accelerating Visual Anomaly Detection in Smart Manufacturing with RDMA-Enabled Data Infrastructure
Electronics · 2025-06-13
article · Open access · Senior author
Industrial Artificial Intelligence (IAI) services are increasingly integral to smart manufacturing, especially in quality assurance tasks like defect detection. This paper presents the design, implementation, and evaluation of a video-based visual anomaly detection (VAD) system that runs at inspection stations on a smart shop floor. Our system processes real-time video streams from multiple cameras mounted around a conveyor belt to detect surface-level defects in mechanical components. To meet stringent latency and accuracy requirements, an edge-cloud architecture powered by AI accelerators and InfiniBand networking is adopted. The IAI service features key frame extraction modules, fine-tuned lightweight VAD models, and optimization techniques such as batching and microservice-level parallelism. The design choices of AI modules are carefully evaluated to balance effectiveness and efficiency. As a result, the system latency is optimized by 57%. In addition to the high-performance solution, a cost-efficient alternative is also suggested that is able to complete the task within the time frame.
ArXiv.org · 2025-11-03
preprint · Open access · Senior author
There is growing interest in deploying ML inference and knowledge retrieval as services that could support both interactive queries by end users and more demanding request flows that arise from AIs integrated into end-user applications and deployed as agents. Our central premise is that these latter cases will bring service level latency objectives (SLOs). Existing ML serving platforms use batching to optimize for high throughput, exposing them to unpredictable tail latencies. Vortex enables an SLO-first approach. For identical tasks, Vortex's pipelines achieve significantly lower and more stable latencies than TorchServe and Ray Serve over a wide range of workloads, often enabling a given SLO target at more than twice the request rate. When RDMA is available, the Vortex advantage is even more significant.
Compass: A Decentralized Scheduler for Latency-Sensitive ML Workflows
arXiv (Cornell University) · 2024-02-27
preprint · Open access
We consider ML query processing in distributed systems where GPU-enabled workers coordinate to execute complex queries: a computing style often seen in applications that interact with users in support of image processing and natural language processing. In such systems, coscheduling of GPU memory management and task placement represents a promising opportunity. We propose Compass, a novel framework that unifies these functions to reduce job latency while using resources efficiently, placing tasks where data dependencies will be satisfied, collocating tasks from the same job (when this will not overload the host or its GPU), and efficiently managing GPU memory. Comparison with other state of the art schedulers shows a significant reduction in completion times while requiring the same amount or even fewer resources. In one case, just half the servers were needed for processing the same workload.
Verifying a C Implementation of Derecho’s Coordination Mechanism Using VST and Coq
Lecture notes in computer science · 2024-01-01 · 1 citation
book-chapter
Digital Twin-Driven Teat Localization and Shape Identification for Dairy Cow (Student Abstract)
Proceedings of the AAAI Conference on Artificial Intelligence · 2024-03-24 · 2 citations
article · Open access · Senior author
Dairy owners invest heavily to keep their animals healthy. There is good reason to hope that technologies such as computer vision and artificial intelligence (AI) could reduce costs, yet obstacles arise when adapting these advanced tools to farming environments. In this work, we applied AI tools to dairy cow teat localization and teat shape classification, obtaining a model that achieves a mean average precision of 0.783. This digital twin-driven approach is intended as a first step towards automating and accelerating the detection and treatment of hyperkeratosis, mastitis, and other medical conditions that significantly burden the dairy industry.
Recent grants
CiC: Science of Cloud-Scale Computing
NSF · $370k · 2011–2014
FIA: Collaborative Research: NEBULA: A Future Internet That Supports Trustworthy Cloud Computing
NSF · $1.0M · 2010–2014
Frequent coauthors
- 88 shared
Robbert van Renesse
Cornell University
- 39 shared
Danny Dolev
- 25 shared
K Ostrowski
Bialystok University of Technology
- 20 shared
Werner Vogels
- 19 shared
Mark Hayden
Great Ormond Street Hospital
- 19 shared
Anne-Marie Kermarrec
- 17 shared
Ýmir Vigfússon
Emory University
- 16 shared
Hakim Weatherspoon
Cornell University
Labs
Research in distributed systems, cloud computing, and reliable computing.
Education
- 1986
Ph.D., Computer Science
Cornell University
- 1982
M.S., Computer Science
Cornell University
- 1979
B.S., Computer Science
University of California, Berkeley
Awards & honors
- IEEE Fellow
- ACM Fellow
- IEEE Tsutomu Kanai Award