Ken Birman

N. Rama Rao Professor of Computer Science

Cornell University · Computer Science

Active 1979–2025

h-index: 55

Citations: 14.9k

Papers: 348 (16 in the last 5 years)

Funding: $1.4M

About

Ken Birman is the N. Rama Rao Professor of Computer Science at Cornell University. His career has focused on cloud computing, distributed systems, and fault-tolerant software. His recent research centers on Cascade and Vortex, systems designed to overcome data-movement bottlenecks in modern AI and machine learning applications. Cascade runs AI or ML code directly where the data is stored, using scheduling and planned placement to collocate computation with data, an approach that is often 5x to 100x faster than other platforms. This departs from the traditional cloud computing model: by operating close to data sources such as hospitals, factories, or airplanes, Cascade addresses privacy concerns and bandwidth limitations. Cascade is flexible, supporting multiple storage abstractions (POSIX file systems, key-value stores, and pub-sub systems), and aims to be compatible with popular AI frameworks such as PyTorch and TensorFlow. Vortex, still a work in progress, extends Cascade to support retrieval-augmented generation (RAG) with large language models and vector databases.

Research topics

Computer Science
Distributed computing
Computer network
Operating system
Computer Security
Engineering
Database

Selected publications

Keep Your Friends Close: Leveraging Affinity Groups to Accelerate AI Inference Workflows
2025-08-28
article · Senior author
AI inference workflows are typically structured as a pipeline or graph of AI programs triggered by events. As events occur, the AIs perform inference or classification tasks under time pressure to respond or take some action. Standard techniques that reduce latency in other streaming settings (such as caching and optimization-driven scheduling) are of limited value because AI data access patterns (models, databases) change depending on the triggering event: a significant departure from traditional streaming. In this work, we propose a novel affinity grouping mechanism that makes it easier for developers to express application-specific data access correlations, enabling coordinated management of data objects in server clusters hosting streaming inference tasks. Our proposals are thus complementary to other approaches such as caching and scheduling. Experiments confirm the limitations of standard techniques, while showing that the proposed mechanism is able to maintain significantly lower latency as workload and scale-out increase, and yet requires only minor code changes.
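The affinity-grouping idea from "Keep Your Friends Close" can be illustrated with a minimal sketch. It assumes hash-based sharding of a key-value store and a developer-supplied function that maps each object key to a group name; the names `shard_for` and `camera_affinity` and the `/camN/...` key layout are hypothetical and are not the paper's API.

```python
import hashlib
from typing import Callable, Optional


def stable_hash(s: str) -> int:
    # SHA-256 gives the same value in every process, unlike Python's built-in
    # hash(), which is randomized per interpreter run for strings.
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")


def shard_for(key: str, num_shards: int,
              affinity: Optional[Callable[[str], str]] = None) -> int:
    """Map an object key to a shard. Without an affinity function every key is
    placed independently; with one, all keys in the same affinity group land
    on the same shard, so a task triggered by one of them finds the others
    locally."""
    group = affinity(key) if affinity is not None else key
    return stable_hash(group) % num_shards


# Hypothetical affinity function: objects are grouped by the camera that
# produced them, so "/cam7/frame/12" and "/cam7/model_state" share a group.
def camera_affinity(key: str) -> str:
    return "/" + key.strip("/").split("/")[0]


keys = ["/cam7/frame/12", "/cam7/frame/13", "/cam7/model_state"]
print({k: shard_for(k, 16, camera_affinity) for k in keys})
```

Hashing the group name rather than the full key is what co-locates related objects. A real system would also have to balance load across groups, which this sketch ignores.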
Enhancing Transparency in Buyer-Driven Commodity Chains for Complex Products: Extending a Blockchain-Based Traceability Framework Towards the Circular Economy
Applied Sciences · 2025-07-24 · 6 citations
article · Open access
This study extends our prior blockchain-based traceability framework, WEave, for application to a furniture supply chain scenario, while using the original multi-tier apparel supply chain as an anchoring use case. We integrate circular economy principles such as product reuse, recycling traceability, and full lifecycle transparency to bolster sustainability and resilience in supply chains by enabling data-driven accountability and tracking for closed-loop resource flows. The enhanced approach can track post-consumer returns, use of recycled materials, and second-life goods, all represented using a closed-loop supply chain topology. We describe the extended network architecture and smart contract logic needed to capture circular lifecycle events, while proposing new metrics for evaluating lifecycle traceability and reuse auditability. To validate the extended framework, we outline simulation experiments that incorporate circular flows and cross-industry scenarios. Results from these simulations indicate improved transparency on recycled content, audit trails for returned products, and acceptable performance overhead when scaling to different product domains. Finally, we offer conclusions and recommendations for implementing WEave functionality in real-world settings consistent with the goals of digital, resilient, and sustainable supply chains.
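Tamper-evident tracking of lifecycle events such as returns and recycling, as in the extended WEave framework, rests on records that commit to their predecessors. The sketch below is a minimal single-process hash chain that shows only that property; it is not a permissioned blockchain (no consensus, no smart contracts) and not WEave's design, and the helper functions, event names, and actors are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(body: Dict[str, str]) -> str:
    # Canonical JSON (sorted keys) so the same record always hashes identically.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_event(chain: List[Dict[str, str]], item_id: str, event: str, actor: str) -> None:
    """Append a lifecycle event that commits to the hash of the previous one."""
    body = {"item_id": item_id, "event": event, "actor": actor,
            "prev": chain[-1]["hash"] if chain else GENESIS}
    chain.append({**body, "hash": _digest(body)})


def verify(chain: List[Dict[str, str]]) -> bool:
    """Recompute every hash and link; editing any earlier record breaks the chain."""
    prev = GENESIS
    for record in chain:
        body = {k: v for k, v in record.items() if k != "hash"}
        if record["prev"] != prev or _digest(body) != record["hash"]:
            return False
        prev = record["hash"]
    return True


def history(chain: List[Dict[str, str]], item_id: str) -> List[str]:
    """Return the ordered lifecycle events recorded for one item."""
    return [r["event"] for r in chain if r["item_id"] == item_id]


ledger: List[Dict[str, str]] = []
for event, actor in [("manufactured", "factory-A"), ("sold", "retailer-B"),
                     ("returned", "retailer-B"), ("recycled", "recycler-C")]:
    append_event(ledger, "jacket-001", event, actor)
print(history(ledger, "jacket-001"), verify(ledger))
```

Changing any stored field, for example the actor of the `sold` event, makes `verify` return `False`, because that record's recomputed hash no longer matches the stored one.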
Passing the Baton: High Throughput Distributed Disk-Based Vector Search with BatANN
arXiv · 2025-12-10
preprint · Open access · Senior author
Vector search underpins modern information-retrieval systems, including retrieval-augmented generation (RAG) pipelines and search engines over unstructured text and images. As datasets scale to billions of vectors, disk-based vector search has emerged as a practical solution. Looking ahead, however, we must anticipate datasets too large for any single server and throughput demands that exceed the limits of locally attached SSDs. We present BatANN, a distributed disk-based approximate nearest neighbor (ANN) system that retains the logarithmic search efficiency of a single global graph while achieving near-linear throughput scaling in the number of servers. Our core innovation is that when a query needs a neighborhood stored on another machine, we send the full state of the query to that machine and continue executing there, which improves locality. On 1B-point datasets at 0.95 recall using 10 servers, BatANN achieves 3.5-5.59x the throughput of the scatter-gather baseline and 1.44-2.09x the throughput of DistributedANN, while keeping mean latency below 3 ms. Moreover, these results are obtained over standard TCP. To our knowledge, BatANN is the first open-source distributed disk-based vector search system to operate over a single global graph.
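The "send the query state to the data" idea in BatANN can be illustrated with a toy, single-process sketch: a simplified best-first search over one global graph whose nodes are partitioned across servers. The partitioning, the `QueryState` fields, and the stopping rule are assumptions made for illustration, not BatANN's implementation; a real system would serialize the state over the network rather than merely count the moves.

```python
import heapq
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Set, Tuple


@dataclass
class QueryState:
    """Everything needed to resume the search on another server."""
    query: Sequence[float]
    candidates: List[Tuple[float, int]] = field(default_factory=list)  # min-heap
    visited: Set[int] = field(default_factory=set)
    expanded: List[Tuple[float, int]] = field(default_factory=list)
    handoffs: int = 0  # times the state moved to a different server


def search(graph: Dict[int, List[int]], vectors: Dict[int, Sequence[float]],
           owner: Dict[int, int], entry: int, query: Sequence[float],
           beam_width: int = 4, k: int = 3) -> Tuple[List[int], int]:
    """Best-first search in which the query state moves to the server that owns
    the next node to expand, instead of fetching that node's neighbors remotely."""
    state = QueryState(query)
    heapq.heappush(state.candidates, (math.dist(vectors[entry], query), entry))
    server = owner[entry]
    while state.candidates:
        d, node = heapq.heappop(state.candidates)
        if node in state.visited:
            continue
        # Stop when the closest unexpanded candidate is farther than the
        # beam_width-th best node expanded so far.
        best = sorted(state.expanded)
        if len(best) >= beam_width and d > best[beam_width - 1][0]:
            break
        if owner[node] != server:
            # "Pass the baton": a real system would ship `state` to owner[node].
            server = owner[node]
            state.handoffs += 1
        state.visited.add(node)
        state.expanded.append((d, node))
        for nbr in graph[node]:  # this adjacency list is local to `server`
            if nbr not in state.visited:
                heapq.heappush(state.candidates, (math.dist(vectors[nbr], query), nbr))
    return [n for _, n in sorted(state.expanded)[:k]], state.handoffs


# Toy data: 9 points on a line, each linked to neighbors within 2 steps,
# spread over 3 servers (nodes 0-2, 3-5, 6-8).
vectors = {i: (float(i),) for i in range(9)}
graph = {i: [j for j in range(9) if j != i and abs(i - j) <= 2] for i in range(9)}
owner = {i: i // 3 for i in range(9)}
print(search(graph, vectors, owner, entry=0, query=(7.9,)))
```

Each handoff replaces a remote neighbor-list fetch with a single transfer of the query state, after which the search keeps reading local data until it again reaches a node owned elsewhere.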
Diagnosing and Resolving Cloud Platform Instability with Multi-modal RAG LLMs
2025-03-30 · 1 citation
preprint · Open access · Senior author
Today's cloud-hosted applications and services are complex systems, and a performance or functional instability can have dozens or hundreds of potential root causes. Our hypothesis is that by combining the pattern matching capabilities of modern AI tools with a natural multi-modal RAG LLM interface, problem identification and resolution can be simplified. ARCA is a new multi-modal RAG LLM system that targets this domain. Step-wise evaluations show that ARCA outperforms state-of-the-art alternatives.
Enhancing transparency in buyer-driven commodity chains for complex products: a blockchain-based traceability framework demonstrated through an apparel supply chain simulation
Procedia Computer Science · 2025-01-01 · 4 citations
article · Open access
Buyer-driven commodity chains are characterized by commercial relationships between buyers and sellers that may obscure accountability due to complexity, thereby undermining sustainability efforts. Conventional methods to trace production, including ineffective human-led audits, risk reorienting global corporate governance towards the interests of private business and away from social benefit by limiting the role of objective data in the process. This study examines the relevant features of private, permissioned blockchains for addressing the transparency challenge, demonstrating the efficacy of our proposed framework against a simulation of a real-world multi-tier apparel supply chain. The simulation integrates a set of functional and operational requirements achieved through a combination of programmable smart contracts and the underlying blockchain architecture. We then evaluate the framework both qualitatively and quantitatively before discussing the limitations of our work.
Accelerating Visual Anomaly Detection in Smart Manufacturing with RDMA-Enabled Data Infrastructure
Electronics · 2025-06-13
article · Open access · Senior author
Industrial Artificial Intelligence (IAI) services are increasingly integral to smart manufacturing, especially in quality assurance tasks like defect detection. This paper presents the design, implementation, and evaluation of a video-based visual anomaly detection (VAD) system that runs at inspection stations on a smart shop floor. Our system processes real-time video streams from multiple cameras mounted around a conveyor belt to detect surface-level defects in mechanical components. To meet stringent latency and accuracy requirements, we adopt an edge-cloud architecture powered by AI accelerators and InfiniBand networking. The IAI service features key frame extraction modules, fine-tuned lightweight VAD models, and optimization techniques such as batching and microservice-level parallelism. The design choices for the AI modules are carefully evaluated to balance effectiveness and efficiency. As a result, system latency is reduced by 57%. In addition to this high-performance solution, we suggest a cost-efficient alternative that still completes the task within the required time frame.
Vortex: Hosting ML Inference and Knowledge Retrieval Services With Tight Latency and Throughput Requirements
arXiv · 2025-11-03
preprint · Open access · Senior author
There is growing interest in deploying ML inference and knowledge retrieval as services that could support both interactive queries by end users and more demanding request flows that arise from AIs integrated into end-user applications and deployed as agents. Our central premise is that these latter cases will come with service-level latency objectives (SLOs). Existing ML serving platforms use batching to optimize for high throughput, which exposes them to unpredictable tail latencies. Vortex enables an SLO-first approach. For identical tasks, Vortex's pipelines achieve significantly lower and more stable latencies than TorchServe and Ray Serve over a wide range of workloads, often meeting a given SLO target at more than twice the request rate. When RDMA is available, the Vortex advantage is even more significant.
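The Vortex abstract contrasts throughput-oriented batching with an SLO-first approach. One simple way to make batching deadline-aware is sketched below: a request joins the open batch only if the enlarged batch, dispatched at that request's arrival, would still meet every member's deadline. This is an illustrative policy under an assumed, known service-time model, not Vortex's scheduler, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Request:
    req_id: int
    arrival: float   # seconds
    deadline: float  # absolute time by which the response is due (the SLO)


def slo_aware_batches(requests: List[Request], max_batch: int,
                      service_time: Callable[[int], float]) -> List[List[int]]:
    """Group arrival-ordered requests into batches. A request joins the open
    batch only if dispatching the enlarged batch at its arrival time still meets
    every member's deadline; otherwise the open batch is closed first. Batches
    are also closed as soon as they reach max_batch."""
    batches: List[List[int]] = []
    current: List[Request] = []
    for r in requests:
        if current:
            earliest_deadline = min([x.deadline for x in current] + [r.deadline])
            if r.arrival + service_time(len(current) + 1) > earliest_deadline:
                batches.append([x.req_id for x in current])
                current = []
        current.append(r)
        if len(current) == max_batch:
            batches.append([x.req_id for x in current])
            current = []
    if current:
        batches.append([x.req_id for x in current])
    return batches


def example_service_time(n: int) -> float:
    # Assumed cost model: 10 ms fixed overhead plus 2 ms per request in the batch.
    return 0.010 + 0.002 * n


# Every request has a 30 ms latency SLO measured from its arrival.
reqs = [Request(i, t, t + 0.030) for i, t in enumerate([0.0, 0.001, 0.002, 0.050])]
print(slo_aware_batches(reqs, max_batch=8, service_time=example_service_time))
```

A purely throughput-oriented batcher would keep the first three requests waiting for the late fourth one (or for a full batch of eight); the deadline check closes the batch instead, trading batch size for bounded latency.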
Compass: A Decentralized Scheduler for Latency-Sensitive ML Workflows
arXiv · 2024-02-27
preprint · Open access
We consider ML query processing in distributed systems where GPU-enabled workers coordinate to execute complex queries: a computing style often seen in applications that interact with users in support of image processing and natural language processing. In such systems, coscheduling of GPU memory management and task placement represents a promising opportunity. We propose Compass, a novel framework that unifies these functions to reduce job latency while using resources efficiently, placing tasks where data dependencies will be satisfied, collocating tasks from the same job (when this will not overload the host or its GPU), and efficiently managing GPU memory. Comparison with other state-of-the-art schedulers shows a significant reduction in completion times while requiring the same amount of resources or even fewer. In one case, just half the servers were needed to process the same workload.
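The placement goals listed in the Compass abstract (satisfy data dependencies, collocate tasks of the same job unless that overloads a worker) can be illustrated with a greedy heuristic. The sketch below is not Compass's algorithm: the cost model (fixed network bandwidth, a backlog threshold for "overloaded") and every name in it are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Worker:
    name: str
    backlog_s: float  # seconds of work already queued on this GPU
    resident: Set[str] = field(default_factory=set)  # objects already in GPU memory


def estimated_finish(w: Worker, inputs: List[str], size_mb: Dict[str, float],
                     compute_s: float, bandwidth_mb_s: float) -> float:
    # Only inputs not already resident on the worker must be transferred.
    missing_mb = sum(size_mb[o] for o in inputs if o not in w.resident)
    return w.backlog_s + missing_mb / bandwidth_mb_s + compute_s


def place(inputs: List[str], size_mb: Dict[str, float], compute_s: float,
          workers: List[Worker], bandwidth_mb_s: float = 1000.0,
          max_backlog_s: float = 1.0) -> str:
    """Among workers that are not overloaded, pick the one with the earliest
    estimated finish time; this favors workers that already hold the task's
    inputs, such as a model loaded by an earlier task of the same job."""
    eligible = [w for w in workers if w.backlog_s < max_backlog_s] or workers
    best = min(eligible, key=lambda w: estimated_finish(
        w, inputs, size_mb, compute_s, bandwidth_mb_s))
    # Commit: the worker's queue grows, and the task's inputs stay resident
    # in its GPU memory for later tasks.
    best.backlog_s = estimated_finish(best, inputs, size_mb, compute_s, bandwidth_mb_s)
    best.resident.update(inputs)
    return best.name


sizes = {"model_a": 2000.0, "img_1": 5.0, "img_2": 5.0}
workers = [Worker("gpu0", 0.2, {"model_a"}), Worker("gpu1", 0.0)]
print(place(["model_a", "img_1"], sizes, 0.05, workers))
print(place(["model_a", "img_2"], sizes, 0.05, workers))
```

Here the busier `gpu0` still wins because it already holds the 2 GB model; once its backlog passes the threshold, tasks spill over to `gpu1` even though the model must then be loaded there.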
Verifying a C Implementation of Derecho’s Coordination Mechanism Using VST and Coq
Lecture Notes in Computer Science · 2024-01-01 · 1 citation
book-chapter
Digital Twin-Driven Teat Localization and Shape Identification for Dairy Cow (Student Abstract)
Proceedings of the AAAI Conference on Artificial Intelligence · 2024-03-24 · 2 citations
article · Open access · Senior author
Dairy owners invest heavily to keep their animals healthy. There is good reason to hope that technologies such as computer vision and artificial intelligence (AI) could reduce costs, yet obstacles arise when adapting these advanced tools to farming environments. In this work, we applied AI tools to dairy cow teat localization and teat shape classification, obtaining a model that achieves a mean average precision of 0.783. This digital twin-driven approach is intended as a first step towards automating and accelerating the detection and treatment of hyperkeratosis, mastitis, and other medical conditions that significantly burden the dairy industry.

Recent grants

CiC: Science of Cloud-Scale Computing
NSF · $370k · 2011–2014
FIA: Collaborative Research: NEBULA: A Future Internet That Supports Trustworthy Cloud Computing
NSF · $1.0M · 2010–2014

Frequent coauthors

Robbert van Renesse
Cornell University
88 shared
Danny Dolev
39 shared
K Ostrowski
Bialystok University of Technology
25 shared
Werner Vogels
20 shared
Mark Hayden
Great Ormond Street Hospital
19 shared
Anne-Marie Kermarrec
19 shared
Ýmir Vigfússon
Emory University
17 shared
Hakim Weatherspoon
Cornell University
16 shared

Labs

Ken Birman's Lab (PI)
Research in distributed systems, cloud computing, and reliable computing.

Education

Ph.D., Computer Science
Cornell University
1986
M.S., Computer Science
Cornell University
1982
B.S., Computer Science
University of California, Berkeley
1979

Awards & honors

IEEE Fellow
ACM Fellow
IEEE Tsutomu Kanai Award
