
Christina Delimitrou
Massachusetts Institute of Technology · Electrical Engineering & Computer Science
Active 2011–2026
About
Christina Delimitrou holds a professorship in Communications and Technology and is an Associate Professor in the Department of Electrical Engineering and Computer Science at MIT. Her research areas include computer architecture, theory of computation, and artificial intelligence and decision-making. She focuses on systems that interact with the external world through perception, communication, and action while learning, making decisions, and adapting to changing environments. She also works on systems that sense, process, and transmit energy and information, combining computational, theoretical, and experimental tools to build new sensors, energy transducers, and physical substrates for computation. Her work aims to address shared challenges facing society through new system designs.
Research topics
- Computer Science
- Engineering
- Engineering management
Selected publications
Benchmarking Compound AI Applications for Hardware-Software Co-Design
arXiv (Cornell University) · 2026-03-04
preprint · Open access
Compound AI applications, composed of interactions among Large Language Models (LLMs), Machine Learning (ML) models, external tools, and data sources, are quickly becoming an integral datacenter workload. Their diverse sub-components and use cases present a large configuration space across the deployment stack, from applications and serving software down to hardware, and each layer may influence application performance, deployment cost, and/or resource consumption. Despite their rapid adoption, however, the systems community lacks a standardized benchmark for analyzing this complex design space and guiding system design. In this work, we present our benchmarking suite for cross-stack analysis of Compound AI applications. Using it, we derive key takeaways and design principles spanning several layers of the stack for hardware-software co-design to unlock higher resource efficiency.
The Importance of Generalizability in Machine Learning for Systems
2025-03-01
article · Senior author
Using machine learning (ML) to tackle computer systems tasks is gaining popularity. One shortcoming of such ML-based approaches is the inability of models to generalize to out-of-distribution data, i.e., data whose distribution differs from that of the training dataset. We show that this issue exists in cloud environments by analyzing various ML models used to improve resource balance in Google's fleet. We discuss the trade-offs among different techniques for detecting out-of-distribution data. Finally, we propose and demonstrate the efficacy of using Bayesian models to estimate a model's confidence in its output when used to improve cloud server resource balance.
Lumos: Efficient Performance Modeling and Estimation for Large-scale LLM Training
ArXiv.org · 2025-04-12
preprint · Open access · Senior author
Training LLMs in distributed environments presents significant challenges due to the complexity of model execution, deployment systems, and the vast space of configurable strategies. Although various optimization techniques exist, achieving high efficiency in practice remains difficult. Accurate performance models that effectively characterize and predict a model's behavior are essential for guiding optimization efforts and system-level studies. We propose Lumos, a trace-driven performance modeling and estimation toolkit for large-scale LLM training, designed to accurately capture and predict the execution behaviors of modern LLMs. We evaluate Lumos on a production ML cluster with up to 512 NVIDIA H100 GPUs using various GPT-3 variants, demonstrating that it can replay execution time with an average error of just 3.3%, along with other runtime details, across different models and configurations. Additionally, we validate its ability to estimate performance for new setups from existing traces, facilitating efficient exploration of model and deployment configurations.
Fair, Practical, and Efficient Carbon Accounting for LLM Serving
ACM SIGMETRICS Performance Evaluation Review · 2025-08-26
article
We propose a framework for evaluating carbon attribution methods for multi-tenant LLM serving. The framework formalizes the problem using three key components: (1) a set of requests with varying prompt and decode lengths, (2) the LLM inference runtime, including batching algorithms, and (3) a carbon emission model accounting for both operational carbon (proportional to power consumption and carbon intensity) and embodied carbon from hardware manufacturing. Using the Shapley value as ground truth for fair attribution, we demonstrate why simple "leave-one-out" attribution methods fail to satisfy efficiency properties. The framework evaluates attribution methods against four criteria: scalability (computational complexity), fairness (minimizing deviation from Shapley values), sample efficiency (algorithmic approximations for complex cases), and incentivization (encouraging users to optimize their usage patterns).
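The gap between Shapley and leave-one-out attribution can be seen on a toy example. The sketch below uses a hypothetical cost model of our own, not the paper's carbon model: a batch pays a fixed embodied-carbon share (`EMBODIED_G`) plus operational carbon for every padded token (`OP_G_PER_TOKEN`, with all requests padded to the longest one). Both constants and all function names are illustrative assumptions.

```python
import math
from itertools import combinations

# Hypothetical toy cost model (an assumption, not the paper's model):
# each batch pays a fixed embodied-carbon share plus operational carbon
# for every padded token (all requests are padded to the longest one).
EMBODIED_G = 10.0       # g CO2e per batch (illustrative)
OP_G_PER_TOKEN = 0.01   # g CO2e per padded token (illustrative)


def batch_carbon(lengths):
    """Carbon (g CO2e) of serving these request lengths together in one batch."""
    if not lengths:
        return 0.0
    return EMBODIED_G + OP_G_PER_TOKEN * max(lengths) * len(lengths)


def shapley(lengths):
    """Exact Shapley attribution by enumerating every coalition (O(2^n))."""
    n = len(lengths)
    shares = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(n):  # k = size of a coalition that excludes request i
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for coalition in combinations(others, k):
                without_i = [lengths[j] for j in coalition]
                with_i = without_i + [lengths[i]]
                phi += weight * (batch_carbon(with_i) - batch_carbon(without_i))
        shares.append(phi)
    return shares


def leave_one_out(lengths):
    """Charge each request the carbon saved by removing it from the batch."""
    total = batch_carbon(lengths)
    return [total - batch_carbon(lengths[:i] + lengths[i + 1:])
            for i in range(len(lengths))]


if __name__ == "__main__":
    requests = [100, 400, 800]
    print("batch total:  ", batch_carbon(requests))
    print("Shapley sum:  ", sum(shapley(requests)))
    print("leave-one-out:", sum(leave_one_out(requests)))
```

For `requests = [100, 400, 800]` the batch emits 34 g under this toy model; the Shapley shares sum to that total (up to floating point), while the leave-one-out shares sum to 32 g, so they violate efficiency. Exact enumeration is exponential in the number of requests, which is why approximations (the sample-efficiency criterion) matter at realistic batch sizes.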
Carbon- and Precedence-Aware Scheduling for Data Processing Clusters
ArXiv.org · 2025-02-13
preprint · Open access · Senior author
As large-scale data processing workloads continue to grow, their carbon footprint raises concerns. Prior research on carbon-aware schedulers has focused on shifting computation to align with the availability of low-carbon energy, but these approaches assume that each task can be executed independently. In contrast, data processing jobs have precedence constraints (i.e., outputs of one task are inputs to another) that complicate decisions, since delaying an upstream "bottleneck" task to a low-carbon period also blocks downstream tasks, impacting the entire job's completion time. In this paper, we show that carbon-aware scheduling for data processing benefits from knowledge of both time-varying carbon and precedence constraints. Our main contribution is PCAPS, a carbon-aware scheduler that interfaces with modern ML scheduling policies to explicitly consider the precedence-driven importance of each task in addition to carbon. To illustrate the gains due to fine-grained task information, we also study CAP, a wrapper for any carbon-agnostic scheduler that adapts the key provisioning ideas of PCAPS. Our schedulers enable a configurable priority between carbon reduction and job completion time, and we give analytical results characterizing the trade-off between the two. Furthermore, our Spark prototype on a 100-node Kubernetes cluster shows that a moderate configuration of PCAPS reduces carbon footprint by up to 32.9% without significantly impacting the cluster's total efficiency.
Carbon- and Precedence-Aware Scheduling for Data Processing Clusters
2025-08-27 · 5 citations
article · Open access · Senior author
As large-scale data processing workloads continue to grow, their carbon footprint raises concerns. Prior research on carbon-aware schedulers has focused on shifting computation to align with the availability of low-carbon energy, but these approaches assume that each task can be executed independently. In contrast, data processing jobs have precedence constraints that complicate decisions, since delaying an upstream "bottleneck" task to a low-carbon period also blocks downstream tasks, increasing makespan. In this paper, we show that carbon-aware scheduling for data processing benefits from knowledge of both time-varying carbon and precedence constraints. Our main contribution is PCAPS, a carbon-aware scheduler that builds on state-of-the-art scoring or probability-based techniques; in doing so, it explicitly weighs the structural importance of each task against the time-varying carbon intensity. To illustrate the gains due to fine-grained task-level scheduling, we also study CAP, a wrapper for any carbon-agnostic scheduler that generalizes the provisioning ideas of PCAPS. Both techniques allow a user-configurable priority between carbon and makespan, and we give basic analytic results characterizing the trade-off between these objectives. Our prototype on a 100-node Kubernetes cluster shows that a moderate configuration of PCAPS reduces carbon footprint by up to 32.9% without significantly impacting total efficiency.
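The interaction between precedence and carbon can be illustrated with a deliberately simplified scheduler. This is a hypothetical heuristic inspired by the two abstracts above, not PCAPS or CAP: tasks have unit duration and draw one unit of energy, a task's importance is its critical-path depth in the DAG (an assumption standing in for the scoring or probability-based techniques PCAPS builds on), and a hypothetical `patience` knob in [0, 1] plays the role of the configurable priority between carbon and makespan. A ready task runs only when the current carbon intensity is at or below a threshold that rises with its importance, so bottleneck tasks are rarely deferred.

```python
# Hypothetical DAG of unit-duration tasks: task -> downstream tasks (must be acyclic).
DAG = {"a": ["c"], "b": ["c"], "c": ["d", "e"], "d": [], "e": []}
# Hypothetical carbon-intensity trace, one value per time step, repeated cyclically.
CARBON = [400, 350, 100, 120]


def critical_path_depth(dag):
    """Number of tasks on the longest path starting at each task."""
    memo = {}

    def depth(task):
        if task not in memo:
            memo[task] = 1 + max((depth(c) for c in dag[task]), default=0)
        return memo[task]

    return {task: depth(task) for task in dag}


def schedule(dag, carbon, slots, patience):
    """Return (makespan, emissions) for a toy carbon- and precedence-aware heuristic.

    patience = 0 ignores carbon; patience = 1 defers low-importance tasks
    until carbon intensity is low. Requires slots >= 1. Each task is assumed
    to use one unit of energy, so it emits the intensity of the step it runs in.
    """
    depth = critical_path_depth(dag)
    max_depth = max(depth.values())
    c_min, c_max = min(carbon), max(carbon)
    pending_parents = {task: 0 for task in dag}
    for children in dag.values():
        for child in children:
            pending_parents[child] += 1
    ready = [task for task, n in pending_parents.items() if n == 0]
    finished, emissions, step = 0, 0.0, 0
    while finished < len(dag):
        intensity = carbon[step % len(carbon)]
        ready.sort(key=lambda task: -depth[task])  # most critical first
        running = []
        for task in ready:
            if len(running) == slots:
                break
            importance = depth[task] / max_depth  # in (0, 1]
            threshold = c_min + (c_max - c_min) * (1 - patience * (1 - importance))
            if intensity <= threshold:
                running.append(task)
        for task in running:  # chosen tasks finish within this step
            ready.remove(task)
            emissions += intensity
            finished += 1
            for child in dag[task]:
                pending_parents[child] -= 1
                if pending_parents[child] == 0:
                    ready.append(child)
        step += 1
    return step, emissions


if __name__ == "__main__":
    for patience in (0.0, 1.0):
        print(patience, schedule(DAG, CARBON, slots=2, patience=patience))
```

On this toy trace, `patience=0` finishes in 3 steps with 1350 emission units, while `patience=1` defers task `c` by one step to a low-carbon window and finishes in 4 steps with 1140, a small-scale version of the carbon-versus-makespan trade-off. Because every threshold is at least the trace minimum, any ready task eventually runs once the trace reaches its lowest-carbon step.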
The Importance of Generalizability in Machine Learning for Systems
IEEE Computer Architecture Letters · 2024-01-01 · 11 citations
article · Senior author
Using machine learning (ML) to tackle computer systems tasks is gaining popularity. One shortcoming of such ML-based approaches is the inability of models to generalize to out-of-distribution data, i.e., data whose distribution differs from that of the training dataset. We show that this issue exists in cloud environments by analyzing various ML models used to improve resource balance in Google's fleet. We discuss the trade-offs among different techniques for detecting out-of-distribution data. Finally, we propose and demonstrate the efficacy of using Bayesian models to estimate a model's confidence in its output when used to improve cloud server resource balance.
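One intuition for why a Bayesian model can signal its own confidence is that its predictive uncertainty grows for inputs far from the training data. The sketch below uses one-dimensional Bayesian linear regression with a Gaussian prior as a stand-in (an assumption; the abstract does not specify which Bayesian models the paper uses), with hypothetical hyperparameters `ALPHA` (prior precision) and `BETA` (noise precision) and a hypothetical `max_std` cutoff for flagging a prediction as untrustworthy.

```python
import math

ALPHA = 1.0   # prior precision on the weights (illustrative assumption)
BETA = 25.0   # observation-noise precision, i.e. noise std 0.2 (assumption)


def fit(xs, ys):
    """Posterior mean and covariance for y = w0 + w1 * x with Gaussian prior and noise."""
    # Posterior precision = ALPHA * I + BETA * Phi^T Phi, with features phi(x) = [1, x].
    a = ALPHA + BETA * len(xs)
    b = BETA * sum(xs)
    d = ALPHA + BETA * sum(x * x for x in xs)
    det = a * d - b * b
    cov = [[d / det, -b / det], [-b / det, a / det]]  # closed-form 2x2 inverse
    t0 = sum(ys)
    t1 = sum(x * y for x, y in zip(xs, ys))
    # Posterior mean = BETA * cov * Phi^T y.
    mean = [BETA * (cov[0][0] * t0 + cov[0][1] * t1),
            BETA * (cov[1][0] * t0 + cov[1][1] * t1)]
    return mean, cov


def predict(model, x):
    """Predictive mean and standard deviation at input x."""
    mean, cov = model
    mu = mean[0] + mean[1] * x
    var = 1 / BETA + cov[0][0] + 2 * cov[0][1] * x + cov[1][1] * x * x
    return mu, math.sqrt(var)


def is_confident(model, x, max_std=0.3):
    """Trust the prediction only if its predictive std is at most max_std."""
    return predict(model, x)[1] <= max_std


if __name__ == "__main__":
    xs = [i / 10 for i in range(11)]   # training inputs in [0, 1]
    ys = [2 * x + 1 for x in xs]
    model = fit(xs, ys)
    for x in (0.5, 10.0):              # in-distribution vs. far outside
        print(x, predict(model, x), is_confident(model, x))
```

With training inputs confined to [0, 1], the predictive standard deviation at x = 0.5 stays close to the noise level, while at x = 10 it is several times larger, so the prediction is flagged rather than silently trusted.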
Characterizing a Memory Allocator at Warehouse Scale
2024-04-24 · 10 citations
article · Open access
Memory allocation constitutes a substantial component of warehouse-scale computation. Optimizing the memory allocator not only reduces the datacenter tax, but also improves application performance, leading to significant cost savings.
Tales of the Tail: Past and Future
IEEE Micro · 2024-06-19
article · 1st author · Corresponding
Tail latency has been the defining performance metric for interactive services since the inception of cloud computing. Although various hardware and software techniques have been employed to improve tail latency for these applications, recent trends across the cloud system stack require revisiting them. Over the past few years, cloud hardware has become increasingly heterogeneous, while cloud software has come to be dominated by event-driven, modular programming frameworks, alongside the proliferation of artificial intelligence. Guaranteeing tail latency in this new landscape requires several system advances. In this paper, we first review what tail latency means for cloud services, the key innovations that improved it in the past, and the trends that require revisiting them, and then discuss the innovations that will be required for the next generation of warehouse-scale computers to meet tail latency constraints.
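Tail latency is usually reported as a high percentile of the latency distribution (for example p99) rather than the mean. A minimal sketch, assuming the nearest-rank percentile definition and hypothetical latency samples, computes percentiles from samples and adds a common back-of-the-envelope estimate for fan-out services: if each of N independent leaves is slow with probability p, a request that waits for all N leaves is slow with probability 1 - (1 - p)^N. Independence of leaves is an assumption made only for this illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def fanout_slow_prob(p_slow_leaf, fanout):
    """P(request is slow) when it waits on `fanout` independent leaves."""
    return 1 - (1 - p_slow_leaf) ** fanout


if __name__ == "__main__":
    # Hypothetical latency samples in ms: mostly fast, with a slow tail.
    latencies = [10] * 97 + [12, 400, 500]
    print("mean:", sum(latencies) / len(latencies))
    print("p50: ", percentile(latencies, 50))
    print("p99: ", percentile(latencies, 99))
    print("slow with 1% slow leaves, fan-out 100:", fanout_slow_prob(0.01, 100))
```

Here the mean (18.82 ms) hides the tail that p99 exposes (400 ms), and with 1% slow leaves a fan-out of 100 makes roughly 63% of requests wait on at least one slow leaf.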
Frequent coauthors
- Christos Kozyrakis · 46 shared
- Yu Gan, Google (United States) · 19 shared
- Yanqi Zhang, Beijing Institute of Petrochemical Technology · 16 shared
- N.V. Lazarev · 11 shared
- Zhuangzhuang Zhou · 9 shared
- Dailun Cheng, Cornell University · 9 shared
- Zhiru Zhang · 9 shared
- Meghna Pancholi, Columbia University · 9 shared
Labs
EECS Communication Lab · PI
Awards & honors
- 2025-26 EECS Faculty Award Roundup
- Eleven MIT faculty receive Presidential Early Career Awards
- 2024-25 EECS Faculty Award Roundup