Sarita V. Adve
Richard T. Cheng Professor · University of Illinois Urbana-Champaign · Computer Science
Active 1990–2026
About
Sarita V. Adve is a professor at the Siebel School of Computing and Data Science at the University of Illinois Urbana-Champaign. Her research areas include compilers, computer architecture, and parallel computing, with current work on immersive computing, extended reality (XR) systems, and high-performance computing. She has taught courses such as Computer System Organization, Advanced Topics in Computer Architecture, and Immersive Computing Systems. She has been recognized for her contributions to high-performance computing.
Research topics
- Computer Science
- Artificial Intelligence
- Embedded system
- Operating system
- Distributed computing
- Engineering
- Software engineering
- Computer engineering
- Machine Learning
- Real-time computing
- Computer architecture
- Reliability engineering
- Programming language
- Data science
- Human–computer interaction
- Algorithm
Selected publications
Serving Compound Inference Systems on Datacenter GPUs
ArXiv.org · 2026-03-09
article · Open access · Senior author
Applications in emerging domains such as XR are being built as compound inference systems, where multiple ML models are composed in the form of a task graph to service each request. Serving these compound systems efficiently raises two questions: how to apportion end-to-end latency and accuracy budgets between different tasks in a compound inference system, and how to allocate resources effectively for different models with varying resource requirements. We present JigsawServe, the first serving framework that jointly optimizes for latency, accuracy, and cost in terms of GPU resources by adaptively choosing model variants and performing fine-grained resource allocation by spatially partitioning the GPUs for each task of a compound inference system. Analytical evaluation of a system with a large number of GPUs shows that JigsawServe can increase the maximum serviceable demand (in requests per second) by 11.3x when compared to the closest prior work. Our empirical evaluation shows that for a large range of scenarios, JigsawServe consumes only 43.3% of the available GPU resources while meeting accuracy SLOs with less than 0.6% latency SLO violations. All of the features in JigsawServe contribute to this high efficiency: sacrificing any one feature of accuracy scaling, GPU spatial partitioning, or task-graph-informed resource budgeting significantly reduces efficiency.
Ada: A Distributed, Power-Aware, Real-Time Scene Provider for XR
IEEE Transactions on Visualization and Computer Graphics · 2025-10-03
article · Open access · Senior author
Real-time scene provisioning (reconstructing and delivering scene data to requesting XR applications during runtime) is central to enabling spatial computing in modern XR systems. However, existing solutions struggle to balance latency, power, and scene fidelity under XR device constraints, and often rely on designs that are closed, application-specific, or both. We present Ada, the first open, distributed, power-aware, application-agnostic real-time scene provisioning system. Through computation offloading along with algorithmic and system innovations, Ada provides high-fidelity scenes with stable performance across all evaluated scene sizes and with low power consumption. To isolate the benefits of Ada's algorithmic and design innovations over the closest prior work [82], which is on-device and CPU-based, we configure a comparable on-device, CPU-based variant of Ada (AdaLocal-CPU). We show this variant achieves up to 6.8× lower scene request latency and higher scene fidelity compared to the prior work. Furthermore, Ada's final distributed GPU-accelerated implementation reduces latency by an additional 2×, highlighting the benefits of GPU acceleration and distributed computing. Ada also lowers the incremental power cost of scene provisioning by 24% compared to the best on-device variant (AdaLocal-GPU). Finally, Ada flexibly adapts to diverse latency, power, scene fidelity, and network bandwidth requirements.
IEEE Journal of Solid-State Circuits · 2025-09-26
article
We present EPOCHS-1, a 12 nm, 64 mm² system-on-chip (SoC) with a high degree of heterogeneity. It features four Linux-SMP-capable RISC-V cores, 14 different types of accelerators, a distributed memory hierarchy, and various peripherals. EPOCHS-1’s memory hierarchy has the flexibility to support a diverse set of accelerators and can scale to support complex applications with 34% and 25% reduction in latency and energy, respectively. A subset of the SoC’s 23 power and 35 clock domains is regulated with a fully-decentralized power-allocation scheme and hybrid unified voltage and frequency scaling (HUVFS) that combines an in-package switched regulator with a per-tile low dropout (LDO). Combined, these techniques achieve up to a 1.57× speedup versus a centralized power management baseline. Designed with an agile methodology, EPOCHS-1 is based on an open-source SoC architecture and features only open-source components, either third-party or newly designed, thus enabling design reuse for future research projects.
FastFlip: Compositional SDC Resiliency Analysis
2025-02-22 · 2 citations
article · Open access
To efficiently harden programs susceptible to Silent Data Corruptions (SDCs), developers need to invoke error injection analyses to find particularly vulnerable instructions and then selectively protect them using appropriate compiler-level SDC detection mechanisms. However, these error injection analyses are both expensive and monolithic: they must be run from scratch after even small changes to the code, such as optimizations or bug fixes. This high recurring cost keeps such software-directed resiliency analyses out of standard software engineering practices such as regression testing. We present FastFlip, the first approach tailored to seamlessly incorporate resiliency analysis within the iterative software development workflow. FastFlip combines empirical error injection and symbolic SDC propagation analyses to enable fast and compositional error injection analysis of evolving programs. When developers modify a program, FastFlip often has to re-analyze only the modified program sections, which can save a significant amount of analysis time. We evaluated FastFlip with five benchmark programs. In our experiments, for each benchmark, we analyzed the original version plus two modified versions. The compositional nature of FastFlip speeds up the analysis of the incrementally modified versions by 3.2× (geomean) and up to 17.2×. The results demonstrate that FastFlip can effectively select a set of instructions to protect against SDCs that minimizes the runtime protection cost while protecting against a developer-specified target fraction of all tested SDC-causing errors.
Is WTSN the missing piece for low latency in general-purpose Wi-Fi?
2025-02-12
article · Open access
The high latency and variability of current Wi-Fi networks severely impair interactive networked applications like extended reality and cloud gaming, and even negatively affect web browsing. Recently, wireless Time-Sensitive Networking (WTSN) has emerged to offer powerful time synchronization and scheduling capabilities that can enable deterministic low latency. However, WTSN relies on precise advance knowledge of packet arrival times and tight integration between applications and a centralized network controller, limiting its scope to niche settings. Resolving WTSN's dependence on knowledge of packet arrival times is key to determining whether it can be a low latency enabler in general-purpose Wi-Fi. Thus, in this work, we ask: are the stringent assumptions of WTSN necessary to achieve the low latency benefits? Contrary to prevailing assumptions, we find that it is indeed possible to enable low tail and mean latency without prior knowledge of precise packet arrival times, even in the presence of high-throughput background flows. We demonstrate this in simulation using a WTSN-enabled multipath design that partitions the network into two logical paths: one with very low latency and high reliability, and another offering high throughput at the expense of latency and reliability. Further, we describe how our design and WTSN can both complement the powerful OFDMA capabilities of Wi-Fi and present initial results for the same. We conclude by discussing deployability and promising future directions.
RemoteVIO: Offloading Head Tracking in an End-to-End XR System
2025-03-26 · 8 citations
article · Open access · Senior author
Power consumption, and the resulting limitation to computational load, is a first-order constraint in designing comfortable all-day-wear extended reality (XR) devices that can provide rich immersive experiences. This paper concerns reducing XR device power consumption by offloading head tracking, one of the top CPU and power consumers, to a remote server. We present RemoteVIO, the first open-source end-to-end XR system that offloads head tracking (visual inertial odometry or VIO) to a remote server. Our work distinguishes itself from past studies on computation offloading in XR by properly addressing two under-explored but critical aspects: 1) a comprehensive evaluation of user experience in a complete end-to-end XR system and 2) a quantification of the net power savings on real hardware.
XRgo: Design and Evaluation of Rendering Offload for Low-Power Extended Reality Devices
2025-03-26 · 4 citations
article · Open access · Senior author
Extended reality (XR) devices must render high-quality 3D graphics at low latency to deliver truly immersive experiences. However, XR devices are severely power- and resource-constrained, limiting the quality of on-device (local) rendering. Offloading rendering to a powerful remote machine can enhance graphics quality, but network latency can degrade the overall experience. To mask latency, XR systems reproject the rendered frame to compensate for user motion since the rendered pose. Traditional reprojection, known as TimeWarp, uses a lightweight mechanism to compensate for latency in rotational motion, but not translational motion. Compensating for translational motion is more expensive, but is increasingly important at higher latencies.
FastFlip: Compositional Error Injection Analysis
arXiv (Cornell University) · 2024-03-20
preprint · Open access
Instruction-level error injection analyses aim to find instructions where errors often lead to unacceptable outcomes like Silent Data Corruptions (SDCs). These analyses require significant time, which is especially problematic if developers wish to regularly analyze software that evolves over time. We present FastFlip, a combination of empirical error injection and symbolic SDC propagation analyses that enables fast, compositional error injection analysis of evolving programs. FastFlip calculates how SDCs propagate across program sections and correctly accounts for unexpected side effects that can occur due to errors. Using FastFlip, we analyze five benchmarks, plus two modified versions of each benchmark. FastFlip speeds up the analysis of incrementally modified programs by 3.2× (geomean). FastFlip selects a set of instructions to protect against SDCs that minimizes the runtime cost of protection while protecting against a developer-specified target fraction of all SDC-causing errors.
2024-02-18 · 16 citations
article
Modern heterogeneous SoCs feature a mix of many hardware accelerators and general-purpose cores that run many applications in parallel. This brings challenges in managing how the accelerators access shared resources, e.g., the memory hierarchy, communication channels, and on-chip power. We address these challenges through flexible orchestration of data on a 74 Tbps network-on-chip (NoC) for dynamic management of the resources under contention and a distributed hardware power management (DHPM) scheme. Developing and testing these ideas requires a comprehensive evaluation platform. Hence, we built an SoC that features 14 types of accelerators next to 4 RISC-V cores capable of running many simultaneous applications on top of a Linux-SMP operating system. Building such a platform was made possible in part by the reuse of open-source hardware (OSH) components [1]. However, even with a growing OSH community, the lack of available SoC designs keeps other researchers from performing evaluations of this kind; this is demonstrated by the unprecedented degree of heterogeneity and complexity of our chip compared to prior academic SoCs in the literature. To allow other academic and industrial research teams to pursue SoC design innovations without having to reinvent the wheel, we plan to publicly release the synthesizable design of our SoC with its software stack.
Recent grants
SHF: Medium: Software Engineering for Hardware Errors
NSF · $1.2M · 2020–2025
SHF: Small: Software-Driven Hardware Resiliency
NSF · $450k · 2013–2017
SHF: Small: DeNovo: Rethinking Hardware for Disciplined Parallelism
NSF · $500k · 2010–2014
CPA-CSA-T: Low Cost and Comprehensive Hardware Reliability
NSF · $500k · 2008–2013
NSF · $450k · 2016–2021
Frequent coauthors
- 128 shared
Jean-Yves L’Excellent
- 124 shared
Iain Duff
Centre Européen de Recherche et de Formation Avancée en Calcul Scientifique
- 120 shared
Bora Uçar
- 118 shared
David Padua
- 116 shared
Alfredo Buttari
Institut de Recherche en Informatique de Toulouse
- 70 shared
Abdou Guermouche
Numerical Algorithms Group (United Kingdom)
- 66 shared
Patrick Amestoy
- 65 shared
Christian Lengauer
Labs
Siebel School of Computing and Data Science · PI
Education
- 1993
Ph.D., Computer Science
University of Wisconsin–Madison
- 1989
M.S., Computer Science
University of Wisconsin–Madison
- 1987
B.Tech., Electrical Engineering
Indian Institute of Technology Bombay
Awards & honors
- Best Paper Award at ISMAR'25
- IEEE Transactions on Visualization and Computer Graphics pub…
- Honored for valuable contributions to the field of high-perf…