Sarita V. Adve

Richard T. Cheng Professor · Verified

University of Illinois Urbana-Champaign · Computer Science

Active 1990–2026

h-index: 59
Citations: 13.9k
Papers: 274 (32 in the last 5 years)
Funding: $3.4M

About

Sarita V. Adve is a professor in the Siebel School of Computing and Data Science at the University of Illinois Urbana-Champaign. Her research areas include compilers, computer architecture, and parallel computing, with current work on immersive computing, XR systems, and high-performance computing. She has taught Computer System Organization, Advanced Topics in Computer Architecture, and Immersive Computing Systems, and has been recognized for her contributions to high-performance computing and immersive technologies.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Embedded system
  • Operating system
  • Distributed computing
  • Engineering
  • Software engineering
  • Computer engineering
  • Machine Learning
  • Real-time computing
  • Computer architecture
  • Reliability engineering
  • Programming language
  • Data science
  • Human–computer interaction
  • Algorithm

Selected publications

  • Serving Compound Inference Systems on Datacenter GPUs

    ArXiv.org · 2026-03-09

    article · Open access · Senior author

    Applications in emerging domains such as XR are being built as compound inference systems, where multiple ML models are composed in the form of a task graph to service each request. Serving these compound systems efficiently raises two questions: how to apportion end-to-end latency and accuracy budgets between different tasks in a compound inference system, and how to allocate resources effectively for different models with varying resource requirements. We present JigsawServe, the first serving framework that jointly optimizes for latency, accuracy, and cost in terms of GPU resources by adaptively choosing model variants and performing fine-grained resource allocation by spatially partitioning the GPUs for each task of a compound inference system. Analytical evaluation of a system with a large number of GPUs shows that JigsawServe can increase the maximum serviceable demand (in requests per second) by 11.3x when compared to the closest prior work. Our empirical evaluation shows that for a large range of scenarios, JigsawServe consumes only 43.3% of the available GPU resources while meeting accuracy SLOs with less than 0.6% latency SLO violations. All of the features in JigsawServe contribute to this high efficiency -- sacrificing any one feature of accuracy scaling, GPU spatial partitioning, or task-graph-informed resource budgeting significantly reduces efficiency.

  • Ada: A Distributed, Power-Aware, Real-Time Scene Provider for XR

    IEEE Transactions on Visualization and Computer Graphics · 2025-10-03

    article · Open access · Senior author

    Real-time scene provisioning (reconstructing and delivering scene data to requesting XR applications during runtime) is central to enabling spatial computing in modern XR systems. However, existing solutions struggle to balance latency, power, and scene fidelity under XR device constraints, and often rely on designs that are closed, application-specific, or both. We present Ada, the first open distributed, power-aware, application-agnostic real-time scene provisioning system. Through computation offloading along with algorithmic and system innovations, Ada provides high-fidelity scenes with stable performance across all evaluated scene sizes and with low power consumption. To isolate the benefits of Ada's algorithmic and design innovations over the closest prior work [82], which is on-device and CPU-based, we configure a comparable on-device, CPU-based variant of Ada (AdaLocal-CPU). We show this variant achieves up to 6.8× lower scene request latency and higher scene fidelity compared to the prior work. Furthermore, Ada's final distributed GPU-accelerated implementation reduces latency by an additional 2×, highlighting the benefits of GPU acceleration and distributed computing. Ada also lowers the incremental power cost of scene provisioning by 24% compared to the best on-device variant (AdaLocal-GPU). Finally, Ada flexibly adapts to diverse latency, power, scene fidelity, and network bandwidth requirements.

  • EPOCHS-1: A 12 nm Highly Heterogeneous Open-Source SoC With Distributed Coin-Based Power Management and Integrated Hybrid Voltage Regulation

    IEEE Journal of Solid-State Circuits · 2025-09-26

    article

    We present EPOCHS-1, a 12 nm, 64 mm² system-on-chip (SoC) with a high degree of heterogeneity. It features four Linux-SMP-capable RISC-V cores, 14 different types of accelerators, a distributed memory hierarchy, and various peripherals. EPOCHS-1’s memory hierarchy has the flexibility to support a diverse set of accelerators and can scale to support complex applications with 34% and 25% reduction in latency and energy, respectively. A subset of the SoC’s 23 power and 35 clock domains is regulated with a fully-decentralized power-allocation scheme and hybrid unified voltage and frequency scaling (HUVFS) that combines an in-package switched regulator with a per-tile low dropout (LDO). Combined, these techniques achieve up to a 1.57× speedup versus a centralized power management baseline. Designed with an agile methodology, EPOCHS-1 is based on an open-source SoC architecture and features only open-source components, either third-party or newly designed, thus enabling design reuse for future research projects.

  • FastFlip: Compositional SDC Resiliency Analysis

    2025-02-22 · 2 citations

    article · Open access

    To efficiently harden programs susceptible to Silent Data Corruptions (SDCs), developers need to invoke error injection analyses to find particularly vulnerable instructions and then selectively protect them using appropriate compiler-level SDC detection mechanisms. However, these error injection analyses are both expensive and monolithic: they must be run from scratch after even small changes to the code, such as optimizations or bug fixes. This high recurring cost keeps such software-directed resiliency analyses out of standard software engineering practices such as regression testing. We present FastFlip, the first approach tailored to seamlessly incorporate resiliency analysis within the iterative software development workflow. FastFlip combines empirical error injection and symbolic SDC propagation analyses to enable fast and compositional error injection analysis of evolving programs. When developers modify a program, FastFlip often has to re-analyze only the modified program sections, which can save a significant amount of analysis time. We evaluated FastFlip with five benchmark programs. In our experiments, for each benchmark, we analyzed the original version plus two modified versions. The compositional nature of FastFlip speeds up the analysis of the incrementally modified versions by 3.2× (geomean) and up to 17.2×. The results demonstrate that FastFlip can effectively select a set of instructions to protect against SDCs that minimizes the runtime protection cost while protecting against a developer-specified target fraction of all tested SDC-causing errors.

  • Is WTSN the missing piece for low latency in general-purpose Wi-Fi?

    2025-02-12

    article · Open access

    The high latency and variability of current Wi-Fi networks severely impairs interactive networked applications like extended reality and cloud gaming, and even negatively affects web browsing. Recently, wireless Time-Sensitive Networking (WTSN) has emerged to offer powerful time synchronization and scheduling capabilities that can enable deterministic low latency. However, WTSN relies on precise advance knowledge of packet arrival times and tight integration between applications and a centralized network controller, limiting its scope to niche settings. Resolving WTSN's dependence on knowledge of packet arrival times is key to determining whether it can be a low latency enabler in general-purpose Wi-Fi. Thus, in this work, we ask: are the stringent assumptions of WTSN necessary to achieve the low latency benefits? Contrary to prevailing assumptions, we find that it is indeed possible to enable low tail and mean latency without prior knowledge of precise packet arrival even in the presence of high throughput background flows. We demonstrate this in simulation using a WTSN-enabled multipath design that partitions the network into two logical paths: one with very low latency and high reliability, and another offering high throughput at the expense of latency and reliability. Further, we describe how our design and WTSN can both complement the powerful OFDMA capabilities of Wi-Fi and present initial results for the same. We conclude by discussing deployability and promising future directions.

  • RemoteVIO: Offloading Head Tracking in an End-to-End XR System

    2025-03-26 · 8 citations

    article · Open access · Senior author

    Power consumption, and the resulting limitation to computational load, is a first-order constraint in designing comfortable all-day-wear extended reality (XR) devices that can provide rich immersive experiences. This paper concerns reducing XR device power consumption by offloading head tracking, one of the top CPU and power consumers, to a remote server. We present RemoteVIO, the first open-source end-to-end XR system that offloads head tracking (visual inertial odometry or VIO) to a remote server. Our work distinguishes itself from past studies on computation offloading in XR by properly addressing two under-explored but critical aspects: 1) a comprehensive evaluation of user experience in a complete end-to-end XR system and 2) a quantification of the net power savings on real hardware.

  • XRgo: Design and Evaluation of Rendering Offload for Low-Power Extended Reality Devices

    2025-03-26 · 4 citations

    article · Open access · Senior author

    Extended reality (XR) devices must render high-quality 3D graphics at low latency to deliver truly immersive experiences. However, XR devices are severely power- and resource-constrained, limiting the quality of on-device (local) rendering. Offloading rendering to a powerful remote machine can enhance graphics quality, but network latency can degrade the overall experience. To mask latency, XR systems reproject the rendered frame to compensate for user motion since the rendered pose. Traditional reprojection, known as TimeWarp, uses a lightweight mechanism to compensate for latency in rotational motion, but not translational motion. Compensating for translational motion is more expensive, but is increasingly important at higher latencies.

  • FastFlip: Compositional Error Injection Analysis

    arXiv (Cornell University) · 2024-03-20

    preprint · Open access

    Instruction-level error injection analyses aim to find instructions where errors often lead to unacceptable outcomes like Silent Data Corruptions (SDCs). These analyses require significant time, which is especially problematic if developers wish to regularly analyze software that evolves over time. We present FastFlip, a combination of empirical error injection and symbolic SDC propagation analyses that enables fast, compositional error injection analysis of evolving programs. FastFlip calculates how SDCs propagate across program sections and correctly accounts for unexpected side effects that can occur due to errors. Using FastFlip, we analyze five benchmarks, plus two modified versions of each benchmark. FastFlip speeds up the analysis of incrementally modified programs by 3.2× (geomean). FastFlip selects a set of instructions to protect against SDCs that minimizes the runtime cost of protection while protecting against a developer-specified target fraction of all SDC-causing errors.

  • 14.5 A 12nm Linux-SMP-Capable RISC-V SoC with 14 Accelerator Types, Distributed Hardware Power Management and Flexible NoC-Based Data Orchestration

    2024-02-18 · 16 citations

    article

    Modern heterogeneous SoCs feature a mix of many hardware accelerators and general-purpose cores that run many applications in parallel. This brings challenges in managing how the accelerators access shared resources, e.g., the memory hierarchy, communication channels, and on-chip power. We address these challenges through flexible orchestration of data on a 74Tbps network-on-chip (NoC) for dynamic management of the resources under contention and a distributed hardware power management (DHPM) scheme. Developing and testing these ideas requires a comprehensive evaluation platform. Hence, we built an SoC that features 14 types of accelerators next to 4 RISC-V cores capable of running many simultaneous applications on top of a Linux-SMP operating system. Building such a platform was made possible in part by the reuse of open-source hardware (OSH) components [1]. However, even with a growing OSH community, the lack of available SoC designs keeps other researchers from performing evaluations of this kind; this is demonstrated by the unprecedented degree of heterogeneity and complexity of our chip compared to prior academic SoCs in the literature. To allow other academic and industrial research teams to pursue SoC design innovations without having to reinvent the wheel, we plan to publicly release the synthesizable design of our SoC with its software stack.

Frequent coauthors

  • Jean-Yves L’Excellent

    128 shared
  • Iain Duff

    Centre Européen de Recherche et de Formation Avancée en Calcul Scientifique

    124 shared
  • Bora Uçar

    120 shared
  • David Padua

    118 shared
  • Alfredo Buttari

    Institut de Recherche en Informatique de Toulouse

    116 shared
  • Abdou Guermouche

    Numerical Algorithms Group (United Kingdom)

    70 shared
  • Patrick Amestoy

    66 shared
  • Christian Lengauer

    65 shared

Labs

  • Siebel School of Computing and Data Science · PI

Education

  • Ph.D., Computer Science

    Massachusetts Institute of Technology

    1990
  • M.S., Electrical Engineering and Computer Science

    Massachusetts Institute of Technology

    1985
  • B.S., Electrical Engineering

    University of Bombay

    1982

Awards & honors

  • Best Paper Award at ISMAR'25
  • IEEE Transactions on Visualization and Computer Graphics pub…
  • Honored for valuable contributions to the field of high-perf…