
About
Henry Hoffmann is the Liew Family Chair for the Department of Computer Science at the University of Chicago. He was granted early tenure in 2018 and received the Presidential Early Career Award for Scientists and Engineers (PECASE) in 2019. Hoffmann is a member of the ASPLOS Hall of Fame and leads the Self-aware computing group (SEEC project), which conducts research on adaptive techniques for power, energy, accuracy, and performance management in computing systems. His work focuses on building self-aware computing systems that understand high-level goals and automatically adapt their behavior to meet those goals optimally, integrating disciplines such as operating systems, computer architecture, control theory, and machine learning. Hoffmann has spent the last 18 years working on multicore architectures and system software in both academia and industry, including contributions to the Raw processor at MIT and work at Tilera Corporation, where his implementation of the BDTI Communications Benchmark on Tilera's 64-core TILE64 processor achieved the highest certified performance of any programmable processor. His research areas include self-aware and adaptive computing, computer systems, and computer architecture, with a focus on modern computer systems that must balance multiple, often competing, goals such as high performance and low energy consumption.
Research topics
- Computer Science
- Artificial Intelligence
- Machine Learning
- Computer Security
- Programming language
- Software engineering
- Real-time computing
- Data science
- Distributed computing
- Multimedia
- Simulation
- Human–computer interaction
- Operating system
- Computer network
Selected publications
WASL: Harmonizing Uncoordinated Adaptive Modules in Multi-Tenant Cloud Systems
2026-04-23
article · Open access · Senior author
Modern cloud applications increasingly rely on adaptive control modules, such as dynamic resource tuning or system reconfiguration, to meet strict quality-of-service (QoS) objectives. However, when multiple independently developed adaptation modules are colocated on a shared infrastructure, their uncoordinated behavior causes interference leading to QoS violations. Existing approaches require centralized control or inter-module communication, violating modularity and limiting adoption in multi-tenant environments.
Keeper: Automated Testing and Fixing of Machine Learning Software—RCR Report
ACM Transactions on Software Engineering and Methodology · 2025-06-05
article
This artifact aims to provide source code, benchmark suite, results, and materials used in our study “Keeper: Automated Testing and Fixing of Machine Learning Software” [3]. We developed an automated testing and fixing tool Keeper and its IDE plugin for ML software. It automatically detects software defects and attempts to change how ML APIs are used to alleviate software misbehavior. This artifact provides guidelines to set up and execute Keeper and also guidelines to interpret our evaluation results. We hope this artifact can motivate and help future research to further tackle ML API misuses. All related data are available online.
A Deep Probabilistic Framework for Continuous Time Dynamic Graph Generation
Proceedings of the AAAI Conference on Artificial Intelligence · 2025-04-11
article · Open access · Senior author
Recent advancements in graph representation learning have shifted attention towards dynamic graphs, which exhibit evolving topologies and features over time. The increased use of such graphs creates a paramount need for generative models suitable for applications such as data augmentation, obfuscation, and anomaly detection. However, there are few generative techniques that handle continuously changing temporal graph data; existing work largely relies on augmenting static graphs with additional temporal information to model dynamic interactions between nodes. In this work, we propose a fundamentally different approach: We instead directly model interactions as a joint probability of an edge forming between two nodes at a given time. This allows us to autoregressively generate new synthetic dynamic graphs in a largely assumption free, scalable, and inductive manner. We formalize this approach as DG-Gen, a generative framework for continuous time dynamic graphs, and demonstrate its effectiveness over five datasets. Our experiments demonstrate that DG-Gen not only generates higher fidelity graphs compared to traditional methods but also significantly advances link prediction tasks.
Quality Measures for Dynamic Graph Generative Models
ArXiv.org · 2025-03-03
preprint · Open access · Senior author
Deep generative models have recently achieved significant success in modeling graph data, including dynamic graphs, where topology and features evolve over time. However, unlike in vision and natural language domains, evaluating generative models for dynamic graphs is challenging due to the difficulty of visualizing their output, making quantitative metrics essential. In this work, we develop a new quality metric for evaluating generative models of dynamic graphs. Current metrics for dynamic graphs typically involve discretizing the continuous evolution of graphs into static snapshots and then applying conventional graph similarity measures. This approach has several limitations: (a) it models temporally related events as i.i.d. samples, failing to capture the non-uniform evolution of dynamic graphs; (b) it lacks a unified measure that is sensitive to both features and topology; (c) it fails to provide a scalar metric, requiring multiple metrics without clear superiority; and (d) it requires explicitly instantiating each static snapshot, leading to impractical runtime demands that hinder evaluation at scale. We propose a novel metric based on the Johnson-Lindenstrauss lemma, applying random projections directly to dynamic graph data. This results in an expressive, scalar, and application-agnostic measure of dynamic graph similarity that overcomes the limitations of traditional methods. We also provide a comprehensive empirical evaluation of metrics for continuous-time dynamic graphs, demonstrating the effectiveness of our approach compared to existing methods. Our implementation is available at https://github.com/ryienh/jl-metric.
2025-10-28
article
This article presents a highly compact Coarse-Grained Reconfigurable Array (CGRA) specialized for processing Digital Signal Processing (DSP) and Machine Learning (ML) operations with an outstanding micro-architectural efficiency. The CGRA consists of high functionality Processing Elements (PEs) supported by strategically placed interconnections and bidirectional data buffers made of programmable cyclic registers. These novel features accelerate large length correlations, Fast Fourier Transforms and other DSP/ML related functions. It is a resource compact CGRA with very small dimensions, i.e., 4 × 4 PEs and synthesized using a 22nm CMOS technology. The design of CGRA has an AMBA interface making it an industry standard coprocessor for a system-on-chip. The novelty presented in this paper is an accepted United States patent.
Lupe: Integrating the Top-down Approach with DNN Execution on Ultra-Low-Power Devices
2025-05-04 · 1 citation
article · Open access · Senior author
Executing deep neural networks (DNNs) on ultra-low-power (ULP) microcontrollers creates enormous opportunities for new intelligent edge applications. However, manually writing optimized DNN programs for ULP devices is time consuming and error prone due to the difficulty of managing on-device accelerators. Many prior works address this problem by creating special libraries that tailor common DNN building blocks for unique accelerators of ULP devices. This is a bottom-up approach, as developers build DNNs by assembling library calls. Unfortunately, the encapsulation overhead inherent in this approach greatly reduces accelerator utilization and overall performance. Instead, we advocate for a top-down approach. We present Lupe, a code generation framework, that converts high-level DNN algorithm descriptions to ULP-optimized code. Lupe provides top-down intermittent support that significantly reduces overhead while maintaining intermittent safety. We demonstrate Lupe's benefits on an MSP430 [54], achieving 12.36× and 2.22× average speedup over two prior works across a variety of DNN models in continuous power. Moreover, Lupe reduces the average intermittent runtime costs of prior works by 96.65% and 71.15%, respectively.
Position Paper: Voltage Change is not Energy Consumption
2025-05-06
article · Open access · Senior author
Energy-harvesting sensors utilize local, ambient energy resources to operate and thus eliminate the need for batteries. A key challenge for such systems is avoiding power failures during application execution. Energy-aware runtimes avoid such failures by reasoning about the task's energy consumption and the current energy available to the system. However, the energy consumption estimates profiled by prior work fail to account for incoming energy, producing incorrect energy consumption estimates which could lead to power failures and missed deadlines. This work analyzes the impact of incoming energy on the profiled energy consumption and argues that future energy-aware runtimes must be mindful of harvested energy when profiling a task's energy consumption.
SwiftSpec: Ultra-Low Latency LLM Decoding by Scaling Asynchronous Speculative Decoding
ArXiv.org · 2025-06-12
preprint · Open access
Low-latency decoding for large language models (LLMs) is crucial for applications like chatbots and code assistants, yet generating long outputs remains slow in single-query settings. Prior work on speculative decoding (which combines a small draft model with a larger target model) and tensor parallelism has each accelerated decoding. However, conventional approaches fail to apply both simultaneously due to imbalanced compute requirements (between draft and target models), KV-cache inconsistencies, and communication overheads under small-batch tensor-parallelism. This paper introduces SwiftSpec, a system that targets ultra-low latency for LLM decoding. SwiftSpec redesigns the speculative decoding pipeline in an asynchronous and disaggregated manner, so that each component can be scaled flexibly and remove draft overhead from the critical path. To realize this design, SwiftSpec proposes parallel tree generation, tree-aware KV cache management, and fused, latency-optimized kernels to overcome the challenges listed above. Across 5 model families and 6 datasets, SwiftSpec achieves an average of 1.75x speedup over state-of-the-art speculative decoding systems and, as a highlight, serves Llama3-70B at 348 tokens/s on 8 Nvidia Hopper GPUs, making it the fastest known system for low-latency LLM serving at this scale.
WatchHAR: Real-time On-device Human Activity Recognition System for Smartwatches
2025-10-11 · 1 citation
preprint · Open access
Despite advances in practical and multimodal fine-grained Human Activity Recognition (HAR), a system that runs entirely on smartwatches in unconstrained environments remains elusive. We present WatchHAR, an audio and inertial-based HAR system that operates fully on smartwatches, addressing privacy and latency issues associated with external data processing. By optimizing each component of the pipeline, WatchHAR achieves compounding performance gains. We introduce a novel architecture that unifies sensor data preprocessing and inference into an end-to-end trainable module, achieving 5x faster processing while maintaining over 90% accuracy across more than 25 activity classes. WatchHAR outperforms state-of-the-art models for event detection and activity classification while running directly on the smartwatch, achieving 9.3 ms processing time for activity event detection and 11.8 ms for multimodal activity classification. This research advances on-device activity recognition, realizing smartwatches' potential as standalone, privacy-aware, and minimally-invasive continuous activity tracking devices.
KVMSR+UDWeave: Extreme-Scaling with Fine-grained Parallelism on the UpDown Graph Supercomputer
2025-11-07 · 1 citation
article
Programming irregular graph applications is challenging on today’s scalable supercomputers. We describe a novel programming model, KVMSR+UDWeave, that supports extreme scaling by exposing fine-grained parallelism. By enabling the expression of maximum parallelism, it opens the door for extreme scaling, even on both small and large graph problems.
Recent grants
XPS: FULL: CCA: Collaborative Research: CASH: Cost-aware Adaptation of Software and Hardware
NSF · $300k · 2014–2019
CSR: Medium: Understanding and Automatically Adjusting Performance Sensitive Software Configurations
NSF · $1.1M · 2018–2023
NSF · $444k · 2018–2024
Frequent coauthors
- 42 shared
Anant Agarwal
The Ohio State University
- 35 shared
Martina Maggio
Robert Bosch (Germany)
- 22 shared
Marco D. Santambrogio
Politecnico di Milano
- 21 shared
Nikita Mishra
Manipal University Jaipur
- 21 shared
Shan Lu
Microsoft (United States)
- 19 shared
Connor Imes
- 17 shared
Huazhe Zhang
Zhejiang University
- 15 shared
John Lafferty
Yale University
Education
- 2011
Ph.D., Electrical Engineering and Computer Science
Massachusetts Institute of Technology (MIT)
Other
Massachusetts Institute of Technology (MIT)
Awards & honors
- Presidential Early Career Award for Scientists and Engineers…
- ASPLOS Hall of Fame
- DOE Early Career Award (2015)
- Most Influential Paper Award, SEAMS (2025)
- Test of Time Award Honorable Mention, IEEE Micro (2021)