Todd Austin
· S. Jack Hu Collegiate Professor of Computer Science and Engineering · Professor, EECS – Computer Science and Engineering · Professor (courtesy), EECS – Electrical and Computer Engineering · Director, Computer Engineering Lab · University of Michigan · Computer Science and Engineering
Active 1990–2025
About
Todd Austin is a professor of electrical engineering and computer science at the University of Michigan in Ann Arbor. His research interests include computer architecture, robust and secure system design, hardware and software verification, and performance analysis tools and techniques. He is currently the director of the Center for Future Architectures Research (C-FAR), a multi-university SRC/DARPA-funded center that seeks to develop technologies to scale the performance and efficiency of future computing systems. Prior to his academic career, Professor Austin was a Senior Computer Architect at Intel's Microcomputer Research Labs in Hillsboro, Oregon. He is credited with creating the SimpleScalar Tool Set, a widely used collection of computer architecture performance analysis tools. He co-authored the undergraduate textbook 'Structured Computer Organization' with Andrew Tanenbaum. Additionally, he is the co-founder of SimpleScalar LLC and InTempo Design LLC. His notable recognitions include a Sloan Research Fellowship in 2002 and the ACM Maurice Wilkes Award in 2007 for his innovative contributions to computer architecture, including the SimpleScalar Tool Set and the DIVA and Razor architectures.
Research topics
- Computer Science
- Computer Security
- Computer hardware
- Operating system
- Embedded system
- Computer engineering
- Programming language
- Distributed computing
- Computer architecture
- Algorithm
- Parallel computing
Selected publications
ZKProphet: Understanding Performance of Zero-Knowledge Proofs on GPUs
2025-10-12 · 1 citation
article · Senior author
Zero-Knowledge Proofs (ZKP) are protocols which construct cryptographic proofs to demonstrate knowledge of a secret input in a computation without revealing any information about the secret. ZKPs enable novel applications in private and verifiable computing such as anonymized cryptocurrencies and blockchain scaling and have seen adoption in several real-world systems. Prior work has accelerated ZKPs on GPUs by leveraging the inherent parallelism in core computation kernels like Multi-Scalar Multiplication (MSM). However, we find that a systematic characterization of execution bottlenecks in ZKPs, as well as their scalability on modern GPU architectures, is missing in the literature. This paper presents ZKProphet, a comprehensive performance study of Zero-Knowledge Proofs on GPUs. Following massive speedups of MSM, we find that ZKPs are bottlenecked by kernels like Number-Theoretic Transform (NTT), as they account for up to 90% of the proof generation latency on GPUs when paired with optimized MSM implementations. Available NTT implementations under-utilize GPU compute resources and often do not employ architectural features like asynchronous compute and memory operations. We observe that the arithmetic operations underlying ZKPs execute exclusively on the GPU’s 32-bit integer pipeline and exhibit limited instruction-level parallelism due to data dependencies. Their performance is thus limited by the available integer compute units. While one way to scale the performance of ZKPs is adding more compute units, we discuss how runtime parameter tuning for optimizations like precomputed inputs and alternative data representations can extract additional speedup. With this work, we provide the ZKP community a roadmap to scale performance on GPUs and construct definitive GPU-accelerated ZKPs for their application requirements and available hardware resources.
DOME: Automated Validation of Data-Oblivious Program Execution
2025-05-05
article · Senior author
Modern processors employ various micro-architectural optimizations to enhance application performance. While these optimizations significantly improve efficiency, they also introduce micro-architectural side channels that can leak sensitive information. Over the years, numerous hardware and software defenses have been developed to mitigate these vulnerabilities, including data-oblivious programming, randomized caches, and security domain isolation. Systems often combine these techniques to achieve robust security by eliminating observable secret-dependent behavior, a property known as data-obliviousness. However, verifying the effectiveness of these mitigation techniques in protecting security-critical applications, such as cryptographic libraries, remains a significant challenge. In this work, we introduce DOME, a security testing framework designed to detect secret-dependent behavior that could potentially leak sensitive information. DOME is both micro-architecture and software-agnostic, requiring only the ability to manipulate secret values for testing. DOME systematically refines randomly generated secret inputs by analyzing their corresponding PMU (Performance Monitoring Unit) events generated during execution on the system under test. Using unsupervised machine learning algorithms, it identifies pairs of differentiating inputs that produce distinct PMU-based execution traces, revealing secret-dependent behavior and violations of data-obliviousness. If no such inputs are found, DOME concludes that it cannot detect evidence of non-data-oblivious behavior, providing strong confidence in the deployed defenses. To validate DOME, we evaluated it on publicly available cryptographic libraries and data-oblivious benchmarks. Our results demonstrate that DOME is low-effort yet highly effective at identifying non-data-oblivious behaviors. It confirmed two previously discovered vulnerabilities and uncovered four new vulnerabilities in the latest version of Libgcrypt, impacting three cryptographic algorithms (RSA, DSA, and ElGamal), all of which we have responsibly disclosed to the developers. DOME offers a practical framework for assessing and improving the security of sensitive applications, providing system designers with a robust method to verify the effectiveness of side-channel mitigation techniques.
Zipper: Latency-Tolerant Optimizations for High-Performance Buses
2025-01-20 · 1 citation
article · Senior author
As heterogeneous designs take over the world of hardware design, the data bus plays a vital role in interconnecting hosts and accelerators. While past works have emphasized increasing communication bandwidth for data-hungry workloads, this work focuses on optimizing latency for latency-sensitive acceleration applications. We first study the communication patterns of various accelerator workloads and demonstrate that several optimization opportunities exist to reduce the communication latency overhead. To help developers exploit these opportunities, we introduce Zipper, a protocol optimization layer that reduces communication costs by enabling device- and request-level parallelism and exploiting data locality for existing bus standards. We applied Zipper to two domains and implemented the end-to-end system on a heterogeneous hardware platform with an integrated FPGA. Our physical experiments show that Zipper provides 8x speedup for one accelerator with 4.3% logic overhead and 1.5x speedup for another with 0.9% logic overhead.
ZKProphet: Understanding Performance of Zero-Knowledge Proofs on GPUs
ArXiv.org · 2025-09-17
preprint · Open access · Senior author
EDL: Efficient Data-Oblivious Loops
2025-11-12
article · Senior author
Data-oblivious programs are essential solutions for maintaining privacy in computation, crucial amidst rising concerns regarding privacy breaches and data misuse. These programs safeguard sensitive information in today's data-driven landscape. However, their limited practicality due to massive performance overheads poses an obstacle to their widespread adoption. Resolving these inefficiencies not only ensures the viability of data-oblivious techniques but also fosters trust in privacy-preserving technologies. Previous work showed that one of the contributors to these significant overheads is loop conversion techniques. Principles of data-oblivious programming demand that decision-making should not depend on sensitive data in order to maintain obliviousness. Due to this requirement, data-oblivious programs are stripped of any heuristics that could terminate loops early, thereby forcing loops to always execute the worst-case number of times. In response to these challenges, this paper introduces an innovative approach, termed Efficient Data Oblivious Loop (EDL), aimed at reducing the impact of loops on data-oblivious program performance while maintaining complete obliviousness. EDL presents a technique that lowers the trip count of these loops by leveraging information derived from safely profiling the algorithms. The algorithm profiling is used to determine the most common trip count of a particular loop across many non-data-oblivious executions of the program; this information is in turn used to determine a new upper bound for the data-oblivious version of the program. This ensures most executions of the algorithm will produce correct results even with the new lowered trip count. For the minority of executions that fail to produce correct results because they need a higher trip count than the new bound, EDL can detect the inputs and prompt a rerun of these executions with a higher trip count. Experimental results demonstrate the effectiveness of this approach, with EDL yielding an average speedup of 2.0× and up to 4.5×. While this approach does not work for all algorithms, it resulted in a significant enhancement in the efficiency of many data-oblivious programs. This paper describes this technique in detail and provides a guideline to identify programs that could benefit from this optimization.
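The lowered-trip-count idea described in the EDL abstract can be illustrated with a minimal sketch. This is not the paper's implementation: Euclid's GCD is only a stand-in workload, the names `select`, `gcd_fixed_trips`, `edl_gcd`, and `WORST_CASE_TRIPS` are hypothetical, the bound of 64 is an assumed generous limit for 32-bit inputs, and Python itself gives no timing guarantees. The rerun check is also written as an ordinary branch for readability.

```python
WORST_CASE_TRIPS = 64  # assumed generous bound for 32-bit inputs (illustrative)


def select(cond, a, b):
    """Branch-free select: returns a when cond == 1, b when cond == 0."""
    return cond * a + (1 - cond) * b


def gcd_fixed_trips(a, b, trips):
    """Run Euclid's algorithm for exactly `trips` iterations.

    Returns (value, done), where done == 1 if the loop had converged.
    Once b reaches 0 the remaining iterations leave the state unchanged.
    """
    for _ in range(trips):
        active = int(b != 0)            # 1 while work remains
        safe_b = select(active, b, 1)   # avoid dividing by zero when idle
        r = a % safe_b
        a, b = select(active, b, a), select(active, r, b)
    return a, int(b == 0)


def edl_gcd(a, b, lowered_trips, worst_trips=WORST_CASE_TRIPS):
    """Try the profiled (lowered) trip count first; rerun at the worst-case
    bound only when the lowered count was not enough."""
    value, done = gcd_fixed_trips(a, b, lowered_trips)
    if not done:
        value, done = gcd_fixed_trips(a, b, worst_trips)
    return value
```

For example, `edl_gcd(48, 18, lowered_trips=3)` finishes within the lowered bound, while `edl_gcd(48, 18, lowered_trips=2)` falls back to the worst-case run; both return the same value.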
Octal: Efficient Automatic Data-Oblivious Program Transformations to Eliminate Side-Channel Leakage
2024-10-07
article · Senior author
During the 1970s, a curious class of programs called data-oblivious algorithms started to catch the attention of researchers because of the numerous applications they enabled due to their unique properties. In particular, data-oblivious algorithms execute independently from their input data. In the context of secure applications, data-oblivious algorithms prevent an attacker from learning information about the data an algorithm is processing by observing that algorithm’s execution. However, programmers often avoid these algorithms because they require a highly stylized form of programming, resulting in a potentially error-prone design and implementation process. In addition, data-oblivious programs are often less efficient than their native counterparts due to their inability to employ data-dependent heuristics. To address these potential problems and facilitate the adoption of data-oblivious programming, we present Octal, a tool that automates the design and implementation of data-oblivious programs. Octal streamlines the development of data-oblivious algorithms by automating data-oblivious transformations into the compiler. Moreover, Octal facilitates the development of efficient data-oblivious algorithms using a guided transformation mechanism that effectively navigates the algorithm design space. We evaluate Octal’s ability to transform native workloads from the VIP-Bench benchmark suite and show that Octal reduces the lines of programmer-written code by an average of 19.5%, compared to manual conversion. Using case studies, we demonstrate how Octal’s guided transformations can optimize an inefficient data-oblivious algorithm. Further, we use Channelizer, a side-channel validation tool, to show that Octal-generated code contains no program-level side channels. By automating data-oblivious transformation and providing guidance on program performance, Octal can aid programmers in developing more secure and efficient programs.
Channelizer: Explainable ML Inference for Validating Side-Channel Resistant Systems
2024-05-16 · 1 citation
article · Senior author
Security validation is an essential step in the design of secure systems. One important security goal is achieving freedom from digital side channels. Eliminating control, memory, and micro-architectural side channels is a vital task since these vulnerabilities have been utilized to breach many secure systems, including Intel SGX, AMD SEV, and ARM TrustZone. To address side channels, privacy-enhanced systems deploy defenses that mitigate system behaviors that could inadvertently disclose sensitive data. These defenses include the utilization of data-oblivious algorithms and encrypted computation. To help system designers establish confidence that their privacy-enhanced designs are free of digital side channels, we have developed an open-source tool called Channelizer. Channelizer executes simple binary decision programs that are designed to excite digital side channels. As these programs run on varied inputs, numerous hardware performance monitoring unit (PMU) events are collected and fed to an ML-based inference framework. Using cross-validation, a subset of binary decision program executions is used to train a multi-model ML framework. If the fully trained ML model is incapable of successfully predicting the binary decision outcome of other executions by only looking at system PMU events, we have generated a strong level of confidence that the system is eliminating digital side channels. A key benefit of our ML-based inference framework, compared to past work, is that Channelizer is both agnostic to the design it is analyzing and the side channels it is trying to discover. In our analyses, we show systems that attempt to eliminate side channels and fail, systems that don't even try and fail spectacularly, and systems that are successful at avoiding side-channel inference. We successfully fix compiler and language level problems we encountered, but we were unable to fix a disclosed CPU microarchitectural side channel that was detected. Additionally, we provide an ML explainability framework that helps designers discern what aspect of their design is leaking sensitive information.
Computer system with moving target defenses against vulnerability attacks
OSTI OAI (U.S. Department of Energy Office of Scientific and Technical Information) · 2024-04-19
article · Open access · 1st author · Corresponding
A computer system includes an ensemble moving target defense architecture that protects the computer system against attack using one or more composable protection layers that change each churn cycle, thereby requiring an attacker to acquire information needed for an attack (e.g., code and pointers) and successfully deploy the attack, before the layers have changed state. Each layer may deploy a respective attack information asset protection providing multiple respective attack protections each churn cycle, wherein the respective attack information asset protections may differ.
2024-09-29
article
Embedded systems are growing in complexity, which exposes them to multiple threats. The co-design and execution of software on embedded systems further enlarges the attack surface, making them more vulnerable to sophisticated attacks. As embedded systems are used in critical areas, ensuring their security is crucial. This special session paper discusses four major topics in embedded system security. First, it explores timing channel analysis at the microarchitectural level in heterogeneous hardware. It then delves into software-based fuzzing techniques to detect vulnerabilities and enhance embedded system security. Additionally, the paper discusses strategies for improving security in IoT devices with a layered defense strategy known as Snowflake IoT. Finally, it examines approaches to securing large and complex monolithic systems. The paper also outlines the challenges and opportunities for securing embedded systems according to the scale and type of attacks.
Exploring the Efficiency of Data-Oblivious Programs
2023-04-01 · 3 citations
article · Senior author
Data-oblivious programs have gained popularity due to their application in security, but are often dismissed because of anticipated performance loss. In order to better understand these performance concerns, this paper details the first performance characterization of data-oblivious programs. We study mechanical data-oblivious transformations applied to twenty workloads from the VIP-Bench benchmark suite and find that, overall, performance overheads vary widely, with a geomean slowdown of 7.4×. This variance can be attributed to whether or not the data-oblivious transformations affect the workload’s asymptotic complexity. Performance overheads are much lower for the fourteen workloads whose complexity is unaffected, at 1.9× geomean. Further, by reducing control hazards, we find that data-oblivious transformations often result in improved per-instruction performance (e.g., better branch and memory performance) and increase the number of instructions the processor can execute in parallel (e.g., IPC). Leveraging lessons from analyzing these overheads, we study four notably slow data-oblivious workloads and show how algorithmic changes can significantly improve performance, achieving an average 86.4× speedup over the mechanically produced baseline programs. While data-oblivious program execution often incurs overheads, the contributions of this paper show that these overheads can be overcome by compiler and algorithmic optimizations, bringing us closer to achieving efficient and widely-used data-oblivious programs.
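To make the point about asymptotic complexity concrete, here is a small sketch of one common data-oblivious transformation: replacing a secret-indexed array read with a scan over every element. It is an illustration only, not code from the paper or from VIP-Bench; the function names are hypothetical, and Python offers no constant-time guarantees, so only the structure of the transformation is shown.

```python
def native_read(table, index):
    """Ordinary O(1) read; the memory address touched depends on `index`."""
    return table[index]


def oblivious_read(table, secret_index):
    """O(n) read that touches every element regardless of `secret_index`.

    Each iteration blends the current element into the result with an
    arithmetic select instead of a data-dependent branch or address.
    """
    result = 0
    for i, value in enumerate(table):
        hit = int(i == secret_index)               # 1 only at the wanted slot
        result = hit * value + (1 - hit) * result  # branch-free select
    return result
```

Both functions return the same value for a valid index, but the oblivious version turns a constant-time lookup into a linear scan, which is the kind of complexity change the abstract associates with the larger slowdowns.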
Recent grants
CAREER: New Directions in Speculative Execution
NSF · $300k · 2001–2008
Collaborative Research: Application Specific Architecture Customization and Co-Exploration
NSF · $150k · 2003–2008
Frequent coauthors
- Valeria Bertacco · University of Michigan–Ann Arbor · 48 shared
- Serge Leef · Bellevue College · 16 shared
- Massimo Alioto · National University of Singapore · 16 shared
- Brad Calder · 15 shared
- Trevor Mudge · University of Michigan–Ann Arbor · 15 shared
- Salessawi Ferede Yitbarek · Addis Ababa University · 15 shared
- Kypros Constantinides · 14 shared
- Zelalem Birhanu Aweke · University of Michigan–Ann Arbor · 11 shared
Awards & honors
- Sloan Research Fellow (2002)
- ACM Maurice Wilkes Award (2007)