Sasa Misailovic

Associate Professor

University of Illinois Urbana-Champaign · Computer Science

Active 2007–2026

h-index: 28

Citations: 3.5k

Papers: 117 (48 in the last 5 years)

Funding: $1.5M (1 active grant)



About

Sasa Misailovic is an Associate Professor in the Siebel School of Computing and Data Science at the University of Illinois Urbana-Champaign. His research areas include compilers, architecture, and parallel computing; programming languages; formal methods; software engineering; and security and privacy. He teaches courses on programming languages, compiler construction, formal methods for software development, and advanced topics in programming systems. He has conducted research at the intersection of software engineering and control theory, and his work has been recognized with awards such as a 10-year Most Influential Paper award at SEAMS 2025. His recent projects involve automating decision-making under uncertainty, studying test code development in NLP software, and addressing issues in Unicode subword tokenization.

Research topics

Computer Science
Artificial Intelligence
Machine Learning
Programming language
Operating system
Computer architecture
Computer engineering
Embedded system
Distributed computing
Software engineering
Engineering
Theoretical computer science

Selected publications

Nautilus: An Auto-Scheduling Tensor Compiler for Efficient Tiled GPU Kernels
arXiv (Cornell University) · 2026-04-16
Preprint · Open access · Senior author
We present Nautilus, a novel tensor compiler that moves toward fully automated math-to-kernel optimization. Nautilus compiles a high-level algebraic specification of tensor operators into efficient tiled GPU kernels. Nautilus's successive lowering design allows high-level optimizations, expression rewrites, and tile optimizations to be jointly applied in a single end-to-end system. Nautilus presents a novel auto-scheduler that discovers sequences of high-level optimizations, while preserving the regular program structure needed by tile optimizers. Nautilus's auto-scheduler captures complex interactions and trade-offs in the high-level optimizations, including aggressive global transformations like advanced reduction fusion. Nautilus is the first end-to-end tensor compiler capable of starting from a math-like description of attention and automatically discovering FlashAttention-3-like kernels, offloading the entire burden of optimization from the programmer to the compiler. Across five transformer-based models and 150 evaluation configurations on NVIDIA GH200 and RTX 5090 GPUs, Nautilus achieves up to 23% higher throughput than state-of-the-art compilers on GH200 and up to 42% on RTX 5090, while matching or exceeding manually written cuDNN kernels on many long-sequence configurations.
RefineStat: Efficient Exploration for Probabilistic Program Synthesis
ArXiv.org · 2025-09-01
Preprint · Open access · Senior author
Probabilistic programming offers a powerful framework for modeling uncertainty, yet statistical model discovery in this domain entails navigating an immense search space under strict domain-specific constraints. When small language models are tasked with generating probabilistic programs, they frequently produce outputs with both syntactic and semantic errors, such as flawed inference constructs. Motivated by probabilistic programmers' domain expertise and debugging strategies, we introduce RefineStat, a language-model-driven framework that enforces semantic constraints ensuring that synthesized programs contain valid distributions and well-formed parameters, and then applies diagnostic-aware refinement by resampling prior or likelihood components whenever reliability checks fail. We evaluate RefineStat on multiple probabilistic-programming code-generation tasks using small language models (SLMs) and find that it produces programs that are both syntactically sound and statistically reliable, often matching or surpassing those from closed-source large language models (e.g., OpenAI o3).
AURA: Precise Abstract Interpretation of Probabilistic Programs with Interval Data Uncertainty
Lecture Notes in Computer Science · 2025-10-06
Book chapter · Senior author
Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs
ArXiv.org · 2025-10-09
Preprint · Open access · Senior author
Operator fusion, which combines multiple deep learning operators to improve data reuse and reduce global memory transfers, has become a key optimization for deep learning. However, existing tensor compilers struggle to fuse complex reduction computations involving loop-carried dependencies, such as attention mechanisms. This paper introduces Neptune, a tensor compiler for advanced fusion of sequences of reduction operators. Neptune's approach intentionally breaks some existing dependencies and compensates by constructing algebraic correction expressions that allow the kernel to produce the correct result. Applying Neptune's advanced operator fusion to a plain attention operator generates operators equivalent to FlashAttention and FlashDecoding. On ten attention-based benchmarks, Neptune, starting from plain attention code and a high-level scheduling template, outperforms existing compilers such as Triton, TVM, and FlexAttention, including Triton-based implementations of FlashAttention. Across four GPU architectures from NVIDIA and AMD, Neptune-generated kernels achieve an average speedup of 1.35× over the next-best alternative, with speedups of up to 2.65× on NVIDIA GPUs and up to 3.32× on AMD GPUs, demonstrating its effectiveness for deep learning workloads.
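The idea of breaking a dependency and compensating with an algebraic correction can be illustrated on a single attention query. The sketch below is not Neptune or its generated code; it is the well-known online-softmax rescaling used by FlashAttention-style kernels, written in plain Python with hypothetical function names (`two_pass_attention`, `fused_attention`). The fused loop drops the dependency on the global maximum and instead multiplies earlier partial results by `exp(m_old - m_new)` whenever the running maximum grows.

```python
import math
from typing import List, Sequence


def two_pass_attention(scores: Sequence[float], values: Sequence[Sequence[float]]) -> List[float]:
    """Reference: softmax(scores) @ values, needing the global max before any exp()."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def fused_attention(scores: Sequence[float], values: Sequence[Sequence[float]]) -> List[float]:
    """Single pass: the dependency on the global max is broken and corrected on the fly."""
    dim = len(values[0])
    m = -math.inf        # running maximum of the scores seen so far
    l = 0.0              # running softmax normalizer, relative to m
    acc = [0.0] * dim    # running weighted sum of values, relative to m
    for s, v in zip(scores, values):
        m_new = max(m, s)
        c = math.exp(m - m_new)  # correction factor for partial results computed under the old max
        p = math.exp(s - m_new)
        l = l * c + p
        acc = [a * c + p * vd for a, vd in zip(acc, v)]
        m = m_new
    return [a / l for a in acc]
```

Both functions compute the same result up to floating-point rounding; the fused version simply never needs a separate pass to find the maximum.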
Safety and Trust in Artificial Intelligence with Abstract Interpretation
Foundations and Trends® in Programming Languages · 2025-06-26 · 3 citations
Article · Open access
Deep neural networks (DNNs) now dominate the AI landscape and have shown impressive performance in diverse application domains, including vision, natural language processing (NLP), and healthcare. However, both public and private entities have increasingly expressed significant concern about the potential of state-of-the-art AI models to cause societal and financial harm. This lack of trust arises from their black-box construction and their vulnerability to natural and adversarial noise. As a result, researchers have spent considerable effort developing automated methods for building safe and trustworthy DNNs. Among the various approaches, abstract interpretation has emerged as the most popular framework for efficiently analyzing realistic DNNs. However, because the computational structure of DNNs (e.g., their high nonlinearity) differs fundamentally from that of traditional programs, developing efficient DNN analyzers has required tackling research challenges significantly different from those encountered for programs. In this monograph, we describe state-of-the-art approaches based on abstract interpretation for analyzing DNNs. These approaches include the design of new abstract domains, synthesis of novel abstract transformers, abstraction refinement, and incremental analysis. We discuss how the analysis results can be used to: (i) formally check whether a trained DNN satisfies desired output and gradient-based safety properties, (ii) guide model updates during training toward satisfying safety properties, and (iii) reliably explain and interpret the black-box workings of DNNs.
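To make the abstract-domain vocabulary concrete, here is a minimal sketch of the interval (Box) domain, one of the simplest abstract domains used for DNNs, propagating input bounds through one affine layer and a ReLU. It is a generic textbook construction, not code from the monograph; the function names and the toy weights are illustrative assumptions.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower, upper) bound for one neuron


def affine_bounds(bounds: List[Interval], weights: List[List[float]], bias: List[float]) -> List[Interval]:
    """Sound interval bounds for y = W x + b, given an interval for each input x_i."""
    out: List[Interval] = []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, (l, u) in zip(row, bounds):
            # A non-negative weight maps lower to lower; a negative weight swaps the ends.
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                lo += w * u
                hi += w * l
        out.append((lo, hi))
    return out


def relu_bounds(bounds: List[Interval]) -> List[Interval]:
    """ReLU is monotone, so applying it to both interval ends is exact."""
    return [(max(0.0, l), max(0.0, u)) for l, u in bounds]


# Toy 2-input, 2-neuron layer with x0 in [-1, 1] and x1 in [0, 2].
x = [(-1.0, 1.0), (0.0, 2.0)]
y = relu_bounds(affine_bounds(x, [[1.0, -1.0], [2.0, 1.0]], [0.0, -1.0]))
# y == [(0.0, 1.0), (0.0, 3.0)]
```

Because the upper bound of the first output is 1.0, a property such as "the first output never exceeds 1.5 for any input in the box" is verified without enumerating inputs; the price of this domain's simplicity is that its bounds can be loose, which motivates the richer domains the monograph describes.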
FastFlip: Compositional SDC Resiliency Analysis
2025-02-22 · 2 citations
Article · Open access · Senior author
To efficiently harden programs susceptible to Silent Data Corruptions (SDCs), developers need to invoke error injection analyses to find particularly vulnerable instructions and then selectively protect them using appropriate compiler-level SDC detection mechanisms. However, these error injection analyses are both expensive and monolithic: they must be run from scratch after even small changes to the code, such as optimizations or bug fixes. This high recurring cost keeps such software-directed resiliency analyses out of standard software engineering practices such as regression testing. We present FastFlip, the first approach tailored to seamlessly incorporate resiliency analysis within the iterative software development workflow. FastFlip combines empirical error injection and symbolic SDC propagation analyses to enable fast and compositional error injection analysis of evolving programs. When developers modify a program, FastFlip often has to re-analyze only the modified program sections, which can save a significant amount of analysis time. We evaluated FastFlip with five benchmark programs. In our experiments, for each benchmark, we analyzed the original version plus two modified versions. The compositional nature of FastFlip speeds up the analysis of the incrementally modified versions by 3.2× (geomean) and up to 17.2×. The results demonstrate that FastFlip can effectively select a set of instructions to protect against SDCs that minimizes the runtime protection cost while protecting against a developer-specified target fraction of all tested SDC-causing errors.
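The incremental part of this workflow, re-analyzing only the modified program sections, can be sketched as caching per-section results keyed by a content hash. This is a generic memoization sketch under stated assumptions: `analyze_section` is a hypothetical stand-in for an expensive error-injection campaign on one section, and the sketch omits FastFlip's symbolic SDC propagation, which composes per-section results into a whole-program result.

```python
import hashlib
from typing import Callable, Dict, List


class IncrementalAnalyzer:
    """Caches per-section analysis results keyed by a hash of each section's source."""

    def __init__(self, analyze_section: Callable[[str], float]) -> None:
        # analyze_section is a hypothetical placeholder for a costly per-section analysis.
        self._analyze = analyze_section
        self._cache: Dict[str, float] = {}
        self.runs = 0  # number of section analyses actually performed (cache misses)

    def analyze(self, sections: List[str]) -> List[float]:
        results: List[float] = []
        for src in sections:
            key = hashlib.sha256(src.encode("utf-8")).hexdigest()
            if key not in self._cache:
                # Only new or modified sections reach the expensive analysis.
                self._cache[key] = self._analyze(src)
                self.runs += 1
            results.append(self._cache[key])
        return results
```

Analyzing three sections and then a version in which only one section changed performs four section analyses in total rather than six; unchanged sections are served from the cache.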
Enforcing Temporal Constraints for LLM Agents
ArXiv.org · 2025-12-25
Article · Open access · Senior author
LLM-based agents are deployed in safety-critical applications, yet current guardrail systems fail to prevent violations of temporal safety policies, i.e., requirements that govern the ordering and sequencing of agent actions. For instance, agents may access sensitive data before authenticating users or process refunds to unauthorized payment methods; such violations require reasoning about sequences of actions rather than individual actions. Existing guardrails rely on imprecise natural language instructions or post-hoc monitoring and provide no formal guarantees that agents will satisfy temporal constraints. We present Agent-C, a novel framework that provides run-time guarantees that LLM agents adhere to formal temporal safety properties. Agent-C introduces a domain-specific language for expressing temporal properties (e.g., authenticate before accessing data), translates specifications to first-order logic, and uses SMT solving to detect non-compliant agent actions during token generation. When the LLM attempts to generate a non-compliant tool call, Agent-C leverages constrained generation techniques to ensure that every action generated by the LLM complies with the specification and to generate a compliant alternative to the non-compliant action. We evaluate Agent-C on two real-world applications (retail customer service and an airline ticket reservation system) and multiple open- and closed-source language models. Our results demonstrate that Agent-C achieves perfect safety (100% conformance, 0% harm) while improving task utility compared to state-of-the-art guardrails and unrestricted agents. On state-of-the-art closed-source models, Agent-C improves conformance (from 77.4% to 100% for Claude Sonnet 4.5 and from 83.7% to 100% for GPT-5) while simultaneously increasing utility (from 71.8% to 75.2% and from 66.1% to 70.6%, respectively), representing a new state-of-the-art frontier for reliable agentic reasoning.
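A temporal property such as "authenticate before accessing data" can be illustrated with a small runtime monitor that vets each proposed tool call and falls back to the next compliant candidate. Agent-C itself uses a DSL, first-order logic, and SMT solving inside constrained token generation; the finite-state check below is a deliberately simpler substitute, and the class, function, and action names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PrecedenceMonitor:
    """Enforces: no action in `guarded` may occur before `required` has occurred."""
    required: str
    guarded: frozenset
    seen_required: bool = False

    def allows(self, action: str) -> bool:
        return self.seen_required or action not in self.guarded

    def step(self, action: str) -> None:
        # Record an action that is actually executed; reject violations outright.
        if not self.allows(action):
            raise ValueError(f"{action!r} is not allowed before {self.required!r}")
        if action == self.required:
            self.seen_required = True


def choose_action(monitor: PrecedenceMonitor, candidates: Iterable[str]) -> Optional[str]:
    """Return the first compliant candidate (e.g., in model-preference order), or None."""
    for action in candidates:
        if monitor.allows(action):
            monitor.step(action)
            return action
    return None
```

With `PrecedenceMonitor("authenticate", frozenset({"access_data", "process_refund"}))`, a preferred but premature `access_data` call is skipped in favor of `authenticate`, after which `access_data` becomes allowed.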
DINGO: Constrained Inference for Diffusion LLMs
ArXiv.org · 2025-05-29
Preprint · Open access
Diffusion LLMs have emerged as a promising alternative to conventional autoregressive LLMs, offering significant potential for improved runtime efficiency. However, existing diffusion models cannot provably enforce user-specified formal constraints, such as regular expressions, which makes them unreliable for tasks that require structured outputs, such as fixed-schema JSON generation. Unlike autoregressive models, which generate tokens sequentially, diffusion LLMs predict a block of tokens in parallel. This parallelism makes traditional constrained decoding algorithms, which are designed for sequential token prediction, ineffective at preserving the true output distribution. To address this limitation, we propose DINGO, a dynamic-programming-based constrained decoding strategy that is both efficient and provably distribution-preserving. DINGO enables sampling of output strings with the highest probability under the model's predicted distribution while strictly satisfying any user-specified regular expression. On standard symbolic math and JSON generation benchmarks, DINGO achieves up to a 68-percentage-point improvement over unconstrained inference.
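As a rough illustration of dynamic-programming constrained decoding over a block predicted in parallel, the sketch below assumes the model yields an independent token distribution for each block position and that the regular expression has already been compiled into a DFA (`delta`, `start`, `accepting`). It runs a Viterbi-style recurrence over (position, DFA state) pairs to find the most probable block the DFA accepts. This is a simplified stand-in for exposition, not DINGO's algorithm; all names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Set, Tuple


def best_constrained_block(
    probs: Sequence[Dict[str, float]],     # probs[i][token]: model probability at position i
    vocab: Sequence[str],
    delta: Dict[Tuple[int, str], int],     # DFA transitions; a missing key means "reject"
    start: int,
    accepting: Set[int],
) -> Tuple[Optional[List[str]], float]:
    """Most probable token block accepted by the DFA, or (None, 0.0) if none exists."""
    # best[q] = (log-probability, tokens) of the best prefix that ends in DFA state q
    best: Dict[int, Tuple[float, List[str]]] = {start: (0.0, [])}
    for dist in probs:
        nxt: Dict[int, Tuple[float, List[str]]] = {}
        for q, (lp, toks) in best.items():
            for tok in vocab:
                p = dist.get(tok, 0.0)
                q2 = delta.get((q, tok))
                if p <= 0.0 or q2 is None:
                    continue  # zero-probability token, or the DFA rejects it here
                cand = lp + math.log(p)
                if q2 not in nxt or cand > nxt[q2][0]:
                    nxt[q2] = (cand, toks + [tok])
        best = nxt
    finals = [(lp, toks) for q, (lp, toks) in best.items() if q in accepting]
    if not finals:
        return None, 0.0
    lp, toks = max(finals, key=lambda item: item[0])
    return toks, math.exp(lp)
```

For a toy DFA accepting strings over `{a, b}` with exactly one `b`, and per-position distributions `{a: 0.9, b: 0.1}`, `{a: 0.2, b: 0.8}`, `{a: 0.3, b: 0.7}`, the unconstrained per-position argmax is `abb`, which the DFA rejects; the constrained search returns `aba` with probability 0.9 × 0.8 × 0.3 = 0.216. Keeping only the best prefix per DFA state is sufficient because the remaining positions contribute the same log-probability regardless of how that state was reached.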

Recent grants

CAREER: Program Analyses for Improving Reliability of Probabilistic Software
NSF · $512k · 2019–2025
SHF: Medium: Collaborative Research: Testing in the Era of Approximation
NSF · $250k · 2017–2020
Collaborative Research: SHF: Medium: Natural Language Models with Execution Data for Software Testing
NSF · $300k · 2023–2027
SHF: Small: Probabilistic Programming and Statistical Verification for Safe Autonomy
NSF · $450k · 2020–2024

Frequent coauthors

Martin Rinard
Massachusetts Institute of Technology
33 shared
Michael Carbin
23 shared
Henry Hoffmann
University of Chicago
14 shared
Keyur Joshi
University of Illinois Urbana-Champaign
13 shared
Gagandeep Singh
University of Illinois Urbana-Champaign
12 shared
Shubham Ugare
11 shared
Stelios Sidiroglou
10 shared
Jacob Laurel
University of Illinois Urbana-Champaign
10 shared

Labs

Siebel School of Computing and Data Science (PI)
