Mahesh Viswanathan
· ProfessorVerifiedUniversity of Illinois Urbana-Champaign · Computer Science
Active 1975–2026
About
Mahesh Viswanathan is a professor at the Siebel School of Computing and Data Science at the University of Illinois Urbana-Champaign. He holds a Ph.D. in Computer and Information Science from the University of Pennsylvania, obtained in 2000. His research areas include Programming Languages, Formal Methods, Software Engineering, and Theory and Algorithms. He has taught a variety of courses related to computer science, such as Discrete Structures, Intro to Computer Systems, Algorithms, Logic in Computer Science, Formal Models of Computation, and Applied Machine Learning. His work focuses on advancing understanding in these fields, contributing to the development of computing education and research in formal methods and algorithms.
Research topics
- Computer Science
- Computer Security
- Data Mining
- Mathematics
- Statistics
- Theoretical computer science
- Algorithm
Selected publications
Efficient Dynamic Algorithms to Predict Short Races
ArXiv.org · 2026-03-03
articleOpen accessSenior authorWe introduce and study the problem of detecting short races in an observed trace. Specifically, for a race type $R$, given a trace $σ$ and window size $w$, the task is to determine whether there exists an $R$-race $(e_1, e_2)$ in $σ$ such that the subtrace starting with $e_1$ and ending with $e_2$ contains at most $w$ events. We present a monitoring framework for short-race prediction and instantiate the framework for happens-before and sync-preserving races, yielding efficient detection algorithms. Our happens-before algorithm runs in the same time as FastTrack but uses space that scales with $\log w$ as opposed to $\log |σ|$. For sync-preserving races, our algorithm runs faster and consumes significantly less space than SyncP. Our experiments validate the effectiveness of these short-race detection algorithms: they run more efficiently, use less memory, and detect significantly more races under the same budget, offering a reasonable balance between resource usage and predictive power.
arXiv (Cornell University) · 2026-05-12
preprintOpen accessSenior author[Abridged] Production LLM deployments receive feedback from a non-random fraction of users: thumbs sit mostly in the tails of the satisfaction distribution, and a naive average over them can land 40-50 percentage points away from true system quality. We treat this as a topic- and sentiment- stratified selection-bias problem and propose a three-agent hierarchical Bayesian pipeline that does not require ground-truth labels on individual interactions. A Topic Clustering Agent partitions the stream via UMAP + HDBSCAN over text embeddings; a Bias Modeling Agent fits a two-stage hierarchical Beta-Binomial under NUTS, inferring per-topic selection rates $s_c$ and quality $q_c$ with partial pooling; a Synthesis Agent reweights $q_c$ by true topic prevalence $\hatπ_c = n_c/N$ to report a bias-corrected aggregate posterior $\bar Q = \sum_c \hatπ_c q_c$ with credible interval, plus drift signals for online recalibration. Validation uses UltraFeedback (N=10,232 retained interactions, $C=18$ clusters, $Q^\star=0.6249$) with simulated topic- and sentiment-dependent selection biases. We compare five Bayesian variants against Naive and IPW baselines. A mild prior on the feedback channel (typical positive-feedback rate and negative-to-positive ratio, both readable from any production dashboard without labels) keeps Hierarchical-Informed within 4-13 pp of $Q^\star$ as the bias ratio sweeps from 1:1 to 30:1, with 95% credible intervals covering $Q^\star$ in 50/50 random-seed replicates at $κ_{\max}=10$. Without channel-side priors, every weak-prior variant misses $Q^\star$ by 22-33 pp: the per-cluster sufficient statistics admit a one-parameter family of equally good fits, and the prior on the bias channel (not on latent quality) is what breaks the degeneracy.
ArXiv.org · 2026-05-12
articleOpen accessSenior author[Abridged] Production LLM deployments receive feedback from a non-random fraction of users: thumbs sit mostly in the tails of the satisfaction distribution, and a naive average over them can land 40-50 percentage points away from true system quality. We treat this as a topic- and sentiment- stratified selection-bias problem and propose a three-agent hierarchical Bayesian pipeline that does not require ground-truth labels on individual interactions. A Topic Clustering Agent partitions the stream via UMAP + HDBSCAN over text embeddings; a Bias Modeling Agent fits a two-stage hierarchical Beta-Binomial under NUTS, inferring per-topic selection rates $s_c$ and quality $q_c$ with partial pooling; a Synthesis Agent reweights $q_c$ by true topic prevalence $\hatπ_c = n_c/N$ to report a bias-corrected aggregate posterior $\bar Q = \sum_c \hatπ_c q_c$ with credible interval, plus drift signals for online recalibration. Validation uses UltraFeedback (N=10,232 retained interactions, $C=18$ clusters, $Q^\star=0.6249$) with simulated topic- and sentiment-dependent selection biases. We compare five Bayesian variants against Naive and IPW baselines. A mild prior on the feedback channel (typical positive-feedback rate and negative-to-positive ratio, both readable from any production dashboard without labels) keeps Hierarchical-Informed within 4-13 pp of $Q^\star$ as the bias ratio sweeps from 1:1 to 30:1, with 95% credible intervals covering $Q^\star$ in 50/50 random-seed replicates at $κ_{\max}=10$. Without channel-side priors, every weak-prior variant misses $Q^\star$ by 22-33 pp: the per-cluster sufficient statistics admit a one-parameter family of equally good fits, and the prior on the bias channel (not on latent quality) is what breaks the degeneracy.
Efficient Dynamic Algorithms to Predict Short Races
arXiv (Cornell University) · 2026-03-03
preprintOpen accessSenior authorWe introduce and study the problem of detecting short races in an observed trace. Specifically, for a race type $R$, given a trace $σ$ and window size $w$, the task is to determine whether there exists an $R$-race $(e_1, e_2)$ in $σ$ such that the subtrace starting with $e_1$ and ending with $e_2$ contains at most $w$ events. We present a monitoring framework for short-race prediction and instantiate the framework for happens-before and sync-preserving races, yielding efficient detection algorithms. Our happens-before algorithm runs in the same time as FastTrack but uses space that scales with $\log w$ as opposed to $\log |σ|$. For sync-preserving races, our algorithm runs faster and consumes significantly less space than SyncP. Our experiments validate the effectiveness of these short-race detection algorithms: they run more efficiently, use less memory, and detect significantly more races under the same budget, offering a reasonable balance between resource usage and predictive power.
Efficient Timestamping for Sampling-based Race Detection
ArXiv.org · 2025-04-09
preprintOpen accessSenior authorDynamic race detection based on the happens before (HB) partial order has now become the de facto approach to quickly identify data races in multi-threaded software. Most practical implementations for detecting these races use timestamps to infer causality between events and detect races based on these timestamps. Such an algorithm updates timestamps (stored in vector clocks) at every event in the execution, and is known to induce excessive overhead. Random sampling has emerged as a promising algorithmic paradigm to offset this overhead. It offers the promise of making sound race detection scalable. In this work we consider the task of designing an efficient sampling based race detector with low overhead for timestamping when the number of sampled events is much smaller than the total events in an execution. To solve this problem, we propose (1) a new notion of freshness timestamp, (2) a new data structure to store timestamps, and (3) an algorithm that uses a combination of them to reduce the cost of timestamping in sampling based race detection. Further, we prove that our algorithm is close to optimal -- the number of vector clock traversals is bounded by the number of sampled events and number of threads, and further, on any given dynamic execution, the cost of timestamping due to our algorithm is close to the amount of work any timestamping-based algorithm must perform on that execution, that is it is instance optimal. Our evaluation on real world benchmarks demonstrates the effectiveness of our proposed algorithm over prior timestamping algorithms that are agnostic to sampling.
Approximate Algorithms for Verifying Differential Privacy with Gaussian Distributions
2025-11-19
articleOpen accessSenior authorThe verification of differential privacy algorithms that employ Gaussian distributions is little understood. This paper tackles the challenge of verifying such programs by introducing a novel approach to approximating probability distributions of loop-free programs that sample from both discrete and continuous distributions with computable probability density functions, including Gaussian and Laplace. We establish that verifying $(ε,δ)$-differential privacy for these programs is \emph{almost decidable}, meaning the problem is decidable for all values of $δ$ except those in a finite set. Our verification algorithm is based on computing probabilities to any desired precision by combining integral approximations, and tail probability bounds. The proposed methods are implemented in the tool, DipApprox, using the FLINT library for high-precision integral computations, and incorporate optimizations to enhance scalability. We validate {\ourtool} on fundamental privacy-preserving algorithms, such as Gaussian variants of the Sparse Vector Technique and Noisy Max, demonstrating its effectiveness in both confirming privacy guarantees and detecting violations.
Checking δ-Satisfiability of Reals with Integrals
Proceedings of the ACM on Programming Languages · 2025-04-09 · 1 citations
articleOpen accessSenior authorMany synthesis and verification problems can be reduced to determining the truth of formulas over the real numbers. These formulas often involve constraints with integrals in them. To this end, we extend the framework of δ-decision procedures with techniques for handling integrals of user-specified real functions. We implement this decision procedure in the tool ∫dReal, which is built on top of dReal. We evaluate ∫dReal on a suite of problems that include formulas verifying the fairness of algorithms and the privacy and the utility of privacy mechanisms and formulas that synthesize parameters for the desired utility of privacy mechanisms. The performance of the tool in these experiments demonstrates the effectiveness of ∫dReal.
The Decision Problem for Regular First Order Theories
Proceedings of the ACM on Programming Languages · 2025-01-07
articleOpen accessSenior authorThe Entscheidungsproblem , or the classical decision problem, asks whether a given formula of first-order logic is satisfiable. In this work, we consider an extension of this problem to regular first-order theories , i.e., (infinite) regular sets of formulae. Building on the elegant classification of syntactic classes as decidable or undecidable for the classical decision problem, we show that some classes (specifically, the EPR and Gurevich classes), which are decidable in the classical setting, become undecidable for regular theories. On the other hand, for each of these classes, we identify a subclass that remains decidable in our setting, leaving a complete classification as a challenge for future work. Finally, we observe that our problem generalises prior work on automata-theoretic verification of uninterpreted programs and propose a semantic class of existential formulae for which the problem is decidable.
International Journal of Science and Research (IJSR) · 2025-09-25
articleOpen accessSenior authorDistributing encrypted data through cloud storage significantly enhances security in remote communication services. Symmetric-key encryption ensures that only users possessing valid keys can access and decrypt the information. Given the diversity of subscription models, efficient key management is essential for effective access control in broadcast services. The proposed solution utilizes a key tree structure (KTR) to manage key distribution, accommodating complex subscription preferences and user behaviors. This approach supports all subscription operations in wireless broadcast environments. Instead of maintaining separate key sets for each application, users need only a single key set for all their subscribed services. The KTR mechanism determines the minimum number of keys that must be updated to maintain broadcast security while minimizing rekeying overhead.
Efficient Timestamping for Sampling-Based Race Detection
Proceedings of the ACM on Programming Languages · 2025-06-10
articleOpen accessSenior authorDynamic race detection based on the happens before (HB) partial order has now become the de facto approach to quickly identify data races in multi-threaded software. Most practical implementations for detecting these races use timestamps to infer causality between events and detect races based on these timestamps. Such an algorithm updates timestamps (stored in vector clocks) at every event in the execution, and is known to induce excessive overhead. Random sampling has emerged as a promising algorithmic paradigm to offset this overhead. It offers the promise of making sound race detection scalable. In this work we consider the task of designing an efficient sampling based race detector with low overhead for timestamping when the number of sampled events is much smaller than the total events in an execution. To solve this problem, we propose (1) a new notion of freshness timestamp, (2) a new data structure to store timestamps, and (3) an algorithm that uses a combination of them to reduce the cost of timestamping in sampling based race detection. Further, we prove that our algorithm is close to optimal --- the number of vector clock traversals is bounded by the number of sampled events and number of threads, and further, on any given dynamic execution, the cost of timestamping due to our algorithm is close to the amount of work any timestamping-based algorithm must perform on that execution, that is it is instance optimal. Our evaluation on real world benchmarks demonstrates the effectiveness of our proposed algorithm over prior timestamping algorithms that are agnostic to sampling.
Recent grants
SHF: Small: New Algorithmic Paradigms in Dynamic Analysis of Multithreaded Software
NSF · $250k · 2020–2025
SHF: Small: Verifying Open Concurrent Real Time Systems
NSF · $476k · 2010–2014
Monitoring and Checking of Distributed Systems with respect to Formal Specifications
NSF · $270k · 2004–2008
CAREER: Next Generation Model Checking
NSF · $400k · 2005–2011
TWC: Medium: Collaborative: Automated Formal Analysis of Security Protocols with Private Coin Tosses
NSF · $595k · 2013–2018
Frequent coauthors
- 59 shared
Rohit Chadha
University of Missouri
- 36 shared
Umang Mathur
- 29 shared
A. Prasad Sistla
University of Illinois Chicago
- 26 shared
Pavithra Prabhakar
- 24 shared
Nima Roohi
Amazon (United States)
- 23 shared
Geir E. Dullerud
University of Illinois Urbana-Champaign
- 20 shared
Sayan Mitra
University of Illinois Urbana-Champaign
- 17 shared
Dileep Kini
Capital University
Labs
Siebel School of Computing and Data SciencePI
Education
- 2000
Ph.D., Computer Science
University of Illinois at Urbana-Champaign
- 1996
M.S., Computer Science
University of Illinois at Urbana-Champaign
- 1992
B.S., Electrical and Electronics Engineering
University of Madras
Awards & honors
- Celebration of Excellence 2021
- Celebration of Excellence 2022
- Celebration of Excellence 2023
- Celebration of Excellence 2024
- Celebration of Excellence 2025
- Resume-aware match score
- Save to shortlist
- AI-drafted outreach
See your match with Mahesh Viswanathan
PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.
- Free to start
- No credit card
- 30-second signup