Yuriy Brun

Professor (on leave through Spring 2026)

University of Massachusetts Amherst · Information Science and Technology

Active 1974–2026

h-index: 39

Citations: 7.7k

Papers: 178 · 39 last 5y

Funding: $3.4M · 1 active



About

Yuriy Brun is a professor at the Manning College of Information and Computer Sciences at the University of Massachusetts Amherst, currently on leave through Spring 2026. His research focuses on making it easier to build and deploy software systems while ensuring they adhere to desirable behavioral constraints. His work centers around automation and software behavior, developing techniques that automatically enforce behavior on systems, mine behavioral models to help developers understand system behavior, and repair systems to satisfy behavioral requirements. Brun works closely with developers and systems to understand challenges and automate prevention of system failures, with the long-term goal of creating self-adaptive systems that self-monitor, self-manage, and self-correct in dynamic environments. He also focuses on software fairness, developing automated methods to test for bias in software, especially in machine learning and data mining contexts, and to enforce fairness constraints during learning processes. His multidisciplinary research combines distributed systems, information theory, theoretical computer science, security, and machine learning, often involving open-source development and collaboration.

Research topics

Computer Science
Programming language
Theoretical computer science
Software engineering
Mathematics
Real-time computing
Distributed computing

Selected publications

Cobblestone: A Divide-and-Conquer Approach for Automating Formal Verification Replication Package
Zenodo (CERN European Organization for Nuclear Research) · 2026-01-16
other · Open access
Automatically Engineering Trusted Software: A Research Roadmap
ACM Transactions on Software Engineering and Methodology · 2026-03-02
article · 1st author · Corresponding
Recent advances in automated programming have the potential to reduce human involvement in the software engineering process, but this can lead to less trustworthy software. We envision a three-pronged approach to automating the engineering of trustworthy software that involves (1) eliciting requirements from users and automatically generating formal specifications encoding users’ intent, (2) automatically synthesizing source code conforming to those specifications, and (3) automatically synthesizing formal proofs to verify the correctness of the produced software. We describe this vision and the state of the art in each of these three areas, and the research challenges that must be overcome in each area and in their integration.
Your Model Is Unfair, Are You Even Aware? Inverse Relationship Between Comprehension and Trust in Explainability Visualizations of Biased ML Models
IEEE Transactions on Visualization and Computer Graphics · 2025-12-05 · 1 citation
article · Senior author
Systems relying on ML have become ubiquitous, but so has biased behavior within them. Research shows that bias significantly affects stakeholders' trust in systems and how they use them. Further, stakeholders of different backgrounds view and trust the same systems differently. Thus, how ML models' behavior is explained plays a key role in comprehension and trust. We survey explainability visualizations, creating a taxonomy of design characteristics. We conduct user studies to evaluate five state-of-the-art visualization tools (LIME, SHAP, CP, Anchors, and ELI5) for model explainability, measuring how taxonomy characteristics affect comprehension, bias perception, and trust for non-expert ML users. Surprisingly, we find an inverse relationship between comprehension and trust: the better users understand the models, the less they trust them. We investigate the cause and find that this relationship is strongly mediated by bias perception: more comprehensible visualizations increase people's perception of bias, and increased bias perception reduces trust. We confirm this relationship is causal: Manipulating explainability visualizations to control comprehension, bias perception, and trust, we show that visualization design can significantly (p < 0.001) increase comprehension, increase perceived bias, and reduce trust. Conversely, reducing perceived model bias, either by improving model fairness or by adjusting visualization design, significantly increases trust even when comprehension remains high. Our work advances understanding of how comprehension affects trust and systematically investigates visualization's role in facilitating responsible ML applications.
Rango: Adaptive Retrieval-Augmented Proving for Automated Software Verification
2025-04-26 · 4 citations
article
Formal verification using proof assistants, such as Coq, enables the creation of high-quality software. However, the verification process requires significant expertise and manual effort to write proofs. Recent work has explored automating proof synthesis using machine learning and large language models (LLMs). This work has shown that identifying relevant premises, such as lemmas and definitions, can aid synthesis. We present Rango, a fully automated proof synthesis tool for Coq that automatically identifies relevant premises and also similar proofs from the current project and uses them during synthesis. Rango uses retrieval augmentation at every step of the proof to automatically determine which proofs and premises to include in the context of its fine-tuned LLM. In this way, Rango adapts to the project and to the evolving state of the proof. We create a new dataset, CoqStoq, of 2,226 open-source Coq projects and 196,929 theorems from GitHub, which includes both training data and a curated evaluation benchmark of well-maintained projects. On this benchmark, Rango synthesizes proofs for 32.0% of the theorems, which is 29% more theorems than the prior state-of-the-art tool Tactician. Our evaluation also shows that Rango adding relevant proofs to its context leads to a 47% increase in the number of theorems proven.
Your Model Is Unfair, Are You Even Aware? Inverse Relationship Between Comprehension and Trust in Explainability Visualizations of Biased ML Models
ArXiv.org · 2025-07-31
preprint · Open access · Senior author
Bias, Accuracy, and Trust: Gender-Diverse Perspectives on Large Language Models
ArXiv.org · 2025-06-27
preprint · Open access
Large language models (LLMs) are becoming increasingly ubiquitous in our daily lives, but numerous concerns about bias in LLMs exist. This study examines how gender-diverse populations perceive bias, accuracy, and trustworthiness in LLMs, specifically ChatGPT. Through 25 in-depth interviews with non-binary/transgender, male, and female participants, we investigate how gendered and neutral prompts influence model responses and how users evaluate these responses. Our findings reveal that gendered prompts elicit more identity-specific responses, with non-binary participants particularly susceptible to condescending and stereotypical portrayals. Perceived accuracy was consistent across gender groups, with errors most noted in technical topics and creative tasks. Trustworthiness varied by gender, with men showing higher trust, especially in performance, and non-binary participants demonstrating higher performance-based trust. Additionally, participants suggested improving the LLMs by diversifying training data, ensuring equal depth in gendered responses, and incorporating clarifying questions. This research contributes to the CSCW/HCI field by highlighting the need for gender-diverse perspectives in LLM development in particular and AI in general, to foster more inclusive and trustworthy systems.
QEDCartographer: Automating Formal Verification Using Reward-Free Reinforcement Learning
2025-04-26 · 2 citations
article · Senior author
Formal verification is a promising method for producing reliable software, but the difficulty of manually writing verification proofs severely limits its utility in practice. Recent methods have automated some proof synthesis by guiding a search through the proof space using a theorem prover. Unfortunately, the theorem prover provides only the crudest estimate of progress, resulting in effectively undirected search. To address this problem, we create QEDCartographer, an automated proof-synthesis tool that combines supervised and reinforcement learning to more effectively explore the proof space. QEDCartographer incorporates the proofs' branching structure, enabling reward-free search and overcoming the sparse reward problem inherent to formal verification. We evaluate QEDCartographer using the CoqGym benchmark of 68.5K theorems from 124 open-source Coq projects. QEDCartographer fully automatically proves 21.4% of the test-set theorems. Previous search-based proof-synthesis tools Tok, Tac, ASTactic, Passport, and Proverbot9001, which rely only on supervised learning, prove 9.6%, 9.8%, 10.9%, 12.5%, and 19.8%, respectively. Diva, which combines 62 tools, proves 19.2%. Compared to the most effective prior tool, Proverbot9001, QEDCartographer produces 26% shorter proofs 27% faster, on average over the theorems both tools prove. Together, QEDCartographer and non-learning-based CoqHammer prove 31.8% of the theorems, while CoqHammer alone proves 26.6%. Our work demonstrates that reinforcement learning is a fruitful research direction for improving proof-synthesis tools' search mechanisms.
The Cost of Avoiding Backpropagation
ArXiv.org · 2025-06-27
preprint · Open access
Forward-mode automatic differentiation (FmAD) and zero-order (ZO) optimization have been proposed as memory-efficient alternatives to backpropagation (BP) for gradient computation, especially in low-resource settings. However, their practical benefits remain unclear due to two key gaps: a lack of comparison against memory-efficient BP variants, such as activation checkpointing, and a lack of a unified theoretical analysis. This work presents a comprehensive theoretical and empirical comparison of BP, FmAD, and ZO methods. Our theoretical analysis shows that while FmAD and ZO can reduce memory usage, they incur significant costs in accuracy, convergence speed, and computation compared to BP with checkpointing. These drawbacks worsen with larger models or constrained perturbation budgets. Empirical experiments on large language and vision-language models show that BP with checkpointing outperforms FmAD and ZO variants, including those enhanced with variance reduction, achieving up to 31.1% higher accuracy, 34.8% faster convergence, and 3.8x fewer computations at comparable memory usage. Our results highlight fundamental limitations of FmAD and ZO, and reaffirm BP with checkpointing as the most effective strategy for model training under memory-constrained settings. Our code is available at https://github.com/Astuary/The_Cost_of_Avoiding_Backpropagation.
Thinking Forward: Memory-Efficient Federated Finetuning of Language Models
arXiv (Cornell University) · 2024-05-24
preprint · Open access
Finetuning large language models (LLMs) in federated learning (FL) settings has become increasingly important as it allows resource-constrained devices to finetune a model using private data. However, finetuning LLMs using backpropagation requires excessive memory (especially from intermediate activations) for resource-constrained devices. While Forward-mode Auto-Differentiation (AD) can significantly reduce memory footprint from activations, we observe that directly applying it to LLM finetuning results in slow convergence and poor accuracy. In this paper, we introduce Spry, an FL algorithm that splits trainable weights of an LLM among participating clients, such that each client computes gradients using forward-mode AD that are closer estimations of the true gradients. Spry achieves a low memory footprint, high accuracy, and fast convergence. We formally prove that the global gradients in Spry are unbiased estimators of true global gradients for homogeneous data distributions across clients, while heterogeneity increases bias of the estimates. We also derive Spry's convergence rate, showing that the gradients decrease inversely proportional to the number of FL rounds, indicating the convergence up to the limits of heterogeneity. Empirically, Spry reduces the memory footprint during training by 1.4-7.1x in contrast to backpropagation, while reaching comparable accuracy, across a wide range of language tasks, models, and FL settings. Spry reduces the convergence time by 1.2-20.3x and achieves 5.2-13.5% higher accuracy against zero-order methods. When finetuning Llama2-7B with LoRA, compared to the peak memory consumption of 33.9GB of backpropagation, Spry only consumes 6.2GB of peak memory. For OPT13B, the reduction is from 76.5GB to 10.8GB. Spry makes feasible previously impossible FL deployments on commodity edge devices. Our source code is available at https://github.com/Astuary/Spry.
Attack-Resilient Image Watermarking Using Stable Diffusion
arXiv (Cornell University) · 2024-01-08 · 3 citations
preprint · Open access
Watermarking images is critical for tracking image provenance and proving ownership. With the advent of generative models, such as stable diffusion, that can create fake but realistic images, watermarking has become particularly important to make human-created images reliably identifiable. Unfortunately, the very same stable diffusion technology can remove watermarks injected using existing methods. To address this problem, we present ZoDiac, which uses a pre-trained stable diffusion model to inject a watermark into the trainable latent space, resulting in watermarks that can be reliably detected in the latent vector even when attacked. We evaluate ZoDiac on three benchmarks, MS-COCO, DiffusionDB, and WikiArt, and find that ZoDiac is robust against state-of-the-art watermark attacks, with a watermark detection rate above 98% and a false positive rate below 6.4%, outperforming state-of-the-art watermarking methods. We hypothesize that the reciprocating denoising process in diffusion models may inherently enhance the robustness of the watermark when faced with strong attacks and validate the hypothesis. Our research demonstrates that stable diffusion is a promising approach to robust watermarking, able to withstand even stable-diffusion-based attack methods. ZoDiac is open-sourced and available at https://github.com/zhanglijun95/ZoDiac.

Recent grants

SHF: Medium: Fairness in Software Systems
NSF · $1.1M · 2018–2024
TWC: Medium: Collaborative: Developer Crowdsourcing: Capturing, Understanding, and Addressing Security-related Blind Spots in APIs
NSF · $399k · 2015–2019
SHF: Medium: Collaborative Research: Semi and Fully Automated Program Repair and Synthesis via Semantic Code Search
NSF · $457k · 2016–2022
EAGER: Exploring the Feasibility of Software Testing Techniques to Evaluate Fairness Algorithms in Software Systems
NSF · $131k · 2017–2018
SHF: Small: Toward Fully Automated Formal Software Verification
NSF · $636k · 2022–2027

Frequent coauthors

Nenad Medvidović
31 shared
Michael D. Ernst
University of Washington
30 shared
Ivan Beschastnikh
University of British Columbia
23 shared
Manish Motwani
Georgia Institute of Technology
20 shared
Claire Le Goues
Carnegie Mellon University
11 shared
David Notkin
University of Washington
11 shared
Emily First
10 shared
Arvind Krishnamurthy
University of Washington
10 shared

Labs

LASER and PLASMA laboratories · PI

Education

PhD, Computer Science Department
University of Southern California
2008
MEng, Electrical Engineering and Computer Science
Massachusetts Institute of Technology
2003
BS, Mathematics
Massachusetts Institute of Technology
2003
BS, Electrical Engineering and Computer Science
Massachusetts Institute of Technology
2003

Awards & honors

NSF CAREER Award (2015)
SEAMS Most Influential (test of time) Paper Award (2020)
IEEE TCSC Young Achiever in Scalable Computing Award (2013)
ACM SIGSOFT and SIGPLAN Distinguished Paper Awards (2011, 20…
Best Paper Award (2017)
