
Soheil Feizi
Professor · University of Maryland, College Park · Computer Science
Active 2005–2026
About
Soheil Feizi is an Associate Professor in the Computer Science department at the University of Maryland, College Park (UMD), and the director of the Reliable AI Lab. He is currently on leave from his academic position to serve as the Founder and CEO of RELAI, a startup focused on advancing AI reliability. Feizi holds a Ph.D. from MIT and completed postdoctoral research at Stanford University. He has an extensive publication record with over 100 peer-reviewed papers and has delivered more than 50 invited talks. His research centers on developing reliable and trustworthy Artificial Intelligence (AI), with a particular focus on robustness to natural and adversarial input variations, generalizability to unforeseen data domains, and interpretability of both test and training time predictions. He is interested in the reliability analysis of both predictive and generative AI models. Feizi has received numerous awards recognizing his contributions, including the ONR Young Investigator Award, NSF CAREER Award, ARO Early Career Program Award, and multiple best paper awards. His work has been supported by national agencies such as NSF, DARPA, ARL, ONR, DOE, and NIST, as well as industry partners including Meta, IBM, Amazon, Qualcomm, and Capital One. His research has been featured in major media outlets like the Washington Post, BBC, MIT Technology Review, Bloomberg, and The Wire. Demonstrating his commitment to AI safety and reliability, he testified before the U.S. House's Bipartisan Task Force on AI. Additionally, Feizi is dedicated to promoting diversity in STEM and has mentored students at various educational levels through multiple programs.
Research topics
- Natural Language Processing
- Machine Learning
- Computer Science
- Information Retrieval
- Artificial Intelligence
Selected publications
Failing to Explore: Language Models on Interactive Tasks
Open MIND · 2026-01-29
Preprint · Senior author
We evaluate language models on their ability to explore interactive environments under a limited interaction budget. We introduce three parametric tasks with controllable exploration difficulty, spanning continuous and discrete environments. Across state-of-the-art models, we find systematic under-exploration and suboptimal solutions, with performance often significantly worse than simple explore-exploit heuristic baselines and scaling weakly as the budget increases. Finally, we study two lightweight interventions: splitting a fixed budget into parallel executions, which surprisingly improves performance despite a no-gain theoretical result for our tasks, and periodically summarizing the interaction history, which preserves key discoveries and further improves exploration.
Decomposition-Enhanced Training for Post-Hoc Attributions in Language Models
2026-01-01
Article · Open access
Sriram Balasubramanian, Samyadeep Basu, Koustava Goswami, Ryan A. Rossi, Varun Manjunatha, Roshan Santhosh, Ruiyi Zhang, Soheil Feizi, Nedim Lipka. Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers). 2026.
Failing to Explore: Language Models on Interactive Tasks
ArXiv.org · 2026-01-29
Article · Open access · Senior author
Under the Hood of SKILL.md: Semantic Supply-chain Attacks on AI Agent Skill Registry
arXiv (Cornell University) · 2026-05-12
Article · Open access · Senior author
Autonomous AI agents increasingly extend their capabilities through Agent Skills: modular filesystem packages whose SKILL.md files describe when and how agents should use them. While this design enables scalable, on-demand capability expansion, it also introduces a semantic supply-chain risk in which natural-language metadata and instructions can affect which skills are admitted, surfaced, selected, and loaded. We study SKILL.md-only attacks across three registry-facing stages of the Agent Skill lifecycle, using real ClawHub skills and realistic registry mechanisms. In Discovery, short textual triggers can manipulate embedding-based retrieval and improve adversarial skill visibility, achieving up to 86% pairwise win rate and 80% Top-10 placement. In Selection, description-only framing biases agents toward functionally equivalent adversarial variants, which are selected in 77.6% of paired trials on average. In Governance, semantic evasion strategies cause malicious skills to avoid a blocking verdict in 36.5%-100% of cases. Overall, our results show that SKILL.md is not passive documentation but operational text that shapes which third-party capabilities agents find, trust, and use.
SpecHop: Continuous Speculation for Accelerating Multi-Hop Retrieval Agents
ArXiv.org · 2026-05-21
Article · Open access · Senior author
Large language models increasingly use external tools such as web search and document retrieval to solve information-intensive tasks. However, multi-hop tool use in complex tasks introduces substantial latency, since the model must repeatedly wait for tool observations before continuing. We study how to accelerate such trajectories without changing the final trajectory the model would have taken without acceleration, assuming access to faster but less reliable speculator tools. We develop a theoretical framework for lossless speculation in multi-hop tool-use settings, characterizing the optimal achievable latency gain. We propose SpecHop, a continuous speculation framework that maintains multiple speculative threads, verifies predicted observations asynchronously as target tool outputs arrive, commits correct branches, and rolls back incorrect ones. This preserves accuracy while reducing wall-clock latency. We show that SpecHop can approach oracle latency gains with enough active threads. Empirically, on retrieval-augmented multi-hop tasks, SpecHop closely matches theoretical predictions and reduces latency by up to 40% in some settings. Code: https://github.com/mehrdadsaberi/spechop
Early Stopping for Large Reasoning Models via Confidence Dynamics
arXiv (Cornell University) · 2026-04-06
Article · Open access · Senior author
Large reasoning models rely on long chain-of-thought generation to solve complex problems, but extended reasoning often incurs substantial computational cost and can even degrade performance due to overthinking. A key challenge is determining when the model should stop reasoning and produce the final answer. In this work, we study the confidence of intermediate answers during reasoning and observe two characteristic behaviors: correct reasoning trajectories often reach high-confidence answers early, while incorrect rollouts tend to produce long, unproductive reasoning traces and exhibit less reliable confidence dynamics. Motivated by these observations, we propose CoDE-Stop (Confidence Dynamics Early Stop), an early stopping method that leverages the dynamics of intermediate answer confidence to decide when to terminate reasoning, requiring no additional training and easily integrating into existing models. We evaluate CoDE-Stop on diverse reasoning and science benchmarks across multiple models. Compared to prior early stopping methods, it achieves a more favorable accuracy-compute tradeoff and reduces total token usage by 25-50% compared to standard full-length reasoning. In addition, we provide analyses of confidence dynamics during reasoning, offering insights into how confidence changes in both correct and incorrect trajectories.
Attacker’s Noise Can Manipulate Your Audio-based LLM in the Real World
2026-01-01
Article · Open access
This paper investigates the real-world vulnerabilities of audio-based large language models (ALLMs), such as Qwen2-Audio. We first demonstrate that an adversary can craft stealthy audio perturbations to manipulate ALLMs into exhibiting specific targeted behaviors, such as eliciting responses to wake-keywords (e.g., "Hey Qwen"), or triggering harmful behaviors (e.g., "Change my calendar event"). Subsequently, we show that playing adversarial background noise during user interaction with the ALLMs can significantly degrade the response quality. Crucially, our research illustrates the scalability of these attacks to real-world scenarios, impacting other innocent users when these adversarial noises are played through the air. Further, we discuss the transferability of the attack and potential defensive measures.
Recent grants
CAREER: Information-Theoretic and Statistical Foundations of Generative Models
NSF · $590k · 2020–2026
Frequent coauthors
- Muriel Médard (Massachusetts Institute of Technology) · 38 shared
- Sahil Singla · 32 shared
- Samyadeep Basu · 20 shared
- Alexander Levine (University of Maryland, College Park) · 20 shared
- Yogesh Balaji · 20 shared
- Rama Chellappa · 19 shared
- Manolis Kellis (Massachusetts Institute of Technology) · 17 shared
- Mazda Moayeri (University of Maryland, College Park) · 16 shared
Labs
CLIP Lab · PI
Awards & honors
- PECASE (2025)
- ARO Early Career Award (2023)
- Amazon Research Award (2023)
- ONR Young Investigator (2022)
- NSF CAREER Award (2020)