
Guofei Gu
Professor, Computer Science & Engineering; Eppright Professor in Engineering; Presidential Impact Fellow
Texas A&M University · Computer Science & Engineering
Active 2003–2026
About
Guofei Gu is a Professor in the Department of Computer Science & Engineering at Texas A&M University, holding the Eppright Professor in Engineering title and serving as a Presidential Impact Fellow. He earned his Ph.D. in Computer Science from Georgia Institute of Technology in 2008, his M.S. in Computer Science from Fudan University in 2003, and his B.E. in Computer Science from Nanjing University of Posts and Telecommunications in 2000. His research interests encompass network and system security, including internet malware, botnet and APT detection, defense, and analysis, as well as software-defined programmable security in SDN, NFV, cloud, and edge environments. He also focuses on mobile and IoT security, AI security, web and social networking security, intrusion detection, and anomaly detection. Guofei Gu has received numerous awards and honors for his contributions, including the TEES Research Impact Award, Dean of Engineering Excellence Award, and the NSF CAREER Award, among others.
Research topics
- Computer Science
- Computer Security
- Computer network
- Operating system
- Distributed computing
Selected publications
Semantics Over Syntax: Uncovering Pre-Authentication 5G Baseband Vulnerabilities
arXiv (Cornell University) · 2026-04-05
preprint · Open access
Modern 5G user equipment (UE) processes Radio Resource Control (RRC) configuration messages during early control-plane exchanges, before authentication and integrity protection are established. Prior work for testing 5G UEs has largely focused on constructing syntactically invalid inputs. In contrast, we show that syntactically valid but semantically inconsistent messages, which violate specification-level field constraints or cross-field dependencies, can drive baseband implementations into invalid states, triggering assertion failures or modem crashes. These findings reveal semantic inconsistencies in pre-authentication signaling as a critical yet underexplored attack surface in 5G UE implementations. To address this gap, we present Constraint-Guided Semantic Testing (ConSeT), a framework that systematically extracts specification-level constraints and leverages them to generate targeted semantic violations for testing 5G UEs. ConSeT decodes RRC messages into structured fields, derives schema-based rules, infers cross-field dependencies using a Large Language Model (LLM) in an evidence-bounded manner, and produces syntactically valid test cases that intentionally violate semantic constraints. We evaluate ConSeT on both commercial and open-source 5G UEs. On commercial smartphones, it uncovers 7 previously unknown vulnerabilities through responsible disclosure, including 3 high-severity CVEs, affecting 64 chipset models and over 542 commercially available smartphone models. On the open-source OAI UE, ConSeT additionally triggers 29 distinct crash sites.
A Security Analysis of the OpenClaw AI Agent Framework
arXiv (Cornell University) · 2026-03-29
preprint · Open access · Senior author
AI agent frameworks connecting large language model (LLM) reasoning to host execution surfaces -- shell, filesystem, containers, and messaging -- introduce security challenges structurally distinct from conventional software. We present a systematic taxonomy of 470 advisories filed against OpenClaw, an open-source AI agent runtime, organized by architectural layer and trust-violation type. Vulnerabilities cluster along two orthogonal axes: (1) the system axis, reflecting the architectural layer (exec policy, gateway, channel, sandbox, browser, plugin, agent/prompt); and (2) the attack axis, reflecting adversarial techniques (identity spoofing, policy bypass, cross-layer composition, prompt injection, supply-chain escalation). Patch-differential evidence yields three principal findings. First, three Moderate- or High-severity advisories in the Gateway and Node-Host subsystems compose into a complete unauthenticated remote code execution (RCE) path -- spanning delivery, exploitation, and command-and-control -- from an LLM tool call to the host process. Second, the exec allowlist, the primary command-filtering mechanism, relies on a closed-world assumption that command identity is recoverable via lexical parsing. This is invalidated by shell line continuation, busybox multiplexing, and GNU option abbreviation. Third, a malicious skill distributed via the plugin channel executed a two-stage dropper within the LLM context, bypassing the exec pipeline and demonstrating that the skill distribution surface lacks runtime policy enforcement. The dominant structural weakness is per-layer trust enforcement rather than unified policy boundaries, making cross-layer attacks resilient to local remediation.
TraceScope: Interactive URL Triage via Decoupled Checklist Adjudication
arXiv (Cornell University) · 2026-04-23
article · Open access
Modern phishing campaigns increasingly evade snapshot-based URL classifiers using interaction gates (e.g., checkbox/slider challenges), delayed content rendering, and logo-less credential harvesters. This shifts URL triage from static classification toward an interactive forensics task: an analyst must actively navigate the page while isolating themselves from potential runtime exploits. We present TraceScope, a decoupled triage pipeline that operationalizes this workflow at scale. To prevent the observer effect and ensure safety, a sandboxed operator agent drives a real GUI browser guided by visual motivation to elicit page behavior, freezing the session into an immutable evidence bundle. Separately, an adjudicator agent circumvents LLM context limitations by querying evidence on demand to verify a MITRE ATT&CK checklist, and generates an audit-ready report with extracted indicators of compromise (IOCs) and a final verdict. Evaluated on 708 reachable URLs from existing datasets (241 verified phishing from PhishTank and 467 benign from Tranco-derived crawling), TraceScope achieves 0.94 precision and 0.78 recall, substantially improving recall over three prior visual/reference-based classifiers while producing reproducible, analyst-grade evidence suitable for review. More importantly, we manually curated a dataset of real-world phishing emails to evaluate our system in a practical setting. Our evaluation reveals that TraceScope demonstrates superior performance in a real-world scenario as well, successfully detecting sophisticated phishing attempts that current state-of-the-art defenses fail to identify.
Building a Security OS With Software Defined Infrastructure
UNC Libraries · 2026-04-03
article · Open access
The recent emergence of Software-Defined Infrastructure (SDI) offers a number of useful tools for managing, monitoring, containing, shepherding, and recovering computing units within an enterprise, cloud, or data center. As SDI utilities grow and the types of resources that can be abstracted into software-managed control and data planes increase, there is a pressing need for datacenter-level operating systems (OSes). Such a datacenter-level OS can further abstract and easily capture higher-level policy goals, and push them down to different types of hardware and software, ranging from application processes to storage and networking. This paper thus proposes S2OS, an SDI-defined Security OS, which offers an easy-to-use, programmable security model for monitoring and dynamically securing applications. We anticipate S2OS could unlock a wide range of unprecedented security opportunities, including fine-grained and dynamic security programmability at infrastructure scale, and information flow tracking across an entire infrastructure.
On the Security Risks of Memory Adaptation and Augmentation in Data-plane DoS Mitigation
2026-01-01
article
A Security Analysis of the OpenClaw AI Agent Framework
arXiv (Cornell University) · 2026-03-29
article · Open access · Senior author
AI agent frameworks connecting large language model (LLM) reasoning to host execution surfaces--shell, filesystem, containers, and messaging--introduce security challenges structurally distinct from conventional software. We present a systematic taxonomy of 190 advisories filed against OpenClaw, an open-source AI agent runtime, organized by architectural layer and trust-violation type. Vulnerabilities cluster along two orthogonal axes: (1) the system axis, reflecting the architectural layer (exec policy, gateway, channel, sandbox, browser, plugin, agent/prompt); and (2) the attack axis, reflecting adversarial techniques (identity spoofing, policy bypass, cross-layer composition, prompt injection, supply-chain escalation). Patch-differential evidence yields three principal findings. First, three Moderate- or High-severity advisories in the Gateway and Node-Host subsystems compose into a complete unauthenticated remote code execution (RCE) path--spanning delivery, exploitation, and command-and-control--from an LLM tool call to the host process. Second, the exec allowlist, the primary command-filtering mechanism, relies on a closed-world assumption that command identity is recoverable via lexical parsing. This is invalidated by shell line continuation, busybox multiplexing, and GNU option abbreviation. Third, a malicious skill distributed via the plugin channel executed a two-stage dropper within the LLM context, bypassing the exec pipeline and demonstrating that the skill distribution surface lacks runtime policy enforcement. The dominant structural weakness is per-layer trust enforcement rather than unified policy boundaries, making cross-layer attacks resilient to local remediation.
Incentivizing Security Excellence in Cyber Liability Insurance
2025-06-30
article · Senior author
This paper investigates current practices and inherent shortcomings in Cyber Liability Insurance (CLI), highlighting critical gaps due to inconsistent risk assessment methodologies and weak incentives for cybersecurity excellence. We employ a mixed-method research design that combines a quantitative pilot study of 209 businesses, direct policy-purchasing experiments, and structured interviews with 26 Chief Information Security Officers (CISOs). The findings reveal significant premium variations, from $600 to nearly $7,000 for identical coverage, due primarily to inconsistent security assessments and policy complexities. Furthermore, empirical policy acquisitions demonstrated that multiple insurers approved coverage with minimal or no cybersecurity diligence, underscoring systemic issues in insurer incentive alignment. Based on these insights, we propose a structured framework advocating customized dynamic cyber liability policies with risk-based premiums, standardized assessments, and incentives aligned with proactive cybersecurity practices. This framework promotes transparency, incentivizes better security practices, and outlines clear directions for future research, contributing toward a more responsive cyber-liability insurance market.
LLMs in Software Security: A Survey of Vulnerability Detection Techniques and Insights
ACM Computing Surveys · 2025-09-23 · 22 citations
review · Open access
Large Language Models (LLMs) are emerging as transformative tools for software vulnerability detection. Traditional methods, including static and dynamic analysis, face limitations in efficiency, false-positive rates, and scalability with modern software complexity. Through code structure analysis, pattern identification, and repair suggestion generation, LLMs demonstrate a novel approach to vulnerability mitigation. This survey examines LLMs in vulnerability detection, analyzing problem formulation, model selection, application methodologies, datasets, and evaluation metrics. We investigate current research challenges, emphasizing cross-language detection, multimodal integration, and repository-level analysis. Based on our findings, we propose solutions addressing dataset scalability, model interpretability, and low-resource scenarios. Our contributions include: (1) a systematic analysis of LLM applications in vulnerability detection; (2) a unified framework examining patterns and variations across studies; and (3) identification of key challenges and research directions. This work advances the understanding of LLM-based vulnerability detection. The latest findings are maintained at https://github.com/OwenSanzas/LLM-For-Vulnerability-Detection
Recent grants
SaTC: CORE: Small: Adversarial Learning via Modeling Interpretation
NSF · $500k · 2018–2023
NSF · $250k · 2013–2018
NSF · $350k · 2016–2021
NSF · $750k · 2022–2024
CAREER: Coordination- and Correlation-based Botnet Defense
NSF · $432k · 2010–2017
Frequent coauthors
- Vinod Yegneswaran · 119 shared
- Phillip Porras · SRI International · 118 shared
- Seungsoo Lee · PricewaterhouseCoopers (South Korea) · 101 shared
- Jinwoo Kim · Kwangwoon University · 100 shared
- Jae Hyun Nam · University of Minnesota · 98 shared
- Seungwon Shin · Hongik University · 98 shared
- Minjae Seo · Gachon University · 98 shared
- Seungwon Shin · Korea Advanced Institute of Science and Technology · 25 shared
Awards & honors
- TEES Research Impact Award, TAMU, 2017-2018
- Dean of Engineering Excellence Award, TAMU, 2017-2018
- College of Engineering Charles H. Barclay Jr. '45 Faculty Fe…
- Finalist (top 10) for CSAW 2016 Best Applied Security Paper…
- Best Paper Award, The 35th IEEE International Conference on…