Yinzhi Cao

Technical Director of the JHU Information Security Institute and Associate Professor · Verified

Johns Hopkins University · Computer Science

Active 2002–2026

h-index: 23
Citations: 2.9k
Papers: 103 (62 in the last 5 years)
Funding: $1.7M (1 active)

About

Yinzhi Cao is an associate professor of computer science at Johns Hopkins University and serves as the technical director of the Johns Hopkins University Information Security Institute. His research focuses on the security and privacy of web, network, and mobile systems. He is a member of the Data Science and AI Institute and an affiliate of the Institute for Assured Autonomy. Cao’s current research projects include vulnerability analysis of web applications and security, privacy, and fairness analysis of machine learning systems. He has received several awards, including an NSF CAREER Award in 2021, the DARPA Young Faculty Award in 2022, and Amazon Research Awards in 2017 and 2022. Cao joined Johns Hopkins University in 2018 after serving as an assistant professor at Lehigh University. He earned his Bachelor of Engineering in electronic engineering from Tsinghua University in China in 2008 and completed his PhD in computer science at Northwestern University in 2014.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Computer Security
  • Machine Learning
  • Data Mining
  • Operating Systems
  • Theoretical Computer Science

Selected publications

  • Comment and Control: Hijacking Agentic Workflows via Context-Grounded Evolution

    ArXiv.org · 2026-05-11

    article · Open access · Senior author

    Automation platforms such as GitHub Actions and n8n are increasingly adopting so-called agentic workflows, which integrate Large Language Model (LLM) agents for tasks such as code review and data synchronization. While bringing convenience for developers, this integration exposes a new risk: An adversary may control and craft certain inputs, such as GitHub issue comments, to manipulate the LLM agent for unwanted actions, such as credential exfiltration and arbitrary command execution. To our knowledge, no prior academic work has studied such a risk in agentic workflows. In this paper, we design the first detection and exploitation framework, called JAW, to hijack agentic workflows hosted on automation platforms via a novel approach called Context-Grounded Evolution. Our key idea is to evolve agentic workflow inputs under the contexts derived from hybrid program analysis for hijacking purposes. Specifically, JAW generates agentic workflow contexts through three analyses: (i) static path-feasibility analysis to identify feasible agent-invocation paths and the input constraints required to trigger them, (ii) dynamic prompt-provenance analysis to determine how that input is transformed and embedded into the LLM context, and (iii) capability analysis to identify the actions and restrictions available to the agent at runtime. Our evaluation of JAW on GitHub workflows and n8n templates showed that 4714 GitHub workflows and eight n8n templates can be successfully hijacked, for example, to leak user credentials. Our findings span 15 widely-used GitHub Actions, including official GitHub Actions for Claude Code, Gemini CLI, Qwen CLI, and Cursor CLI, and two official n8n nodes. We responsibly disclosed all findings to the affected vendors and received many acknowledgements, fixes, and bug bounties, notably from GitHub, Google, and Anthropic.

  • Privy: From Fine Print to Fair Practice in Privacy Rights Exercise

    arXiv (Cornell University) · 2026-05-03

    preprint · Open access

    Privacy regulations such as the CCPA and GDPR grant individuals rights over their personal data, yet it remains challenging for most users to exercise them in practice due to vague policy interpretation and unapproachable settings on web interfaces. We introduce Privy, an LLM-powered browser assistant that guides users through exercising their privacy rights on websites. Privy automatically analyzes a website's privacy policy and surfaces the specific rights available as action labels in a side panel. When a user selects a right, Privy provides step-by-step guidance and navigation, presenting direct links, generating email templates, or guiding form completion. Users can also request on-demand policy evidence and rights education to enhance their literacy. A technical evaluation across 14 websites shows that Privy extracts rights with high precision (0.979) and completes 96.3% of privacy tasks in an average of 3.2 steps. A user study (N=15) also demonstrates a high overall level of perceived helpfulness among users. Our findings suggest that comprehension and usability are not two separate challenges but a single interaction problem, and that effective privacy support requires integration of policy understanding and privacy actions. We offer design suggestions for future privacy assistants.

  • Robust and Efficient AI-Based Attack Recovery in Autonomous Drones

    ArXiv.org · 2025-05-20

    preprint · Open access · Senior author

    We introduce an autonomous attack recovery architecture to add common sense reasoning to plan a recovery action after an attack is detected. We outline use-cases of our architecture using drones, and then discuss how to implement this architecture efficiently and securely in edge devices.

  • Towards Lifecycle Unlearning Commitment Management: Measuring Sample-level Unlearning Completeness

    ArXiv.org · 2025-06-06

    preprint · Open access

    Growing concerns over data privacy and security highlight the importance of machine unlearning--removing specific data influences from trained models without full retraining. Techniques like Membership Inference Attacks (MIAs) are widely used to externally assess successful unlearning. However, existing methods face two key limitations: (1) maximizing MIA effectiveness (e.g., via online attacks) requires prohibitive computational resources, often exceeding retraining costs; (2) MIAs, designed for binary inclusion tests, struggle to capture granular changes in approximate unlearning. To address these challenges, we propose the Interpolated Approximate Measurement (IAM), a framework natively designed for unlearning inference. IAM quantifies sample-level unlearning completeness by interpolating the model's generalization-fitting behavior gap on queried samples. IAM achieves strong performance in binary inclusion tests for exact unlearning and high correlation for approximate unlearning--scalable to LLMs using just one pre-trained shadow model. We theoretically analyze how IAM's scoring mechanism maintains performance efficiently. We then apply IAM to recent approximate unlearning algorithms, revealing general risks of both over-unlearning and under-unlearning, underscoring the need for stronger safeguards in approximate unlearning systems. The code is available at https://github.com/Happy2Git/Unlearning_Inference_IAM.

  • Robust and Efficient AI-Based Attack Recovery in Autonomous Drones

    2025-10-21

    book-chapter · Open access · Senior author

    We introduce an autonomous attack recovery architecture to add common sense reasoning to plan a recovery action after an attack is detected. We outline use-cases of our architecture using drones, and then discuss how to implement this architecture efficiently and securely in edge devices.

  • AUTHNET: Neural Network with Integrated Authentication Logic

    Frontiers in Artificial Intelligence and Applications · 2025-10-21 · 1 citation

    book-chapter · Open access

    Model stealing, i.e., unauthorized access and exfiltration of deep learning models, has emerged as a significant security threat. The misuse and illegal replication of models pose major risks to financial assets and competitive advantage. Traditional protection methods, such as model watermarking, are passive and challenging to enforce, while active defenses often face limitations in terms of efficiency and the security required for widespread deployment. To this end, we propose a native authentication mechanism, called AUTHNET, which integrates authentication logic as part of the model without any additional structures. Our key insight is to reuse redundant neurons with low activation and embed authentication bits in an intermediate layer, called a gate layer. Then, AUTHNET fine-tunes the layers after the gate layer to embed authentication logic so that only inputs with the secret key can trigger the correct logic of AUTHNET. It provides the last line of defense, i.e., even if exfiltrated, the model is not usable as the adversary cannot generate valid inputs without the key. We theoretically demonstrate the high sensitivity of AUTHNET to the secret key, which means that precise key provision is essential for achieving good performance of AUTHNET. AUTHNET is compatible with any convolutional neural network, where our extensive evaluations show that AUTHNET successfully achieves the goal of rejecting unauthenticated users (whose average accuracy drops to 22.03%) with a trivial accuracy decrease (1.18% on average) for legitimate users, and is robust against adaptive attacks, providing efficient and lightweight protection.

  • Autonomous Data Scientist (ADaS): A Practical Example of the Future Role of AI in Cybersecurity

    2025-09-15

    article

    The volume of network traffic and elusiveness of modern cyber threats challenge the ability of cyber analysts to identify anomalous behavior in networks. In this work, we propose an autonomous agent to enhance analysts’ detection capabilities. To this end, we designed an agent to analyze feature-rich network traffic by autonomously executing the Data Analytic Development Process: (1) data engineering–prepares datasets, (2) feature engineering–selects the most representative features, (3) model engineering–chooses the best algorithm-feature combination, and (4) analytic deployment–applies the optimized analytic to the dataset and identifies anomalous behavior. We refer to this agent as an Autonomous Data Scientist (ADaS), utilizing reinforcement learning as an orchestrator to determine the optimal combination of features and unsupervised clustering algorithm. ADaS operates without labeled training data, and results on popular public datasets (NB15, IoT-23, and KDD’99) demonstrate high detection and low false alarm rates, making it reliable and novel for anomaly detection.

  • Follow My Flow: Unveiling Client-Side Prototype Pollution Gadgets from One Million Real-World Websites

    2025-05-12

    article · Senior author

    Prototype pollution vulnerability often has further consequences—such as Cross-site Scripting (XSS) and cookie manipulation—that are achieved via so-called gadgets, i.e., code snippets that change the control- or data-flow of a victim program for malicious purposes. Prior works face challenges in finding prototype pollution gadgets for such consequences because the control- or data-flow change sometimes needs the injection of complex property values to replace existing undefined ones through prototype pollution, which may not be seen before or cannot be solved by existing constraint solvers. In this paper, we design a dynamic analysis framework, called Gala, to automatically detect client-side prototype pollution gadgets among real-world websites, and implement an open-source version of Gala. Our key insight is to borrow existing defined values on non-vulnerable websites to victim ones where such values are undefined, thus guiding the property injection to flow to the sinks in gadgets. Our evaluation of Gala against one-million websites reveals 133 zero-day gadgets that are not found by prior works. For example, one gadget was from Meta's software and another from the Vue framework. Both have acknowledged and fixed it, with Meta rewarding us a bug bounty and Vue assigning CVE-2024-6783. Our evaluation also shows that 23 websites with prototype pollution vulnerabilities—which do not have further consequences as reported by prior works—have consequences due to gadgets found by Gala. In addition to the Meta and Vue gadgets, we also responsibly disclosed all the zero-day gadgets and those newly-discovered prototype pollution consequences to their developers.
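For readers unfamiliar with the terms in the abstract above, the following is a minimal sketch of what a prototype pollution "gadget" looks like. It is not Gala and does not reproduce the paper's analysis; the names `unsafeMerge`, `renderBanner`, and the `template` property are hypothetical and chosen only for illustration. The sketch assumes a naive recursive merge as the pollution source and a function that reads a normally undefined property as the gadget.

```typescript
type Dict = Record<string, unknown>;

function isDict(v: unknown): v is Dict {
  return typeof v === "object" && v !== null;
}

// Hypothetical vulnerable helper: a naive recursive merge that does not
// filter the "__proto__" key.
function unsafeMerge(target: Dict, source: Dict): Dict {
  for (const key of Object.keys(source)) {
    const value = source[key];
    if (isDict(value)) {
      // For key "__proto__", this read returns Object.prototype itself.
      let child = target[key];
      if (!isDict(child)) {
        child = {};
        target[key] = child;
      }
      unsafeMerge(child as Dict, value);
    } else {
      target[key] = value;
    }
  }
  return target;
}

// Hypothetical gadget: `template` is normally undefined, so the default is
// used. Once Object.prototype is polluted, the lookup finds the injected value.
function renderBanner(options: { template?: string }): string {
  return options.template ?? "<b>Welcome</b>";
}

function demo(): { before: string; after: string } {
  const before = renderBanner({});
  // JSON.parse stores "__proto__" as an ordinary own property, so the merge
  // loop sees it as a key and walks into Object.prototype.
  const payload = JSON.parse(
    '{"__proto__": {"template": "<img src=x onerror=alert(1)>"}}'
  ) as Dict;
  unsafeMerge({}, payload);
  const after = renderBanner({});
  // Undo the pollution so the rest of the program is unaffected.
  delete (Object.prototype as unknown as Dict).template;
  return { before, after };
}
```

Calling `demo()` returns the default markup in `before` and the injected markup in `after`, which illustrates the abstract's point that a gadget turns a previously undefined property into attacker-controlled data flowing to a sink.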

Education

  • Ph.D., Computer Science

    Northwestern University

    2014
  • B.Eng., Electronic Engineering

    Tsinghua University

    2008

Awards & honors

  • NSF CAREER Award (2021)
  • DARPA Young Faculty Award (2022)
  • Amazon Research Awards (2022)
  • Amazon Research Awards (2017)
  • Distinguished Paper Award at IEEE Security and Privacy