Neil Gong

Associate Professor of Electrical and Computer Engineering · Verified

Duke University · Electrical and Computer Engineering

Active 2011–2026

h-index: 43
Citations: 6.4k
Papers: 230 (153 in the last 5 years)
Funding: $2.3M (3 active)

About

Neil Zhenqiang Gong's webpage lists his current and former group members, including Ph.D. students, postdocs, master's students, and undergraduate students. His students have worked on topics such as safe and secure generative AI systems, privacy protection via adversarial examples, secure federated learning, and digital forensic artifacts on Android devices. Many of his former students have become research scientists at technology companies such as Google DeepMind and Meta, or tenure-track assistant professors at universities. These trajectories reflect his mentorship in security, privacy, and AI systems. The page text itself does not describe his own research focus, background, or key contributions.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Computer Security
  • Data Mining
  • Machine Learning
  • Distributed computing
  • Mathematics
  • Theoretical computer science
  • History
  • Archaeology
  • Econometrics

Selected publications

  • ObliInjection: Order-Oblivious Prompt Injection Attack to LLM Agents with Multi-source Data

    2026-01-01

    preprint · Open access · Senior author

    Prompt injection attacks aim to contaminate the input data of an LLM to mislead it into completing an attacker-chosen task instead of the intended task. In many applications and agents, the input data originates from multiple sources, with each source contributing a segment of the overall input. In these multi-source scenarios, an attacker may control only a subset of the sources and contaminate the corresponding segments, but typically does not know the order in which the segments are arranged within the input. Existing prompt injection attacks either assume that the entire input data comes from a single source under the attacker's control or ignore the uncertainty in the ordering of segments from different sources. As a result, their success is limited in domains involving multi-source data. In this work, we propose ObliInjection, the first prompt injection attack targeting LLM applications and agents with multi-source input data. ObliInjection introduces two key technical innovations: the order-oblivious loss, which quantifies the likelihood that the LLM will complete the attacker-chosen task regardless of how the clean and contaminated segments are ordered; and the orderGCG algorithm, which is tailored to minimize the order-oblivious loss and optimize the contaminated segments. Comprehensive experiments across three datasets spanning diverse application domains and twelve LLMs demonstrate that ObliInjection is highly effective, even when only one out of 6-100 segments in the input data is contaminated. Our code and data are available at: https://github.com/ReachalWang/ObliInjection.

  • PromptLocate: Localizing Prompt Injection Attacks

    ArXiv.org · 2025-10-14

    preprint · Open access · Senior author

    Prompt injection attacks deceive a large language model into completing an attacker-specified task instead of its intended task by contaminating its input data with an injected prompt, which consists of injected instruction(s) and data. Localizing the injected prompt within contaminated data is crucial for post-attack forensic analysis and data recovery. Despite its growing importance, prompt injection localization remains largely unexplored. In this work, we bridge this gap by proposing PromptLocate, the first method for localizing injected prompts. PromptLocate comprises three steps: (1) splitting the contaminated data into semantically coherent segments, (2) identifying segments contaminated by injected instructions, and (3) pinpointing segments contaminated by injected data. We show PromptLocate accurately localizes injected prompts across eight existing and eight adaptive attacks.

  • Mitigating Watermark Forgery in Generative Models via Randomized Key Selection

    ArXiv.org · 2025-07-10

    preprint · Open access

    Watermarking enables GenAI providers to verify whether content was generated by their models. A watermark is a hidden signal in the content, whose presence can be detected using a secret watermark key. A core security threat is forgery, where adversaries insert the provider's watermark into content not produced by the provider, potentially damaging the provider's reputation and undermining trust. Existing defenses resist forgery by embedding many watermarks with multiple keys into the same content, which can degrade model utility. However, forgery remains a threat when attackers can collect sufficiently many watermarked samples. We propose a defense that is provably forgery-resistant independent of the number of watermarked samples collected by the attacker, provided they cannot easily distinguish watermarks from different keys. Our scheme does not further degrade model utility. We randomize the watermark key selection for each query and accept content as genuine only if a watermark is detected by exactly one key. We focus on the image and text modalities, but our defense is modality-agnostic, since it treats the underlying watermarking method as a black box. Our method provably bounds the attacker's success rate, and we empirically observe a reduction from near-perfect success rates to only 2% at negligible computational overhead.

  • Securing Visually-Aware Recommender Systems: An Adversarial Image Reconstruction and Detection Framework

    ACM Transactions on Management Information Systems · 2025-06-06

    article · Open access

    With rich visual data, such as images, becoming readily associated with items, visually-aware recommendation systems (VARS) have been widely used in different applications. Recent studies have shown that VARS are vulnerable to item-image adversarial attacks, which add human-imperceptible perturbations to the clean images associated with those items. Attacks on VARS pose new security challenges to a wide range of applications, such as e-commerce and social media, where VARS are widely used. How to secure VARS from such adversarial attacks becomes a critical problem. Currently, there is still a lack of systematic studies on how to design defense strategies against visual attacks on VARS. In this article, we attempt to fill this gap by proposing an adversarial image denoising and detection framework to secure VARS. Our proposed method can simultaneously (1) secure VARS from adversarial attacks characterized by local perturbations by image denoising based on global vision transformers; and (2) accurately detect adversarial examples using a novel contrastive learning approach. Meanwhile, our framework is designed to be used as both a filter and a detector so that they can be jointly trained to improve the flexibility of our defense strategy to a variety of attacks and VARS models. Our approach is uniquely tailored for VARS, addressing the distinct challenges in scenarios where adversarial attacks can differ across industries, for instance, causing misclassification in e-commerce or misrepresentation in real estate. We have conducted extensive experimental studies with two popular attack methods (FGSM and PGD). Our experimental results on two real-world datasets show that our defense strategy against visual attacks is effective and outperforms existing methods on different attacks. Moreover, our method demonstrates high accuracy in detecting adversarial examples, complementing its robustness across various types of adversarial attacks.

  • Model Poisoning Attacks to Federated Learning via Multi-Round Consistency

    2025-06-10 · 15 citations

    article · Senior author

    Model poisoning attacks are critical security threats to Federated Learning (FL). Existing model poisoning attacks suffer from two key limitations: 1) they achieve suboptimal effectiveness when defenses are deployed, and/or 2) they require knowledge of the model updates or local training data on genuine clients. In this work, we make a key observation that their suboptimal effectiveness arises from only leveraging model-update consistency among malicious clients within individual training rounds, making the attack effect self-cancel across training rounds. In light of this observation, we propose PoisonedFL, which enforces multi-round consistency among the malicious clients’ model updates while not requiring any knowledge about the genuine clients. Our empirical evaluation on five benchmark datasets shows that PoisonedFL breaks eight state-of-the-art defenses and outperforms seven existing model poisoning attacks. Our study shows that FL systems are considerably less robust than previously thought, underlining the urgency for the development of new defense mechanisms. Our source code is available at https://github.com/xyq7/PoisonedFL/.

  • PIShield: Detecting Prompt Injection Attacks via Intrinsic LLM Features

    ArXiv.org · 2025-10-15

    preprint · Open access

    LLM-integrated applications are vulnerable to prompt injection attacks, where an attacker contaminates the input to inject malicious instructions, causing the LLM to follow the attacker's intent instead of the original user's. Existing prompt injection detection methods often have sub-optimal performance and/or high computational overhead. In this work, we propose PIShield, an effective and efficient detection method based on the observation that instruction-tuned LLMs internally encode distinguishable signals for prompts containing injected instructions. PIShield leverages residual-stream representations and a simple linear classifier to detect prompt injection, without expensive model fine-tuning or response generation. We conduct extensive evaluations on a diverse set of short- and long-context benchmarks. The results show that PIShield consistently achieves low false positive and false negative rates, significantly outperforming existing baselines. These findings demonstrate that internal representations of instruction-tuned LLMs provide a powerful and practical foundation for prompt injection detection in real-world applications.

  • GRID: Protecting Training Graph from Link Stealing Attacks on GNN Models

    2025-05-12 · 2 citations

    article

    Graph neural networks (GNNs) have exhibited superior performance in various classification tasks on graph-structured data. However, they are vulnerable to link stealing attacks, which infer the presence of a link between two nodes by measuring the similarity of the two nodes' prediction vectors produced by a GNN model. Such attacks pose severe security and privacy threats to the training graph used in GNN models. In this work, we propose a novel solution, called Graph Link Disguise (GRID), to defend against link stealing attacks with a formal guarantee of GNN model utility for retaining prediction accuracy. The key idea of GRID is to add carefully crafted noise to the nodes' prediction vectors to disguise adjacent nodes as n-hop indirect neighboring nodes. We take into account the graph topology and select only a subset of nodes (called core nodes) covering all links for adding noise, which prevents the noise from canceling out and has the further advantages of reducing both the distortion loss and the computation cost. Our crafted noise ensures that 1) the noisy prediction vectors of any two adjacent nodes have a similarity level like that of two non-adjacent nodes and 2) the model prediction is unchanged, ensuring zero utility loss. Extensive experiments on five datasets show the effectiveness of our proposed GRID solution against different representative link stealing attacks under transductive and inductive settings, as well as two influence-based attacks. Meanwhile, it achieves a much better privacy-utility trade-off than existing methods when extended to GNNs.

  • WAInjectBench: Benchmarking Prompt Injection Detections for Web Agents

    ArXiv.org · 2025-10-01

    preprint · Open access · Senior author

    Multiple prompt injection attacks have been proposed against web agents. At the same time, various methods have been developed to detect general prompt injection attacks, but none have been systematically evaluated for web agents. In this work, we bridge this gap by presenting the first comprehensive benchmark study on detecting prompt injection attacks targeting web agents. We begin by introducing a fine-grained categorization of such attacks based on the threat model. We then construct datasets containing both malicious and benign samples: malicious text segments generated by different attacks, benign text segments from four categories, malicious images produced by attacks, and benign images from two categories. Next, we systematize both text-based and image-based detection methods. Finally, we evaluate their performance across multiple scenarios. Our key findings show that while some detectors can identify attacks that rely on explicit textual instructions or visible image perturbations with moderate to high accuracy, they largely fail against attacks that omit explicit instructions or employ imperceptible perturbations. Our datasets and code are released at: https://github.com/Norrrrrrr-lyn/WAInjectBench.

  • DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks

    2025-05-12 · 5 citations

    article · Senior author

    LLM-integrated applications and agents are vulnerable to prompt injection attacks, where an attacker injects prompts into their inputs to induce attacker-desired outputs. A detection method aims to determine whether a given input is contaminated by an injected prompt. However, existing detection methods have limited effectiveness against state-of-the-art attacks, let alone adaptive ones. In this work, we propose DataSentinel, a game-theoretic method to detect prompt injection attacks. Specifically, DataSentinel fine-tunes an LLM to detect inputs contaminated with injected prompts that are strategically adapted to evade detection. We formulate this as a minimax optimization problem, with the objective of fine-tuning the LLM to detect strong adaptive attacks. Furthermore, we propose a gradient-based method to solve the minimax optimization problem by alternating between the inner max and outer min problems. Our evaluation results on multiple benchmark datasets and LLMs show that DataSentinel effectively detects both existing and adaptive prompt injection attacks. Our code and data are available at: https://github.com/liu00222/Open-Prompt-Injection.

  • Enhancing Prompt Injection Attacks to LLMs via Poisoning Alignment

    2025-10-13 · 1 citation

    article · Open access · Senior author

    Prompt injection attacks, in which an attacker injects a prompt into the original one so that a Large Language Model (LLM) follows the injected prompt and performs an attacker-chosen task, represent a critical security threat. Existing attacks primarily focus on crafting these injections at inference time, treating the LLM itself as a static target. Our experiments show that these attacks achieve some success, but there is still significant room for improvement. In this work, we introduce a more foundational attack vector: poisoning the LLM’s alignment process to amplify the success of future prompt injection attacks. Specifically, we propose PoisonedAlign, a method that strategically creates poisoned alignment samples to poison an LLM’s alignment dataset. Our experiments across five LLMs and two alignment datasets show that when even a small fraction of the alignment data is poisoned, the resulting model becomes substantially more vulnerable to a wide range of prompt injection attacks. Crucially, this vulnerability is instilled while the LLM’s performance on standard capability benchmarks remains largely unchanged, making the manipulation difficult to detect through automated, general-purpose performance evaluations. The code for implementing the attack is at https://github.com/Sadcardation/PoisonedAlign.
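
The acceptance rule described in the abstract of "Mitigating Watermark Forgery in Generative Models via Randomized Key Selection" above (pick a random key per query, accept content only if exactly one key detects a watermark) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `embed` and `detect` callables, the function names, and the toy tag-based watermark are all hypothetical stand-ins for a black-box watermarking method.

```python
import secrets
from typing import Callable, Sequence

# Hypothetical black-box interfaces; the abstract does not name any API.
Embed = Callable[[str, int], str]    # (content, key) -> watermarked content
Detect = Callable[[str, int], bool]  # (content, key) -> is a watermark present?


def generate(content: str, keys: Sequence[int], embed: Embed) -> str:
    """Watermark one query's output with a key chosen uniformly at random."""
    key = secrets.choice(keys)
    return embed(content, key)


def is_genuine(content: str, keys: Sequence[int], detect: Detect) -> bool:
    """Accept content only if exactly one key detects a watermark."""
    hits = sum(1 for key in keys if detect(content, key))
    return hits == 1


# Toy watermark for illustration only: a visible tag instead of a hidden signal.
def toy_embed(content: str, key: int) -> str:
    return f"{content}[wm:{key}]"


def toy_detect(content: str, key: int) -> bool:
    return f"[wm:{key}]" in content


if __name__ == "__main__":
    keys = [101, 202, 303]
    print(is_genuine(generate("hello", keys, toy_embed), keys, toy_detect))
    print(is_genuine("hello[wm:101][wm:202]", keys, toy_detect))
```

In this toy setup, content carrying no watermark or carrying watermarks from several keys at once is rejected, which mirrors the intuition that a forger who mixes samples produced under different keys cannot isolate a single one.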
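
The link stealing attack that the GRID abstract above defends against infers a link from the similarity of two nodes' prediction vectors. A minimal sketch of that idea is given below, assuming cosine similarity and a fixed threshold; both choices, along with the function names, are illustrative assumptions, since the abstract says only "similarity".

```python
import math
from typing import Sequence


def cosine_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Cosine similarity of two prediction vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def predict_link(p: Sequence[float], q: Sequence[float], threshold: float = 0.9) -> bool:
    """Guess that two nodes are linked when their prediction vectors are similar.

    The threshold is an assumed value chosen for illustration.
    """
    return cosine_similarity(p, q) >= threshold


if __name__ == "__main__":
    # Similar class-probability vectors suggest adjacent nodes.
    print(predict_link([0.9, 0.1], [0.85, 0.15]))
    # Dissimilar vectors suggest non-adjacent nodes.
    print(predict_link([0.9, 0.1], [0.1, 0.9]))
```

A defense in the spirit of GRID would perturb the prediction vectors so that adjacent nodes no longer pass such a similarity test while each node's predicted class stays the same.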

Awards & honors

  • NSF CAREER Award (2018)
  • Army Research Office Young Investigator Program (YIP) Award…
  • Rising Star Award from the Association of Chinese Scholars i…
  • IBM Faculty Award (2020, 2023)
  • Facebook Research Award (2021)