Resume-aware faculty matching

Find professors who actually fit you

Upload your resume. Four AI agents analyze your background, rank the faculty who fit, inspect their recent research, and help you draft outreach — grounded in their actual work, not templates.

  • Free to start
  • No credit card
  • Cancel anytime
Top matches (Balanced preset)

  • Dr. Sarah Chen · Stanford · Interpretability · NLP · 91
  • Dr. Marcus Holloway · MIT · Robotics · RL · 84
  • Dr. Aisha Okonkwo · CMU · Fairness · HCI · 82
Hanghang Tong

Professor · Verified

University of Illinois Urbana-Champaign · Computer Science

Active 2004–2026

h-index: 58
Citations: 18.1k
Papers: 534 (261 in the last 5 years)
Funding: $1.8M · 1 active

About

The main focus of IDEA Lab@UIUC is large-scale data mining, machine learning, and AI, especially for graph and multimedia data, with applications to social network analysis, healthcare, cybersecurity, cyber-physical systems, agriculture, and e-commerce.

Research topics

  • Computer Science
  • Machine Learning
  • Artificial Intelligence
  • Data Mining
  • Theoretical Computer Science
  • Econometrics
  • Operations Management
  • Physics
  • Economics

Selected publications

  • Code as Agent Harness

    arXiv.org · 2026-05-18

    Article · Open access

    Recent large language models (LLMs) have demonstrated strong capabilities in understanding and generating code, from competitive programming to repository-level software engineering. In emerging agentic systems, code is no longer only a target output. It increasingly serves as an operational substrate for agent reasoning, acting, environment modeling, and execution-based verification. We frame this shift through the lens of agent harnesses and introduce code as agent harness: a unified view that centers code as the basis for agent infrastructure. To systematically study this perspective, we organize the survey around three connected layers. First, we study the harness interface, where code connects agents to reasoning, action, and environment modeling. Second, we examine harness mechanisms: planning, memory, and tool use for long-horizon execution, together with feedback-driven control and optimization that make harness reliable and adaptive. Third, we discuss scaling the harness from single-agent systems to multi-agent settings, where shared code artifacts support multi-agent coordination, review, and verification. Across these layers, we summarize representative methods and practical applications of code as agent harness, spanning coding assistants, GUI/OS automation, embodied agents, scientific discovery, personalization and recommendation, DevOps, and enterprise workflows. We further outline open challenges for harness engineering, including evaluation beyond final task success, verification under incomplete feedback, regression-free harness improvement, consistent shared state across multiple agents, human oversight for safety-critical actions, and extensions to multimodal environments. By centering code as the harness of agentic AI, this survey provides a unified roadmap toward executable, verifiable, and stateful AI agent systems.

  • Mixture of Sequence: Theme-Aware Mixture-of-Experts for Long-Sequence Recommendation

    2026-04-12

    Article · Senior author

  • Few-Shot Knowledge Graph Completion via Transfer Knowledge from Similar Tasks

    2025-11-10

    Article · Open access · Senior author

    Knowledge graphs (KGs) are essential in many AI applications but often suffer from incompleteness, limiting their utility. Many relations in KGs have only a few examples, making it challenging to train accurate models. Few-shot learning offers a promising direction by enabling KG completion with only a small number of training triplets. However, most existing approaches treat each relation independently and fail to leverage shared information across tasks. In this paper, we introduce TransNet, a transfer learning method for few-shot KG completion that captures task relationships and reuses knowledge from related tasks. TransNet further incorporates meta-learning to effectively handle unseen relations. Experiments on standard benchmarks demonstrate that TransNet achieves strong performance compared to prior methods. Code and data will be released upon acceptance.

  • Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum

    arXiv.org · 2025-10-01

    Preprint · Open access · Senior author

    Supervised fine-tuning (SFT) is the standard approach for post-training large language models (LLMs), yet it often shows limited generalization. We trace this limitation to its default training objective: negative log likelihood (NLL). While NLL is classically optimal when training from scratch, post-training operates in a different paradigm and could violate its optimality assumptions, where models already encode task-relevant priors and supervision can be long and noisy. Rather than proposing a single universally superior replacement loss, we systematically study various probability-based objectives and characterize when and why different objectives succeed or fail under varying conditions. Through comprehensive experiments and extensive ablation studies across 8 model backbones, 27 benchmarks, and 7 domains, we uncover a critical dimension that governs objective behavior: the model-capability continuum. Near the model-strong end, prior-leaning objectives that downweight low-probability tokens (e.g., $-p$, $-p^{10}$, thresholded variants) consistently outperform NLL; toward the model-weak end, NLL dominates; in between, no single objective prevails. Our theoretical analysis further elucidates how objectives trade places across the continuum, providing a principled foundation for adapting objectives to model capability. The code is provided at https://github.com/GaotangLi/Beyond-Log-Likelihood.

  • InterFormer: Effective Heterogeneous Interaction Learning for Click-Through Rate Prediction

    2025-11-08

    Article · Open access
  • Embracing Plasticity: Balancing Stability and Plasticity in Continual Recommender Systems

    2025-07-13 · 1 citation

    Article · Open access · Senior author

    In the era of big data and AI, recommender systems must adapt to evolving user preferences and new users/items to maintain high-quality recommendations. Fine-tuning, which updates model parameters using only new data, offers an efficient alternative to full retraining but struggles to balance stability (retaining past knowledge) and plasticity (adapting to new knowledge). While existing methods prioritize stability to address catastrophic forgetting, we argue that plasticity must also be explicitly strengthened, especially for users with rapidly changing preferences. In this work, we propose PlastIcity and StAbility balancing continual recommender systems (PISA), a novel framework that adaptively balances stability and plasticity based on user preference shifts. PISA quantifies preference shifts as changes in user distances to item clusters, and then guides user embeddings by prioritizing stability for stable users and plasticity for dynamic users. To achieve this, PISA leverages backward knowledge from the previous model and forward knowledge from fine-tuning on current data. During training, PISA maximizes mutual information between user-specific parameters and the relevant reference knowledge. Theoretically, we show that enhancing plasticity mitigates distribution shifts more effectively than fine-tuning alone. Empirically, extensive experiments on three real-world datasets validate PISA's superiority over existing methods and highlight the contributions of its components.

  • Hephaestus: Mixture Generative Modeling with Energy Guidance for Large-scale QoS Degradation

    arXiv.org · 2025-10-19

    Preprint · Open access

    We study the Quality of Service Degradation (QoSD) problem, in which an adversary perturbs edge weights to degrade network performance. This setting arises in both network infrastructures and distributed ML systems, where communication quality, not just connectivity, determines functionality. While classical methods rely on combinatorial optimization, and recent ML approaches address only restricted linear variants with small-size networks, no prior model directly tackles the QoSD problem under nonlinear edge-weight functions. This work proposes PIMMA, a self-reinforcing generative framework that synthesizes feasible solutions in latent space, to fill this gap. Our method includes three phases: (1) Forge: a Predictive Path-Stressing (PPS) algorithm that uses graph learning and approximation to produce feasible solutions with performance guarantee, (2) Morph: a new theoretically grounded training paradigm for Mixture of Conditional VAEs guided by an energy-based model to capture solution feature distributions, and (3) Refine: a reinforcement learning agent that explores this space to generate progressively near-optimal solutions using our designed differentiable reward function. Experiments on both synthetic and real-world networks show that our approach consistently outperforms classical and ML baselines, particularly in scenarios with nonlinear cost functions where traditional methods fail to generalize.

  • Adversarial Bias: Data Poisoning Attacks on Fairness

    arXiv.org · 2025-11-11

    Preprint · Open access · Senior author

    With the growing adoption of AI and machine learning systems in real-world applications, ensuring their fairness has become increasingly critical. The majority of the work in algorithmic fairness focuses on assessing and improving the fairness of machine learning systems. There is relatively little research on fairness vulnerability, i.e., how an AI system's fairness can be intentionally compromised. In this work, we first provide a theoretical analysis demonstrating that a simple adversarial poisoning strategy is sufficient to induce maximally unfair behavior in naive Bayes classifiers. Our key idea is to strategically inject a small fraction of carefully crafted adversarial data points into the training set, biasing the model's decision boundary to disproportionately affect a protected group while preserving generalizable performance. To illustrate the practical effectiveness of our method, we conduct experiments across several benchmark datasets and models. We find that our attack significantly outperforms existing methods in degrading fairness metrics across multiple models and datasets, often achieving substantially higher levels of unfairness with a comparable or only slightly worse impact on accuracy. Notably, our method proves effective on a wide range of models, in contrast to prior work, demonstrating a robust and potent approach to compromising the fairness of machine learning systems.

  • Imposing the 'Right' Structural Constraints in High-Dimensional Regression

    Harvard Data Science Review · 2025-10-16

    Article · Open access · 1st author · Corresponding author

Labs

  • IDEA Lab@UIUC · PI

    Large-scale data mining, machine learning, and AI, especially for graph and multimedia data, with applications to social network analysis, healthcare, cybersecurity, cyber-physical systems, agriculture, and e-commerce.

Education

  • Ph.D.

    University of Illinois at Urbana-Champaign

  • M.S.

    University of Illinois at Urbana-Champaign

  • B.S.

    University of Illinois at Urbana-Champaign

Awards & honors

  • ACM Fellow, 2025
  • Senior Member, AAAI, 2025
  • University Scholar, UIUC, 2024
  • IEEE ICDM 2022 10-Year Highest Impact Paper Award, 2022
  • Springer Knowl. Inf. Syst. (KAIS) on “Best-ranked paper of I…

See your match with Hanghang Tong

PhdFit ranks faculty by how well they fit your research interests, methods, and publications, grounded in their actual work rather than templates.

  • Resume-aware match score
  • Save to shortlist
  • AI-drafted outreach

  • Free to start
  • No credit card
  • 30-second signup