Heng Ji

Professor

University of Illinois Urbana-Champaign · Computer Science

Active 2002–2026

h-index: 40

Citations: 6.3k

Papers: 364 (224 in the last 5 years)

Funding: $771k

Research topics

Artificial Intelligence
Computer Science
Natural Language Processing
Information Retrieval
Philosophy
Machine Learning
Linguistics
Epistemology
Data Mining
Sociology
Engineering
Psychology
Humanities
Political Science
Human–computer interaction
Art
Ecology
Biology
Library science
Algorithm
World Wide Web
Communication
Cognitive science
Data science

Selected publications

Must Read: A Comprehensive Survey of Computational Persuasion
ACM Computing Surveys · 2026-03-19 · 2 citations
Article · Open access
Persuasion is a fundamental aspect of communication, influencing decision-making across diverse contexts, from everyday conversations to high-stakes scenarios such as politics, marketing, and law. The rise of conversational Artificial Intelligence (AI) systems has significantly expanded the scope of persuasion, introducing both opportunities and risks. AI-driven persuasion can be leveraged for beneficial applications, but also poses threats through unethical influence. Moreover, AI systems are not only persuaders, but also susceptible to persuasion, making them vulnerable to adversarial attacks and bias reinforcement. Despite rapid advancements in AI-generated persuasive content, our understanding of what makes persuasion effective remains limited due to its inherently subjective and context-dependent nature. In this survey, we provide a comprehensive overview of persuasion, structured around three key perspectives: (1) AI as a Persuader, which explores AI-generated persuasive content and its applications; (2) AI as a Persuadee, which examines AI's susceptibility to influence and manipulation; and (3) AI as a Persuasion Judge, which analyzes AI's role in evaluating persuasive strategies, detecting manipulation, and ensuring ethical persuasion. We introduce a taxonomy for persuasion research and discuss key challenges for future research to enhance the safety, fairness, and effectiveness of AI-powered persuasion while addressing the risks posed by increasingly capable language models.
oMeBench: Towards Robust Benchmarking of LLMs in Organic Mechanism Elucidation and Reasoning
arXiv.org · 2025-10-09
Preprint · Open access · Senior author
Organic reaction mechanisms are the stepwise elementary reactions by which reactants form intermediates and products, and are fundamental to understanding chemical reactivity and designing new molecules and reactions. Although large language models (LLMs) have shown promise in understanding chemical tasks such as synthesis design, it is unclear to what extent this reflects genuine chemical reasoning capabilities, i.e., the ability to generate valid intermediates, maintain chemical consistency, and follow logically coherent multi-step pathways. We address this by introducing oMeBench, the first large-scale, expert-curated benchmark for mechanism reasoning in organic chemistry. It comprises over 10,000 annotated mechanistic steps with intermediates, type labels, and difficulty ratings. Furthermore, to evaluate LLM capability more precisely and enable fine-grained scoring, we propose oMeS, a dynamic evaluation framework that combines step-level logic and chemical similarity. We analyze the performance of state-of-the-art LLMs, and our results show that although current models display promising chemical intuition, they struggle with correct and consistent multi-step reasoning. Notably, we find that using a prompting strategy and fine-tuning a specialist model on our proposed dataset increases performance by 50% over the leading closed-source model. We hope that oMeBench will serve as a rigorous foundation for advancing AI systems toward genuine chemical reasoning.
Community Moderation and the New Epistemology of Fact Checking on Social Media
TUbiblio (Technical University of Darmstadt) · 2025-05-26 · 2 citations
Preprint · Open access
Social media platforms have traditionally relied on internal moderation teams and partnerships with independent fact-checking organizations to identify and flag misleading content. Recently, however, platforms including X (formerly Twitter) and Meta have shifted towards community-driven content moderation by launching their own versions of crowd-sourced fact-checking: Community Notes. If effectively scaled and governed, such crowd-checking initiatives have the potential to combat misinformation with increased scale and speed, as successfully as community-driven efforts once did with spam. Nevertheless, general content moderation, especially for misinformation, is inherently more complex. Public perceptions of truth are often shaped by personal biases, political leanings, and cultural contexts, complicating consensus on what constitutes misleading content. This suggests that community efforts, while valuable, cannot replace the indispensable role of professional fact-checkers. Here we systematically examine the current approaches to misinformation detection across major platforms, explore the emerging role of community-driven moderation, and critically evaluate both the promises and challenges of crowd-checking at scale.
From Talk to Triage: Pluralism is Necessary but Not Sufficient for AI Alignment
2025-10-10
Article · Open access
As AI systems become both more powerful and more prevalent, ensuring that their actions align with human values is paramount. AI alignment is thus an interdisciplinary challenge: not only a technical problem for computer science but one with important ties to the psychology of moral values, decision-making, and trust. Early work identified a static set of universal values, without considering the key questions of to whom and to which values AI should be aligned. This perspective paper challenges the notion of universal alignment and instead argues for dynamic, context-specific alignability across different domains, tasks, and users. Specifically, we emphasize the need to go beyond traditional pluralism and rethink how AI alignment can be achieved through a qualitative and quantitative research process that involves identifying context-specific values, developing alignable AI algorithms using limited human feedback, and evaluating alignment by assessing both an AI's values and its actions, while considering how humans trust and delegate to the AI. We discuss several paths forward for our proposed framework, including the potential ethical and societal implications of context-specific alignability, and draw on examples ranging from chatbots to value-aligned decision-making in the medical triage domain.
Variational Supervised Contrastive Learning
arXiv.org · 2025-06-09
Preprint · Open access
Contrastive learning has proven highly efficient and adaptable in shaping representation spaces across diverse modalities by pulling similar samples together and pushing dissimilar ones apart. However, two key limitations persist: (1) without explicit regulation of the embedding distribution, semantically related instances can inadvertently be pushed apart unless complementary signals guide pair selection; and (2) excessive reliance on large in-batch negatives and tailored augmentations hinders generalization. To address these limitations, we propose Variational Supervised Contrastive Learning (VarCon), which reformulates supervised contrastive learning as variational inference over latent class variables and maximizes a posterior-weighted evidence lower bound (ELBO) that replaces exhaustive pairwise comparisons with efficient class-aware matching and grants fine-grained control over intra-class dispersion in the embedding space. In experiments trained exclusively on image data (CIFAR-10, CIFAR-100, ImageNet-100, and ImageNet-1K), VarCon (1) achieves state-of-the-art performance among contrastive learning frameworks, reaching 79.36% Top-1 accuracy on ImageNet-1K and 78.29% on CIFAR-100 with a ResNet-50 encoder while converging in just 200 epochs; (2) yields substantially clearer decision boundaries and semantic organization in the embedding space, as evidenced by KNN classification, hierarchical clustering results, and transfer-learning assessments; and (3) outperforms the supervised baseline in few-shot learning and shows superior robustness across various augmentation strategies. Our code is available at https://github.com/ziwenwang28/VarContrast.
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
arXiv.org · 2025-04-26
Preprint · Open access
Large Language Models (LLMs) are advancing rapidly and have become indispensable across academia, industry, and daily applications. To keep pace with these developments, this survey probes the core challenges that the rise of LLMs poses for evaluation. We identify and analyze two pivotal transitions: (i) from task-specific to capability-based evaluation, which reorganizes benchmarks around core competencies such as knowledge, reasoning, instruction following, multi-modal understanding, and safety; and (ii) from manual to automated evaluation, encompassing dynamic dataset curation and "LLM-as-a-judge" scoring. Yet, even with these transitions, a crucial obstacle persists: the evaluation generalization issue. Bounded test sets cannot scale alongside models whose abilities grow seemingly without limit. We dissect this issue, along with the core challenges of the two transitions above, from the perspectives of methods, datasets, evaluators, and metrics. Because this field is evolving quickly, we will maintain a living GitHub repository (links are in each section) to crowd-source updates and corrections, and we warmly invite contributors and collaborators.
A Survey on Post-training of Large Language Models
arXiv.org · 2025-03-08 · 1 citation
Preprint · Open access
The emergence of Large Language Models (LLMs) has fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration. However, their pre-trained architectures often reveal limitations in specialized contexts, including restricted reasoning capacities, ethical uncertainties, and suboptimal domain-specific performance. These challenges necessitate advanced post-training language models (PoLMs), such as OpenAI-o1/o3 and DeepSeek-R1 (collectively known as Large Reasoning Models, or LRMs), to address these shortcomings. This paper presents the first comprehensive survey of PoLMs, systematically tracing their evolution across five core paradigms: Fine-tuning, which enhances task-specific accuracy; Alignment, which ensures ethical coherence and alignment with human preferences; Reasoning, which advances multi-step inference despite challenges in reward design; Efficiency, which optimizes resource utilization amid increasing complexity; and Integration and Adaptation, which extend capabilities across diverse modalities while addressing coherence issues. Charting progress from ChatGPT's alignment strategies to DeepSeek-R1's innovative reasoning advancements, we illustrate how PoLMs leverage datasets to mitigate biases, deepen reasoning capabilities, and enhance domain adaptability. Our contributions include a pioneering synthesis of PoLM evolution, a structured taxonomy categorizing techniques and datasets, and a strategic agenda emphasizing the role of LRMs in improving reasoning proficiency and domain flexibility. As the first survey of its scope, this work consolidates recent PoLM advancements and establishes a rigorous intellectual framework for future research, fostering the development of LLMs that excel in precision, ethical robustness, and versatility across scientific and societal applications.
Artificial intelligence unlocks the future of oral organoid research
Translational Dental Research · 2025-07-01 · 4 citations
Article · Open access
Oral organoids, emerging as a powerful method for modeling oral development and diseases, show potential in fundamental research and clinical applications. However, their translational application in clinics is limited by low construction efficiency, labor-intensive data processing, and the complexity of integrating multi-omics data. Artificial intelligence (AI) offers promising solutions to overcome these limitations. AI-based robots can optimize culture conditions to enhance construction efficiency. Furthermore, AI enables the efficient analysis of organoid images and multi-omics data to elucidate underlying molecular mechanisms. The integration of oral organoids and AI could potentially overcome the limitations in organoid research and accelerate clinical translation. In this review, we first summarize the main types of oral organoids in the field and their construction strategies. Then, we introduce their application in regenerative medicine and the modeling of oral disease. We also examine the current limitations and discuss AI-based approaches to address these challenges, highlighting the critical role of AI in benefiting basic and translational oral research.
• Revisits the history of oral organoid models.
• Presents advancements and challenges in oral organoid models.
• Discusses the potential of AI in accelerating the clinical translation of oral organoids.
Automating Intervention Discovery from Scientific Literature: A Progressive Ontology Prompting and Dual-LLM Framework
2025-09-01
Article
Identifying effective interventions from the scientific literature is challenging due to the high volume of publications, specialized terminology, and inconsistent reporting formats, making manual curation laborious and prone to oversight. To address this challenge, this paper proposes a novel framework leveraging large language models (LLMs), which integrates a progressive ontology prompting (POP) algorithm with a dual-agent system, named LLM-Duo. On the one hand, the POP algorithm conducts a prioritized breadth-first search (BFS) across a predefined ontology, generating structured prompt templates and action sequences to guide the automatic annotation process. On the other hand, the LLM-Duo system features two specialized LLM agents, an explorer and an evaluator, working collaboratively and adversarially to continuously refine annotation quality. We showcase the real-world applicability of our framework through a case study focused on speech-language intervention discovery. Experimental results show that our approach surpasses advanced baselines, achieving more accurate and comprehensive annotations through a fully automated process. Our approach successfully identified 2,421 interventions from a corpus of 64,177 research articles in the speech-language pathology domain, culminating in the creation of a publicly accessible intervention knowledge base with great potential to benefit the speech-language pathology community.
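To make the traversal order concrete, here is a minimal Python sketch of a prioritized breadth-first search over a toy ontology that emits one prompt template per concept. It assumes that "prioritized" means concepts are visited level by level and ordered by a priority score within each level; the ontology, the priority values, the function names `prioritized_bfs` and `build_prompts`, and the prompt wording are all hypothetical. It is not the authors' POP implementation, and it omits the action sequences and the LLM-Duo explorer/evaluator agents.

```python
# Hypothetical toy ontology: each concept maps to its child concepts.
ONTOLOGY = {
    "Intervention": ["Target population", "Procedure"],
    "Target population": ["Age group", "Disorder"],
    "Procedure": ["Dosage"],
    "Age group": [],
    "Disorder": [],
    "Dosage": [],
}

# Hypothetical priorities: a lower value means "annotate earlier" within a level.
PRIORITY = {
    "Target population": 0,
    "Procedure": 1,
    "Disorder": 0,
    "Dosage": 0,
    "Age group": 1,
}


def prioritized_bfs(root, ontology, priority):
    """Visit the ontology level by level; within each level, order nodes by priority."""
    order, seen = [], {root}
    level = [root]
    while level:
        # Stable sort: ties keep their discovery order.
        level.sort(key=lambda node: priority.get(node, 0))
        order.extend(level)
        next_level = []
        for node in level:
            for child in ontology.get(node, []):
                if child not in seen:  # guard against shared children in a DAG
                    seen.add(child)
                    next_level.append(child)
        level = next_level
    return order


def build_prompts(order, ontology):
    """Turn each visited concept into a prompt template (hypothetical wording)."""
    prompts = []
    for concept in order:
        prompt = f"From the article, extract information about '{concept}'."
        children = ontology.get(concept, [])
        if children:
            prompt += " Then identify: " + ", ".join(children) + "."
        prompts.append(prompt)
    return prompts


if __name__ == "__main__":
    order = prioritized_bfs("Intervention", ONTOLOGY, PRIORITY)
    print(order)
    # ['Intervention', 'Target population', 'Procedure', 'Disorder', 'Dosage', 'Age group']
    for prompt in build_prompts(order, ONTOLOGY):
        print(prompt)
```

The level-by-level loop keeps the breadth-first property (broad concepts are annotated before their refinements), while the per-level sort lets higher-priority concepts be annotated first within the same depth.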
Alice: Proactive Learning with Teacher's Demonstrations for Weak-to-Strong Generalization
arXiv.org · 2025-04-09
Preprint · Open access · Senior author
The growing capabilities of large language models (LLMs) make maintaining effective human oversight a key challenge. Weak-to-strong generalization (W2SG) offers a promising framework for supervising increasingly capable LLMs using weaker ones. Traditional W2SG methods rely on passive learning, in which a weak teacher provides noisy demonstrations to train a strong student. This prevents students from drawing on their own knowledge during training and reaching their full potential. In this work, we introduce Alice (proActive Learning wIth teaCher's dEmonstrations), a framework that leverages complementary knowledge between teacher and student to enhance the learning process. We probe the teacher model's knowledge by eliciting its uncertainty, and then use these insights together with the teacher's responses as demonstrations to guide student models in self-generating improved responses for supervision. In addition, for situations with significant capability gaps between teacher and student models, we introduce cascade Alice, which employs a hierarchical training approach where weak teachers initially supervise intermediate models, which then guide stronger models in sequence. Experimental results demonstrate that our method significantly enhances W2SG performance, yielding substantial improvements over the original W2SG on three key tasks: knowledge-based reasoning (+4.0%), mathematical reasoning (+22.62%), and logical reasoning (+12.11%). This highlights the effectiveness of our new W2SG paradigm, which enables more robust knowledge transfer and supervision outcomes.