Jimeng Sun
Professor · Verified · University of Illinois Urbana-Champaign · Computer Science
Active 1999–2026
About
Jimeng Sun is a professor at the Siebel School of Computing and Data Science at the University of Illinois Urbana-Champaign. His research interests focus on artificial intelligence (AI) for healthcare, including deep learning for drug discovery, clinical trial optimization, computational phenotyping, clinical predictive modeling, treatment recommendation, and health monitoring. He is involved in data and information systems, bioinformatics, and computational biology, applying AI techniques to advance healthcare solutions. Sun has contributed to the development of deep learning methods specifically tailored for healthcare applications and has been recognized for his influence in the field. He has taught courses related to deep learning and AI in medicine and actively engages in research that leverages AI to improve clinical outcomes and accelerate medical research processes.
Research topics
- Computer Science
- Artificial Intelligence
- Data Mining
- Machine Learning
- Political Science
- Data science
- Economics
- Geography
- Econometrics
- Medicine
- Business
- Operations research
- Engineering
- Actuarial science
- Cognitive science
- Bioinformatics
- Programming language
- Meteorology
- Mathematics
- Environmental health
- Statistics
- World Wide Web
- Biology
Selected publications
SocialStep: Fast Prediction of Social Determinants of Health
2026-04-30
article · Senior author
Accelerating clinical evidence synthesis with large language models
npj Digital Medicine · 2025-08-07 · 29 citations
article · Open access · Senior author
Clinical evidence synthesis largely relies on systematic reviews (SR) of clinical studies from medical literature. Here, we propose a generative artificial intelligence (AI) pipeline named TrialMind to streamline study search, study screening, and data extraction tasks in SR. We chose published SRs to build TrialReviewBench, which contains 100 SRs and 2,220 clinical studies. For study search, TrialMind achieves high recall rates (0.711-0.834 vs. a human baseline of 0.138-0.232). For study screening, TrialMind beats previous document ranking methods by 1.5- to 2.6-fold. For data extraction, it exceeds GPT-4's accuracy by 16-32%. In a pilot study, human-AI collaboration with TrialMind improved recall by 71.4% and reduced screening time by 44.2%, while in data extraction, accuracy increased by 23.5% with a 63.4% time reduction. Medical experts preferred TrialMind's synthesized evidence over GPT-4's in 62.5%-100% of cases. These findings show the promise of accelerating clinical evidence synthesis driven by human-AI collaboration.
s3: You Don't Need That Much Data to Train a Search Agent via RL
ArXiv.org · 2025-05-20
preprint · Open access
Retrieval-augmented generation (RAG) systems empower large language models (LLMs) to access external knowledge during inference. Recent advances have enabled LLMs to act as search agents via reinforcement learning (RL), improving information acquisition through multi-turn interactions with retrieval engines. However, existing approaches either optimize retrieval using search-only metrics (e.g., NDCG) that ignore downstream utility or fine-tune the entire LLM to jointly reason and retrieve, entangling retrieval with generation and limiting the real search utility and compatibility with frozen or proprietary models. In this work, we propose s3, a lightweight, model-agnostic framework that decouples the searcher from the generator and trains the searcher using a Gain Beyond RAG reward: the improvement in generation accuracy over naive RAG. s3 requires only 2.4k training samples to outperform baselines trained on over 70x more data, consistently delivering stronger downstream performance across six general QA and five medical QA benchmarks.
BioDSA-1K: Benchmarking Data Science Agents for Biomedical Research
ArXiv.org · 2025-05-22
preprint · Open access · Senior author
Validating scientific hypotheses is a central challenge in biomedical research, and remains difficult for artificial intelligence (AI) agents due to the complexity of real-world data analysis and evidence interpretation. In this work, we present BioDSA-1K, a benchmark designed to evaluate AI agents on realistic, data-driven biomedical hypothesis validation tasks. BioDSA-1K consists of 1,029 hypothesis-centric tasks paired with 1,177 analysis plans, curated from over 300 published biomedical studies to reflect the structure and reasoning found in authentic research workflows. Each task includes a structured hypothesis derived from the original study's conclusions, expressed in the affirmative to reflect the language of scientific reporting, and one or more pieces of supporting evidence grounded in empirical data tables. While these hypotheses mirror published claims, they remain testable using standard statistical or machine learning methods. The benchmark enables evaluation along four axes: (1) hypothesis decision accuracy, (2) alignment between evidence and conclusion, (3) correctness of the reasoning process, and (4) executability of the AI-generated analysis code. Importantly, BioDSA-1K includes non-verifiable hypotheses: cases where the available data are insufficient to support or refute a claim, reflecting a common yet underexplored scenario in real-world science. We propose BioDSA-1K as a foundation for building and evaluating generalizable, trustworthy AI agents for biomedical discovery.
Plants · 2025-12-03 · 1 citation
article · Open access
Global climate change has intensified land desertification in the arid and semi-arid regions of northwestern China, highlighting the urgent need to cultivate plant species with ideal architecture and well-developed root systems to combat ecosystem degradation. Amorpha fruticosa is widely used as a windbreak and sand-fixation shrub; however, its rapid growth and high transpiration during the early planting stage often result in excessive water loss, low survival rates, and limited vegetation restoration effectiveness. Plant growth retardants (PGRts) are known to suppress apical dominance and promote branching. In this study, one-year-old A. fruticosa seedlings were treated with different combinations of paclobutrazol (PP333) and uniconazole (S3307) to investigate their effects on plant morphology and biomass allocation; the aim was to determine the optimal formula for cultivating shrub structures with excellent windbreak and sand-fixation effects in land desertification areas. The results showed that both PP333 and S3307 significantly inhibited plant height while promoting basal stem diameter, branching, and root development. Among all treatments, the S3307 200 mg·L−1 + PP333 200 mg·L−1 combination (SD3) was the most effective, resulting in the greatest increases in basal diameter, branch number, total root length, and root-to-shoot ratio, while significantly reducing height increment, leaf length and leaf area (p < 0.05). Under the S3307 200 mg·L−1 + PP333 300 mg·L−1 treatment (SD4), leaf width and specific leaf area were reduced by 17.92% and 38.89%, respectively, compared with the control. Correlation analysis revealed significant positive or negative relationships among most growth traits, with leaf length negatively correlated with other morphological indicators. Fresh and dry weights of both aboveground and root tissues were significantly positively correlated with basal diameter (R = 0.38) and branch basal diameter (R = 0.33).
Principal component analysis demonstrated that the SD3 treatment achieved the highest comprehensive score (2.91), indicating its superiority in promoting a compact yet robust plant architecture. Overall, the SD3 treatment improved drought resistance and sand-fixation capacity of A. fruticosa by “dwarfing and strengthening plants while optimizing root–shoot allocation.” These findings provide theoretical support for large-scale cultivation and vegetation restoration in arid and semi-arid regions and offer a technical reference for growth regulation and windbreak and sand-fixation capacity in other xerophytic shrub species.
medRxiv · 2025-04-25 · 3 citations
preprint · Open access
Background: Positron Emission Tomography (PET) scans are a crucial tool in the diagnosis and monitoring of a number of complex conditions, including cancer, heart health, and especially cognitive brain function. However, they are also often much more expensive than comparable imaging modalities such as X-ray and magnetic resonance imaging (MRI), which can limit their availability and the impact of their use in both medical and machine learning settings. We propose to address this problem by using generative models to simulate the PET scan results based on prior MRI. Methods: While recent work has yielded impressive realism in image generation, this PET synthesis task presents a series of technical challenges based on the scarcity of paired data as well as the complexity and nuance of the 3D images. So, we propose MRI2PET to generate AV45-PET scans from T1-weighted MRI images. MRI2PET is a 3D diffusion-based method which makes use of style transferred pre-training and a Laplacian pyramid loss to address these challenges by utilizing larger available unpaired MRI datasets and structural similarities between the MRI and PET images while simultaneously emphasizing the crucial details. Findings: We evaluate MRI2PET through a series of studies on the ADNI dataset where we show that it both generates realistic images and improves clinically-based disease classification. When compared to training on only the original AV45-PET data, MRI2PET augmentation increases AUROC of brain scan classification to 0.780 ± 0.005 from 0.688 ± 0.014 when classifying brain scans into one of three clinically defined groups: cognitively normal, mild cognitive impairment, and Alzheimer's Disease. Interpretation: The capability to generate high quality, clinically relevant PET scans from MRI has the potential to expand the utility of cost-effective and accessible imaging workflows and improve both image-based machine learning capabilities and patient care.
Funding: US National Institute on Aging, US National Institutes of Health, US National Science Foundation.
CAPO: Reinforcing Consistent Reasoning in Medical Decision-Making
ArXiv.org · 2025-06-15
preprint · Open access
In medical visual question answering (Med-VQA), achieving accurate responses relies on three critical steps: precise perception of medical imaging data, logical reasoning grounded in visual input and textual questions, and coherent answer derivation from the reasoning process. Recent advances in general vision-language models (VLMs) show that large-scale reinforcement learning (RL) could significantly enhance both reasoning capabilities and overall model performance. However, their application in medical domains is hindered by two fundamental challenges: 1) misalignment between perceptual understanding and reasoning stages, and 2) inconsistency between reasoning pathways and answer generation, both compounded by the scarcity of high-quality medical datasets for effective large-scale RL. In this paper, we first introduce Med-Zero-17K, a curated dataset for pure RL-based training, encompassing over 30 medical image modalities and 24 clinical tasks. Moreover, we propose a novel large-scale RL framework for Med-VLMs, Consistency-Aware Preference Optimization (CAPO), which integrates rewards to ensure fidelity between perception and reasoning, consistency in reasoning-to-answer derivation, and rule-based accuracy for final responses. Extensive experiments on both in-domain and out-of-domain scenarios demonstrate the superiority of our method over strong VLM baselines, showcasing strong generalization capability to 3D Med-VQA benchmarks and R1-like training paradigms.
RDMA: Cost Effective Agent-Driven Rare Disease Mining from Electronic Health Records
ArXiv.org · 2025-07-14
preprint · Open access · Senior author
Rare diseases affect 1 in 10 Americans yet remain systematically underdocumented in clinical records. ICD-based systems cannot capture their breadth: over 50% of Orphanet codes lack a direct ICD mapping and only 2.2% of HPO codes have matching ICD codes, leaving patient populations invisible and delaying diagnosis. Mining unstructured clinical notes offers a direct path forward, but real notes are long, noisy, and abbreviation-dense, and limited annotations make fine-tuning infeasible, demanding approaches that generalize without task-specific training. We present Rare Disease Mining Agents (RDMA), an agentic framework equipping smaller quantized LLMs with tools for abbreviation resolution, implicit phenotype reasoning, and ontology grounding against Orphanet and HPO. RDMA substantially outperforms fine-tuned and RAG-based baselines across benchmarks with different data characteristics, without any task-specific training. A small quantized model achieves maximal performance, reducing inference costs by up to 10x and local hardware costs by up to 17x, enabling private deployment on standard hardware without cloud-based PHI exposure. RDMA's uncertainty-flagging mechanism further reduces expert annotation burden while preserving agreement quality, supporting scalable rare disease documentation in clinical practice. Available at https://github.com/jhnwu3/RDMA.
MediSim: Multi-granular simulation for enriching longitudinal, multi-modal electronic health records
Patterns · 2025-05-08 · 1 citation
article · Open access · Senior author
We introduce MediSim, a multi-modal generative model for simulating and augmenting electronic health records across multiple modalities, including structured codes, clinical notes, and medical imaging. MediSim employs a multi-granular, autoregressive architecture to simulate missing modalities and visits and iterative, reinforcement learning-based training to improve simulation in low-data settings. Additionally, it utilizes encoder-decoder model pairs to handle complex modalities like notes and images. Experiments on outpatient claims and inpatient ICU datasets have demonstrated MediSim's superiority over baselines in predicting missing codes, creating enriched data, and improving downstream predictive modeling. Specifically, MediSim improved over 74% on missing code prediction, enabled up to 65% better downstream predictive performance compared to original deficient records missing either some visits or entire data modalities, and successfully produced realistic note and X-ray samples for use in downstream tasks. MediSim's ability to generate comprehensive, high-dimensional EHR data has the potential to significantly improve AI applications throughout healthcare.
MEDS: Building Models and Tools in a Reproducible Health AI Ecosystem
2025-08-03
article · Open access
KDD ’25, Toronto, ON, Canada
Recent grants
NSF · $400k · 2020–2024
BigData:IA:Collaborative Research: TIMES: A tensor factorization platform for spatio-temporal data
NSF · $755k · 2020–2024
BigData:IA:Collaborative Research: TIMES: A tensor factorization platform for spatio-temporal data
NSF · $774k · 2018–2020
Collaborative Research: SCH: Fair Federated Representation Learning for Breast Cancer Risk Scoring
NSF · $350k · 2022–2026
NSF · $50k · 2020–2022
Frequent coauthors
- 162 shared
Cao Xiao
- 106 shared
Lucas M. Glass
IQVIA (United States)
- 44 shared
M. Brandon Westover
Harvard University
- 37 shared
Tianfan Fu
Rensselaer Polytechnic Institute
- 36 shared
Shenda Hong
- 35 shared
Walter F. Stewart
- 29 shared
Chaoqi Yang
- 28 shared
Christos Faloutsos
Carnegie Mellon University
Labs
Siebel School of Computing and Data Science · PI
Education
- 2006
Ph.D., Computer Science
University of Illinois at Urbana-Champaign
- 2002
M.S., Computer Science
University of Illinois at Urbana-Champaign
- 1999
B.S., Computer Science
University of Science and Technology of China
Awards & honors
- Jimeng Sun and Kaiyu Guan rank among 12 Illinois scientists…