
Li Shen
Verified · University of Pennsylvania · Rehabilitation Medicine
Active 1970–2026
About
Li Shen, Ph.D., FAIMBE, FACMI, FAMIA, is a Professor of Informatics in Biostatistics and Epidemiology at the University of Pennsylvania's Perelman School of Medicine. He is also a Senior Fellow at the Penn Institute for Biomedical Informatics (IBI), Faculty in the Penn Graduate Group in Bioengineering, Applied Mathematics and Computational Science, and the Mahoney Institute for Neurosciences. Additionally, he serves as Associate Director for Bioinformatics at IBI, Faculty Director of the IBI Bioinformatics Core, Co-Director of the Penn Center for AI and Data Science for Integrated Diagnostics (AI2D), and Interim Director of the Division of Informatics in the Department of Biostatistics, Epidemiology, and Informatics. His research expertise focuses on bioinformatics strategies for brain-wide genome-wide association studies to advance Alzheimer’s disease research. His work spans artificial intelligence (AI), machine learning (ML), biomedical and health informatics, natural language processing (NLP), large language models (LLMs), medical image computing, network science, and multi-omics and systems biology, with applications across complex disorders. Dr. Shen has authored over 450 peer-reviewed articles and his research is supported by NIH and NSF. His primary focus is on developing and applying advanced AI/ML/informatics methods to analyze large-scale biobank and health datasets, aiming to improve understanding, early detection, treatment, prevention, and healthcare of complex disorders. He also explores emerging frontiers such as generative AI, agentic AI, and trustworthy multimodal AI to push the boundaries of biomedical research. Dr. Shen has served on various scientific journal editorial boards, grant review committees, and professional meeting organizing committees in medical image computing, biomedical and health informatics, and computational biology. He served as the Executive Director of the MICCAI Society from 2016 to 2019. 
He is recognized as a fellow of the American Institute for Medical and Biological Engineering (AIMBE), the American College of Medical Informatics (ACMI), and the American Medical Informatics Association (AMIA). He is also a distinguished member of the Association for Computing Machinery (ACM) and a distinguished contributor of the IEEE Computer Society.
Research topics
- Neuroscience
- Biology
- Medicine
- Genetics
- Computer Science
- Internal medicine
- Artificial Intelligence
- Theoretical computer science
- Mathematics
- Psychology
- Algorithm
- Evolutionary biology
- Oncology
- Pathology
Selected publications
Strength-Adaptive Adversarial Training
IEEE Transactions on Pattern Analysis and Machine Intelligence · 2026-01-01
preprint · Open access
Adversarial training (AT) has been shown to effectively enhance a network's resilience against adversarial attack. However, conventional AT, which relies on a fixed pre-specified perturbation budget, suffers from several limitations when training robust models. First, enforcing the same perturbation budget across networks with different capacities leads to varying levels of robustness disparity between natural and robust accuracies, which deviates from the desired outcome of a robust network. Second, because the perturbation budget is fixed throughout training, the attack strength fails to scale adaptively with the evolving robustness of the model. This mismatch often results in robust overfitting and further degradation of adversarial robustness. To address these limitations, we propose a novel technique called Strength-Adaptive Adversarial Training (SAAT). In SAAT, the adversary incorporates an adversarial-loss constraint to guide the generation of adversarial training data. This constraint allows the perturbation budget to adapt dynamically based on the current training state, which effectively mitigates robust overfitting. Moreover, by explicitly regulating the attack strength through the adversarial loss, SAAT enables precise control over the robustness disparity between natural accuracy and adversarial robustness. Extensive experiments demonstrate that SAAT substantially improves adversarial robustness over standard AT.
npj Precision Oncology · 2026-03-31
article · Open access
The profound spatial and temporal heterogeneity of non-small cell lung cancer (NSCLC) drives unpredictable responses to neoadjuvant chemoimmunotherapy (NCI), highlighting the need for effective predictive biomarkers to optimize treatment. In this multicenter study, we evaluated the ability of habitat imaging to predict major pathological response (MPR) to NCI by capturing spatial-temporal tumor heterogeneity, using pre- and post-treatment CT scans from 394 patients with resectable non-small cell lung cancer across three institutions. A radiomics-based predictive framework integrating global texture descriptors, spatial heterogeneity features, and longitudinal imaging information was constructed to distinguish pathological responders from non-responders. Models based on global texture or spatial heterogeneity features alone achieved areas under the receiver operating characteristic curve (AUCs) ranging from 0.71 to 0.80 across validation cohorts, whereas the integrated model further improved discrimination, achieving an AUC of up to 0.85 in external validation. These findings demonstrate that habitat imaging provides a robust approach for predicting MPR and supporting patient stratification and personalized treatment planning in NSCLC.
bioRxiv (Cold Spring Harbor Laboratory) · 2026-03-19
article · Open access · 1st author · Corresponding
Abstract: DNA supercoiling is essential for the developmental cycle of Chlamydia trachomatis, yet its role in shaping antibiotic responses remains poorly understood. We investigated how the fluoroquinolone moxifloxacin (Mox) influenced C. trachomatis growth across developmental stages with its distinct supercoiling levels. Early Mox exposure completely halted bacterial growth, whereas treatment during mid-cycle produced enlarged, persistent forms and abolished formation of infectious progeny. These stage-specific outcomes coincided with inhibition of DNA replication, depletion of DNA gyrase, and transcriptional repression of ompA and omcB, accompanied by preserved or elevated expression of the stress-responsive groESL1 operon. Mox also elicited compensatory downregulation of topoisomerase I (TopA), consistent with attempts to rebalance intracellular supercoiling. Together, these data demonstrate that fluoroquinolone susceptibility in C. trachomatis reflects stage-dependent supercoiling levels. Perturbation of supercoiling homeostasis drives developmental arrest and persistence phenotypes, highlighting coordinated gyrase-TopA activity as a key determinant of fluoroquinolone tolerance and a potential target for overcoming persistent infection.
Importance: C. trachomatis, a medically significant bacterial pathogen, can persist under antimicrobial pressure, complicating treatment strategy. This study links supercoiling homeostasis to fluoroquinolone tolerance, offering mechanistic insights into chlamydial adaptation to antibiotic stress and identifying potential targets to overcome persistence, an urgent challenge in global reproductive health.
Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration
IEEE Transactions on Pattern Analysis and Machine Intelligence · 2026-01-01
preprint · Open access
Decentralized Federated Learning has emerged as an alternative to centralized architectures due to its faster training, privacy preservation, and reduced communication overhead. In decentralized communication, the server aggregation phase in Centralized Federated Learning shifts to the client side, which means that clients connect with each other in a peer-to-peer manner. However, compared to the centralized mode, data heterogeneity in Decentralized Federated Learning causes larger variances between aggregated models, which leads to slow convergence in training and poor generalization performance in tests. To address these issues, we introduce Catalyst Acceleration and propose an accelerated Decentralized Federated Learning algorithm called DFedCata. It consists of two main components: the Moreau envelope function, which primarily addresses parameter inconsistencies among clients caused by data heterogeneity, and Nesterov's extrapolation step, which accelerates the aggregation phase. Theoretically, we prove the optimization error bound and generalization error bound of the algorithm, providing a further understanding of the nature of the algorithm and the theoretical perspectives on the hyperparameter choice. Empirically, we demonstrate the advantages of the proposed algorithm in convergence speed, computational cost, and generalization performance on CIFAR10/100 and Tiny-ImageNet with various non-iid data distributions. Moreover, extensive experiments are conducted to validate the theoretical properties of DFedCata, showing strong consistency between theory and empirical observations. Our code is available at https://github.com/zzylyxx/DFedCata.
University of Liverpool · 2026-01-01
dissertation · Open access · 1st author · Corresponding
Journal of Clinical Medicine · 2026-03-10
article · Open access · Senior author
Background/Objectives: The use of GLP-1 RAs has dramatically increased with expanded indications for diabetes mellitus and obesity. Delayed gastric emptying due to these medications can lead to increased residual gastric content (RGC). While previous studies have focused on esophagogastroduodenoscopy (EGD), few have specifically analyzed the impact of GLP-1 RAs on residual gastric content in patients undergoing concurrent colonoscopy with adequate bowel preparation. Methods: A retrospective, case–control study was conducted at Shanghai East Hospital from January 2023 to June 2025. Adult patients with increased RGC were identified as cases. Controls without increased RGC were randomly selected at a 1:2 ratio, matched for age and sex. Multivariable logistic regression was used to assess the independent association between GLP-1 RA use and increased RGC. Results: Among 131,255 procedures screened, 3746 patients were included (1257 with increased RGC and 2489 controls). GLP-1 RA users had higher odds of increased RGC in both unadjusted [OR 15.20 (95% CI 5.98–38.61)] and adjusted analyses [aOR = 13.31 (95% CI 5.07–34.93)]. Other significant risk factors for RGC included diabetes-related complications [aOR = 8.89 (3.15–25.12)]. Interestingly, among the enrolled patients who used GLP-1 RAs and underwent concurrent colonoscopy, 19 of the 22 patients (86.4%) exhibited increased RGC, whereas only 3 (13.6%) did not. Conclusions: Perioperative use of GLP-1 RAs is associated with increased residual gastric content in patients undergoing EGD alone or with concurrent colonoscopy. There was no aspiration event related to residual gastric content. Our study highlights the need for vigilant preoperative assessment and individualized periprocedural management in patients on GLP-1 RAs undergoing endoscopic procedures, despite having standardized adequate bowel preparation.
IEEE Access · 2026-01-01
article · Open access
In novel power systems, complex coupling among stations and devices with diverse synchronization mechanisms profoundly impacts system stability. To enhance the transient synchronization stability of hybrid systems integrating grid-forming (GFM) converters and grid-following (GFL) stations, this study investigates the transient interaction mechanisms during current-limiting operation. Considering the current-limiting characteristic, a dynamic model is developed to evaluate the transient synchronization stability of the hybrid GFM and GFL converter system, and the phase plane method is adopted to theoretically analyze the influence of key parameters on transient synchronization stability. The theoretical analysis shows that reducing the current saturation angles of both devices, which entails injecting more reactive current into the grid during fault conditions, improves system stability by reducing the acceleration area and increasing the deceleration area through the voltage-power coupling effect between GFM and GFL units; decreasing the GFM converter’s active power command primarily enhances its own transient stability; and shortening the GFM converter’s line distance benefits the transient stability of both units. Time-domain simulations are further carried out to verify the correctness of the above theoretical conclusions. The novelty of this study lies in revealing the dynamic coupling mechanism of the GFM-GFL hybrid system under current-limiting operation and clarifying the influence patterns of critical control parameters on transient synchronization stability.
Statistics in Medicine · 2026-04-26
article · Open access
With the advent of high-throughput techniques, multi-omics data and various clinical outcomes have been collected for a range of diseases. Multi-omics data play a crucial role in uncovering complex biological processes, yet simultaneous representation learning of such high-dimensional, heterogeneous multi-modality data along with clinical outcomes remains limited. To address this gap, we propose a supervised knowledge-guided Bayesian factor model for integrative analysis of multi-omics and clinical outcome data. The proposed method simultaneously extracts an informative low-dimensional representation and predicts one or more clinical outcomes of interest. The two-level adaptive shrinkage in the novel hierarchical priors allows for the identification of both active modalities and features, resulting in a biologically meaningful structural identification of the high-dimensional data. Moreover, the method is robust to noisy edges in biological graphs that do not align with ground truth. Finally, the proposed method can handle different data types including both continuous and categorical data. Extensive simulation studies and real data analyses of Alzheimer's disease (AD) data demonstrate the advantages of the proposed approach over existing methods. Notably, our analysis of multi-omics and imaging phenotype data from ADNI provides meaningful insights into the underlying biological mechanisms of AD.
QAgent: A modular Search Agent with Interactive Query Understanding
ArXiv.org · 2025-10-09
preprint · Open access
Large language models (LLMs) excel at natural language tasks but are limited by their static parametric knowledge, especially in knowledge-intensive tasks. Retrieval-augmented generation (RAG) mitigates this by integrating external information. However, (1) traditional RAG struggles with complex query understanding, and (2) even search agents trained with reinforcement learning (RL), despite their promise, still face generalization and deployment challenges. To address these limitations, we propose QAgent, a unified agentic RAG framework that employs a search agent for adaptive retrieval. This agent optimizes its understanding of the query through interactive reasoning and retrieval. To facilitate real-world application, we focus on a modular search agent for query understanding that is plug-and-play in complex systems. Specifically, the agent follows a multi-step decision process trained with RL to maximize retrieval quality and support accurate downstream answers. We further analyze the strengths and weaknesses of end-to-end RL and propose a strategy that focuses on effective retrieval, thereby enhancing generalization in LLM applications. Experiments show QAgent excels at QA and serves as a plug-and-play module for real-world deployment.
Non-contact surgical navigation: in action.
ASVIDE · 2025-12-01
article
Recent grants
Informatics Algorithms for Genomic Analysis of Brain Imaging Data
NIH · $1.4M · 2020–2026
SCH: INT: Mining Drug-Drug Interaction Induced Adverse Effects from Health Record Databases
NSF · $1.2M · 2016–2018
NIH · $351k · 1991
Artificial Intelligence Strategies for Alzheimer's Disease Research
NIH · $6.7M · 2021–2026
NSF · $316k · 2019–2023
Frequent coauthors
- 1375 shared
Andrew J. Saykin
Indiana University
- 780 shared
Shannon L. Risacher
- 772 shared
Michael W. Weiner
University of California, San Francisco
- 568 shared
Clifford R. Jack
WinnMed
- 568 shared
Kwangsik Nho
Indiana University School of Medicine
- 528 shared
Michael Donohue
Janssen (United States)
- 456 shared
Paul Aisen
University of Southern California
- 431 shared
John C. Morris
Washington University in St. Louis
Labs
Shen Lab · PI
Education
- 2004
PhD, Computer Science
Dartmouth College
Awards & honors
- Fellow of the American Institute for Medical and Biological…
- Fellow of the American College of Medical Informatics (ACMI)
- Fellow of the American Medical Informatics Association (AMIA…
- Distinguished Member of the Association for Computing Machin…
- Distinguished Contributor of the IEEE Computer Society