Resume-aware faculty matching

Find professors who actually fit you

Upload your resume. Four AI agents analyze your background, rank the faculty who fit, inspect their recent research, and help you draft outreach — grounded in their actual work, not templates.

Free to start · No credit card · Cancel anytime
Marylyn D Ritchie

Ph.D. · Verified

University of Pennsylvania · Rehabilitation Medicine

Active 1993–2026

h-index: 114
Citations: 61.2k
Papers: 815 (227 in the last 5 years)
Funding: $60.1M (4 active grants)

About

Marylyn D Ritchie, PhD, is an Adjunct Professor of Genetics at the University of Pennsylvania's Perelman School of Medicine. She serves as the Director of the Institute for Biomedical Informatics and is the Vice President for Research Informatics at the University of Pennsylvania Health System. Additionally, she is the Director of the Division of Informatics within the Department of Biostatistics, Epidemiology, and Informatics at the same institution. Her research expertise encompasses computational genomics, bioinformatics, epistasis, pharmacogenomics, big data, evolutionary computation, genetic epidemiology, statistical genetics, systems genomics, and translational informatics, with a focus on cardiovascular disease. Dr. Ritchie's work involves applying advanced computational and statistical methods to understand genetic and molecular mechanisms underlying complex diseases, contributing significantly to the fields of biomedical informatics and genomics.

Research topics

  • Genetics
  • Biology
  • Medicine
  • Evolutionary biology
  • Internal medicine
  • Computer science
  • Political Science
  • Bioinformatics
  • Computational biology
  • Data science
  • Pathology
  • Endocrinology
  • Virology
  • Environmental health
  • Psychiatry
  • Medical emergency
  • Demography
  • Cardiology
  • Surgery
  • Immunology
  • Clinical psychology

Selected publications

  • Proteomic risk score for early prediction of kidney disease progression in individuals with APOL1 high-risk genotypes

    Nature Medicine · 2026-04-15

    Article · Open access

    Individuals of African ancestry carrying APOL1 (apolipoprotein L1) high-risk genotypes face a markedly increased risk of kidney failure, yet tools to identify those individuals likely to progress to chronic kidney disease are lacking. Here we profiled plasma proteomes of 851 Penn Medicine BioBank participants of African ancestry (285 males and 566 females) with APOL1 high-risk genotypes and preserved estimated glomerular filtration rate (eGFR) (≥60 ml min⁻¹ 1.73 m⁻²). Using elastic net Cox regression adjusted for age, sex, eGFR and albuminuria, we derived a nine-protein APOL1 Proteomic Risk Score (APRS) that predicts a composite outcome of ≥40% eGFR decline, kidney failure or death. APRS achieved a time-dependent area under the receiver operating characteristic curve (tAUC) of 86.5%, outperforming the Kidney Failure Risk Equation (66.1%) and polygenic risk scores, with 10-year event rates of 62.5% versus 3.3% across risk quintiles. External validation in Atherosclerosis Risk in Communities and UK Biobank cohorts confirmed robust accuracy (tAUC 82–85%) and consistent performance across demographic and clinical subgroups. Plasma levels of APRS component proteins correlated with kidney tissue fibrosis and tubular injury pathways, indicating strong biological plausibility. By enabling early and accurate prediction of disease progression in APOL1 high-risk individuals, APRS bridges the gap between genetic susceptibility and clinical translation. This scalable and biologically informed approach provides a precision medicine framework for early intervention and may accelerate development of APOL1-targeted therapies to reduce kidney disease disparities.

  • Enabling Few-Shot Alzheimer's Disease Diagnosis on Biomarker Data with Tabular LLMs

    ArXiv.org · 2025-07-31

    Preprint · Open access

    Early and accurate diagnosis of Alzheimer's disease (AD), a complex neurodegenerative disorder, requires analysis of heterogeneous biomarkers (e.g., neuroimaging, genetic risk factors, cognitive tests, and cerebrospinal fluid proteins) typically represented in a tabular format. With flexible few-shot reasoning, multimodal integration, and natural-language-based interpretability, large language models (LLMs) offer unprecedented opportunities for prediction with structured biomedical data. We propose a novel framework called TAP-GPT, Tabular Alzheimer's Prediction GPT, that adapts TableGPT2, a multimodal tabular-specialized LLM originally developed for business intelligence tasks, for AD diagnosis using structured biomarker data with small sample sizes. Our approach constructs few-shot tabular prompts using in-context learning examples from structured biomedical data and finetunes TableGPT2 using the parameter-efficient qLoRA adaption for a clinical binary classification task of AD or cognitively normal (CN). The TAP-GPT framework harnesses the powerful tabular understanding ability of TableGPT2 and the encoded prior knowledge of LLMs to outperform more advanced general-purpose LLMs and a tabular foundation model (TFM) developed for prediction tasks. To our knowledge, this is the first application of LLMs to the prediction task using tabular biomarker data, paving the way for future LLM-driven multi-agent frameworks in biomedical informatics.

  • Integrative multi-omics approaches identify molecular pathways and improve Alzheimer’s Disease risk prediction

    medRxiv · 2025-06-02 · 1 citation

    Preprint · Open access · Senior author · Corresponding author

    Alzheimer's Disease (AD) is the most prevalent condition that impacts the aging population, with no effective treatment or singular underlying causal factor identified. As a complex disease, characterizing the genetic risk of developing AD has proven to be difficult; polygenic scores (PGS) exclusively use common variants which fail to fully capture disease heterogeneity. This study used univariate and multivariate approaches to characterize AD risk. Genome-, transcriptome-, and proteome-wide association studies (GWAS, TWAS, and PWAS) were conducted on 15,480 individuals from the Alzheimer's Disease Sequencing Project (ADSP) R4 release to identify AD-associated signals, followed by pathway enrichment analysis. Integrative risk models (IRMs) were developed using genetically-regulated components of gene and protein expression and clinical covariates. Elastic-net logistic regression and random forest classifiers were evaluated using area under the receiver operating characteristic (AUROC), area under the precision-recall curve (AUPRC), F1-score, and balanced accuracy. These IRMs were compared against baseline PGS and covariate models. We identified 104 genomic, 319 transcriptomic, and 17 proteomic associations with AD under significant thresholds. Putatively novel associations were enriched in signaling, myeloid differentiation, and immune pathways. The best-performing IRM, random forest with transcriptomic and covariate features, achieved an AUROC of 0.703 and AUPRC of 0.622, significantly outperforming PGS and baseline models. Integrating univariate discovery approaches with multivariate modeling enhances AD risk prediction and offers insights into underlying biological processes.

  • Genetic polymorphisms and adverse reactions to antituberculosis therapy

    Pharmacogenomics · 2025-04-13 · 2 citations

    Review · Open access

    Tuberculosis is the leading cause of death from a single infectious agent globally, with the highest burden in low- and middle-income countries. Successful treatment requires prolonged administration of multiple drugs. The increasing threat of multidrug-resistant tuberculosis has prompted the development of a robust pipeline for new drugs. While generally safe and well tolerated, adverse drug reactions (ADRs) to TB drugs have a considerable impact on treatment outcomes. Pharmacogenetic testing has been implemented for some diseases to identify at-risk individuals and prevent ADRs. For tuberculosis treatment, the use of pharmacogenetic testing to optimize complex regimens and avoid ADRs is appealing, but there has been minimal implementation. To improve the use of pharmacogenetics, understanding both the pharmacology of relevant drugs and population-specific pathophysiology of ADRs are essential. This review highlights the major treatment-limiting ADRs with TB drugs, the current understanding of drug metabolic pathways, ADR pathophysiology, and known pharmacogenetic risk alleles. We highlight research gaps and barriers to meaningful clinical use and implementation of pharmacogenomic testing to prevent adverse reactions to TB drugs.

  • Ascending Aortic Dimensions and Body Size

    JACC: Cardiovascular Imaging · 2025-08-22 · 2 citations

    Article

  • Integrating polygenic scores with clinical, lifestyle, and social risk factors to improve heart failure risk prediction

    2025-11-16

    Article · Open access · Senior author · Corresponding author

    Heart failure (HF) is a highly prevalent, high-burden disorder, and its prevalence is expected to increase. Early detection of HF can reduce morbidity and mortality; therefore, novel early detection methods are needed. Polygenic scores (PGS) can combine common variants across the genome and provide phenotype-specific risk scores. However, there are also many well-known, non-genomic risk factors of HF, in the clinical, lifestyle, and social determinants of health (SDOH) domains, and it is not clear how genetic and non-genetic risk factors collectively contribute to HF risk. To address this question, we assessed whether combining HF PGS with clinical, lifestyle, and SDOH risk factors improves risk prediction. Leveraging data from the All of Us Research Program (n = 22,275), clinical risk factors were aggregated into a clinical risk score (CRS) while lifestyle and SDOH risk factors were aggregated into a polyexposure score (PXS). Feature selection was conducted with LASSO regression and statistical significance thresholding from logistic regression models (p < 0.05). Features were included in the model if they were statistically significant and important in ≥95% of 1000 iterations. To assess model performance, logistic regressions with HF case/control status were conducted with each risk score individually, as well as integrated models. The integrated model (PGS + CRS + PXS) performed better than individual risk scores (AUROC = 0.763, AUPRC = 0.047, F1 score = 0.062, balanced accuracy = 0.683). To assess the validity of the CRS and PXS, an integrated model with the PGS along with clinical and exposure risk factors as independent features was also evaluated. Based on AUPRC and F1 score, this integrated risk model (PGS + CRS risk factors + PXS risk factors) performed better than combining the PGS with the CRS and PXS (AUROC = 0.738, AUPRC = 0.047, F1 score = 0.066, balanced accuracy = 0.657). These findings demonstrate that integration of risk factors across multiple domains can improve HF prediction. Knowing that PGS combined with clinical, lifestyle, and SDOH risk factors is predictive of HF risk provides greater opportunity for the identification of individuals at risk of HF prior to disease onset with the goal of prevention or early intervention.

  • A one-shot, lossless algorithm for cross-cohort learning in mixed-outcomes analysis

    Patterns · 2025-07-30

    Article · Open access

    In cross-cohort studies, integrating diverse datasets is essential and challenging due to cohort-specific variations, distributed data storage, and privacy concerns. Traditional methods often require data pooling or harmonization, which can reduce efficiency and limit the scope of cross-cohort learning. We introduce mixWAS, a one-shot, lossless algorithm that efficiently integrates distributed electronic health record (EHR) datasets via summary statistics. Unlike existing approaches, mixWAS preserves cohort-specific covariate associations and supports simultaneous mixed-outcome analyses. Simulations demonstrate that mixWAS outperforms conventional methods in accuracy and efficiency across various scenarios. Applied to EHR data from seven cohorts in the US, mixWAS identified 4,530 significant cross-cohort genetic associations among traits such as blood lipids, BMI, and circulatory diseases. Validation with an independent UK EHR dataset confirmed 97.7% of these associations, underscoring the algorithm's robustness. By enabling lossless cross-cohort integration, mixWAS improves the precision of multi-outcome analyses and expands the potential for actionable insights in healthcare research.

  • A novel computational analysis integrating social determinants information from EHR and literature with Alzheimer’s disease biological knowledge through large language models and knowledge graphs

    Innovation in Aging · 2025-09-22 · 1 citation

    Article · Open access

    Background and Objectives: Alzheimer's disease (AD) and AD-related dementias (ADRD) are expected to affect over 100 million people by 2050, placing a significant strain on public health systems. Social determinants of health (SDoH), which include factors such as socioeconomic conditions and environment, play a crucial role in AD risk. Despite growing evidence, the understanding of SDoH's impact on AD remains limited. Research Design and Methods: This study leverages large language models and knowledge graphs (KGs) to extract AD-related SDoH knowledge from literature and electronic health records (EHR). We integrate this knowledge into biological research on AD through KG construction and graph deep learning, performing KG-link predictions validated by multimodal biological data from single-cell RNA-seq and proteomics. Results: We generated an SDoH knowledge graph with around 92k triplets, integrating literature and EHR data. In various link prediction experiments, we observed higher accuracy when integrating SDoH into knowledge graphs. Additionally, exploratory predictions uncovered potential SDoH-gene interactions, many of which were validated through differential expression analysis using proteomics and RNA-seq data. Discussion and Implications: This novel KG-based analysis enhances link prediction in AD-related biomedical networks by integrating SDoH and biological knowledge. Our findings highlight the potential interaction between social determinants and biological factors in AD, offering insights into more personalized and socially aware healthcare interventions.

  • A loss-of-function missense variant in ANGPTL3 exerts protective effects against kidney disease risk

    Atherosclerosis · 2025-07-18

    Article · Open access

  • Author Correction: Common-variant and rare-variant genetic architecture of heart failure across the allele-frequency spectrum

    Nature Genetics · 2025-11-28

    Article · Open access

Frequent coauthors

  • Anurag Verma

    307 shared publications
  • Dana C. Crawford

    Case Western Reserve University

    282 shared publications
  • Dan M. Roden

    Vanderbilt University

    268 shared publications
  • Sarah A. Pendergrass

    Geisinger Medical Center

    255 shared publications
  • Yuki Bradford

    University of Pennsylvania

    224 shared publications
  • Gail P. Jarvik

    University of Washington

    216 shared publications
  • Jun Liu

    University of California, San Francisco

    207 shared publications
  • Joshua C. Denny

    National Institutes of Health

    203 shared publications

Labs

  • Marylyn D Ritchie Lab · PI

  • Resume-aware match score
  • Save to shortlist
  • AI-drafted outreach

See your match with Marylyn D Ritchie

PhdFit ranks faculty by how well they match your research interests, methods, and publications, grounded in their actual work, not templates.

  • Free to start
  • No credit card
  • 30-second signup