Danielle L. Mowery

Verified

University of Pennsylvania · Rehabilitation Medicine

Active 1989–2026

h-index: 28

Citations: 2.8k

Papers: 184 (107 in the last 5 years)

About

Danielle L. Mowery, PhD, MS, MS, FAMIA, FACMI, is an Assistant Professor of Informatics in Biostatistics and Epidemiology at the Hospital of the University of Pennsylvania. She serves as the Chief Research Information Officer for Penn Medicine and is the Scientific Director of the Clinical Research Informatics Core at the Institute for Biomedical Informatics. Dr. Mowery is a senior fellow at the Institute for Biomedical Informatics and a member of the Penn Institute for Translational Medicine and Therapeutics (ITMAT). She also holds associate memberships in the Abramson Cancer Center's Cancer Control Program, the Penn Center for Nutritional Science and Medicine (PenNSAM), and the Penn Medicine Center for Applied Health Informatics (CAHI). Additionally, she co-directs the Data Integration Core at the Penn Obstructive Sleep Apnea Center. Her research focuses on biomedical informatics, with particular expertise in natural language processing, health information systems, and data science applications in clinical and translational research. Dr. Mowery's educational background includes a BS in Biological Sciences from the University of Pittsburgh, followed by MS degrees in Health Information Systems and Natural Language Processing, and a PhD in Biomedical Informatics from the University of Pittsburgh. Her work involves developing computational models for disease identification, analyzing social determinants of health, and advancing machine learning techniques for healthcare data. She has contributed to numerous studies on hospital outcomes, COVID-19, long COVID, and other health informatics topics, emphasizing the integration of AI and natural language processing to improve clinical decision-making and patient care.

Research topics

Medicine
Political Science
Internal medicine
Medical emergency
Pathology
Pediatrics
Genetics
Emergency medicine
Intensive care medicine

Selected publications

Development and Validation of a Large Language Model Case Identification Strategy for Eosinophilic Esophagitis
Gastro Hep Advances · 2026-01-01
article · Open access
Background and Aims: Epidemiologic research in eosinophilic esophagitis (EoE) is limited by the accuracy and efficiency of case identification algorithms. We aimed to evaluate rule-based natural language processing (RB-NLP) and large language model–based natural language processing (LLM-NLP) pipelines for identifying EoE diagnoses and features from unstructured text. Methods: We identified gastrointestinal pathology reports with any mention of “eosinophil” paired with gastroenterology clinic notes. Three hundred randomly selected patients were divided into training (n = 200, 56 with EoE) and testing (n = 100, 36 with EoE) sets. Manual chart review was the reference standard. RB-NLP used spaCy with medspaCy’s clinical components; LLM-NLP prompts were developed through iterative human-in-the-loop refinement. In the validation set, we compared International Classification of Diseases (ICD) codes, RB-NLP, and LLM-NLP against the reference standard using sensitivity (recall), positive predictive value (precision), and F1 score. Results: In the validation set, ICD codes alone had a sensitivity of 0.86 (95% confidence interval [CI]: 0.75–0.97), a positive predictive value of 0.97 (95% CI: 0.91–1.0), and an F1 value of 0.91 (95% CI: 0.84–1.0). Combining ICD and LLM-assigned diagnosis yielded a 3-point improvement in F1 score (95% CI: −0.01 to 0.07; P = .2) compared to ICD alone. In a larger cohort (n = 580), the LLM + ICD approach identified the most EoE cases (n = 203) and captured 15% of cases missed by ICD codes. Clinical characteristics varied depending on the case identification strategy used. Conclusion: Combining LLM-NLP with a single ICD code reduced false negatives and modestly improved the F1 score compared to either method alone. This may represent a scalable approach to enhance EoE case identification in real-world data.
Automated Safety Plan Scoring in Outpatient Mental Health Settings Using Large Language Models: Exploratory Study
JMIR Mental Health · 2025-09-03 · 1 citation
article · Open access · Senior author
Background: The safety planning intervention (SPI) is a suicide prevention intervention that results in a written plan to help patients reduce suicide risk. High-quality safety plans (that is, those that are the most complete, personalized, and specific) are more effective in reducing suicide risk. Measuring SPI quality is labor-intensive, which means that clinicians rarely get specific, actionable feedback on their use of the SPI. Objective: This study aimed to develop the Safety Plan Fidelity Rater, an automated tool that assesses the quality of written safety plans leveraging 3 large language models (LLMs): GPT-4, LLaMA 3, and o3-mini. Methods: Using 266 deidentified safety plans from outpatient mental health settings in New York, LLMs analyzed four key steps: warning signs, internal coping strategies, making the environment safe, and reasons for living. We compared the predictive performance of the three LLMs, optimizing scoring systems, prompts, and parameters. Results: Findings showed that LLaMA 3 and o3-mini outperformed GPT-4, with different step-specific scoring systems recommended based on weighted F1-scores. Conclusions: These findings highlight LLMs' potential to provide clinicians with timely and accurate feedback on safety plan quality, which could greatly improve its implementation in community practice.
Exploring the Potential of Large Language Models for Automated Safety Plan Scoring in Outpatient Mental Health Settings
medRxiv · 2025-03-27
preprint · Open access · Senior author · Corresponding
The Safety Planning Intervention (SPI) produces a plan to help manage patients' suicide risk. High-quality safety plans (that is, those with greater fidelity to the original program model) are more effective in reducing suicide risk. We developed the Safety Planning Intervention Fidelity Rater (SPIFR), an automated tool that assesses the quality of SPI using three large language models (LLMs): GPT-4, LLaMA 3, and o3-mini. Using 266 deidentified SPI from outpatient mental health settings in New York, LLMs analyzed four key steps: warning signs, internal coping strategies, making environments safe, and reasons for living. We compared the predictive performance of the three LLMs, optimizing scoring systems, prompts, and parameters. Results showed that LLaMA 3 and o3-mini outperformed GPT-4, with different step-specific scoring systems recommended based on weighted F1-scores. These findings highlight LLMs' potential to provide clinicians with timely and accurate feedback on SPI practices, enhancing this evidence-based suicide prevention strategy.
MedVidDeID: Protecting Privacy in Clinical Encounter Video Recordings
SSRN Electronic Journal · 2025-01-01 · 2 citations
preprint · Open access
S919 Large Language Models Combined With a Single Diagnostic Code Detect Eosinophilic Esophagitis in Electronic Health Records With High Accuracy
The American Journal of Gastroenterology · 2025-10-01
article
Introduction: Large-scale epidemiologic research for eosinophilic esophagitis (EoE) is hampered by variable accuracy of International Classification of Diseases (ICD) codes and case identification algorithms. We aimed to develop and validate a natural language processing (NLP) pipeline using large language models (LLMs) to identify EoE features and diagnosis from unstructured text, comparing performance with ICD codes and ICD + LLM combination. Methods: We identified gastrointestinal pathology reports with any mention of “eosinophil” paired with preceding GI clinic notes. Three hundred randomly selected patients were divided into training (N = 200, 56 with EoE) and testing (N = 100, 36 with EoE) sets. Manual chart review was used to assign an EoE reference standard. LLM prompt development used a human-in-the-loop approach with iterative refinements. Training concluded once the LLM-assigned diagnosis exceeded the F1 score (harmonic mean of precision and recall) of ICD codes. Using the test set, we compared performance of ICD codes, LLM-derived diagnostic features, LLM-assigned diagnosis, and a combined LLM + ICD. Performance metrics included sensitivity (recall), PPV (precision), and F1 score. Nonparametric bootstrap resampling (1,000 replicates) was used to estimate 95% confidence intervals (CIs) with statistical significance based on whether they excluded zero. Results: In the training set, LLM-derived diagnostic features demonstrated the highest sensitivity (0.98; 95% CI 0.95, 1.00), while LLM-assigned diagnosis had the highest PPV (0.92; 95% CI 0.84-1.0) and specificity (0.97; 95% CI 0.95,1.00). Comparably high F1 scores were achieved with LLM-derived features (0.89; 95% CI 0.83, 0.95) and ICD + LLM diagnosis (0.88; 95% CI 0.81, 0.95). In the independent test set, ICD codes alone showed a sensitivity of 0.86 (95% CI 0.75, 0.97), PPV of 0.97 (95% CI 0.91, 1.0), and an F1 of 0.91 (95% CI 0.84, 1.0). 
Combining ICD and LLM-assigned diagnosis yielded a 3-point improvement in F1 score (95% CI -0.01, 0.07; P = 0.2) compared to ICD alone. This combination method significantly improved sensitivity (0.92 [0.83,1.00]; P = 0.008) and F1 score (0.94 [0.89,1.00]; P = 0.047) relative to LLM-assigned diagnosis. Conclusion: Combining the LLM-assigned diagnosis with a single diagnostic code reduced false negatives and modestly improved the F1 score compared to either method alone, suggesting a scalable approach for improving EoE case identification in real-world data.
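As a reading aid for the metrics reported in the abstract above, the following is a minimal sketch of how an F1 score and a percentile bootstrap confidence interval can be computed. It is not the authors' code: the function names, the toy labels, and the choice of a percentile interval are assumptions made for illustration only. The last function checks that the reported ICD-only PPV (0.97) and sensitivity (0.86) are consistent with the reported F1 of 0.91.

```python
import random


def f1_from_rates(precision, recall):
    """F1 is the harmonic mean of precision (PPV) and recall (sensitivity)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_score(y_true, y_pred):
    """Compute F1 from binary labels (1 = case, 0 = non-case)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    return f1_from_rates(tp / (tp + fp), tp / (tp + fn))


def bootstrap_f1_ci(y_true, y_pred, n_replicates=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for F1, resampling patients with replacement.

    The percentile method is an assumption; the abstract does not say
    which bootstrap interval was used.
    """
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_replicates):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_score([y_true[i] for i in idx],
                               [y_pred[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * n_replicates)]
    upper = scores[min(n_replicates - 1, int((1 - alpha / 2) * n_replicates))]
    return lower, upper


def reported_icd_f1():
    """F1 implied by the reported ICD-only PPV and sensitivity."""
    return round(f1_from_rates(0.97, 0.86), 2)


# Hypothetical toy labels, not study data.
toy_truth = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
toy_pred = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
toy_ci = bootstrap_f1_ci(toy_truth, toy_pred)
```

Resampling whole patients, rather than individual notes, keeps each replicate the same size as the original set, which is the usual setup for a nonparametric bootstrap of a classification metric.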
International electronic health record-derived COVID-19 clinical course profiles: the 4CE consortium
UNC Libraries · 2025-06-26
article · Open access
S920 Diagnostic Codes Performance for Identification of Eosinophilic Gastrointestinal Diseases Varies by Reference Standard Definition and Disease Subtype
The American Journal of Gastroenterology · 2025-10-01
article
Introduction: Eosinophilic gastrointestinal diseases (EGIDs) are understudied. International Classification of Diseases (ICD) codes are a potential tool to facilitate EGID research, but their accuracy is unknown. We aimed to evaluate the ICD performance metrics for identifying EGIDs using multiple case definitions and across subtypes. Methods: We identified GI pathology reports containing “eosinophil” and paired them with preceding clinic notes from a large health system. Reference standard diagnoses were assigned via chart review and compared to ICD codes. The EoE reference standard required positive histology with variable definitions of esophageal dysfunction and diagnostic history. Defining reference standard for eosinophilic gastritis (EoG) and eosinophilic enteritis (EoN) included liberal (any eosinophils plus symptoms) and restrictive (increased eosinophils plus symptoms) criteria; only liberal criteria were used for eosinophilic colitis (EoC) due to limited detail. For non-EoE EGIDs, a rule-based natural language processing approach captured numeric or descriptive eosinophil increases. EoG and EoN, lacking distinct ICD codes, were evaluated individually and combined under the eosinophilic gastroenteritis code. Performance metrics included sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, Cohen’s kappa, and F1 score (harmonic mean of precision and recall). Results: Defining EoE reference standard with symptoms of dysphagia and food impaction only, the ICD achieved an F1 score of 0.70 (95% confidence interval [CI] 0.59-0.80), with a PPV of 0.64 and sensitivity of 0.77. Including additional esophageal dysfunction symptoms yielded a modest improvement in F1 score (ΔF1 = 0.02; 95% CI 0.0 to 0.08). Further addition of history of EoE documented in note free text resulted in higher PPV (0.95), specificity (0.98), and F1 score (0.84; 95% CI 0.78-0.90), with similar sensitivity (0.76).
The diagnostic performance of ICD codes for identifying EoG or EoN demonstrated low PPV (0.17-0.33) and sensitivity (0.08-0.33) across individual subtypes and in combination, with slightly better performance for restrictive definitions. No patients with EoC received the corresponding ICD code, resulting in zero sensitivity and undefined PPV. Conclusion: The diagnostic performance of ICD codes for EoE is influenced by the specific criteria and clinical data elements incorporated in the reference standard definition. The ICD codes for non-EoE EGIDs demonstrate high false positive and false negative rates. These findings highlight the need for improved case identification methods to facilitate EGID research.
Evaluating Large Language Models for Automatic Detection of In-Hospital Cardiac Arrest: Multi-Site Analysis of Clinical Notes
medRxiv · 2025-08-06
preprint · Open access · Senior author · Corresponding
In-hospital cardiac arrest (IHCA) affects over 200,000 patients annually in the United States, yet its detection through manual chart review remains resource-intensive and often delayed. We evaluated the performance of four open-source large language models (LLMs) and GPT-4o in identifying IHCA cases from 2,674 clinical notes across five hospitals. While GPT-4o achieved the highest performance (F1-score: 0.90, recall: 0.97), several open-source models demonstrated comparable capabilities, suggesting their viability for clinical applications. Our systematic analysis of model outputs revealed that performance was strongly influenced by site-specific documentation practices, with inter-site agreement rates varying by over 20%. Through detailed error analysis, we identified key challenges including medical terminology hallucinations and structural inconsistencies in model reasoning. These findings establish a framework for implementing LLM-based IHCA detection systems while highlighting critical considerations for their clinical deployment.
Exploring themes and disparities in SI research across low-and-middle income countries using natural language processing (Preprint)
2025-11-13
article · Open access · Senior author
Background: Despite bearing the largest burden of suicide globally, low- and middle-income countries (LMICs) remain significantly underrepresented in suicide research compared to high-income countries (HICs). Objective: This study leverages natural language processing (NLP) and qualitative analysis to examine disparities in suicide research between LMICs and HICs with a particular focus on suicidal ideation (SI). Additionally, the study aims to identify themes in LMIC SI research and explore trends within these themes over time. Methods: An analysis of 15,708 articles published between 1968 and 2022 was conducted, extracting country of focus using article titles and abstracts through NLP methods. Among the 1,458 articles focusing on SI in LMICs, Latent Dirichlet Allocation (LDA) and manual qualitative theming were used to identify and examine thematic clusters. Results: Results show that only 26.6% of SI articles focused on LMICs while 70.5% focused on HICs. Among the papers focusing on LMICs, five key themes emerged, including: developmental and lifespan context; interpersonal and risk factors; clinical instruments and care; gender, sex, perinatal and discrimination; and suicide attempts and death. Analyses were also used to identify trends in how often these themes were discussed within LMIC research over time. Conclusions: Findings highlight the need for increased SI research in LMICs in addition to improved research infrastructure and support. This study also demonstrates the utility of NLP and qualitative methodologies for large-scale research syntheses and provides promising directions for future global disparities research.
International Changes in COVID-19 Clinical Trajectories Across 315 Hospitals and 6 Countries: Retrospective Cohort Study
UNC Libraries · 2025-05-13
article · Open access
BACKGROUND: Many countries have experienced 2 predominant waves of COVID-19-related hospitalizations. Comparing the clinical trajectories of patients hospitalized in separate waves of the pandemic enables further understanding of the evolving epidemiology, pathophysiology, and health care dynamics of the COVID-19 pandemic. OBJECTIVE: In this retrospective cohort study, we analyzed electronic health record (EHR) data from patients with SARS-CoV-2 infections hospitalized in participating health care systems representing 315 hospitals across 6 countries. We compared hospitalization rates, severe COVID-19 risk, and mean laboratory values between patients hospitalized during the first and second waves of the pandemic. METHODS: Using a federated approach, each participating health care system extracted patient-level clinical data on their first and second wave cohorts and submitted aggregated data to the central site. Data quality control steps were adopted at the central site to correct for implausible values and harmonize units. Statistical analyses were performed by computing individual health care system effect sizes and synthesizing these using random effect meta-analyses to account for heterogeneity. We focused the laboratory analysis on C-reactive protein (CRP), ferritin, fibrinogen, procalcitonin, D-dimer, and creatinine based on their reported associations with severe COVID-19. RESULTS: Data were available for 79,613 patients, of which 32,467 were hospitalized in the first wave and 47,146 in the second wave. The prevalence of male patients and patients aged 50 to 69 years decreased significantly between the first and second waves. Patients hospitalized in the second wave had a 9.9% reduction in the risk of severe COVID-19 compared to patients hospitalized in the first wave (95% CI 8.5%-11.3%). 
Demographic subgroup analyses indicated that patients aged 26 to 49 years and 50 to 69 years; male and female patients; and black patients had significantly lower risk for severe disease in the second wave than in the first wave. At admission, the mean values of CRP were significantly lower in the second wave than in the first wave. On the seventh hospital day, the mean values of CRP, ferritin, fibrinogen, and procalcitonin were significantly lower in the second wave than in the first wave. In general, countries exhibited variable changes in laboratory testing rates from the first to the second wave. At admission, there was a significantly higher testing rate for D-dimer in France, Germany, and Spain. CONCLUSIONS: Patients hospitalized in the second wave were at significantly lower risk for severe COVID-19. This corresponded to mean laboratory values in the second wave that were more likely to be in typical physiological ranges on the seventh hospital day compared to the first wave. Our federated approach demonstrated the feasibility and power of harmonizing heterogeneous EHR data from multiple international health care systems to rapidly conduct large-scale studies to characterize how COVID-19 clinical trajectories evolve.

Frequent coauthors

Wendy W. Chapman
75 shared
Antoine Neuraz
Centre de Recherche des Cordeliers
49 shared
Emily Schriver
University of Pennsylvania Health System
49 shared
Gabriel A. Brat
Harvard University
40 shared
Amelia L.M. Tan
Harvard University
40 shared
Chuan Hong
Duke University
39 shared
Tianxi Cai
Harvard University
38 shared
David A. Hanauer
University of Michigan–Ann Arbor
38 shared

Education

MS, Health & Rehabilitation Sciences
University of Pittsburgh
PhD, Biomedical Informatics
University of Pittsburgh
BS, Biological Sciences
University of Pittsburgh
MS, Biomedical Informatics
University of Pittsburgh
Post doctorate, Biomedical Informatics
University of Utah
2018

Awards & honors

Honorable mention, NIH N3C Long COVID Challenge 2023
