Naveen Muthu
Verified · University of Pennsylvania · Rehabilitation Medicine
Active 1978–2026
Research topics
- Medicine
- Computer science
- Medical emergency
- Emergency medicine
- Knowledge management
Selected publications
medRxiv · 2026-01-16
article · Open access
Artificial intelligence models in healthcare often fail to improve patient outcomes despite strong predictive performance because they are frequently developed with limited understanding of clinical workflows and system implementation. We demonstrate a human-centered design approach to define prediction targets before model development, ensuring alignment with actionable clinical interventions. Using pediatric acute kidney injury as a case study, we convened a multidisciplinary working group and applied three complementary methods: user stories to elicit role-specific prediction targets, a People, Environment, Technology, and Tasks (PETT) Scan to analyze sociotechnical system factors, and process mapping to identify workflow leverage points. This approach revealed that different clinical roles require distinct prediction targets, with shared barriers including inadequate monitoring practices, poor visibility of at-risk patients, and unclear trajectories for kidney injury progression. By integrating clinical context before algorithm development, we identified high-impact prediction targets that support actionable interventions for hospitalists, nephrologists, and intensivists, demonstrating how human-centered design can bridge technical model performance and real-world clinical utility.
JMIR Human Factors · 2026-02-23
article · Open access
Background: Exposure to patients and clinical diagnoses drives learning in graduate medical education (GME). Measuring practice data, how each trainee experiences that exposure, is critical to planned learning processes including the assessment of trainee needs. We previously developed and validated an automated system to accurately identify resident provider-patient interactions. Objective: In this follow-up study, we use human-centered design methods to meet two objectives: (1) to understand trainees' planned learning needs and (2) to design, build, and validate the usability and use of a tool based on our automated resident provider-patient interaction system to meet these needs. Methods: We collected data from 2 institutions new to the American Medical Association's "Advancing Change" initiative, using a mixed methods approach with purposive sampling. First, interviews and formative prototype testing yielded qualitative data that we analyzed in several coding cycles. We built interview guides to collect data required for a work domain assessment, learning use case elicitation, and ultimately design requirement identification. We structured coding efforts within 2 existing theoretical models. Feature prioritization matrix analysis then transformed qualitative analysis outputs into actionable prototype elements that were refined through formative usability methods. Finally, qualitative data from a summative usability test validated the final prototype with measures of usefulness, usability, and intent to use. We used quantitative methods (eg, time on task and task completion rate in summative testing). Results: We represented the GME work domain assessment through process-map-design artifacts that provide target opportunities for intervention. Of the identified decision-making opportunities, trainee-mentor meetings stood out as optimal for delivering reliable practice-area information.
We designed a "midpoint" report for the use case of such meetings. We arrived at a final prototype through formative testing and design iteration. This final version showed 5 essential visualizations. Summative usability testing resulted in high performance in subjective and objective metrics. Insufficient baseline data were captured to draw comparative conclusions in a formal evaluation against existing tools or workarounds to support planned learning. However, the prevailing reported absence of tools and the ad hoc nature of approaches that do exist strongly imply an unmet need for the type of usable summary method delivered in our tool. We collected data from June 2021 through September 2023. Eight resident physicians composed the validation sample, including 4 (50%) residents from the Children's Hospital of Philadelphia and 4 (50%) residents from the University of Rochester Medical Center. Conclusions: We describe the multisite development of a tool providing visualizations of log-level electronic health record data, using human-centered design methods. Delivered at an identified point in GME, the tool is ideal for fostering the development of master adaptive learners. The resulting prototype is validated with high performance on a summative usability test. Additionally, the design, development, and assessment process may be applied to other tools and topics within clinical informatics.
Applied Clinical Informatics · 2025-01-01 · 1 citation
article · Open access
BACKGROUND: Primary care pediatricians play an important role in genetic testing, including referrals, test ordering, responding to results, assessing risk, treatment, and managing care. As genetic testing rapidly evolves to include new tests identifying patients at risk for certain conditions, alert-based clinical decision support is insufficient in assisting pediatric primary care providers in working with patients, parents, genetics, and other specialties. Supporting pediatricians in the return of these results requires addressing gaps in genetics training and integrating genetics into practice with education, information resources, and specialized tools. OBJECTIVES: This study aimed to capture requirements for developing systems and processes to support primary care pediatricians in the return of genome-informed risk assessments. METHODS: We performed a requirements analysis to inform the design of clinical decision support tools and processes for pediatric providers of patients who received a genome-informed risk assessment, a novel test that combines polygenic risk scores with patient and family histories to deliver a risk assessment for common medical conditions. We developed an interview guide consisting of scenario presentations, questionnaires, and semi-structured questions to elicit provider responses on a broad set of requirements to manage results with patients and caregivers. RESULTS: Twenty providers from 10 primary care pediatric practices within a single health system participated in the study. The findings demonstrated that providers feel responsible to be involved in the process of returning results but require a support system that integrates education, provider and patient information resources, effective communication with genetics, and electronic health record decision support tools that can accommodate a range of clinical scenarios and provider workflow preferences.
CONCLUSION: Supporting providers with the return of genetic testing results such as the genome-informed risk assessment requires a comprehensive approach to decision support consisting of education, communication, and a comprehensive and integrated set of electronic health record tools.
Human performance evaluation of a pediatric artificial intelligence sepsis model
Journal of the American Medical Informatics Association · 2025-07-19 · 2 citations
article · Open access
OBJECTIVE: To assess the influence of an implemented artificial intelligence model predicting pediatric sepsis (as defined by the Improving Pediatric Sepsis Outcomes [IPSO] collaborative) in the emergency department (ED) on human performance measures. MATERIALS AND METHODS: The study took place at two ED sites within a large pediatric health system in the Southeastern United States between January 1, 2021 and April 1, 2024. We interviewed ED providers and nurses within 72 hours of caring for a patient identified as potentially having sepsis by the predictive model. Thematic analysis of qualitative data was combined with electronic health record queries to assess measures of human performance, including situation awareness, explainability, human-computer agreement, workload, trust, automation bias, and relationship between staff and patients. RESULTS: We interviewed 40 clinicians. Participants found that the sepsis alert improved situation awareness, leading to changes in patient care management, resource allocation, and/or monitoring. Participants reported an average trust in the model-based alert of 3.8/5. Only 28% (555/1977) of sepsis huddles were done without alert firing, suggesting some automation bias. Treatment with antibiotics for IPSO sepsis cases was similar pre- and post-intervention without a huddle (9.3% vs 10.5%), though treatment doubled with huddle intervention (22.7%). NASA Task Load Index increased from 43 to 57 post-intervention. There was no report of adverse relationships with patients post-intervention. DISCUSSION: Human performance appeared to be generally positive with improved situation awareness and satisfaction with the alert-driven huddle. However, there was some evidence of automation bias and a slight increase in workload with the intervention. CONCLUSION: This study demonstrates the feasibility of evaluating multiple dimensions of human performance using a mixed methods approach for an AI model implemented in clinical practice.
Future studies should aim to reduce the measurement burden of human performance metrics associated with AI implementation in acute care settings and assess the correlation between human performance measures and clinical outcomes.
ArXiv.org · 2025-09-19
preprint · Open access
We present UNIPHY+, a unified physiological foundation model (physioFM) framework designed to enable continuous human health and disease monitoring across care settings using ubiquitously obtainable physiological data. We propose novel strategies for incorporating contextual information during pretraining, fine-tuning, and lightweight model personalization via multi-modal learning, feature fusion-tuning, and knowledge distillation. We advocate testing UNIPHY+ with a broad set of use cases from intensive care to ambulatory monitoring in order to demonstrate that UNIPHY+ can empower generalizable, scalable, and personalized physiological AI to support both clinical decision-making and long-term health monitoring.
2025-10-26
article
Survivorship Care Plans (SCPs) are clinical documents that summarize treatments, long-term health risks, and evidence-based recommendations for cancer and hematopoietic stem cell transplantation (HSCT) survivors. Despite their clinical value, SCPs remain underutilized due to limited automation, high documentation burden, and workflow misalignment. Manual SCP generation is time-consuming, error-prone, and burdensome, particularly in complex cases requiring hours of chart review and manual calculation. To address these challenges, we developed a semi-automated SCP generation system grounded in principles of Artificial Intelligence (AI) implementation science, focusing on clinical context-aware integration, sustainable workflow alignment, and human-centered design. The system employs an Extract, Transform, and Load (ETL) pipeline to extract survivorship-relevant data from Epic Clarity, processing both structured and unstructured Electronic Health Record (EHR) data. Structured data are processed using deterministic rules, whose outputs are reviewed by clinical experts in an iterative, human-in-the-loop process to validate accuracy and refine rule logic. Unstructured notes are analyzed using a BERT-based NLP model to identify documentation of radiation therapy. In collaboration with a large pediatric healthcare system in the United States, we retrospectively identified a cohort of patients less than age 30 treated for cancer or HSCT between January 2011 and December 2021. Using a validation cohort of 864 patients, our system achieved ≥99.5% concordance for 53 out of 57 chemotherapy agent exposures, with most discrepancies attributable to human abstraction errors. Ongoing work includes usability testing with clinicians, co-design with survivorship coordinators, and evaluation of implementation outcomes such as trust, safety, and integration into clinical workflows.
medRxiv · 2025-02-06
preprint · Open access
Background: Acute kidney injury (AKI) is common among children with critical illness and is associated with high morbidity and mortality. Risk prediction models designed for clinical decision support implementation offer an opportunity to identify and proactively mitigate AKI risks. Existing models have been primarily validated on single-center data, owing partly to the lack of appropriately detailed multicenter datasets. Objective: To determine the accuracy of a single-center model to predict new AKI at 72 hours of ICU admission across two multicenter datasets and extend this model to improve prediction accuracy while maintaining acceptable alert burden. Derivation and Validation Cohorts: We separately derived models in two datasets: PEDSNET-VPS, created through the linkage of PEDSnet electronic health record (EHR) extraction with Virtual Pediatric Systems (VPS); and the PICU Data Collaborative dataset, created through EHR extraction and harmonization from eight participating institutions. Derivation datasets comprised a temporal and location-specific split of these datasets (80%), while the holdout test split comprised the remaining 20%. Prediction Model: We recalibrated an existing single-center model and measured discrimination and accuracy. We then added features guided by precision and recall measures. All features were available at 12 hours of ICU admission. We measure discrimination and accuracy at multiple cut-points and identify the features contributing most to the risk score. Results: In two datasets comprising 186,540 ICU admissions, we report an incidence of early AKI of 2.2 - 2.7%. Initial recalibration of an existing single-center model demonstrated poor discrimination (AUROC 0.60 - 0.78). Following the addition of new features, we report higher AUROC values of 0.79 - 0.80 and AUPRC values of 0.13 - 0.21 in both datasets. We report accuracy at several cut-points as well as cross-validate between datasets.
Conclusions: In this first use of two new multicenter datasets, we report improved discrimination and accuracy in a model designed specifically for implementation, balancing sensitivity and precision to predict patients at risk for AKI development.
medRxiv · 2025-03-26 · 2 citations
preprint · Open access
Importance: Pediatric sepsis accounts for over 72,000 US hospitalizations annually with significant mortality and morbidity. Many pediatric hospitals struggle to promptly identify and treat sepsis. This study demonstrates the feasibility of a multi-tiered artificial intelligence (AI) to enhance sepsis clinical decision-making within a complex emergency department (ED) workflow. Objectives: To develop and validate a local AI model predicting critical sepsis among ED patients who received a fluid bolus and a disposition to the Pediatric Intensive Care Unit (PICU) but had not yet received antibiotics. Design: Retrospective observational cross-sectional study. Setting: Urban, quaternary-care, academic healthcare system. Patients: Pediatric ED patients. Interventions: None. Measures and Main Results: The "Sepsis on ED to PICU Disposition" (SEPD) model aimed to predict critical sepsis within 72 hours of PICU disposition using a dataset totaling 5,534 patient encounters for model training and testing. During silent implementation, 1,058 encounters were used for validation. The SEPD model outperformed a vendor-developed sepsis model with an AUROC of 81.8%, compared to 57.5%. The model also demonstrated better precision-recall performance, showing more balanced identification of true positives. During silent implementation, the SEPD model maintained similar sensitivity (85.29%) and specificity (60.45%) to those observed during model testing. Conclusion: The SEPD model improved detection of critical sepsis among high-risk pediatric ED patients with a known PICU disposition, outperforming a vendor-developed sepsis model. Within a complex ED workflow, this model may facilitate timely sepsis identification and treatment in critically ill patients, who may have been missed during earlier stages of their ED course.
Journal of the American Medical Informatics Association · 2025-02-04
article · Open access
OBJECTIVE: To assess the prevalence of recommended design elements in implemented electronic health record (EHR) interruptive alerts across pediatric care settings. MATERIALS AND METHODS: We conducted a 3-phase mixed-methods cross-sectional study. Phase 1 involved developing a codebook for alert content classification. Phase 2 identified the most frequently interruptive alerts at participating sites. Phase 3 applied the codebook to classify alerts. Inter-rater reliability (IRR) for the codebook and descriptive statistics for alert design contents were reported. RESULTS: We classified alert content on design elements such as the rationale for the alert's appearance, the hazard of ignoring it, directive versus informational content, administrative purpose, and whether it aligned with one of the Institute of Medicine's (IOM) domains of healthcare quality. Most design elements achieved an IRR above 0.7, with the exceptions of identifying directive content outside of an alert (IRR 0.58) and whether an alert was for administrative purposes only (IRR 0.36). IRR was poor for all IOM domains except equity. Institutions varied widely in the number of unique alerts and their designs. 78% of alerts stated their purpose, over half were directive, and 13% were informational. Only 2%-20% of alerts explained the consequences of inaction. DISCUSSION: This study raises important questions about the optimal balance of alert functions and desirable features of alert representation. CONCLUSION: Our study provides the first multi-center analysis of EHR alert design elements in pediatric care settings, revealing substantial variation in content and design. These findings underline the need for future research to experimentally explore EHR alert design best practices to improve efficiency and effectiveness.
Journal of the American Medical Informatics Association · 2025-07-19 · 2 citations
article · Open access
OBJECTIVE: To conduct an independent external validation of an implemented vendor-developed emergency department (ED) pediatric sepsis predictive model. MATERIALS AND METHODS: We performed a retrospective cross-sectional study within 2 ED sites of a large pediatric health system between January 1, 2021 and April 1, 2024. A nurse-facing interruptive alert appeared when the model score exceeded the threshold, triggering clinicians to call a sepsis huddle. We compared model predictive performance with vendor-reported performance using definitions that accounted for model threshold and alert timing in clinical practice. Care processes and patient outcome measures included time to first antibiotics, time to first fluid bolus, 30-day mortality, ED to ICU admission rate, and ICU free days. RESULTS: The pre-intervention cohort consisted of 268 102 ED visits with 741 (0.28%) sepsis cases. The post-intervention cohort consisted of 331 061 ED visits with 1114 (0.34%) sepsis cases. Model predictive performance dropped from vendor-reported performance. Mean time to first antibiotic decreased from 112 to 102 minutes (P = .05, 95% confidence interval of difference, -19.1 to 0.1) and time to first bolus decreased by 16.7 minutes (P = .03, 95% confidence interval difference, -31.8 to -1.5) after the intervention. Decreases in 30-day mortality (6% [45/741] to 4% [52/1114]); ED to ICU admissions (87% [646/741] to 84% [941/1114]), and ICU free days (6 to 5) after the intervention did not meet statistical significance. DISCUSSION: Implementing the model led to significant reductions in time to fluid bolus and borderline decreases in time to antibiotics, with non-significant changes in mortality and ICU metrics. When implementing an externally developed model, local workflows, documentation patterns, and patient populations make it challenging to generalize published or reported model performance metrics to real world performance.
CONCLUSION: When tailoring a vendor-developed pediatric ED sepsis model for real-world usage, predictive performance differed substantially. Post-implementation we found improvements in care process measures, suggesting such models may benefit sepsis care when adapted for specific clinical workflows.
Frequent coauthors
- Irit R. Rasooly, Children's Hospital of Philadelphia (64 shared)
- Christopher P. Bonafide, Children's Hospital of Philadelphia (42 shared)
- Akshar V. Patel, The Ohio State University (35 shared)
- Shakila Khan, Children's Hospital of Philadelphia (35 shared)
- Toby Maurer, Indiana University School of Medicine (35 shared)
- Evan Orenstein, Children's Healthcare of Atlanta (35 shared)
- Debra T. Choi, Michael E. DeBakey VA Medical Center (35 shared)
- Carmel Crock, The Royal Victorian Eye & Ear Hospital (35 shared)