Maryellen L. Giger
A.N. Pritzker Distinguished Service Professor · University of Chicago · Radiology
Active 1983–2026
About
Maryellen L. Giger, Ph.D., is the A.N. Pritzker Distinguished Service Professor of Radiology, the Committee on Medical Physics, and the College at the University of Chicago. She serves as the Vice-Chair of Radiology (Basic Science Research) and is the immediate past Director of the CAMPEP-accredited Graduate Programs in Medical Physics, as well as the Chair of the Committee on Medical Physics at the University. Her research over more than 30 years focuses on computer-aided diagnosis, including computer vision, machine learning, and deep learning, applied to various medical conditions such as breast cancer, lung cancer, prostate cancer, brain injury, lupus, bone diseases, and COVID-19. She has contributed significantly to the development of computational image-based analyses for risk assessment, diagnosis, prognosis, and response to therapy, with her work translating into clinical components and utilizing image-based phenotypes in imaging genomics studies. Giger has extended her AI research to include the analysis of COVID-19 on CT and chest radiographs and is the contact PI on the NIH NIBIB-funded & ARPA-H-funded Medical Imaging and Data Resource Center (MIDRC). She is a prominent figure in her field, having served on various NIH, DOD, and other funding agencies’ study sections, and is a former member of the NIBIB Advisory Council of NIH. Her leadership roles include past presidencies of the American Association of Physicists in Medicine (AAPM) and SPIE, where she was the inaugural Editor-in-Chief of the Journal of Medical Imaging. She is a member of the National Academy of Engineering and has received numerous awards, including the William D. Coolidge Gold Medal from the AAPM, the highest award given by the organization. Giger has more than 280 peer-reviewed publications, over 450 total publications, and holds more than 30 patents. 
She has mentored over 100 students and professionals across various levels and has been recognized with numerous honors for her contributions to medical physics, imaging, and biomedical engineering.
Research topics
- Computer Science
- Artificial Intelligence
- Medicine
- Medical physics
- Radiology
- Pathology
- Internal medicine
- Data science
Selected publications
bioRxiv (Cold Spring Harbor Laboratory) · 2026-01-15
Article · Open access
Lupus nephritis (LuN) and renal allograft rejection (RAR) manifest inflammation and fibrosis that ultimately lead to kidney failure. To quantitatively assess spatial injury patterns, we collected high dimensional spatial proteomics data from 23 LuN, 33 RAR, and 8 kidney control (KC) biopsies. We developed a computational pipeline to segment and classify tubules, capillaries, and glomeruli in whole-slide images using three trained neural networks (Renal Damage diagnosis, RDDx). RDDx achieved high accuracy and generalizability, reliably identifying small capillaries and differentiating tubular and vascular inflammation in kidney tissues. Both LuN and RAR showed reduced tubular and capillary areas with expanded interstitial space. LuN displayed patchy clusters of stressed and inflamed tubules, whereas RAR exhibited diffuse injury. Within RAR, T cell-mediated rejection (TCMR) showed intense tubulitis while antibody-mediated rejection (ABMR) featured proliferating and inflamed capillaries near atrophic tubules. RDDx quantitative metric outputs correlated with histopathological scores, highlighting their reproducibility and clinical relevance. Stressed tubules in mildly inflamed LuN biopsies suggested they were a sensitive injury marker, while proliferating capillaries revealed microvascular remodeling in ABMR. These findings indicated RDDx can identify and quantify damage mechanisms specific to each renal disease thus facilitating future mechanistic studies and therapeutic target discovery.
AI-Aided Triage for GSWH: Validating an Interpretable HCT-Based Mortality Model
Journal of Neurotrauma · 2026-03-10
Article · Senior author
Civilian gunshot wounds to the head (GSWH) carry high mortality yet lack standardized, imaging-based triage tools. Because initial noncontrast head computerized tomography (HCT) is universally obtained but not leveraged with validated, rapid, and reproducible methods, we developed and evaluated an interpretable, attention-based multiple-instance learning (MIL) model to predict in-hospital mortality from the initial HCT. In a retrospective cohort at a single level I trauma center (May 1, 2018–October 31, 2023), we included consecutive adults (≥16 years) with GSWH who underwent HCT, excluding those dead on arrival or without HCT. Of 222 patients, 106 (47.8%) survived to discharge and 116 (52.2%) died. We used a stratified random split to create a development set (n = 168, 75.7%) and an independent test set (n = 54, 24.3%); the development set was repeatedly partitioned 100 times into training and validation subsets to quantify performance uncertainty, and each of the 100 models was evaluated once on the test set. The MIL algorithm produced a prognostic severity score with case-level interpretability via attention maps. On the independent test set, discrimination for mortality was high (area under the curve: 0.92, 95% CI: 0.87–0.94) with sensitivity 0.88 (95% CI: 0.78–0.97) and specificity 0.87 (95% CI: 0.74–0.96) at the optimal operating point. Attention visualizations consistently highlighted brainstem, deep midline, and ventricular injury in high-mortality predictions, aligning with established high-risk neuroanatomy. These findings demonstrate that an interpretable, HCT-based MIL model can deliver objective, reproducible risk estimates and transparent case-level explanations, supporting early prognostication and imaging-first triage in penetrating brain injury.
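The abstract does not give the model architecture, so the following is only a minimal sketch of one common form of attention-based MIL pooling, in which each instance (for example, one CT slice embedding) receives a softmax-normalized attention weight and the bag embedding is the weighted sum. The function name `attention_pool` and the weights `V` and `w` are hypothetical and are not taken from the paper.

```python
import math

def attention_pool(instances, V, w):
    """Pool K instance embeddings into one bag embedding using attention.

    instances: list of K embedding vectors, each of length D
    V: hidden-by-D weight matrix, w: weight vector of length hidden
    (both are hypothetical learned parameters, fixed here for illustration)
    """
    scores = []
    for h in instances:
        # Project the instance, squash with tanh, then score it with w
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * ui for wi, ui in zip(w, hidden)))
    # Softmax over instances, shifted by the max for numerical stability
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attention = [e / total for e in exps]
    # Bag embedding is the attention-weighted sum of instance embeddings
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(attention, instances)) for d in range(dim)]
    return bag, attention
```

The returned `attention` list is what would be rendered as a case-level attention map: instances with larger weights contributed more to the bag-level prediction.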
High-fidelity multiclass instance segmentation of cells for spatial proteomics
2026-03-04
Article · Senior author
High quality cell segmentations across cell classes remain elusive in spatial proteomics. Existing methods rely on expansion of cell nuclei, or on “pan-membrane” markers that are not actually uniformly expressed across all cells and tissues. Here, we present an extension of pseudo-spectral angle mapping (pSAM) for multiclass instance segmentation of cells. By predicting segmentations on class maps rather than images, cell shape is better captured. Improved representation of cell shape could help better infer interaction of cells in static images.
2026-02-13
Article
Using normal medical images as disparity probes: application with chest radiographs
2026-04-02
Article · Senior author
2026-04-02
Article · Senior author
Artificial intelligence in medical imaging (AI-MI) has the potential to inform clinical care as Software as a Medical Device (SaMD), which is regulated by the Food and Drug Administration. Much remains to be explored regarding ‘off-label’ use of SaMD, which occurs when the AI-MI is used differently than the intended purpose given by the manufacturer's label (‘on-label’). Unlike the off-label use of a pharmaceutical, which has a biological impact on a patient that could potentially affect multiple diseases or conditions, off-label use of an AI-MI device poses unique risks. The potential dangers of off-label AI-MI use are related to task, performance, and population, and the unique risks and benefits of off-label AI-MI use can affect patient management in the clinical setting. Thus, research is needed to study and understand potential benefits and risks of off-label use in AI in medical imaging, including specific use cases. The purpose of this study was to quantitatively analyze the effects of one type of off-label use of AI-MI, when a different type of image is used as input to an AI-MI rather than the image type intended. Our dataset comprised conventional radiomic features extracted from 64 breast lesions (33 malignant, 31 benign) that had been obtained from both conventional and ultra-fast dynamic contrast-enhanced magnetic resonance (DCE-MR) images. We trained a support vector machine using radiomic features extracted from conventional DCE-MRI using 10-fold cross-validation in the task of distinguishing between malignant and benign breast lesions. Then, within the testing folds, we evaluated the classifier performance in two scenarios: ‘on-label’ (on features extracted from conventional DCE-MRIs) and ‘off-label’ (on features extracted from ultra-fast DCE-MRIs). Classification performances were assessed using the area under the receiver operating characteristic curve (AUC). The AUC difference between the on-label and off-label use was compared using the DeLong test (P < 0.05 for statistical significance). The results demonstrated a significant decrease (P = 0.015) in the model performance in classifying breast lesions as malignant or benign when evaluated upon ultra-fast DCE-MRI lesion data (the off-label use, AUC = 0.53±0.07), compared to its on-label use with conventional DCE-MRIs (0.68±0.08). These results have substantial implications for SaMD regulation and post-market surveillance, demonstrating the importance of intended use for AI-MI on a larger scale.
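As a rough illustration of the on-label versus off-label comparison, the sketch below computes AUC as the probability that a malignant case receives a higher classifier score than a benign one, then applies it to two sets of scores for the same lesions. The scores are toy values invented for the example, not data from the study, and the DeLong test itself is not implemented here.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 1 = malignant, 0 = benign (hypothetical values)
labels = [1, 1, 1, 0, 0, 0]
on_label_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]   # intended image type
off_label_scores = [0.6, 0.4, 0.5, 0.5, 0.7, 0.3]  # different image type

auc_drop = auc(labels, on_label_scores) - auc(labels, off_label_scores)
```

Because both score sets come from the same lesions, the two AUCs are correlated, which is why a paired test such as DeLong is used in the study rather than an unpaired comparison.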
2026-04-02
Article · Senior author
Prior research has demonstrated that AI systems can exhibit statistical bias, highlighting the importance of methods to detect and mitigate bias prior to deployment. This study introduces a prototype software tool designed to detect and mitigate bias in an AI system at the post-processing stage (i.e., after classifier training) without retraining. We describe a quantitative pipeline tailored to binary data attributes and illustrate its utility for a classifier trained to predict ICU admission for COVID-19 on chest X-ray (CXR). The test dataset included 1048 patients with a 14.0% prevalence of ICU admission within 24 hours after CXR exam. For bias detection in the model’s predictions between the subgroups, we developed a Python function that computes seven different group-level fairness metrics that measure bias, using patient sex as an example attribute. To mitigate bias at the post-processing stage without requiring retraining of the underlying model, the tool was developed to perform group thresholding to improve fairness. The treatment equality ratio was 0.535 before bias mitigation, deviating the most from ideal fairness (1.0) of all fairness metrics evaluated. After mitigation, treatment equality improved to 0.996 while the model maintained 95% sensitivity, demonstrating that statistical bias could be reduced without compromising sensitivity and avoiding retraining. We also developed radar plot functions to visualize the metrics before and after mitigation. Our pipeline successfully quantifies statistical bias across multiple fairness metrics and includes actionable mitigation steps. Future work will evaluate additional AI models across various attributes and extend mitigation strategies across other model development stages to further refine and validate the pipeline.
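The abstract does not define its metrics or thresholding rule in detail, so the sketch below rests on two assumptions: that treatment equality compares the ratio of false negatives to false positives between two subgroups, and that group thresholding means choosing a separate decision threshold per subgroup (here, the highest threshold that still meets a target sensitivity). All function names are hypothetical and do not come from the tool described.

```python
import math

def confusion(labels, scores, threshold):
    """Return (tp, fp, fn, tn), calling a case positive when score >= threshold."""
    tp = fp = fn = tn = 0
    for y, s in zip(labels, scores):
        positive = s >= threshold
        if y == 1 and positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def fn_fp_ratio(labels, scores, threshold):
    """False negatives per false positive within one subgroup."""
    _, fp, fn, _ = confusion(labels, scores, threshold)
    return fn / fp if fp else float("inf")

def threshold_for_sensitivity(labels, scores, target):
    """Highest threshold at which subgroup sensitivity is still >= target."""
    positives = sorted((s for y, s in zip(labels, scores) if y == 1), reverse=True)
    needed = max(1, math.ceil(target * len(positives)))
    return positives[needed - 1]

def treatment_equality_ratio(group_a, group_b, thr_a, thr_b):
    """Ratio of the two subgroups' FN/FP ratios; 1.0 would indicate parity.

    Each group is a (labels, scores) pair. Zero-count edge cases are not
    handled robustly in this sketch.
    """
    return fn_fp_ratio(*group_a, thr_a) / fn_fp_ratio(*group_b, thr_b)
```

Under these assumptions, mitigation would consist of replacing a single shared threshold with `thr_a` and `thr_b` chosen per subgroup and then recomputing the ratio, with no change to the underlying classifier.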
Task-Based Sampling of Patient Data for Rigorous Machine Learning/AI Performance Assessment
Journal of Imaging Informatics in Medicine · 2026-03-10
Article · Open access · Senior author
To assess the performance of an AI algorithm, an independent dataset is needed that matches the intended clinical claim and intended population (e.g., patient characteristics) for which the algorithm is meant. Using all available data for performance assessment may not be practical or optimal; to reduce the risk of sampling bias, the user is expected to utilize training and test data that are representative of the intended population. This work outlines a computational method for task-based sampling of data from a large repository and demonstrates its use, utilizing demographic characteristics and disease states as examples of the clinical attributes to match to an intended population. To run our developed task-based sampling algorithm, the user defines the initial cohort from which to sample, a target distribution profile, and a maximum allowable deviation in any subcategory. The functionality and results of the developed workflow are described in the context of sampling the Medical Imaging and Data Resource Center (MIDRC) data commons for algorithm performance assessment. An initial cohort of over 4000 patients was selected from the MIDRC public data commons. The task-based sampling algorithm was used to select samples matched to an approximate CDC demographic distribution with maximum allowable deviations of 5% and 10%. Resulting final cohorts of 542 and 870 unique patients with average clinical attribute differences of 1.0% and 2.1% were sampled, respectively. This investigation demonstrates that the developed task-based sampling algorithm can generate matched samples from a large dataset for reducing sampling bias in algorithm training and performance assessment.
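The three user inputs named in the abstract (initial cohort, target distribution profile, maximum allowable deviation) can be illustrated with a simplified sketch for a single categorical attribute. This is not the published algorithm, which matches several attributes at once; the function `sample_sizes` and its search strategy are assumptions made for illustration.

```python
import math

def sample_sizes(available, target, max_dev):
    """Largest per-category counts whose proportions stay within max_dev of target.

    available: {category: number of patients in the initial cohort}
    target:    {category: desired proportion}, with proportions summing to 1
    max_dev:   maximum allowable absolute deviation in any category
    """
    eps = 1e-9  # tolerance for floating-point rounding
    # Try cohort sizes from largest to smallest and return the first feasible one
    for n in range(sum(available.values()), 0, -1):
        lo = {c: max(0, math.ceil((target[c] - max_dev) * n - eps)) for c in target}
        hi = {c: min(available[c], math.floor((target[c] + max_dev) * n + eps))
              for c in target}
        if any(lo[c] > hi[c] for c in target):
            continue
        if not sum(lo.values()) <= n <= sum(hi.values()):
            continue
        # Start every category at its minimum, then top up until the total is n
        counts = dict(lo)
        remaining = n - sum(lo.values())
        for c in target:
            extra = min(remaining, hi[c] - lo[c])
            counts[c] += extra
            remaining -= extra
        return counts
    return {}
```

For example, with `available = {"A": 100, "B": 20}` and an even target split, a deviation of 0 yields 20 patients per category, while a deviation of 0.1 allows 30 and 20. This mirrors the pattern reported in the abstract, where the looser 10% tolerance produced the larger cohort.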
2026-04-02
Article · Senior author
The Medical Imaging and Data Resource Center (MIDRC) is a multi-institutional initiative committed to advancing medical imaging research by providing a comprehensive, high-quality, and FAIR-compliant (Findable, Accessible, Interoperable, and Re-usable) imaging data commons. A critical and novel component of MIDRC is its sequestered datasets, termed SeqreT, which are not publicly accessible and are reserved specifically for independent evaluation of artificial intelligence (AI) algorithms. By preventing use of these datasets during model/algorithm development or training, SeqreT supports objective validation, reproducibility, and real-world performance benchmarking, crucial for regulatory assessments. This study reports on the current status and characteristics of several SeqreT datasets available for technology assessment requests. Collectively, these large, representative datasets support robust and reproducible AI performance evaluation in chest imaging tasks such as segmentation, characterization, explainability, and opacity detection.
2025-04-10
Article · Senior author
Evaluation of AI/ML algorithm performance on a sequestered test set may lead to ingenuous and disingenuous use of the dataset, even though the data are not accessible to the developer. In the ‘ingenuous’ case, the resulting algorithm’s performance metric, for example the area under the receiver operating characteristic curve (AUC) for a classification algorithm, may unintentionally overestimate or underestimate the true algorithm performance. A developer may also attempt to learn from the sequestered test set through attempting to repeatedly evaluate the algorithm on subsets of the test set, i.e., a ‘disingenuous’ use that may lead to algorithm overfitting of the test set. Creating a metric that can be used to ‘dial in’ ideal data set sampling to avoid each of these issues is an important area of investigation by the Medical Imaging and Data Resource Center (MIDRC, midrc.org). Building upon our prior work to address the ingenuous case, we also now address disingenuous use of the test set through a hash-table implementation that incorporates the ThresholdoutAUC algorithm, and subsequently use the load factor metric to indicate overfitting to the test data. Furthermore, we devise analytical relationships between load factor and ThresholdoutAUC budget. Notably, the relationship between load factor and budget is dependent on a noise rate parameter. We unify these methods with our previous findings for ingenuous use of sequestered data, specifically the relationship between AUC variability and load factor via the use case of a classifier trained to predict COVID-19 severity. The results show that while AUC standard error is inversely related to the load factor, the budget parameter from ThresholdoutAUC is directly related to the load factor and noise rate. Thus, we anticipate using the load factor as a ‘dial’ that controls the number of test subsets eligible for evaluation. Specifically, if the developer requests to operate at a particular ThresholdoutAUC budget, a specific load factor and noise rate combination can be determined that limits AUC variation while meeting budget demand.
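For readers unfamiliar with the budget idea, the following is a simplified sketch of the generic Thresholdout mechanism, not the ThresholdoutAUC variant or the hash-table implementation described in the abstract. It assumes the usual setup in which the sequestered value is revealed (with noise) only when it differs from the developer-side estimate by more than a noisy threshold, and each such reveal spends one unit of budget. The class and parameter names are hypothetical, and the threshold re-noising step of the full algorithm is omitted.

```python
import random

class Thresholdout:
    """Simplified reusable-holdout gate with a finite reveal budget."""

    def __init__(self, threshold, noise_scale, budget, seed=0):
        self.threshold = threshold
        self.noise_scale = noise_scale
        self.budget = budget
        self.rng = random.Random(seed)

    def _laplace(self, scale):
        # The difference of two i.i.d. exponentials is Laplace-distributed
        return self.rng.expovariate(1 / scale) - self.rng.expovariate(1 / scale)

    def query(self, dev_value, test_value):
        """Return a performance estimate, or None once the budget is spent."""
        if self.budget < 1:
            return None
        gap = abs(test_value - dev_value)
        if gap > self.threshold + self._laplace(4 * self.noise_scale):
            # Estimates disagree: reveal a noisy test value and spend budget
            self.budget -= 1
            return test_value + self._laplace(self.noise_scale)
        # Estimates agree: echo the developer-side value, budget unchanged
        return dev_value
```

In this sketch a larger `noise_scale` hides more about the sequestered data on each reveal, which is one intuition for why the abstract reports that the budget relationship depends on a noise rate parameter.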
Recent grants
NIH · $2.4M · 2005
NIH · $4.2M · 2010
Protected Radiomics Analysis Commons for Deep Learning in Biomedical Discovery
NIH · $339k · 2018–2019
Lesion Composition and Quantitative Imaging Analysis on Breast Cancer Diagnosis
NIH · $3.3M · 2013–2019
NIH · $294k · 2001
Frequent coauthors
- Karen Drukker · 137 shared
- Hui Li · Army Medical University · 104 shared
- Heber MacMahon · University of Chicago · 89 shared
- Kunio Doi · University of Chicago · 84 shared
- Robert M. Nishikawa · University of Pittsburgh · 66 shared
- Carl J. Vyborny · 59 shared
- Heather M. Whitney · 57 shared
- Robert A. Schmidt · University of Utah Health Care · 49 shared
Labs
Maryellen Giger's Lab · PI
Education
Ph.D., Radiology
University of Chicago
Awards & honors
- William D. Coolidge Gold Medal from the American Association…
- EMBS Academic Career Achievement Award
- SPIE Harrison H. Barrett Award in Medical Imaging
- RSNA Honored Educator Award
- RSNA Outstanding Researcher Award