
Greg Zaharchuk
Stanford University · Radiology
Active 1997–2026
About
Greg Zaharchuk is a Professor of Radiology specializing in Neuroimaging and Neurointervention at Stanford University. He is affiliated with the Center for Artificial Intelligence in Medicine & Imaging (AIMI), where his work focuses on the application of artificial intelligence to medical imaging, particularly in the field of neuroimaging. His research involves advancing imaging techniques and developing AI-driven solutions to improve diagnosis and treatment in neurological conditions. As a faculty member at Stanford, he contributes to the integration of cutting-edge AI technologies into clinical practice and medical research.
Research topics
- Computer Science
- Medicine
- Artificial Intelligence
- Internal medicine
- Radiology
- Nuclear medicine
- Pathology
- Cardiology
- Medical physics
- Neuroscience
- Psychology
- Statistics
- Risk analysis (engineering)
- Psychiatry
- Management
- Chemistry
- Mathematics
- Anesthesia
Selected publications
Evaluation of Image‐Level Harmonization Methods for Multi‐Center MR Neuroimaging
Journal of Magnetic Resonance Imaging · 2026-01-05 · 3 citations
article · Open access · Corresponding
BACKGROUND: Multi-center imaging studies create large-scale data that are useful for identifying pathological patterns and robust training of deep learning models. However, variation due to site and scanner differences can confound analyses, emphasizing the need for harmonization. PURPOSE: To evaluate scanner-related differences in T1w and T2-FLAIR images in the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset and assess the performance of publicly available image-level harmonization tools. STUDY TYPE: Retrospective. POPULATION: Scanner group analysis: 1143 ADNI3 subjects (233 GE, 173 Philips, 250 Siemens, with 487 Siemens subjects used as an independent reference group). Within-subject comparison: paired multi-vendor scan sessions from 8 subjects. FIELD STRENGTH/SEQUENCE: 3.0T, T1w, and T2-FLAIR MRI sequences. ASSESSMENT: Gray/white matter contrast ratio (G/W ratio), white matter hyperintensity (WMH) volume, and image feature similarity metrics (Fréchet Inception Distance [FID], Learned Perceptual Image Patch Similarity [LPIPS]) were compared across scanner vendors before and after harmonization with statistical (ComBat) and deep learning (HACA3) algorithms. STATISTICAL TESTS: One-way ANOVA and post hoc Games-Howell tests were conducted to assess differences between scanner groups across image pipelines (baseline, post-harmonization). Repeated-measures ANOVA and post hoc paired t-tests with Bonferroni correction were used to evaluate similarity metric changes pre- and post-harmonization for multi-vendor subjects. We defined statistical significance as p < 0.05. RESULTS: At baseline, significant image differences in G/W ratio and WMH volumes between vendors were identified. Both ComBat and HACA3 harmonization improved G/W ratio consistency for T1w and T2-FLAIR imaging across vendors, particularly for GE T2-FLAIRs. HACA3 led to the best similarity between scanner datasets: mean FID T1w/T2-FLAIR: 10.45/14.62 (Baseline); 7.45/11.71 (ComBat); 5.60/8.91 (HACA3). Only HACA3 harmonization resulted in non-significant differences between vendors for WMH volume. DATA CONCLUSION: HACA3 deep learning harmonization outperformed a statistical method, ComBat, improving MR contrast consistency and feature similarity across vendors. However, difficulties in harmonizing T2-FLAIRs highlight limitations in current multi-contrast MR harmonization tools. EVIDENCE LEVEL: 3. TECHNICAL EFFICACY: Stage 1.
CheXmix: Unified Generative Pretraining for Vision Language Models in Medical Imaging
arXiv (Cornell University) · 2026-04-24
preprint · Open access
Recent medical multimodal foundation models are built as multimodal LLMs (MLLMs) by connecting a CLIP-pretrained vision encoder to an LLM using LLaVA-style finetuning. This two-stage, decoupled approach introduces a projection layer that can distort visual features. This is especially concerning in medical imaging where subtle cues are essential for accurate diagnoses. In contrast, early-fusion generative approaches such as Chameleon eliminate the projection bottleneck by processing image and text tokens within a single unified sequence, enabling joint representation learning that leverages the inductive priors of language models. We present CheXmix, a unified early-fusion generative model trained on a large corpus of chest X-rays paired with radiology reports. We expand on Chameleon's autoregressive framework by introducing a two-stage multimodal generative pretraining strategy that combines the representational strengths of masked autoencoders with MLLMs. The resulting models are highly flexible, supporting both discriminative and generative tasks at both coarse and fine-grained scales. Our approach outperforms well-established generative models across all masking ratios by 6.0% and surpasses CheXagent by 8.6% on AUROC at high image masking ratios on the CheXpert classification task. We further inpaint images over 51.0% better than text-only generative models and outperform CheXagent by 45% on the GREEN metric for radiology report generation. These results demonstrate that CheXmix captures fine-grained information across a broad spectrum of chest X-ray tasks. Our code is at: https://github.com/StanfordMIMI/CheXmix.
ArXiv.org · 2025-12-02
preprint · Open access · Senior author
Predicting outcomes in acute ischemic stroke (AIS) guides clinical decision-making, patient counseling, and resource allocation. Clinical notes contain rich contextual information, but their unstructured nature limits their use in traditional predictive models. We developed and evaluated the Chain-of-Thought (CoT) Outcome Prediction Engine (COPE), a reasoning-enhanced large language model framework, for predicting 90-day functional outcomes after AIS from unstructured clinical notes. This study included 464 AIS patients with discharge summaries and 90-day modified Rankin Scale (mRS) scores. COPE uses a two-step CoT framework based on sequential open-source LLaMA-3-8B models: the first generates clinical reasoning, and the second outputs an mRS prediction. We compared COPE with GPT-4.1, ClinicalBERT, a structured variable-based machine learning model (Clinical ML), and a single-step LLM without CoT. Performance was evaluated using mean absolute error (MAE), accuracy within ±1 mRS point, and exact accuracy. COPE achieved an MAE of 1.01 (95% CI 0.92-1.11), ±1 accuracy of 74.4% (69.9, 78.8%), and exact accuracy of 32.8% (28.0, 37.6%), comparable to GPT-4.1 and superior to ClinicalBERT [MAE 1.24 (1.13-1.36)], Clinical ML [1.28 (1.18-1.39)], and the single-step LLM [1.20 (1.09-1.33)]. Subgroup analyses showed consistent performance across sex and age, with slightly higher error among older patients, those undergoing thrombectomy, and those with longer summaries. These findings demonstrate that COPE, a lightweight, interpretable, and privacy-preserving open-source framework, provides an accurate and practical solution for outcome prediction from unstructured clinical text.
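The three evaluation metrics named in the abstract above (MAE, accuracy within ±1 mRS point, and exact accuracy) can be illustrated with a short sketch. This is not the authors' evaluation code; the function name `mrs_metrics` and the example scores are hypothetical, and the sketch assumes integer mRS scores from 0 to 6.

```python
from typing import Dict, Sequence


def mrs_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute MAE, within-one-point accuracy, and exact accuracy for mRS scores.

    Hypothetical helper for illustration; assumes integer scores (0 to 6).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    n = len(y_true)
    # Absolute difference between each true and predicted score.
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return {
        "mae": sum(abs_err) / n,                         # mean absolute error
        "within_1": sum(e <= 1 for e in abs_err) / n,    # accuracy within ±1 point
        "exact": sum(e == 0 for e in abs_err) / n,       # exact-match accuracy
    }


# Made-up example scores, not data from the study.
example = mrs_metrics([0, 1, 2, 3, 6], [0, 2, 4, 3, 3])
print(example)
```

Confidence intervals such as those reported in the abstract would need an additional resampling step (for example, bootstrapping over patients), which this sketch omits.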
Proceedings on CD-ROM - International Society for Magnetic Resonance in Medicine. Scientific Meeting and Exhibition/Proceedings of the International Society for Magnetic Resonance in Medicine, Scientific Meeting and Exhibition · 2025-09-16
article · Senior author
Motivation: Create a publicly available dataset combining non-invasive ASL perfusion imaging and simultaneous 15O-H2O PET/MRI to facilitate research replication, deep learning model training, and advance open science initiatives. Goal(s): Introduce and validate a dataset comparing ASL perfusion imaging against gold-standard 15O-H2O PET/MRI to improve reproducibility studies and support research in healthy controls and Moyamoya patients. Approach: Imaging data from 70 healthy controls and 50 Moyamoya patients using 3T PET/MRI covered structural, DWI, ASL perfusion, DSC, and 15O-H2O PET data. Data were harmonized and validated per BIDS specifications. Results: The dataset passed BIDS validation and included demographics, raw data, BIDS-compliant metadata, and derivatives (CBF maps). Impact: This dataset can significantly advance neuroimaging research by enabling reproducibility studies, validating ASL imaging against PET, and supporting artificial intelligence approaches. It provides a critical resource for scientists, clinicians, and data-driven innovations.
Deep Learning-Based Prediction of PET Amyloid Status Using Multi-Contrast MRI
Proceedings on CD-ROM - International Society for Magnetic Resonance in Medicine. Scientific Meeting and Exhibition/Proceedings of the International Society for Magnetic Resonance in Medicine, Scientific Meeting and Exhibition · 2025-09-16 · 1 citation
article · Senior author
Motivation: Accurate amyloid-beta positivity prediction is essential for identifying patients for Alzheimer's disease trials and treatments, and T1w-only MRI-based predictions have shown moderate performance. Goal(s): To evaluate whether adding T2-FLAIR to T1w imaging enhances deep learning model performance for predicting amyloid PET positivity. Approach: Two EfficientNet models were trained on 4,058 multi-contrast MRI exams and validated using internal and external test sets, with statistical comparison of T1w-only and T1w+T2-FLAIR inputs. Results: The T1w+T2-FLAIR model significantly improved PET-based amyloid status prediction, showing robustness across internal and external test sets. Activation maps highlighted brain regions, particularly around ventricles, linked to white matter abnormalities. Impact: Adding T2-FLAIR to T1w MRI in deep learning models significantly improves amyloid PET positivity prediction, aiding early Alzheimer's disease detection. This approach enhances non-invasive opportunistic screening, potentially streamlining patient selection for clinical trials and targeted treatments.
Foundational Model for Real-Time Neuroimaging Spatial Normalization
Proceedings on CD-ROM - International Society for Magnetic Resonance in Medicine. Scientific Meeting and Exhibition/Proceedings of the International Society for Magnetic Resonance in Medicine, Scientific Meeting and Exhibition · 2025-09-16
article · Senior author
Motivation: Traditional spatial normalization methods like SPM12 are slow and struggle with large datasets, limiting their use in clinical settings that require rapid processing, especially for acute stroke. Goal(s): To develop a fast, scalable foundational deep learning model for spatial normalization across all MRI sequences. Approach: Using a clinical dataset of 11,939 MR volumes across six common sequences, we trained a modified 3D U-Net with SPM12 results as the reference standard. Local normalized cross-correlation loss optimized training, and Dice Similarity Coefficient (DSC) evaluated performance. Results: The model achieved an overall DSC of 0.98 across sequences, processing each volume in 0.7 seconds, 120 times faster than SPM12. Impact: This foundation model represents the first AI method to standardize spatial normalization for a wide range of neuroimaging sequences, enabling real-time and consistent neuroimaging analyses for both clinical and research applications.
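The Dice Similarity Coefficient used as the evaluation metric above is defined for two binary masks A and B as 2|A ∩ B| / (|A| + |B|). The sketch below illustrates that standard definition only; the abstract does not say which masks were compared, so the flattened example masks and the function name `dice_coefficient` are hypothetical.

```python
from typing import Iterable


def dice_coefficient(mask_a: Iterable, mask_b: Iterable) -> float:
    """Dice Similarity Coefficient between two flattened binary masks.

    Hypothetical helper for illustration; any truthy value counts as foreground.
    """
    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]
    if len(a) != len(b):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels that are foreground in both masks.
    intersection = sum(x and y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    if total == 0:
        # Convention chosen here: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# Made-up flattened masks, not data from the study.
predicted = [1, 1, 0, 0]
reference = [1, 0, 0, 0]
print(dice_coefficient(predicted, reference))
```

A value of 1.0 indicates identical masks and 0.0 indicates no overlap, so the reported overall DSC of 0.98 corresponds to near-complete overlap with the SPM12 reference.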
American Journal of Neuroradiology · 2025-11-06
article · Open access · Senior author
BACKGROUND AND PURPOSE: Predicting long-term clinical outcomes based on early acute ischemic stroke (AIS) information would be useful for many reasons, including patient counseling and clinical trial execution. This study investigates how different regions in brain imaging, including noninfarcted areas, contribute to the accuracy of predicting 90-day stroke outcomes by using deep learning (DL). MATERIALS AND METHODS: We developed and validated DL models in 449 patients with AIS, by using MRI DWI scans from 1–7 days poststroke and 90-day mRS outcome data. These models were trained on various inputs: infarct volumes, full-brain images, infarct masks, intensity-preserved infarct masks, and images in which the infarct region is removed, which we call lesion-neutralized images. Performance was assessed by using accuracy of predicting the specific mRS score, accuracy within ±1 mRS category, mean absolute error (MAE), and the area under the curve (AUC) to predict unfavorable outcome (mRS > 2). RESULTS: The model trained by using only infarct volume size reported the highest (worst) MAE of 1.51 points (95% CI, 1.40–1.61; P < .001), while the model trained with full-brain images achieved the lowest MAE of 1.07 points (95% CI, 0.99–1.16). Models with intermediate amounts of imaging information each improved on the volume-only predictions but did not reach the performance of the full brain images; infarct masks, intensity-preserved infarct masks, and lesion-neutralized images demonstrated MAEs of 1.25 (95% CI, 1.15–1.34; P = .002), 1.21 (95% CI, 1.11–1.30; P = .008), and 1.35 (95% CI, 1.24–1.45; P < .001), respectively. Similar results were seen for other prediction tasks, including AUC to predict unfavorable outcomes, ranging from 0.68 (95% CI, 0.63–0.73) for infarct volume to 0.86 (95% CI, 0.82–0.89) for full brain inputs. CONCLUSIONS: While the best performance came from using the full brain imaging volume, we demonstrate that the infarct location, its signal characteristics, and importantly, the noninfarcted regions all contribute to the predictions. The noninfarcted areas may be a proxy for overall brain health and resilience, containing important information about potential outcomes.
Proceedings on CD-ROM - International Society for Magnetic Resonance in Medicine. Scientific Meeting and Exhibition/Proceedings of the International Society for Magnetic Resonance in Medicine, Scientific Meeting and Exhibition · 2025-09-16
article · Senior author
Motivation: AD patients require multiple visits for amyloid and tau imaging because PET cannot acquire multiple radiotracers in a single session. Goal(s): DL-based separation of amyloid and tau radiotracers from mixed-dose images could reduce AD patient visits, but current DL applications are hindered by computational costs and other challenges of list-mode dose-mixing. Approach: We propose count-mixing as a compute-efficient alternative for simulating dose-mixing, which can then be used for deep learning (DL)-based radiotracer separation. Results: PET/MR count-mixing can serve as an alternative to list-mode dose-mixing. The approach agrees with list-mode dose-mixing and exhibits enhanced quantitative performance and equivalent anatomical preservation. Impact: Count-mixing provides a faster, compute-efficient way to generate realistic mixed-dose PET images, enhancing model training and scaling DL applications for radiotracer separation. This approach could enable simultaneous injection of multiple radiotracers in a single acquisition for AD patients.
Inter-Scanner Variability and Evaluation of T2-FLAIR Harmonization in Alzheimer’s Neuroimaging
Proceedings on CD-ROM - International Society for Magnetic Resonance in Medicine. Scientific Meeting and Exhibition/Proceedings of the International Society for Magnetic Resonance in Medicine, Scientific Meeting and Exhibition · 2025-09-16
article · Senior author
Motivation: Standardized protocols for multi-center imaging studies like ADNI attempt to harmonize imaging data, but evaluating image quality is essential to assess differences between sites. Goal(s): To highlight visual and quantitative differences in ADNI-protocol T2-FLAIR images across scanner manufacturers. Approach: ADNI3 T2-FLAIR images from three manufacturers were compared via gray matter/white matter contrast ratio (GM/WM). Additionally, we compared the ADNI-protocol and a GM/WM optimized T2-FLAIR sequence in a single scan session for 3 subjects. Results: ADNI3 GM/WM ratio was significantly different across manufacturers, with GE having the lowest GM/WM contrast. For this manufacturer, we found that a GM/WM optimized T2-FLAIR can improve inter-scanner harmonization. Impact: T2-FLAIR is an instrumental imaging sequence to visualize gray-white matter contrast and white matter abnormalities. However, inter-scanner variability may hinder analysis involving multi-site imaging datasets. Thus, we evaluated protocol changes that can improve harmonization of T2-FLAIR images across scanners.
Recent grants
Next Generation Brain PET Imaging
NIH · $2.7M · 2021–2026
NIH · $2.2M · 2018–2023
NIH · $5.4M · 2015
NIH · $441k · 2017
NIH · $4.9M · 2019
Frequent coauthors
- 492 shared
Srinivasan Beddhu
VA Salt Lake City Healthcare System
- 400 shared
Daniel T. Chang
- 364 shared
Daniel E. Weiner
Tufts Medical Center
- 327 shared
Gary K. Steinberg
Stanford Medicine
- 302 shared
Sylvia K. Plevritis
- 302 shared
Olivier Gevaert
- 300 shared
Kelsey Hopkins
Purdue University West Lafayette
- 300 shared
Mark R. Gilbert
University of Missouri