Michael H. Goldbaum

Professor · Verified

University of California, San Diego · Ophthalmology

Active 1974–2025

h-index: 49

Citations: 16.5k

Papers: 237 (47 in the last 5 years)

Funding: $4.7M

About

Michael H. Goldbaum is a faculty member in the Department of Ophthalmology at UC San Diego, located at 9500 Gilman Drive, La Jolla, CA. His research centers on ophthalmology, particularly glaucoma and retinal diseases, and includes clinical trials. He has contributed to the development and application of informatics and artificial intelligence models for glaucoma detection, retinal analysis, and other ophthalmic conditions. His work includes structured analysis of the retina, multimodal AI models using fundus photographs and OCT imaging, and deep learning approaches for diagnosing and managing ocular diseases. Dr. Goldbaum has been involved in numerous studies and publications that advance the understanding and diagnosis of eye diseases through imaging and computational techniques.

Research topics

Artificial Intelligence
Computer Science
Medicine
Ophthalmology
Optometry
Machine Learning
Internal medicine
Cancer research
Engineering
Biology
Computer vision
Genetics
Algorithm

Selected publications

Choroidal Neovascularization following Intra-Arterial Melphalan Chemotherapy for Retinoblastoma
Case Reports in Ophthalmology · 2025-08-20
Article · Open access
Introduction: We present the unique case of a pediatric patient who underwent intra-arterial melphalan chemotherapy and subsequently developed choroidal neovascularization. Case Presentation: A 6-year-old male with a history of nonhereditary unilateral group D retinoblastoma treated with intra-arterial melphalan, cryotherapy, and diode laser consolidative therapy presented to establish care. Initial evaluation revealed a regressed retinoblastoma lesion with chorioretinal scars and calcification scattered in the midperiphery. Notably, the macula was largely within normal limits without evidence of prior malignancy or scarring. However, 7 months after establishing care, imaging was significant for intraretinal fluid, subretinal fluid, and subfoveal fibrosis of the treated eye, suggestive of choroidal neovascularization. The patient was managed with anti-VEGF therapy with resolution of subretinal fluid and improved visual acuity. Conclusion: This case represents the first description and management of a patient developing choroidal neovascularization after receiving intra-arterial melphalan treatment for retinoblastoma. Careful monitoring of patients following intra-arterial melphalan chemotherapy treatment is critical due to the potential for vision loss, including choroidal neovascularization, which may be an under-reported complication.
Performance of General-Purpose Vision Language Models and Ophthalmology Foundation Models in Glaucoma Detection and Function Prediction
Translational Vision Science & Technology · 2025-11-19 · 1 citation
Article · Open access
Purpose: To evaluate the performance of vision-language models (VLMs) in glaucoma detection and visual field (VF) mean deviation (MD) prediction tasks using optical coherence tomography (OCT) images. Methods: A total of 27,610 SPECTRALIS OCT images from 1025 participants (1690 eyes), collected between 2008 and 2021 as part of the Diagnostic Innovations in Glaucoma Study (DIGS) and the African Descent and Glaucoma Evaluation Study (ADAGES), were included. Vision components of LLaVA and PaliGemma, as well as RETFound and ResNet-50 models, were fine-tuned for glaucoma classification and VF MD prediction. Models were trained using OCT circle scans centered on the optic nerve head. Three training configurations were compared. Performance was evaluated using area under the receiver operating characteristic curve (AUC), mean absolute error (MAE), and related metrics. Results: The LLaVA model, when both vision encoder and multi-layer projector were fine-tuned, achieved the best performance with an AUC of 0.92 (95% confidence interval [CI], 0.86-0.95) for glaucoma classification and an MAE of 1.79 dB (95% CI, 1.55-2.00) for VF MD prediction. RETFound and PaliGemma also performed well, with AUCs of 0.91 and 0.90 and MAEs of 1.87 dB and 1.84 dB, respectively. Models with frozen vision encoders showed reduced accuracy. Stratified analysis showed better glaucoma classification in older individuals and moderate-to-advanced cases. VF MD prediction was more accurate in younger individuals, with higher errors in advanced glaucoma. Conclusions: Fine-tuned VLMs demonstrated high performance in glaucoma detection and VF MD prediction, matching or exceeding specialized foundation models and traditional convolutional neural network (CNN)-based methods. Translational Relevance: This study highlights the potential of general-purpose AI models to be adapted for glaucoma care, enabling scalable decision support from OCT imaging.
Assessing the Clinical Utility of Multimodal Large Language Models in the Diagnosis and Management of Pigmented Choroidal Lesions
Translational Vision Science & Technology · 2025-10-14
Article · Open access
Purpose: To evaluate the diagnostic and treatment recommendation performance of multimodal large language models (MLLMs) in identifying and classifying retinal lesions as choroidal nevus or melanoma, as well as compare their performance with expert human graders. Methods: This retrospective cross-sectional study included 48 eyes from 47 patients diagnosed with either choroidal nevus or melanoma. Patient demographics, including age, sex, ethnicity, best-corrected visual acuity (BCVA), and symptoms, were documented. Color fundus, autofluorescence, optical coherence tomography, and B-scan images were collected. The ocular images and patient characteristics were presented to ChatGPT 4.0, Gemini Advanced 1.5 Pro, and Perplexity Pro. Responses were recorded and compared with the clinical diagnoses and treatment recommendations made by two expert human graders. Diagnostic and treatment agreement, accuracy, sensitivity, and specificity were analyzed. Results: Gemini consistently outperformed ChatGPT and Perplexity across diagnostic and treatment prompts. The highest model performance was observed for prompts requesting treatment recommendations with clinical information, where Gemini achieved the highest accuracy (0.725), followed by Perplexity (0.647) and ChatGPT (0.314). Performance was lowest for prompts requiring strict clinical criteria, with all models showing poor sensitivity. Both human graders outperformed all MLLMs in accuracy and sensitivity on most prompts (P < 0.005). Accuracy did not improve when provided demographic or clinical data, except for Gemini. Conclusions: Human graders outperform current MLLMs, which show only moderate ability to diagnose choroidal nevi or melanoma from imaging. Translational Relevance: This study highlights limitations and potential of MLLMs in aiding diagnosis and treatment of choroidal lesions.
Deep Learning Approach Predicts Longitudinal Retinal Nerve Fiber Layer Thickness Changes
Bioengineering · 2025-01-31 · 2 citations
Article · Open access
This study aims to develop deep learning (DL) models to predict the retinal nerve fiber layer (RNFL) thickness changes in glaucoma, facilitating the early diagnosis and monitoring of disease progression. Using the longitudinal data from two glaucoma studies (Diagnostic Innovations in Glaucoma Study (DIGS) and African Descent and Glaucoma Evaluation Study (ADAGES)), we constructed models using optical coherence tomography (OCT) scans from 251 participants (437 eyes). The models were trained to predict the RNFL thickness at a future visit based on previous scans. We evaluated four models: linear regression (LR), support vector regression (SVR), gradient boosting regression (GBR), and a custom 1D convolutional neural network (CNN). The GBR model achieved the best performance in predicting pointwise RNFL thickness changes (MAE = 5.2 μm, R² = 0.91), while the custom 1D CNN excelled in predicting changes to average global and sectoral RNFL thickness, providing greater resolution and outperforming the traditional models (MAEs from 2.0–4.2 μm, R² from 0.94–0.98). Our custom models used a novel approach that incorporated longitudinal OCT imaging to achieve consistent performance across different demographics and disease severities, offering potential clinical decision support for glaucoma diagnosis. Patient-level data splitting enhances the evaluation robustness, while predicting detailed RNFL thickness provides a comprehensive understanding of the structural changes over time.
The Retinal Fundus Verdict on Intensive Blood-Pressure Therapy
Journal of the American College of Cardiology · 2025-10-01
Article
A Novel Multimodal Implementation of a Foundation Artificial Intelligence Model Using Optic Nerve Head Fundus Photographs and OCT Imaging for Glaucoma Detection
Ophthalmology Science · 2025-11-20
Article · Open access
Purpose: To compare the performance of unimodal and multimodal implementation of the self-supervised learning model RETFound in detecting glaucoma using color fundus photographs (CFPs) and OCT images, and to assess its generalizability across different ethnicities, age groups, and disease severities. Design: Evaluation of a diagnostic technology. Subjects, Participants, and Controls: Fourteen thousand five hundred ten CFPs and 32 640 OCTs from 1948 eyes of 1098 participants (60.8% glaucoma, 39.2% healthy) from the Diagnostic Innovations in Glaucoma Study and the African Descent and Glaucoma Evaluation Study were included. Glaucoma was defined as photograph-based glaucomatous optic neuropathy with or without repeatable glaucoma visual field damage. Methods: A multimodal RETFound model was developed using paired CFPs and OCT images. The model was compared to unimodal RETFound models using solely CFP or OCT images. Performance was also stratified by race (Black vs. White), age (<60 vs. ≥60 years), and disease severity (mild vs. moderate-to-severe glaucoma). Main Outcome Measures: Diagnostic accuracy of unimodal and multimodal RETFound models using CFP and OCT for detecting glaucoma was assessed using the area under the receiver operating characteristic curve (AUC), precision, and recall. Results: = 0.005) models. Conclusions: The multimodal RETFound model demonstrated improved diagnostic ability compared with the CFP unimodal model but did not significantly outperform the OCT unimodal model in glaucoma detection. As clinical implementation of a unimodal artificial intelligence (AI) model is easier than a multimodal counterpart, our results suggest unimodal OCT AI models may be sufficient for detecting glaucoma. Financial Disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
Retinal Ischemic Perivascular Lesions Are Associated With Stroke in Individuals With Atrial Fibrillation
Journal of the American Heart Association · 2024-08-27 · 8 citations
Letter · Open access
Ultra-Widefield and Early Treatment Diabetic Retinopathy Study 7-Field Grading of Diabetic Retinopathy
JAMA Ophthalmology · 2024-08-15 · 4 citations
Letter · Open access
Importance: High concordance in diabetic retinopathy (DR) outcomes between 7-field (7F) and ultra-widefield (UWF) images would allow for combining longitudinal assessments based on the 2 modalities both in clinical studies and clinical care. Objective: To compare 7F and UWF imaging with regard to DR severity and the associations of DR severity with risk factors, such as hemoglobin A1c, age, diabetes duration, and sex. Design, Setting, and Participants: This cross-sectional study describes the outcomes of the randomized clinical Diabetes Control and Complications Trial (DCCT) and its subsequent observational study, the Epidemiology of Diabetes Interventions and Complications (EDIC) study. Of the 1441 participants with type 1 diabetes in the DCCT, 1375 were enrolled in the EDIC study. Of the 1171 participants who were active between March 2019 and December 2021, 200° UWF color imaging and 7F fundus photographs were obtained for 785 participants once at the same visit. Central graders assessed 7F-UWF with a 7F template masking the retinal periphery and the full UWF image (UWF-global). Data were analyzed from January 2022 to March 2023. Exposures: Hemoglobin A1c was assessed quarterly during the DCCT and annually during the EDIC study using high-performance liquid chromatography. Main Outcomes and Measures: Retinopathy was determined independently for all imaging as mild, moderate, or severe nonproliferative DR (SNPDR) using the Early Treatment Diabetic Retinopathy Study (ETDRS) grading scale for the 7F images and the global ETDRS grading scale for the UWF images. Panretinal and focal photocoagulation were self-reported or based on scarring location and pattern observed during grading. Proliferative DR (PDR) was defined by observed neovascularization or evidence of panretinal photocoagulation. Results: Among the 785 participants included in this study, 420 (53%) were male and 365 (47%) were female. The mean (SD) age was 61 (7) years. 
DR grading between UWF-7F and 7F imaging was correlated for all outcomes, including for severe outcomes, such as SNPDR (κ, 0.73; concordance, 96%), PDR (κ, 0.74; concordance, 97%), scatter photocoagulation (κ, 0.97; concordance, 99%), and focal photocoagulation (κ, 0.71; concordance, 98%). Most DR severity scores were within 1 step (1410 of 1529 [92%]), and 3% (51 of 1529) were more than 2 steps apart (κ, 0.45; 95% CI, 0.42-0.49; weighted κ, 0.63; 95% CI, 0.60-0.67) on the ETDRS severity scale. DR severity assessed within the UWF-global area was higher compared to 7F (median [IQR] UWF-global score, 3 [2-3] vs median 7F level score, 2.0 [1-3]; P < .001), although the 2 modalities were correlated (1225 of 1508 [81%] 1-step agreement; weighted κ, 0.41). Conclusions and Relevance: Standard ETDRS 7F and UWF evaluations of DR were comparable for ETDRS severity levels as previously reported by Diabetic Retinopathy Clinical Research Retina Network reports. In addition, these evaluations of DR were comparable for DCCT/EDIC study outcomes and major study conclusions, suggesting that use of UWF imaging is not likely to introduce relevant measurement biases in future longitudinal studies. Trial Registration: ClinicalTrials.gov Identifiers: NCT00360815.
Proactive Decision Support for Glaucoma Treatment: Predicting Surgical Interventions with Clinically Available Data
Bioengineering · 2024-01-30 · 14 citations
Article · Open access
A longitudinal ophthalmic dataset was used to investigate multi-modal machine learning (ML) models incorporating patient demographics and history, clinical measurements, optical coherence tomography (OCT), and visual field (VF) testing in predicting glaucoma surgical interventions. The cohort included 369 patients who underwent glaucoma surgery and 592 patients who did not undergo surgery. The data types used for prediction included patient demographics, history of systemic conditions, medication history, ophthalmic measurements, 24-2 VF results, and thickness measurements from OCT imaging. The ML models were trained to predict surgical interventions and evaluated on independent data collected at a separate study site. The models were evaluated based on their ability to predict surgeries at varying lengths of time prior to surgical intervention. The highest performing predictions achieved an AUC of 0.93, 0.92, and 0.93 in predicting surgical intervention at 1 year, 2 years, and 3 years, respectively. The models were also able to achieve high sensitivity (0.89, 0.77, 0.86 at 1, 2, and 3 years, respectively) and specificity (0.85, 0.90, and 0.91 at 1, 2, and 3 years, respectively) at an 0.80 level of precision. The multi-modal models trained on a combination of data types predicted surgical interventions with high accuracy up to three years prior to surgery and could provide an important tool to predict the need for glaucoma intervention.
Glaucoma Detection and Feature Identification via GPT-4V Fundus Image Analysis
Ophthalmology Science · 2024-11-29 · 14 citations
Article · Open access
Purpose: The aim is to assess GPT-4V's (OpenAI) diagnostic accuracy and its capability to identify glaucoma-related features compared to expert evaluations. Design: Evaluation of multimodal large language models for reviewing fundus images in glaucoma. Subjects: A total of 300 fundus images from 3 public datasets (ACRIMA, ORIGA, and RIM-ONE v3) that included 139 glaucomatous and 161 nonglaucomatous cases were analyzed. Methods: Preprocessing ensured each image was centered on the optic disc. GPT-4's vision-preview model (GPT-4V) assessed each image for various glaucoma-related criteria: image quality, image gradability, cup-to-disc ratio, peripapillary atrophy, disc hemorrhages, rim thinning (by quadrant and clock hour), glaucoma status, and estimated probability of glaucoma. Each image was analyzed twice by GPT-4V to evaluate consistency in its predictions. Two expert graders independently evaluated the same images using identical criteria. Comparisons between GPT-4V's assessments, expert evaluations, and dataset labels were made to determine accuracy, sensitivity, specificity, and Cohen kappa. Main Outcome Measures: The main parameters measured were the accuracy, sensitivity, specificity, and Cohen kappa of GPT-4V in detecting glaucoma compared with expert evaluations. Results: GPT-4V successfully provided glaucoma assessments for all 300 fundus images across the datasets, although approximately 35% required multiple prompt submissions. GPT-4V's overall accuracy in glaucoma detection was slightly lower (0.68, 0.70, and 0.81, respectively) than that of expert graders (0.78, 0.80, and 0.88, for expert grader 1 and 0.72, 0.78, and 0.87, for expert grader 2, respectively), across the ACRIMA, ORIGA, and RIM-ONE datasets. In glaucoma detection, GPT-4V showed variable agreement by dataset and expert graders, with Cohen kappa values ranging from 0.08 to 0.72.
In terms of feature detection, GPT-4V demonstrated high consistency (repeatability) in image gradability, with an agreement accuracy of ≥89% and substantial agreement in rim thinning and cup-to-disc ratio assessments, although kappas were generally lower than expert-to-expert agreement. Conclusions: GPT-4V shows promise as a tool in glaucoma screening and detection through fundus image analysis, demonstrating generally high agreement with expert evaluations of key diagnostic features, although agreement did vary substantially across datasets. Financial Disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.

Recent grants

NIH Grant R21EY013928
NIH · $144k · 2003
NIH Grant R01EY013235
NIH · $842k · 2004
NIH Grant R33EY013928
NIH · $1.8M · 2007
NIH Grant R01LM005759
NIH · $882k · 2000
NIH Grant R01EY005996
NIH · $1.0M · 1994

Frequent coauthors

Robert N. Weinreb
University of California, San Diego
101 shared
Christopher Bowd
Fleet Science Center
96 shared
Linda M. Zangwill
University of California, San Diego
96 shared
Jeffrey M. Liebmann
54 shared
Akram Belghith
University of California, San Diego
52 shared
Christopher A. Girkin
University of Alabama at Birmingham
50 shared
Mark Christopher
University of California, San Diego
41 shared
Massimo A. Fazio
University of Alabama at Birmingham
34 shared

Education

M.D., Ophthalmology
University of California, San Diego
1990
B.S., Biology
University of California, San Diego
1986
