Marzyeh Ghassemi

Verified

Massachusetts Institute of Technology · Electrical Engineering & Computer Science

Active 2011–2024

h-index60

Citations13.9k

Papers368278 last 5y

Funding—

Faculty page

See your match with Marzyeh Ghassemi — sign in to PhdFit.Sign in

Research topics

Computer Science
Political Science
Artificial Intelligence
Computer Security
Psychology
Medicine
Machine Learning
Pathology
Data science
Engineering
Public relations
Medical education
Knowledge management
Law
Applied psychology
Business
Medical physics
Psychiatry
Risk analysis (engineering)
Social psychology
Mathematics

Selected publications

Peer review of GPT-4 technical report and systems card
PLOS Digital Health · 2024 · 91 citations
- Political Science
- Computer Science
- Computer Security
The study provides a comprehensive review of OpenAI's Generative Pre-trained Transformer 4 (GPT-4) technical report, with an emphasis on applications in high-risk settings like healthcare. A diverse team, including experts in artificial intelligence (AI), natural language processing, public health, law, policy, social science, healthcare research, and bioethics, analyzed the report against established peer review guidelines. The GPT-4 report shows a significant commitment to transparent AI research, particularly in creating a systems card for risk assessment and mitigation. However, it reveals limitations such as restricted access to training data, inadequate confidence and uncertainty estimations, and concerns over privacy and intellectual property rights. Key strengths identified include the considerable time and economic investment in transparent AI research and the creation of a comprehensive systems card. On the other hand, the lack of clarity in training processes and data raises concerns about encoded biases and interests in GPT-4. The report also lacks confidence and uncertainty estimations, crucial in high-risk areas like healthcare, and fails to address potential privacy and intellectual property issues. Furthermore, this study emphasizes the need for diverse, global involvement in developing and evaluating large language models (LLMs) to ensure broad societal benefits and mitigate risks. The paper presents recommendations such as improving data transparency, developing accountability frameworks, establishing confidence standards for LLM outputs in high-risk settings, and enhancing industry research review processes. It concludes that while GPT-4's report is a step towards open discussions on LLMs, more extensive interdisciplinary reviews are essential for addressing bias, harm, and risk concerns, especially in high-risk domains. The review aims to expand the understanding of LLMs in general and highlights the need for new reflection forms on how LLMs are reviewed, the data required for effective evaluation, and addressing critical issues like bias and risk.
Publisher OA PDF DOI
Considering Biased Data as Informative Artifacts in AI-Assisted Health Care
New England Journal of Medicine · 2023 · 106 citations
Senior authorCorresponding
- Computer Science
- Artificial Intelligence
- Psychology
Medical AI tools that are trained on skewed data may exhibit bias. The authors discuss how thinking of clinical data as artifacts can identify values, practices, and patterns of inequity in health care.
Publisher DOI
Mitigating the impact of biased artificial intelligence in emergency decision-making
Communications Medicine · 2022 · 75 citations
Senior authorCorresponding
- Artificial Intelligence
- Computer Science
- Psychology
BACKGROUND: Prior research has shown that artificial intelligence (AI) systems often encode biases against minority subgroups. However, little work has focused on ways to mitigate the harm discriminatory algorithms can cause in high-stakes settings such as medicine. METHODS: In this study, we experimentally evaluated the impact biased AI recommendations have on emergency decisions, where participants respond to mental health crises by calling for either medical or police assistance. We recruited 438 clinicians and 516 non-experts to participate in our web-based experiment. We evaluated participant decision-making with and without advice from biased and unbiased AI systems. We also varied the style of the AI advice, framing it either as prescriptive recommendations or descriptive flags. RESULTS: Participant decisions are unbiased without AI advice. However, both clinicians and non-experts are influenced by prescriptive recommendations from a biased algorithm, choosing police help more often in emergencies involving African-American or Muslim men. Crucially, using descriptive flags rather than prescriptive recommendations allows respondents to retain their original, unbiased decision-making. CONCLUSIONS: Our work demonstrates the practical danger of using biased models in health contexts, and suggests that appropriately framing decision support can mitigate the effects of AI bias. These findings must be carefully considered in the many real-world clinical scenarios where inaccurate or biased models may be used to inform important decisions.
Publisher OA PDF DOI
AI recognition of patient race in medical imaging: a modelling study
The Lancet Digital Health · 2022 · 497 citations
- Artificial Intelligence
- Machine Learning
- Medicine
BACKGROUND: Previous studies in medical imaging have shown disparate abilities of artificial intelligence (AI) to detect a person's race, yet there is no known correlation for race on medical imaging that would be obvious to human experts when interpreting the images. We aimed to conduct a comprehensive evaluation of the ability of AI to recognise a patient's racial identity from medical images. METHODS: Using private (Emory CXR, Emory Chest CT, Emory Cervical Spine, and Emory Mammogram) and public (MIMIC-CXR, CheXpert, National Lung Cancer Screening Trial, RSNA Pulmonary Embolism CT, and Digital Hand Atlas) datasets, we evaluated, first, performance quantification of deep learning models in detecting race from medical images, including the ability of these models to generalise to external environments and across multiple imaging modalities. Second, we assessed possible confounding of anatomic and phenotypic population features by assessing the ability of these hypothesised confounders to detect race in isolation using regression models, and by re-evaluating the deep learning models by testing them on datasets stratified by these hypothesised confounding variables. Last, by exploring the effect of image corruptions on model performance, we investigated the underlying mechanism by which AI models can recognise race. FINDINGS: In our study, we show that standard AI deep learning models can be trained to predict race from medical images with high performance across multiple imaging modalities, which was sustained under external validation conditions (x-ray imaging [area under the receiver operating characteristics curve (AUC) range 0·91-0·99], CT chest imaging [0·87-0·96], and mammography [0·81]). We also showed that this detection is not due to proxies or imaging-related surrogate covariates for race (eg, performance of possible confounders: body-mass index [AUC 0·55], disease distribution [0·61], and breast density [0·61]). Finally, we provide evidence to show that the ability of AI deep learning models persisted over all anatomical regions and frequency spectrums of the images, suggesting the efforts to control this behaviour when it is undesirable will be challenging and demand further study. INTERPRETATION: The results from our study emphasise that the ability of AI deep learning models to predict self-reported race is itself not the issue of importance. However, our finding that AI can accurately predict self-reported race, even from corrupted, cropped, and noised medical images, often when clinical experts cannot, creates an enormous risk for all model deployments in medical imaging. FUNDING: National Institute of Biomedical Imaging and Bioengineering, MIDRC grant of National Institutes of Health, US National Science Foundation, National Library of Medicine of the National Institutes of Health, and Taiwan Ministry of Science and Technology.
DOI
Do as AI say: susceptibility in deployment of clinical decision-aids
npj Digital Medicine · 2021 · 420 citations
Senior authorCorresponding
- Computer Science
- Psychology
- Medicine
Artificial intelligence (AI) models for decision support have been developed for clinical settings such as radiology, but little work evaluates the potential impact of such systems. In this study, physicians received chest X-rays and diagnostic advice, some of which was inaccurate, and were asked to evaluate advice quality and make diagnoses. All advice was generated by human experts, but some was labeled as coming from an AI system. As a group, radiologists rated advice as lower quality when it appeared to come from an AI system; physicians with less task-expertise did not. Diagnostic accuracy was significantly worse when participants received inaccurate advice, regardless of the purported source. This work raises important considerations for how advice, AI and non-AI, should be deployed in clinical environments.
DOI
The role of machine learning in clinical research: transforming the future of evidence generation
Trials · 2021 · 277 citations
Senior authorCorresponding
- Political Science
- Computer Science
- Medicine
BACKGROUND: Interest in the application of machine learning (ML) to the design, conduct, and analysis of clinical trials has grown, but the evidence base for such applications has not been surveyed. This manuscript reviews the proceedings of a multi-stakeholder conference to discuss the current and future state of ML for clinical research. Key areas of clinical trial methodology in which ML holds particular promise and priority areas for further investigation are presented alongside a narrative review of evidence supporting the use of ML across the clinical trial spectrum. RESULTS: Conference attendees included stakeholders, such as biomedical and ML researchers, representatives from the US Food and Drug Administration (FDA), artificial intelligence technology and data analytics companies, non-profit organizations, patient advocacy groups, and pharmaceutical companies. ML contributions to clinical research were highlighted in the pre-trial phase, cohort selection and participant management, and data collection and analysis. A particular focus was paid to the operational and philosophical barriers to ML in clinical research. Peer-reviewed evidence was noted to be lacking in several areas. CONCLUSIONS: ML holds great promise for improving the efficiency and quality of clinical research, but substantial barriers remain, the surmounting of which will require addressing significant gaps in evidence.
Publisher OA PDF DOI
The false hope of current approaches to explainable artificial intelligence in health care
The Lancet Digital Health · 2021 · 1280 citations
1st authorCorresponding
- Computer Science
- Artificial Intelligence
- Computer Science
The black-box nature of current artificial intelligence (AI) has caused some to question whether AI must be explainable to be used in high-stakes scenarios such as medicine. It has been argued that explainable AI will engender trust with the health-care workforce, provide transparency into the AI decision making process, and potentially mitigate various kinds of bias. In this Viewpoint, we argue that this argument represents a false hope for explainable AI and that current explainability methods are unlikely to achieve these goals for patient-level decision support. We provide an overview of current explainability techniques and highlight how various failure cases can cause problems for decision making for individual patients. In the absence of suitable explainability methods, we advocate for rigorous internal and external validation of AI models as a more direct means of achieving the goals often associated with explainability, and we caution against having explainability be a requirement for clinically deployed models.
DOI
Reproducibility in machine learning for health research: Still a ways to go
Science Translational Medicine · 2021 · 253 citations
Senior authorCorresponding
- Machine Learning
- Computer Science
- Artificial Intelligence
Machine learning for health must be reproducible to ensure reliable clinical use. We evaluated 511 scientific papers across several machine learning subfields and found that machine learning for health compared poorly to other areas regarding reproducibility metrics, such as dataset and code accessibility. We propose recommendations to address this problem.
DOI

Frequent coauthors

Laura C. Rosella
Public Health Ontario
211 shared
Amol A. Verma
194 shared
Michael Fralick
191 shared
Timothy C. Y. Chan
University of Toronto
185 shared
Fahad Razak
St. Michael's Hospital
185 shared
Angela M. Cheung
University Health Network
185 shared
Adina Weinerman
Health Sciences Centre
185 shared
Shail Rawal
University Health Network
183 shared

Education

PhD, EECS
Massachusetts Institute of Technology
2017
MsC by Research, Biomedical Engineering
University of Oxford
2011
BSEE, Electrical Engineering
New Mexico State University
2005
BSCS, Computer Science
New Mexico State University
2005

Resume-aware match score
Save to shortlist
AI-drafted outreach

See your match with Marzyeh Ghassemi

PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.

Join the waitlist How it works

Free to start
No credit card
30-second signup

Find professors who actually fit you