David Gifford

Professor

Massachusetts Institute of Technology · Biological Engineering

Active 1977–2025

h-index: 81

Citations: 58.8k

Papers: 300 · 43 in the last 5 years

Funding: $28.0M · 1 active grant


About

David Gifford, PhD, is a Professor of Electrical Engineering and Computer Science and a Professor of Biological Engineering at MIT. He received his BS from MIT in 1976 and his PhD from Stanford University in 1981. Since joining the MIT faculty in 1982, he has developed machine learning techniques and algorithms to model the transcriptional regulatory networks that control gene expression programs in living cells. His research group combines computational and experimental approaches to discover new biology and human therapeutics, using interpretable computational models that are trained and validated with experimental evidence. The group applies these models to problems in experiment design, developmental biology, gene regulation, immunology, genomics, and human therapeutics, and it evaluates the models and uncovers new biology through multiplexed high-throughput experiments on cell populations and single cells. A key challenge in this work is that incomplete knowledge of biological systems leaves models uncertain. The team therefore develops uncertainty metrics that guide experiment design and improve model accuracy. Its computational approaches include large-scale linear and nonlinear models, Bayesian methods, and deep learning. Current biological focus areas include motor neuron development, single-cell perturbation studies, regulation of chromatin accessibility, the regulatory genome, antibody design, and peptide presentation by MHC proteins.

Research topics

Computational biology
Genetics
Biology
Computer Science
Artificial Intelligence
Evolutionary biology
Medicine
Virology

Selected publications

Deep mapping of the TCR-antigen interface using pMHC-pseudotyped viruses and yeast display
bioRxiv (Cold Spring Harbor Laboratory) · 2025-08-27 · 1 citation
preprint · Open access
T cell receptor (TCR) specificity is central to the efficacy of T cell therapies, yet scalable methods to map how TCR sequences shape antigen recognition remain limited. To address this, we introduce VelociRAPTR, a library-on-library approach that combines yeast-displayed TCR libraries with pMHC-displaying virus-like particles (pMHC-VLPs) to rapidly screen millions of TCR-antigen interactions. We show that pMHC-VLPs efficiently bind TCRs on yeast and generate equivalent data to recombinantly produced pMHC protein. We then apply VelociRAPTR to screen 47 million variants of the A6 and 868 TCRs against 92 pMHCs simultaneously, mutating both the CDR3 loops and cognate peptides. The resulting CDR3-pMHC maps reveal biased recognition patterns, where mutations to CDR3 loops can selectively constrain or broaden specificity to peptide analogs. These insights provide a foundation for engineering TCRs with defined pMHC binding profiles and improving models that predict TCR-antigen interactions, including the prediction of off-target recognition. By coupling the scale of yeast display with the modularity of VLPs, VelociRAPTR offers a generalizable strategy for generating deep, high-throughput protein-protein interaction data.
CAGI, the Critical Assessment of Genome Interpretation, establishes progress and prospects for computational genetic variant interpretation methods
Genome biology · 2024-02-22 · 56 citations
letter · Open access
BACKGROUND: The Critical Assessment of Genome Interpretation (CAGI) aims to advance the state-of-the-art for computational prediction of genetic variant impact, particularly where relevant to disease. The five complete editions of the CAGI community experiment comprised 50 challenges, in which participants made blind predictions of phenotypes from genetic data, and these were evaluated by independent assessors. RESULTS: Performance was particularly strong for clinical pathogenic variants, including some difficult-to-diagnose cases, and extends to interpretation of cancer-related variants. Missense variant interpretation methods were able to estimate biochemical effects with increasing accuracy. Assessment of methods for regulatory variants and complex trait disease risk was less definitive and indicates performance potentially suitable for auxiliary use in the clinic. CONCLUSIONS: Results show that while current methods are imperfect, they have major utility for research and clinical applications. Emerging methods and increasingly large, robust datasets for training and assessment promise further progress ahead.
Training Data Attribution for Diffusion Models
arXiv (Cornell University) · 2023-06-03 · 2 citations
preprint · Open access · Senior author
Diffusion models have become increasingly popular for synthesizing high-quality samples based on training datasets. However, given the oftentimes enormous sizes of the training datasets, it is difficult to assess how training data impact the samples produced by a trained diffusion model. The difficulty of relating diffusion model inputs and outputs poses significant challenges to model explainability and training data attribution. Here we propose a novel solution that reveals how training data influence the output of diffusion models through the use of ensembles. In our approach individual models in an encoded ensemble are trained on carefully engineered splits of the overall training data to permit the identification of influential training examples. The resulting model ensembles enable efficient ablation of training data influence, allowing us to assess the impact of training data on model outputs. We demonstrate the viability of these ensembles as generative models and the validity of our approach to assessing influence.
A pan-variant mRNA-LNP T cell vaccine protects HLA transgenic mice from mortality after infection with SARS-CoV-2 Beta
Frontiers in Immunology · 2023-03-08 · 9 citations
article · Open access · Senior author · Corresponding author
Licensed COVID-19 vaccines ameliorate viral infection by inducing production of neutralizing antibodies that bind the SARS-CoV-2 Spike protein and inhibit viral cellular entry. However, the clinical effectiveness of these vaccines is transitory as viral variants escape antibody neutralization. Effective vaccines that solely rely upon a T cell response to combat SARS-CoV-2 infection could be transformational because they can utilize highly conserved short pan-variant peptide epitopes, but an mRNA-LNP T cell vaccine has not been shown to provide effective anti-SARS-CoV-2 prophylaxis. Here we show an mRNA-LNP vaccine (MIT-T-COVID) based on highly conserved short peptide epitopes activates CD8+ and CD4+ T cell responses that attenuate morbidity and prevent mortality in HLA-A*02:01 transgenic mice infected with SARS-CoV-2 Beta (B.1.351). We found CD8+ T cells in mice immunized with MIT-T-COVID vaccine significantly increased from 1.1% to 24.0% of total pulmonary nucleated cells prior to and at 7 days post infection (dpi), respectively, indicating dynamic recruitment of circulating specific T cells into the infected lungs. Mice immunized with MIT-T-COVID had 2.8 (2 dpi) and 3.3 (7 dpi) times more lung infiltrating CD8+ T cells than unimmunized mice. Mice immunized with MIT-T-COVID had 17.4 times more lung infiltrating CD4+ T cells than unimmunized mice (7 dpi). The undetectable specific antibody response in MIT-T-COVID-immunized mice demonstrates specific T cell responses alone can effectively attenuate the pathogenesis of SARS-CoV-2 infection. Our results suggest further study is merited for pan-variant T cell vaccines, including for individuals that cannot produce neutralizing antibodies or to help mitigate Long COVID.
Systematic elucidation of genetic mechanisms underlying cholesterol uptake
bioRxiv (Cold Spring Harbor Laboratory) · 2023-01-10 · 2 citations
preprint · Open access
Genetic variation contributes greatly to LDL cholesterol (LDL-C) levels and coronary artery disease risk. By combining analysis of rare coding variants from the UK Biobank and genome-scale CRISPR-Cas9 knockout and activation screening, we have substantially improved the identification of genes whose disruption alters serum LDL-C levels. We identify 21 genes in which rare coding variants significantly alter LDL-C levels at least partially through altered LDL-C uptake. We use co-essentiality-based gene module analysis to show that dysfunction of the RAB10 vesicle transport pathway leads to hypercholesterolemia in humans and mice by impairing surface LDL receptor levels. Further, we demonstrate that loss of function of OTX2 leads to robust reduction in serum LDL-C levels in mice and humans by increasing cellular LDL-C uptake. Altogether, we present an integrated approach that improves our understanding of genetic regulators of LDL-C levels and provides a roadmap for further efforts to dissect complex human disease genetics.
Constrained Submodular Optimization for Vaccine Design
Proceedings of the AAAI Conference on Artificial Intelligence · 2023-06-26 · 2 citations
article · Open access · Senior author
Advances in machine learning have enabled the prediction of immune system responses to prophylactic and therapeutic vaccines. However, the engineering task of designing vaccines remains a challenge. In particular, the genetic variability of the human immune system makes it difficult to design peptide vaccines that provide widespread immunity in vaccinated populations. We introduce a framework for evaluating and designing peptide vaccines that uses probabilistic machine learning models, and demonstrate its ability to produce designs for a SARS-CoV-2 vaccine that outperform previous designs. We provide a theoretical analysis of the approximability, scalability, and complexity of our framework.
Systematic elucidation of genetic mechanisms underlying cholesterol uptake
Cell Genomics · 2023-04-21 · 16 citations
article · Open access
Genetic variation contributes greatly to LDL cholesterol (LDL-C) levels and coronary artery disease risk. By combining analysis of rare coding variants from the UK Biobank and genome-scale CRISPR-Cas9 knockout and activation screening, we substantially improve the identification of genes whose disruption alters serum LDL-C levels. We identify 21 genes in which rare coding variants significantly alter LDL-C levels at least partially through altered LDL-C uptake. We use co-essentiality-based gene module analysis to show that dysfunction of the RAB10 vesicle transport pathway leads to hypercholesterolemia in humans and mice by impairing surface LDL receptor levels. Further, we demonstrate that loss of function of OTX2 leads to robust reduction in serum LDL-C levels in mice and humans by increasing cellular LDL-C uptake. Altogether, we present an integrated approach that improves our understanding of the genetic regulators of LDL-C levels and provides a roadmap for further efforts to dissect complex human disease genetics.
Author response: A high-throughput yeast display approach to profile pathogen proteomes for MHC-II binding
2022-07-03 · 1 citation
peer review · Open access
Yeast surface-displayed libraries, when coupled with pooled oligonucleotide synthesis and next-generation sequencing, can be used as a platform to assess binding of whole viral proteomes to class II major histocompatibility complex proteins.
A high-throughput yeast display approach to profile pathogen proteomes for MHC-II binding
bioRxiv (Cold Spring Harbor Laboratory) · 2022-02-24 · 1 citation
preprint · Open access
T cells play a critical role in the adaptive immune response, recognizing peptide antigens presented on the cell surface by Major Histocompatibility Complex (MHC) proteins. While assessing peptides for MHC binding is an important component of probing these interactions, traditional assays for testing peptides of interest for MHC binding are limited in throughput. Here we present a yeast display-based platform for assessing the binding of tens of thousands of user-defined peptides in a high-throughput manner. We apply this approach to assess a tiled library covering the SARS-CoV-2 proteome and four dengue virus serotypes for binding to human class II MHCs, including HLA-DR401, -DR402, and -DR404. This approach identifies binders missed by computational prediction, highlighting the potential for systemic computational errors given even state-of-the-art training data, and underlines design considerations for epitope identification experiments. This platform serves as a framework for examining relationships between viral conservation and MHC binding, and can be used to identify potentially high-interest peptide binders from viral proteins. These results demonstrate the utility of this approach for determining high-confidence peptide-MHC binding.
A high-throughput yeast display approach to profile pathogen proteomes for MHC-II binding
eLife · 2022-07-04 · 28 citations
article · Open access
T cells play a critical role in the adaptive immune response, recognizing peptide antigens presented on the cell surface by major histocompatibility complex (MHC) proteins. While assessing peptides for MHC binding is an important component of probing these interactions, traditional assays for testing peptides of interest for MHC binding are limited in throughput. Here, we present a yeast display-based platform for assessing the binding of tens of thousands of user-defined peptides in a high-throughput manner. We apply this approach to assess a tiled library covering the SARS-CoV-2 proteome and four dengue virus serotypes for binding to human class II MHCs, including HLA-DR401, -DR402, and -DR404. While the peptide datasets show broad agreement with previously described MHC-binding motifs, they additionally reveal experimentally validated computational false positives and false negatives. We therefore present this approach as able to complement current experimental datasets and computational predictions. Further, our yeast display approach underlines design considerations for epitope identification experiments and serves as a framework for examining relationships between viral conservation and MHC binding, which can be used to identify potentially high-interest peptide binders from viral proteins. These results demonstrate the utility of our approach to determine peptide-MHC binding interactions in a manner that can supplement and potentially enhance current algorithm-based approaches.

Recent grants

NIH Grant P01NS055923
NIH · $13.2M · 2013
NIH Grant R01GM069676
NIH · $3.0M · 2008
Deep learning based antibody design using high-throughput affinity testing of synthetic sequences
NIH · $2.9M · 2018–2026
High-throughput methods for elucidating the control of chromatin accessibility
NIH · $2.3M · 2015–2020
NIH Grant T32HG004947
NIH · $957k · 2016

Frequent coauthors

Richard I. Sherwood
Brigham and Women's Hospital
71 shared
Richard A. Young
55 shared
Tommi Jaakkola
52 shared
Shaun Mahony
43 shared
Yuchun Guo
Fujian Agriculture and Forestry University
29 shared
Georg K. Gerber
Brigham and Women's Hospital
29 shared
Gerald R. Fink
Whitehead Institute for Biomedical Research
28 shared
Douglas A. Melton
University of Missouri
23 shared

Education

Ph.D.
Stanford University
1981
B.S.
Massachusetts Institute of Technology
1976

Awards & honors

Wishnok Prize
