
David Gifford
· Professor · Massachusetts Institute of Technology · Biological Engineering
Active 1977–2025
About
David Gifford, PhD, is a Professor of Electrical Engineering and Computer Science, as well as a Professor of Biological Engineering at MIT. He received his BS from MIT in 1976 and his PhD from Stanford University in 1981. Since joining the MIT faculty in 1982, he has developed new machine learning techniques and algorithms to model transcriptional regulatory networks that control gene expression programs in living cells. His research group focuses on creating combined computational and experimental approaches to discover novel biology and human therapeutics, utilizing interpretable computational models trained and validated with experimental evidence. Gifford's work involves applying these models to problems in experiment design, developmental biology, gene regulation, immunology, genomics, and human therapeutics. His group evaluates models and uncovers new biology through multiplexed high-throughput experimental studies involving populations of cells and single cells. A key challenge addressed by his research is the incomplete knowledge of biological systems, leading to model uncertainty. His team actively develops uncertainty metrics for models to guide experiment design and improve model accuracy. His computational approaches incorporate large-scale linear and non-linear models, Bayesian methods, and deep learning. His current biological focus areas include motor neuron development, single-cell perturbation studies, chromatin accessibility regulation, the regulatory genome, antibody design, and peptide presentation by MHC proteins.
Research topics
- Computational biology
- Genetics
- Biology
- Computer Science
- Artificial Intelligence
- Evolutionary biology
- Medicine
- Virology
Selected publications
Deep mapping of the TCR-antigen interface using pMHC-pseudotyped viruses and yeast display
bioRxiv (Cold Spring Harbor Laboratory) · 2025-08-27 · 1 citation
Preprint · Open access
T cell receptor (TCR) specificity is central to the efficacy of T cell therapies, yet scalable methods to map how TCR sequences shape antigen recognition remain limited. To address this, we introduce VelociRAPTR, a library-on-library approach that combines yeast-displayed TCR libraries with pMHC-displaying virus-like particles (pMHC-VLPs) to rapidly screen millions of TCR-antigen interactions. We show that pMHC-VLPs efficiently bind TCRs on yeast and generate equivalent data to recombinantly produced pMHC protein. We then apply VelociRAPTR to screen 47 million variants of the A6 and 868 TCRs against 92 pMHCs simultaneously, mutating both the CDR3 loops and cognate peptides. The resulting CDR3-pMHC maps reveal biased recognition patterns, where mutations to CDR3 loops can selectively constrain or broaden specificity to peptide analogs. These insights provide a foundation for engineering TCRs with defined pMHC binding profiles and improving models that predict TCR-antigen interactions, including the prediction of off-target recognition. By coupling the scale of yeast display with the modularity of VLPs, VelociRAPTR offers a generalizable strategy for generating deep, high-throughput protein-protein interaction data.
Genome Biology · 2024-02-22 · 56 citations
Letter · Open access
BACKGROUND: The Critical Assessment of Genome Interpretation (CAGI) aims to advance the state-of-the-art for computational prediction of genetic variant impact, particularly where relevant to disease. The five complete editions of the CAGI community experiment comprised 50 challenges, in which participants made blind predictions of phenotypes from genetic data, and these were evaluated by independent assessors. RESULTS: Performance was particularly strong for clinical pathogenic variants, including some difficult-to-diagnose cases, and extends to interpretation of cancer-related variants. Missense variant interpretation methods were able to estimate biochemical effects with increasing accuracy. Assessment of methods for regulatory variants and complex trait disease risk was less definitive and indicates performance potentially suitable for auxiliary use in the clinic. CONCLUSIONS: Results show that while current methods are imperfect, they have major utility for research and clinical applications. Emerging methods and increasingly large, robust datasets for training and assessment promise further progress ahead.
Training Data Attribution for Diffusion Models
arXiv (Cornell University) · 2023-06-03 · 2 citations
Preprint · Open access · Senior author
Diffusion models have become increasingly popular for synthesizing high-quality samples based on training datasets. However, given the oftentimes enormous sizes of the training datasets, it is difficult to assess how training data impact the samples produced by a trained diffusion model. The difficulty of relating diffusion model inputs and outputs poses significant challenges to model explainability and training data attribution. Here we propose a novel solution that reveals how training data influence the output of diffusion models through the use of ensembles. In our approach individual models in an encoded ensemble are trained on carefully engineered splits of the overall training data to permit the identification of influential training examples. The resulting model ensembles enable efficient ablation of training data influence, allowing us to assess the impact of training data on model outputs. We demonstrate the viability of these ensembles as generative models and the validity of our approach to assessing influence.
Frontiers in Immunology · 2023-03-08 · 9 citations
Article · Open access · Senior author · Corresponding author
Licensed COVID-19 vaccines ameliorate viral infection by inducing production of neutralizing antibodies that bind the SARS-CoV-2 Spike protein and inhibit viral cellular entry. However, the clinical effectiveness of these vaccines is transitory as viral variants escape antibody neutralization. Effective vaccines that solely rely upon a T cell response to combat SARS-CoV-2 infection could be transformational because they can utilize highly conserved short pan-variant peptide epitopes, but an mRNA-LNP T cell vaccine has not been shown to provide effective anti-SARS-CoV-2 prophylaxis. Here we show that an mRNA-LNP vaccine (MIT-T-COVID) based on highly conserved short peptide epitopes activates CD8+ and CD4+ T cell responses that attenuate morbidity and prevent mortality in HLA-A*02:01 transgenic mice infected with SARS-CoV-2 Beta (B.1.351). We found CD8+ T cells in mice immunized with MIT-T-COVID vaccine significantly increased from 1.1% to 24.0% of total pulmonary nucleated cells prior to and at 7 days post infection (dpi), respectively, indicating dynamic recruitment of circulating specific T cells into the infected lungs. Mice immunized with MIT-T-COVID had 2.8 (2 dpi) and 3.3 (7 dpi) times more lung infiltrating CD8+ T cells than unimmunized mice. Mice immunized with MIT-T-COVID had 17.4 times more lung infiltrating CD4+ T cells than unimmunized mice (7 dpi). The undetectable specific antibody response in MIT-T-COVID-immunized mice demonstrates specific T cell responses alone can effectively attenuate the pathogenesis of SARS-CoV-2 infection. Our results suggest further study is merited for pan-variant T cell vaccines, including for individuals that cannot produce neutralizing antibodies or to help mitigate Long COVID.
Systematic elucidation of genetic mechanisms underlying cholesterol uptake
bioRxiv (Cold Spring Harbor Laboratory) · 2023-01-10 · 2 citations
Preprint · Open access
Genetic variation contributes greatly to LDL cholesterol (LDL-C) levels and coronary artery disease risk. By combining analysis of rare coding variants from the UK Biobank and genome-scale CRISPR-Cas9 knockout and activation screening, we have substantially improved the identification of genes whose disruption alters serum LDL-C levels. We identify 21 genes in which rare coding variants significantly alter LDL-C levels at least partially through altered LDL-C uptake. We use co-essentiality-based gene module analysis to show that dysfunction of the RAB10 vesicle transport pathway leads to hypercholesterolemia in humans and mice by impairing surface LDL receptor levels. Further, we demonstrate that loss of function of OTX2 leads to robust reduction in serum LDL-C levels in mice and humans by increasing cellular LDL-C uptake. Altogether, we present an integrated approach that improves our understanding of genetic regulators of LDL-C levels and provides a roadmap for further efforts to dissect complex human disease genetics.
Constrained Submodular Optimization for Vaccine Design
Proceedings of the AAAI Conference on Artificial Intelligence · 2023-06-26 · 2 citations
Article · Open access · Senior author
Advances in machine learning have enabled the prediction of immune system responses to prophylactic and therapeutic vaccines. However, the engineering task of designing vaccines remains a challenge. In particular, the genetic variability of the human immune system makes it difficult to design peptide vaccines that provide widespread immunity in vaccinated populations. We introduce a framework for evaluating and designing peptide vaccines that uses probabilistic machine learning models, and demonstrate its ability to produce designs for a SARS-CoV-2 vaccine that outperform previous designs. We provide a theoretical analysis of the approximability, scalability, and complexity of our framework.
Systematic elucidation of genetic mechanisms underlying cholesterol uptake
Cell Genomics · 2023-04-21 · 16 citations
Article · Open access
Genetic variation contributes greatly to LDL cholesterol (LDL-C) levels and coronary artery disease risk. By combining analysis of rare coding variants from the UK Biobank and genome-scale CRISPR-Cas9 knockout and activation screening, we substantially improve the identification of genes whose disruption alters serum LDL-C levels. We identify 21 genes in which rare coding variants significantly alter LDL-C levels at least partially through altered LDL-C uptake. We use co-essentiality-based gene module analysis to show that dysfunction of the RAB10 vesicle transport pathway leads to hypercholesterolemia in humans and mice by impairing surface LDL receptor levels. Further, we demonstrate that loss of function of OTX2 leads to robust reduction in serum LDL-C levels in mice and humans by increasing cellular LDL-C uptake. Altogether, we present an integrated approach that improves our understanding of the genetic regulators of LDL-C levels and provides a roadmap for further efforts to dissect complex human disease genetics.
2022-07-03 · 1 citation
Peer review · Open access
Yeast surface-displayed libraries, when coupled with pooled oligonucleotide synthesis and next-generation sequencing, can be used as a platform to assess binding of whole viral proteomes to class II major histocompatibility complex proteins.
A high-throughput yeast display approach to profile pathogen proteomes for MHC-II binding
bioRxiv (Cold Spring Harbor Laboratory) · 2022-02-24 · 1 citation
Preprint · Open access
T cells play a critical role in the adaptive immune response, recognizing peptide antigens presented on the cell surface by Major Histocompatibility Complex (MHC) proteins. While assessing peptides for MHC binding is an important component of probing these interactions, traditional assays for testing peptides of interest for MHC binding are limited in throughput. Here we present a yeast display-based platform for assessing the binding of tens of thousands of user-defined peptides in a high throughput manner. We apply this approach to assess a tiled library covering the SARS-CoV-2 proteome and four dengue virus serotypes for binding to human class II MHCs, including HLA-DR401, -DR402, and -DR404. This approach identifies binders missed by computational prediction, highlighting the potential for systematic computational errors given even state-of-the-art training data, and underlines design considerations for epitope identification experiments. This platform serves as a framework for examining relationships between viral conservation and MHC binding, and can be used to identify potentially high-interest peptide binders from viral proteins. These results demonstrate the utility of this approach for determining high-confidence peptide-MHC binding.
A high-throughput yeast display approach to profile pathogen proteomes for MHC-II binding
eLife · 2022-07-04 · 28 citations
Article · Open access
T cells play a critical role in the adaptive immune response, recognizing peptide antigens presented on the cell surface by major histocompatibility complex (MHC) proteins. While assessing peptides for MHC binding is an important component of probing these interactions, traditional assays for testing peptides of interest for MHC binding are limited in throughput. Here, we present a yeast display-based platform for assessing the binding of tens of thousands of user-defined peptides in a high-throughput manner. We apply this approach to assess a tiled library covering the SARS-CoV-2 proteome and four dengue virus serotypes for binding to human class II MHCs, including HLA-DR401, -DR402, and -DR404. While the peptide datasets show broad agreement with previously described MHC-binding motifs, they additionally reveal experimentally validated computational false positives and false negatives. We therefore present this approach as able to complement current experimental datasets and computational predictions. Further, our yeast display approach underlines design considerations for epitope identification experiments and serves as a framework for examining relationships between viral conservation and MHC binding, which can be used to identify potentially high-interest peptide binders from viral proteins. These results demonstrate the utility of our approach to determine peptide-MHC binding interactions in a manner that can supplement and potentially enhance current algorithm-based approaches.
Recent grants
NIH · $13.2M · 2013
NIH · $3.0M · 2008
Deep learning based antibody design using high-throughput affinity testing of synthetic sequences
NIH · $2.9M · 2018–2026
High-throughput methods for elucidating the control of chromatin accessibility
NIH · $2.3M · 2015–2020
NIH · $957k · 2016
Frequent coauthors
- Richard I. Sherwood · Brigham and Women's Hospital · 71 shared
- Richard A. Young · 55 shared
- Tommi Jaakkola · 52 shared
- Shaun Mahony · 43 shared
- Yuchun Guo · Fujian Agriculture and Forestry University · 29 shared
- Georg K. Gerber · Brigham and Women's Hospital · 29 shared
- Gerald R. Fink · Whitehead Institute for Biomedical Research · 28 shared
- Douglas A. Melton · University of Missouri · 23 shared
Education
- 1981 · Ph.D., Stanford University
- 1976 · B.S., Massachusetts Institute of Technology
Awards & honors
- Wishnok Prize