
Teresa Head-Gordon
· Chancellor's Professor of Chemistry, Bioengineering, and Chemical & Biomolecular Engineering · University of California, Berkeley · Department of Chemical and Biomolecular Engineering
Active 1987–2026
About
Teresa Head-Gordon is the Chancellor's Professor of Chemistry, Bioengineering, and Chemical & Biomolecular Engineering at the University of California, Berkeley. She joined the UC Berkeley faculty in 2001 and held the title of Chancellor's Professor from 2012 to 2020. Her research focuses on computation and theory in chemistry, materials, and biophysics, with an emphasis on developing general computational models and methodologies and applying them to molecular liquids, macromolecular assemblies, protein biophysics, and catalysis. Her lab develops complex chemistry models, accelerated sampling methods, coarse-graining and multiscale techniques, analytical solutions to the Poisson-Boltzmann equation, and advanced self-consistent field (SCF) solvers and SCF-less methods for many-body physics. These methods and models are widely disseminated through community software codes that scale on high-performance computing platforms. She has held positions including Schlumberger Professor at Cambridge University and Clare Hall Faculty, and she is a Fellow of the American Institute for Medical and Biological Engineering and the American Chemical Society. Her work sits at the intersection of energy, molecular biology, nanotechnology, and advanced scientific computing, and contributes to the development of theoretical and computational chemistry.
Selected publications
A Computational Community Blind Challenge on Pan-Coronavirus Drug Discovery Data
Journal of Chemical Information and Modeling · 2026-02-26 · 3 citations
article
Computational blind challenges offer critical, unbiased opportunities to assess and accelerate scientific progress, as demonstrated by a breadth of breakthroughs over the past decade. We report the outcomes and key insights from an open science community blind challenge focused on computational methods in drug discovery, using lead optimization data from the AI-driven Structure-enabled Antiviral Platform Discovery Consortium's pan-coronavirus antiviral discovery program, in partnership with Polaris and the OpenADMET project. This collaborative initiative invited global participants from both academia and industry to develop and apply computational methods to predict the biochemical potency and crystallographic ligand poses of small molecules against key coronavirus targets, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) and Middle East Respiratory Syndrome Coronavirus (MERS-CoV) main protease (Mpro), as well as multiple ADMET assay end points, using previously undisclosed comprehensive experimental drug discovery data sets as benchmarks. By evaluating submissions across multiple tasks and compounds, we established performance leaderboards and conducted meta-analyses to assess methodological strengths, common pitfalls, and areas for improvement. This analysis provides a foundation for best practices in real-world machine learning evaluation, grounded in community-driven benchmarking. We also highlight how next-generation platforms, such as Polaris, enable rigorous challenge design, embedded evaluation frameworks, and broad community engagement. This paper reports the collective findings of the challenge, offering a high-level overview of the data, evaluation infrastructure, and top-performing strategies. We further provide context and support for the accompanying papers authored by the challenge participants in this special issue, which explore individual approaches in greater depth.
Together, these contributions aim to advance reproducible, trustworthy, and high-impact computational methods in drug discovery, and to explore best practices and pitfalls in future blind challenge design and execution, including planned initiatives for the OpenADMET project.
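The abstract mentions performance leaderboards built from held-out experimental data. The sketch below shows one simple way such a leaderboard could be scored, assuming mean absolute error (MAE) on potency values as the metric; the metric choice, the team names, and the compound IDs are all hypothetical and do not describe the challenge's actual evaluation code on Polaris.

```python
from statistics import mean


def mean_absolute_error(predicted: dict[str, float], measured: dict[str, float]) -> float:
    """Mean absolute error over every compound in the held-out measurements."""
    missing = set(measured) - set(predicted)
    if missing:
        # Treat an incomplete submission as invalid rather than silently scoring a subset.
        raise ValueError(f"submission is missing compounds: {sorted(missing)}")
    return mean(abs(predicted[cid] - measured[cid]) for cid in measured)


def leaderboard(submissions: dict[str, dict[str, float]],
                measured: dict[str, float]) -> list[tuple[str, float]]:
    """Score each team's predictions and sort from best (lowest MAE) to worst."""
    scores = [(team, mean_absolute_error(preds, measured))
              for team, preds in submissions.items()]
    return sorted(scores, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical potency values (e.g., pIC50) for two undisclosed compounds.
    measured = {"cmpd-1": 6.2, "cmpd-2": 7.8}
    submissions = {
        "team-a": {"cmpd-1": 6.0, "cmpd-2": 7.5},
        "team-b": {"cmpd-1": 5.0, "cmpd-2": 9.0},
    }
    for rank, (team, mae) in enumerate(leaderboard(submissions, measured), start=1):
        print(f"{rank}. {team}: MAE = {mae:.2f}")
```

Keeping the measurements hidden from participants and scoring only complete submissions is what makes this kind of evaluation blind; a real challenge would typically add further metrics and uncertainty estimates on top of a single error score.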
Energetics of Non-covalent Interactions of Protein-Ligand Complexes for Drug Discovery
ChemRxiv · 2026-02-05
article · Open access · Senior author
Accurate modeling of non-covalent protein-ligand interactions is critical for applications such as enzyme engineering and drug discovery. Here, we present a dataset of 14,905 protein-ligand interaction energies computed from experimental structures in HiQBind, a high-quality protein-ligand structural database. The structures were fragmented into dimer configurations classified by non-covalent interaction (NCI) type, including hydrogen bonds, hydrophobic contacts, halogen bonds, salt bridges, cation-π, and π-π interactions. Each NCI category was further evaluated with energy decomposition analysis (EDA), a powerful framework that partitions total protein-ligand energies into physically meaningful components: electrostatics, Pauli repulsion, dispersion, polarization, and charge transfer. We further use these data to benchmark current classical force fields and a current state-of-the-art machine-learned interaction potential (MLIP). Together, this dataset provides a quantitative quantum mechanical survey of protein-ligand energetics, offers new insights into the molecular origins of protein-ligand NCIs to inform drug design, and establishes benchmarks for next-generation force field and MLIP development.
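The EDA described above is additive: the components sum to the total interaction energy, which is what lets each NCI category be characterized by its dominant physical term. The sketch below illustrates that bookkeeping for dimer fragments grouped by NCI type. The class and function names are hypothetical, the units are assumed to be kcal/mol, and any numbers you feed it are illustrative rather than values from the dataset.

```python
from collections import defaultdict
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EDATerms:
    """EDA components for one protein-ligand dimer fragment (units assumed kcal/mol)."""
    electrostatics: float
    pauli: float
    dispersion: float
    polarization: float
    charge_transfer: float

    def total(self) -> float:
        # The decomposition is additive: the interaction energy is the sum of its terms.
        return sum(getattr(self, f.name) for f in fields(self))

    def dominant_attraction(self) -> str:
        # Name of the most stabilizing (most negative) term, excluding Pauli repulsion.
        attractive = {f.name: getattr(self, f.name) for f in fields(self) if f.name != "pauli"}
        return min(attractive, key=attractive.get)


def mean_total_by_nci(records: list[tuple[str, EDATerms]]) -> dict[str, float]:
    """Average total interaction energy per NCI category, e.g. 'hydrogen bond'."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for nci_type, terms in records:
        grouped[nci_type].append(terms.total())
    return {nci: sum(vals) / len(vals) for nci, vals in grouped.items()}
```

For example, a hypothetical hydrogen-bonded fragment with terms (-10.0, 12.0, -4.0, -2.0, -1.0) has a total of -5.0 and is electrostatics-dominated, while a fragment whose dispersion term is the most negative would be classed as dispersion-dominated.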
A Computational Community Blind Challenge on Pan-Coronavirus Drug Discovery Data
ChemRxiv · 2026-01-06
article · Open access
Computational blind challenges offer critical, unbiased opportunities to assess and accelerate scientific progress, as demonstrated by a breadth of breakthroughs over the last decade. We report the outcomes and key insights from an open science community blind challenge focused on computational methods in drug discovery, using lead optimization data from the AI-driven Structure-enabled Antiviral Platform (ASAP) Discovery Consortium’s pan-coronavirus antiviral discovery program, in partnership with Polaris and the OpenADMET project. This collaborative initiative invited global participants from both academia and industry to develop and apply computational methods to predict the biochemical potency and crystallographic ligand poses of small molecules against key coronavirus targets, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) and Middle East Respiratory Syndrome Coronavirus (MERS-CoV) main protease (Mpro), as well as multiple ADMET assay endpoints, using previously undisclosed comprehensive experimental drug discovery datasets as benchmarks. By evaluating submissions across multiple tasks and compounds, we established performance leaderboards and conducted meta-analyses to assess methodological strengths, common pitfalls, and areas for improvement. This analysis provides a foundation for best practices in real-world machine learning evaluation, grounded in community-driven benchmarking. We also highlight how next-generation platforms, such as Polaris, enable rigorous challenge design, embedded evaluation frameworks, and broad community engagement. This paper reports the collective findings of the challenge, offering a high-level overview of the data, evaluation infrastructure, and top-performing strategies. We further provide context and support for the accompanying papers authored by the challenge participants in this special issue, which explore individual approaches in greater depth.
Together, these contributions aim to advance reproducible, trustworthy, and high-impact computational methods in drug discovery, and to explore best practices and pitfalls in future blind challenge design and execution, including planned initiatives for the OpenADMET project.
Foundation models for atomistic simulation of chemistry and materials
Nature Reviews Chemistry · 2026-02-11 · 6 citations
article · Senior author
Open MIND · 2026-02-07
preprint · Senior author
Many molecules' vibrational frequencies are sensitive to intermolecular electric fields, which allows them to probe the field in complex molecular environments. However, it is often unclear whether a probe is responding to the local electric field or to other types of intermolecular interactions, which hinders interpretation of the frequency and limits the probe's effectiveness. This is especially true of molecules whose vibrational frequencies blueshift, rather than undergoing the more typical redshift, in hydrogen-bonding configurations. Here we computationally investigate the causes of redshifting versus blueshifting across a range of vibrational reporters. First, we apply adiabatic energy decomposition analysis to a paradigmatic set of probes and find that redshifting occurs only when electrostatic interactions are strong enough to overcome the large, dominant blueshifting contribution of Pauli repulsion. Furthermore, we demonstrate that field inhomogeneity can substantially shift the frequency of many probes, either reinforcing or counteracting the shift expected from a homogeneous field. We find that electric field inhomogeneity reinforces redshifting in some cases; otherwise, it further weakens the electrostatic contribution relative to Pauli repulsion, leading to blueshifting. Further calculations indicate that a probe's response to field inhomogeneity can be understood by considering the masses of the atoms involved in the stretching mode and the sign of the electric field. By explaining the interplay of different intermolecular interactions and field inhomogeneity for many probes, our results should enable the use and interpretation of spectroscopic probes and their connection to electric fields in more complex systems.
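As background for this abstract (this is the standard vibrational Stark relation, not a result reported in the paper), the frequency shift of a probe in a uniform field is commonly written to second order as follows, where the difference dipole (Stark tuning rate) and the difference polarizability are taken between the two vibrational states:

```latex
hc\,\Delta\tilde{\nu} \;\approx\; -\,\Delta\vec{\mu}\cdot\vec{F}
  \;-\; \tfrac{1}{2}\,\vec{F}\cdot\Delta\boldsymbol{\alpha}\cdot\vec{F}
```

In this sign convention a field projected along the difference dipole lowers the frequency, giving a redshift. The expression contains no Pauli-repulsion term and assumes the field is homogeneous over the probe, which are the two effects the abstract identifies as responsible for blueshifts and for departures from the homogeneous-field picture.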
SmileyLlama: modifying large language models for directed chemical space exploration
Nature Computational Science · 2026-05-11 · 3 citations
preprint · Open access · Senior author · Corresponding
Here we show that large language models (LLMs) can be transformed into SmileyLlama, a model for exploring the chemical space of drug molecules, via supervised fine-tuning on engineered prompts. We benchmark SmileyLlama against pretrained LLMs and against chemical language models trained from scratch on generating valid and novel drug-like molecules. We also use direct preference optimization (DPO) both to improve SmileyLlama's adherence to prompts and, as part of the iMiner reinforcement learning framework, to predict molecules with optimized three-dimensional conformations and high binding affinity to drug targets. By training an LLM to speak directly as a chemical language model while retaining most of its natural language capabilities, we show that SmileyLlama can reliably generate molecules with user-specified properties, rather than acting only as a chatbot with knowledge of chemistry or as a virtual assistant. Although SmileyLlama is geared toward drug discovery, the supervised fine-tuning/DPO/LLM framework can be extended to other chemical, biological, and materials applications.
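SmileyLlama uses direct preference optimization to push generation toward preferred molecules. The sketch below is the generic per-pair DPO objective, written over precomputed sequence log-probabilities; it is not SmileyLlama's training code, and the default `beta = 0.1` is an assumption. In a real training loop these log-probabilities would come from the fine-tuned model and a frozen copy of it, and the loss would be averaged over a batch with automatic differentiation.

```python
import math


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss for a single preference pair.

    Each argument is the summed log-probability of a completion (e.g. a SMILES
    string) under either the model being trained ("policy") or the frozen
    reference model. "chosen" is the preferred completion, "rejected" the other.
    """
    # Implicit reward margin: how much more the policy favors the chosen completion
    # than the reference does, relative to the rejected one.
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), evaluated without overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and reference agree, the margin is zero and the loss is log 2; the loss falls as the policy assigns relatively more probability to the preferred molecule, which is how preference data steers generation without an explicit reward model.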
The Journal of Physical Chemistry B · 2026-01-05 · 3 citations
article · Senior author · Corresponding
The majority of machine learning scoring functions (SFs) used in drug discovery for predicting protein-ligand binding poses and affinities have been trained on the PDBBind data set. However, it is unclear whether these new SFs actually improve on traditional models, because their training and test sets are often cross-contaminated with highly similar proteins and ligands; hence, they may not perform as well when predicting binding for unrelated protein-ligand complexes. In this work, we have carefully prepared a new split of the PDBBind data set that controls for data leakage, defined as proteins and ligands with high sequence and structural similarity. The resulting leak-proof (LP)-PDBBind data are used to retrain four popular SFs, AutoDock Vina, Random Forest (RF)-Score, InteractionGraphNet (IGN), and DeepDTA, to better test their capabilities on new protein-ligand complexes. In addition, we have formulated a new independent data set, BDB2020+, by matching high-quality binding free energies from BindingDB with cocrystallized ligand-protein complexes deposited in the PDB since 2020. Across all benchmarks, the models retrained on LP-PDBBind consistently perform better, and IGN is especially recommended for scoring and ranking applications on new protein-ligand systems.
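The core idea of a leak-proof split is that no test complex may closely resemble any training complex in either its protein or its ligand. The sketch below illustrates that filtering rule only. It is not the LP-PDBBind procedure: the k-mer Jaccard score is a crude stand-in for sequence alignment, ligand fingerprints are assumed to be precomputed sets of bit indices, and both thresholds are hypothetical.

```python
def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard/Tanimoto similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def kmers(sequence: str, k: int = 3) -> frozenset:
    """Overlapping k-mers of a protein sequence: a crude stand-in for sequence alignment."""
    return frozenset(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def leak_free_test_ids(train, candidates, max_protein_sim=0.3, max_ligand_sim=0.5):
    """Return IDs of candidate complexes that are dissimilar to every training complex.

    train, candidates: lists of (complex_id, protein_sequence, ligand_fingerprint_bits).
    A candidate is dropped if EITHER its protein OR its ligand is too similar to
    any training entry (a conservative, assumed rule).
    """
    train_features = [(kmers(seq), fp) for _, seq, fp in train]
    kept = []
    for complex_id, seq, fp in candidates:
        protein = kmers(seq)
        leaked = any(
            jaccard(protein, train_protein) >= max_protein_sim
            or jaccard(fp, train_fp) >= max_ligand_sim
            for train_protein, train_fp in train_features
        )
        if not leaked:
            kept.append(complex_id)
    return kept
```

Dropping a candidate when either component is similar is deliberately strict: a model can memorize a ligand series or a binding pocket, and either shortcut would inflate test scores on complexes that are not truly new.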
LinkLlama: Enabling Large Language Model for Chemically Reasonable Linker Design
bioRxiv (Cold Spring Harbor Laboratory) · 2026-04-16
article · Open access · Senior author · Corresponding
Fragment-based drug discovery (FBDD) relies heavily on the design of chemically viable linkers to connect fragments binding to different pocket regions into potent lead molecules. While recent generative models have advanced spatial fragment linking, they frequently produce linkers with high torsional strain and non-drug-like motifs. In this work, we present LinkLlama, a fine-tuned Meta Llama 3 model that bridges the gap between text-based generation and 3D spatial awareness. By accepting natural language prompts that specify geometric constraints, such as distances and angles, alongside physicochemical targets such as Lipinski's rules and rotatable bond limits, LinkLlama generates highly tailored molecules for the input fragments. Leveraging the chemical grammar captured through supervised fine-tuning on a curated corpus of drug-like molecules from ChEMBL, the model prioritizes chemical validity without requiring complex reinforcement learning loops. Benchmarking on the ZINC and HiQBind datasets demonstrates that LinkLlama maintains geometric fidelity competitive with strictly 3D-aware models while more than doubling the proportion of chemically reasonable designs, from 35% to over 80%. A design counts as chemically reasonable only if it passes comprehensive structural filters for PAINS, non-drug-like chemical patterns, and complex ring systems. We further illustrate the model's versatility through prospective case studies in novel small-molecule scaffold hopping and PROTAC linker design, validated via molecular docking and molecular dynamics simulations against known crystal poses. Ultimately, LinkLlama demonstrates that large language models can overcome the structural pitfalls of purely 3D-generative methods, offering a highly controllable and chemically robust framework to accelerate linker design and drug discovery in general.
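The headline metric above is the fraction of generated designs that pass a set of chemistry filters. The sketch below shows how such a success rate can be computed, assuming descriptors have already been calculated by a cheminformatics toolkit. It applies Lipinski's rule of five strictly (no allowed violation), uses a hypothetical rotatable-bond cap of 10, and represents the structural-alert screens (PAINS and similar) as a single precomputed boolean; it is not LinkLlama's actual filter set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptors for one generated molecule."""
    mol_weight: float
    logp: float
    h_donors: int
    h_acceptors: int
    rotatable_bonds: int
    flagged_substructure: bool  # hypothetical: True if any structural alert (e.g. PAINS) matched


def is_chemically_reasonable(d: Descriptors, max_rotatable: int = 10) -> bool:
    # Lipinski's rule of five, applied strictly here.
    lipinski_ok = (d.mol_weight <= 500 and d.logp <= 5
                   and d.h_donors <= 5 and d.h_acceptors <= 10)
    return lipinski_ok and d.rotatable_bonds <= max_rotatable and not d.flagged_substructure


def success_rate(designs: list[Descriptors]) -> float:
    """Fraction of designs passing every filter (0.0 for an empty batch)."""
    if not designs:
        return 0.0
    return sum(is_chemically_reasonable(d) for d in designs) / len(designs)
```

Because a single failed check rejects a design, the reported proportion is sensitive to which filters are included; comparing models fairly requires running the same filter set on every model's output.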
arXiv (Cornell University) · 2026-02-07
article · Open access · Senior author
More Accurate Binding Affinity Prediction Using Protein Homology and Ligand-Based Transfer Learning
Journal of Chemical Information and Modeling · 2026-02-06
article · Senior author
Accurate and rapid prediction of protein-ligand binding affinities is critical for drug discovery, particularly when evaluating large chemical libraries or new drug molecules from high-throughput generative models. We present UCBbind, a hybrid framework that combines a similarity-based transfer module with a deep-learning-based prediction module to efficiently estimate binding affinities of small molecules to target proteins. For each query protein-ligand pair, UCBbind transfers experimental data from highly similar reference pairs when available and applies the prediction module when no sufficiently similar reference exists. We benchmarked UCBbind on multiple datasets, including the CASF-2016 set, the post-2020 subset of the HiQBind dataset, and the COVID Moonshot database. Our results show that UCBbind achieves state-of-the-art predictive performance, particularly for test entries with high similarity to well-characterized reference proteins and ligands, and can support downstream tasks such as binding site prediction and binder/nonbinder classification.
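The transfer-or-predict logic described above can be sketched as a nearest-neighbor lookup with a model fallback. Everything concrete here is an assumption for illustration: queries are keyed by hypothetical `(protein_id, ligand_id)` pairs, `similarity` and `predict` are caller-supplied placeholders for UCBbind's actual similarity measure and deep-learning module, and the 0.8 cutoff is invented.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Query = Tuple[str, str]  # hypothetical (protein_id, ligand_id) key


@dataclass(frozen=True)
class Reference:
    protein_id: str
    ligand_id: str
    affinity: float  # experimental binding affinity, e.g. pKd


def estimate_affinity(query: Query,
                      references: Sequence[Reference],
                      similarity: Callable[[Query, Reference], float],
                      predict: Callable[[Query], float],
                      threshold: float = 0.8) -> Tuple[float, str]:
    """Transfer the nearest reference's measured affinity if it is similar enough,
    otherwise fall back to a learned predictor. Returns (affinity, source)."""
    best, best_sim = None, float("-inf")
    for ref in references:
        s = similarity(query, ref)
        if s > best_sim:
            best, best_sim = ref, s
    if best is not None and best_sim >= threshold:
        return best.affinity, "transfer"
    return predict(query), "model"
```

Returning the source label alongside the value makes it easy to report accuracy separately for transferred and model-predicted entries, which matters because the abstract notes that performance is strongest for queries with close reference matches.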
Awards & honors
- IBM SUR Award (2001)
- Fellow, American Institute for Medical and Biological Engine…
- Fellow, American Chemical Society (2018)