Jeffrey Gray

Professor · Verified

Johns Hopkins University · Chemical and Biomolecular Engineering

Active 1964–2026

h-index: 71

Citations: 22.2k

Papers: 418 (108 in last 5 years)

Funding: $11.1M (1 active)

About

Jeffrey J. Gray, Ph.D., is the Principal Investigator at GrayLab at Johns Hopkins University. His research focuses on computational protein structure prediction and design, with particular emphasis on protein-protein docking, therapeutic antibodies, glycoproteins, and deep learning. At Johns Hopkins, he has held several leadership roles including Vice Chair of the Department of Chemical and Biomolecular Engineering (ChemBE), engineering faculty senator, member of the Diversity Leadership Council, co-founder of the Homewood Council on Inclusive Excellence, and departmental diversity champion for ChemBE. He also serves as the Director of the Rosetta Commons, the NSF-supported Rosetta Commons Summer Intern Program, and the Rosetta Postbac Program. Additionally, he is involved in several commercial science activities.

Research topics

Computer Science
Artificial Intelligence
Machine Learning
Political Science
Biology
Medicine
Psychology
Medical education
Data science
Sociology
Social Science
Public relations
Social psychology
Engineering
Mathematics
Systems engineering
Human–computer interaction
Computational biology
World Wide Web
Nanotechnology
Combinatorics
Algorithm
Biochemistry
Programming language

Selected publications

Predicting Supramolecular Self-Assembly of Peptide Structures with AlphaFold3
bioRxiv (Cold Spring Harbor Laboratory) · 2026-04-30
article · Open access · Senior author
Self-assembled peptide-based nanostructures have diverse applications in the pharmaceutical and materials fields, but accurately predicting their self-assembly behavior without time-intensive organic synthesis and characterization remains a significant challenge. Here, we assess the effectiveness of AlphaFold3 (AF3), a deep learning model for protein structure prediction, in modeling peptide-based nanostructures and the interactions driving supramolecular self-assembly. We designed amphiphilic peptides composed of alternating hydrophobic residues (valine, leucine, isoleucine, phenylalanine) and hydrophilic residues (glutamic acid), varying both sequence length and residue order. Using AF3’s multimer mode, we modeled assemblies with copy numbers ranging from 10 to 1000, generating diverse morphologies such as micelles and nanotubes. We qualitatively analyzed hydrophobic regions, secondary structures, and intermolecular interactions, while also calculating radii of gyration, packing scores, and aspect ratios using PyRosetta. Our results indicate that AF3 predicts morphologies consistent with hydrophobic driving forces and steric constraints. Increased hydrophobicity correlates with smaller radii of gyration, while higher copy numbers correspond to smaller aspect ratios (more compact structures). Longer hydrophobic segments lead to disordered structures, whereas longer hydrophilic segments promote organization. While AF3 captures systemic trends consistent with biophysical principles, comparisons to literature reveal discrepancies driven by charge effects and secondary structure bias, including an overemphasis on helical propensity (e.g., alanine-rich sequences) and sensitivity to terminal charge repulsion. Additionally, since AF3 is predisposed to predict a single assembled entity rather than higher-order assemblies such as multiple micelles or fibers, finding the optimal copy number for the best prediction requires system-specific iteration.
These limitations highlight the need for complementary approaches with controlled chemical potential and environmental conditions, though qualitative agreement with experimental trends in morphology and compactness supports AF3’s utility for initial structure generation. Our findings highlight AF3’s potential as a user-friendly design tool for structure generation in peptide design, aiding the efficient development of functional self-assembled peptide nanomaterials.
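As a rough illustration of one shape descriptor named in the abstract above, the radius of gyration of an assembly can be computed directly from its coordinates. This is a minimal sketch and not the authors' PyRosetta workflow: it assumes equal-mass points (no atomic weighting), and the function name `radius_of_gyration` is hypothetical.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration for a list of (x, y, z) points.

    Assumption: every point carries equal mass, so the center of mass is
    simply the centroid. A mass-weighted version would weight each term.
    """
    n = len(coords)
    if n == 0:
        raise ValueError("coords must contain at least one point")

    # Centroid of the point cloud.
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n

    # Mean squared distance from the centroid.
    msd = sum(
        (p[0] - cx) ** 2 + (p[1] - cy) ** 2 + (p[2] - cz) ** 2 for p in coords
    ) / n
    return math.sqrt(msd)


if __name__ == "__main__":
    # Four points on a unit circle in the xy-plane, centered at the origin.
    square = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]
    print(radius_of_gyration(square))
```

A smaller value for the same number of points indicates a more compact arrangement, which is the sense in which the abstract relates hydrophobicity to compactness.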
Predictions from deep learning propose substantial protein–carbohydrate interplay
Proceedings of the National Academy of Sciences · 2026-05-18
article · Open access · Senior author · Corresponding author
Noncovalent interaction between proteins and carbohydrates (sugars, glycans) is the basis for biological functions from metabolic regulation to intercellular recognition. It is a grand challenge to identify the protein–carbohydrate interactomes in organisms. Direct experiments would require extensive libraries of glycans to distinguish binding from nonbinding proteins. Computational screening of proteins for carbohydrate binding potential provides an attractive alternative. Current estimates propose that <5% of proteins bind carbohydrates, a number that is not well established. We therefore developed a neural network, “Protein interaction of Carbohydrates Predictor” (PiCAP), to predict whether a protein noncovalently binds to a carbohydrate. We trained PiCAP on a manually curated dataset of known carbohydrate binders and proteins that we identified as likely not to bind carbohydrates (transcription factors, cytoskeletal components, and small-molecule-binding proteins). PiCAP achieves 90% balanced accuracy on protein-level predictions of carbohydrate binding/nonbinding. Using the same datasets, we developed Carbohydrate Protein Site Identifier 2 (CAPSIF2) to predict protein residues that interact noncovalently with carbohydrates. CAPSIF2 achieves a Dice coefficient of 0.57 on residue-level predictions on our independent test dataset, outperforming previous models. To demonstrate the models’ biological applicability, we investigated human cell surface proteins and further predicted the likelihood of carbohydrate binding in six proteomes (Escherichia coli, Mus musculus, Homo sapiens, Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster). PiCAP predicts that ~35 to 40% of proteins in these proteomes bind carbohydrates, with 75% of extracellular and cell surface proteins predicted to bind. The PiCAP predicted binders are enriched for functions including growth factor receptor binding, inflammation, and cell–cell adhesion.
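The two metrics quoted in the abstract above, balanced accuracy (protein level) and the Dice coefficient (residue level), have standard definitions that can be sketched in a few lines. The code below is an illustration of those textbook definitions only. The function names and the toy labels are hypothetical, and nothing here reproduces the PiCAP or CAPSIF2 evaluation pipelines.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = binder)."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present in y_true")

    # Correctly predicted binders and non-binders.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


def dice_coefficient(true_sites, pred_sites):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two sets of residue indices."""
    a, b = set(true_sites), set(pred_sites)
    if not a and not b:
        return 1.0  # convention: two empty sets agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    # Toy protein-level labels: four binders, two non-binders.
    print(balanced_accuracy([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1]))
    # Toy residue-level binding sites (residue indices are made up).
    print(dice_coefficient({10, 11, 12, 15}, {11, 12, 13}))
```

Balanced accuracy is useful when binders and non-binders are unequally represented, because each class contributes half of the score regardless of its size.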
Fitness Landscape for Antibodies 2: Benchmarking Reveals That Protein AI Models Cannot Yet Consistently Predict Developability Properties
SSRN Electronic Journal · 2026-01-01
preprint · Open access · Senior author
The Open Molecular Software Foundation (OMSF) and the Growing Role of Open Source Software in Molecular Modeling
Journal of Chemical Information and Modeling · 2026-03-10
article
The increasing importance and predictive power of modern molecular modeling, driven by physics- and machine-learning-based methods, necessitates a new collaborative architecture to replace the isolated, traditional model of software development. The traditional approach often led to redundant engineering effort, high costs, and opaque systems that limit reproducibility, independent scrutiny, and scientific independence. Additionally, it results in taxpayer-funded research being left siloed in commercial tools where it cannot have as much impact as if it were returned to the general public. This Perspective advocates for permissively licensed open source software as a scientific and economic multiplier by reducing the duplication of effort and enabling scientific validation of modeling tools and frictionless experimentation with new ideas. Coordinated multiproject consortia, such as Open Force Field, Open Free Energy, OpenFold, and OpenADMET, have formed to collaboratively build shared computational infrastructure and release all methods under permissive licenses. The success of these large-scale efforts requires organizational structures that extend beyond code. The Open Molecular Software Foundation (OMSF), a U.S. nonprofit, serves as a domain-specific institutional home and fiscal sponsor. By providing governance, administrative infrastructure, and dedicated research software engineers, OMSF aligns incentives across academic and industrial stakeholders. This framework enables a synergistic ecosystem where projects interoperate to accelerate innovation, eliminate duplication, and ensure long-term software sustainability, thereby creating durable foundations that elevate the entire molecular modeling community.
The growing role of open source software in molecular modeling
ChemRxiv · 2026-03-02
article · Open access
The increasing importance and predictive power of modern molecular modeling, driven by physics- and machine learning-based methods, necessitates a new collaborative architecture to replace the isolated, traditional model of software development. The traditional approach often led to redundant engineering effort, high costs, and opaque systems that limit reproducibility, independent scrutiny, and scientific independence. Additionally, it results in taxpayer-funded research being left siloed in commercial tools where it cannot have as much impact as if it were returned to the general public. This perspective advocates for permissively licensed open source software as a scientific and economic multiplier by reducing the duplication of effort, enabling scientific validation of modeling tools, and frictionless experimentation with new ideas. Coordinated, multi-project consortia, such as Open Force Field, Open Free Energy, OpenFold, and OpenADMET have formed to collaboratively build shared computational infrastructure and release all methods under permissive licenses. The success of these large-scale efforts requires organizational structures that extend beyond code. The Open Molecular Software Foundation (OMSF), a US nonprofit, serves as a domain-specific institutional home and fiscal sponsor. By providing governance, administrative infrastructure, and dedicated research software engineers, OMSF aligns incentives across academic and industrial stakeholders. This framework enables a synergistic ecosystem where projects interoperate to accelerate innovation, eliminate duplication, and ensure long-term software sustainability, thereby creating durable foundations that elevate the entire molecular modeling community.
What does AlphaFold3 learn about antibody and nanobody docking, and what remains unsolved?
mAbs · 2025-08-14 · 21 citations
article · Open access · Senior author · Corresponding author
improves discriminative power for correctly docked antibody and nanobody complexes. However, AF3's 65% failure rate for antibody and nanobody docking (with single seed sampling) demonstrates a need to further improve antibody modeling tools.
Evaluation of De Novo Deep Learning Models on the Protein-Sugar Interactome
bioRxiv (Cold Spring Harbor Laboratory) · 2025-09-06 · 2 citations
preprint · Open access · Senior author · Corresponding author
Advances in deep learning have produced a range of models for predicting the protein-sugar interactome; however, structural docking of noncovalent protein-carbohydrate complexes remains largely unexplored. Although all-atom structure prediction models like AlphaFold3 (AF3), Boltz-1, Chai-1, DiffDock, and RosettaFold-All Atom (RFAA) were validated on protein-small molecule complexes, no benchmark or evaluation exists specifically for noncovalent protein-carbohydrate docking. To address this, we developed a high-quality dataset of experimental structures - Benchmark of CArbohydrate Protein Interactions (BCAPIN). Using BCAPIN and a novel evaluation metric, DockQC, we assessed the performance of all-atom structure prediction models on non-covalent protein-carbohydrate docking. We found all methods achieved comparable results, with an 85% success rate for structures of at least acceptable quality. However, we found that the predictive power of all models declined with increasing carbohydrate polymer length. With the capabilities and limitations assessed, we evaluated AF3's ability to predict binding for a set of putative human carbohydrate binding and carbohydrate non-binding proteins. While current models show promise, further development is needed to enable high-confidence, high-throughput prediction of the complete protein-sugar interactome.
Author response: Reliable protein–protein docking with AlphaFold, Rosetta, and replica exchange
2025-05-27
peer-review · Open access · Senior author
Can We Extract Physics-like Energies from Generative Protein Diffusion Models?
bioRxiv (Cold Spring Harbor Laboratory) · 2025-11-29 · 2 citations
preprint · Open access · Senior author · Corresponding author
Diffusion models have emerged as the state-of-the-art method in generative artificial intelligence (AI) and have shown great success in image synthesis, video generation, molecular design, and protein structure prediction. For biophysical problems, such as protein folding and association, a fundamental question in diffusion-based methods is how their learned functions correspond to thermodynamics. In this paper, we study diffusion models through the lens of theoretical biophysics, analyzing their underlying formulation of potentials and exploring their applications in scoring protein interactions. We develop simple theories rooted in statistical physics that relate thermodynamic potentials to the negative log of the probability of observing a system in a particular state. We include dimensional analysis of diffusion model equations and a table mapping AI and physics jargon. We then test a diffusion model’s ability to capture learned energies as negative log-likelihood values, $-\log p_0(x_0)$, by integrating over the diffusion-generated path or a probability flow path. We test these integrals on a simple 1D Gaussian mixture diffusion model and a protein-docking diffusion model, DFMDock. In the 1D case, we find that integration over both diffusion and flow paths can accurately recover ground truth probabilities. When we extract the learned docking energies for cases where DFMDock succeeds, we observe energy funnels with the minimum energy near the experimental docked structure, like those we observe with Rosetta, an empirically tuned physics-based biomolecular modeling suite. The learned energy performs comparably or outperforms Rosetta interface energy in 6 out of 25 cases at ranking the correctness of docked poses. These data show that we can extract a relevant learned energy function from a diffusion model and compare it to physical energy functions.
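The link between energies and negative log probabilities mentioned in the abstract above can be made concrete with the textbook Boltzmann relation. The worked form below is an assumed illustration of that standard relation, not a derivation taken from the paper. The symbols $E$, $Z$, $k_B$, and $T$ are the usual energy, partition function, Boltzmann constant, and temperature, none of which the abstract names explicitly.

```latex
% Boltzmann distribution over states x
p(x) = \frac{1}{Z}\, e^{-E(x)/k_B T}

% Solving for the energy gives a negative log probability plus a constant
E(x) = -k_B T \,\ln p(x) \;-\; k_B T \,\ln Z

% The constant cancels in energy differences between two states
E(x_1) - E(x_2) = -k_B T \,\ln \frac{p(x_1)}{p(x_2)}
```

Because the $\ln Z$ term is the same for every state, ranking states by $-\log p(x)$ gives the same order as ranking them by energy, which is why a learned log-likelihood can be compared with a physics-based score when ranking docked poses.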
Responsible Biodesign Workshop: AI, Protein Design, and the Biosecurity Landscape – Recommended Actions
2025-06-04
preprint · Open access
This report presents Recommended Actions from the January 2025 Responsible Biodesign Workshop, which convened leading experts across AI-enabled biomolecular design and biosecurity policy. Building on existing community commitments for the Responsible Development of AI for Protein Design, the Recommended Actions aim to guide scientists, policy practitioners, and funding bodies in ensuring safe and beneficial development of AI-enabled biomolecular design tools. The Recommended Actions focus on advancing AI-Resilient nucleic acid synthesis security screening, assessing the risk-benefit landscape of biomolecular design capabilities, and building fora for sustained engagement between scientists and policy practitioners.

Recent grants

Directed Biomineralization: Designing Peptides to Control Crystal Nucleation and Growth
NSF · $360k · 2015–2019
EAGER: A New Model for Undergraduate Training: a Virtual Community of Researchers in Computational Biomolecular Structure & Design
NSF · $224k · 2015–2017
Technologies to predict and probe glycosyl transfer
NIH · $1.2M · 2018–2022
Prediction of the Structures of Protein Complexes
NIH · $3.9M · 2021–2027
REU Site: A Cyberlinked Program in Computational Biomolecular Structure & Design
NSF · $427k · 2017–2020

Frequent coauthors

F Dobson
Vanderbilt University
211 shared
Acacia Grove
211 shared
Leicestershire Museums
Natural History Museum
211 shared
P Crittenden
211 shared
M. R. D. Seaward
211 shared
J. W. Sheard
University of Saskatchewan
186 shared
O. W. Purvis
University of Exeter
161 shared
New Mills
University of Bradford
136 shared

Education

Ph.D., Chemical Engineering
University of Texas at Austin
2000
B.S.E., Chemical Engineering
University of Michigan
1994

Awards & honors

Johns Hopkins University Provost’s Discovery Award
National Institutes of Health (NIH) K01 Mentored Quantitativ…
NSF Career Award
Beckman Young Investigator Award
College of Fellows, American Institute for Medical and Biolo…
