
Yingkai Zhang
Professor of Chemistry · New York University · Chemistry
Active 1993–2026
About
Yingkai Zhang is a Professor of Chemistry at New York University. He holds a B.S. in Chemistry from Nanjing University and a Ph.D. in Chemistry from Duke University. His postdoctoral research was conducted at the Howard Hughes Medical Institute, University of California at San Diego. His research interests include computational chemical biology, integrated molecular modeling, machine learning, rational modulator design to target protein-protein interactions, and computer simulations of biomolecular systems. Zhang has received several honors, including the Whitehead Fellowship for Junior Faculty in Biomedical and Biological Sciences, the James D. Watson Young Investigator Award, and the National Science Foundation Career Award. His work focuses on advancing the understanding of molecular interactions and drug design through innovative computational approaches.
Research topics
- Biology
- Computer Science
- Machine Learning
- Medicine
- Artificial Intelligence
- Bioinformatics
- Pathology
- Neuroscience
- Biochemistry
- Chemistry
- Computational biology
- Biophysics
- Cell biology
- Cancer research
Selected publications
Fine-Tuning DiffDock-L for Allosteric Kinase Docking
Journal of Chemical Information and Modeling · 2026-03-04
Article · Open access · Senior author · Corresponding author
Allosteric kinase inhibitors are an important modality for overcoming resistance and achieving selectivity, yet most structure-based docking and deep generative models are trained predominantly on orthosteric protein-ligand complexes. As a result, current methods often misplace allosteric kinase ligands into the adenosine triphosphate (ATP)-binding site and fail to recover the correct binding mode. Here we curate AlloSet, a kinome-wide, time-split data set of kinase-ligand complexes annotated by binding mode, to systematically evaluate and fine-tune the diffusion-based docking model DiffDock-L for allosteric pose prediction. We explore several fine-tuning strategies, including increased dropout, freezing of torsion parameters with translation/rotation-only fine-tuning, and molecular dynamics-based supersampling of receptor conformations and ligand poses. The resulting DiffDock-L-Allo model is found to markedly improve pose-recovery metrics for Type III/IV allosteric binders while preserving the performance on ATP-site ligands. Binding-mode-resolved evaluations and comparisons with cofolding models such as AlphaFold3 and Boltz-2 highlight how targeted retraining reshapes the generative model's sampling distribution, offering practical guidance for adapting AI-driven docking to challenging, low-data binding modes in kinase structure-based drug design.
ProMol_Func: A Structure-Free Deep Learning Model for Virtual Screening
JACS Au · 2026-02-24
Article · Open access · Senior author · Corresponding author
In computer-aided drug discovery, structure-based drug design models are computationally intensive and rely on protein structures, limiting their scalability and generalization. Additionally, many existing models suffer from inflated false-positive rates due to the scarcity of negative binding data for training. To overcome these challenges, we present ProMol_Func, a structure-free deep learning framework that integrates graph-based encodings of small molecules with protein function embeddings derived solely from amino acid sequences. By augmenting the training data set with both experimentally validated inactives and randomly selected decoys, ProMol_Func improves screening power and generalization. The model achieves state-of-the-art performance on the challenging LIT-PCBA (Library of Integrated Targeted-Panel of Cell-Based Assays) benchmark, with an enrichment factor (EF1%) of 10.9, demonstrating robust screening power in realistic assay settings. Furthermore, in a zero-shot prospective application to E. coli DnaK, a protein chaperone without actives in the training set, ProMol_Func successfully identified compounds that inhibit its ATPase activity or alter the protein’s thermal stability, validating the potential of ProMol_Func for discovering binders toward novel targets. These results position ProMol_Func as an efficient and scalable alternative to traditional structure-dependent approaches in early-stage hit discovery.
ACS Chemical Biology · 2026-05-01
Article · Open access · Senior author
… pockets in future optimization efforts. Together, these results demonstrate the potential for target-specific deep learning approaches to guide the rapid screening and discovery of new inhibitor leads or drug scaffolds.
Zenodo (CERN European Organization for Nuclear Research) · 2026-03-21
Dataset · Open access · Senior author
This repository contains the data associated with: "End-to-end Molecular Structure Elucidation from Multimodal NMR Spectra Images using Vision Transformers".
Correction: Structure-based design of an aromatic helical foldamer–protein interface
Chemical Science · 2026-01-01
Article · Open access
Correction for ‘Structure-based design of an aromatic helical foldamer–protein interface’ by Lingfei Wang et al., Chem. Sci., 2025, 16, 12385–12396, https://doi.org/10.1039/D5SC01826A.
Bioactivity Deep Learning for Complex Structure-Free Compound-Protein Interaction Prediction
Journal of Chemical Information and Modeling · 2025-09-16 · 6 citations
Article · Open access · Senior author · Corresponding author
Protein–ligand binding affinity assessment plays a pivotal role in virtual drug screening, yet conventional data-driven approaches rely heavily on limited protein–ligand crystal structures. Structure-free compound-protein interaction (CPI) methods have emerged as competitive alternatives, leveraging extensive bioactivity data to serve as more robust scoring functions. However, these methods often overlook two critical challenges that affect data efficiency and modeling accuracy: the heterogeneity of bioactivity data due to differences in bioassay measurements and the presence of activity cliffs (ACs: small chemical modifications that lead to significant changes in bioactivity), which have not been thoroughly investigated in CPI modeling. To address these challenges, we present CPI2M, a large-scale CPI benchmark data set containing approximately 2 million bioactivity data points across four activity types (Ki, Kd, EC50, and IC50) with AC annotations. Moreover, we developed GGAP-CPI, a complex structure-free deep learning model trained by integrated bioactivity learning and designed to mitigate the impact of ACs on CPI prediction through advanced protein representation modeling. Our comprehensive evaluation demonstrates that GGAP-CPI outperforms 12 target-specific and 7 general CPI baselines across 4 scenarios (general CPI prediction, rare protein prediction, transfer learning, and virtual screening) on 7 benchmarks (CPI2M, MoleculeACE, CASF-2016, MerckFEP, DUD-E, DEKOIS-v2, and LIT-PCBA). Furthermore, GGAP-CPI is able to not only deliver stable bioactivity predictions but also measure prediction uncertainty and enrich binding pocket residues and interactions, underscoring its applicability to real-world bioactivity assessments and virtual drug screening.
Can Deep Learning Blind Docking Methods be Used to Predict Allosteric Compounds?
Journal of Chemical Information and Modeling · 2025-04-01 · 8 citations
Article · Open access · Senior author · Corresponding author
Allosteric compounds offer an alternative mode of inhibition to orthosteric compounds with opportunities for selectivity and noncompetition. Structure-based drug design (SBDD) of allosteric compounds introduces complications compared to their orthosteric counterparts; multiple binding sites of interest are considered, and often allosteric binding is only observed in particular protein conformations. Blind docking methods show potential in virtual screening of allosteric ligands, and deep learning methods, such as DiffDock, achieve state-of-the-art performance on protein-ligand complex prediction benchmarks compared to traditional docking methods such as Vina and Lin_F9. To this aim, we explore the utility of a data-driven platform called the minimum distance matrix representation (MDMR) to retrospectively predict recently discovered allosteric inhibitors complexed with Cyclin-Dependent Kinase (CDK) 2. In contrast to other protein complex representations, it uses the minimum residue-residue (or residue-ligand) distance as a feature that prioritizes the formation of interactions. Analysis of this representation highlights the variety of protein conformations and ligand binding modes, and we identify an intermediate protein conformation that other heuristic-based kinase conformation classification methods do not distinguish. Next, we design self- and cross-docking benchmarks to assess whether docking methods can predict both orthosteric and allosteric binding modes and if prospective success is conditional on the selection of the protein receptor conformation, respectively. We find that a combined method, DiffDock followed by Lin_F9 Local Re-Docking (DiffDock + LRD), can predict both orthosteric and allosteric binding modes, and the intermediate conformation must be selected to predict the allosteric pose.
In summary, this work highlights the value of a data-driven method to explore protein conformations and ligand binding modes and outlines the challenges of SBDD of allosteric compounds.
Topological deep learning for enhancing peptide-protein complex prediction
Communications Chemistry · 2025-11-12
Article · Open access · Senior author
Peptide-protein interactions are essential to biological processes and drug discovery, but selecting high-quality models from predicted complexes remains challenging due to high false positive rates (FPR). Here we introduce TopoDockQ, a topological deep learning model leveraging persistent combinatorial Laplacian (PCL) features to predict DockQ scores (p-DockQ) for accurately evaluating peptide-protein interface quality, aimed at enhancing precision and mitigating FPR in model selection. Compared to AlphaFold2's built-in confidence score, TopoDockQ reduces false positives by at least 42% and increases precision by 6.7% across five evaluation datasets filtered to ≤70% peptide-protein sequence identity, while maintaining relatively high recall and F1 scores. To support flexible peptide design, we introduce ResidueX, a workflow incorporating non-canonical amino acids (ncAA) into peptide scaffolds. Together, TopoDockQ and ResidueX advance peptide-protein modeling by refining confidence scoring and supporting ncAA incorporation, enabling precise, customizable design and accelerating next-generation peptide therapeutics development.
Unaligned RGB Guided Hyperspectral Image Super-Resolution with Spatial-Spectral Concordance
International Journal of Computer Vision · 2025-06-17 · 3 citations
Article · Open access · First author · Corresponding author
Recent grants
Computational modulator design and machine learning to target protein-protein interactions
NIH · $4.6M · 2018–2028
Computational Studies of Histone Modifications
NIH · $2.4M · 2007–2018
Computational inhibitor design to target protein-protein interactions
NIH · $1.6M · 2016–2021
NIH · $395k · 2014
CAREER: Theoretical Investigation of Metalloenzymes
NSF · $500k · 2005–2010
Frequent coauthors
- Saba Mottaghinia · Centre International de Recherche en Infectiologie · 48 shared
- Lucie Etienne · Centre International de Recherche en Infectiologie · 40 shared
- Shenglong Wang · Chinese Academy of Sciences · 34 shared
- Xuhang Dai · New York University · 34 shared
- Angel D'Oliviera · University of Delaware · 34 shared
- Jeffrey S. Mugridge · University of Delaware · 34 shared
- Weitao Yang · Duke University · 28 shared
- Xuben Hou · Shandong University · 26 shared
Awards & honors
- Whitehead Fellowship for Junior Faculty in Biomedical and Bi…
- James D. Watson Young Investigator Award (2005)
- National Science Foundation Career Award (2005)
- Fellow of the American Chemical Society