
Alexander Tropsha
KH Lee Distinguished Professor and Associate Dean · Verified · University of North Carolina at Chapel Hill · Toxicology
Active 1991–2025
About
Alexander Tropsha is the KH Lee Distinguished Professor and Associate Dean at the University of North Carolina Eshelman School of Pharmacy. His major research area is Biomolecular Informatics, which involves understanding the relationships between molecular structures—both organic and macromolecular—and their properties, such as activity or function. He focuses on building validated and predictive quantitative models that relate molecular structure to biological function, utilizing statistical and machine learning approaches. These models are exploited to make verifiable predictions about the putative functions of untested molecules.
Research topics
- Computer Science
- Machine Learning
- Medicine
- Pharmacology
- Bioinformatics
- Virology
- Data science
- Biology
Selected publications
Protein–ligand data at scale to support machine learning
Nature Reviews Chemistry · 2025-07-23 · 10 citations
review
Activity prediction and identification of mis-annotated chemical compounds using extreme descriptors
UNC Libraries · 2025-10-25
article · Open access · 1st author · Corresponding
Data pre-processing that includes removal of descriptors with low variance is a standard first step in quantitative structure–activity relationship modeling. In this paper, we study low-variance descriptors and show that some of them contain significant amounts of useful information. In particular, we define the notion of extreme descriptors (those variables that have the same value for almost all compounds and only a few values that are different from the common median). We show that extreme descriptors can be helpful for activity prediction in a standard binary classification setting. Moreover, we demonstrate using two case studies (M2 muscarinic receptors and skin sensitization) that extreme descriptors can be used for the identification of possibly mislabeled compounds. Because of these previously unknown, but important, properties, extreme descriptors should be considered in quantitative structure–activity relationship modeling studies. Copyright © 2016 John Wiley & Sons, Ltd.
In this paper, the authors explore low-variance (extreme) descriptors and show that some of them contain a significant amount of useful information. Furthermore, the authors demonstrate that extreme descriptors can be helpful for activity prediction in a standard binary classification setting and can be used for the identification of possibly mislabeled compounds.
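The definition of extreme descriptors in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the `max_deviating_fraction` cutoff, and the toy data are all assumptions made for illustration, since the abstract does not say how "almost all" is quantified.

```python
from statistics import median


def find_extreme_descriptors(rows, max_deviating_fraction=0.05):
    """Return indices of descriptor columns that look "extreme".

    rows: list of equal-length lists (compounds x descriptors).
    max_deviating_fraction: hypothetical cutoff for "only a few"
    values that differ from the column median.
    """
    n_compounds = len(rows)
    extreme = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        common = median(column)
        deviating = sum(1 for value in column if value != common)
        # A constant column has no deviating values and is not extreme;
        # a column with many deviating values is an ordinary descriptor.
        if 0 < deviating <= max_deviating_fraction * n_compounds:
            extreme.append(j)
    return extreme


# Toy data: column 0 is constant, column 1 differs from its median
# for two of 40 compounds, column 2 varies for every compound.
toy_rows = [[0, 1 if i in (3, 17) else 0, i] for i in range(40)]
extreme_columns = find_extreme_descriptors(toy_rows, max_deviating_fraction=0.1)
print(extreme_columns)
```

A standard low-variance filter would likely discard column 1 together with column 0; the sketch keeps them apart so that the rare deviating values can be inspected.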
Challenges of broad-spectrum antiviral drug discovery and development for emerging pathogens
Drug Discovery Today · 2025-09-25
review · Senior author · Corresponding
Conserved Filovirus Proteins as Targets of Broad-Spectrum Antivirals
bioRxiv (Cold Spring Harbor Laboratory) · 2025-09-28 · 1 citation
preprint · Open access
Filoviruses are enveloped, non-segmented, negative-strand RNA viruses belonging to the Filoviridae family, which includes five genera: Ebolavirus, Marburgvirus, Cuevavirus, Striavirus, and Thamnovirus. Members of this family cause severe and often fatal hemorrhagic fevers in humans and non-human primates, with high mortality rates. To date, only two filoviruses, Ebola virus (EBOV) and Marburg virus (MARV), are known to infect humans and are listed as priority pathogens by the World Health Organization due to their potential for re-emergence and the current lack of effective vaccines and antiviral treatments. In this study, we identify and characterize conserved binding sites within key filoviral proteins to support the development of broad-spectrum, direct-acting antiviral agents. We validated the significance of these conserved regions for drug discovery using existing experimental data. Our analysis revealed notably high sequence similarity among proteins from filoviruses capable of infecting humans (EBOV, TAFV, BDBV, SUDV, MARV, and RAVV) compared to those from non-zoonotic species, with the highest conservation observed in the L and VP40 proteins, both critical for viral genome transcription and replication. Furthermore, we compiled and analyzed available experimental data on known antiviral compounds targeting these proteins, identifying several agents with cross-filovirus activity, including Galidesivir, Remdesivir, and Favipiravir. The integrated approach described here, which combines sequence and structural conservation analysis with chemical structure and antiviral activity data, demonstrates a strategy that could be extended to the development of broad-spectrum therapeutics across multiple viral families.
Highlights
- Conserved filovirus sites targeted for broad-spectrum antivirals.
- Structural modeling identifies key antiviral binding sites.
- Viral internal proteins are crucial targets for inhibition.
- Remdesivir validates conserved polymerase as a druggable target.
- Study highlights need for pan-filovirus drug screening.
UNC Libraries · 2025-09-05
article · Open access
A study by Luechtefeld et al. (2018) described the development of a suite of in silico models, termed read-across structure activity relationships (RASAR), that have “balanced accuracies in the 80%–95% range across 9 health hazards with no constraints on tested compounds.” This work can be considered groundbreaking for apparently exceeding the most optimistic expectations for quantitative structure-activity relationship (QSAR) modeling accuracy, especially without restrictions on model applicability domains. Predictive in silico models have been facilitating replacement and reduction of animal testing in toxicology (Wold et al., 1985); however, it is also recognized that “these methods are not always reliable and must be assessed on their individual merit for the compound and context in question” (Cronin et al., 2017). It is widely acknowledged that QSAR and other in silico models should be subject to rigorous testing and validation (Dearden et al., 2009; Fourches et al., 2010, 2015; OECD, 2014; Tropsha, 2010). Thus, we were curious to understand what technological advances have enabled the RASAR models to achieve accuracy that, for the first time in the history of QSAR, was “outperforming animal tests reproducibility” (Luechtefeld et al., 2018).
Proteins Structure Function and Bioinformatics · 2025-10-20 · 12 citations
article · Open access
In the CASP16 experiment, our team employed hybrid computational strategies to predict both protein-protein and protein-ligand complex structures. For protein-protein docking, we combined physics-based sampling (using ClusPro FFT docking and molecular dynamics) with AlphaFold (AF)-based sampling, followed by AF-based refinement. Our method produced numerous high-accuracy complex models, including cases where AF alone failed, underscoring the critical role of physics-based sampling alongside deep learning-based refinement. For protein-ligand docking, we integrated the ClusPro LigTBM template-based approach with a machine learning-based confidence model for rescoring. The method preserves conserved interaction fragments derived from homologous complexes, followed by local resampling using physics-based sampling and a diffusion model. Our template-based strategy achieved a mean lDDT-PLI of 0.69 across 233 targets, which was highly competitive. These results demonstrate that combining physics-based modeling with AI-driven refinement can significantly enhance the accuracy of both protein-protein and protein-ligand structure predictions.
In silico Drug Discovery: Bridging the Gaps in Preclinical Translation
Drug Discovery Today · 2025-12-03 · 2 citations
article · Open access
UNC Libraries · 2025-06-25
article · Open access
Machine Learning Models and a Web Portal for Predicting Cytochrome P450 Activity
ChemRxiv · 2025-10-15
article
The cytochrome P450 (CYP) family of enzymes plays an integral role in drug metabolism and excretion. This application note describes the development of a novel computational CYP profiler (CYP-Pro) as a drug development tool. To enable new model development, we integrated and curated what is, to the best of our knowledge, the largest such dataset, comprising 26,587 entries, including both inhibitors and substrates of CYP2D6, CYP3A4, and CYP2C9. We have built and externally validated Quantitative Structure-Activity Relationship (QSAR) models that can accurately predict whether molecules of interest are expected to be inhibitors or substrates. The models were assessed mainly by Positive Predictive Value (PPV), which ranged between 0.14 and 0.92. CYP-Pro showed the highest accuracy in predicting compounds selectively metabolized by CYP3A4 alone or by CYP2D6 and CYP2C9 without CYP3A4 involvement. All models are incorporated into the previously developed PhaKinPro portal (https://phakinpro.mml.unc.edu). CYP-Pro is unique in that it provides separate models for predicting CYP inhibitors vs. substrates, prioritizes high PPV as a pragmatic metric of accuracy to support the experimental testing of a small number of predicted substrates and inhibitors, enhances interpretability with fragment maps, and ensures reliability through strict applicability domain control. We expect that this new tool will aid researchers in early identification of compounds with favorable metabolic profiles, reducing the risks of drug-drug interactions and improving the efficiency of drug development efforts.
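For readers unfamiliar with the metric named in the abstract above, PPV is the fraction of predicted positives that are truly positive, TP / (TP + FP). The following is a minimal sketch of that standard formula only; the function name and toy labels are hypothetical and are not taken from CYP-Pro.

```python
def positive_predictive_value(y_true, y_pred):
    """Compute PPV = TP / (TP + FP) for binary labels (1 = positive).

    Returns None when the model predicts no positives, because the
    ratio is undefined in that case.
    """
    pairs = list(zip(y_true, y_pred))
    true_positives = sum(1 for truth, pred in pairs if truth == 1 and pred == 1)
    false_positives = sum(1 for truth, pred in pairs if truth == 0 and pred == 1)
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        return None
    return true_positives / predicted_positives


# Toy example: three compounds are predicted positive, two correctly.
toy_truth = [1, 0, 1, 1, 0]
toy_predictions = [1, 1, 1, 0, 0]
toy_ppv = positive_predictive_value(toy_truth, toy_predictions)
print(toy_ppv)
```

A high PPV matters when only a handful of predicted hits can be tested experimentally, which is the use case the abstract describes.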
Nucleic Acids Research · 2025-12-12
article · Open access · Senior author
Drug databases typically aim to provide reference information on medications and their uses but often lack strict definitions of the terms drug (e.g., approved or a clinical candidate) or disease, and do not focus on any specific context of use. The recent emergence of biomedical knowledge graphs, which integrate diverse biomedical data into a contiguous, harmonized knowledge network, has enabled innovation in drug repurposing (identification of novel uses of existing drugs). This objective has created a new set of requirements and challenges for drug databases to be used for generating high-confidence, testable drug repurposing hypotheses. To address this challenge, we have developed MeDIC as an open, foundational database built from government regulatory sources only, which comprises highly curated lists of drugs (including combination therapies), diseases, indications (i.e., drug approvals to treat specific diseases), contraindications, and additional metadata. MeDIC allows for easy maintainability, open-source adaptability, and ongoing updates concordant with updates of primary sources. To facilitate downstream use, MeDIC is provided in a tabulated format, and each drug, disease, indication, or contraindication entry is mapped to multiple ontologies. We offer MeDIC as a web-based, freely accessible (https://medic.renci.org), downloadable (including lists and source code), searchable, and machine learning-friendly resource for patients, providers, and researchers.
Recent grants
NIH · $3.0M · 2013
ARAGORN: Autonomous Relay Agent for Generation Of Ranked Networks
NIH · $4.7M · 2020–2024
Drug Repurposing for Cancer Therapy: From Man to Molecules to Man
NIH · $1.2M · 2016–2020
NSF · $869k · 2012–2016
NIH · $856k · 2016
Frequent coauthors
- 227 shared
Eugene Muratov
- 129 shared
Denis Fourches
North Carolina State University
- 101 shared
Vinícius M. Alves
- 84 shared
Alexander Golbraikh
University of North Carolina at Chapel Hill
- 64 shared
Igor V. Tetko
- 58 shared
Stephen J. Capuzzi
University of North Carolina at Chapel Hill
- 51 shared
Alexandre Varnek
Centre National de la Recherche Scientifique
- 49 shared
Hao Zhu
Obstetrics and Gynecology Hospital of Fudan University
Education
- 1993
Ph.D., Toxicology
University of North Carolina at Chapel Hill
- 1989
M.S., Toxicology
University of North Carolina at Chapel Hill
- 1984
B.S., Chemistry
University of Belgrade