Alberto Perez

William R. Kenan, Jr. Term Associate Professor & Associate Director, Quantum Theory Project

University of Florida · Chemistry

Active 1994–2026

h-index: 59

Citations: 14.7k

Papers: 188 (67 in last 5 years)

Funding: $650k (1 active grant)

About

Alberto Perez is the William R. Kenan, Jr. Term Associate Professor and Associate Director of the Quantum Theory Project within the Department of Chemistry at the University of Florida. His research focuses on developing and applying computational chemistry tools to understand how biomolecules work and their role in disease. Using a combination of AI and physics-based methods, he is building drug discovery pipelines that target peptides and DNA aptamers. Perez is also dedicated to scientific communication, engaging in VR- and AI-driven projects to collaborate with high school students and help them experience research firsthand. His educational background includes a PhD in Computational and Theoretical Chemistry from the University of Barcelona, postdoctoral fellowships at UCSF, Stony Brook University, IRBB, and the Barcelona Supercomputing Center, and a B.S. in Chemistry from the University of Barcelona. He has received several awards, including the NSF CAREER Award in 2022, the ACS OpenEye Cadence Outstanding Junior Faculty Award in 2023, and a Japan Society for the Promotion of Science Long-Term Fellowship in 2023. Perez actively participates in organizing scientific symposia and serves on editorial boards, contributing to the advancement of computational chemistry and biophysics.

Research topics

Computer Science
Computational biology
Biology
Data Mining
Chemistry
Biochemistry
Data science
Combinatorial chemistry
Bioinformatics
Physics
Engineering
Computational chemistry
Mathematics
Algorithm

Selected publications

Efficient exploration of peptide libraries using active learning with AlphaFold-based screening
bioRxiv (Cold Spring Harbor Laboratory) · 2026-04-18
article · Open access · Senior author
We previously showed that AlphaFold2 can be used to screen for peptide-binding epitopes targeting the extraterminal (ET) domain of Bromodomain and Extraterminal (BET) proteins from candidate protein partners identified in pull-down experiments. However, such approaches require large numbers of AlphaFold2 calculations, making exhaustive screening impractical for larger datasets, such as viral proteomes that may target the ET domain. In many cases, identifying a substantial fraction of binders, even without exhaustive coverage, would already provide valuable biological insight into these interaction networks. Here, we show that an active learning strategy based on Thompson sampling (TS) can efficiently explore peptide sequence space. Using a library derived from BRD3 pull-down experiments, TS recovers 50% of all binders using 15% of the queries required by exhaustive sampling (a 3.3-fold improvement over random sampling). Moreover, TS consistently identifies experimentally known binding epitopes with substantially fewer queries. Because the approach relies only on binary labels, it is readily transferable to other protein-peptide systems where AF-based binding classification is applicable, as well as to peptide-property predictors for properties such as solubility or aggregation propensity.
Beyond Classical Force Fields: Physics‐Driven Assessment of the Grappa Machine‐Learned Force Field on the FoldBind Dataset
ChemPhysChem · 2026-04-14
article · Senior author · Corresponding
Physics-based approaches rely on accurate force fields and efficient sampling to provide mechanistic insight into biomolecular systems. Recent AI-driven advances are transforming this landscape, replacing empirically fitted force fields, built from manually curated atom types, with machine-learned models. Achieving interoperability between these new force fields and established sampling strategies, and validating them on challenging benchmark sets that extend far beyond near-native states, is essential for progress in the field. To this end, we introduce the FoldBind benchmark set, a collection of 18 systems encompassing 14 protein-folding cases and 4 peptide-protein complexes that undergo folding upon binding. This suite expands existing validation efforts by probing both conformational transitions and binding-induced folding, offering a rigorous test for sampling methods and force-field accuracy alike. To explore these systems, we employ the Modeling Employing Limited Data (MELD) framework as the sampling engine. MELD accelerates conformational exploration by integrating ambiguous or noisy physical restraints (for example, the general expectation that proteins form hydrophobic cores) within a Bayesian inference formalism. By balancing exploration (broad conformational search) and exploitation (stabilization of structures consistent with physics and data), MELD efficiently accesses native-like states that are otherwise inaccessible to conventional molecular dynamics. Under identical data conditions, the quality of the force field determines which states are stabilized and whether the correct native basin emerges. Furthermore, the ability of a force field to consistently stabilize the native basin among multiple data-compatible states provides an additional measure of its physical realism. Together, this FoldBind benchmark, along with the information used in MELD, can be used to test and distinguish future force field development efforts.
The NMR Exchange Format (NEF): Specification and Applications
bioRxiv (Cold Spring Harbor Laboratory) · 2026-04-24
article · Open access
The NMR Exchange Format (NEF) is a community-driven standard for representing NMR experimental data in a consistent, interoperable, and machine-readable form. Built on the STAR syntax, NEF provides a structured framework for storing and exchanging chemical shifts, peak lists, various types of structural restraints, and related metadata, thus allowing for data exchange across software platforms. By enabling direct, lossless transfer of information, NEF simplifies multi-software workflows, improves reproducibility, and supports FAIR (Findable, Accessible, Interoperable, Reusable) data principles. We describe the NEF specification, its current implementation across commonly used NMR software packages, and its application in areas including biomolecular structure determination, metabolomics, and ligand screening. Testing demonstrates that NEF can be used to exchange complete datasets between programs without loss of information or functionality. We also outline recent developments and future directions, such as inclusion of NMR relaxation data and support for non-standard residue topologies. NEF's growing adoption highlights its potential as a unifying standard for NMR data, enabling more efficient, transparent and collaborative research.
MELD in Action: Harnessing Data to Accelerate Molecular Dynamics
Journal of Chemical Information and Modeling · 2025-02-02 · 6 citations
review · Open access · Senior author · Corresponding
We review MELD, an accelerator of Molecular Dynamics simulations of biomolecules. MELD (Modeling Employing Limited Data) integrates molecular dynamics (MD) with a variety of types of structural information through Bayesian inference, generating ensembles of protein and DNA structures having proper Boltzmann populations. MELD minimizes the computational sampling of irrelevant regions of phase space by applying energetic penalties to areas that conflict with the available data. MELD is effective in refining protein structures using NMR or cryo-EM data or predicting protein-ligand binding poses. As a plugin for OpenMM, MELD is interoperable with other enhanced sampling methods, offering a versatile tool for structural determination in computational chemistry and biophysics.
BPS2025 - A novel computational pipeline to identifying novel binding epitopes to BET proteins
Biophysical Journal · 2025-02-01
article · Senior author
The need to implement FAIR principles in biomolecular simulations
Nature Methods · 2025-04-01 · 53 citations
article · Open access
Hierarchical Extended Linkage Method (HELM)’s Deep Dive into Hybrid Clustering Strategies
Journal of Chemical Information and Modeling · 2025-06-02 · 9 citations
article
Clustering remains a key tool in the analysis of molecular dynamics (MD) simulations, from the preparation of kinetic models to the study of mechanistic pathways and structural determination. It is no surprise then that multiple algorithms are currently used in the MD community, with k-means and hierarchical approaches being arguably the two most popular approaches. The former is very attractive from a purely computational point of view, demanding minimal memory and time resources, but at the price of being able to partition the data in very restrictive ways. Hierarchical strategies, on the other hand, can generate arbitrary partitions, but with steep memory and time requirements due to their need to build a pairwise distance matrix for all the considered conformations/frames. Here we propose a new hybrid paradigm, the hierarchical extended linkage method (HELM), that retains the efficiency of k-means while incorporating the flexibility of hierarchical methods. The key ingredient is the use of n-ary difference functions as a way to stabilize the k-means results and efficiently build the hierarchy of subsets. We showcase the applicability of this strategy over protein-DNA and protein folding studies, including the complete analysis of simulations with over 1.5 million frames. HELM is freely available in our MDANCE clustering package.
MDZip: Neural Compression of Molecular Dynamics Trajectories for Scalable Storage and Ensemble Reconstruction
The Journal of Physical Chemistry B · 2025-10-31
article · Senior author · Corresponding
The size of molecular dynamics (MD) trajectories remains a major obstacle for data sharing, long-term storage, and ensemble analysis at scale. Existing solutions often rely on frame subsampling or reduced atom representations, which limit the utility of shared data sets. Here, we present MDZip, a neural compression framework based on convolutional autoencoders trained per system to reconstruct atomic trajectories with high geometric fidelity from compact latent representations. MDZip achieves over 95% reduction in storage size across a diverse benchmark of proteins, protein-peptide complexes, and nucleic acids. Despite operating in a physics-agnostic manner, the reconstructed trajectories accurately preserve ensemble-level features, including RMSD fluctuations, pairwise distance distributions, radius of gyration, and projections onto principal and time-lagged independent components. A residual (skip-connected) autoencoder variant consistently improves reconstruction accuracy and reduces outliers. While local structural deviations can impair energetic fidelity, short energy minimization partially recovers physically reasonable conformations. This framework enables customizable compression-accuracy trade-offs and supports a modular workflow for sharing latent representations, decoder models, and reconstruction protocols. MDZip offers a scalable solution to current storage limitations, facilitating broader dissemination of MD data without sacrificing essential dynamical information.
BPS2025 - Exploring the binding selectivity of BET ET domains through advanced molecular dynamics sampling
Biophysical Journal · 2025-02-01
article · Senior author
Hybrid AI/physics pipeline for miniprotein binder prioritization: application to the BRD3 ET domain
Chemical Communications · 2025-01-01 · 2 citations
article · Open access · Senior author · Corresponding
AI-based protein design can rapidly generate thousands of candidate binders, but most fail to fold or bind productively, creating a critical need for robust prioritization. We present a generalizable hybrid pipeline that integrates deep-learning design and physics-based simulations to filter large libraries down to a handful of high-confidence candidates.

Recent grants

CAREER: Enhanced Sampling Methods to Characterize Nucleic Acid Structure, Recognition Mechanisms and Function
NSF · $650k · 2022–2027

Frequent coauthors

Modesto Orozco
Universitat de Barcelona
249 shared
Jiří Šponer
Czech Academy of Sciences, Institute of Biophysics
82 shared
Richard Lavery
Institut de Biologie et de Chimie des Protéines
80 shared
Charles A. Laughton
University of Nottingham
62 shared
Thomas E. Cheatham
University of Utah
57 shared
Filip Lankaš
University of Chemistry and Technology, Prague
49 shared
Ken A. Dill
Stony Brook University
48 shared
K. Zakrzewska
University of Florence
48 shared

Education

Ph.D., Physical Chemistry
University of Barcelona
2008
B.S., Chemistry
University of Barcelona
2002

Awards & honors

Japanese Society for the Promotion of Science (JSPS) Long Te…
ACS OpenEye Cadence Outstanding Junior Faculty Award / ACS C…
NSF CAREER Award (2022)
European Molecular Biology Organization (EMBO) postdoctoral…
Juan de la Cierva Postdoctoral fellowship (2009)
