
Connor W. Coley
Class of 1957 Career Development Professor; Associate Professor of Chemical Engineering and of Electrical Engineering and Computer Science · Massachusetts Institute of Technology · Chemical Engineering
Active 2015–2026
About
Connor W. Coley is the Class of 1957 Career Development Professor and an Associate Professor of Chemical Engineering and of Electrical Engineering and Computer Science at MIT. His research develops and applies machine learning and other data-driven methods to accelerate chemical discovery and process design. As a faculty member in the Department of Chemical Engineering, he integrates these computational methods with experimental research.
Research topics
- Computer Science
- Artificial Intelligence
- Machine Learning
- Chemistry
- Cognitive science
- Data science
- Information Retrieval
- Data Mining
- World Wide Web
- Database
- Bioinformatics
- Nanotechnology
- Psychology
- Biology
Selected publications
Hydrolysis Reaction Rate Prediction Using Machine Learning: WaterDRoP
Environmental Science & Technology · 2026-04-21
Sustainable chemical design requires the ability to predict the degradation potential of proposed structures that have not yet been produced and for which no experimental data are available. Hydrolysis is a key process affecting contaminant fate, especially in aqueous and biological systems. This work develops WaterDRoP (Water Degradation Rate of Pollutants), a machine learning model that predicts the hydrolysis rate from chemical structure under environmentally relevant conditions (pH 7 and 25 °C). The two-stage model first classifies a compound as stable (half-life > 1 year) or unstable (half-life ≤ 1 year) and then estimates the numeric half-life of unstable compounds. Each stage is a pretrained neural network fine-tuned on 808 experimental hydrolysis rates collected from reports and databases. WaterDRoP compares favorably with existing hydrolysis rate models (EPI Suite, Hydrolysis QSAR, QSAR Toolbox) in applicability, stability classification (F1 score), and rate prediction for unstable compounds (RMSE, MAE, R²). Atom-level attribution scores from Shapley Additive Explanations (SHAP) analysis, which highlight the substructures the model identifies as most relevant to hydrolysis, were compared against hydrolysis mechanisms proposed in the literature. This in silico hydrolysis rate estimation tool and the curated training data set are openly available.
ChemRxiv · 2026-05-18
Open access
High-fidelity quantum chemical (QM) datasets that jointly resolve reaction thermochemistry, kinetics, and solvation at scale remain scarce, especially for radical chemistry. We introduce QuantumPioneer, an open-access, reaction-centered QM database and workflow for small organic molecules, focused on peroxyl-mediated hydrogen atom transfer (HAT) and the corresponding homolytic bond dissociation reactions. QuantumPioneer contains 348,258 species (2–21 heavy atoms), 167,237 validated HAT transition states (TS) with corresponding reaction energies and homolytic bond dissociation energies (BDEs), and over 100 million COSMO-RS solvation free energies (ΔG°solv) and enthalpies (ΔH°solv) across 295 solvents. The workflow uses ωB97X-D/def2-SVP geometries, DLPNO-CCSD(T)-F12d/def2-TZVP single-point energies, empirical thermochemical corrections, transition-state theory, and COSMO-RS BP-TZVPD-FINE solvation in a single high-throughput pipeline. Our benchmarks show reliable accuracy, with mean absolute errors relative to experimental data of 0.82 kcal/mol for gas-phase enthalpies of formation, 1.60 kcal/mol for C–H BDEs, 1.45 kcal/mol for HAT barriers, and 0.57 kcal/mol for ΔG°solv values. We demonstrate two predictive applications. First, a combined BDE and HAT-barrier model identifies experimentally observed oxidative degradation sites in drug-like molecules with a 91% top-5 hit rate and 80% site-level recall. Second, a QM-parameterized Abraham model enables rapid solvation energy estimates at near-COSMO-RS accuracy within its training domain, reproducing ΔG°solv and ΔH°solv
Computational design of functional random heteropolymers through atomistic simulations
PLoS ONE · 2026-03-18
Open access
Random heteropolymers (RHPs) are emerging single-chain nanoparticles with great potential for protein mimicry, yet a systematic understanding of how chemical composition and monomer structure govern their structure, dynamics, and hydration remains limited. Using atomistic molecular dynamics simulations, we examine how design parameters, including chain length, backbone architecture, charged monomer concentration, chain-level composition, and side-chain micropolarity, influence RHP assembly and hydration behavior. As chain length increases, methacrylate-based RHPs transition from rod-like to random-walk statistics and ultimately collapse into compact globules stabilized by hydrophobic collapse and methacrylate–poly(ethylene glycol) (PEG) interactions. The hydration of positively charged monomers follows the Hofmeister series. Interestingly, RHP dimerization arises from hydrophobic interactions and from interactions between PEG and positively charged monomers, not from interactions between oppositely charged groups. Alternative backbones such as acrylate and (meth)acrylamide display sequence-dependent compactness and dynamics, reflecting greater chemical sensitivity. PEG side-chain length strongly affects solubility and hydration, with shorter side chains making the overall chain more hydrophobic. We also show that branching-induced micropolarity modulates the local hydration patterns of hydrophobic residues. Overall, these results establish general molecular design principles for tuning the assembly and dynamics of RHPs through compositional and chemical control, providing a foundation for engineering synthetic polymers that mimic the compactness, hydration, and functional adaptability of proteins.
Macromolecules · 2026-02-05 · 1 citation
Synthetic random heteropolymers (RHPs) offer a versatile platform for mimicking protein-like functions through their sequence and structure ensembles, providing a cost-effective and scalable alternative to natural proteins. Unlike the well-studied energy landscapes of protein folding, the energy landscape of RHP folding, or more generally, collapse, remains largely unexplored. Here, we investigate the energy landscape and structural stability of a recently emerged class of methyl methacrylate-based RHPs. Using microsecond-scale atomistic molecular dynamics simulations with umbrella sampling, we propose a hierarchically rugged free energy landscape in which high energy barriers separate broad minima whose internally rugged basins permit local structural fluctuations. Identical local sequences can adopt diverse conformations. Using XGBoost and SHAP analysis, we identify contact patterns critical for structural stability. These include specific residue–residue contacts reminiscent of those observed in protein folding, as well as position-nonspecific interactions, such as contacts between the backbone and polar or hydrophobic side groups, which relate to monomer miscibility. This latter relationship resembles the design rules for plastics. Moreover, the inherent diversity of microenvironments in RHPs highlights their potential to incorporate functional ligands, enabling versatile applications such as catalysis. This work elucidates both the similarities and differences among RHPs, proteins, and plastics, providing fundamental insight into the collapse free energy landscape, structural stability, and functional adaptability of RHPs.
A geometric foundation model for enzyme retrieval with evolutionary insights
Nature Catalysis · 2026-02-12 · 2 citations
Generative AI for navigating synthesizable chemical space
Proceedings of the National Academy of Sciences · 2025-10-06 · 14 citations
Open access · Senior author · Corresponding author
We introduce SynFormer, a generative modeling framework designed to efficiently explore and navigate synthesizable chemical space. Unlike traditional molecular generation approaches, we generate synthetic pathways for molecules to ensure that designs are synthetically tractable. By incorporating a scalable transformer architecture and a diffusion module for building block selection, SynFormer surpasses existing models in synthesizable molecular design. We demonstrate SynFormer's effectiveness in two key applications: 1) local chemical space exploration, where the model generates synthesizable analogs of a query molecule, and 2) global chemical space exploration, where the model aims to identify optimal molecules according to a black-box property prediction oracle. Additionally, we demonstrate the scalability of our approach via the improvement in performance as more computational resources become available. With our code and trained models openly available, we hope that SynFormer will find use across applications in drug discovery and materials science.
Matter · 2025-07-28 · 7 citations
Open access · Senior author
Random Heteropolymers Enable Nonspecific Protein Binding and Loop-Mediated Stabilization
ACS Nano · 2025-11-10 · 1 citation
Membrane proteins play essential roles in cellular signaling, transport, and catalysis, but their structural instability outside lipid bilayers poses a major challenge for biophysical studies and therapeutic applications. Here, we demonstrate that methacrylate-based random heteropolymers (MMA-based RHPs) can stabilize the β-barrel membrane protein OmpLA in aqueous environments without lipids or detergents. Using large-scale atomistic molecular dynamics simulations, we investigate how RHP composition, binding orientation, and contact geometry affect protein stability. We find that RHPs preferentially bind to the lateral β-sheet surfaces of OmpLA while avoiding direct binding to the top and bottom loop regions. Despite this, laterally bound RHPs can still contact the loops because of their comparable size and spatial reach. Among the factors examined, loop-mediated stabilization emerges as the dominant mechanism: increased RHP contact with flexible loop regions reduces local fluctuations and correlates with enhanced global structural integrity. This effect is prominent for MMA-based RHPs, which present a chemically heterogeneous, patchy binding interface, unlike the core–shell architectures formed by other backbones. Our findings reveal a nonspecific yet effective mode of protein stabilization driven by loop-targeting interactions, offering design principles for polymer-based chaperonin mimetics that stabilize membrane proteins in abiotic environments.
Electron flow matching for generative reaction mechanism prediction
Nature · 2025-08-20 · 13 citations
Senior author · Corresponding author
Anomeric Selectivity of Glycosylations through a Machine Learning Lens
Journal of the American Chemical Society · 2025-09-25 · 4 citations
Senior author · Corresponding author
Predicting the stereoselectivity of glycosylations is a major challenge in carbohydrate chemistry. Herein we show that it is possible to build machine learning models that predict the major anomer of a glycosylation, whether the other anomer is observed as the minor product, and the ratio of the two anomers. The three models are integrated into a publicly available tool, GlycoPredictor. Through a statistical analysis of literature data, we identify glycosylation trends and compare them with known trends in carbohydrate chemistry, which allows us to elucidate a hierarchy of rules governing glycosylation stereoselectivity and to discover promising new trends that complement expert intuition; these trends are then tested in novel glycosylation methods.
Recent grants
NSF · $650k · 2022–2027
NIH · $1.1M · 2022–2024
Frequent coauthors
- Klavs F. Jensen, Massachusetts Institute of Technology · 71 shared publications
- Regina Barzilay · 38 shared publications
- William H. Green, Massachusetts Institute of Technology · 33 shared publications
- Tommi Jaakkola · 24 shared publications
- Thijs Stuyver, Chimie ParisTech · 24 shared publications
- Wenhao Gao · 21 shared publications
- David Graff, Harvard University · 21 shared publications
- Natalie S. Eyke, Massachusetts Institute of Technology · 19 shared publications
Labs
MIT ChemEPI
Education
- 2010 · Ph.D., Chemical Engineering, Massachusetts Institute of Technology
- 2006 · M.S., Chemical Engineering, Massachusetts Institute of Technology
- 2004 · B.S., Chemical Engineering, University of California, Berkeley
Awards & honors
- James W. Swan Outstanding Faculty Award (2026)
- Selected to Participate, Grainger Foundation Frontiers of En…
- James W. Swan Outstanding Faculty Award (2025)
- Camille Dreyfus Teacher-Scholar Award (2025)
- Scialog Funding for Automated Laboratories (2024)