Sergei Maslov

Professor · Verified

University of Illinois Urbana-Champaign · Bioengineering

Active 1975–2026

h-index: 53

Citations: 13.7k

Papers: 298 (82 in the last 5 years)

Funding—



About

Sergei Maslov is a Professor and Bliss Faculty Scholar in the Department of Bioengineering and the Department of Physics at the University of Illinois Urbana-Champaign. His primary research areas include computational and systems biology, big-data bioinformatics, biomolecular modeling, cancer biology, cancer genomics, complex-system modeling, and computational biomedicine. His work focuses on the dynamics of complex biomolecular networks, gene regulation, genomics of bacteria and their communities, metagenomics, neural modeling, noise biology, polymer mechanics, quantitative biology, statistical learning, and theoretical biological physics. He has contributed to the understanding of biomolecular networks, microbial community structure, and the emergence of catalytic functions in prebiotic polymers. He holds a PhD in Theoretical Statistical Physics from Stony Brook University (1996) and has held academic positions since 2015, including his current roles at the University of Illinois. He also has an extensive publication record and a history of editorial service in computational and systems biology.

Research topics

Computer Science
Machine Learning
Artificial Intelligence
Data Mining
Medicine
Virology
Mathematics
Environmental health
Telecommunications
Pathology
Anesthesia
Medical emergency
Emergency medicine
Nursing
Engineering
Statistics
Simulation

Selected publications

Data for FUN-PROSE: A Deep Learning Approach to Predict Condition-Specific Gene Expression in Fungi
Open MIND · 2026-01-01
dataset · Senior author
mRNA levels of all genes in a genome is a critical piece of information defining the overall state of the cell in a given environmental condition. Being able to reconstruct such condition-specific expression in fungal genomes is particularly important to metabolically engineer these organisms to produce desired chemicals in industrially scalable conditions. Most previous deep learning approaches focused on predicting the average expression levels of a gene based on its promoter sequence, ignoring its variation across different conditions. Here we present FUN-PROSE—a deep learning model trained to predict differential expression of individual genes across various conditions using their promoter sequences and expression levels of all transcription factors. We train and test our model on three fungal species and get the correlation between predicted and observed condition-specific gene expression as high as 0.85. We then interpret our model to extract promoter sequence motifs responsible for variable expression of individual genes. We also carried out input feature importance analysis to connect individual transcription factors to their gene targets. A sizeable fraction of both sequence motifs and TF-gene interactions learned by our model agree with previously known biological information, while the rest corresponds to either novel biological facts or indirect correlations.
Coarse-grained model of serial dilution dynamics in synthetic human gut microbiome
PLoS Computational Biology · 2025-07-14
article · Open access · Senior author · Corresponding
Many microbial communities in nature are complex, with hundreds of coexisting strains and the resources they consume. We currently lack the ability to assemble and manipulate such communities in a predictable manner in the lab. Here, we take a first step in this direction by introducing and studying a simplified consumer resource model of such complex communities in serial dilution experiments. The main assumption of our model is that during the growth phase of the cycle, strains share resources and produce metabolic byproducts in proportion to their average abundances and strain-specific consumption/production fluxes. We fit the model to describe serial dilution experiments in hCom2, a defined synthetic human gut microbiome with a steady-state diversity of 63 species growing on a rich media, using consumption and production fluxes inferred from metabolomics experiments. The model predicts serial dilution dynamics reasonably well, with a correlation coefficient between predicted and observed strain abundances as high as 0.8. We applied our model to: (i) calculate steady-state abundances of leave-one-out communities and use these results to infer the interaction network between strains; (ii) explore direct and indirect interactions between strains and resources by increasing concentrations of individual resources and monitoring changes in strain abundances; (iii) construct a resource supplementation protocol to maximally equalize steady-state strain abundances.
Evolutionary chemical learning in dimerization networks
ArXiv.org · 2025-06-16
preprint · Open access · Senior author
We present a novel framework for chemical learning based on Competitive Dimerization Networks (CDNs) - systems in which multiple molecular species, e.g. proteins or DNA/RNA oligomers, reversibly bind to form dimers. We show that these networks can be trained in vitro through directed evolution, enabling the implementation of complex learning tasks such as multiclass classification without digital hardware or explicit parameter tuning. Each molecular species functions analogously to a neuron, with binding affinities acting as tunable synaptic weights. A training protocol involving mutation, selection, and amplification of DNA-based components allows CDNs to robustly discriminate among noisy input patterns. The resulting classifiers exhibit strong output contrast and high mutual information between input and output, especially when guided by a contrast-enhancing loss function. Comparative analysis with in silico gradient descent training reveals closely correlated performance. These results establish CDNs as a promising platform for analog physical computation, bridging synthetic biology and machine learning, and advancing the development of adaptive, energy-efficient molecular computing systems.
ML-Guided GWAS Reveals Genetic Architectures for MASLD for Overweight and Lean Individuals in the All of Us Cohort
medRxiv · 2025-12-20
preprint · Open access · Senior author
Metabolic dysfunction-associated steatotic liver disease (MASLD) arises from excessive hepatic fat accumulation that triggers inflammation and liver injury. It is the most prevalent chronic liver disease worldwide, affecting more than one quarter of adults. Despite this, MASLD is often underdiagnosed, making it more difficult to perform genome-wide association studies (GWAS). In this paper, we implemented a machine learning (ML)-guided GWAS framework to identify genetic risk factors for MASLD across lean and overweight individuals in the All of Us Research Program. A random forest model trained on laboratory measurements, vital signs, and demographic features generated an in silico MASLD (I-MASLD) score, a continuous risk score for MASLD, which was validated to accurately represent clinical MASLD diagnosis. This score was then used as the phenotype in a GWAS of whole-exome sequencing variants. The resultant GWAS discovered a novel variant in the ANGPTL4 gene to be significantly associated with MASLD risk and recapitulated known variants in various genes involved in lipid metabolism and insulin signaling. Our results also suggest a potential role of APOA5 in MASLD onset or progression in lean patients. These findings demonstrate that ML-derived quantitative phenotypes can enhance genetic discovery in large, heterogeneous cohorts where conventional case/control labels are limited or imprecise.
Intrinsic OASL expression licenses interferon induction during influenza A virus infection
bioRxiv (Cold Spring Harbor Laboratory) · 2025-03-17 · 4 citations
preprint · Open access
Effective control of viral infection requires rapid induction of the innate immune response, especially the type I and type III interferon (IFN) systems. Despite the critical role of IFN induction in host defense, numerous studies have established that most cells fail to produce IFNs in response to viral stimuli. The specific factors that govern cellular heterogeneity in IFN induction potential during infection are not understood. To identify specific host factors that license some cells but not others to mount an IFN response to viral infection, we developed an approach for analyzing temporal scRNA-seq data of influenza A virus (IAV)-infected cells. This approach identified the expression of several interferon stimulated genes (ISGs) within pre-infection cells as correlates of IFN induction potential of those cells, post-infection. Validation experiments confirmed that intrinsic expression of the ISG OASL is essential for robust IFNL induction during IAV infection. Altogether, our findings reveal an important role for IFN-independent, intrinsic expression of ISGs in promoting IFN induction and provide new insights into the mechanisms that regulate cell-to-cell heterogeneity in innate immune activation.
Protein Language Models Capture Structural and Functional Epistasis in a Zero-Shot Setting
bioRxiv (Cold Spring Harbor Laboratory) · 2025-09-17 · 2 citations
preprint · Open access · Senior author
Protein language models (PLMs) learn from large collections of natural sequences and achieve striking success across prediction tasks, yet it remains unclear what biological principles underlie their representations. We use epistasis, the dependence of a mutation’s effect on its sequence context, as a lens to probe what PLMs capture about proteins. Comparing PLM-derived scores with deep mutational scanning data, we find that epistasis emerges naturally from pretrained models, without supervision on experimental fitness. Raw model scores align with residue–residue contacts, indicating that PLMs internalize structural proximity. Applying a nonlinear transformation to bring model outputs onto the experimental scale, however, shifts the signal toward functional couplings between distant sites. These findings show that PLMs capture both structural and functional dependencies from sequence data alone, and that epistasis provides a powerful window into the biological principles embedded in their representations.
Single-cell heterogeneity in interferon induction potential is heritable and governed by variation in cell state
bioRxiv (Cold Spring Harbor Laboratory) · 2025-12-12 · 2 citations
article · Open access
Type I and III interferons (IFNs) are among the first lines of defense against viral infection, yet they are generally only produced by a tiny fraction of infected cells. Here, we show that cellular heterogeneity in IFN induction potential upon treatment with immunostimulatory RNA is not due to variability in sensing of stimuli but instead is shaped by heterogeneity in tonic cell signaling state. Using complementary single-cell approaches, we found that baseline variation in the c-Jun N-terminal kinase (JNK) and activator protein (AP)-1 transcription factor families correlated with IFNL1 expression predisposition. We further show that drug-based inhibition of JNK signaling virtually eliminates the innate antiviral response to immunostimulatory RNA. Finally, we show that single cell heterogeneity in IFN induction potential is heritable and stably maintained over numerous generations. Together, our study emphasizes the influence of intrinsic variability in cell state on innate immune regulation and IFN induction heterogeneity.
Predicting metabolite response to dietary intervention using deep learning
Nature Communications · 2025-01-18 · 27 citations
article · Open access
Due to highly personalized biological and lifestyle characteristics, different individuals may have different metabolite responses to specific foods and nutrients. In particular, the gut microbiota, a collection of trillions of microorganisms living in the gastrointestinal tract, is highly personalized and plays a key role in the metabolite responses to foods and nutrients. Accurately predicting metabolite responses to dietary interventions based on individuals' gut microbial compositions holds great promise for precision nutrition. Existing prediction methods are typically limited to traditional machine learning models. Deep learning methods dedicated to such tasks are still lacking. Here we develop a method McMLP (Metabolite response predictor using coupled Multilayer Perceptrons) to fill in this gap. We provide clear evidence that McMLP outperforms existing methods on both synthetic data generated by the microbial consumer-resource model and real data obtained from six dietary intervention studies. Furthermore, we perform sensitivity analysis of McMLP to infer the tripartite food-microbe-metabolite interactions, which are then validated using the ground-truth (or literature evidence) for synthetic (or real) data, respectively. The presented tool has the potential to inform the design of microbiota-based personalized dietary strategies to achieve precision nutrition.
Evolutionary chemical learning in dimerization networks
bioRxiv (Cold Spring Harbor Laboratory) · 2025-06-19
preprint · Senior author
We present a novel framework for chemical learning based on Competitive Dimerization Networks (CDNs) - systems in which multiple molecular species, e.g. proteins or DNA/RNA oligomers, reversibly bind to form dimers. We show that these networks can be trained in vitro through directed evolution, enabling the implementation of complex learning tasks such as multiclass classification without digital hardware or explicit parameter tuning. Each molecular species functions analogously to a neuron, with binding affinities acting as tunable synaptic weights. A training protocol involving mutation, selection, and amplification of DNA-based components allows CDNs to robustly discriminate among noisy input patterns. The resulting classifiers exhibit strong output contrast and high mutual information between input and output, especially when guided by a contrast-enhancing loss function. Comparative analysis with in silico gradient descent training reveals closely correlated performance. These results establish CDNs as a promising platform for analog physical computation, bridging synthetic biology and machine learning, and advancing the development of adaptive, energy-efficient molecular computing systems.
Significance Statement: This study introduces a new paradigm for learning based on chemical reaction networks rather than digital circuits. Using Competitive Dimerization Networks (CDNs), biomolecular systems in which species reversibly bind to form dimers, complex classification tasks are learned through in vitro directed evolution. This approach eliminates the need for digital hardware or gradient-based optimization, relying instead on intrinsic molecular dynamics for computation. The resulting chemical classifiers achieve high fidelity and robustness to noise, with performance comparable to that of gradient descent training. These findings establish CDNs as a scalable, energy-efficient platform for molecular computing, suggesting broad potential applications in diagnostics, biosensing, synthetic biology, and nanotechnology, where programmable, adaptive chemical systems could serve as alternatives to conventional electronic processors.
noSpliceVelo infers gene expression dynamics without separating unspliced and spliced transcripts
bioRxiv (Cold Spring Harbor Laboratory) · 2024-08-09 · 3 citations
preprint · Open access · Senior author · Corresponding
Modern single-cell transcriptomics has revolutionized biological research, but because of its destructive nature, it provides only static snapshots. Computational approaches that infer RNA velocity from the ratio of unspliced to spliced mRNA levels can be used to predict how gene expression changes over time. However, information about unspliced and spliced transcripts is not always available and may change on a timescale too short to accurately infer transitions between cellular states. Here we present noSpliceVelo, a novel technique for reconstructing RNA velocity without relying on unspliced and spliced transcripts. Instead, it exploits the temporal relationship between the variance and mean of bursty gene expression using a well-established biophysical model. When evaluated on datasets describing mouse pancreatic endocrinogenesis, mouse and human erythroid maturation, and neuronal stimulation in mouse embryonic cortex, noSpliceVelo performed comparably or better than scVelo, a splicing-based approach. In addition, noSpliceVelo inferred key biophysical parameters of gene regulation, specifically burst size and frequency, potentially distinguishing between transcriptional and epigenetic regulation.

Frequent coauthors

Alexei V. Tkachenko
Brookhaven National Laboratory
53 shared
Maya Paczuski
40 shared
Kim Sneppen
University of Copenhagen
37 shared
Veronika Dubinkina
Gladstone Institutes
36 shared
Tong Wang
Brigham and Women's Hospital
31 shared
Akshit Goyal
Massachusetts Institute of Technology
29 shared
Ananthan Nambiar
Urbana University
28 shared
Per Bak
24 shared

Labs

Maslov lab · PI
Not provided

Education

Ph.D., Theoretical Statistical Physics
Stony Brook University
1996
M.S., Bioengineering
University of Illinois Urbana-Champaign
1996
B.S., Bioengineering
University of Illinois Urbana-Champaign
1994
