Yihui Zhou
Professor · North Carolina State University · Plant and Microbial Biology
Active 1999–2026
About
Yi-Hui Zhou, PhD, is a researcher leading the High Dimensional Predictive Biology Lab. The lab's focus includes human microbiome analysis, methods development, and the application of machine learning techniques to discover disease genetics and perform prediction. Their work also encompasses computational toxicology, integrative analysis of 'omics data, high dimensional statistics and inference, biomedical data analysis, and single-cell RNA-sequencing analysis. Dr. Zhou's research aims to advance understanding and methodologies in these areas to contribute to biomedical and health-related discoveries.
Research topics
- Medicine
- Biology
- Computer Science
- Geography
- Biochemistry
- Chemistry
- Internal medicine
- Engineering
- Library science
- Genetics
- Pathology
- Environmental science
- Gerontology
- Environmental chemistry
- Organic chemistry
- Ecology
Selected publications
The Journal of Climate Change and Health · 2026-04-17
Article · Open access · Senior author · Corresponding author
Introduction: Climate change is increasingly recognized as a critical determinant of human health, yet its integration into health policy remains limited. Understanding the link between climate variables and human life expectancy is essential for developing targeted interventions that mitigate health risks. This study examines the relationship between key climate features and life expectancy, providing evidence to inform integrated climate and health policies. Methods: We applied a multi-method modeling framework, including Random Forest, Geographically Weighted Regression, and Generalized Additive Models, to analyze the impact of climate variables on human life expectancy in the United States. Key environmental factors, such as vapor pressure, temperature, actual evapotranspiration, and solar radiation, were assessed for their influence on life expectancy across diverse geographic regions. Model performance and variable importance were evaluated to ensure robustness and consistency. Results: Our findings indicate that higher vapor pressure, actual evapotranspiration, and temperature are associated with reduced life expectancy, whereas increased solar radiation demonstrates a protective effect. These associations remained consistent across all models, highlighting the reliability of our approach. The Random Forest model exhibited strong predictive performance, reinforcing the validity of our findings. Conclusions: This study underscores the significant impact of climate on human life expectancy and highlights the need for integrated climate and health policies. Addressing harmful climate variables through targeted interventions, such as urban planning and emergency management, can enhance public health resilience. Our findings provide a scientific foundation for policymakers to develop sustainable strategies that safeguard human well-being in the face of climate change.
Environmental Research · 2025-09-03
Article · Open access · First author · Corresponding author
Organophosphate flame retardants (OPFRs) are widely used environmental contaminants with suspected developmental neurotoxicity, yet their stage-specific molecular impacts and potential relevance to autism spectrum disorder (ASD) remain poorly defined. We integrated transcriptomic and lipidomic analyses from two rat models to investigate OPFR-induced disruption across early neurodevelopment. In dataset GSE148266, fetal forebrain and placenta were analyzed following in utero OPFR exposure; in dataset GSE211430, neonatal cortical RNA-seq and lipidomics were profiled after postnatal exposure to triphenyl phosphate and isopropylated triaryl phosphate (1,000 μg/day; n = 10/sex/group). Differential expression (DESeq2; FDR < 0.05), pathway enrichment (GSEA), and multi-omics integration (DIABLO; |r| > 0.9) were performed. Fetal exposure altered 191 genes (144 mapped to human orthologues), including ASD-relevant genes such as ADNP, BRAF, and MAPK3, with enrichment in spliceosome (NES = 2.39), cell-cycle regulation (NES = 2.26), and suppressed Toll-like/NOD-like immune signaling (NES = –2.06). Postnatal exposure disrupted 34 genes and 12 lipids, notably PC(15:0_16:0) and TG(18:1_20:1_20:1), which correlated with synaptic and immune-related genes. Eighteen neonatal DEGs overlapped with human ASD cortical transcriptomes, and integrated analysis revealed shared neurotransmission networks across developmental stages. These findings demonstrate that OPFRs disrupt conserved neurodevelopmental gene networks in a stage-specific manner. The convergence of transcriptomic and lipidomic signals with ASD-relevant features supports further investigation of OPFRs as candidate environmental risk factors and highlights molecular pathways for future biomarker and toxicity studies.
- Organophosphate flame retardant (OPFR) exposure induces stage-specific disruptions in gene expression during fetal and neonatal rat brain development.
- Dysregulated pathways include cell cycle regulation, immune signaling (Toll-like and NOD-like receptors), and spliceosome function, with enrichment of high-confidence autism spectrum disorder (ASD) risk genes such as ADNP, BRAF, and MAPK3.
- Integrated lipid–gene network analysis reveals coordinated perturbations in phosphatidylcholines and triglycerides correlated with synaptic and immune-related genes, suggesting potential autism-relevant mechanisms.
- Neonatal OPFR-responsive genes show significant overlap with dysregulated cortical transcriptomes from human ASD cases, supporting translational relevance.
- Multi-omics integration of transcriptomic and lipidomic data identifies conserved molecular signatures linking OPFR exposure to autism-associated neurodevelopmental pathways.
Liver eQTL meta-analysis illuminates potential molecular mechanisms of cardiometabolic traits
UNC Libraries · 2025-02-02
Article · Open access
Understanding the molecular mechanisms of complex traits is essential for developing targeted interventions. We analyzed liver expression quantitative-trait locus (eQTL) meta-analysis data on 1,183 participants to identify conditionally distinct signals. We found 9,013 eQTL signals for 6,564 genes; 23% of eGenes had two signals, and 6% had three or more signals. We then integrated the eQTL results with data from 29 cardiometabolic genome-wide association study (GWAS) traits and identified 1,582 GWAS-eQTL colocalizations for 747 eGenes. Non-primary eQTL signals accounted for 17% of all colocalizations. Isolating signals by conditional analysis prior to coloc resulted in 37% more colocalizations than using marginal eQTL and GWAS data, highlighting the importance of signal isolation. Isolating signals also led to stronger evidence of colocalization: among 343 eQTL-GWAS signal pairs in multi-signal regions, analyses that isolated the signals of interest resulted in higher posterior probability of colocalization for 41% of tests. Leveraging allelic heterogeneity, we predicted causal effects of gene expression on liver traits for four genes. To predict functional variants and regulatory elements, we colocalized eQTL with liver chromatin accessibility QTL (caQTL) and found 391 colocalizations, including 73 with non-primary eQTL signals and 60 eQTL signals that colocalized with both a caQTL and a GWAS signal. Finally, we used publicly available massively parallel reporter assays in HepG2 to highlight 14 eQTL signals that include at least one expression-modulating variant. This multi-faceted approach to unraveling the genetic underpinnings of liver-related traits could lead to therapeutic development.
medRxiv · 2025-11-17
Preprint · Open access · First author · Corresponding author
Clinical decision support systems increasingly guide ICU care, but may perpetuate or amplify existing healthcare disparities. We develop a doubly-robust statistical framework for detecting and quantifying algorithmic bias in ICU treatment recommendations. Rather than prescribing treatment allocation, our approach audits existing clinical decision support systems to identify disparities in predicted treatment benefits across demographic groups. Analyzing 193,683 patients from the eICU database, we demonstrate the framework’s ability to detect systematic biases. For age-based analysis, we identify a 5.1 percentage point mortality disparity with differential predicted treatment effects (3.2pp younger vs. 1.8pp older patients). For race-based analysis, severity-adjusted outcome disparities (average 2.2pp, reaching 5.6pp at high severity) suggest potential differences in care quality or algorithmic recommendations despite similar aggregate outcomes. We quantify how different fairness metrics (demographic parity, equalized odds, calibration) reveal distinct bias patterns, providing guidance for bias auditing in clinical AI systems. This framework enables healthcare systems to identify and address algorithmic bias before deployment, supporting more equitable clinical decision support.
CHP2 Modifies Chronic Pseudomonas aeruginosa Airway Infection Risk in Cystic Fibrosis
Annals of the American Thoracic Society · 2025-01-02
Article · Open access
Rationale: Chronic Pseudomonas aeruginosa (Pa) airway infection is common and a key contributor to diminished lung function and early mortality in persons with cystic fibrosis (PwCF). Risk factors for chronic Pa among PwCF include CFTR (cystic fibrosis transmembrane conductance regulator) genotype, genetic modifiers, and environmental factors. Intensive antibiotic therapy and highly effective modulators do not eradicate Pa in most adolescents and adults with cystic fibrosis. Objectives: To identify new genetic modifiers contributing to the pathophysiology of chronic Pa infection in PwCF. Methods: A total of 4,945 participants in the CF Genome Project with whole-genome sequencing linked to longitudinal clinical data from the 2017 Cystic Fibrosis Foundation Patient Registry were used to conduct a time-to-event genome-wide association study using two definitions of chronic Pa infection. Results: We identified a genome-wide significant association (P = 2.2 × 10⁻⁸) between delayed onset of chronic Pa infection and rs194810, a common variant near the gene CHP2, which encodes calcineurin B homolog protein 2 (minor A allele frequency 43%). Survival curves by rs198410 allele dosage show that PwCF homozygous for the A allele are an average of 3 years older when achieving chronic Pa infection compared with G allele homozygotes. Conclusions: Variants near CHP2 are associated with a significant delay in the age of chronic Pa infection among PwCF.
Genetic Modifiers of Cystic Fibrosis Lung Disease Severity: Whole-Genome Analysis of 7,840 Patients.
UNC Libraries · 2025-10-22
Article · Open access · Senior author
Rationale: Lung disease is the major cause of morbidity and mortality in persons with cystic fibrosis (pwCF). Variability in CF lung disease has substantial non-CFTR (CF transmembrane conductance regulator) genetic influence. Identification of genetic modifiers has prognostic and therapeutic importance. Objectives: Identify genetic modifier loci and genes/pathways associated with pulmonary disease severity. Methods: Whole-genome sequencing data on 4,248 unique pwCF with pancreatic insufficiency and lung function measures were combined with imputed genotypes from an additional 3,592 patients with pancreatic insufficiency from the United States, Canada, and France. This report describes association of approximately 15.9 million SNPs using the quantitative Kulich normal residual mortality-adjusted (KNoRMA) lung disease phenotype in 7,840 pwCF using premodulator lung function data. Measurements and Main Results: Testing included common and rare SNPs, transcriptome-wide association, gene-level, and pathway analyses. Pathway analyses identified novel associations with genes that have key roles in organ development, and we hypothesize that these genes may relate to dysanapsis and/or variability in lung repair. Results confirmed and extended previous genome-wide association study findings. These whole-genome sequencing data provide finely mapped genetic information to support mechanistic studies. No novel primary associations with common single variants or rare variants were found. Multilocus effects at chr5p13 (SLC9A3/CEP72) and chr11p13 (EHF/APIP) were identified. Variant effect size estimates at associated loci were consistently ordered across the cohorts, indicating possible age or birth cohort effects. Conclusions: This premodulator genomic, transcriptomic, and pathway association study of 7,840 pwCF will facilitate mechanistic and postmodulator genetic studies and the development of novel therapeutics for CF lung disease.
Pleiotropic modifiers of age-related diabetes and neonatal intestinal obstruction in cystic fibrosis
UNC Libraries · 2025-10-21
Article · Open access · First author · Corresponding author
Genome Research · 2025-05-20 · 4 citations
Article · Open access
Chromatin accessibility quantitative trait locus (caQTL) studies have identified regulatory elements that underlie genetic effects on gene expression and metabolic traits. However, caQTL discovery has been limited by small sample sizes. Here, we map caQTLs in liver tissue from 138 human donors and identify caQTLs for 35,361 regulatory elements, including population-specific caQTLs driven by differences in allele frequency across populations. We identify 2126 genetic signals associated with multiple, presumably coordinately regulated elements. Coordinately regulated elements link distal elements to target genes and are more likely to be associated with gene expression compared with single-element caQTLs. We predict driver and response elements at coordinated loci and find that driver elements are enriched for transcription factor binding sites of key liver regulators. We identify colocalized caQTLs at 667 genome-wide association (GWAS) signals for metabolic and liver traits, and annotate these loci with predicted target genes and disrupted transcription factor binding sites. CaQTLs identify threefold more GWAS colocalizations than liver expression QTLs (eQTLs) in a larger sample size, suggesting that caQTLs can detect mechanisms missed by eQTLs. At a GWAS signal colocalized with a caQTL and an eQTL for TENM2, we validated regulatory activity for a variant within a predicted driver element that is coordinately regulated with 39 other elements. At another locus, we validate a predicted enhancer of RALGPS2 using CRISPR interference and demonstrate allelic effects on transcription for a haplotype within this enhancer. These results demonstrate the power of caQTLs to characterize regulatory mechanisms at GWAS loci.
Structure Maintained Representation Learning Neural Network for Causal Inference
ArXiv.org · 2025-08-03
Preprint · Open access · Senior author
Recent developments in causal inference have greatly shifted the interest from estimating the average treatment effect to the individual treatment effect. In this article, we improve the predictive accuracy of representation learning and adversarial networks in estimating individual treatment effects by introducing a structure keeper which maintains the correlation between the baseline covariates and their corresponding representations in the high dimensional space. We train a discriminator at the end of representation layers to trade off representation balance and information loss. We show that the proposed discriminator minimizes an upper bound of the treatment estimation error. We can address the tradeoff between distribution balance and information loss by considering the correlations between the learned representation space and the original covariate feature space. We conduct extensive experiments with simulated and real-world observational data to show that our proposed Structure Maintained Representation Learning (SMRL) algorithm outperforms state-of-the-art methods. We also demonstrate the algorithms on real electronic health record data from the MIMIC-III database.
Inducible genome-wide mutagenesis for improvement of pDNA production by E. coli
bioRxiv (Cold Spring Harbor Laboratory) · 2025-06-14
Preprint · Open access
Plasmid DNA (pDNA) is a cost-driving reagent for the production of gene therapies and DNA vaccines. Improving pDNA production in the most common production host (E. coli) has faced obstacles arising from the complex network of genes responsible for pDNA synthesis, with the specific enzyme(s) limiting pDNA yield remaining unidentified. To address this challenge, we employed an inducible genome-wide mutagenesis strategy, combined with fluorescent screening, to isolate E. coli NEB 5α strains with enhanced pDNA production. Following selection, we successfully isolated an E. coli strain (M3) with elevated plasmid copy numbers (PCNs) across multiple origin types. Specifically, we observed a 5.93-fold increase in PCN for the GFP reporter plasmid, a 1.93-fold increase for the gWiz DNA vaccine plasmid, and an 8.7-fold increase for the pAAV-CAGG-eGFP plasmid, all of which contain pUC origins. In addition, plasmids with p15A and pSC101 origins showed 1.44-fold and 1.68-fold increases in PCN, respectively. Whole-genome sequencing of the adapted strain M3 identified 85 mutations, including one in recG, which encodes an ATP-dependent DNA helicase. Replacement of the mutant recG with its wild-type counterpart in the mutant strain resulted in a 63% reduction in PCN, but the recG mutation alone was insufficient to increase PCN in the wild-type strain. These findings suggest that the recG mutation plays a synergistic role with other genomic mutations to drive PCN increases. Taken together, this study presents the development of a pDNA hyperaccumulating E. coli strain with promising applications in industrial and therapeutic pDNA production, while also offering important insights into key genes involved in pDNA production.
Recent grants
Permutation Approximations for Next Generation Association
NIH · $417k · 2015–2018
Frequent coauthors
- Fred A. Wright · 64 shared
- Michael J. Bamshad, Brotman Baty Institute · 35 shared
- Michael R. Knowles, Lung Institute · 31 shared
- Scott M. Blackman, Johns Hopkins University · 30 shared
- Garry R. Cutting, Johns Hopkins Medicine · 28 shared
- Rhonda G. Pace, Lung Institute · 27 shared
- Paul J. Gallins, North Carolina State University · 23 shared
- Ronald L. Gibson, Seattle Children's Hospital · 19 shared
Education
- 2011
PhD, Biostatistics
University of North Carolina at Chapel Hill