Huibin Zhou

Henry Ford II Professor of Statistics & Data Science · Verified

Yale University · Department of Statistics and Data Science

Active 1992–2026

h-index: 55

Citations: 13.2k

Papers299171 last 5y

Funding—

About

Huibin Zhou is the Henry Ford II Professor of Statistics and Data Science at Yale University. He earned his Ph.D. from Cornell University in 2004 and has held various academic positions at Yale since then, progressing from Assistant Professor to full Professor and eventually to his current named professorship. Zhou has also served as Chair of the Department of Statistics and Data Science at Yale University during multiple terms between 2012 and 2021. His professional service includes organizing and co-organizing several workshops and conferences focused on empirical processes, high-dimensional data, and statistical methodologies. Zhou's teaching portfolio spans a wide range of courses in statistics and data science, including probability theory, decision theory, nonparametric estimation, data mining, machine learning, and functional data analysis. His research contributions are extensive and cover theoretical and computational aspects of statistics, including high-dimensional data analysis, Gaussian mixture models, community detection in networks, covariance and precision matrix estimation, Bayesian structured linear models, and sparse canonical correlation analysis. Zhou's work has been published in leading journals such as the Annals of Statistics, Journal of the American Statistical Association, and Journal of Machine Learning Research, reflecting his significant impact on modern statistical theory and methodology.

Research topics

Biology
Genetics
Psychiatry
Medicine
Psychology
Clinical psychology
Computer Science
Neuroscience
Internal medicine
Developmental psychology

Selected publications

Evolutionary game analysis and stability control scenarios of food safety supervision based on system dynamics
Frontiers in Sustainable Food Systems · 2026-04-22
article · Open access · Senior author
Introduction: Food safety supervision involves dynamic strategic interactions between enterprises and government regulators. Traditional static models fail to capture the co-evolution of behaviors under varying policy incentives. This study develops a coupled evolutionary game-system dynamics (EG-SD) model to investigate how cost-benefit configurations and policy parameters shape long-term strategic outcomes in food safety governance, with the goal of identifying conditions that balance enterprise-government relationships and promote sustainable food systems. Methods: An integrated EG-SD framework was constructed to model the two-player (enterprise and government) evolutionary game. Equilibrium stability conditions were derived analytically under different cost-benefit scenarios. Stability analysis was performed to identify evolutionarily stable strategies (ESS). A numerical simulation was conducted to replicate four distinct case configurations, tracking strategy evolution over time under varying parameter sets. Results: The stability analysis revealed that equilibrium outcomes depend critically on the relative magnitudes of supervision costs, penalty levels, and compliance benefits. Numerical simulations demonstrated the absence of any stable pure or mixed strategy in Case 1. In all simulated scenarios, enterprises consistently converged to a “not pay attention” strategy regardless of government actions. Government strategy was scenario-dependent: it fully adopted a “supervise” stance in some cases, but switched to a “not supervise” stance in Cases 3 and 4. No parameter configuration induced enterprise proactive compliance as a stable outcome. Discussion: The government's scenario-dependent behavior indicates that supervision is effective only under specific cost-benefit thresholds. These findings underscore the necessity of redesigning regulatory dynamics to align economic incentives with long-term environmental and social health goals. Effective supervision requires not only enforcement but also mechanisms that make proactive compliance economically attractive for enterprises. The model provides a tool for testing policy interventions before implementation.
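To make the kind of two-player evolutionary game described in this abstract concrete, here is a minimal sketch of two-population replicator dynamics integrated with a forward Euler step. It is not the paper's EG-SD model: the payoff structure and every parameter name (`b`, `c`, `f`, `s`, `h`) are hypothetical, chosen only to show how a cost-benefit configuration can drive enterprises toward "not pay attention" while the government's strategy depends on its supervision cost.

```python
def simulate(b, c, f, s, h, x0=0.5, y0=0.5, dt=0.01, steps=5000):
    """Two-population replicator dynamics (illustrative assumptions only).

    x: share of enterprises that "pay attention"
    y: share of regulators that "supervise"
    Hypothetical payoffs:
      enterprise, pay attention:      b - c      (benefit minus compliance cost)
      enterprise, not pay attention:  -f * y     (expected penalty when supervised)
      government, supervise:          -s + f * (1 - x)  (cost plus collected penalties)
      government, not supervise:      -h * (1 - x)      (social loss from violations)
    """
    x, y = x0, y0
    for _ in range(steps):
        # Replicator equation: growth is proportional to the payoff advantage.
        dx = x * (1 - x) * ((b - c) + f * y)
        dy = y * (1 - y) * (-s + (f + h) * (1 - x))
        x += dt * dx
        y += dt * dy
    return x, y


# Low supervision cost: enterprises drift to "not pay attention",
# the government drifts to "supervise".
low_cost = simulate(b=1, c=4, f=2, s=1, h=1)

# High supervision cost: both populations drift to the passive strategy.
high_cost = simulate(b=1, c=4, f=2, s=5, h=1)
```

In this toy configuration the compliance cost exceeds the benefit plus the maximum expected penalty (`c > b + f`), so no level of supervision makes compliance pay, which mirrors the qualitative pattern the abstract reports. A more careful study would replace the Euler loop with an adaptive ODE solver and use the paper's own payoff matrix.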
Shared genetic risk between eating disorder- and substance-use-related phenotypes: Evidence from genome-wide association studies.
Archive ouverte UNIGE (University of Geneva) · 2026-02-10
article · Open access
Eating disorders and substance use disorders frequently co-occur. Twin studies reveal shared genetic variance between liabilities to eating disorders and substance use, with the strongest associations between symptoms of bulimia nervosa and problem alcohol use (genetic correlation [rg], twin-based = 0.23-0.53). We estimated the genetic correlation between eating disorder and substance use and disorder phenotypes using data from genome-wide association studies (GWAS). Four eating disorder phenotypes (anorexia nervosa [AN], AN with binge eating, AN without binge eating, and a bulimia nervosa factor score), and eight substance-use-related phenotypes (drinks per week, alcohol use disorder [AUD], smoking initiation, current smoking, cigarettes per day, nicotine dependence, cannabis initiation, and cannabis use disorder) from eight studies were included. Significant genetic correlations were adjusted for variants associated with major depressive disorder and schizophrenia. Total study sample sizes per phenotype ranged from ~2400 to ~537 000 individuals. We used linkage disequilibrium score regression to calculate single nucleotide polymorphism-based genetic correlations between eating disorder- and substance-use-related phenotypes. Significant positive genetic associations emerged between AUD and AN (rg = 0.18; false discovery rate q = 0.0006), cannabis initiation and AN (rg = 0.23; q < 0.0001), and cannabis initiation and AN with binge eating (rg = 0.27; q = 0.0016). Conversely, significant negative genetic correlations were observed between three nondiagnostic smoking phenotypes (smoking initiation, current smoking, and cigarettes per day) and AN without binge eating (rgs = -0.19 to -0.23; qs < 0.04). The genetic correlation between AUD and AN was no longer significant after co-varying for major depressive disorder loci. The patterns of association between eating disorder- and substance-use-related phenotypes highlight the potentially complex and substance-specific relationships among these behaviors.
Whole-exome sequencing study of opioid dependence offers novel insights into the contributions of exome variants
Translational Psychiatry · 2025-10-06 · 2 citations
article · Open access · Senior author
Opioid dependence (OD) is epidemic in the United States and it is associated with a variety of adverse health effects. Its estimated heritability is ~50% in twin studies, and recent genome-wide association studies have identified more than a dozen common risk variants. However, there are no published studies of rare OD risk variants. In this study, we analyzed whole-exome sequencing data from the Yale-Penn cohort, comprising 2100 participants of European ancestry (EUR; 1321 OD cases) and 1790 of African ancestry (AFR; 864 cases). A novel low-frequency variant (rs746301110) in the RUVBL2 gene was identified in EUR (p = 6.59 × 10⁻¹⁰). Suggestive associations (p < 1 × 10⁻⁵; not passing the Bonferroni correction) were observed in TMCO3 in EUR, in NEIL2 and CFAP44 in AFR, and in FAM210B in the cross-ancestry meta-analysis. Gene-based collapsing tests identified SLC22A10, TMCO3, FAM90A1, DHX58, CHRND, GLDN, PLAT, H1-4, COL3A1, GPHB5 and QPCTL as top genes (p < 1 × 10⁻⁴) with most associations attributable to rare variants and driven by the burden of predicted loss-of-function and missense variants. This study begins to fill the gap in our understanding of the genetic architecture of OD, providing insights into the contribution of rare coding variants and potential targets for future functional studies and drug development.
Identification of risk variants and cross-disorder pleiotropy through multi-ancestry genome-wide analysis of alcohol use disorder
Nature Mental Health · 2025-01-08 · 3 citations
article · Open access
Genotype‐by‐sex interaction analyses for alcohol use disorder across biobanks
Alcohol Clinical and Experimental Research · 2025-09-29 · 1 citation
article · Open access · 1st author · Corresponding
Background: Alcohol use and alcohol use disorder (AUD) are significant contributors to morbidity and mortality, with different prevalences between males and females. Despite the established genetic contribution to AUD, sex as a biological variable and the interplay with genetic factors in the disorder remain largely unexplored. This study aimed to address the key question as to how genetic variations interact with biological sex to influence the AUD risk. Methods: We conducted genome‐wide genotype‐by‐sex (G × S) interaction analyses using multiancestry datasets from the Million Veteran Program (MVP) and UK Biobank (UKB). In total, 1,039,476 participants were analyzed, comprising 150,429 AUD cases and 889,046 controls. AUD cases were defined using ICD‐9/10 codes in the MVP and using ICD‐10 codes (field ID 41270) along with self‐reported history of alcohol addiction (field ID 20406) in the UKB. Results: In single‐ancestry analyses, we identified two loci in African ancestry samples with lead single‐nucleotide polymorphisms (SNPs) rs2183020 (p = 1.82 × 10⁻⁸) and rs9304803 (p = 4.66 × 10⁻⁸), and one locus in Admixed American ancestry samples with lead SNP rs9527196 (p = 2.83 × 10⁻⁸). The cross‐ancestry meta‐analysis identified one additional locus with lead SNP rs62446539 (p = 3.95 × 10⁻⁸). The deep learning method predicted that rs9304803 has B‐cell type‐specific enhancer activity. Rs2183020 and rs9304803 exhibited expression quantitative trait locus (eQTL) effects on multiple genes across various tissues, including the brain. Further experiments in ethanol‐exposed human neurons confirmed expression changes in several of these genes. Phenome‐wide association analyses revealed significant associations between rs2183020 and weight/body mass index, and between rs9304803 and prothrombin time (measured as international normalized ratio). Conclusions: We believe this is the first genome‐wide G × S study of AUD, providing novel insights into the genetic basis of sex differences in AUD and advancing our understanding of its biological underpinnings.
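As a small worked check on the lead SNPs reported in this abstract, the sketch below compares each p-value with the conventional genome-wide significance threshold of 5 × 10⁻⁸. The abstract does not state which threshold the authors applied, so the constant `GENOME_WIDE_ALPHA` and the helper `passes_threshold` are assumptions made for illustration.

```python
# Conventional genome-wide significance level; assumed here, since the
# abstract does not name the threshold the authors used.
GENOME_WIDE_ALPHA = 5e-8

# Lead SNP p-values as reported in the abstract.
lead_snps = {
    "rs2183020": 1.82e-8,   # African ancestry
    "rs9304803": 4.66e-8,   # African ancestry
    "rs9527196": 2.83e-8,   # Admixed American ancestry
    "rs62446539": 3.95e-8,  # cross-ancestry meta-analysis
}


def passes_threshold(p_values, alpha=GENOME_WIDE_ALPHA):
    """Return, for each SNP, whether its p-value is below alpha."""
    return {snp: p < alpha for snp, p in p_values.items()}


results = passes_threshold(lead_snps)
```

Under that assumed threshold all four reported loci qualify, although rs9304803 sits close to the boundary.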
55. DISSECTING THE ANCESTRY-SPECIFIC GENETIC ARCHITECTURE OF ALCOHOL CONSUMPTION IN LATIN AMERICANS
European Neuropsychopharmacology · 2025-10-01
article · Open access
Genome-wide association studies (GWAS) have made substantial contributions to our understanding of the genetic factors that influence alcohol consumption. However, most efforts have been made in populations of European ancestry, resulting in less representation of other populations, with Latin American (LA) populations among the least represented, comprising less than 2% of the total GWAS participants. LA populations are characterized by varying degrees of genetic admixture from Indigenous American, European, and African ancestries, which creates challenges when modeling the genetic architecture of complex traits. However, recently developed GWAS approaches, such as Tractor, that leverage local ancestry information (defined as the genetic ancestry of an individual at a particular genomic location) could help overcome this challenge. This study, led by members of the Latin American Genomics Consortium (LAGC), is a meta-analysis of GWAS studies of self-reported alcohol consumption in 465,516 individuals from cohorts based in Latin American countries and the United States (US). We also conducted a local ancestry-aware GWAS on 11,655 LA individuals using Tractor. We replicated well-known genetic associations for alcohol consumption in genes that include the ADH locus (lead variant rs1229984, p-value = 1.12e-203) and others associated with psychiatric and behavioral traits, such as CADM2. We also identified a signal in the ALDH2 locus, previously associated only in East Asian populations. Using the local ancestry-aware GWAS, we identified associations in genes previously associated with the number of drinks consumed per week (Drkwk) by the GWAS and Sequencing Consortium of Alcohol and Nicotine use consortium (GSCAN), rs1874323 (p-value = 2.5860e-08) in the MAGI1 gene, and rs6833926 (p-value = 3.00e-08) in the ARAP2 gene. 
We also identified potential novel associations in the SLIT3 gene (rs73805262, p-value = 9.95e-09 and rs115143510, p-value = 1.23e-08), as well as one intergenic variant (rs3929849, p-value = 2.3930e-09) among segments of African-like ancestry. We also identified associations in intergenic regions of American-like descent (rs4130378, p-value = 9.673e-09; rs536315876, p-value = 4.3640e-09; rs115675116, p-value = 4.3190e-09). Nevertheless, given the small sample size in the local ancestry-aware GWAS, these results require further replication. We also compared the association of the polygenic risk score (PRS) derived from the European population using GSCAN (PRS-EUR) data with the PRS from our large-scale meta-analysis (PRS-LA), examining the relationship with drinks consumed per week in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL) cohort, and found heterogeneity in the transferability of the PRS across geographical LA subgroups. Both the PRS-EUR and PRS-LA were associated with Drkwk in individuals from Puerto Rico (PRS-EUR, p-value = 7.40e-03; PRS-LA, p-value = 1.40e-02); meanwhile, only the PRS-EUR was associated in individuals from Mexico (p-value = 7.41e-03) and Cuba (p-value = 1.31e-02), and only the PRS-LA was associated in individuals from South America (p-value = 3.23e-02). Our study contributes to current efforts to elucidate the genetic architecture of alcohol consumption in Latin American populations, implicating novel genes (such as WRN) and revealing varying performance of PRS across different geographical subgroups.
Similarities and Differences in Genetics
Advances in experimental medicine and biology · 2025-01-01
review
Advancing Human Population Genomics with DNA Foundation Models
Research Square · 2025-11-10
preprint · Open access
Investigating the Contribution of Coding Variants in Alcohol Use Disorder Using Whole-Exome Sequencing Across Ancestries
Biological Psychiatry · 2025-01-30 · 6 citations
article · Open access · Senior author
BACKGROUND: Alcohol use disorder (AUD) is a leading cause of death and disability worldwide. There has been substantial progress in identifying genetic variants that underlie AUD. However, whole-exome sequencing studies of AUD have been hampered by the lack of available samples. METHODS: We analyzed whole-exome sequencing data of 4530 samples from the Yale-Penn cohort and 469,835 samples from the UK Biobank, which represent an unprecedented resource for exploring the contribution of coding variants in AUD. After quality control, 1750 African-ancestry (1142 cases) and 2039 European-ancestry (1420 cases) samples from the Yale-Penn and 6142 African-ancestry (130 cases), 415,617 European-ancestry (12,861 cases), and 4607 South Asian (130 cases) samples from the UK Biobank cohorts were included in the analyses. RESULTS: We confirmed the well-known functional variant rs1229984 in ADH1B (p = 4.88 × 10⁻³¹) and several other variants in ADH1C. Gene-based collapsing tests that considered the high allelic heterogeneity revealed the previously unreported genes CNST (p = 1.19 × 10⁻⁶), attributable to rare variants with allele frequency < 0.001, and IFIT5 (p = 3.74 × 10⁻⁶), driven by the burden of both common and rare loss-of-function and missense variants. CONCLUSIONS: This study extends our understanding of the genetic architecture of AUD by providing insights into the contribution of rare coding variants, separately and convergently with common variants in AUD.
The evolving global burden of ADHD: A comprehensive analysis and future projections (1990-2046)
Journal of Affective Disorders · 2025-08-06 · 5 citations
article

Frequent coauthors

Joel Gelernter · 300 shared
Hongjie Yu · Zhejiang University of Technology · 253 shared
Henry R. Kranzler · Washington University in St. Louis · 231 shared
Di Mu · Tongji University · 151 shared
Wenwu Yin · Chinese Center for Disease Control and Prevention · 139 shared
Shengjie Lai · China Medical University · 126 shared
Qiulan Chen · Beijing Anzhen Hospital · 121 shared
Rachel L. Kember · 118 shared
