
Ziyue Gao
Verified · University of Pennsylvania · Rehabilitation Medicine
Active 2013–2026
About
Ziyue Gao, PhD, is an Assistant Professor of Genetics at the University of Pennsylvania's Perelman School of Medicine and a member of the Penn Center for Global Genomics and Health Equity. His research focuses on addressing questions in human genetics within an evolutionary context through computational approaches. His work explores the genesis and accumulation of mutations in natural populations, investigating mechanisms and timing of mutagenesis in germline and somatic tissues, as well as the causes and consequences of mutation rate variation. Gao's research integrates data from comparative genomics, human genetics, cancer genetics, and developmental biology. He also studies the genetic basis and evolution of human phenotypes, developing methods to detect and quantify the impacts of natural selection under various evolutionary scenarios. A primary aim of his research is to understand the genetic architecture and evolution of complex traits in human populations by applying mathematical modeling and statistical analysis to genomic data from both modern and ancient samples.
Research topics
- Biology
- Genetics
- Evolutionary biology
- Geography
- Materials science
Selected publications
Methylation-associated mutagenesis underlies variation in the mutation spectrum across eukaryotes
Proceedings of the National Academy of Sciences · 2026-03-13
article · Open access
Mutation spectra vary across genetic and environmental contexts, leading to differences between and within species. Most research on mutation spectrum has focused on trinucleotide (3-mer) mutation types in mammals, limiting the breadth and depth of variation surveyed. In this study, we use whole-genome resequencing data across 108 eukaryotic species (including mammals, fish, plants, and invertebrates) to characterize pentanucleotide (5-mer) noncoding mutation spectra using a Bayesian approach. Our findings reveal cytosine transition mutability at CpG sites and other sources of variation in the transition/transversion ratio as the main drivers of variation in mutation spectra across eukaryotes. We find that inferred CpG mutation rates almost perfectly predict genomic CpG depletion but are not predicted by genome-wide average CpG methylation levels. Together, our results illustrate the pivotal role of mutagenesis in shaping genome composition across eukaryotes and highlight a gap in knowledge about the mechanisms governing mutation rates.
Artificial Intelligence in Early Cancer Detection: Advances, Challenges, and Future Perspectives
Theoretical and Natural Science · 2026-05-11
article · Open access · 1st author · Corresponding
Cancer remains a major global public health challenge due to its high mortality rate. Early detection and diagnosis are critical for reducing cancer-related deaths, yet conventional screening methods, such as medical imaging and tissue biopsy, have limitations in accuracy, efficiency, and objectivity. Recent advances in Artificial Intelligence (AI), particularly deep learning, have shown strong potential in improving early cancer detection. This review summarizes recent progress and clinical applications of AI in this field. Key developments in deep learning architectures, including Convolutional Neural Networks (CNNs) and Transformer-based models, have enabled improved analysis of medical images, nanosensor signals, and multi-omics data. AI has been increasingly applied in the early detection of common cancers, including lung cancer (CT screening), breast cancer (mammography, ultrasound, and MRI), and colorectal cancer (endoscopy, fecal testing, and circulating tumor DNA analysis). However, challenges such as data heterogeneity, limited interpretability, and regulatory barriers remain. Future advances in multimodal data integration and Explainable AI (XAI) are expected to further enhance diagnostic performance. Overall, AI is likely to serve as an important assistive tool to support more accurate and personalized cancer diagnosis.
eLife Assessment: The genomic legacy of aurochs hybridisation in ancient and modern Iberian cattle
2025-03-19
peer-review · Open access · 1st author · Corresponding
For over five thousand years, domesticated cows and oxen in the Iberian Peninsula lived alongside their wild counterparts, the aurochs. These large and aggressive animals, from which modern European cattle descends, only went extinct during the 17th century. Genetic evidence points to aurochs and livestock having interbred during their long coexistence; when and how these mixing events took place, however, remains unclear. Details regarding the management of ancient herds are also missing. To address these questions, Günther et al. analysed the DNA extracted from ancient bovine bones sampled at four Iberic archaeological sites. This revealed that wild aurochs and cattle frequently interbred during the last 8,000 years. Mating principally took place between male aurochs and domesticated cows but slowed down after 4,000 years, resulting in modern cattle having inherited about 20% of genes from their wild relatives. This percentage was consistent across various breeds, including one renowned for its aggressivity and which has been selected for centuries for Spanish bullfighting. Additional bone analyses revealed that aurochs and ancient cattle shared comparable diets composed primarily of wild vegetation. Only some domestic animals showed signs of having been fed crops. These findings help us understand how modern cattle breeds came to be. The genes they inherited from aurochs may help them survive harsh environmental conditions, such as extreme heat or diseases. In the future, researchers could use this knowledge to refine breeding programs.
eLife Assessment: The Genomic Legacy of Aurochs hybridization in ancient and modern Iberian Cattle
2025-01-07
peer-review · Open access · 1st author · Corresponding
Human milk flavor: an overview of odor composition, influencing factors, and flavoromics techniques
Journal of Future Foods · 2025-05-24 · 3 citations
article · Open access · 1st author · Corresponding
Human milk (HM) not only provides nutrients for infants but also produces volatile odors that can be perceived by newborns, influencing their dietary behavior, flavor learning, and food preferences. Meanwhile, the design of infant formula is gradually getting closer to HM in terms of nutrient composition rather than sensory performance. Volatile compounds in HM are produced upon lipid, protein, and carbohydrate degradation and via Maillard reactions. They primarily consist of fatty acids, terpenes, aldehydes, ketones, alcohols, furans, and pyrans. Moreover, the factors influencing HM flavors are critically involved in dietary intake, HM macronutrients, storage temperature, storage time, and sterilization conditions. This review aimed to summarize the formation, composition characteristics, influencing factors, and analytical techniques of HM odor by summarizing existing studies. Relevant conclusions can provide a theoretical basis for future research on the identification, evaluation, and simulation of HM flavor profiles.
Thyroid · 2025-01-27 · 4 citations
article
Background: Epidemiological data suggest the population distribution of thyrotropin (TSH) values is shifted toward lower values in self-identified Black non-Hispanic individuals compared with self-identified White non-Hispanic individuals. It is unknown whether genetic differences between individuals with genetic similarities to African reference populations (GSA) and those with similarities to European reference populations (GSE) contribute to these observed differences. We aimed to compare genome-wide associations with TSH and putative causal TSH-associated variants between GSA and GSE groups.
Methods: We performed genome-wide association studies (GWAS) in 9827 GSA individuals and 9827 GSE individuals with TSH values between 0.45 and 4.5 mU/L. We compared effect sizes and allele frequencies of previously reported putative causal TSH-associated variants and our power to detect associations with these variants between the two groups. We additionally focused on variants in PDE8B and PDE10A, loci that have been most strongly associated with TSH in previous GWAS in GSE populations.
Results: Four loci attained genome-wide significance in the GSA group compared with seven in the GSE group. PDE8B was not significantly associated with TSH in the GSA group, despite its strong association in the GSE group. Eight putative causal variants had significantly different effect sizes between groups. There was ≥80% power in the GSA group to detect significant associations with variants in PDE8B, PDE10A, NFIA, and LOC105377480, with higher expected power than in the GSE group for variants in PDE8B, NFIA, and LOC105377480 and similar power for other variants in PDE8B and PDE10A. No additional putative causal variants in PDE8B and PDE10A had effect sizes that differed significantly between the groups; power to identify associations with additional putative causal variants in PDE8B and PDE10A was similar between the groups.
Conclusions: Patterns of genetic associations with TSH differed between identically sized GSA and GSE groups. Failure to replicate the strongest associations previously reported in GSE individuals in our GSA population was not fully explained by differences in allele frequencies or power, assuming similar effect sizes. Larger GSA population GWAS are necessary to confirm our findings and further investigate the contribution of genetic factors to population differences in the distribution of TSH values.
Methylation-associated mutagenesis underlies variation in the mutation spectrum across eukaryotes
bioRxiv (Cold Spring Harbor Laboratory) · 2025-05-30 · 2 citations
preprint · Open access
Mutation spectra vary across genetic and environmental contexts, leading to differences between and within species. Most research on mutation spectrum has focused on the trinucleotide (3-mer) mutation types in mammals, limiting the breadth and depth of variation surveyed. In this study, we use whole-genome resequencing data across 108 eukaryotic species (including mammals, fish, plants, and invertebrates) to characterize pentanucleotide (5-mer) non-coding mutation spectra using a Bayesian approach. Our findings reveal cytosine transition mutability at CpG and (among plants) at CHG sites as the main drivers of variation in mutation spectra across eukaryotes, correlating strongly with genomic CpG and CHG depletion. However, despite the influence of methylation on CpG mutability, genome-wide average CpG methylation levels do not predict CpG transition rates across species and CHG methylation does not predict CHG transition rate, indicating unknown genetic or environmental factors influencing mutation rates at methylated cytosines. Together, our results illustrate the pivotal role of mutagenesis in shaping genome composition across eukaryotes and highlight a gap in knowledge about the mechanisms governing mutation rates.
Sequence context and methylation interact to shape germline mutation rate variation at CpG sites
bioRxiv (Cold Spring Harbor Laboratory) · 2025-11-13 · 1 citation
preprint · Open access · Senior author · Corresponding
A prominent example of sequence context-dependent mutation rate variation is the elevated transition rate at CpG sites, which is largely attributed to cytosine methylation. CpGs with different flanking sequences also exhibit mutation rate variation, but this variation is only partially correlated with context-specific methylation level. Here, we quantify the CpG mutation rate and mutagenic effect of methylation across sequence contexts. Using a regression framework that accounts for recurrent mutations, we analyze human polymorphisms from the gnomAD dataset to estimate mutation rates of unmethylated and methylated CpGs separately in each unique 4-mer or 6-mer context. We find that CpG mutation rate variation in the human genome is shaped by methylation at the focal cytosine, the flanking nucleotides, and interactions between them, suggesting distinct context-dependent mutation patterns for unmethylated and methylated cytosines. Our analysis further reveals that the context effects are driven by largely independent effects of upstream and downstream sequences. Notably, an upstream adenine markedly increases CpG mutation rates regardless of methylation status or downstream sequences. Furthermore, upstream and downstream sequences have similar effects in chimpanzee and rhesus macaque, indicating that some conserved, intrinsic sequence features shape CpG mutability. On the other hand, some inter-species differences, which are especially pronounced at methylated sites on the chimpanzee lineage, point to recent evolutionary changes, possibly in context-specificity of proteins governing DNA demethylation and repair processes.
Author Summary: The DNA sequence surrounding a nucleotide strongly influences how likely it is to mutate. An extreme example is the CpG dinucleotide: cytosines in CpGs mutate far more frequently than other sites in the human genome. This is related to DNA methylation, a chemical modification that occurs almost exclusively at CpGs in vertebrates and makes cytosines more prone to mutations. However, CpGs in different sequence contexts also vary in their mutation rates, and methylation level alone cannot explain this variation. To gain insight into what processes drive this variation, we estimate mutation rates for methylated and unmethylated CpGs in different sequence contexts using human genetic variation data. We find that methylation and neighboring bases interact to influence CpG mutation rates, and that the DNA sequence on either side of the CpG exerts largely independent effects. Extending our analysis to other primates reveals both conserved and species-specific patterns, with differences being especially pronounced at methylated sites on the chimpanzee lineage. Together, our results suggest that while intrinsic DNA sequence features underlie some conserved context effects on CpG mutation rate, inter-species differences may reflect recent evolutionary changes in the mechanisms that regulate DNA demethylation and repair.
Fetal and parental genomes offer mechanistic insights into pregnancy loss
Nature · 2025-06-17
article · 1st author · Corresponding
3d-Printed Modular Platform for Programmable Liquids Transport
SSRN Electronic Journal · 2024-01-01
preprint · Open access · 1st author · Corresponding
Frequent coauthors
- Jonathan K. Pritchard · Stanford University · 49 shared
- Molly Przeworski · Columbia University · 30 shared
- Alfredo Coppa · Utrecht University · 19 shared
- Philippe Froguel · Centre Hospitalier Universitaire de Lille · 16 shared
- Loïc Yengo · University of Queensland · 14 shared
- Daniel Fernandes · Universidade Federal do Rio Grande do Norte · 13 shared
- Yahia Mehdi Seddik Cherifi · University of Algiers Benyoucef Benkhedda · 13 shared
- Alexander Marson · University of California, San Francisco · 12 shared
Education
- 2015 · PhD in Genetics · The University of Chicago