Bernard Kim

Assistant Professor | EEB · Verified

Princeton University · Ecology and Evolutionary Biology

Active 1995–2025

h-index: 25
Citations: 3.1k
Papers: 102 (62 in the last 5 years)
Funding: $131k
Research topics

  • Genetics
  • Biology
  • Evolutionary biology
  • Computer Science
  • Artificial Intelligence
  • Data science
  • Computational biology
  • Ecology
  • Demography
  • Programming language

Selected publications

  • Double trouble: two retrotransposons triggered a cascade of invasions in Drosophila species within the last 50 years

    Nature Communications · 2025-01-09 · 9 citations

    Article · Open access

    Horizontal transfer of genetic material in eukaryotes has rarely been documented over short evolutionary timescales. Here, we show that two retrotransposons, Shellder and Spoink, invaded the genomes of multiple species of the melanogaster subgroup within the last 50 years. Through horizontal transfer, Spoink spread in D. melanogaster during the 1980s, while both Shellder and Spoink invaded D. simulans in the 1990s. Possibly following hybridization, D. simulans infected the island endemic species D. mauritiana (Mauritius) and D. sechellia (Seychelles) with both TEs after 1995. In the same approximate time-frame, Shellder also invaded D. teissieri, a species confined to sub-Saharan Africa. We find that the donors of Shellder and Spoink are likely American Drosophila species from the willistoni, cardini, and repleta groups. Thus, the described cascade of TE invasions could only become feasible after D. melanogaster and D. simulans extended their distributions into the Americas 200 years ago, likely aided by human activity. Our work reveals that cascades of TE invasions, likely initiated by human-mediated range expansions, could have an impact on the genomic and phenotypic evolution of geographically dispersed species. Within a few decades, TEs could invade many species, including island endemics, with distributions very distant from the donor of the TE.

  • Highly contiguous assemblies of 101 drosophilid genomes

    UNC Libraries · 2025-10-11

    Article · Open access

    Over 100 years of studies in Drosophila melanogaster and related species in the genus Drosophila have facilitated key discoveries in genetics, genomics, and evolution. While high-quality genome assemblies exist for several species in this group, they only encompass a small fraction of the genus. Recent advances in long-read sequencing allow high-quality genome assemblies for tens or even hundreds of species to be efficiently generated. Here, we utilize Oxford Nanopore sequencing to build an open community resource of genome assemblies for 101 lines of 93 drosophilid species encompassing 14 species groups and 35 sub-groups. The genomes are highly contiguous and complete, with an average contig N50 of 10.5 Mb and greater than 97% BUSCO completeness in 97/101 assemblies. We show that Nanopore-based assemblies are highly accurate in coding regions, particularly with respect to coding insertions and deletions. These assemblies, along with a detailed laboratory protocol and assembly pipelines, are released as a public resource and will serve as a starting point for addressing broad questions of genetics, ecology, and evolution at the scale of hundreds of species.

  • Strong bias in long-read sequencing prevents assembly of Drosophila melanogaster Y-linked genes

    Genome Research · 2025-10-01 · 1 citation

    Article · Open access

    Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) are generally considered free from sequence composition bias, a key factor, alongside read length, that explains their success in producing high-quality genome assemblies. Indeed, there had been very few reports of bias, the clearest one against GA-rich repeats in the human genome. However, our study reveals a systematic failure of both technologies to sequence and assemble specific exons of Drosophila melanogaster genes, indicating an overlooked limitation. Namely, multiple Y-linked exons are nearly or completely absent from raw reads produced by deep sequencing with state-of-the-art Nanopore (10.4 flow cells, 200× coverage) and PacBio (HiFi 50×). The same exons are accurately assembled using Illumina 67× coverage. We find that these missing exons are consistently located near simple satellite sequences, in which sequencing fails at multiple levels: read initiation (very few reads start within satellite regions), read elongation (satellite-containing reads are shorter on average), and basecalling (quality scores drop as sequencing enters a satellite sequence). These findings challenge the assumption that long-read technologies are unbiased and reveal a critical barrier to assembling sequences near repetitive regions. As large-scale sequencing projects move toward telomere-to-telomere assemblies in a wide range of organisms, recognizing and addressing these biases will be important to achieving truly complete and accurate genomes. Additionally, the underrepresented Y-linked exons provide a valuable benchmark for refining those sequencing technologies while improving the assembly of the highly heterochromatic and often neglected Drosophila Y Chromosome.

  • IMPACT OF PULMONARY HYPERTENSION ON CLINICAL OUTCOMES AMONG PATIENTS UNDERGOING TRANSCATHETER MITRAL VALVE EDGE TO EDGE REPAIR

    Journal of the American College of Cardiology · 2025-03-29

    Article · Open access

  • Manual validation finds only ultra-long long-read sequencing enables faithful, population-level structural variant calling in Drosophila melanogaster euchromatin

    bioRxiv (Cold Spring Harbor Laboratory) · 2025-04-25

    Preprint · Open access

    The increasing accessibility of long-read sequencing and the rapid development of automated variant callers are promoting the generation of population-level structural variation data. However, the effect of the length of long-reads on automated variant callers is not well understood, especially for non-human species. Here we show that only ultra-long long-reads, with read N50s greater than 50 kb, are capable of accurately calling structural variants of any size in Drosophila melanogaster euchromatin. We used Oxford Nanopore Technologies to long-read sequence eight inbred D. melanogaster strains to extremely high coverage (mean 238×), and we then downsampled the reads to create read pools of different length distributions. We assembled genomes from these different read-length pools and used both read-based and assembly-based structural variant callers to call variants in each strain before merging the calls into population-level datasets. We manually validated over 2,300 putative structural variants to assess the accuracy of the variant calls across the different read-length distributions and to determine the cause and rates of false positive errors. We found that more than half of all structural-variant-calling errors stem from misaligned reads that contain mobile elements or are located in repetitive and complex regions. Overall, our results show that long reads need to be at least three times longer than the repetitive and mobile elements found in the genome in order to accurately call structural variants at the population level.

  • Predicting the functional impact of single nucleotide variants in Drosophila melanogaster with FlyCADD

    Genetics · 2025-11-21

    Article · Open access

    Understanding how genetic variants drive phenotypic differences is a major challenge in molecular biology. Single nucleotide polymorphisms form the vast majority of genetic variation and play critical roles in complex, polygenic phenotypes, yet their functional impact is poorly understood from traditional gene-level analyses. In-depth knowledge about the impact of single nucleotide polymorphisms has broad applications in health and disease, population genomic, and evolution studies. The wealth of genomic data and available functional genetic tools make Drosophila melanogaster an ideal model species for studies at single nucleotide resolution. However, to leverage these resources for genotype-phenotype research and potentially combine it with the power of functional genetics, it is essential to develop techniques to predict functional impact and causality of single nucleotide variants. Here, we present FlyCADD, a functional impact prediction tool for single nucleotide variants in D. melanogaster. FlyCADD, based on the Combined Annotation-Dependent Depletion (CADD) framework, integrates over 650 genomic features (including conservation scores, GC content, and DNA secondary structure) into a single metric reflecting a variant's predicted impact on evolutionary fitness. FlyCADD provides impact prediction scores for any single nucleotide variant on the D. melanogaster genome. We demonstrate the power of FlyCADD for typical applications, such as the ranking of phenotype-associated variants to prioritize variants for follow-up studies, evaluation of naturally occurring polymorphisms, and refining of CRISPR-Cas9 experimental design. FlyCADD provides a powerful framework for interpreting the functional impact of any single nucleotide variant in D. melanogaster, thereby improving our understanding of genotype-phenotype connections.

  • Evolutionary adaptation under climate change: Aedes sp. demonstrates potential to adapt to warming

    Proceedings of the National Academy of Sciences · 2025-01-07 · 22 citations

    Article · Open access

    Climate warming is expected to shift the distributions of mosquitoes and mosquito-borne diseases, promoting expansions at cool range edges and contractions at warm range edges. However, whether mosquito populations could maintain their warm edges through evolutionary adaptation remains unknown. Here, we investigate the potential for thermal adaptation in Aedes sierrensis, a congener of the major disease vector species that experiences large thermal gradients in its native range, by assaying tolerance to prolonged and acute heat exposure, and its genetic basis in a diverse, field-derived population. We found pervasive evidence of heritable genetic variation in mosquito heat tolerance, and phenotypic trade-offs in tolerance to prolonged versus acute heat exposure. Further, we found genomic variation associated with prolonged heat tolerance was clustered in several regions of the genome, suggesting the presence of larger structural variants such as chromosomal inversions. A simple evolutionary model based on our data estimates that the maximum rate of evolutionary adaptation in mosquito heat tolerance will exceed the projected rate of climate warming, implying the potential for mosquitoes to track warming via genetic adaptation.

  • Strong sequencing bias in Nanopore and PacBio prevents assembly of Drosophila melanogaster Y-linked genes

    bioRxiv (Cold Spring Harbor Laboratory) · 2025-03-01 · 4 citations

    Preprint · Open access

    Nanopore and PacBio are generally considered free from sequence composition bias, a key factor, alongside read length, that explains their success in producing high-quality genome assemblies. However, our study reveals a systematic failure of both technologies to sequence and assemble specific exons of Drosophila melanogaster genes, indicating an overlooked limitation. Namely, multiple Y-linked exons are nearly or completely absent from raw reads produced by deep sequencing with state-of-the-art Nanopore (10.4 flow cells, 200× coverage) and PacBio (HiFi 50×). The same exons are accurately assembled using Illumina 65× coverage. We found that these missing exons are consistently located near simple satellite sequences, where sequencing fails at multiple levels: read initiation (very few reads start within satellite regions), read elongation (satellite-containing reads are shorter on average), and base-calling (quality scores drop as sequencing enters a satellite sequence). These findings challenge the assumption that long-read technologies are unbiased and reveal a critical barrier to assembling sequences near repetitive regions. As large-scale sequencing projects move towards telomere-to-telomere assemblies in a wide range of organisms, recognizing and addressing these biases will be important to achieving truly complete and accurate genomes. Additionally, the underrepresented Y-linked exons provide a valuable benchmark for refining those sequencing technologies while improving the assembly of the highly heterochromatic and often neglected Drosophila Y chromosome.

  • Comparative gene annotation and orthology assignments across 301 species of Drosophilidae

    bioRxiv (Cold Spring Harbor Laboratory) · 2025-04-15 · 4 citations

    Preprint · Open access

    High-quality genome annotations are essential if we are to address central questions in comparative genomics, such as the origin of new genes, the drivers of genome size variation, and the evolutionary forces shaping gene content and structure. Here, we present protein-coding gene annotations for 301 species of the family Drosophilidae, generated using the Comparative Annotation Toolkit (CAT) and BRAKER3, and incorporating available RNA-seq and protein evidence. We take a comparative phylogenetic approach to annotation, with the aim of improving consistency and accuracy, and to generate a robust set of gene annotations and orthology assignments. We analyze our annotations using a phylogenetic mixed-model approach and find that gene number and CDS length exhibit moderate phylogenetic heritability (40% and 9.7%, respectively). For comparison, we also present analyses using a subset of the 215 highest quality genomes, although the findings were not markedly different. Our work suggests that while evolutionary history contributes to variation in these traits, species-specific factors, including assembly error, play a substantial role in shaping observed differences. To illustrate the utility of our annotations for comparative analyses, we investigate codon usage bias and amino acid composition across Drosophilidae. We find that codon usage is correlated with overall GC content and evolves slowly, but that it is also strongly shaped by selection, such that, in general, species with the strongest selection on synonymous codon usage show the lowest GC bias in third codon positions. This comparative annotation dataset forms part of an on-going collaborative project to sequence and annotate all species of Drosophilidae, with data and annotations being made rapidly and freely available on an on-going basis. We hope that this effort will serve as a foundation for studies in evolutionary and functional genomics and comparative biology across Drosophilidae.

  • Predicting the functional impact of single nucleotide variants in Drosophila melanogaster with FlyCADD

    bioRxiv (Cold Spring Harbor Laboratory) · 2025-03-06

    Preprint · Open access

    Understanding how genetic variants drive phenotypic differences is a major challenge in molecular biology. Single nucleotide polymorphisms form the vast majority of genetic variation and play critical roles in complex, polygenic phenotypes, yet their functional impact is poorly understood from traditional gene-level analyses. In-depth knowledge about the impact of single nucleotide polymorphisms has broad applications in health and disease, population genomic and evolution studies. The wealth of genomic data and available functional genetic tools make Drosophila melanogaster an ideal model species for studies at single nucleotide resolution. However, to leverage these resources for genotype-phenotype research and potentially combine it with the power of functional genetics, it is essential to develop techniques to predict functional impact and causality of single nucleotide variants. Here, we present FlyCADD, a functional impact prediction tool for single nucleotide variants in D. melanogaster. FlyCADD, based on the Combined Annotation-Dependent Depletion (CADD) framework, integrates over 650 genomic features (including conservation scores, GC content, and DNA secondary structure) into a single metric reflecting a variant’s predicted impact on evolutionary fitness. FlyCADD provides impact prediction scores for any single nucleotide variant on the D. melanogaster genome. We demonstrate the power of FlyCADD for typical applications, such as the ranking of phenotype-associated variants to prioritize variants for follow-up studies, evaluation of naturally occurring polymorphisms, and refining of CRISPR-Cas9 experimental design. FlyCADD provides a powerful framework for interpreting the functional impact of any single nucleotide variant in D. melanogaster, thereby improving our understanding of genotype-phenotype connections.

    Article summary: Single nucleotide polymorphisms (SNPs), the most common form of genomic variation, drive micro-evolution and adaptation. In Drosophila melanogaster, many SNPs are associated with phenotypes, yet functional validation is rare and experimentally challenging. FlyCADD is a new impact prediction tool that integrates D. melanogaster genome annotations into a single score predicting SNP impact. FlyCADD can be applied to distinguish causal from neutral variants, prioritize variants prior to functional studies, and interpret natural variation, thereby improving understanding of genotype-phenotype relationships.

Education

  • PhD, Ecology and Evolutionary Biology

    UCLA Life Sciences

    2018