
Eleazar Eskin
Professor · University of California, Los Angeles · Computer Science
Active 1980–2026
About
Eleazar Eskin is a Professor of Computer Science at UCLA Samueli School of Engineering, serving as the Department Chair of Computational Medicine. His research interests include bioinformatics, genetics, genomics, and machine learning. Eskin has contributed to the development of innovative COVID-19 testing technologies, including a $10 test capable of processing thousands of results in a day, which received FDA authorization. He earned his PhD from Columbia University in 2002 and has been recognized with awards such as the Okawa Foundation Research Award in 2008 and a Sloan Research Fellowship in 2009. Eskin's work focuses on applying computational methods to biological and medical challenges, advancing the fields of genomics and personalized medicine.
Research topics
- Genetics
- Biology
- Computer Science
- Computational biology
- Data science
- Medicine
- Machine Learning
- Artificial Intelligence
- Algorithm
- Evolutionary biology
- Cell biology
- Programming language
- Ecology
- Pathology
- Internal medicine
Selected publications
bioRxiv (Cold Spring Harbor Laboratory) · 2026-04-20
article · Open access
Sequencing the respiratory tract transcriptome has the potential to provide insights into infectious pathogens and the host's immune response. While DNA-based sequencing is more standard in clinical laboratories due to its stability, RNA assays offer unique advantages. RNA reflects dynamic physiological changes, and for RNA viruses, viral RNA particles directly represent copies of the viral genome, enabling greater diagnostic sensitivity. However, RNA's susceptibility to degradation remains a significant challenge, particularly in RNase-rich specimens like saliva. To address this, we conducted a systematic, combinatorial evaluation of 24 distinct mNGS workflows, crossing eight nucleic acid extraction methods with three RNA-Seq library preparation protocols. Remnant saliva samples (n = 6) were pooled and spiked with MS2 phage as a control. The SARS-CoV-2 virus was spiked into half of the samples, which were extracted using the eight different extraction methods (n = 3) and compared using RNA Integrity Number equivalent (RINe) scores and RNA concentration. The extracted RNA was then processed across the three library construction methods and subjected to short-read sequencing to assess all 24 combinations head-to-head. We compared methods based on viral read recovery and found that RINe and concentration did not correlate with viral detection. The Zymo Quick-RNA Magbead kit and the Tecan Revelo RNA-Seq High-Sensitivity RNA library kit were the extraction and library-preparation kits that yielded the most SARS-CoV-2 reads, respectively. Importantly, our combinatorial analysis revealed that any small variability attributable to different nucleic acid extraction methods was heavily overshadowed by differences in quality attributable to the RNA-Seq library preparation methods.
These findings challenge the reliance on conventional RNA quality metrics for clinical metagenomics and underscore the need to redefine extraction quality standards for mNGS applications. IMPORTANCE: mNGS is a powerful and unbiased approach towards pathogen detection that has mostly been applied to blood and cerebrospinal fluid samples. However, mNGS has recently been applied to more areas including the respiratory pathogen detection space, with potential applications in both in-patient diagnostics and public health surveillance. Saliva samples are an ideal sample type for these use cases since they can be collected non-invasively. However, saliva is also a challenging sample type due to its high RNase activity and often yields low-quality nucleic acid. This study explores the feasibility of using saliva specimens in mNGS with contrived SARS-CoV-2 samples to optimize the combination of two factors: nucleic acid extraction and RNA-seq library preparation. Exploration in this area could enhance the sensitivity of saliva-based mNGS assays, with the goal of future expansion of this specimen type in clinical diagnostics and public health surveillance. Key Points: The choice of RNA-Seq library preparation kit has a greater impact on pathogen detection than the nucleic acid extraction method. The combination of Zymo Quick-RNA Magbead extraction kit and TECAN Revelo RNA-Seq High Sensitivity RNA library kit recovered the highest percentage of total SARS-CoV-2 reads. RNA quantity and RINe score do not correlate with viral read capture, indicating a need for an alternative metric to assess RNA quality for downstream mNGS clinical diagnostics.
DRYAD · 2026-01-31
dataset · Open access
The human basal ganglia (BG), subcortical nuclei fundamental to motor regulation and cognitive modulation, is constructed from neurons produced during gestation in the adjacent ganglionic eminences (GEs). GEs are transient structures in the ventral prenatal brain that also generate GABAergic inhibitory neurons, which migrate to destinations in the BG, cortex, and other destinations. This study aims to elucidate the epigenomic and 3D-genomic dynamics involved in the specification and maturation of GEs and GE-derived neurons, using single-nucleus methyl-3C sequencing (snm3C-seq), highly-multiplexed spatial transcriptomics, and chromatin+RNA single-molecule imaging. Our multi-modal data support a heterogeneous temporal progression across GE subregions, with the lateral GE (LGE) showing declining neurogenic activity in mid-gestation and caudal GE (CGE) exhibiting ongoing developmental progression through infancy. We identified regulatory programs that specify subtypes of BG principal cells, medium spiny neurons (MSN), via synchronized maturation of the 3D-epigenome. In infant brains, we found a transient short-range enriched (SE) chromatin conformation during the transition between oligodendrocyte progenitors (OPCs) and oligodendrocytes (ODCs), and a temporary shift toward Long-range Enriched (LE) chromatin conformation in projection neurons, extending previous works showing the differentiation of neurons and glial cells is associated with permanent SE and LE conformation, respectively. Lastly, we found that gene regulatory regions active in MSNs were enriched in loci associated with genetic risk for neuropsychiatric disease. Our study delineates the highly complex, lineage-specific 3D genomic dynamics in ventral progenitors and basal ganglia populations of the perinatal human brain.
bioRxiv (Cold Spring Harbor Laboratory) · 2025-06-21 · 2 citations
preprint · Open access
Autism spectrum disorder (ASD) is a common, genetically and clinically heterogeneous neurodevelopmental condition. Despite this diversity, studies of postmortem brain tissue have revealed convergent molecular changes across the cortex, including reduced synaptic function in subsets of excitatory and inhibitory neurons and increased glial reactivity. Whether these features are reflected in cell type-specific epigenetic signatures remains unknown. Here, we present the first single-cell analysis of DNA methylation (DNAm) coupled with transcriptomics in ASD. Using snmCT-seq, we profiled DNAm and gene expression from over 60,000 nuclei across 49 donors. We identified thousands of differentially methylated regions (DMRs) in ASD, enriched in promoters and regulatory elements active during both prenatal development and in adult cortex. ASD-related methylation changes were spatially localized but uncorrelated with gene expression, and were small in magnitude compared to robust age-associated effects. Age-DMRs were concentrated in excitatory neurons, enriched in known cognitive aging pathways, and revealed distinct roles for CG and non-CG methylation in the aging brain. Finally, we explored age-by-diagnosis interactions, identifying a reduction in inhibitory neuron abundance with age in ASD relative to controls, highlighting this area as a promising direction for future research. Highlights: We generate a single cell multi-omic dataset, jointly profiling DNA methylation and gene expression in autistic and neurotypical donors. We identify thousands of cell type informed differentially methylated regions (DMRs) in ASD, particularly in excitatory neurons from superficial cortical lamina and microglia. ASD-DMRs are enriched in promoters and known regulatory regions, but not strongly tied to gene expression. Age effects on DNA methylation are profound, cell type specific, and concentrated in excitatory neurons.
Differing Genetics of Saline and Cocaine Self‐Administration in the Hybrid Mouse Diversity Panel
Genes Brain & Behavior · 2025-06-01
article · Open access
To identify genes that regulate the response to the potentially addictive drug cocaine, we performed a control experiment using genome-wide association studies (GWASs) and RNA-Seq of a panel of inbred and recombinant inbred mice undergoing intravenous self-administration of saline. A linear mixed model increased statistical power for the analysis of the longitudinal behavioral data, which was acquired over 10 days. A total of 145 loci were identified for saline compared to 17 for the corresponding cocaine GWAS. Only one locus overlapped. Transcriptome-wide association studies (TWASs) using RNA-Seq data from the nucleus accumbens and medial frontal cortex identified 5031434O11Rik and Zfp60 as significant for saline self-administration. Two other genes, Myh4 and Npc1, were nominated based on proximity to loci for multiple endpoints or a cis locus regulating expression. All four genes have previously been implicated in locomotor activity, despite the absence of a strong relationship between saline taking and distance traveled in the open field. Our results indicate a distinct genetic basis for saline and cocaine self-administration, and suggest some common genes for saline self-administration and locomotor activity.
Perceptual and technical barriers in sharing and formatting metadata accompanying omics studies
Cell Genomics · 2025-04-10 · 15 citations
review · Open access
Metadata, or "data about data," is essential for organizing, understanding, and managing large-scale omics datasets. It enhances data discovery, integration, and interpretation, enabling reproducibility, reusability, and secondary analysis. However, metadata sharing remains hindered by perceptual and technical barriers, including the lack of uniform standards, privacy concerns, study design limitations, insufficient incentives, inadequate infrastructure, and a shortage of trained personnel. These challenges compromise data reliability and obstruct integrative meta-analyses. Addressing these issues requires standardization, education, stronger roles for journals and funding agencies, and improved incentives and infrastructure. Looking ahead, emerging technologies such as artificial intelligence and machine learning may offer promising solutions to automate metadata processes, increasing accuracy and scalability. Fostering a collaborative culture of metadata sharing will maximize the value of omics data, accelerating innovation and scientific discovery.
Packaging and containerization of computational methods
Nature Protocols · 2024-04-02 · 23 citations
review
VISTA: an integrated framework for structural variant discovery
Briefings in Bioinformatics · 2024-07-25 · 3 citations
article · Open access
Structural variation (SV) refers to insertions, deletions, inversions, and duplications in human genomes. SVs are present in approximately 1.5% of the human genome. Still, this small subset of genetic variation has been implicated in the pathogenesis of psoriasis, Crohn's disease and other autoimmune disorders, autism spectrum and other neurodevelopmental disorders, and schizophrenia. Since identifying structural variants is an important problem in genetics, several specialized computational techniques have been developed to detect structural variants directly from sequencing data. With advances in whole-genome sequencing (WGS) technologies, a plethora of SV detection methods have been developed. However, dissecting SVs from WGS data remains a challenge, with the majority of SV detection methods prone to a high false-positive rate, and no existing method able to precisely detect a full range of SVs present in a sample. Previous studies have shown that none of the existing SV callers can maintain high accuracy across various SV lengths and genomic coverages. Here, we report an integrated structural variant calling framework, Variant Identification and Structural Variant Analysis (VISTA), that leverages the results of individual callers using a novel and robust filtering and merging algorithm. In contrast to existing consensus-based tools which ignore the length and coverage, VISTA overcomes this limitation by executing various combinations of top-performing callers based on variant length and genomic coverage to generate SV events with high accuracy. We evaluated the performance of VISTA on comprehensive gold-standard datasets across varying organisms and coverage. We benchmarked VISTA using the Genome-in-a-Bottle gold standard SV set, haplotype-resolved de novo assemblies from the Human Pangenome Reference Consortium, along with an in-house polymerase chain reaction (PCR)-validated mouse gold standard set.
VISTA maintained the highest F1 score among top consensus-based tools measured using a comprehensive gold standard across both mouse and human genomes. VISTA also has an optimized mode, where the calls can be optimized for precision or recall. VISTA-optimized can attain 100% precision and the highest sensitivity among other variant callers. In conclusion, VISTA represents a significant advancement in structural variant calling, offering a robust and accurate framework that outperforms existing consensus-based tools and sets a new standard for SV detection in genomic research.
Multi-class Modeling Identifies Shared Genetic Risk for Late-onset Epilepsy and Alzheimer’s Disease
medRxiv · 2024-02-06 · 2 citations
preprint · Open access
Background: Previous studies have established a strong link between late-onset epilepsy (LOE) and Alzheimer’s disease (AD). However, their shared genetic risk beyond the APOE gene remains unclear. Our study sought to examine the shared genetic factors of AD and LOE, interpret the biological pathways involved, and evaluate how AD onset may be mediated by LOE and shared genetic risks. Methods: We defined phenotypes using phecodes mapped from diagnosis codes, with patients’ records aged 60-90. A two-step Least Absolute Shrinkage and Selection Operator (LASSO) workflow was used to identify shared genetic variants based on prior AD GWAS integrated with functional genomic data. We calculated an AD-LOE shared risk score and used it as a proxy in a causal mediation analysis. We used electronic health records from an academic health center (UCLA Health) for discovery analyses and validated our findings in a multi-institutional EHR database (All of Us). Results: The two-step LASSO method identified 34 shared genetic loci between AD and LOE, including the APOE region. These loci were mapped to 65 genes, which showed enrichment in molecular functions and pathways such as tau protein binding and lipoprotein metabolism. Individuals with high predicted shared risk scores have a higher risk of developing AD, LOE, or both in their later life compared to those with low-risk scores. LOE partially mediates the effect of AD-LOE shared genetic risk on AD (15% proportion mediated on average). Validation results from All of Us were consistent with findings from the UCLA sample. Conclusions: We employed a machine learning approach to identify shared genetic risks of AD and LOE. In addition to providing substantial evidence for the significant contribution of the APOE-TOMM40-APOC1 gene cluster to shared risk, we uncovered novel genes that may contribute.
Our study is one of the first to utilize All of Us genetic data to investigate AD, and provides valuable insights into the potential common and disease-specific mechanisms underlying AD and LOE, which could have profound implications for the future of disease prevention and the development of targeted treatment strategies to combat the co-occurrence of these two diseases.
ESC Heart Failure · 2024-04-18 · 16 citations
review · Open access
Existing risk prediction models for hospitalized heart failure patients are limited. We identified patients hospitalized with a diagnosis of heart failure between 7 May 2013 and 26 April 2022 from a large academic, quaternary care medical centre (training cohort). Demographics, medical comorbidities, vitals, and labs were collected and were used to construct random forest machine learning models to predict in-hospital mortality. Models were compared with logistic regression, and to commonly used heart failure risk scores. The models were subsequently validated in patients hospitalized with a diagnosis of heart failure from a second academic, community medical centre (validation cohort). The entire cohort comprised 21 802 patients, of which 14 539 were in the training cohort and 7263 were in the validation cohort. The median age (25th-75th percentile) was 70 (58-82) for the entire cohort, 43.2% were female, and 6.7% experienced inpatient mortality. In the overall cohort, 7621 (35.0%) patients had heart failure with reduced ejection fraction (EF ≤ 40%), 1271 (5.8%) had heart failure with mildly reduced EF (EF 41-49%), and 12 910 (59.2%) had heart failure with preserved EF (EF ≥ 50%). Random forest models in the validation cohort demonstrated a c-statistic (95% confidence interval) of 0.96 (0.95-0.97), sensitivity (SN) of 87.3%, and specificity (SP) of 90.6% for the prediction of in-hospital mortality. Models for those with HFrEF demonstrated a c-statistic of 0.96 (0.94-0.98), SN 88.2%, and SP 91.0%, and those for patients with HFpEF showed a c-statistic of 0.95 (0.93-0.97), SN 87.4%, and SP 89.5% for predicting in-hospital mortality.
The random forest model significantly outperformed logistic regression (c-statistic 0.87, SN 75.9%, and SP 86.9%), and current existing risk scores including the Acute Decompensated Heart Failure National Registry risk score (c-statistic of 0.70, SN 69%, and SP 62%), and the Get With the Guidelines-Heart Failure risk score (c-statistic 0.69, SN 67%, and SP 63%); P < 0.001 for comparison. Machine learning models built from commonly recorded patient information can accurately predict in-hospital mortality among patients hospitalized with a diagnosis of heart failure.
BioLLMBench: A Comprehensive Benchmarking of Large Language Models in Bioinformatics
Research Square · 2024-01-16
preprint · Open access
Large Language Models (LLMs) have shown great promise in their knowledge integration and problem-solving capabilities, but their ability to assist in bioinformatics research has not been systematically evaluated. To bridge this gap, we present BioLLMBench, a novel benchmarking framework coupled with a scoring metric scheme for comprehensively evaluating LLMs in solving bioinformatics tasks. Through BioLLMBench, we conducted a thorough evaluation of 2,160 experimental runs of the three most widely used models, GPT-4, Bard and LLaMA, focusing on 36 distinct tasks within the field of bioinformatics. The tasks come from six key areas of emphasis within bioinformatics that directly relate to the daily challenges and tasks faced by individuals within the field. These areas are domain expertise, mathematical problem-solving, coding proficiency, data visualization, summarizing research papers, and developing machine learning models. The tasks also span across varying levels of complexity, ranging from fundamental concepts to expert-level challenges. Each key area was evaluated using seven specifically designed task metrics, which were then used to conduct an overall evaluation of the LLM’s response. To enhance our understanding of model responses under varying conditions, we implemented a Contextual Response Variability Analysis. Our results reveal a diverse spectrum of model performance, with GPT-4 leading in all tasks except mathematical problem solving. GPT-4 was able to achieve an overall proficiency score of 91.3% in domain knowledge tasks, while Bard excelled in mathematical problem-solving with a 97.5% success rate. While GPT-4 outperformed in machine learning model development tasks with an average accuracy of 65.32%, both Bard and LLaMA were unable to generate executable end-to-end code.
All models faced considerable challenges in research paper summarization, with none of them exceeding a 40% score in our evaluation using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score, highlighting a significant area for future improvement. We observed an increase in model performance variance when using a new chatting window compared to using the same chat, although the average scores between the two contextual environments remained similar. Lastly, we discuss various limitations of these models and acknowledge the risks associated with their potential misuse.
Recent grants
Undergraduate Research Experience in Neuropsychiatric Genomics
NIH · $429k · 2016–2021
Collaborative Research: Design and Analysis of Compressed Sensing DNA Microarrays
NSF · $300k · 2007–2011
III: Medium: Meta-analysis reinterpreted using causal graphs
NSF · $1.1M · 2013–2019
III: Small: Inference of Causal Regulatory Relationships from Genetic Studies
NSF · $499k · 2009–2013
NSF · $700k · 2011–2017
Frequent coauthors
- Yi Zhang · Hubei University of Arts and Science · 186 shared
- Serghei Mangul · University of Southern California · 92 shared
- Buhm Han · Seoul National University · 89 shared
- Farhad Hormozdiari · Google (United States) · 82 shared
- Chun Ye · Gladstone Institutes · 82 shared
- Daniel T. O’Connor · University of California, Davis · 78 shared
- Fangwen Rao · The Ohio State University Wexner Medical Center · 76 shared
- Caroline M. Nievergelt · University of California, San Diego · 68 shared
Education
- 2003
Ph.D., Computer Science
University of California, Los Angeles
- 1999
M.S., Computer Science
University of California, Los Angeles
- 1998
B.S., Computer Science
University of California, Los Angeles
Awards & honors
- Okawa Foundation Research Award - 2008
- Sloan Research Fellowship - 2009