Resume-aware faculty matching

Find professors who actually fit you

Upload your resume. Four AI agents analyze your background, rank the faculty who fit, inspect their recent research, and help you draft outreach — grounded in their actual work, not templates.

Free to start · No credit card · Cancel anytime

Martin Jinye Zhang

Assistant Professor · Verified

Carnegie Mellon University · Ray and Stephanie Lane Computational Biology Department

Active 2016–2026

h-index: 21
Citations: 4.5k
Papers: 79 (58 in the last 5 years)

About

Martin Jinye Zhang is an Assistant Professor in the Ray and Stephanie Lane Computational Biology Department at Carnegie Mellon University. He is based at the School of Computer Science (5000 Forbes Avenue, Pittsburgh, PA), where he conducts research and teaches in computational biology.

Research topics

  • Biology
  • Genetics
  • Computational biology
  • Evolutionary biology
  • Cell biology

Selected publications

  • SKILLFOUNDRY: Building Self-Evolving Agent Skill Libraries from Heterogeneous Scientific Resources

    arXiv (Cornell University) · 2026-04-05

    preprint · Open access

    Modern scientific ecosystems are rich in procedural knowledge across repositories, APIs, scripts, notebooks, documentation, databases, and papers, yet much of this knowledge remains fragmented across heterogeneous artifacts that agents cannot readily operationalize. This gap between abundant scientific know-how and usable agent capabilities is a key bottleneck for building effective scientific agents. We present SkillFoundry, a self-evolving framework that converts such resources into validated agent skills, reusable packages that encode task scope, inputs and outputs, execution steps, environment assumptions, provenance, and tests. SkillFoundry organizes a target domain as a domain knowledge tree, mines resources from high-value branches, extracts operational contracts, compiles them into executable skill packages, and then iteratively expands, repairs, merges, or prunes the resulting library through a closed-loop validation process. SkillFoundry produces a substantially novel and internally valid skill library, with 71.1% of mined skills differing from existing skill libraries such as SkillHub and SkillSMP. We demonstrate that these mined skills improve coding agent performance on five of the six MoSciBench datasets. We further show that SkillFoundry can design new task-specific skills on demand for concrete scientific objectives, and that the resulting skills substantially improve performance on two challenging genomics tasks: cell type annotation and the scDRS workflow. Together, these results show that automatically mined skills improve agent performance on benchmarks and domain-specific tasks, expand coverage beyond hand-crafted skill libraries, and provide a practical foundation for more capable scientific agents.

  • MultiSuSiE improves multi-ancestry fine-mapping in All of Us whole-genome sequencing data

    Nature Genetics · 2026-01-01 · 1 citation

    article

  • martinjzhang/LDSPEC: ldspec paper

    Open MIND · 2026-04-26

    other · Open access · 1st author · Corresponding

    code corresponding to the publication

  • martinjzhang/LDSPEC: ldspec paper

    Zenodo (CERN European Organization for Nuclear Research) · 2026-04-26

    other · Open access · 1st author · Corresponding

    code corresponding to the publication

  • TusoAI: Agentic Optimization for Scientific Methods

    ArXiv.org · 2025-09-28 · 1 citation

    preprint · Open access · Senior author

    Scientific discovery is often slowed by the manual development of computational tools needed to analyze complex experimental data. Building such tools is costly and time-consuming because scientists must iteratively review literature, test modeling and scientific assumptions against empirical data, and implement these insights into efficient software. Large language models (LLMs) have demonstrated strong capabilities in synthesizing literature, reasoning with empirical data, and generating domain-specific code, offering new opportunities to accelerate computational method development. Existing LLM-based systems either focus on performing scientific analyses using existing computational methods or on developing computational methods or models for general machine learning without effectively integrating the often unstructured knowledge specific to scientific domains. Here, we introduce TusoAI, an agentic AI system that takes a scientific task description with an evaluation function and autonomously develops and optimizes computational methods for the application. TusoAI integrates domain knowledge into a knowledge tree representation and performs iterative, domain-specific optimization and model diagnosis, improving performance over a pool of candidate solutions. We conducted comprehensive benchmark evaluations demonstrating that TusoAI outperforms state-of-the-art expert methods, MLE agents, and scientific AI agents across diverse tasks, such as single-cell RNA-seq data denoising and satellite-based earth monitoring. Applying TusoAI to two key open problems in genetics improved existing computational methods and uncovered novel biology, including 9 new associations between autoimmune diseases and T cell subtypes and 7 previously unreported links between disease variants linked to their target genes. Our code is publicly available at https://github.com/Alistair-Turcan/TusoAI.

  • Genetic and Cellular Architecture of Breast Cancer Risk Across Ancestries

    medRxiv · 2025-08-24

    preprint · Open access

    Background: Breast cancer genome-wide association studies (GWAS) have identified more than 200 susceptibility loci, but most studies are dominated by European and East Asian populations. Methods: We analyzed breast cancer GWAS summary statistics from African (AFR), East Asian (EAS), European (EUR), and Hispanic/Latina (H/L) samples (159,297 cases and 212,102 controls). We estimated logit-scale SNP-based heritability, polygenicity, and cross-ancestry genetic correlation, partitioned heritability across functional annotations, and integrated GWAS results with the Tabula Sapiens single-cell atlas using scDRS+. Results: The logit-scale heritability of breast cancer ranged from h² = 0.47 (SE = 0.07) in EAS to h² = 0.61 (SE = 0.10) in AFR, with no significant differences across ancestries (p = 0.63). The estimated number of susceptibility markers in a sparse normal-mixture effects model also varied from 4,446 (SE = 3,100) in EAS to 8,308 (SE = 2,751) in AFR, but differences were not significant across ancestries (p = 0.55). Cross-sample genetic correlations varied, with the strongest correlation between EUR and EAS (ρ = 0.79, SE = 0.08) and weakest between AFR and H/L (ρ = 0.26, SE = 0.24). Regulatory annotations were enriched for breast cancer heritability across samples. Integration with single-cell expression profiles implicated ancestry-shared associations with innate immune, secretory epithelial, and stromal cell types. Conclusion: These results indicate substantial cross-ancestry sharing of breast cancer polygenic architecture, highlight a consistent contribution of regulatory variation, and identify convergent cellular contexts that motivate functional follow-up and inform expectations for the transferability and attainable performance of common-variant risk prediction across populations.

  • Principal Components for Practice‐Oriented Measurement of Running Technique: A Proof‐Of‐Concept Study

    European Journal of Sport Science · 2025-06-27

    article · Open access

    This study aims to construct valid and practically applicable running technique measures using principal component analysis (PCA). We hypothesized that data-driven principal movements (PMs), derived from deliberately instructed opposite technique variations, would significantly distinguish these variations and could serve as quantitative measures of running technique as described by practitioners. 20 experienced runners were instructed to vary 14 distinct running technique elements into two opposing directions (e.g., forward and backward lean for a technique element representing horizontal movements). Elements and their variations were selected based on visual descriptions from practitioners found in running literature. Kinematic data were collected on a treadmill using optical motion capture and analyzed using a PCA-based approach to determine running-specific technique measures per technique element. By combining trials with opposing technique variations, variance in the data was purposefully produced, which in turn caused the resultant principal movements to align with the intended technique element. For all of the 14 technique elements, a valid measure (in the sense that the inputted opposite variations were significantly distinguishable within this measure) could be constructed. The measures could further be applied to the habitual running technique of the group of tested runners. The results of this study demonstrate the construct validity and applicability of the presented approach to measure running technique. This method can provide runners and coaches with valuable feedback and will enable future studies to investigate running technique, quantified through practice-informed measures, in the context of performance, injury risk, or adaptations to equipment.

  • Dissecting the genetic complexity of myalgic encephalomyelitis/chronic fatigue syndrome via deep learning-powered genome analysis

    medRxiv · 2025-04-16 · 7 citations

    preprint · Open access

    Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) is a complex, heterogeneous, and systemic disease defined by a suite of symptoms, including unexplained persistent fatigue, post-exertional malaise (PEM), cognitive impairment, myalgia, orthostatic intolerance, and unrefreshing sleep. The disease mechanism of ME/CFS is unknown, with no effective curative treatments. In this study, we present a multi-site ME/CFS whole-genome analysis, which is powered by a novel deep learning framework, HEAL2. We show that HEAL2 not only has predictive value for ME/CFS based on personal rare variants, but also links genetic risk to various ME/CFS-associated symptoms. Model interpretation of HEAL2 identifies 115 ME/CFS-risk genes that exhibit significant intolerance to loss-of-function (LoF) mutations. Transcriptome and network analyses highlight the functional importance of these genes across a wide range of tissues and cell types, including the central nervous system (CNS) and immune cells. Patient-derived multi-omics data implicate reduced expression of ME/CFS risk genes within ME/CFS patients, including in the plasma proteome, and the transcriptomes of B and T cells, especially cytotoxic CD4 T cells, supporting their disease relevance. Pan-phenotype analysis of ME/CFS genes further reveals the genetic correlation between ME/CFS and other complex diseases and traits, including depression and long COVID-19. Overall, HEAL2 provides a candidate genetic-based diagnostic tool for ME/CFS, and our findings contribute to a comprehensive understanding of the genetic, molecular, and cellular basis of ME/CFS, yielding novel insights into therapeutic targets. Our deep learning model also offers a potent, broadly applicable framework for parallel rare variant analysis and genetic prediction for other complex diseases and traits.

  • Fine-mapping causal tissues and genes at disease-associated loci

    Nature Genetics · 2025-01-01 · 13 citations

    article · Open access

Frequent coauthors

  • James Zou

    Stanford University

    58 shared
  • Alkes L. Price

    Broad Institute

    32 shared
  • Soumya Raychaudhuri

    Brigham and Women's Hospital

    27 shared
  • Angela Oliveira Pisco

    Chan Zuckerberg Initiative (United States)

    24 shared
  • Kangcheng Hou

    University of California, Los Angeles

    23 shared
  • Xilin Jiang

    21 shared
  • Michael Inouye

    University of Cambridge

    18 shared
  • Saori Sakaue

    Harvard University

    17 shared

Education

  • Ph.D., Computational Biology

    Carnegie Mellon University

  • M.S., Computational Biology

    Carnegie Mellon University
