Resume-aware faculty matching

Find professors who actually fit you

Upload your resume. Four AI agents analyze your background, rank the faculty who fit, inspect their recent research, and help you draft outreach — grounded in their actual work, not templates.

Free to start · No credit card · Cancel anytime
Top matches · Balanced preset
Dr. Sarah Chen
Stanford · Interpretability · NLP
91
Dr. Marcus Holloway
MIT · Robotics · RL
84
Dr. Aisha Okonkwo
CMU · Fairness · HCI
82
Nova · Professor Researcher · re-ranking top 20…
Jian Ma

Ray and Stephanie Lane Professor of Computational Biology · Verified

Carnegie Mellon University · Ray and Stephanie Lane Computational Biology Department

Active 1997–2025

h-index: 60
Citations: 22.7k
Papers: 368 (150 in the last 5 years)
Funding: $36.6M (3 active)

About

Jian Ma is the Ray and Stephanie Lane Professor of Computational Biology in the School of Computer Science at Carnegie Mellon University, a position he has held since September 2021. Prior to that, he was a Full Professor from July to August 2021 and an Associate Professor from January 2016 to June 2021. He is also an affiliated faculty member in Machine Learning at Carnegie Mellon.

Before joining Carnegie Mellon, Jian Ma was an Assistant Professor and then a tenured Associate Professor in the Department of Bioengineering at the University of Illinois at Urbana-Champaign. There he was also a faculty member of the Carl R. Woese Institute for Genomic Biology and of Biophysics and Computational Biology, and an affiliate faculty member in Computer Science. He completed postdoctoral training with David Haussler at UC Santa Cruz from 2007 to 2009, earned his Ph.D. in Computer Science from Penn State University in 2006 under Webb Miller, and holds M.S. and B.S. degrees in Computer Science from Fudan University in Shanghai, China.

His research focuses on computational biology, with particular emphasis on 3D epigenomics and comparative genomics. His work combines computational and machine learning approaches to understand the spatial organization of the genome and its regulatory mechanisms.

His honors include being named a Fellow of the ACM, ISCB, and AAAS (in 2022 and 2025), the Allen Newell Award for Research Excellence in 2025, a Guggenheim Fellowship in 2020, and an NSF CAREER Award in 2011.

Research topics

  • Biology
  • Genetics
  • Cell biology
  • Computational biology
  • Computer Science
  • Artificial Intelligence
  • Computer vision
  • Physics
  • Psychology
  • Gerontology
  • Chemistry
  • Medicine
  • Biochemistry
  • Evolutionary biology
  • Anatomy

Selected publications

  • TissueNarrator: Generative Modeling of Spatial Transcriptomics with Large Language Models

    bioRxiv (Cold Spring Harbor Laboratory) · 2025-11-27

    Preprint · Open access

    The intricate spatial organization and molecular communication among cells are fundamental to multicellular systems. Spatial transcriptomics (ST) enables gene expression profiling while preserving spatial context, providing rich data for studying cellular interactions and tissue dynamics. However, most existing computational approaches focus on embedding-based tasks and provide limited generative capacity for simulating cell behavior in situ. Moreover, accurately interpreting spatial interactions requires extensive biological knowledge, which current models do not incorporate. Here, we introduce TissueNarrator, a framework that reformulates spatial omics analysis as a language modeling problem. By representing tissue sections as spatial sentences (rank-based gene lists augmented with spatial coordinates and metadata), TissueNarrator leverages pretrained large language models (LLMs) to learn spatially conditioned gene expression patterns. The model generates realistic, context-aware cellular profiles, predicts intercellular interactions, and performs in silico perturbation analyses. Across multiple ST technologies (MERFISH, Perturb-FISH, and CosMx SMI), TissueNarrator achieves superior quantitative performance and recovers biologically meaningful ligand–receptor and signaling pathways. Furthermore, a conversational inference mode enables natural-language querying of tissue organization. By integrating pretrained biological knowledge with spatial context, TissueNarrator establishes a new, scalable generative paradigm for modeling, simulating, and reasoning about tissue systems.

  • Elevated SLC3A2 Expression Promotes the Progression of Gliomas and Enhances Ferroptosis Resistance through the AKT/NRF2/GPX4 Axis

    Cancer Research and Treatment · 2025-03-10 · 1 citation

    Article · Open access

    PURPOSE: The aim of this study is to determine the impact of solute carrier family 3 member 2 (SLC3A2) on the malignant phenotype of gliomas and its role in regulating ferroptosis sensitivity. MATERIALS AND METHODS: The malignant phenotype of glioma was assessed by cell proliferation assay, colony formation assay, EdU assay, wound healing, and Transwell experiments. We further validated the impact of reduced SLC3A2 expression on the sensitivity to ferroptosis in glioma cells through Cell Counting Kit-8 assays, flow cytometry, western blotting, and transmission electron microscopy. Western blotting was used to explore how SLC3A2 affects glioma sensitivity to ferroptosis through the AKT/NF-E2-related factor 2 (NRF2)/glutathione peroxidase 4 (GPX4) axis. By establishing a subcutaneous xenograft tumor model in BALB/c-nude mice, we investigated the growth of tumors following the knockout of SLC3A2 in glioma cells. RESULTS: Downregulation of SLC3A2 suppressed the malignant phenotype of glioma by blocking the cell cycle and epithelial-mesenchymal transition processes. In addition, loss of SLC3A2 not only downregulated SLC7A11 but also prevented the activation of the AKT/NRF2/GPX4 axis. These changes led to increased accumulation of reactive oxygen species and lipid peroxides, ultimately enhancing the susceptibility of glioma to ferroptosis. CONCLUSION: Our findings suggest that SLC3A2 is an oncogene in gliomas, promoting their occurrence and development. It plays a critical role in ferroptosis resistance through the AKT/NRF2/GPX4 axis.

  • An integrated view of the structure and function of the human 4D nucleome

    Nature · 2025-12-17 · 19 citations

    Article · Open access

    …to map and analyse the 4D nucleome in widely used H1 human embryonic stem cells and immortalized fibroblasts (HFFc6). We produced and integrated diverse genomic datasets of the 4D nucleome, each contributing unique observations, which enabled us to assemble extensive catalogues of more than 140,000 looping interactions per cell type, to generate detailed classifications and annotations of chromosomal domain types and their subnuclear positions, and to obtain single-cell 3D models of the nuclear environment of all genes including their long-range interactions with distal elements. Through extensive benchmarking, we describe the unique strengths of different genomic assays for studying the 4D nucleome, providing guidelines for future studies. Three-dimensional models of population-based and individual cell-to-cell variation in genome structure showed connections between chromosome folding, nuclear organization, chromatin looping, gene transcription and DNA replication. Finally, we demonstrate the use of computational methods to predict genome folding from DNA sequence, which will facilitate the discovery of potential effects of genetic variants, including variants associated with disease, on genome structure and function.

  • Complete sequencing of ape genomes

    Nature · 2025-04-09 · 119 citations

    Article · Open access

    … Consequently, our understanding of the evolution of our species is incomplete. Here we present haplotype-resolved reference genomes and comparative analyses of six ape species: chimpanzee, bonobo, gorilla, Bornean orangutan, Sumatran orangutan and siamang. We achieve chromosome-level contiguity with substantial sequence accuracy (<1 error in 2.7 megabases) and completely sequence 215 gapless chromosomes telomere-to-telomere. We resolve challenging regions, such as the major histocompatibility complex and immunoglobulin loci, to provide in-depth evolutionary insights. Comparative analyses enabled investigations of the evolution and diversity of regions previously uncharacterized or incompletely studied without bias from mapping to the human reference genome. Such regions include newly minted gene families in lineage-specific segmental duplications, centromeric DNA, acrocentric chromosomes and subterminal heterochromatin. This resource serves as a comprehensive baseline for future evolutionary studies of humans and our closest living ape relatives.

  • Serum Myoglobin After Cardiac Surgery Predicts Postoperative Cardiogenic Shock Requiring Mechanical Circulatory Support Within 14 Days

    Shock · 2025-07-28

    Article · Open access

    BACKGROUND: Cardiogenic shock requiring mechanical circulatory support is a life-threatening complication of cardiac surgery with cardiopulmonary bypass (CPB). This study aimed to determine the role of myoglobin in predicting the occurrence of postoperative cardiogenic shock requiring mechanical circulatory support within 14 days. METHODS: A total of 4,610 patients undergoing cardiac surgery with CPB were included and analyzed. Mechanical circulatory support included intra-aortic balloon pump and extracorporeal membrane oxygenation. Cox regression with a natural cubic spline was used to assess the relationship between postoperative myoglobin levels and the 14-day risk of mechanical circulatory support for cardiogenic shock. RESULTS: Of 4,610 patients, 279 (6.1%) required mechanical circulatory support within 14 days after surgery. The 14-day risk of using mechanical circulatory support increased with the postoperative peak myoglobin levels. Among the patients who underwent aortic surgery, the threshold myoglobin level measured within 1 day after surgery, associated with an adjusted hazard ratio greater than 1.00 for using mechanical circulatory support within 14 days, was 1,568 ng/mL (95% confidence interval [CI], 195-6,040). Among the patients who underwent non-aortic surgery, the corresponding threshold myoglobin level was 419 ng/mL (95% CI, 180-452). CONCLUSIONS: Postoperative myoglobin levels are closely related to the 14-day risk of using mechanical circulatory support after cardiac surgery. When postoperative myoglobin exceeds certain thresholds, the 14-day risk of using mechanical circulatory support after surgery starts to increase with the myoglobin level. Myoglobin has potential value in predicting postoperative cardiogenic shock requiring mechanical circulatory support within 14 days after cardiac surgery.

  • O-090 Advancing embryo selection: combined DNA and RNA analysis in PGT

    Human Reproduction · 2025-06-01

    Article · Open access

    Study question: Can a parallel DNA and RNA sequencing strategy address challenges in selecting the most viable embryos and reduce false positive/negative results in PGT-A?
    Summary answer: Combining DNA and RNA sequencing in PGT-A enhances embryo selection, improving implantation rates and reducing time and costs, without increasing embryo wastage.
    What is known already: Preimplantation genetic testing for aneuploidies (PGT-A) aims to reduce miscarriage rates and improve clinical pregnancy rates by screening for chromosome aneuploidies in the trophectoderm. However, two major challenges remain: selecting the most viable embryos to enhance outcomes and addressing false positive and false negative results due to mosaicism.
    Study design, size, duration: This retrospective study used a parallel DNA-seq and RNA-seq strategy to explore the link between embryo transcriptome and clinical pregnancy outcomes in PGT-A. A total of 102 patients undergoing 111 ICSI PGT cycles were included. The study analyzed over 400 embryos, comparing genomic copy number variation (CNV) and transcriptomic data to develop predictive models.
    Participants/materials, setting, methods: The study involved 102 patients who underwent 111 ICSI PGT cycles in a clinical setting. Over 400 embryos were collected, and their genomic CNV and transcriptomic profiles were analyzed using DNA-seq and RNA-seq. Differentially expressed genes (DEGs) were identified and a predictive model was created using data from 48 successful pregnancies and 35 failed pregnancies. Methods included statistical modeling with machine learning algorithms such as random forest, support vector machine, and linear discriminant analysis.
    Main results and the role of chance: A modeling strategy using 280 DEGs improved euploid embryo selection, achieving areas under the curve (AUCs) of 0.88, 0.71, and 0.84 for the random forest (RF), support vector machine, and linear discriminant analysis models, respectively. Retrospective analysis of 83 transferred euploid blastocysts using the RF model identified three embryo categories with decreasing implantation potential. Notably, the implantation rate of the good group was significantly higher than that of the moderate group (88.6% vs 50.0%, P = 0.001), and that of the moderate group was higher than that of the poor group (50.0% vs 20.8%, P = 0.035). Combining DNA and RNA sequencing in PGT-A offers a novel method for selecting embryos with greater implantation potential, potentially reducing time and costs associated with achieving clinical pregnancy, while not increasing the wastage rate of embryos.
    Limitations, reasons for caution: The findings are based on retrospective data, and external validation is necessary to confirm the predictive model’s generalizability across diverse patient populations.
    Wider implications of the findings: Combining DNA and RNA sequencing in PGT-A offers a novel method for selecting embryos with greater implantation potential, potentially reducing time and costs associated with achieving clinical pregnancy, while not increasing the wastage rate of embryos.
    Trial registration number: No

  • The IGVF catalog—from genetic variation to function

    Nucleic Acids Research · 2025-12-08 · 3 citations

    articleOpen access

    Genomic variation between individuals is essential for understanding how differences in the genome sequence affect molecular and cellular processes. The Impact of Genomic Variation on Function (IGVF) Consortium aims to uncover the relationships among genomic variation, genome function, and phenotypes by combining experimental techniques, such as single-cell mapping and genomic perturbation assays, with computational approaches such as machine learning-based predictive modeling. The IGVF Data and Administrative Coordinating Centers collect, analyze, and disseminate data and results from across the consortium through an open-source platform called the IGVF Catalog. This resource includes, but is not limited to, data on the effects of coding variants on protein abundance and function, noncoding variants on enhancer activity (measured by MPRA or predicted computationally), and associations between variants and quantitative traits. All data are organized within a graph database comprising over 50 types of data collections with nearly 3 billion nodes and over 7.5 billion edges. The Catalog offers public API endpoints (https://api.catalogkg.igvf.org/) and a user-friendly interface for exploring, querying, and visualizing the data at https://catalog.igvf.org. We expect that this open-access platform will support the broader scientific community to advance our understanding of how genomic variation influences biology and disease.

  • Popari: Modeling multisample variation in spatial transcriptomics

    bioRxiv (Cold Spring Harbor Laboratory) · 2025-05-13 · 2 citations

    Preprint · Open access · Senior author · Corresponding author

    … While tools exist for multisample single-cell RNA-seq, methods tailored to multisample SRT remain limited. Here, we introduce Popari, a probabilistic graphical model for factor-based decomposition of multisample SRT that captures condition-specific changes in spatial organization. Popari jointly learns spatial metagenes (linear gene expression programs) and their spatial affinities across samples. Its key innovations include a differential prior to regularize spatial accordance and spatial downsampling to enable multiresolution, hierarchical analysis. Simulations show Popari outperforms existing methods on multisample and multi-resolution spatial metrics. Applications to real datasets uncover spatial metagene dynamics, spatial accordance, and cell identities. In mouse brain (STARmap PLUS), Popari identifies spatial metagenes linked to AD; in thymus (Slide-TCR-seq), it captures increasing colocalization of V(D)J recombination and T cell proliferation; and in ovarian cancer (CosMx), it reveals sample-specific malignant-immune interactions. Overall, Popari provides a general, interpretable framework for analyzing variation in multisample SRT.

  • EYKTHYR reveals transcriptional regulators of spatial gene programs

    bioRxiv (Cold Spring Harbor Laboratory) · 2025-05-23

    Preprint · Open access · Senior author · Corresponding author

    Understanding how transcription factors (TFs) orchestrate gene regulatory networks that define complex tissue structures is central to uncovering tissue organization and disease mechanisms. Although spatial multiome technologies now enable in situ measurement of both transcriptional activity and chromatin accessibility, existing computational methods either overlook spatial tissue context or are hindered by the high dropout rates characteristic of such data. Here, we introduce Eykthyr, a computational framework that integrates gene expression and chromatin accessibility within a spatially aware model to identify TFs driving spatial gene programs. Eykthyr mitigates dropout effects by leveraging interpretable, low-dimensional embeddings of gene expression and chromatin accessibility (both linear with respect to their input), enabling robust identification and scalable inference of spatial transcriptional regulators. Applied across diverse spatial multiome datasets, Eykthyr consistently outperforms existing approaches, accurately identifying TFs that coordinate spatial gene programs in mouse brain development and regulate T-cell states within tumor microenvironments. Eykthyr establishes a foundation for decoding how TFs interpret local intercellular signaling to shape tissue structure, offering insights into the regulatory logic underlying spatial organization in health and disease.

  • Steamboat: Attention-based multiscale delineation of cellular interactions in tissues

    bioRxiv (Cold Spring Harbor Laboratory) · 2025-04-10 · 4 citations

    Preprint · Open access · Senior author · Corresponding author

    Spatial-omics technologies profile cells in their native spatial context within tissues, enabling a more complete understanding of cellular properties. However, a key computational challenge remains: identifying the cellular interactions that underlie cell types and states, interactions that are essential for spatial organization and provide a biologically grounded framework for understanding cell identities and spatial patterns. These interactions span different distances and thus require multiscale modeling, which remains a major gap in existing methods. Here, we introduce Steamboat, an interpretable machine learning framework that leverages a self-supervised, multi-head attention model to uniquely decompose the gene expression of a cell into multiple key factors: intrinsic cell programs, neighboring cell communication, and long-range interactions. By applying Steamboat to diverse tissues in health and disease across various spatial-omics technologies, we demonstrate its ability to uncover critical multiscale cellular interactions, capturing classical contact signaling and revealing previously unrecognized patterns of cellular communication. Steamboat provides a powerful approach for spatial-omics analysis, offering new insights into the multiscale spatial organization of cells and their communication across a wide range of biological contexts.

Frequent coauthors

  • Yang Zhang

    San Diego Biomedical Research Institute

    97 shared
  • Yuchuan Wang

    San Diego Biomedical Research Institute

    73 shared
  • Omid Gholamalamdari

    University of Illinois Urbana-Champaign

    71 shared
  • Bas van Steensel

    The Netherlands Cancer Institute

    69 shared
  • Andrew S. Belmont

    University of Illinois Urbana-Champaign

    69 shared
  • Liguo Zhang

    University of Illinois Urbana-Champaign

    67 shared
  • Tom van Schaik

    Oncode Institute

    64 shared
  • David M. Gilbert

    San Diego Biomedical Research Institute

    64 shared

See your match with Jian Ma

PhdFit ranks faculty against your research interests, methods, and publications, grounded in their actual work rather than templates.

  • Resume-aware match score
  • Save to shortlist
  • AI-drafted outreach

  • Free to start
  • No credit card
  • 30-second signup