Zongming Ma

Professor of Statistics and Data Science · Verified

Yale University · Department of Statistics and Data Science

Active 2005–2026

h-index: 30
Citations: 4.1k
Papers: 96 (32 in the last 5 years)
Funding: $400k

About

Zongming Ma actively mentors highly motivated graduate students at Yale University. His students come from diverse academic backgrounds, including statistics, computer science, and computational biology, and he is interested in working with those who have a genuine interest in both the practical and theoretical aspects of learning from data. His current postdoctoral researcher is Yang Cao, who earned a PhD from HKUST. He also supervises several PhD candidates from Yale and other institutions, reflecting a collaborative approach to research and education. Alumni of his group include students who completed their PhDs under his guidance, some jointly supervised with other faculty members, which reflects his involvement in interdisciplinary research and training.

Research topics

  • Artificial Intelligence
  • Computer Science
  • Machine Learning
  • Mathematics
  • Genetics
  • Statistics
  • Cell biology
  • Econometrics
  • Combinatorics
  • Discrete mathematics
  • Biology
  • Computational biology
  • Evolutionary biology

Selected publications

  • Integration of imaging-based and sequencing-based spatial omics mapping on the same tissue section via DBiTplus

    Nature Methods · 2026-01-15 · 7 citations

    article · Open access

    Spatially mapping the transcriptome and proteome in the same tissue section can profoundly advance our understanding of cellular heterogeneity and function. Here we present Deterministic Barcoding in Tissue sequencing plus (DBiTplus), an integrative multimodal spatial omics approach combining sequencing-based spatial transcriptomics and multiplexed protein imaging on the same section, enabling both single-cell-resolution cell typing and transcriptome-wide interrogation of biological pathways. DBiTplus utilizes spatial barcoding and RNase H-mediated cDNA retrieval, preserving tissue architecture for multiplexed protein imaging. We developed computational pipelines to integrate these modalities, allowing imaging-guided deconvolution to generate single-cell-resolved spatial transcriptome atlases. We demonstrate DBiTplus across diverse samples including frozen mouse embryos, and formalin-fixed paraffin-embedded human lymph nodes and lymphoma tissues, highlighting its compatibility with challenging clinical specimens. DBiTplus uncovered mechanisms of lymphomagenesis, progression and transformation in human lymphomas. Thus, DBiTplus is a unified workflow for spatially resolved single-cell atlasing and unbiased exploration of biological mechanisms in a cell-by-cell manner at transcriptome scale.

  • Single-cell spatial multi-omics molecular pathology enabled by SuperFocus

    bioRxiv (Cold Spring Harbor Laboratory) · 2025-12-27

    article · Open access

    Histopathology and molecular pathology are currently distinct diagnostic modalities for the most part, one revealing tissue morphology at cellular resolution and the other providing molecular measurements with limited or no spatial context. Projecting genome-scale molecular information onto histopathology images at single-cell resolution across whole tissue sections represents a long-sought goal for next-generation pathology. Here we present SuperFocus, a modality-agnostic computational platform that generates histopathology-integrated single-cell spatial multi-omics from spot-based spatial measurements acquired on the same or an adjacent section without requiring external reference data. SuperFocus combines constrained cascading imputation with feature-level and cell-level quality-control scores to reduce spurious predictions and quantify confidence. On a ground-truth spatial transcriptomics benchmark dataset, SuperFocus improves key accuracy metrics by 28-73% over existing methods. Across Patho-DBiT, spatial ATAC-RNA, spatial CITE-seq and Visium-MALDI-MSI (SMA) datasets, SuperFocus enables cell-resolved analyses of MALT lymphoma microenvironments, gene regulatory programs in human hippocampus, lipotoxic hepatocyte states in human MASH, and transcriptomic-metabolomic states linked to neurotransmission and neuroinflammation in Parkinsonian mouse brain. Overall, SuperFocus enables scalable whole-slide single-cell spatial multi-omics integrated with histopathology, bridging histology and genome-scale molecular profiling for next-generation molecular pathology.

  • MoDaH achieves rate optimal batch correction

    ArXiv.org · 2025-12-10

    preprint · Open access · Senior author

    Batch effects pose a significant challenge in the analysis of single-cell omics data, introducing technical artifacts that confound biological signals. While various computational methods have achieved empirical success in correcting these effects, they lack the formal theoretical guarantees required to assess their reliability and generalization. To bridge this gap, we introduce Mixture-Model-based Data Harmonization (MoDaH), a principled batch correction algorithm grounded in a rigorous statistical framework. Under a new Gaussian-mixture-model with explicit parametrization of batch effects, we establish the minimax optimal error rates for batch correction and prove that MoDaH achieves this rate by leveraging the recent theoretical advances in clustering data from anisotropic Gaussian mixtures. This constitutes, to the best of our knowledge, the first theoretical guarantee for batch correction. Extensive experiments on diverse single-cell RNA-seq and spatial proteomics datasets demonstrate that MoDaH not only attains theoretical optimality but also achieves empirical performance comparable to or even surpassing those of state-of-the-art heuristics (e.g., Harmony, Seurat-V5, and LIGER), effectively balancing the removal of technical noise with the conservation of biological signal.

  • MoDaH achieves rate optimal batch correction.

    PubMed · 2025-12-10

    article · Senior author

  • Multi-modal contrastive learning adapts to intrinsic dimensions of shared latent variables

    ArXiv.org · 2025-05-18

    preprint · Open access · Senior author

    Multi-modal contrastive learning as a self-supervised representation learning technique has achieved great success in foundation model training, such as CLIP (Radford et al., 2021). In this paper, we study the theoretical properties of the learned representations from multi-modal contrastive learning beyond linear representations and specific data distributions. Our analysis reveals that, enabled by temperature optimization, multi-modal contrastive learning not only maximizes mutual information between modalities but also adapts to intrinsic dimensions of data, which can be much lower than user-specified dimensions for representation vectors. Experiments on both synthetic and real-world datasets demonstrate the ability of contrastive learning to learn low-dimensional and informative representations, bridging theoretical insights and practical performance.

  • CellLENS enables cross-domain information fusion for enhanced cell population delineation in single-cell spatial omics data

    Nature Immunology · 2025-05-22 · 10 citations

    article · Open access · Senior author

  • All-Optical Multimodal Mapping of Single Cell-Type–Specific Metabolic Activities via REDCAT

    bioRxiv (Cold Spring Harbor Laboratory) · 2024-11-10 · 5 citations

    preprint · Corresponding author

    Metabolism underlies cell growth, survival, and function, yet its activities vary widely across cell types and tissue environments. Spatially resolving these processes in situ at single-cell resolution is essential to advance our understanding of cellular function and tissue physiology in health and disease. However, existing approaches are limited by either destructive workflows, insufficient spatial resolution and biochemical specificity, or lack of direct linkage to cell identity. Here, we present Raman Enhanced Delineation of Cell Atlases in Tissues (REDCAT), a multimodal all-optical platform that integrates stimulated Raman scattering, autofluorescence redox imaging, second harmonic generation, and high-plex immunofluorescence to co-map metabolic activities and cell types within the same tissue section. REDCAT achieves subcellular resolution profiling of protein, lipid, redox, and nuclear acid metabolism, together with extracellular matrix composition, in both FFPE and fresh-frozen human tissues. Applied to normal lymph nodes, REDCAT delineated distinct redox and lipid remodeling programs across germinal center B-cell zones and immune subsets, highlighting cell-type–specific metabolic specialization. In lymphoma, it revealed profound metabolic reprogramming, including extensive lipid accumulation, nuclear metabolic heterogeneity, and a transitional metabolic state associated with transformation from chronic lymphocytic leukemia to diffuse large B-cell lymphoma, thereby illuminating tumor evolution in situ. In human liver, REDCAT resolved cell-type–specific lipid droplet diversity and zonation-dependent nuclear metabolic gradients, uncovering new principles of spatial metabolic organization. By directly linking cell identity with spatial metabolic states at single-cell or subcellular resolution, REDCAT establishes a broadly applicable framework for studying immune function, tumor progression, and tissue physiology, and offers a new path to deciphering the metabolic basis of health and disease.

  • Integration of Imaging-based and Sequencing-based Spatial Omics Mapping on the Same Tissue Section via DBiTplus

    bioRxiv (Cold Spring Harbor Laboratory) · 2024-11-11 · 6 citations

    preprint · Open access · Corresponding author

    Spatially mapping the transcriptome and proteome in the same tissue section can significantly advance our understanding of heterogeneous cellular processes and connect cell type to function. Here, we present Deterministic Barcoding in Tissue sequencing plus (DBiTplus), an integrative multi-modality spatial omics approach that combines sequencing-based spatial transcriptomics and image-based spatial protein profiling on the same tissue section to enable both single-cell resolution cell typing and genome-scale interrogation of biological pathways. DBiTplus begins with in situ reverse transcription for cDNA synthesis, microfluidic delivery of DNA oligos for spatial barcoding, retrieval of barcoded cDNA using RNaseH, an enzyme that selectively degrades RNA in an RNA-DNA hybrid, preserving the intact tissue section for high-plex protein imaging with CODEX. We developed computational pipelines to register data from two distinct modalities. Performing both DBiT-seq and CODEX on the same tissue slide enables accurate cell typing in each spatial transcriptome spot and subsequently image-guided decomposition to generate single-cell resolved spatial transcriptome atlases. DBiTplus was applied to mouse embryos with limited protein markers but still demonstrated excellent integration for single-cell transcriptome decomposition, to normal human lymph nodes with high-plex protein profiling to yield a single-cell spatial transcriptome map, and to human lymphoma FFPE tissue to explore the mechanisms of lymphomagenesis and progression. DBiTplusCODEX is a unified workflow including integrative experimental procedure and computational innovation for spatially resolved single-cell atlasing and exploration of biological pathways cell-by-cell at genome-scale.

  • Integration of Imaging-based and Sequencing-based Spatial Omics Mapping on the Same Tissue Section via DBiTplus

    Research Square · 2024-12-11 · 4 citations

    preprint · Open access · Senior author

  • Multimodal data integration and cross-modal querying via orchestrated approximate message passing

    arXiv (Cornell University) · 2024-07-26 · 1 citation

    preprint · Open access · Senior author

    The need for multimodal data integration arises naturally when multiple complementary sets of features are measured on the same sample. Under a dependent multifactor model, we develop a fully data-driven orchestrated approximate message passing algorithm for integrating information across these feature sets to achieve statistically optimal signal recovery. In practice, these reference data sets are often queried later by new subjects that are only partially observed. Leveraging on asymptotic normality of estimates generated by our data integration method, we further develop an asymptotically valid prediction set for the latent representation of any such query subject. We demonstrate the prowess of both the data integration and the prediction set construction algorithms on both synthetic examples and real world single-cell datasets.

Education

  • PhD, Statistics

    Stanford University

    2010