
Song Ma
Associate Professor of Finance · Yale University · Finance
Active 2002–2025
About
Song Ma is a Professor of Finance and Entrepreneurship at Yale School of Management (SOM) and a Faculty Research Fellow at the National Bureau of Economic Research (NBER). He joined the Yale SOM faculty in 2016 and is also an affiliated faculty member at the Yale Law School Center for the Study of Corporate Law and the Yale SOM Program on Entrepreneurship. His main research interests include innovation economics, entrepreneurship, financial economics, AI, and big data, with additional work spanning corporate strategy, industrial organization, antitrust, labor, and business law. His research has been published in top academic journals such as the Journal of Political Economy, Journal of Finance, Journal of Financial Economics, and Review of Financial Studies, and has received numerous awards. His work has been referenced by policymakers and legislatures worldwide, including the Federal Trade Commission, EU Competition Commission, and UK Competition and Markets Authority. Professor Ma is recognized for his teaching, offering popular electives like “Entrepreneurial Finance,” “Venture Capital and Private Equity,” and “Finance and the Society,” and was named a Poets & Quants 40 Under 40 Best Business School Professor in 2021. He earned his PhD in Finance from Duke University’s Fuqua School of Business in 2016, receiving the Top Finance Graduate Award, and holds a BA in Economics from Zhejiang University.
Research topics
- Biology
- Medicine
- Immunology
- Internal medicine
- Oncology
- Genetics
- Pathology
- Computational biology
- Virology
Selected publications
Ordinal Sparse Neural Networks for Modeling Gene‐ and Imaging‐Environment Interactions
Statistics in Medicine · 2025-10-01 · 1 citation
article
In biomedical studies, gene-environment (G-E) interactions and imaging-environment (I-E) interactions play an important role in modeling disease outcomes. Substantial investigations have been made; however, there is still a lack of related studies exploring flexible nonparametric statistical methods for modeling ordinal responses, such as the tumor pathological stage. In this paper, we develop a neural network-based method for modeling ordinal responses with interaction analysis. A novel definition of the output function for the neural network is derived to predict ordinal categories. To facilitate variable selection, we employ a sparse layer within the proposed neural networks. The penalized estimation is obtained using the local quadratic approximation (LQA) algorithm. Extensive simulation studies demonstrate that the proposed method achieves competitive performance in both prediction and variable selection. We further apply our method to breast cancer (BRCA) and skin cutaneous melanoma (SKCM) datasets, examining tumor stage prediction based on G-E and I-E interaction analyses, respectively. The proposed method identifies relevant main effects and interactions, providing insights into the underlying biological mechanisms.
Gastroenterology · 2025-05-01
article
Hierarchical Multi‐Label Classification With Gene‐Environment Interactions in Disease Modeling
Statistics in Medicine · 2025-01-26 · 2 citations
article · Open access
In biomedical studies, gene-environment (G-E) interactions have been demonstrated to have important implications for analyzing disease outcomes beyond the main G and main E effects. Many approaches have been developed for G-E interaction analysis, yielding important findings. However, hierarchical multi-label classification, which provides insightful information on disease outcomes, remains unexplored in G-E analysis literature. Moreover, unlabeled data are commonly observed in practical settings but omitted by many existing methods of hierarchical multi-label classification. In this study, we consider a semi-supervised scenario and develop a novel approach for the two-layer hierarchical response with G-E interactions. A two-step penalized estimation is then proposed using an efficient expectation-maximization (EM) algorithm. Simulation shows that it has superior performance in classification and feature selection. The analysis of The Cancer Genome Atlas (TCGA) data on lung cancer demonstrates the practical utility of the proposed method. Overall, this study can fill the important knowledge gap in G-E interaction analysis by providing a widely applicable framework for hierarchical multi-label classification of complex disease outcomes.
Analysis of cross-platform health communication with a network approach
Biometrics · 2025-10-08
article · Senior author
Online health communities (OHCs) provide a platform for patients and those related to share and communicate, making complex medical information more digestible and actionable. Health communication within OHCs can be impacted by other information sources. This study examines cross-platform health communication by mining Breastcancer.org (the largest online breast cancer community) and Twitter (now X). Early analyses of OHCs, Twitter, and other online platforms often adopt simple measures like word frequency, and more recent research has shifted towards word co-occurrence network analysis. Relatively, cross-platform communication analysis is limited, and the adopted techniques have drawbacks. We propose a new cross-platform communication model that collectively analyzes word co-occurrence networks and word frequency vectors. Here, the former describe the structural contents of health communication, and the latter describe the volumes. This model offers a nuanced perspective, accommodates temporal variations, and is examined for its theoretical and numerical properties. Collected from January 2010 to December 2020, the analyzed data contains over 1 395 000 tweets and 517 000 posts. Our analysis suggests that the Twitter's topics on breast cancer significantly impact the contents and volumes in the OHC. Distinct time phases are observed, with notable peaks during 2012-2013 and 2015-2018. This study can provide a venue for better understanding health communication and new insights into two highly important online platforms.
Adoption of Broad Genomic Profiling in Patients With Cancer
JAMA Oncology · 2025-04-17 · 2 citations
article · Open access
This cross-sectional study examines the use of broad genomic profiling in patients with metastatic cancer.
ArXiv.org · 2025-04-13
preprint · Open access
Spatial transcriptomics has revolutionized tissue analysis by simultaneously mapping gene expression, spatial topography, and histological context across consecutive tissue sections, enabling systematic investigation of spatial heterogeneity. The detection of spatially variable (SV) genes, which are molecular signatures with position-dependent expression, provides critical insights into disease mechanisms spanning oncology, neurology, and cardiovascular research. Current methodologies, however, confront dual constraints: predominant reliance on predefined spatial pattern templates restricts detection of novel complex spatial architectures, and inconsistent sample selection strategies compromise analytical stability and biological interpretability. To overcome these challenges, we propose a novel Bayesian hierarchical framework incorporating non-parametric spatial modeling and across-sample integration. It takes advantage of the non-parametric technique and develops an adaptive spatial process accommodating complex pattern discovery while maintaining biological interpretability. A novel cross-sample bi-level shrinkage prior is further introduced for robust multi-sample SV gene detection, facilitating more effective information fusion. An efficient variational inference is developed for posterior inference ensuring computational scalability. Comprehensive simulations demonstrate the improved performance of our proposed method over existing analytical frameworks, and its application to DLPFC and SCC data reveals interpretable SV genes whose spatial patterns delineate relevant clusters and gradients, advancing human transcriptomics.
Integrative rank-based regression for multi-source high-dimensional data with multi-type responses
Journal of Applied Statistics · 2025-01-16
article · Open access
Practical scenarios often present instances where the types of responses are different between multi-source different datasets, reflecting distinct attributes or characteristics. In this paper, an integrative rank-based regression is proposed to facilitate information sharing among varied datasets with multi-type responses. Taking advantage of the rank-based regression, our proposed approach adeptly tackles differences in the magnitude of loss functions. In addition, it can robustly handle outliers and data contamination, and effectively mitigate model misspecification. Extensive numerical simulations demonstrate the superior and competitive performance of the proposed approach in model estimation and variable selection. Analysis of genetic data on HNSC and LUAD yields results with biological explanations and confirms its practical usefulness.
A Selective Review of Network Analysis Methods for Gene Expression Data
Methods in molecular biology · 2025-01-01
review · Senior author
Robust sparse Bayesian regression for longitudinal gene–environment interactions
Journal of the Royal Statistical Society Series C (Applied Statistics) · 2025-03-31 · 2 citations
article · Open access
In longitudinal studies, repeated measure analysis of variance (ANOVA) is a classical analysis where selecting important main and interaction effects for accurate estimation and prediction is among one of its central goals. With high-dimensional genetic factors, ANOVA leads to a sparse longitudinal gene–environment (G×E) interaction problem that has not been thoroughly investigated so far, partially due to the challenges to incorporate robustness against skewed phenotypic measurements, intra-cluster correlations among longitudinal observations, and structured sparsity arising from the ANOVA design. We have developed a novel robust sparse Bayesian mixed model to tackle these challenges. Outliers and inter-relatedness among repeated measurements can be efficiently accommodated. Meanwhile, the proposed model conducts robust Bayesian variable selection accounting for main and interaction effects via structured spike-and-slab priors. We have developed Gibbs samplers and MCMC algorithms for fast computation and posterior inference. The advantage of the proposed method over benchmarks in variable selection and estimation has been established through extensive simulations. In the case study, we have analysed longitudinal lipidomics data with repeatedly measured body weight of CD-1 mice from a cancer prevention study. The proposed model has identified main and interactions with important implications and led to better prediction performance over alternative methods.
Network-based hierarchical heterogeneity analysis and applications to cancer omics data
Science China Mathematics · 2025-12-02
article · Senior author
Recent grants
NIH · $435k · 2014–2018
Collaborative Research: Novel methods for pharmacogenomic data analysis using gene clusters
NSF · $100k · 2009–2013
Collaborative Proposal: Novel Semiparametric Two-part Models: New Theories and Applications
NSF · $105k · 2008–2012
Unsupervised and Semisupervised Heterogeneity Analysis Based on Gaussian Graphical Models
NSF · $200k · 2022–2026
NIH · $1.3M · 2016
Frequent coauthors
- Jian Huang · 131 shared
- Qingzhao Zhang · 89 shared
- Xingjie Shi · Chuzhou University · 75 shared
- Jin Liu · Chinese University of Hong Kong, Shenzhen · 73 shared
- Mengyun Wu · Shanghai University of Finance and Economics · 60 shared
- Sanguo Zhang · 56 shared
- Shihao Shen · 50 shared
- Yong Zhou · Nanchang University · 49 shared
Awards & honors
- 2023 Best Paper Award, China International Conference in Fin…
- 2022 Best Paper on Competition Economics, Association of Com…
- 2022 Jerry S. Cohen Award for Antitrust Scholarship, America…
- 2021 GARP Best Paper in Risk Management Award, MFA
- 2020 AdC Competition Policy Award