Yang Feng
Professor of Biostatistics · New York University · Department of Biostatistics
Active 1988–2026
About
Yang Feng is a Professor and Ph.D. Program Director of Biostatistics at the NYU School of Global Public Health, with an affiliate faculty position in the Center for Data Science at New York University. He earned his Ph.D. in Operations Research from Princeton University in 2010. His research interests encompass the theoretical and methodological aspects of machine learning, high-dimensional statistics, social network models, and nonparametric statistics, leading to practical applications in areas such as Alzheimer's disease, cancer classification, and electronic health records. Feng's work has been funded by multiple grants from the NIH and NSF, including the NSF CAREER Award. He serves as an Associate Editor for several prominent journals, including the Journal of the American Statistical Association, the Journal of Business & Economic Statistics, the Journal of Computational and Graphical Statistics, and the Annals of Applied Statistics. His professional recognitions include being named a Fellow of the American Statistical Association and of the Institute of Mathematical Statistics, as well as being an elected member of the International Statistical Institute.
Research topics
- Computer Science
- Data Mining
- Internal medicine
- Machine Learning
- Artificial Intelligence
- Medicine
- Algorithm
- Pathology
- Mathematics
- Statistics
- Gastroenterology
- Immunology
Selected publications
Review of Electromagnetic Crimping Technology for Cable Joints
2026-01-16
article
Currently, traditional crimping methods suffer from numerous shortcomings. As a form of electromagnetic forming technology, electromagnetic pulse crimping offers a novel technical solution to address these limitations. This paper systematically introduces the concept and development process of electromagnetic crimping for cable joints, highlighting its significant application potential. First, the systematic design of electromagnetic crimping devices for cable joints is discussed. Subsequently, the key technologies of existing electromagnetic crimping for cable joints are elaborated from three dimensions: coil and magnetizer design, multi-objective optimization of crimping process parameters, and metal interface bonding mechanisms along with adaptation control techniques. Following this, focusing on quality inspection methods, the current research status in temperature monitoring and defect identification, visual inspection and dimensional measurement, as well as non-destructive evaluation and performance testing is summarized and analyzed. Finally, this paper explores the urgent technical challenges in electromagnetic crimping for cable joints and outlines its future development trends. This review uniquely synthesizes design, process, and inspection methodologies, thereby providing a consolidated framework to guide future research toward intelligent and reliable cable joint systems.
American Journal of Respiratory and Critical Care Medicine · 2025-05-01
article
Rationale: Recent data from patients with chronic respiratory diseases such as COPD suggest that air pollution is associated with changes in the airway microbiome, and lower airway dysbiosis has been linked to the severity of bronchiectasis. Here we examine whether distinct air pollution exposure patterns are associated with the composition of the lung microbiome in a bronchiectasis cohort. Method: Lower airway samples were obtained from 200 patients undergoing clinically indicated bronchoscopies. Outdoor air pollution exposure data were derived from the Chemical Speciation Network and were available for 96 participants. Chi-square tests were used for categorical variables, reported as frequency/percentage, while continuous variables were analyzed using the Kruskal-Wallis test and reported as median/interquartile range. Hierarchical clustering (complete linkage) with Euclidean distance identified three clusters based on air pollutant levels. Using 16S rRNA gene sequencing data from lower airway samples, beta diversity was assessed using the Bray-Curtis dissimilarity index. Differential enrichment analysis across clusters was performed using LEfSe (LDA score: 2). Result: Clustering analysis revealed three distinct environmental exposure profiles (Figure 1A). Cluster 1 had elevated exposure to biomass and potassium with lower levels of traffic-related pollutants. Cluster 2 had high levels of traffic-related pollutants, including elemental carbon, nickel, and oil, along with PM2.5, soil, and silicon. Cluster 3 exhibited high sulfur and coal exposure with lower levels of soil and silicon. Beta diversity analysis showed significant differences in microbial composition between clusters (p = 0.006, Figure 1B). Genus-level analysis identified multiple taxa differentially enriched across all three clusters (Figure 1C).
Differential enrichment analyses using LEfSe showed that 23 taxa at the genus level were significantly enriched in the lower airways when comparing the three environmental clusters. Among the top differentially enriched taxa, the lower airways of patients exposed to Cluster 1 air pollutants were enriched with Halomonas (LDA score = 3.95, p < 0.006), the lower airways of patients exposed to Cluster 2 air pollutants were enriched with Flavobacterium (LDA score = 4.6, p < 0.017), and the lower airways of patients exposed to Cluster 3 air pollutants were enriched with Haemophilus (LDA score = 4.89, p < 0.005). Conclusion: Distinct environmental exposure profiles in bronchiectasis are associated with specific shifts in lower airway microbiota, highlighting potential impacts of pollutants on microbial dysbiosis. These findings emphasize the potential role of environmental factors in the pathogenesis of bronchiectasis. Further investigation is needed to clarify the mechanisms by which these pollutants may contribute to disease progression and exacerbation risk.
Fusion Engineering and Design · 2025-02-05 · 3 citations
article
The Lancet Oncology · 2025-09-01 · 11 citations
article · Open access
CHEST Journal · 2025-10-01
article
Journal of the American Statistical Association · 2025-06-18
article · Open access · Senior author · Corresponding
Design-based causal inference, also known as randomization-based or finite-population causal inference, is one of the most widely used causal inference frameworks, largely due to the merit that its validity can be guaranteed by study design (e.g., randomized experiments) and does not require assuming specific outcome-generating distributions or super-population models. Despite its advantages, design-based causal inference can still suffer from other issues, among which outcome missingness is a prevalent and significant challenge. This work systematically studies the outcome missingness problem in design-based causal inference. First, we propose a general and flexible outcome missingness mechanism that can facilitate finite-population-exact randomization tests of no treatment effect. Second, under this general missingness mechanism, we propose a general framework called "imputation and re-imputation" for conducting randomization tests in design-based causal inference with missing outcomes. We prove that our framework can still ensure finite-population-exact type-I error rate control even when the imputation model was misspecified or when unobserved covariates or interference exist in the missingness mechanism. Third, we extend our framework to conduct covariate adjustment in randomization tests and construct finite-population-valid confidence regions with missing outcomes. Our framework is evaluated via extensive simulation studies and applied to a large-scale randomized experiment.
Journal of Magnetic Resonance Imaging · 2025-08-22 · 1 citation
article · Open access
BACKGROUND: Identifying early neuropathological changes in Alzheimer's disease (AD) is important for improving treatment efficacy. Among quantitative MRI measures, transverse relaxation time (T2) has been shown to reflect tissue microstructure relevant in aging and neurodegeneration; however, findings regarding T2 changes in both normal aging and AD have been inconsistent. The association between T2 and amyloid-beta (Aβ) accumulation, a hallmark of AD pathology, is also unclear, particularly in cognitively normal individuals who may be in preclinical stages of the disease. PURPOSE: To investigate longitudinal hippocampal T2 changes in a cognitively normal cohort of older adults and their association with global Aβ accumulation. STUDY TYPE: Retrospective, longitudinal. SUBJECTS: 56 cognitively normal adults between 55 and 90 years of age (17 males and 39 females). FIELD STRENGTH/SEQUENCE: 3 Tesla; multi-echo spin echo sequence for T2 mapping; 18F-florbetaben positron emission tomography for Aβ measurement. ASSESSMENT: Bilateral hippocampal T2 and volume were extracted to relate to Aβ PET measurements. To understand variations in AD risk, participants were separated into Aβ-high and Aβ-low subgroups using a predetermined threshold. STATISTICAL TESTS: Linear mixed-effect models and general linear models were used. A p-value < 0.025 was considered significant to account for bilateral comparisons. RESULTS: Older age was associated with increased T2 in the bilateral hippocampus (left: β = 0.30, right: β = 0.25) and smaller hippocampal volume on the left (β = -0.12). In the Aβ-low subgroup, both longitudinal T2 increase rates (β = 0.65) in the left hippocampus and bilateral cross-sectional T2 (left: β = 0.64, right: β = 0.46) were positively correlated with Aβ PET, independent of hippocampal volume.
DATA CONCLUSION: This study provided in vivo evidence linking hippocampal T2 to Aβ accumulation in cognitively normal aging individuals, suggesting that quantitative T2 may be sensitive to microstructural changes accompanying early Aβ pathology, such as neuroinflammation, demyelination, and reduced tissue integrity. EVIDENCE LEVEL: 3. TECHNICAL EFFICACY: Stage 2.
SSRN Electronic Journal · 2025-01-01
preprint · Open access
Statistics in Medicine · 2024-11-12 · 3 citations
article · Open access · Senior author · Corresponding
High‐dimensional multinomial regression models are very useful in practice but have received less research attention than logistic regression models, especially from the perspective of statistical inference. In this work, we analyze the estimation and prediction error of the contrast‐based ℓ1‐penalized multinomial regression model and extend the debiasing method to the multinomial case, providing a valid confidence interval for each coefficient and a p‐value for the individual hypothesis test. We also examine cases of model misspecification and non‐identically distributed data to demonstrate the robustness of our method when some assumptions are violated. We apply the debiasing method to identify important predictors in the progression into dementia of different subtypes. Results from extensive simulations show the superiority of the debiasing method compared to other inference methods.
Journal of the Royal Statistical Society Series B (Statistical Methodology) · 2024-06-12
article · Open access · 1st author · Corresponding
Many statistical models for networks overlook the fact that most real-world networks are formed through a growth process. To address this, we introduce the Preferential Attachment Plus Erdős-Rényi model, where we let a random network G be the union of a preferential attachment (PA) tree T and additional Erdős-Rényi (ER) random edges. The PA tree captures the underlying growth process of a network where vertices/edges are added sequentially, while the ER component can be regarded as noise. Given only one snapshot of the final network G, we study the problem of constructing confidence sets for the root node of the unobserved growth process; the root node can be patient zero in an infection network or the source of fake news in a social network. We propose inference algorithms based on Gibbs sampling that scale to networks with millions of nodes and provide theoretical analysis showing that the size of the confidence set is small if the noise level of the ER edges is not too large. We also propose variations of the model in which multiple growth processes occur simultaneously, reflecting the growth of multiple communities; we use these models to provide a new approach to community detection.
Recent grants
NSF · $130k · 2013–2016
CAREER: Statistical inference of network and relational data
NSF · $317k · 2016–2020
CAREER: Statistical inference of network and relational data
NSF · $151k · 2019–2022
Frequent coauthors
- Daniel G. Haller · University of Pennsylvania · 121 shared
- Paul J. Catalano · 121 shared
- Peter J. O’Dwyer · University of Pennsylvania · 121 shared
- Mace L. Rothenberg · 121 shared
- Edith P. Mitchell · 121 shared
- Jordan Berlin · Twitter (United States) · 121 shared
- Howard S. Hochster · Yale Cancer Center · 121 shared
- Al B. Benson · Northwestern University · 66 shared
Awards & honors
- Fellow of the American Statistical Association (ASA)
- Fellow of the Institute of Mathematical Statistics (IMS)
- Elected member of the International Statistical Institute (I…
- NSF CAREER Award