Jianqing Fan

Associated Faculty · Verified

Princeton University · Computer Science

Active 1986–2026

h-index: 119

Citations: 62.0k

Papers: 782 (199 last 5y)

Funding: $11.7M (2 active)

About

Jianqing Fan is a statistician, financial econometrician, data scientist, and AI researcher. He is the Frederick L. Moore '18 Professor of Finance, Professor of Statistics, and Professor of Operations Research and Financial Engineering at Princeton University, and served as department chair from 2012 to 2015. His research focuses on statistics, finance, machine learning, and computational biology. His awards include the 2000 COPSS Presidents' Award, the 2007 Morningside Gold Medal for Applied Mathematics, and the 2014 Guy Medal in Silver. He was elected an Academician of Academia Sinica in 2012, a member of the Royal Flemish Academy of Belgium in 2023, and a member of the National Academy of Sciences in 2026. He is affiliated with several departments and centers at Princeton, including the Department of Economics, the Department of Computer Science, the Department of Electrical Engineering, and research centers related to statistics, finance, and energy.

Research topics

Computer Science
Artificial Intelligence
Statistics
Machine Learning
Mathematics
Combinatorics
Applied mathematics
Discrete mathematics
Algorithm
Data science

Selected publications

Inferences on mixing probabilities and ranking in mixed-membership models
Journal of the American Statistical Association · 2026-05-18 · 3 citations
preprint · Open access
Network data is prevalent in numerous big data applications, including economics and health networks, where understanding the latent structure of the network is of prime importance. In this paper, we model the network using the Degree-Corrected Mixed Membership (DCMM) model. In the DCMM model, for each node $i$, there exists a membership vector $\pi_i = (\pi_i(1), \pi_i(2), \ldots, \pi_i(K))$, where $\pi_i(k)$ denotes the weight that node $i$ puts in community $k$. We derive a novel finite-sample expansion for the $\pi_i(k)$'s, which allows us to obtain asymptotic distributions and confidence intervals of the membership mixing probabilities and other related population quantities. This fills an important gap in uncertainty quantification on the member’s profile. We further develop a ranking scheme of the vertices based on the membership mixing probabilities on certain communities and perform relevant statistical inferences. A multiplier bootstrap method is proposed for ranking inference of individual membership profiles with respect to a given community. The validity of our theoretical results is further demonstrated via numerical experiments in both real and synthetic data examples.
Unearthing Financial Statement Fraud: Insights from News Coverage Analysis
Management Science · 2025-09-05 · 1 citation
article · 1st author · Corresponding
We propose a financial statement (FS) fraud detection framework, called PeerMeta, that makes improvements in all three components of the detection procedure: label measurement, feature set, and detection model. For the label measurement, prior studies mainly adopt FS fraud events that have already been disclosed and confirmed. We construct a new measure based on news coverage that can reflect unrevealed FS fraud behaviors as well. For the feature set, we innovatively add peer factors learned through the business description texts in financial reports. For the detection model, two meta-learning algorithms are applied to aggregate the 19 popular classifiers. The results indicate that the proposed method has amazingly high recall of real fraud cases announced by regulatory authorities, reaching a staggering value of 0.982. We document that all components in PeerMeta contribute to the improvements of FS fraud detection and also showcase the significant economic value of the detection framework and find that recall is more crucial for the economic value than precision. This paper was accepted by Agostino Capponi, finance. Funding: This work was supported by the National Natural Science Foundation of China [Grants 71991470, 7199471, 72121002, 72310107002] and the National Key R&D Program of China [Grant 2021YFC3340703]. Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2023.03604 .
Asymptotic Theory of Eigenvectors for Latent Embeddings with Generalized Laplacian Matrices
ArXiv.org · 2025-03-01
preprint · Open access · 1st author · Corresponding
Laplacian matrices are commonly employed in many real applications, encoding the underlying latent structural information such as graphs and manifolds. The use of the normalization terms naturally gives rise to random matrices with dependency. It is well-known that dependency is a major bottleneck of new random matrix theory (RMT) developments. To this end, in this paper, we formally introduce a class of generalized (and regularized) Laplacian matrices, which contains the Laplacian matrix and the random adjacency matrix as a specific case, and suggest the new framework of the asymptotic theory of eigenvectors for latent embeddings with generalized Laplacian matrices (ATE-GL). Our new theory is empowered by the tool of generalized quadratic vector equation for dealing with RMT under dependency, and delicate high-order asymptotic expansions of the empirical spiked eigenvectors and eigenvalues based on local laws. The asymptotic normalities established for both spiked eigenvectors and eigenvalues will enable us to conduct precise inference and uncertainty quantification for applications involving the generalized Laplacian matrices with flexibility. We discuss some applications of the suggested ATE-GL framework and showcase its validity through some numerical examples.
How to Find Fantastic AI Papers: Self-Rankings as a Powerful Predictor of Scientific Impact Beyond Peer Review
ArXiv.org · 2025-10-02
preprint · Open access
Peer review in academic research aims not only to ensure factual correctness but also to identify work of high scientific potential that can shape future research directions. This task is especially critical in fast-moving fields such as artificial intelligence (AI), yet it has become increasingly difficult given the rapid growth of submissions. In this paper, we investigate an underexplored measure for identifying high-impact research: authors' own rankings of their multiple submissions to the same AI conference. Grounded in game-theoretic reasoning, we hypothesize that self-rankings are informative because authors possess unique understanding of their work's conceptual depth and long-term promise. To test this hypothesis, we conducted a large-scale experiment at a leading AI conference, where 1,342 researchers self-ranked their 2,592 submissions by perceived quality. Tracking outcomes over more than a year, we found that papers ranked highest by their authors received twice as many citations as their lowest-ranked counterparts; self-rankings were especially effective at identifying highly cited papers (those with over 150 citations). Moreover, we showed that self-rankings outperformed peer review scores in predicting future citation counts. Our results remained robust after accounting for confounders such as preprint posting time and self-citations. Together, these findings demonstrate that authors' self-rankings provide a reliable and valuable complement to peer review for identifying and elevating high-impact research in AI.
Surface Atomic Defects and Self-Regulated CO Adsorption on Cu(111): Insights from High-Resolution Scanning Probe Microscopy
Microscopy and Microanalysis · 2025-07-01 · 1 citation
article
Fundamental Computational Limits in Pursuing Invariant Causal Prediction and Invariance-Guided Regularization
ArXiv.org · 2025-01-29
preprint · Open access · Senior author
Pursuing invariant prediction from heterogeneous environments opens the door to learning causality in a purely data-driven way and has several applications in causal discovery and robust transfer learning. However, existing methods such as ICP [Peters et al., 2016] and EILLS [Fan et al., 2024] that can attain sample-efficient estimation are based on exponential time algorithms. In this paper, we show that such a problem is intrinsically hard in computation: the decision problem, testing whether a non-trivial prediction-invariant solution exists across two environments, is NP-hard even for the linear causal relationship. In the world where P$\neq$NP, our results imply that the estimation error rate can be arbitrarily slow using any computationally efficient algorithm. This suggests that pursuing causality is fundamentally harder than detecting associations when no prior assumption is pre-offered. Given there is almost no hope of computational improvement under the worst case, this paper proposes a method capable of attaining both computationally and statistically efficient estimation under additional conditions. Furthermore, our estimator is a distributionally robust estimator with an ellipse-shaped uncertain set where more uncertainty is placed on spurious directions than invariant directions, resulting in a smooth interpolation between the most predictive solution and the causal solution by varying the invariance hyper-parameter. Non-asymptotic results and empirical applications support the claim.
Uncertainty Quantification for Ranking with Heterogeneous Preferences
ArXiv.org · 2025-09-02
preprint · Open access · 1st author · Corresponding
This paper studies human preference learning based on partially revealed choice behavior and formulates the problem as a generalized Bradley-Terry-Luce (BTL) ranking model that accounts for heterogeneous preferences. Specifically, we assume that each user is associated with a nonparametric preference function, and each item is characterized by a low-dimensional latent feature vector - their interaction defines the underlying low-rank score matrix. In this formulation, we propose an indirect regularization method for collaboratively learning the score matrix, which ensures entrywise $\ell_\infty$-norm error control - a novel contribution to the heterogeneous preference learning literature. This technique is based on sieve approximation and can be extended to a broader class of binary choice models where a smooth link function is adopted. In addition, by applying a single step of the Newton-Raphson method, we debias the regularized estimator and establish uncertainty quantification for item scores and rankings of items, both for the aggregated and individual preferences. Extensive simulation results from synthetic and real datasets corroborate our theoretical findings.
Communication-Efficient Distributed Estimation and Inference for Cox’s Model
Journal of the American Statistical Association · 2025-06-18 · 2 citations
article
The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review
Journal of the American Statistical Association · 2025-06-02 · 2 citations
article
Transformers versus the EM Algorithm in Multi-class Clustering
ArXiv.org · 2025-02-09
preprint · Open access
LLMs demonstrate significant inference capacities in complicated machine learning tasks, using the Transformer model as its backbone. Motivated by the limited understanding of such models on the unsupervised learning problems, we study the learning guarantees of Transformers in performing multi-class clustering of the Gaussian Mixture Models. We develop a theory drawing strong connections between the Softmax Attention layers and the workflow of the EM algorithm on clustering the mixture of Gaussians. Our theory provides approximation bounds for the Expectation and Maximization steps by proving the universal approximation abilities of multivariate mappings by Softmax functions. In addition to the approximation guarantees, we also show that with a sufficient number of pre-training samples and an initialization, Transformers can achieve the minimax optimal rate for the problem considered. Our extensive simulations empirically verified our theory by revealing the strong learning capacities of Transformers even beyond the assumptions in the theory, shedding light on the powerful inference capacities of LLMs.

Recent grants

Robust and Distributed Statistical Learning from Big Data
NSF · $600k · 2017–2023
DMS/NIGMS 2: Collaborative Research: Developing Statistical Learning Methods for Revealing the Molecular Signatures of Microvascular Changes in Neural Injury
NSF · $450k · 2021–2026
Statistical Methods for Ultrahigh-dimensional Biomedical Data
NIH · $293k · 2006–2022
Quantitative Methods for Genome-wide Analysis of Macrophage Activation by ESCs
NIH · $365k · 2011–2016
Collaborative Research: Interface of Probability and Statistics for High-dimensional Inference
NSF · $400k · 2014–2018

Frequent coauthors

Yi Ren
Guangxi Medical University
131 shared
Kai Cao
University Radiology
114 shared
Wise Young
Rutgers, The State University of New Jersey
114 shared
Lin Leng
Yale University
110 shared
Richard Bucala
Yale University
110 shared
Iman Tadmori
110 shared
Andreas Meinhardt
Hudson Institute of Medical Research
110 shared
Changshun Shao
Changchun University of Science and Technology
110 shared

Education

Ph.D., Department of Statistics
University of California, Berkeley
1989
Masters, Department of Statistics
Institute of Applied Mathematics, Chinese Academy of Sciences
1985
Bachelor, Department of Mathematics
Fudan University
1982

Awards & honors

2000 COPSS Presidents' Award
Morningside Gold Medal for Applied Mathematics (2007)
Guggenheim Fellow (2009)
Pao-Lu Hsu Prize (2013)
Guy Medal in Silver (2014)
