Resume-aware faculty matching

Find professors who actually fit you

Upload your resume. Four AI agents analyze your background, rank the faculty who fit, inspect their recent research, and help you draft outreach — grounded in their actual work, not templates.


Michael I. Jordan

Professor · Verified

University of California, Berkeley · Department of Statistics

Active 1982–2026

h-index: 175
Citations: 208.7k
Papers: 1.4k (419 in the last 5 years)
Funding: $965k

About

Michael I. Jordan is Senior Researcher at Inria, Paris, and the Pehong Chen Distinguished Professor Emeritus in the Department of Electrical Engineering and Computer Science and the Department of Statistics at the University of California, Berkeley. He received his master's degree in Mathematics from Arizona State University and earned his PhD in Cognitive Science in 1985 from the University of California, San Diego. He was a professor at MIT from 1988 to 1998. His research interests bridge the computational, statistical, cognitive, biological, and social sciences. Prof. Jordan is a member of several prestigious academies including the National Academy of Sciences, the National Academy of Engineering, the American Academy of Arts and Sciences, the Royal Society as a Foreign Member, and the Chinese Academy of Sciences as a Foreign Member. He is also a Fellow of the American Association for the Advancement of Science. Throughout his career, he has been recognized with numerous awards such as the BBVA Foundation Frontiers of Knowledge Award in Information and Communication Technologies in 2025, the inaugural World Laureates Association Prize in 2022, the Ulf Grenander Prize from the American Mathematical Society in 2021, the IEEE John von Neumann Medal in 2020, the IJCAI Research Excellence Award in 2016, the David E. Rumelhart Prize in 2015, and the ACM/AAAI Allen Newell Award in 2009. He has delivered several distinguished lectures including the Inaugural IMS Grace Wahba Lecture in 2022, the IMS Neyman Lecture in 2011, an IMS Medallion Lecture in 2004, and was a Plenary Lecturer at the International Congress of Mathematicians in 2018. In 2016, Prof. Jordan was named the "most influential computer scientist" worldwide in an article in Science, based on rankings from the Semantic Scholar search engine.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Machine Learning
  • Mathematics
  • Mathematical optimization
  • Applied mathematics
  • Data Mining
  • Algorithm
  • Geology
  • Mathematical analysis
  • Bioinformatics
  • Computational biology
  • Geometry
  • Meteorology
  • Environmental science
  • Geography
  • Biology
  • Statistics
  • Genetics
  • Climatology

Selected publications

  • Mitigating Bias in Spatial Transcriptomic Pipelines via Human Feedback

    bioRxiv (Cold Spring Harbor Laboratory) · 2026-01-16

    article · Open access

    Biological discovery from experimental data, particularly large-scale assays, requires extensive preprocessing, during which raw outputs (e.g., images, sequences) are processed into structured forms that are more amenable to analysis. While statistical methods for such processed data are at the core of computational biology, the problem of coping with uncertainties introduced during preprocessing is a significant and underexplored issue. We address this issue in the context of differential expression analysis in spatial transcriptomics, which depends on a series of preprocessing steps, including demarcation of cell regions (segmentation), quantification of gene expression in cells, and cell-type annotation. We introduce Corrected Spatial Differential Expression (CSDE), a method that builds on Prediction-Powered Inference to leverage a small set of expert-validated data points (cells) to account for uncertainty due to preprocessing errors. Using two case studies, we demonstrate that CSDE produces more reliable and calibrated estimates of differential expression compared to the prevalent approach that neglects the impact of preprocessing. CSDE incorporates an efficient workflow to generate the required expert-annotated data, and is available as open-source at https://github.com/YosefLab/CSDE.

  • Online Learning in a Creator Economy

    Artificial Intelligence Science and Engineering · 2026-03-01 · 1 citation

    preprint · Open access · Senior author

    The creator economy is revolutionizing the way in which individuals can profit from their engagement with online platforms. In this paper, we initiate the formal study of online learning in a creator economy by modeling it as a three-party game between users, a platform, and content creators. The platform interacts with creators through contracts under a principal-agent framework and with users via a recommender system. We study how the platform can jointly optimize contracts and recommendation policies in an online learning setting. We analyze return-based and feature-based contracts. Under smoothness assumptions, return-based contracts achieve regret $\Theta(T^{2/3})$. For feature-based contracts, we introduce an intrinsic dimension $d$ and prove a regret bound $\mathcal{O}(T^{(d+1)/(d+2)})$, which is tight for linear families.

  • An Overview of Large Language Models for Statisticians

    The American Statistician · 2026-04-13 · 4 citations

    article · Open access

    Large Language Models (LLMs) have emerged as transformative tools in artificial intelligence (AI), exhibiting remarkable capabilities across diverse tasks such as text generation, reasoning, and decision-making. While their success has primarily been driven by advances in computational power and deep learning architectures, emerging problems -- in areas such as uncertainty quantification, decision-making, causal inference, and distribution shift -- require a deeper engagement with the field of statistics. This paper explores potential areas where statisticians can make important contributions to the development of LLMs, particularly those that aim to engender trustworthiness and transparency for human users. Thus, we focus on issues such as uncertainty quantification, interpretability, fairness, privacy, watermarking and model adaptation. We also consider possible roles for LLMs in statistical analysis. By bridging AI and statistics, we aim to foster a deeper collaboration that advances both the theoretical foundations and practical applications of LLMs, ultimately shaping their role in addressing complex societal challenges.

  • Stopping Rules for Stochastic Gradient Descent via Anytime-Valid Confidence Sequences

    ArXiv.org · 2025-12-15

    preprint · Open access · Senior author

    The problem of stopping stochastic gradient descent (SGD) in an online manner, based solely on the observed trajectory, is a challenging theoretical problem with significant consequences for applications. While SGD is routinely monitored as it runs, the classical theory of SGD provides guarantees only at pre-specified iteration horizons and offers no valid way to decide, based on the observed trajectory, when further computation is justified. We address this longstanding gap by developing anytime-valid confidence sequences for stochastic gradient methods, which remain valid under continuous monitoring and directly induce statistically valid, trajectory-dependent stopping rules: stop as soon as the current upper confidence bound on an appropriate performance measure falls below a user-specified tolerance. The confidence sequences are constructed using nonnegative supermartingales, are time-uniform, and depend only on observable quantities along the SGD trajectory, without requiring prior knowledge of the optimization horizon. In convex optimization, this yields anytime-valid certificates for weighted suboptimality of projected SGD under general stepsize schedules, without assuming smoothness or strong convexity. In nonconvex optimization, it yields time-uniform certificates for weighted first-order stationarity under smoothness assumptions. We further characterize the stopping-time complexity of the resulting stopping rules under standard stepsize schedules. To the best of our knowledge, this is the first framework that provides statistically valid, time-uniform stopping rules for SGD across both convex and nonconvex settings based solely on its observed trajectory.

  • Safety versus performance: How multi-objective learning reduces barriers to market entry

    Proceedings of the National Academy of Sciences · 2025-10-15

    article · Open access · Corresponding

    Emerging marketplaces for large language models and other large-scale machine learning models appear to exhibit market concentration, which has raised concerns about whether there are insurmountable barriers to entry in such markets. In this work, we study this issue from both an economic and an algorithmic point of view, focusing on a phenomenon that reduces barriers to entry. Specifically, an incumbent company risks reputational damage unless its model is sufficiently aligned with safety objectives, whereas a new company can more easily avoid reputational damage. To study this issue formally, we define a multi-objective high-dimensional regression framework that captures reputational damage, and we characterize the number of data points that a new company needs to enter the market. Our results demonstrate how multi-objective considerations can fundamentally reduce barriers to entry: the required number of data points can be significantly smaller than the incumbent company's dataset size. En route to proving these results, we develop scaling laws for high-dimensional linear regression in multi-objective environments, showing that the scaling rate becomes slower when the dataset size is large, which could be of independent interest.

  • Conditional Coverage Diagnostics for Conformal Prediction

    ArXiv.org · 2025-12-12

    article · Open access

    Evaluating conditional coverage remains one of the most persistent challenges in assessing the reliability of predictive systems. Although conformal methods can give guarantees on marginal coverage, no method can guarantee to produce sets with correct conditional coverage, leaving practitioners without a clear way to interpret local deviations. To overcome sample-inefficiency and overfitting issues of existing metrics, we cast conditional coverage estimation as a classification problem. Conditional coverage is violated if and only if any classifier can achieve lower risk than the target coverage. Through the choice of a (proper) loss function, the resulting risk difference gives a conservative estimate of natural miscoverage measures such as L1 and L2 distance, and can even separate the effects of over- and under-coverage, and non-constant target coverages. We call the resulting family of metrics excess risk of the target coverage (ERT). We show experimentally that the use of modern classifiers provides much higher statistical power than simple classifiers underlying established metrics like CovGap. Additionally, we use our metric to benchmark different conformal prediction methods. Finally, we release an open-source package for ERT as well as previous conditional coverage metrics. Together, these contributions provide a new lens for understanding, diagnosing, and improving the conditional reliability of predictive systems.

  • Understanding In-context Learning of Addition via Activation Subspaces

    ArXiv.org · 2025-05-08

    preprint · Open access

    To perform few-shot learning, language models extract signals from a few input-label pairs, aggregate these into a learned prediction rule, and apply this rule to new inputs. How is this implemented in the forward pass of modern transformer models? To explore this question, we study a structured family of few-shot learning tasks for which the true prediction rule is to add an integer $k$ to the input. We introduce a novel optimization method that localizes the model's few-shot ability to only a few attention heads. We then perform an in-depth analysis of individual heads, via dimensionality reduction and decomposition. As an example, on Llama-3-8B-instruct, we reduce its mechanism on our tasks to just three attention heads with six-dimensional subspaces, where four dimensions track the unit digit with trigonometric functions at periods $2$, $5$, and $10$, and two dimensions track magnitude with low-frequency components. To deepen our understanding of the mechanism, we also derive a mathematical identity relating "aggregation" and "extraction" subspaces for attention heads, allowing us to track the flow of information from individual examples to a final aggregated concept. Using this, we identify a self-correction mechanism where mistakes learned from earlier demonstrations are suppressed by later demonstrations. Our results demonstrate how tracking low-dimensional subspaces of localized heads across a forward pass can provide insight into fine-grained computational structures in language models.

  • Decoding of image properties from single-trial visual evoked potentials recorded by ultra-high-density EEG

    Scientific Reports · 2025-09-25 · 1 citation

    article · Open access

    Visual evoked potentials (VEPs) recorded by electroencephalography (EEG) allow us to study the neuronal activity non-invasively and in high temporal resolution. Traditionally, EEG analyses have relied on univariate group-level statistics and trial averaging to detect effects. However, recent advances in high-density EEG enable the investigation of brain responses at the single-subject and single-trial level. In this study, we combine ultra-high-density (uHD) EEG with cross-validated single-trial decoding to bridge both approaches, improving generalizability and reproducibility. Study participants were shown a diverse set of random images while 512 channels from the uHD system recorded their EEG over the occipital lobe. Image properties (contrast, hue, luminance, saturation and spatial frequency) were extracted for each stimulus and VEPs were used for decoding these properties in a cross-validated regression analysis. Additionally, the same data were spatially subsampled to investigate the impact of spatial resolution and electrode density on the decoding performance. Image properties could be decoded from single-trial VEPs, with contrast, saturation and spatial frequency providing the best decoding performances. Grand average decoding performance across all image properties and subjects yielded a Pearson's r of 0.50 between predicted and actual image property score. Greater electrode density improves decoding performance compared to standard EEG as well as subsampled configurations. Image properties robustly modulate early components of the VEP. Importantly, these modulations are pronounced enough to allow for single-trial decoding. Our analyses highlight the importance of electrode density with improvements in decoding performance extending even below 10 mm of inter-electrode distance.

  • Deep generative modeling of sample-level heterogeneity in single-cell genomics

    Nature Methods · 2025-10-13 · 12 citations

    article · Open access

    Single-cell genomic studies were recently conducted on hundreds of samples exhibiting complex designs. These data have tremendous potential for discovering how sample- or tissue-level phenotypes relate to cellular and molecular composition. However, current analyses are often based on simplified representations of these data by averaging information across cells. We present multi-resolution variational inference (MrVI), a deep generative model designed to realize the potential of cohort studies at the single-cell level. MrVI tackles two fundamental, intertwined problems: stratifying samples into groups and evaluating the cellular and molecular differences between groups, without requiring predefined cell states. Leveraging its single-cell perspective, MrVI detects clinically relevant stratifications of cohorts of people with COVID-19 or inflammatory bowel disease that are manifested in only certain cellular subsets, enabling new discoveries that would otherwise be overlooked. MrVI can de novo identify groups of small molecules with similar biochemical properties and evaluate their effects on cellular composition and gene expression in large-scale perturbation studies. MrVI is an open-source tool at scvi-tools.org.

  • Learn then test: Calibrating predictive algorithms to achieve risk control

    The Annals of Applied Statistics · 2025-05-28 · 10 citations

    article

    We introduce a framework for calibrating machine learning models to satisfy finite-sample statistical guarantees. Our calibration algorithms work with any model and (unknown) data-generating distribution and do not require retraining. The algorithms address, among other examples, false discovery rate control in multilabel classification, intersection-over-union control in instance segmentation, and simultaneous control of the type-1 outlier error and confidence set coverage in classification or regression. Our main insight is to reframe risk control as multiple hypothesis testing, enabling different mathematical arguments. We demonstrate our algorithms with detailed worked examples in computer vision and tabular medical data. The computer vision experiments demonstrate the utility of our approach in calibrating state-of-the-art predictive architectures that have been deployed widely, such as the detectron2 object detection system.

Education

  • Ph.D.

    University of California, San Diego

  • M.S.

    Arizona State University

  • B.S.

    Louisiana State University

Awards & honors

  • BBVA Foundation Frontiers of Knowledge Award in Information…
  • ICBS Frontiers of Science Award (with John Duchi and Martin…
  • ICBS Frontiers of Science Award (with Yuchen Zhang, Mingshen…
  • Laureate Distinguished Fellow, International Engineering and…
  • Academy Award, National Academy of Artificial Intelligence (…