Tengyu Ma

· Machine Learning, Deep Learning & Reinforcement LearningVerified

Stanford University · Learning, Design, and Technology

Active 2011–2026

h-index43

Citations9.5k

Papers192107 last 5y

Funding—

Faculty page Lab page

See your match with Tengyu Ma — sign in to PhdFit.Sign in

About

Tengyu Ma is an assistant professor of computer science at Stanford University. His research broadly encompasses machine learning, algorithms, and their theoretical foundations. His interests include deep learning, (deep) reinforcement learning, pre-training and foundation models, robustness, non-convex optimization, distributed optimization, and high-dimensional statistics. Ma's work focuses on advancing the understanding and development of machine learning algorithms with provable guarantees and practical impact. He has contributed to areas such as reasoning models, reinforcement learning, large language models, representation learning, robustness, deep learning theory, and uncertainty quantification. Ma is actively involved in teaching courses related to statistical and machine learning theory, machine learning, and nonparametric statistics. He has also served on program committees and as area chair for major conferences including AAAI, ICLR, NeurIPS, and COLT. His research excellence has been recognized with several awards, including the 2022 Samsung AI Researcher of the Year, the Sloan Research Fellowship, the NSF CAREER Award, and the ACM Doctoral Dissertation Award Honorable Mention.

Research topics

Computer Science
Artificial Intelligence
Machine Learning
Data Mining
Natural Language Processing
Political Science
Data science
Mathematics
Computer vision
Engineering ethics
Management science
Law
Engineering
Geography
Computer engineering
Algorithm

Selected publications

Scaling Self-Play with Self-Guidance
arXiv (Cornell University) · 2026-04-22
articleOpen accessSenior author
LLM self-play algorithms are notable in that, in principle, nothing bounds their learning: a Conjecturer model creates problems for a Solver, and both improve together. However, in practice, existing LLM self-play methods do not scale well with large amounts of compute, instead hitting learning plateaus. We argue this is because over long training runs, the Conjecturer learns to hack its reward, collapsing to artificially complex problems that do not help the Solver improve. To overcome this, we introduce Self-Guided Self-Play (SGS), a self-play algorithm in which the language model itself guides the Conjecturer away from degeneracy. In SGS, the model takes on three roles: Solver, Conjecturer, and a Guide that scores synthetic problems by their relevance to unsolved target problems and how clean and natural they are, providing supervision against Conjecturer collapse. Our core hypothesis is that language models can assess whether a subproblem is useful for achieving a goal. We evaluate the scaling properties of SGS by running training for significantly longer than prior works and by fitting scaling laws to cumulative solve rate curves. Applying SGS to formal theorem proving in Lean4, we find that it surpasses the asymptotic solve rate of our strongest RL baseline in fewer than 80 rounds of self-play and enables a 7B parameter model, after 200 rounds of self-play, to solve more problems than a 671B parameter model pass@4.
Publisher OA PDF
Scaling Self-Play with Self-Guidance
arXiv (Cornell University) · 2026-04-22
preprintOpen accessSenior author
LLM self-play algorithms are notable in that, in principle, nothing bounds their learning: a Conjecturer model creates problems for a Solver, and both improve together. However, in practice, existing LLM self-play methods do not scale well with large amounts of compute, instead hitting learning plateaus. We argue this is because over long training runs, the Conjecturer learns to hack its reward, collapsing to artificially complex problems that do not help the Solver improve. To overcome this, we introduce Self-Guided Self-Play (SGS), a self-play algorithm in which the language model itself guides the Conjecturer away from degeneracy. In SGS, the model takes on three roles: Solver, Conjecturer, and a Guide that scores synthetic problems by their relevance to unsolved target problems and how clean and natural they are, providing supervision against Conjecturer collapse. Our core hypothesis is that language models can assess whether a subproblem is useful for achieving a goal. We evaluate the scaling properties of SGS by running training for significantly longer than prior works and by fitting scaling laws to cumulative solve rate curves. Applying SGS to formal theorem proving in Lean4, we find that it surpasses the asymptotic solve rate of our strongest RL baseline in fewer than 80 rounds of self-play and enables a 7B parameter model, after 200 rounds of self-play, to solve more problems than a 671B parameter model pass@4.
Publisher DOI
Rethinking Reconstruction and Denoising in the Dark: New Perspective, General Architecture and Beyond
2025-06-10 · 3 citations
article1st authorCorresponding
Recently, enhancing image quality in the original RAW domain has garnered significant attention, with denoising and reconstruction emerging as fundamental tasks. Although some works attempt to couple these tasks, they primarily focus on cascade learning while neglecting task associativity within a broader parameter space, leading to suboptimal performance. This work introduces a novel approach by rethinking denoising and reconstruction from a "backbone-head" perspective, leveraging the stronger shared parameter space offered by the backbone, compared to the encoder used in existing works. We derive task-specific heads with fewer parameters to mitigate learning pressure. By incorporating chromaticity-and-noise perception module into the backbone and introducing task-specific supervision during training, we enable simultaneous high-quality results for reconstruction and denoising. Additionally, we design a dual-head interaction module to capture the latent correspondence between the two tasks, significantly enhancing multi-task accuracy. Extensive experiments validate the superiority of the proposed method. Code is available at: https://github.com/csmty/CANS.
Publisher DOI
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding
ArXiv.org · 2025-04-17 · 1 citations
preprintOpen access
Vision-language models are integral to computer vision research, yet many high-performing models remain closed-source, obscuring their data, design and training recipe. The research community has responded by using distillation from black-box models to label training data, achieving strong benchmark results, at the cost of measurable scientific progress. However, without knowing the details of the teacher model and its data sources, scientific progress remains difficult to measure. In this paper, we study building a Perception Language Model (PLM) in a fully open and reproducible framework for transparent research in image and video understanding. We analyze standard training pipelines without distillation from proprietary models and explore large-scale synthetic data to identify critical data gaps, particularly in detailed video understanding. To bridge these gaps, we release 2.8M human-labeled instances of fine-grained video question-answer pairs and spatio-temporally grounded video captions. Additionally, we introduce PLM-VideoBench, a suite for evaluating challenging video understanding tasks focusing on the ability to reason about "what", "where", "when", and "how" of a video. We make our work fully reproducible by providing data, training recipes, code & models. https://github.com/facebookresearch/perception_models
Publisher OA PDF DOI
SAM 3: Segment Anything with Concepts
arXiv (Cornell University) · 2025-11-20 · 3 citations
preprintOpen access
We present Segment Anything Model (SAM) 3, a unified model that detects, segments, and tracks objects in images and videos based on concept prompts, which we define as either short noun phrases (e.g., "yellow school bus"), image exemplars, or a combination of both. Promptable Concept Segmentation (PCS) takes such prompts and returns segmentation masks and unique identities for all matching object instances. To advance PCS, we build a scalable data engine that produces a high-quality dataset with 4M unique concept labels, including hard negatives, across images and videos. Our model consists of an image-level detector and a memory-based video tracker that share a single backbone. Recognition and localization are decoupled with a presence head, which boosts detection accuracy. SAM 3 doubles the accuracy of existing systems in both image and video PCS, and improves previous SAM capabilities on visual segmentation tasks. We open source SAM 3 along with our new Segment Anything with Concepts (SA-Co) benchmark for promptable concept segmentation.
Publisher OA PDF DOI
Fantastic Pretraining Optimizers and Where to Find Them
ArXiv.org · 2025-09-02
preprintOpen access
AdamW has long been the dominant optimizer in language model pretraining, despite numerous claims that alternative optimizers offer 1.4 to 2x speedup. We posit that two methodological shortcomings have obscured fair comparisons and hindered practical adoption: (i) unequal hyperparameter tuning and (ii) limited or misleading evaluation setups. To address these two issues, we conduct a systematic study of ten deep learning optimizers across four model scales (0.1B-1.2B parameters) and data-to-model ratios (1-8x the Chinchilla optimum). We find that fair and informative comparisons require rigorous hyperparameter tuning and evaluations across a range of model scales and data-to-model ratios, performed at the end of training. First, optimal hyperparameters for one optimizer may be suboptimal for another, making blind hyperparameter transfer unfair. Second, the actual speedup of many proposed optimizers over well-tuned baselines is lower than claimed and decreases with model size to only 1.1x for 1.2B parameter models. Thirdly, comparing intermediate checkpoints before reaching the target training budgets can be misleading, as rankings between two optimizers can flip during training due to learning rate decay. Through our thorough investigation, we find that all the fastest optimizers such as Muon and Soap, use matrices as preconditioners -- multiplying gradients with matrices rather than entry-wise scalars. However, the speedup of matrix-based optimizers is inversely proportional to model scale, decreasing from 1.4x over AdamW for 0.1B parameter models to merely 1.1x for 1.2B parameter models.
Publisher OA PDF DOI
STP: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving
ArXiv.org · 2025-01-31
preprintOpen accessSenior author
A fundamental challenge in formal theorem proving by LLMs is the lack of high-quality training data. Although reinforcement learning or expert iteration partially mitigates this issue by alternating between LLM generating proofs and finetuning them on correctly generated ones, performance quickly plateaus due to the scarcity of correct proofs (sparse rewards). To keep improving the models with limited data, we draw inspiration from mathematicians, who continuously develop new results, partly by proposing novel conjectures or exercises (which are often variants of known results) and attempting to solve them. We design the Self-play Theorem Prover (STP) that simultaneously takes on two roles, conjecturer and prover, each providing training signals to the other. The conjecturer is trained iteratively on previously generated conjectures that are barely provable by the current prover, which incentivizes it to generate increasingly challenging conjectures over time. The prover attempts to prove the conjectures with standard expert iteration. We evaluate STP with both Lean and Isabelle formal versifiers. With 51.3 billion tokens generated during the training in Lean, STP proves 28.5% of the statements in the LeanWorkbook dataset, doubling the previous best result of 13.2% achieved through expert iteration. The final model achieves state-of-the-art performance among whole-proof generation methods on miniF2F-test (65.0%, pass@3200), Proofnet-test (23.9%, pass@3200) and PutnamBench (8/644, pass@3200). We release our code, model, and dataset in this URL: https://github.com/kfdong/STP.
Publisher OA PDF DOI
PB-TPR: A Smart Grid Privacy Protection Framework for Secure Data Sharing and Efficient Aggregation
Lecture notes in computer science · 2025-10-15
book-chapter1st authorCorresponding
Publisher DOI
Learning With Self-Calibrator for Fast and Robust Low-Light Image Enhancement
IEEE Transactions on Pattern Analysis and Machine Intelligence · 2025-07-07 · 24 citations
article
Convolutional Neural Networks (CNNs) have shown significant success in the low-light image enhancement task. However, most of existing works encounter challenges in balancing quality and efficiency simultaneously. This limitation hinders practical applicability in real-world scenarios and downstream vision tasks. To overcome these obstacles, we propose a Self-Calibrated Illumination (SCI) learning scheme, introducing a new perspective to boost the model's capability. Based on a weight-sharing illumination estimation process, we construct an embedded self-calibrator to accelerate stage-level convergence, yielding gains that utilize only a single basic block for inference, which drastically diminishes computation cost. Additionally, by introducing the additivity condition on the basic block, we acquire a reinforced version dubbed SCI++, which disentangles the relationship between the self-calibrator and illumination estimator, providing a more interpretable and effective learning paradigm with faster convergence and better stability. We assess the proposed enhancers on standard benchmarks and in-the-wild datasets, confirming that they can restore clean images from diverse scenes with higher quality and efficiency. The verification on different levels of low-light vision tasks shows our applicability against other methods.
Publisher DOI
Perception Encoder: The best visual embeddings are not at the output of the network
ArXiv.org · 2025-04-17
preprintOpen access
We introduce Perception Encoder (PE), a state-of-the-art vision encoder for image and video understanding trained via simple vision-language learning. Traditionally, vision encoders have relied on a variety of pretraining objectives, each tailored to specific downstream tasks such as classification, captioning, or localization. Surprisingly, after scaling our carefully tuned image pretraining recipe and refining with our robust video data engine, we find that contrastive vision-language training alone can produce strong, general embeddings for all of these downstream tasks. There is only one caveat: these embeddings are hidden within the intermediate layers of the network. To draw them out, we introduce two alignment methods: language alignment for multimodal language modeling, and spatial alignment for dense prediction. Together, our PE family of models achieves best-in-class results on a wide variety of tasks, including (1) zero-shot image and video classification and retrieval, simultaneously obtaining 86.6 average zero-shot ImageNet robustness and 76.9 zero-shot Kinetics-400 video classification; (2) document, image, and video Q&A, enabling 94.6 DocVQA, 80.9 InfographicVQA, and 82.7 PerceptionTest with an 8B LLM; and (3) spatial tasks such as detection, tracking, and depth estimation, setting a new COCO state-of-the-art of 66.0 box mAP. To foster further research, we release our models, code, and novel dataset of synthetically and human-annotated videos: https://github.com/facebookresearch/perception_models
Publisher OA PDF DOI

Frequent coauthors

Sanjeev Arora
34 shared
Yingyu Liang
24 shared
Colin Wei
23 shared
Percy Liang
18 shared
Yuanzhi Li
18 shared
Andrej Risteski
17 shared
Sang Michael Xie
14 shared
Ananya Kumar
12 shared

Labs

Tengyu Ma's LabPI
Research interests broadly include topics in machine learning, algorithms and their theory, such as deep learning, (deep) reinforcement learning, pre-training / foundation models, robustness, non-convex optimization, distributed optimization, and high-dimensional statistics.

Awards & honors

2022 Samsung AI Researcher of the Year
Sloan Research Fellowships 2021
NSF CAREER Award
ACM Doctoral Dissertation Award Honorable Mention
2018 COLT Best Paper Award

Resume-aware match score
Save to shortlist
AI-drafted outreach

See your match with Tengyu Ma

PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.

Join the waitlist How it works

Free to start
No credit card
30-second signup

Find professors who actually fit you