John Lafferty

John C. Malone Professor of Statistics & Data Science

Yale University · Psychology

Active 1970–2025

h-index: 81

Citations: 54.5k

Papers: 333 (20 in the last 5 years)

Funding: $965k



About

John Lafferty is the John C. Malone Professor of Statistics and Data Science at Yale University and director of the Center for Neurocomputation and Machine Intelligence at Yale's Wu Tsai Institute. His research includes developing computational models of perception and memory. In a recent study of how the brain prioritizes what to remember, he and Yale collaborators built a model of how visual signals are compressed and then reconstructed; it helps explain why images whose visual representations are harder to reconstruct are more memorable. The work aims to clarify how perception shapes memory formation, with potential applications to more efficient memory systems for artificial intelligence.

Research topics

Artificial Intelligence
Computer Science
Machine Learning
Data Mining
Neuroscience
Biology
Algorithm
Physics
Computational science
Theoretical computer science

Selected publications

Confidence Intervals for Linear Models with Arbitrary Noise Contamination
ArXiv.org · 2025-11-10
preprint · Open access · Senior author
We study confidence interval construction for linear regression under Huber's contamination model, where an unknown fraction of noise variables is arbitrarily corrupted. While robust point estimation in this setting is well understood, statistical inference remains challenging, especially because the contamination proportion is not identifiable from the data. We develop a new algorithm that constructs confidence intervals for individual regression coefficients without any prior knowledge of the contamination level. Our method is based on a Z-estimation framework using a smooth estimating function. The method directly quantifies the uncertainty of the estimating equation after a preprocessing step that decorrelates covariates associated with the nuisance parameters. We show that the resulting confidence interval has valid coverage uniformly over all contamination distributions and attains an optimal length of order $O(1/\sqrt{n(1-ε)^2})$, matching the rate achievable when the contamination proportion $ε$ is known. This result stands in sharp contrast to the adaptation cost of robust interval estimation observed in the simpler Gaussian location model.
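The abstract builds its intervals on a Z-estimator: the estimate solves a smooth estimating equation, and the interval comes from quantifying that equation's uncertainty. As a rough illustration of that general recipe only, the Python sketch below treats a one-dimensional location problem rather than linear regression, uses psi(r) = tanh(r) as an assumed smooth estimating function with unit noise scale, and forms a Wald interval from the standard sandwich variance. It omits the paper's covariate-decorrelation step and does not carry the paper's uniform coverage guarantee under contamination; all function names are hypothetical.

```python
import math
import statistics


def psi(r):
    # Smooth, bounded estimating function (assumed here; not necessarily the paper's choice).
    return math.tanh(r)


def dpsi(r):
    # Derivative of tanh(r) is 1 - tanh(r)^2.
    return 1.0 - math.tanh(r) ** 2


def z_estimate(xs, tol=1e-10):
    """Solve sum_i psi(x_i - theta) = 0 for theta by bisection.

    The left-hand side is strictly decreasing in theta and changes sign
    on [min(xs), max(xs)], so bisection on that bracket converges.
    """
    lo, hi = min(xs), max(xs)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(psi(x - mid) for x in xs) > 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def wald_interval(xs, z=1.96):
    """Estimate theta and a ~95% interval from the sandwich variance
    E[psi^2] / (n * E[psi']^2), assuming i.i.d. data with unit noise scale."""
    n = len(xs)
    theta = z_estimate(xs)
    resid = [x - theta for x in xs]
    a = statistics.fmean(psi(r) ** 2 for r in resid)
    b = statistics.fmean(dpsi(r) for r in resid)
    se = math.sqrt(a / (n * b * b))
    return theta, (theta - z * se, theta + z * se)


# Example: five well-behaved points plus one gross outlier.
data = [0.1, -0.3, 0.2, 0.05, -0.1, 8.0]
theta_hat, (ci_lo, ci_hi) = wald_interval(data)
```

Here the outlier at 8.0 pulls the sample mean above 1.3, while the bounded psi keeps the estimate between 0.1 and 0.3.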
Pressure-Induced Three- to Two-Dimensional Structural Transition in Light Lanthanide Trichlorides
Inorganic Chemistry · 2025-11-24
article
Rare-earth chlorides exhibit three polymorphs at ambient pressure, among which the UCl3-type three-dimensional (3D) framework with 9-fold Ln coordination is the dominant structural motif for the light lanthanides. Here, we report the high-pressure synthesis and structural characterization of LnCl3 (Ln = La, Ce, Pr, Nd, Gd, and Y) obtained at 5 GPa and 1000 °C. All high-pressure polymorphs adopt the two-dimensional (2D) NdBr3-type structure (Cmcm) built from LnCl8 polyhedra. For YCl3, the ambient-pressure AlCl3-type phase (CN = 6) transforms into the NdBr3-type structure (CN = 8) under compression. In contrast, La–Gd trichlorides undergo an unusual reduction from CN = 9 to 8. This counterintuitive behavior is rationalized by pressure-induced Ln–Cl bond shortening, which maintains reasonable bond-valence sums, together with enhanced packing density arising from pronounced out-of-plane contraction, as supported by density functional theory (DFT) calculations. These results demonstrate that high pressure can stabilize recoverable 2D polymorphs, expanding the compositional space of NdBr3-type layered structures and offering opportunities for the exploration of functional van der Waals-type materials.
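The bond-valence argument can be made concrete with the standard bond-valence sum, V = sum_j exp((R0 - R_j) / b) with b of about 0.37 Å. The sketch below uses made-up Ln-Cl distances and an assumed R0 (neither is taken from the paper), chosen only to show how a site can lose a neighbor (CN 9 to 8) yet keep a sum near the formal valence of 3 when the remaining bonds shorten.

```python
import math

B = 0.37  # widely used bond-valence softness constant b, in angstroms


def bond_valence_sum(distances, r0, b=B):
    """Bond-valence sum V = sum_j exp((r0 - R_j) / b) over a cation's bonds."""
    return sum(math.exp((r0 - r) / b) for r in distances)


# Hypothetical inputs for illustration only; these are NOT values from the paper.
R0 = 2.55          # assumed bond-valence parameter for an Ln(3+)-Cl bond, in angstroms
cn9 = [2.95] * 9   # 9-fold site with longer bonds (ambient-pressure-like)
cn8 = [2.91] * 8   # 8-fold site with shortened bonds (high-pressure-like)

v9 = bond_valence_sum(cn9, R0)  # about 3.05
v8 = bond_valence_sum(cn8, R0)  # about 3.02
```

Each shortened bond contributes more valence, which compensates for the lost ninth neighbor.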
CoT Information: Improved Sample Complexity under Chain-of-Thought Supervision
ArXiv.org · 2025-05-21
preprint · Open access · Senior author
Learning complex functions that involve multi-step reasoning poses a significant challenge for standard supervised learning from input-output examples. Chain-of-thought (CoT) supervision, which provides intermediate reasoning steps together with the final output, has emerged as a powerful empirical technique, underpinning much of the recent progress in the reasoning capabilities of large language models. This paper develops a statistical theory of learning under CoT supervision. A key characteristic of the CoT setting, in contrast to standard supervision, is the mismatch between the training objective (CoT risk) and the test objective (end-to-end, or E2E, risk). A central part of our analysis, distinguished from prior work, is explicitly linking those two types of risk to achieve sharper sample complexity bounds. This is achieved via the *CoT information measure* $\mathcal{I}_{\mathcal{D}, h_\star}^{\mathrm{CoT}}(\varepsilon; \mathcal{H})$, which quantifies the additional discriminative power gained from observing the reasoning process. The main theoretical results demonstrate how CoT supervision can yield significantly faster learning rates than standard E2E supervision. Specifically, the sample complexity required to achieve a target E2E error $\varepsilon$ scales as $d/\mathcal{I}_{\mathcal{D}, h_\star}^{\mathrm{CoT}}(\varepsilon; \mathcal{H})$, where $d$ is a measure of hypothesis class complexity; this can be much smaller than the standard $d/\varepsilon$ sample complexity. Information-theoretic lower bounds in terms of the CoT information are also obtained. Together, these results suggest that CoT information is a fundamental measure of statistical complexity for learning under chain-of-thought supervision.
ACM/IMS Journal of Data Science: Inaugural Issue Editorial
ACM / IMS Journal of Data Science · 2024-03-22
article · Open access · Senior author
The Journal of Data Science (JDS) is a joint journal of the Association for Computing Machinery (ACM) and the Institute of Mathematical Statistics (IMS), publishing high-impact research from all areas of data science, across foundations, applications, and systems. The scope of the journal is multidisciplinary and broad, spanning statistics, machine learning, computer systems, and the societal implications of data science. JDS accepts original papers and novel surveys that summarize and organize critical subject areas, as well as opinion papers. The journal bridges communities across the two scientific societies, representing diverse areas of research expertise. By combining elements of journal and conference publishing, the journal aims to serve the needs of a rapidly evolving research landscape. The journal accepts submissions three times a year. Each submission receives three expert reviews from a standing reviewing board, and the three-month initial reviewing process includes author feedback, review quality analysis, and reviewer discussions to reach a decision and provide constructive feedback. After the initial review process, which proceeds on a fixed schedule typical of conferences, authors prepare revisions, taking as much time as required. Accepted papers are published online on the JDS website immediately after the camera-ready sources have been prepared and checked, followed by full publication in the first available issue. This inaugural issue of the journal includes four research papers at the intersection of machine learning, artificial intelligence, databases, and data management systems. The papers were invited by the editors from area experts. Following JDS guidelines, the papers were reviewed by experts in the relevant communities. The topics and perspectives seen in this work are signs of the diversity, impact, and high standards that JDS aims to achieve as the journal ramps up. The subsequent two issues will also include invited submissions, representing additional research at the interface of statistics, machine learning, computer systems, and other areas that make up the growing landscape of data science.
Approximation of relation functions and attention mechanisms
arXiv (Cornell University) · 2024-02-13 · 2 citations
preprint · Open access · Senior author
Inner products of neural network feature maps arise in a wide variety of machine learning frameworks as a method of modeling relations between inputs. This work studies the approximation properties of inner products of neural networks. It is shown that the inner product of a multi-layer perceptron with itself is a universal approximator for symmetric positive-definite relation functions. In the case of asymmetric relation functions, it is shown that the inner product of two different multi-layer perceptrons is a universal approximator. In both cases, a bound is obtained on the number of neurons required to achieve a given accuracy of approximation. In the symmetric case, the function class can be identified with kernels of reproducing kernel Hilbert spaces, whereas in the asymmetric case the function class can be identified with kernels of reproducing kernel Banach spaces. Finally, these approximation results are applied to analyzing the attention mechanism underlying Transformers, showing that any retrieval mechanism defined by an abstract preorder can be approximated by attention through its inner product relations. This result uses the Debreu representation theorem in economics to represent preference relations in terms of utility functions.
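To make the two constructions concrete, the sketch below builds two small, untrained ReLU MLPs with fixed random weights (the sizes and seeds are arbitrary assumptions) and forms relation functions from their inner products. It checks only the structural facts in the abstract's setup: <phi(x), phi(y)> is symmetric with positive semidefinite Gram matrices, while <phi(x), psi(y)> is not symmetric. It does not demonstrate the approximation bounds themselves.

```python
import random


def make_mlp(d_in, d_hidden, d_out, seed):
    """A one-hidden-layer ReLU MLP with fixed random weights (illustrative, untrained)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_hidden)]
    w2 = [[rng.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_out)]

    def mlp(x):
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
        return [sum(w * hj for w, hj in zip(row, h)) for row in w2]

    return mlp


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


phi = make_mlp(3, 16, 8, seed=0)
psi = make_mlp(3, 16, 8, seed=1)


def r_sym(x, y):
    # Inner product of one MLP with itself: symmetric, and every Gram matrix it
    # induces is positive semidefinite (it is a kernel).
    return dot(phi(x), phi(y))


def r_asym(x, y):
    # Inner product of two different MLPs: generally r_asym(x, y) != r_asym(y, x).
    return dot(phi(x), psi(y))


rng = random.Random(42)
points = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
gram = [[r_sym(x, y) for y in points] for x in points]
```

An attention score <W_q x, W_k y> is the asymmetric case with linear maps in place of the two MLPs, which is the link the abstract uses to analyze Transformers.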
Images with harder-to-reconstruct visual representations leave stronger memory traces
Nature Human Behaviour · 2024-05-13 · 13 citations
article
Disentangling and Integrating Relational and Sensory Information in Transformer Architectures
arXiv (Cornell University) · 2024-05-26
preprint · Open access · Senior author
Relational reasoning is a central component of generally intelligent systems, enabling robust and data-efficient inductive generalization. Recent empirical evidence shows that many existing neural architectures, including Transformers, struggle with tasks requiring relational reasoning. In this work, we distinguish between two types of information: sensory information about the properties of individual objects, and relational information about the relationships between objects. While neural attention provides a powerful mechanism for controlling the flow of sensory information between objects, the Transformer lacks an explicit computational mechanism for routing and processing relational information. To address this limitation, we propose an architectural extension of the Transformer framework that we call the Dual Attention Transformer (DAT), featuring two distinct attention mechanisms: sensory attention for directing the flow of sensory information, and a novel relational attention mechanism for directing the flow of relational information. We empirically evaluate DAT on a diverse set of tasks ranging from synthetic relational benchmarks to complex real-world tasks such as language modeling and visual processing. Our results demonstrate that integrating explicit relational computational mechanisms into the Transformer architecture leads to significant performance gains in terms of data efficiency and parameter efficiency.
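A simplified, non-authoritative sketch of the two attention types follows. Sensory heads mix projected object features; relational heads use the same kind of attention weights but mix a relation vector r_ij, made of one inner product per relation head, paired with a symbol s_j that tags which object the relation came from. The random weights, the fixed random symbols, and plain concatenation (instead of learned output projections) are all assumptions; the paper's exact parameterization may differ.

```python
import math
import random


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0, 1 / math.sqrt(cols)) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(xs, wq, wk):
    """alpha[i][j] = softmax over j of <Wq x_i, Wk x_j> / sqrt(d_k)."""
    qs = [matvec(wq, x) for x in xs]
    ks = [matvec(wk, x) for x in xs]
    scale = math.sqrt(len(qs[0]))
    return [softmax([dot(q, k) / scale for k in ks]) for q in qs]


def mix(alpha, messages):
    """Row i of the result is sum_j alpha[i][j] * messages[i][j]."""
    width = len(messages[0][0])
    return [[sum(a * msg[c] for a, msg in zip(row, msgs)) for c in range(width)]
            for row, msgs in zip(alpha, messages)]


def sensory_attention(xs, wq, wk, wv):
    # Standard self-attention: the message from j is a projection of x_j's own features.
    alpha = attention_weights(xs, wq, wk)
    vs = [matvec(wv, x) for x in xs]
    return mix(alpha, [vs for _ in xs])


def relational_attention(xs, wq, wk, rel_q, rel_k, symbols):
    # The message from j to i is a relation vector r_ij (one inner product per
    # relation head) concatenated with a symbol s_j that identifies object j.
    alpha = attention_weights(xs, wq, wk)
    messages = [[[dot(matvec(mq, xi), matvec(mk, xj)) for mq, mk in zip(rel_q, rel_k)] + symbols[j]
                 for j, xj in enumerate(xs)] for xi in xs]
    return mix(alpha, messages)


rng = random.Random(0)
n, d, dk, n_rel, d_sym = 5, 4, 4, 3, 2
xs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
wq, wk, wv, wq2, wk2 = (rand_matrix(dk, d, rng) for _ in range(5))
rel_q = [rand_matrix(dk, d, rng) for _ in range(n_rel)]
rel_k = [rand_matrix(dk, d, rng) for _ in range(n_rel)]
symbols = [[rng.gauss(0, 1) for _ in range(d_sym)] for _ in range(n)]

sensory_out = sensory_attention(xs, wq, wk, wv)
relational_out = relational_attention(xs, wq2, wk2, rel_q, rel_k, symbols)
# A dual-attention layer would combine both kinds of head outputs per object.
dual_out = [s + r for s, r in zip(sensory_out, relational_out)]
```

Because the relational messages are built from inner products, they describe how objects relate rather than what each object's features are; the symbol tells the receiver which object the relation refers to.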
The relational bottleneck as an inductive bias for efficient abstraction
Trends in Cognitive Sciences · 2024-05-09 · 17 citations
review
Abstractors and relational cross-attention: An inductive bias for explicit relational reasoning in Transformers
arXiv (Cornell University) · 2023-04-01 · 6 citations
preprint · Open access · Senior author
An extension of Transformers is proposed that enables explicit relational reasoning through a novel module called the Abstractor. At the core of the Abstractor is a variant of attention called relational cross-attention. The approach is motivated by an architectural inductive bias for relational learning that disentangles relational information from object-level features. This enables explicit relational reasoning, supporting abstraction and generalization from limited data. The Abstractor is first evaluated on simple discriminative relational tasks and compared to existing relational architectures. Next, the Abstractor is evaluated on purely relational sequence-to-sequence tasks, where dramatic improvements are seen in sample efficiency compared to standard Transformers. Finally, Abstractors are evaluated on a collection of tasks based on mathematical problem solving, where consistent improvements in performance and sample efficiency are observed.
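The core idea of relational cross-attention, as the abstract describes it, is that attention scores come from the objects while the values do not. In the hedged Python sketch below, the values are fixed symbol vectors independent of the input, so object features reach the output only through pairwise inner products. Softmax normalization, identity projections, and one-hot symbols are illustrative assumptions; with identity projections, rotating every object changes all feature vectors but leaves the output unchanged.

```python
import math


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def relational_cross_attention(xs, wq, wk, symbols):
    """Queries and keys come from the objects; values are input-independent symbols.

    Row i of the output is sum_j A[i][j] * symbols[j], with
    A[i] = softmax over j of <Wq x_i, Wk x_j> / sqrt(d).
    """
    qs = [matvec(wq, x) for x in xs]
    ks = [matvec(wk, x) for x in xs]
    scale = math.sqrt(len(qs[0]))
    out = []
    for q in qs:
        a = softmax([sum(u * v for u, v in zip(q, k)) / scale for k in ks])
        out.append([sum(aj * s[c] for aj, s in zip(a, symbols)) for c in range(len(symbols[0]))])
    return out


def rotate(x, angle):
    # 2-D rotation: preserves every inner product between objects.
    c, s = math.cos(angle), math.sin(angle)
    return [c * x[0] - s * x[1], s * x[0] + c * x[1]]


identity = [[1.0, 0.0], [0.0, 1.0]]
objects = [[1.0, 0.0], [0.5, 2.0], [-1.0, 0.3]]
symbols = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # one-hot, so outputs equal A

out = relational_cross_attention(objects, identity, identity, symbols)
out_rotated = relational_cross_attention([rotate(x, 0.7) for x in objects], identity, identity, symbols)
```

In a trained Abstractor the projections and symbols would be learned; the exact rotation invariance shown here holds only because the projections are the identity.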
Learning Hierarchical Relational Representations through Relational Convolutions
arXiv (Cornell University) · 2023-10-05 · 1 citation
preprint · Open access · Senior author
An evolving area of research in deep learning is the study of architectures and inductive biases that support the learning of relational feature representations. In this paper, we address the challenge of learning representations of hierarchical relations, that is, higher-order relational patterns among groups of objects. We introduce "relational convolutional networks", a neural architecture equipped with computational mechanisms that capture progressively more complex relational features through the composition of simple modules. A key component of this framework is a novel operation that captures relational patterns in groups of objects by convolving graphlet filters (learnable templates of relational patterns) against subsets of the input. Composing relational convolutions gives rise to a deep architecture that learns representations of higher-order, hierarchical relations. We present the motivation and details of the architecture, together with a set of experiments to demonstrate how relational convolutional networks can provide an effective framework for modeling relational tasks that have hierarchical structure.
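A relational convolution, as the abstract describes it, compares graphlet filters against the relations within subsets of objects. The minimal sketch below uses a scalar relation (a hypothetical negative-distance similarity), hand-written 2x2 filters rather than learned ones, and enumerates every group of a fixed size; the paper uses learned, multi-dimensional relations, and how it selects groups may differ.

```python
from itertools import combinations


def relation_matrix(objects, rel):
    """R[i][j] = rel(x_i, x_j) for a scalar relation function rel."""
    return [[rel(x, y) for y in objects] for x in objects]


def relational_convolution(R, filters, group_size):
    """Match each graphlet filter (a group_size x group_size template) against the
    relation sub-matrix of every group of objects.

    Returns {group: [response to filter 0, response to filter 1, ...]}.
    """
    out = {}
    for g in combinations(range(len(R)), group_size):
        out[g] = [
            sum(f[a][b] * R[g[a]][g[b]] for a in range(group_size) for b in range(group_size))
            for f in filters
        ]
    return out


objects = [1.0, 2.0, 4.0, 8.0]
R = relation_matrix(objects, lambda x, y: -abs(x - y))  # hypothetical similarity relation
filters = [
    [[0.0, 1.0], [1.0, 0.0]],  # responds to the relation between the two group members
    [[1.0, 0.0], [0.0, 1.0]],  # responds only to self-relations
]
features = relational_convolution(R, filters, group_size=2)
```

Each group's response vector can be treated as a new object, so applying the same operation to those outputs mirrors the composition of relational convolutions that the abstract uses to build higher-order, hierarchical relations.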

Recent grants

MSPA-MCS: Nonparametric Learning in High Dimensions
NSF · $500k · 2006–2010
Constrained Statistical Estimation and Inference: Theory, Algorithms and Applications
NSF · $320k · 2015–2017
Constrained Statistical Estimation and Inference: Theory, Algorithms and Applications
NSF · $145k · 2017–2018

Frequent coauthors

Larry Wasserman
Carnegie Mellon University
71 shared
Han Liu
46 shared
Fang Han
University of Washington
40 shared
Ming Yuan
Peking University Shenzhen Hospital
40 shared
Mark Crowther
St. Joseph’s Healthcare Hamilton
32 shared
David H.K. Chui
Boston Medical Center
27 shared
John S. Waye
Hamilton Regional Laboratory Medicine Program
21 shared
Andrew McFarlane
21 shared
