John Lafferty

John C. Malone Professor of Statistics & Data Science · Verified

Yale University · Psychology

Active 1970–2025

h-index: 81
Citations: 54.5k
Papers: 333 (20 in the last 5 years)
Funding: $965k
About

John Lafferty is the John C. Malone Professor of Statistics and Data Science at Yale University. He is also the director of the Center for Neurocomputation and Machine Intelligence at the Wu Tsai Institute at Yale. His research involves developing computational models to understand perception and memory, as demonstrated by his recent work on a study exploring how the brain prioritizes what to remember. In collaboration with Yale scientists, Lafferty contributed to creating a model that addresses the processes of visual signal compression and reconstruction, which helps explain why certain images are more memorable based on the difficulty of their reconstruction. His work aims to shed light on perception and memory formation, with potential applications in developing more efficient memory systems for artificial intelligence.

Research topics

  • Artificial Intelligence
  • Computer Science
  • Machine Learning
  • Data Mining
  • Neuroscience
  • Biology
  • Algorithm
  • Physics
  • Computational science
  • Theoretical computer science

Selected publications

  • Confidence Intervals for Linear Models with Arbitrary Noise Contamination

    ArXiv.org · 2025-11-10

    preprint · Open access · Senior author

    We study confidence interval construction for linear regression under Huber's contamination model, where an unknown fraction of noise variables is arbitrarily corrupted. While robust point estimation in this setting is well understood, statistical inference remains challenging, especially because the contamination proportion is not identifiable from the data. We develop a new algorithm that constructs confidence intervals for individual regression coefficients without any prior knowledge of the contamination level. Our method is based on a Z-estimation framework using a smooth estimating function. The method directly quantifies the uncertainty of the estimating equation after a preprocessing step that decorrelates covariates associated with the nuisance parameters. We show that the resulting confidence interval has valid coverage uniformly over all contamination distributions and attains an optimal length of order $O(1/\sqrt{n(1-ε)^2})$, matching the rate achievable when the contamination proportion $ε$ is known. This result stands in sharp contrast to the adaptation cost of robust interval estimation observed in the simpler Gaussian location model.

  • Pressure-Induced Three- to Two-Dimensional Structural Transition in Light Lanthanide Trichlorides

    Inorganic Chemistry · 2025-11-24

    article

    Rare-earth chlorides exhibit three polymorphs at ambient pressure, among which the UCl3-type three-dimensional (3D) framework with 9-fold Ln coordination is the dominant structural motif for the light lanthanides. Here, we report the high-pressure synthesis and structural characterization of LnCl3 (Ln = La, Ce, Pr, Nd, Gd, and Y) obtained at 5 GPa and 1000 °C. All high-pressure polymorphs adopt the two-dimensional (2D) NdBr3-type structure (Cmcm) built from LnCl8 polyhedra. For YCl3, the ambient-pressure AlCl3-type phase (CN = 6) transforms into the NdBr3-type structure (CN = 8) under compression. In contrast, La–Gd trichlorides undergo an unusual reduction from CN = 9 to 8. This counterintuitive behavior is rationalized by pressure-induced Ln–Cl bond shortening, which maintains reasonable bond-valence sums, together with enhanced packing density arising from pronounced out-of-plane contraction, as supported by density functional theory (DFT) calculations. These results demonstrate that high pressure can stabilize recoverable 2D polymorphs, expanding the compositional space of NdBr3-type layered structures and offering opportunities for the exploration of functional van der Waals-type materials.

  • CoT Information: Improved Sample Complexity under Chain-of-Thought Supervision

    ArXiv.org · 2025-05-21

    preprint · Open access · Senior author

    Learning complex functions that involve multi-step reasoning poses a significant challenge for standard supervised learning from input-output examples. Chain-of-thought (CoT) supervision, which provides intermediate reasoning steps together with the final output, has emerged as a powerful empirical technique, underpinning much of the recent progress in the reasoning capabilities of large language models. This paper develops a statistical theory of learning under CoT supervision. A key characteristic of the CoT setting, in contrast to standard supervision, is the mismatch between the training objective (CoT risk) and the test objective (end-to-end risk). A central part of our analysis, distinguished from prior work, is explicitly linking those two types of risk to achieve sharper sample complexity bounds. This is achieved via the *CoT information measure* $\mathcal{I}_{\mathcal{D}, h_\star}^{\mathrm{CoT}}(ε; \mathcal{H})$, which quantifies the additional discriminative power gained from observing the reasoning process. The main theoretical results demonstrate how CoT supervision can yield significantly faster learning rates compared to standard E2E supervision. Specifically, it is shown that the sample complexity required to achieve a target E2E error $ε$ scales as $d/\mathcal{I}_{\mathcal{D}, h_\star}^{\mathrm{CoT}}(ε; \mathcal{H})$, where $d$ is a measure of hypothesis class complexity, which can be much faster than standard $d/ε$ rates. Information-theoretic lower bounds in terms of the CoT information are also obtained. Together, these results suggest that CoT information is a fundamental measure of statistical complexity for learning under chain-of-thought supervision.

  • ACM/IMS Journal of Data Science: Inaugural Issue Editorial

    ACM / IMS Journal of Data Science · 2024-03-22

    article · Open access · Senior author

    The ACM/IMS Journal of Data Science (JDS) is a joint journal of the Association for Computing Machinery (ACM) and the Institute of Mathematical Statistics (IMS), publishing high-impact research from all areas of data science, across foundations, applications, and systems. The scope of the journal is multi-disciplinary and broad, spanning statistics, machine learning, computer systems, and the societal implications of data science. JDS accepts original papers and novel surveys that summarize and organize critical subject areas, as well as opinion papers. The journal bridges communities across the two scientific societies, representing diverse areas of research expertise. By combining elements of journal and conference publishing, the journal aims to serve the needs of a rapidly evolving research landscape. The journal accepts submissions three times a year. Each submission receives three expert reviews from a standing reviewing board, and the three-month initial reviewing process includes author feedback, review quality analysis, and reviewer discussions to reach a decision and provide constructive feedback. After the initial review process, which proceeds on a fixed schedule typical of conferences, authors prepare revisions, taking as much time as required. Accepted papers are published online on the JDS website immediately after the camera-ready sources have been prepared and checked, followed by full publication in the first available issue. This inaugural issue of the journal includes four research papers that intersect the areas of machine learning, artificial intelligence, databases, and data management systems. The papers were submitted upon invitation by the editors to area experts. Following JDS guidelines, the papers were reviewed by experts in the relevant communities. The topics and perspectives seen in this work are signs of the diversity, impact, and high standards that JDS aims to achieve as the journal ramps up. The subsequent two issues will also include invited submissions, representing additional research at the interface of statistics, machine learning, computer systems, and other areas that make up the growing landscape of data science.

  • Approximation of relation functions and attention mechanisms

    arXiv (Cornell University) · 2024-02-13 · 2 citations

    preprint · Open access · Senior author

    Inner products of neural network feature maps arise in a wide variety of machine learning frameworks as a method of modeling relations between inputs. This work studies the approximation properties of inner products of neural networks. It is shown that the inner product of a multi-layer perceptron with itself is a universal approximator for symmetric positive-definite relation functions. In the case of asymmetric relation functions, it is shown that the inner product of two different multi-layer perceptrons is a universal approximator. In both cases, a bound is obtained on the number of neurons required to achieve a given accuracy of approximation. In the symmetric case, the function class can be identified with kernels of reproducing kernel Hilbert spaces, whereas in the asymmetric case the function class can be identified with kernels of reproducing kernel Banach spaces. Finally, these approximation results are applied to analyzing the attention mechanism underlying Transformers, showing that any retrieval mechanism defined by an abstract preorder can be approximated by attention through its inner product relations. This result uses the Debreu representation theorem in economics to represent preference relations in terms of utility functions.

  • Images with harder-to-reconstruct visual representations leave stronger memory traces

    Nature Human Behaviour · 2024-05-13 · 13 citations

    article
  • Disentangling and Integrating Relational and Sensory Information in Transformer Architectures

    arXiv (Cornell University) · 2024-05-26

    preprint · Open access · Senior author

    Relational reasoning is a central component of generally intelligent systems, enabling robust and data-efficient inductive generalization. Recent empirical evidence shows that many existing neural architectures, including Transformers, struggle with tasks requiring relational reasoning. In this work, we distinguish between two types of information: sensory information about the properties of individual objects, and relational information about the relationships between objects. While neural attention provides a powerful mechanism for controlling the flow of sensory information between objects, the Transformer lacks an explicit computational mechanism for routing and processing relational information. To address this limitation, we propose an architectural extension of the Transformer framework that we call the Dual Attention Transformer (DAT), featuring two distinct attention mechanisms: sensory attention for directing the flow of sensory information, and a novel relational attention mechanism for directing the flow of relational information. We empirically evaluate DAT on a diverse set of tasks ranging from synthetic relational benchmarks to complex real-world tasks such as language modeling and visual processing. Our results demonstrate that integrating explicit relational computational mechanisms into the Transformer architecture leads to significant performance gains in terms of data efficiency and parameter efficiency.

  • The relational bottleneck as an inductive bias for efficient abstraction

    Trends in Cognitive Sciences · 2024-05-09 · 17 citations

    review
  • Abstractors and relational cross-attention: An inductive bias for explicit relational reasoning in Transformers

    arXiv (Cornell University) · 2023-04-01 · 6 citations

    preprint · Open access · Senior author

    An extension of Transformers is proposed that enables explicit relational reasoning through a novel module called the Abstractor. At the core of the Abstractor is a variant of attention called relational cross-attention. The approach is motivated by an architectural inductive bias for relational learning that disentangles relational information from object-level features. This enables explicit relational reasoning, supporting abstraction and generalization from limited data. The Abstractor is first evaluated on simple discriminative relational tasks and compared to existing relational architectures. Next, the Abstractor is evaluated on purely relational sequence-to-sequence tasks, where dramatic improvements are seen in sample efficiency compared to standard Transformers. Finally, Abstractors are evaluated on a collection of tasks based on mathematical problem solving, where consistent improvements in performance and sample efficiency are observed.

  • Learning Hierarchical Relational Representations through Relational Convolutions

    arXiv (Cornell University) · 2023-10-05 · 1 citation

    preprint · Open access · Senior author

    An evolving area of research in deep learning is the study of architectures and inductive biases that support the learning of relational feature representations. In this paper, we address the challenge of learning representations of hierarchical relations--that is, higher-order relational patterns among groups of objects. We introduce "relational convolutional networks", a neural architecture equipped with computational mechanisms that capture progressively more complex relational features through the composition of simple modules. A key component of this framework is a novel operation that captures relational patterns in groups of objects by convolving graphlet filters--learnable templates of relational patterns--against subsets of the input. Composing relational convolutions gives rise to a deep architecture that learns representations of higher-order, hierarchical relations. We present the motivation and details of the architecture, together with a set of experiments to demonstrate how relational convolutional networks can provide an effective framework for modeling relational tasks that have hierarchical structure.

Frequent coauthors

  • Larry Wasserman

    Carnegie Mellon University

    71 shared
  • Han Liu

    46 shared
  • Fang Han

    University of Washington

    40 shared
  • Ming Yuan

    Peking University Shenzhen Hospital

    40 shared
  • Mark Crowther

    St. Joseph’s Healthcare Hamilton

    32 shared
  • David H.K. Chui

    Boston Medical Center

    27 shared
  • John S. Waye

    Hamilton Regional Laboratory Medicine Program

    21 shared
  • Andrew McFarlane

    21 shared