Harvey Lederman

Professor, Philosophy · Verified

University of Texas at Austin · Philosophy

Active 2006–2026

h-index: 11

Citations: 391

Papers: 34 (16 in the last 5 years)

About

Harvey Lederman is a professor of philosophy at the University of Texas at Austin. Prior to his current position, he served as an assistant professor and then a professor of philosophy at Princeton University until 2023. Beginning in July 2026, he will join New York University. His research encompasses a broad range of interests in contemporary philosophy as well as the history of philosophy, with a particular focus on Chinese neo-Confucianism. Recently, Lederman has concentrated on conceptual and empirical questions related to AI mentality and the implications of artificial intelligence for the meaning of human life. He is co-principal investigator of the AI and Human Objectives Initiative and is affiliated with the Population and Wellbeing Initiative and the School of Civic Leadership. His scholarly contributions have been recognized with awards such as the 2023 Sanders Prize in Epistemology for his paper "Of marbles and matchsticks," which addresses incomplete preferences in decision theory, and the Dao Best Essay Award in 2022 for his work on Wang Yangming titled "What is the 'Unity' in the 'Unity of Knowledge and Action'?"

Research topics

Epistemology
Philosophy
Computer Science
Linguistics
Programming language
Artificial Intelligence
Natural Language Processing
Theology

Selected publications

Emergent Introspection in AI is Content-Agnostic
Open MIND · 2026-03-05
preprint · 1st author · Corresponding
Introspection is a foundational cognitive ability, but its mechanism is not well understood. Recent work has shown that AI models can introspect. We study the mechanism of this introspection. We first extensively replicate Lindsey (2025)'s thought injection detection paradigm in large open-source models. We show that introspection in these models is content-agnostic: models can detect that an anomaly occurred even when they cannot reliably identify its content. The models confabulate injected concepts that are high-frequency and concrete (e.g., "apple"). They also require fewer tokens to detect an injection than to guess the correct concept (with wrong guesses coming earlier). We argue that a content-agnostic introspective mechanism is consistent with leading theories in philosophy and psychology.
Dissociating Direct Access from Inference in AI Introspection
arXiv (Cornell University) · 2026-03-05
article · Open access · 1st author · Corresponding
Introspection is a foundational cognitive ability, but its mechanism is not well understood. Recent work has shown that AI models can introspect. We study their mechanism of introspection, first extensively replicating Lindsey et al. (2025)'s thought injection detection paradigm in large open-source models. We show that these models detect injected representations via two separable mechanisms: (i) probability-matching (inferring from perceived anomaly of the prompt) and (ii) direct access to internal states. The direct access mechanism is content-agnostic: models detect that an anomaly occurred but cannot reliably identify its semantic content. The two model classes we study confabulate injected concepts that are high-frequency and concrete (e.g., "apple"); for them correct concept guesses typically require significantly more tokens. This content-agnostic introspective mechanism is consistent with leading theories in philosophy and psychology.
Privileged Self-Access Matters for Introspection in AI
ArXiv.org · 2025-08-20
preprint · Open access
Whether AI models can introspect is an increasingly important practical question. But there is no consensus on how introspection is to be defined. Beginning from a recently proposed "lightweight" definition, we argue instead for a thicker one. According to our proposal, introspection in AI is any process which yields information about internal states through a process more reliable than one with equal or lower computational cost available to a third party. Using experiments where LLMs reason about their internal temperature parameters, we show they can appear to have lightweight introspection while failing to meaningfully introspect per our proposed definition.
A Dominance Argument Against Incompleteness
The Philosophical Review · 2025-10-01 · 1 citation
article
This article presents a new argument against many forms of moral and prudential value incompleteness. The argument relies on two central principles: (i) a weak “negative dominance” principle, to the effect that lottery 1 is better than lottery 2 only if some possible outcome of lottery 1 is better than some possible outcome of lottery 2, and (ii) a weak form of ex ante Pareto, to the effect that, if lottery 1 gives an unambiguously better (stochastically dominant) prospect to some individuals than lottery 2, and equally good prospects to everyone else, then lottery 1 is better than lottery 2. Given modest auxiliary assumptions, these two principles rule out incompleteness in the prudential ranking of individual lives, and many forms of incompleteness in the moral rankings of outcomes and lotteries.
On the Value of Irreplaceable Objects (forthcoming in The Journal of Philosophy)
Durham Research Online (Durham University) · 2025-04-23
article · Open access
Bradford (2023) calls attention to the fact that the strength of our reasons to preserve distinctively valuable objects increases as the number of such objects decreases. Bradford develops an account of this phenomenon in terms of 'irreplaceable value', and in particular in terms of a notion of the degree of such value, which is distinct from its amount. We present an alternative explanation of this pattern in our reasons, which appeals to the value of diversity: the world is better, other things equal, insofar as it contains more kinds of value. We develop this view in two connected ways: one appeals to evidential probability under conditions of uncertainty, and the other appeals to the value of diversity. We conclude by discussing some explanatory advantages of our view over Bradford's.
Are Language Models More Like Libraries or Like Librarians? Bibliotechnism, the Novel Reference Problem, and the Attitudes of LLMs
Transactions of the Association for Computational Linguistics · 2024-01-01 · 1 citation
article · Open access · 1st author · Corresponding
Are LLMs cultural technologies like photocopiers or printing presses, which transmit information but cannot create new content? A challenge for this idea, which we call bibliotechnism, is that LLMs generate novel text. We begin with a defense of bibliotechnism, showing how even novel text may inherit its meaning from original human-generated text. We then argue that bibliotechnism faces an independent challenge from examples in which LLMs generate novel reference, using new names to refer to new entities. Such examples could be explained if LLMs were not cultural technologies but had beliefs, desires, and intentions. According to interpretationism in the philosophy of mind, a system has such attitudes if and only if its behavior is well explained by the hypothesis that it does. Interpretationists may hold that LLMs have attitudes, and thus have a simple solution to the novel reference problem. We emphasize, however, that interpretationism is compatible with very simple creatures having attitudes and differs sharply from views that presuppose these attitudes require consciousness, sentience, or intelligence (topics about which we make no claims).
Maximal Social Welfare Relations on Infinite Populations Satisfying Permutation Invariance
arXiv (Cornell University) · 2024-08-11
preprint · Open access · Senior author
We study social welfare relations (SWRs) on an infinite population. Our main result is a new characterization of a utilitarian SWR as the largest SWR (in terms of subset when the weak relation is viewed as a set of pairs) which satisfies Strong Pareto, Permutation Invariance (elsewhere called "Relative Anonymity" and "Isomorphism Invariance"), and a further "Quasi-Independence" axiom.
A Dominance Argument Against Incompleteness
arXiv (Cornell University) · 2024-03-26
preprint · Open access
This article presents a new argument against many forms of moral and prudential value incompleteness. The argument relies on two central principles: (i) a weak "negative dominance" principle, to the effect that Lottery 1 is better than Lottery 2 only if some possible outcome of Lottery 1 is better than some possible outcome of Lottery 2, and (ii) a weak form of ex ante Pareto, to the effect that, if Lottery 1 gives an unambiguously better (stochastically dominant) prospect to some individuals than Lottery 2, and equally good prospects to everyone else, then Lottery 1 is better than Lottery 2. Given modest auxiliary assumptions, these two principles rule out incompleteness in the prudential ranking of individual lives, and many forms of incompleteness in the moral rankings of outcomes and lotteries.
Are Language Models More Like Libraries or Like Librarians? Bibliotechnism, the Novel Reference Problem, and the Attitudes of LLMs
arXiv (Cornell University) · 2024-01-10 · 9 citations
preprint · Open access · 1st author · Corresponding
Are LLMs cultural technologies like photocopiers or printing presses, which transmit information but cannot create new content? A challenge for this idea, which we call bibliotechnism, is that LLMs generate novel text. We begin with a defense of bibliotechnism, showing how even novel text may inherit its meaning from original human-generated text. We then argue that bibliotechnism faces an independent challenge from examples in which LLMs generate novel reference, using new names to refer to new entities. Such examples could be explained if LLMs were not cultural technologies but had beliefs, desires, and intentions. According to interpretationism in the philosophy of mind, a system has such attitudes if and only if its behavior is well explained by the hypothesis that it does. Interpretationists may hold that LLMs have attitudes, and thus have a simple solution to the novel reference problem. We emphasize, however, that interpretationism is compatible with very simple creatures having attitudes and differs sharply from views that presuppose these attitudes require consciousness, sentience, or intelligence (topics about which we make no claims).
Trying without fail
Philosophical Studies · 2024-08-20 · 11 citations
article · Open access · Senior author · Corresponding

Frequent coauthors

Hartry Field
New York University
9 shared
Tore Fjetland Øgaard
University of Bergen
9 shared
Jeremy Goodman
7 shared
P. Fritz
University of Oslo
6 shared
H. R. G. Greaves
University of Oxford
2 shared
Kyle Mahowald
1 shared
Christian Tarsney
The University of Texas at Austin
1 shared
Dean Spears
Indian Statistical Institute
1 shared
