About
Chaz Firestone is an Associate Professor of Psychological and Brain Sciences and serves as the Director of the Hopkins Perception & Mind Laboratory at Johns Hopkins University. His research explores how perception interacts with the rest of the mind, focusing on how perception enables and incorporates sophisticated processing typically associated with higher-level cognition. His lab investigates how the mind generates physical intuitions about the world, such as understanding that a tower of blocks will topple or a stack of dishes will collapse, and how these intuitions are underpinned by basic operations of visual attention and memory. Firestone's work also addresses foundational questions about the nature of perception, including how higher-level cognitive factors like language, desire, emotion, and action influence what we see. His research employs a variety of methods, including computer-based psychophysics experiments, studies in real-world environments, computational models, 3D-printed stimuli, research involving brain-damaged patients, and field experiments such as those conducted in New York City's Times Square.
Research topics
- Computer Science
- Artificial Intelligence
- Machine Learning
- Psychology
- Cognitive science
- Neuroscience
- Cognitive psychology
- Epistemology
- Mathematics
- Philosophy
- Data science
- Computer Security
- Social psychology
- Sociology
- Human–computer interaction
- Geometry
Selected publications
Part–whole effects in visual number estimation
Attention Perception & Psychophysics · 2026-01-08
article · Open access
In a single glance at a collection of objects, we can appreciate their numerosity. But what are the "objects" over which this number sense operates? Most work in this domain has implicitly assumed that we estimate the number of discrete, bounded individuals actually present in the visual field. However, in many instances we can construe such individuals as potential parts of composite objects that they can create, as when we assemble furniture or complete a jigsaw puzzle. Here, we demonstrate that visual numerosity estimation is sensitive to such part-whole relations, such that the number of items in a display is underestimated when it contains spatially separated but easily combinable objects. Participants saw brief displays containing noncontiguous "puzzle-piece" stimuli, and reported which display had more pieces. Crucially, most of the pieces appeared in pairs that either could or could not efficiently combine into new objects. In four experiments, displays with combinable pieces were judged as less numerous than displays with noncombinable pieces, as if the mind treated two geometrically compatible pieces as being the single whole object they could create. These effects went beyond various low-level factors, and they persisted even when participants were explicitly trained to treat individual pieces as the units that should be counted. Thus, despite the many ways that sets of objects may be construed for the purposes of counting, visual perception automatically takes into account the ways that object parts may combine into wholes when extracting numerosity from visual displays.
Perceiving animacy in ‘identical’ images
bioRxiv (Cold Spring Harbor Laboratory) · 2026-04-05
article · Open access · Senior author
Some objects appear ‘animate’ (e.g., dogs and elephants) while others do not (e.g., boots and sofas). This distinction pervades human cognition, with an expansive literature reporting striking effects of animacy on vision, memory, social perception, and neural organization. But studies of perceived animacy face a persistent challenge: Objects that differ in animacy tend to differ in many lower-level visual features (e.g., shape, texture, spatial frequency). Thus, it remains controversial whether animacy per se (as opposed to its lower-level correlates) drives visual processing. Here, we achieve previously unattainable levels of experimental control to demonstrate that the visual system represents animacy itself, beyond its lower-level covariates. We vary animacy while holding nearly all lower-level features constant by exploiting “visual anagrams”, a diffusion-based technique for generating static images whose interpretations change radically with orientation. Seven pre-registered experiments leverage this approach to demonstrate that representations of animacy structure visual working memory and guide visual attention. Thus, the visual system extracts animacy itself, beyond its lower-level correlates.
Guessing reveals internal models of perceptual precision
bioRxiv (Cold Spring Harbor Laboratory) · 2026-01-16
article · Open access
When observers lack sufficient information to support a confident response, they often guess. Guessing plays a pervasive role in visual cognition and working memory, yet the mechanisms that govern how observers generate guesses remain poorly understood. Standard models traditionally assume that responses produced in the absence of information are either uniformly distributed over feature space or are perhaps weighted towards prevailing environmental statistics. In contrast, here we consider an intriguing alternative: that guesses incorporate observers’ knowledge of their own perceptual capacities. We empirically measured guessing by eliciting responses under extreme target uncertainty (Experiment 1) as well as a novel “0ms presentation” approach in which no stimulus appeared but subjects believed one had (Experiment 2). We evaluated three accounts of guesses under these conditions: unsystematic (lapse) responding, biases toward environmental statistics, and a self-representational account in which guesses reflect observers’ knowledge of their own feature-dependent precision (e.g., preferring to guess feature values they believe they would be likely to miss). Guess responses were non-uniform and systematically biased toward feature values typically encoded with the least precision (e.g., oblique orientations), a counterintuitive bias away from high-frequency, high-fidelity feature values (e.g., cardinal orientations). This complementary relationship between guessing and perceptual fidelity held within individuals and across paradigms, and was recoverable via an empirical-guess mixture model that replaced the standard uniform assumption with empirically measured guess distributions. Our findings challenge prevailing views that guesses reflect random noise, and suggest instead that guessing behavior reflects metacognitive knowledge of internal precision. Rather than defaulting to environmental priors, observers appear to model their own sensory limitations and leverage these representations to inform decisions in the absence of evidence. These results reframe guessing as a theoretically informative behavior that expresses observers’ own beliefs about their perceptual capacities.
Significance: Guessing is commonly treated as random noise in models of perception and memory, assumed to reflect lapses or uninformed responses. Instead, we show that human guesses are systematically structured across feature space: observers preferentially guess values they typically encode with the least precision, revealing a consistent, strategic bias away from high-fidelity representations. By directly measuring guess behavior on stimulus-absent trials and integrating these empirical distributions into a mixture model, we find that guesses on stimulus-present trials can be systematically recovered, and that they too form the complement of perceptual precision. These findings challenge foundational psychophysical modeling assumptions and position guessing as a strategic, informative behavior that engages self-representation.
Author response for "Pretending not to know reveals a capacity for model-based self-simulation"
2025-08-21
peer-review
The psychophysics of compositionality: Relational scene perception occurs in a canonical order
2025-09-20
article · Open access
We see not only objects and their features (e.g., glass vases or wooden tables) but also relations between them (e.g., a vase on a table). An emerging view accounts for such relational representations by positing that visual perception is compositional: Much like language, where words combine to form phrases and sentences, many visual representations contain discrete constituents that combine systematically. This perspective raises a fundamental question: What principles guide the composition of relational representations, and how are they built over time? Here, we tested the hypothesis that the mind constructs relational representations in a canonical order. Inspired by a distinction from cognitive linguistics, we predicted that 'reference' objects (typically large, stable, and able to physically control other objects; e.g., tables) take precedence over 'figure' objects (e.g., vases) during scene composition. In Experiment 1, participants who arranged items to match linguistic descriptions (e.g., "The vase is on the table", "The table is supporting the vase") consistently placed reference objects first (e.g., table, then vase). Experiments 2–5 extended these findings to visual recognition itself: participants were faster to verify scene descriptions when reference objects appeared before figure objects in a scene, rather than vice versa. This Reference-first advantage emerged rapidly (within 100 ms), persisted in a purely visual task, and reflected abstract principles (e.g., physical forces) beyond simple differences in size or shape. Our findings reveal psychophysical principles underlying compositionality in visual processing: the mind builds representations of object relations sequentially, guided by the objects' roles in those relations.
Author response for "Pretending not to know reveals a capacity for model-based self-simulation"
2025-07-11
peer-review
Response duration: A ubiquitous implicit measure of confidence
Journal of Vision · 2025-07-15
article · Open access · Senior author
Among the most reliable connections between internal mental processing and external behavior is response time, with easier, more accurate, and more confident judgments typically made faster. But which aspects of response time are relevant? Whereas psychophysical studies traditionally focus on the time taken to initiate a response, an underexplored measure is the duration of the response itself: not just the amount of time between stimulus onset and keypress (reaction time), but also how long one holds down the key before releasing it (response duration). Response duration is a ubiquitous and freely available data source, yet almost no studies report or analyze it (Pfister et al., 2023). Here, 3 varied experiments demonstrate that response duration reliably predicts subjective confidence, independent of reaction time. In Experiment 1, subjects detected faces within white noise, with difficulty manipulated by varying face opacity. Subjects responded with a keypress (with both keyUp and keyDown events recorded separately), followed by a confidence judgment. Remarkably, subjects held down the response key longer during trials in which they subsequently reported lower confidence, as if making these face-detection judgments in a tentative fashion. The same pattern held in another visual task (judging the coherence of random-dot motion; Experiment 2), and a cognitive task (classifying American cities as geographically Eastern or Western; Experiment 3). In all cases, response duration accounted for variance in confidence that was not predicted by reaction time. Response duration has distinct advantages as a measure of confidence: It taps confidence at the time of judgment (rather than retrospectively), it can be used when traditional confidence judgments are difficult to elicit (e.g., in animals or infants), and it may be less affected by biases associated with explicit reports. Our results suggest that response duration is a valuable and untapped source of information, raising many avenues for future investigation.
Who's the actor? Performing and observing pantomimed actions
Underline Science Inc. · 2025-06-18
other · Open access · Senior author
Social perception research demonstrates that people can infer the high-level goals driving many motor actions. But what about the rich visuomotor processes underlying such actions? Visually guided behavior relies on a complex feedback loop between agents and environments, with subtle corrective adjustments made online. What do observers understand about this dynamic? Here, we explore these questions through “pantomimed actions”. We created a stimulus set of videos in which agents performed both genuine object-directed actions (e.g., stepping over a box) and pantomimes of those actions (e.g., stepping over an imagined box). Independent subjects then watched these videos and had to determine which videos were which. Collapsing across actions, observers successfully discriminated real actions from pantomimes. However, certain actions were more discriminable than others. This suggests that (1) observers understand how online visual information shapes human motor behavior; and (2) the ability to “fake” actions may be more robust than previously suggested.
Number adaptation survives spatial displacement
Current Biology · 2025-11-13 · 4 citations
article
Event-based warping: A relative distortion of time within events
Journal of Experimental Psychology General · 2025-09-02 · 4 citations
article · Open access
Objects and events are fundamental units of perception: Objects structure our experience of space, and events structure our experience of time. A striking and counterintuitive finding about object representation is that it can warp perceived space, such that stimuli within an object appear farther apart than stimuli in empty space. Might events influence perceived time in the same way objects influence perceived space? Here, five experiments (N = 500 adults) show that they do: Just as stimuli within an object are perceived as farther apart in space, stimuli within an event are perceived as further apart in time. Such "event-based warping" is elicited both by events characterized by sound (Experiment 1) and by events characterized by silence (Experiment 2). Moreover, these effects cannot be explained by surprise, distraction, or attentional cueing (Experiments 3 and 4) and also arise cross-modally (from audition to vision; Experiment 5). We suggest that object-based warping and event-based warping are both instances of a more general phenomenon in which representations of structure, whether in space or in time, generate powerful and analogous relative perceptual distortions.
Recent grants
Perceiving high-level relations
NSF · $528k · 2020–2024
Frequent coauthors
- 27 shared
Robert Feys
- 27 shared
Frederic B. Fitch
- 27 shared
Wilhelm Ackermann
- 27 shared
Carl G. Hempel
- 26 shared
Evert W. Beth
University of Tübingen
- 26 shared
Skolem Thoralf
Princeton University
- 26 shared
Andrzej Mckinsey
Institute for Advanced Study
- 26 shared
George D. W. Berry