George A. Alvarez

Fred Kavli Professor of Neuroscience

Harvard University · Human Development and Psychology

Active 1980–2026

h-index: 55

Citations: 13.7k

Papers: 251 · 55 last 5y

Funding: $1.5M

About

George A. Alvarez is the Fred Kavli Professor of Neuroscience in the Department of Psychology at Harvard University. His research focuses on understanding how the human visual system manages its limited resources to efficiently process and interpret visual information. His projects explore key areas such as attentional selection, memory storage, fluid resource allocation, and efficient coding, aiming to uncover how the mind and brain optimize their use of limited cognitive resources. Based at William James Hall in Cambridge, MA, Alvarez's work contributes to the broader understanding of cognition, brain function, and behavior, particularly in the context of visual perception and cognition. His research seeks to elucidate the strategies employed by the human visual system to navigate complex environments and social situations with apparent ease.

Research topics

Computer Science
Artificial Intelligence
Machine Learning
Psychology
Engineering
Biology
Neuroscience
Philosophy
Mathematics
Environmental ethics
Epistemology
Engineering ethics

Selected publications

Geometric Dynamics Across Recurrent Vision Models
2026-01-01
article
A Unified Account of Lightness Illusions via Edge-Based Reconstruction of Natural Images
bioRxiv (Cold Spring Harbor Laboratory) · 2026-04-10
article · Open access · Senior author
The human visual system transforms patterns of light into rich perceptual experiences, where what we see is a construction that goes beyond simple measurement. Lightness illusions, where identical parts of an image can appear dramatically different depending on context, provide a window into these processes. Here we leverage a deep learning framework to investigate the constructive processes that give rise to lightness illusions, introducing the core computational goal of edge-based image reconstruction. Specifically, we demonstrate that autoencoder models trained to reconstruct natural images based only on an edge-based image representation naturally recapitulate a wide range of lightness illusions, which were previously assumed to require distinct mechanisms, inference over lighting sources, and explicit three-dimensional scene representation. These results offer a simpler, unified account of diverse lightness phenomena as emerging naturally from surface filling-in mechanisms, and broadly provide a framework for understanding the computational principles that underlie our perception of the visual world.
Significance statement: The human visual system shows remarkably stable perception of objects under different viewing conditions, but it uses strategies that can be thwarted by clever visual illusions – for instance, the exact same object can appear as either white or black in different contexts. The most complex of these lightness illusions have long been taken as evidence that perception involves explicit inference about 3D scene geometry and lighting conditions. However, here we show that these illusions also emerge in deep learning models, trained simply to reconstruct natural images from sparse edge signals. Thus, our perception of the lightness of surfaces in our world may instead arise from a much more primitive computation: reconstructing surface appearance from edge responses.
A feedforward mechanism for human-like contour integration
PLoS Computational Biology · 2025-08-18 · 1 citation
article · Open access · Senior author · Corresponding
Deep neural network models provide a powerful experimental platform for exploring core mechanisms underlying human visual perception, such as perceptual grouping and contour integration, the process of linking local edge elements to arrive at a unified perceptual representation of a complete contour. Here, we demonstrate that feedforward convolutional neural networks (CNNs) fine-tuned on contour detection show this human-like capacity, but without relying on mechanisms proposed in prior work, such as lateral connections, recurrence, or top-down feedback. We identified two key properties needed for ImageNet pre-trained, feedforward models to yield human-like contour integration: first, progressively increasing receptive field structure served as a critical architectural motif to support this capacity; and second, biased fine-tuning for contour detection specifically for gradual curves (~20 degrees) resulted in human-like sensitivity to curvature. We further demonstrate that fine-tuning ImageNet pre-trained models uncovers other hidden human-like capacities in feedforward networks, including uncrowding (reduced interference from distractors as the number of distractors increases), which is considered a signature of human perceptual grouping. Thus, taken together these results provide a computational existence proof that purely feedforward hierarchical computations are capable of implementing gestalt "good continuation" and perceptual organization needed for human-like contour integration and uncrowding. More broadly, these results raise the possibility that in human vision, later stages of processing play a more prominent role in perceptual organization than implied by theories focused on recurrence and early lateral connections.
Feature accentuation along the encoding axes of IT neurons uncovers hidden differences in model-brain alignment
Journal of Vision · 2025-07-15 · 1 citation
article · Open access
While deep neural network (DNN) encoding models increasingly achieve high predictivity of neural responses to natural images, it remains unclear whether these scores indicate algorithmic or mechanistic alignment between models and neural systems. Here we introduce a novel paradigm for rigorously testing DNN encoding models based on how well they can control neural responses. As a case study, we consider a resnet-50 and an adversarially robust variant, whose encoding models of IT neural responses to natural images achieve nearly identical R2 predictivity. However, using an explainable AI (xAI) technique called feature accentuation, we found dramatic differences in these models’ ability to control neural responses. Specifically, for each neural site, we synthesized image sets predicted to parametrically drive neural activity along the encoding axes in the target model’s feature space, which critically relies on the hierarchical computations and mechanisms of the target model. We presented these accentuated stimuli to the same monkey under identical recording conditions the day after synthesis. In this test of "parametric control," we found that stimuli from the robust model achieved precise modulation of neural firing: responses reliably and predictably aligned with each feature level. In contrast, baseline resnet-derived stimuli showed far weaker parametric control. Qualitatively, the robust model accentuations enhanced cohesive object contours, such as face-like curvatures, whereas baseline accentuations predominantly altered textural features, such as fur-like patterns. These results highlight that adversarially robust training may naturally pressure learning of more brain-relevant features, compared to standard objectives. 
More broadly, these results show that models with similar encoding predictivity for natural images can be distinguished through targeted tests of fine-grained parametric control along the encoding axes, revealing that some models offer better controllability than others. By bridging neuroAI and xAI, this approach emphasizes mechanistic alignment as a key goal for linking DNNs and brains.
Dissecting sparse circuits to high-level visual categories in deep neural networks
Journal of Vision · 2025-07-15
article · Open access
While humans easily recognize innumerable object categories, the underlying computational paths from retina to category-level representations are still being unraveled. Convolutional neural networks (CNNs) like AlexNet have remarkable competence in visual categorization, and thus offer a unique case study for understanding the hierarchical routing of visual information. Extending work from Hamblin et al., 2023, here we develop a method to extract the relevant connections involved in the computation of each output category, and assess the effectiveness of this sparser sub-network. The key idea is that not all connections are necessarily involved in the computation of any one category; thus, for each of the 1000 category-level output units in AlexNet, our algorithm assigns scores to connections based on their contribution to the category unit's outputs and prunes the lowest-scored connections to a specified sparsity. Our goal is to identify the sparsest circuit through the network that still maintains the original function. To evaluate how well the extracted circuits reflect the output unit’s original functionality, we introduce a new metric, circuit substitution accuracy (CSA). We find that circuits need only 5.0% (median) of connections to achieve 85% of the unpruned CSA. Surprisingly, we observed that CSA initially increases with pruning and often actually exceeds the unpruned baseline at its peak (median peak CSA = 188.0% of the median unpruned CSA) with just 13.3% (median) of connections. We hypothesize that the full network must employ inhibition to negotiate between competing, interfering pathways. Finally, the “anatomical overlap” amongst these category circuits ranged from <1% to >99% shared circuitry, revealing a range of implicit modularization in the network's categorical processing routes.
Broadly, this work presents a novel method for gaining insight into the functional neuroanatomy of neural networks, and offers a foundation for understanding the hierarchical computations involved in the emergence of category-level information in visual systems.
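The pruning step described in this abstract (score each connection, then drop the lowest-scored ones until a target sparsity is reached) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify how scores are computed, so the scores below are made-up numbers, and `prune_to_sparsity` is a hypothetical helper name, not the authors' code.

```python
def prune_to_sparsity(scores, keep_fraction):
    """Return a boolean keep-mask retaining the top `keep_fraction` of connections.

    `scores` is a flat list with one (assumed, precomputed) contribution score
    per connection; higher means more important to the category unit.
    """
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be between 0 and 1")
    n_keep = round(len(scores) * keep_fraction)
    # Rank connection indices from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = set(ranked[:n_keep])
    return [i in kept for i in range(len(scores))]


# Hypothetical scores for ten connections feeding one category unit.
scores = [0.9, 0.1, 0.4, 0.05, 0.7, 0.2, 0.3, 0.6, 0.01, 0.8]
mask = prune_to_sparsity(scores, keep_fraction=0.3)
kept_indices = [i for i, keep in enumerate(mask) if keep]
print(kept_indices)
```

Sweeping `keep_fraction` downward and re-evaluating the pruned circuit at each step is one way to look for the sparsest circuit that still preserves the unit's behavior, which is the goal the abstract states.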
An Information Sharing Framework: Supporting Collaborative and Integrated Service Delivery
International Journal of Integrated Care · 2025-08-19
article · Open access · 1st author · Corresponding
Background: The Information Sharing Framework enhances the degree of collaboration and integration among client-serving organizations providing mental health supports across all relevant sectors by providing guidance in navigating the complexities of privacy legislation; managing information appropriately; considerations in adapting policies and practices; and tools to use in collaborative service delivery.
Approach: The identified issue was that information was not being shared as readily or effectively as it should be when organizations need to collaborate, thereby limiting and at times impeding the effectiveness of collaboration. In order to better understand the issue, a broad swath of organizations was invited to participate in the development of an enhanced service delivery approach, which ended up being the Framework. Input and participation were obtained from a wide variety of people-serving organizations including non-profit agencies, associations (e.g. United Way), health organizations (e.g. PCNs), registrars from health profession colleges, school boards, police services, as well as provincial government, and privacy commissioner's staff, among others. Participants provided input as the Framework was developed, with inputs and comments assisting in shaping it as it evolved. The Framework is meant to address two main areas: the first being the fact that organizations in different sectors are subject to differing privacy legislation, which, by and large is neither harmonized, nor consistently interpreted, and in many jurisdictions not applicable to the non-profit sector. The second area has more to do with identifying areas for consideration that may be seen as more structural in nature, including governance, roles and responsibilities, policies and practices, and information management. The Framework provides support to organizations who wish or need to collaborate effectively when providing services to individuals and families. While it was developed in support of those delivering mental health supports and services, it can easily apply to any people-facing services in the health and social service sectors.
Results: The outcome of the work is the development of the Information Sharing Framework, which provides fairly comprehensive guidance on what organizations wishing to develop or enhance their collaborative service delivery need to consider and implement. It also includes a number of resources and tools that can support both individual agencies that are seeking to improve their information management policies and practices, as well as partnerships or groups of organizations that need to determine how they will work together more effectively, and how they will address onboarding of members. The Framework is being rolled out for use in Alberta, where it is in use by a number of groups and organizations, and is being offered for adaptation and adoption in other jurisdictions.
Implications: The Framework addresses a number of areas that many jurisdictions are struggling with, and can be readily adapted for their use. The themes that are spoken to are broadly applicable, and focus on ensuring that organizations do not lose sight that privacy legislation was not meant to impede access by individuals and families to effective and necessary services delivered by organizations working collaboratively, creating the 'basket of supports' that are often alluded to. It sets out how to create that basket. Converge Mental Health Coalition has made the Framework materials available for use by hosting them on their website. In recognition that there will be a need to adapt it for use by jurisdictions other than Alberta, we are reaching out to various organizations in other jurisdictions to determine if there is interest in adapting and adopting the Framework for use by those jurisdictions. We are having a number of conversations with various organizations to that end. The package is available at Information Sharing Framework (Our Work - Converge, convergementalhealth.org).
Edge-based Image Reconstruction Provides a Unified Account of (many) Lightness Illusions
Journal of Vision · 2025-07-15
article · Open access
Lightness illusions demonstrate that how bright an object appears depends on an elaborate constructive process, to the point that the same surface can be perceived as either black or white depending on the context. Why does the biological visual system work this way? Traditionally, distinct computational goals have been proposed to account for simple lightness illusions (e.g. the Craik-O’Brien-Cornsweet Illusion) and for more complex illusions (e.g., the moon illusion: discs in different hazy backgrounds, Anderson & Winawer 2005). The Craik-O’Brien-Cornsweet illusion seems to depend on local cues — a dark/light difference at a singular edge— whereas the moon illusion seems to also require a recovered scene structure. Our work examines whether an edge-based reconstruction goal produces a range of lightness illusions. First, we trained a reconstructive U-Net to output a filled-in image from edge-only inputs of images, an objective analogous to filling-in surfaces from edge-selective neurons in the biological visual system. This model not only reconstructed images with minimal error, but also made systematic errors consistent with lightness illusions measured in people for both the Craik-O’Brien-Cornsweet illusion and the Anderson-Winawer illusion. This effect was robust across training parameter choices (32 combined variations between training datasets and model seeds) and illusion probe choices (contrast signal of edges). When the model was applied to a suite of additional lightness illusions (e.g., Adelson Haze Illusion, Snake Illusion, Koffka Illusions, and Kanizsa Square Illusion), we found that the model consistently recapitulated illusions when there are connected edges with consistent polarity bounding the illusory surface. 
When the same U-Net model architecture was trained with a different reconstructive goal – denoising different levels of Gaussian noise – the models did not recapitulate any illusions, indicating that edge-based reconstruction is critical and provides a plausible mechanism underlying many perceptual lightness illusions.
Representational Geometry Dynamics in Networks After Long-Range Modulatory Feedback
Journal of Vision · 2025-07-15
article · Open access
The human visual system relies on extensive long-range feedback circuitry, where feedforward and feedback connections iteratively refine interpretations through reentrant loops (Di Lollo, 2012). Inspired by this neuroanatomy, a recent model introduced long-range feedback pathways into a convolutional neural network, where late-stage feature channels learn how to influence early-stage channels, to support successful object classification (Konkle & Alvarez, 2023). The model operates in two passes—a feedforward pass, generating initial representations, and a modulated pass, where activations reflect both the feedforward and feedback-modulatory processing. While prior work focused on injecting an external goal signal into the model to leverage feedback connections for category-based attention, here we examine the representational dynamics of this model during its default operation, without any top-down goals. Specifically, we explored how the representational geometry of exemplars and categories changes in the modulated pass, relative to the feedforward pass. We analyzed activations from 100 randomly selected ImageNet categories (300 images each). Local representational structure was evaluated through cluster sizes and k-nearest neighbor analysis, while global representational structure was assessed via prototype shifts and pairwise distances within and between categories. We found that default feedback modulation induced notable changes in representational geometry: Category cluster sizes significantly reduce as exemplar embeddings move closer to category prototypes. Locally, more nearest neighbors fall within the same category, and within-category distances decrease, reflecting tighter clustering. Meanwhile, the distances between categories remain relatively stable. Finally, the larger the prototype shift, the greater the cluster shrinkage, indicating a relationship between internal cohesion and global repositioning. 
These findings suggest that fixed long-range feedback connections induce an automatic prototype effect in the representational geometry, compacting clusters within categories while preserving global structure. Broadly, these emergent feedback dynamics might naturally induce categorical processing effects by refining local representations without disrupting overall structure, improving downstream category-based task efficiency.
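The geometry measures named in this abstract (cluster size relative to a category prototype, and the share of k-nearest neighbors that fall in the same category) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the prototype is taken to be the category centroid, the 2-D points are toy data, and `shrink_toward_centroid` is a hypothetical stand-in for the modulated pass, not the model from the paper.

```python
import math


def centroid(points):
    """Mean position of a set of points (used here as the category prototype)."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def cluster_size(points):
    """Mean Euclidean distance from each exemplar to the category prototype."""
    proto = centroid(points)
    return sum(math.dist(p, proto) for p in points) / len(points)


def knn_same_category_fraction(points, labels, k):
    """Average share of each point's k nearest neighbors that share its label."""
    total = 0.0
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        total += sum(labels[j] == labels[i] for j in others[:k]) / k
    return total / len(points)


def shrink_toward_centroid(points, factor):
    """Toy stand-in for feedback modulation: pull exemplars toward the prototype."""
    proto = centroid(points)
    return [[c + factor * (x - c) for x, c in zip(p, proto)] for p in points]


# Toy "feedforward" embeddings for two categories, then a "modulated" version.
ff_a = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
ff_b = [[3.0, 3.0], [5.0, 3.0], [3.0, 5.0]]
mod_a = shrink_toward_centroid(ff_a, 0.5)
mod_b = shrink_toward_centroid(ff_b, 0.5)
labels = ["a"] * 3 + ["b"] * 3

print(cluster_size(ff_a), cluster_size(mod_a))
print(knn_same_category_fraction(mod_a + mod_b, labels, k=2))
```

In this toy setup the clusters become tighter while the prototypes stay where they were, which mirrors the qualitative pattern the abstract reports (compaction within categories, stable distances between them).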
Monkey See, Model Knew: Large Language Models Accurately Predict Visual Brain Responses in Humans and Non-Human Primates
bioRxiv (Cold Spring Harbor Laboratory) · 2025-03-10 · 6 citations
preprint · Open access
Recent progress in multimodal AI and ‘language-aligned’ visual representation learning has rekindled debates about the role of language in shaping the human visual system. In particular, the emergent ability of ‘language-aligned’ vision models (e.g. CLIP) – and even pure language models (e.g. BERT) – to predict image-evoked brain activity has led some to suggest that human visual cortex itself may be ‘language-aligned’ in comparable ways. But what would we make of this claim if the same procedures could model visual activity in a species without language? Here, we conducted controlled comparisons of pure-vision, pure-language, and multimodal vision-language models in their prediction of human (N=4) and rhesus macaque (N=6, 5:IT, 1:V1) ventral visual activity to the same set of 1000 captioned natural images (the ‘NSD1000’). The results revealed markedly similar patterns in model predictivity of early and late ventral visual cortex across both species. This suggests that language model predictivity of the human visual system is not necessarily due to the evolution or learning of language per se, but rather to the statistical structure of the visual world that is reflected in natural language.
Foveated sensing with KNN-convolutional neural networks
Journal of Vision · 2025-07-15
article · Open access
Human vision prioritizes the center of gaze through spatially-variant retinal sampling, leading to magnification of the fovea in cortical visual maps. In contrast, deep neural network models (DNNs) almost always operate on spatially uniform inputs, a severe mismatch that limits their use in understanding the active and foveated nature of human vision. Some work has explored foveated sampling in DNNs, however, these methods have been forced to wrangle retinal samples into grid-like representations, sacrificing faithful cortical retinotopy and creating undesirable warped receptive field shapes that depend on eccentricity. Here, we offer an alternative approach by adapting the model architecture to enable realistic foveated encoding of visual space. First, we use a spatially-variant input sensor derived from the log polar map model, which links retinal sampling to cortical magnification (Schwartz, 1980), but does not produce grid-like images. To handle the sensor’s outputs, we convert spatial kernels for convolution and pooling into k-nearest neighborhoods (KNNs) defined in pixel space, and generalize convolution to KNNs. Filters are learned in a canonical reference frame, and are spatially mapped into each neighborhood for perception. This approach allows us to build hierarchical KNN convolutional neural networks (KNN-CNNs) closely matched to their CNN counterparts. Architecturally, these models naturally exhibit realistic cortical retinotopy and desirable receptive field properties, such as exponentially increasing size and constant shape as a function of eccentricity. Training these models end-to-end over natural images, we find that they perform competitively with resource-matched CNNs trained on grid-like foveated images, and exhibit increasing performance with multiple fixations. 
Broadly, this model class offers a more biologically-aligned sampling of the visual world, enabling future computational work to model the active and spatial nature of human vision, with applications in understanding visual recognition, crowding, and visual search. Last, this approach holds promise in building more neurally mappable models.
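To make the idea of a foveated sensor with k-nearest neighborhoods more concrete, here is a rough sketch. It places sample points on rings whose radii grow exponentially with eccentricity (a simple log-polar layout) and then measures how far each point's k nearest neighbors extend. The ring counts, radii, and function names are illustrative assumptions, and the sketch does not implement the learned filters or the KNN convolution described in the abstract.

```python
import math


def log_polar_samples(n_rings, n_angles, r_min, r_max):
    """Sample positions on rings with exponentially spaced radii (assumed layout)."""
    points = []
    for i in range(n_rings):
        # Radii grow by a constant ratio from r_min to r_max.
        r = r_min * (r_max / r_min) ** (i / (n_rings - 1))
        for j in range(n_angles):
            theta = 2 * math.pi * j / n_angles
            points.append((r * math.cos(theta), r * math.sin(theta)))
    return points


def knn_neighborhood(points, index, k):
    """Indices of the k samples nearest to points[index], including itself."""
    center = points[index]
    order = sorted(range(len(points)), key=lambda j: math.dist(center, points[j]))
    return order[:k]


def neighborhood_radius(points, index, k):
    """Distance from a sample to the farthest member of its k-neighborhood."""
    center = points[index]
    return max(math.dist(center, points[j]) for j in knn_neighborhood(points, index, k))


points = log_polar_samples(n_rings=6, n_angles=8, r_min=1.0, r_max=32.0)
inner = neighborhood_radius(points, 0, k=5)                 # innermost ring
outer = neighborhood_radius(points, len(points) - 1, k=5)   # outermost ring
print(inner, outer)
```

Because sample spacing grows with eccentricity, a fixed k covers a larger patch of visual space in the periphery than near the center, which is the kind of eccentricity-dependent receptive field size the abstract describes.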

Recent grants

NIH Grant R03MH086743
NIH · $168k · 2012
NIH Grant F31MH069095
NIH · $35k · 2006
COMPCOG: Intuitive Physics without Intuition or Physics: Leveraging Deep Neural Networks to Model Human Physical Reasoning
NSF · $554k · 2020–2025
NIH Grant F32EY016982
NIH · $137k · 2008
CAREER: Flexible Resource Allocation and Efficient Coding in Human Vision
NSF · $592k · 2010–2016

Frequent coauthors

Talia Konkle
Harvard University
65 shared
Timothy F. Brady
42 shared
Daryl Fougnie
New York University
37 shared
Jeremy M. Wolfe
Brigham and Women's Hospital
34 shared
Patrick Cavanagh
York University
23 shared
Sarah Cormiea
University of Pennsylvania
21 shared
Aude Oliva
Massachusetts Institute of Technology
20 shared
Jordan W. Suchow
20 shared
