Stefano Fusi

Professor of Neuroscience (in the Mortimer B. Zuckerman Mind Brain Behavior Institute) · Verified

Columbia University · Pathology & Cell Biology

Active 1973–2026

h-index: 58

Citations: 13.3k

Papers: 264 · 72 last 5y

Funding: $11.3M · 1 active

Faculty page


About

Stefano Fusi, PhD, is a Professor of Neuroscience at the Mortimer B. Zuckerman Mind Brain Behavior Institute at Columbia University. His research investigates the neural mechanisms underlying the formation of rule representations and their expression, focusing on how complex behaviors are generated by the brain. His work involves developing models of neural networks that encode inner mental states as attractors of neural dynamics and studying the synaptic mechanisms that lead to the abstraction of rules, learning, and memory. In collaboration with experimentalists from various fields, he tests his theoretical ideas on biological brains. His research interests include cognitive and systems neuroscience, computation and theory, and the neurobiology of learning and memory. Fusi's contributions include exploring the importance of mixed selectivity in complex cognitive tasks, efficient partitioning of memory systems for consolidation, and the internal representation of task rules by recurrent neural dynamics. His work aims to deepen the understanding of how neural networks encode and process information related to behavior, learning, and memory.

Research topics

Computer Science
Biology
Psychology
Artificial Intelligence
Neuroscience
Cognitive science
Machine Learning
Cognitive psychology
Medicine
Engineering
Algorithm
Ecology
Human–computer interaction
Biochemical engineering
Mathematics

Selected publications

Neural representations supporting generalization under continual learning
bioRxiv (Cold Spring Harbor Laboratory) · 2026-01-08
article · Open access
Abstraction and generalization are essential for flexible decision-making in novel situations. Recent work in humans and monkeys has shown how abstract variables are encoded by the representational geometry of neural population activity. However, these observations, which are typically made after learning has converged, demonstrate the product of abstraction, but not the process by which abstract knowledge is learned: how are the inputs from concrete experiences transformed into abstract knowledge, and how do neural circuits perform these operations and relay this knowledge? To address these questions, we developed a factorized model of temporal abstraction that builds on the successor representation. The model disentangles the contributions of different levels of abstract learning (from stimulus-stimulus associations to a generalizable task schema) in the form of a factorized prediction error that relates the change in relational knowledge to a predicted change in representational geometry on each trial. We fit the model to the behavior of human participants performing a context-dependent decision task during fMRI. The model captured the learning dynamics at multiple timescales, including the increasing contribution of generalization as participants transferred abstracted relational knowledge between novel task instances. In fMRI, BOLD activity in hippocampus (where, in past work, abstract knowledge was represented after learning) was increasingly attributed to the acquisition of abstract knowledge based on generalization. A similar temporal pattern was observed in entorhinal cortex, a putative source of low-dimensional structural information, and orbitofrontal cortex (OFC), which may depend on relational knowledge to represent state relationships as a cognitive map that guides choices. Indeed, individual variation in the generalization signal in OFC correlated with behavioral performance on key trials that required relational knowledge.
Our findings show how the brain regions previously shown to represent abstract knowledge after learning also support the process of abstraction as it evolves from learning concrete associations to a generalizable schema. Our approach offers a computational framework for disentangling the operations driving abstract learning and probing their neural correlates in the dynamics of representational geometry.
Publisher OA PDF DOI
A mathematical theory for understanding when abstract representations emerge in neural networks
PubMed · 2026-03-13
article · Senior author
Recent experiments in neuroscience reveal that task-relevant variables are often encoded in approximately orthogonal subspaces of neural population activity. These disentangled, or abstract, representations have been observed in multiple brain areas and across different species. These representations have been shown to support out-of-distribution generalization and rapid learning of novel tasks. The mechanisms by which these representations emerge remain poorly understood, especially in the case of supervised task behavior. Here, we show mathematically that abstract representations of latent variables are guaranteed to appear in the hidden layer of feedforward nonlinear networks when they are trained on tasks that depend directly on these latent variables. These learned abstract representations reflect the semantics of the input stimuli. To show this, we reformulate the usual optimization over the network weights into a mean-field optimization problem over the distribution of neural preactivations. We then apply this framework to finite-width ReLU networks and show that the hidden layer of these networks will exhibit an abstract representation at all global minima of the task objective. Finally, we extend our findings to two broad families of activation functions as well as deep feedforward architectures. Together, our results provide an explanation for the widely observed abstract representations in both the brain and artificial neural networks. In addition, the general framework that we develop here provides a mathematically tractable toolkit for understanding the emergence of different kinds of representations in task-optimized, feature-learning network models.
Publisher
The geometry of context-dependent biased decisions during learning
bioRxiv (Cold Spring Harbor Laboratory) · 2026-01-15 · 1 citation
article · Open access
Adaptive behavior requires inferring latent context and rapidly adjusting decisions in response to changing environmental contingencies. We investigated how reward context is learned, represented, and updated during decision making. We recorded large populations of neurons in lateral prefrontal cortex while macaque monkeys learned a direction-discrimination task in which reward contingencies alternated unpredictably between favoring leftward and rightward choices. Once trained, monkeys inferred context switches from a single unexpected outcome, immediately adjusting both choice bias and reaction times, hallmarks of model-based inference. Early in learning, however, adaptation unfolded gradually across multiple trials. Neural population analyses revealed that reward context was encoded through systematic shifts in the geometry of neural representations. Accumulated sensory evidence (decision variable) and choice were organized along curvilinear decision manifolds, which were displaced across contexts primarily along the decision-variable axis. This geometry naturally implemented context-dependent biases: a fixed linear readout generated different choice tendencies across contexts without remapping. Longitudinal recordings further showed that, with learning, these representational transitions between manifolds became faster, mirroring the emergence of one-trial behavioral generalization. Recurrent neural networks trained on the same task reproduced both the behavioral signatures and the context-dependent geometric shifts. Together, these findings identify a mechanism by which prefrontal circuits support hierarchical inference: reward context is encoded as structured shifts in representational geometry, enabling rapid generalization and flexible control of decision policies.
Publisher DOI
Impact of Task Similarity and Training Regimes on Cognitive Transfer and Interference
bioRxiv (Cold Spring Harbor Laboratory) · 2025-09-22
preprint · Open access
Learning depends not only on the content of what we learn, but also on how we learn and on how experiences are structured over time. To investigate how task similarity and training regime interact during learning, we trained participants on spatial and conceptual learning tasks that shared either similar or distinct underlying structures, using either interleaved or blocked regimes. Interleaving the two tasks hindered performance when their structures were similar, compared to when they were different. In contrast, blocked training produced the opposite effect: it improved performance and facilitated transfer across similar tasks. This effect, however, emerged only when participants first learned the conceptual task, followed by the spatial task, suggesting an asymmetric interaction between task order and structural similarity. We also replicated our results using a neural network model, providing converging evidence for the computational principles governing the interplay between training regime and structural similarity in multi-task learning.
Publisher OA PDF DOI
The effects of task similarity during representation learning in brains and neural networks
Nature Communications · 2025-11-29
article · Open access
The complexity of our environment poses significant challenges for adaptive behavior. Recognizing shared structures across tasks can theoretically improve learning through generalization. However, how such shared representations emerge and influence performance remains poorly understood. Contrary to expectations, our findings revealed that individuals trained on tasks with similar low-dimensional structures performed worse than those trained on dissimilar tasks. Magnetoencephalography revealed correlated neural representations in the same-structure group and anticorrelated ones in the different-structure group. Crucially, practice reduced this performance gap and shifted the neural representations of the tasks in the same-structure group towards anticorrelation, resembling those in the different-structure group. A neural network model trained on similar tasks replicated these findings: tasks with similar structures require more iterations to orthogonalize their representations. These results highlight a complex interplay between task similarity, neural dynamics, and behavior, challenging traditional assumptions about learning and generalization.
Publisher OA PDF DOI
Optimal sparsity in autoencoder memory models of the hippocampus
bioRxiv (Cold Spring Harbor Laboratory) · 2025-01-06
preprint · Open access · Senior author · Corresponding
Storing complex correlated memories is significantly more efficient when memories are recoded to obtain compressed representations. Previous work has shown that compression can be implemented in a simple neural circuit, which can be described as a sparse autoencoder. The activity of the encoding units in these models recapitulates the activity of hippocampal neurons recorded in multiple experiments. However, these investigations have assumed that the level of sparsity is fixed and that inputs have the same statistics and, hence, that they are uniformly compressible. In contrast, biological agents encounter environments with vastly different memory demands and compressibility. Here, we investigate whether the compressibility of inputs determines optimal sparsity in sparse autoencoders. We find 1) that as the compressibility of inputs increases, the optimal coding level decreases, 2) that the desired coding level diverges from the observed coding level as a function of both memory demand and input compressibility, and 3) that optimal memory capacity is achieved when sparsity is weakly enforced. In addition, we characterize how sparsity and the strength of sparsity enforcement jointly control optimal performance. These results provide predictions for how sparsity in the hippocampus should change in response to environmental statistics and theoretical grounds for why sparsity is dynamically tuned in the brain.
Publisher DOI
The effects of task similarity during representation learning in brains and neural networks
bioRxiv (Cold Spring Harbor Laboratory) · 2025-01-20 · 4 citations
preprint · Open access
The complexity of our environment poses significant challenges for adaptive behavior. Recognizing shared structures across tasks can theoretically improve learning through generalization. However, how such shared representations emerge and influence performance remains poorly understood. Contrary to expectations, our findings revealed that individuals trained on tasks with similar low-dimensional structures performed worse than those trained on dissimilar tasks. Magnetoencephalography revealed correlated neural representations in the same-structure group and anticorrelated ones in the different-structure group. Crucially, practice reduced this performance gap and shifted the neural representations of the tasks in the same-structure group towards anticorrelation, like those in the different-structure group. A neural network model trained on similar tasks replicated these findings: tasks with similar structures require more iterations to orthogonalize their representations. These results highlight a complex interplay between task similarity, neural dynamics, and behavior, challenging traditional assumptions about learning and generalization.
Publisher OA PDF DOI
A mathematical theory for understanding when abstract representations emerge in neural networks
ArXiv.org · 2025-10-10
preprint · Open access · Senior author
Recent experiments in neuroscience reveal that task-relevant variables are often encoded in approximately orthogonal subspaces of neural population activity. These disentangled, or abstract, representations have been observed in multiple brain areas and across different species. These representations have been shown to support out-of-distribution generalization and rapid learning of novel tasks. The mechanisms by which these representations emerge remain poorly understood, especially in the case of supervised task behavior. Here, we show mathematically that abstract representations of latent variables are guaranteed to appear in the hidden layer of feedforward nonlinear networks when they are trained on tasks that depend directly on these latent variables. These learned abstract representations reflect the semantics of the input stimuli. To show this, we reformulate the usual optimization over the network weights into a mean-field optimization problem over the distribution of neural preactivations. We then apply this framework to finite-width ReLU networks and show that the hidden layer of these networks will exhibit an abstract representation at all global minima of the task objective. Finally, we extend our findings to two broad families of activation functions as well as deep feedforward architectures. Together, our results provide an explanation for the widely observed abstract representations in both the brain and artificial neural networks. In addition, the general framework that we develop here provides a mathematically tractable toolkit for understanding the emergence of different kinds of representations in task-optimized, feature-learning network models.
Publisher OA PDF DOI
Neural population activity for memory: Properties, computations, and codes
Neuron · 2025-12-22
article · Open access
The brain's memory function involves patterns of neural population spiking activity, shaped by experience and recurring over time. These neural population patterns are typically studied with respect to the three stages of acquisition, retention, and retrieval. Despite intensive investigation, the relationship between the features of population activity and the properties, computations, and codes for memory remains elusive. In this perspective, we synthesize recent advances in the study of memory from the viewpoint of brain network physiology, aiming for a comprehensive mapping between the properties and computations of memory and the features of population-activity codes. We propose that brain memory circuits implement trade-offs between conflicting demands on population codes. We anticipate that an important challenge for both discovery and translational neuroscience of memory is to study these trade-offs, delineating a safe zone in the population-activity space where neuronal circuits operate efficiently.
Publisher DOI
Distinct representations of an anxiogenic environment in different cell types of the ventral hippocampus
bioRxiv (Cold Spring Harbor Laboratory) · 2025-03-10
preprint · Open access
In addition to its role in episodic memory and spatial navigation, the hippocampus has also been found to influence mood-related disorders such as anxiety and depression. These seemingly distinct roles are consistent with a functional dissociation between the two anatomical poles of the hippocampus: whereas the dorsal portion of the hippocampus in rodents is necessary for spatial tasks, the ventral portion controls affective behaviors. We have recently found that neurons in the ventral, but not dorsal, CA1 area of mice encode anxiety-related information (i.e., are “anxiety cells”) in diverse defensive and exploratory behaviors. Still, it is unclear how general threat-related information is computed within the hippocampal circuit. In this work, we have examined how distinct hippocampal subregions and cell types encode anxiety-related information by imaging calcium activity in large populations of genetically defined neurons in the ventral hippocampus while mice explore the elevated plus maze (EPM), a conflict-based anxiety test. We compared the neural encoding of task-related features within the ventral CA1 (vCA1) and ventral dentate gyrus (vDG) regions in order to examine the emergence of anxiety-related activity through the hippocampal circuit. We found that granule cells (vGCs) of the vDG represented similar valence information to neurons in vCA1 in the form of arm-type specific encoding in the EPM, which suggests that encoding of anxiety-related features is already present at this first stage of hippocampal processing. When compared with ventral granule cells (vGCs), ventral mossy cells (vMCs) underlying the DG had stronger spatial encoding and less valence encoding, suggesting that they may be more functionally connected with the highly spatially sensitive dorsal hippocampus. Together these findings will help to understand the encoding of anxiety-related information in the hippocampus and how it relates to neural circuit defects in mood-related disorders.
Publisher OA PDF DOI

Recent grants

Neurophysiology underlying neural representations of value
NIH · $11.3M · 2008–2028

Frequent coauthors

Mattia Rigotti
42 shared
C. Daniel Salzman
Columbia University
40 shared
Fabio Stefanini
Columbia University
35 shared
Marcus K. Benna
University of California, San Diego
34 shared
V. Fascianelli
Istituto Nazionale di Fisica Nucleare, Laboratori Nazionali di Frascati
27 shared
Lyudmila Kushnir
24 shared
Lorenzo Posani
Columbia University
24 shared
Frances Grace Ghinger
University of Maryland, Baltimore County
22 shared
