
Robert Berwick
Massachusetts Institute of Technology · Electrical Engineering & Computer Science
Active 1975–2025
About
Robert Berwick is a Professor of Computer Science and Engineering and Computational Linguistics at MIT. His research focuses on the intersection of artificial intelligence, natural language processing, and systems that interact with the external environment through perception, communication, and action. He is involved in developing techniques for the analysis and synthesis of systems that learn, make decisions, and adapt to changing environments. As a faculty member, Berwick contributes to advancing the understanding of AI and decision-making, leveraging computational, theoretical, and experimental tools to address complex challenges in these fields. His work emphasizes the development of systems that can process language and other modalities, contributing to the broader goals of artificial intelligence and machine learning.
Research topics
- Information Retrieval
- Computer Science
- Artificial Intelligence
- Linguistics
- Psychology
- Philosophy
- Cognitive science
Selected publications
Mathematical Structure of Syntactic Merge
The MIT Press eBooks · 2025-08-05 · 8 citations
book · Open access · Senior author
A mathematical formalization of Chomsky’s theory of Merge in generative linguistics. The Minimalist Program advanced by Noam Chomsky thirty years ago, focusing on the biological nature of human language, has played a central role in our modern understanding of syntax. One key to this program is the notion that the hierarchical structure of human language syntax consists of a single operation Merge. For the first time, Mathematical Structure of Syntactic Merge presents a complete and precise mathematical formalization of Chomsky’s most recent theory of Merge. It both furnishes a new way to explore Merge’s important linguistic implications clearly while also laying to rest any fears that the Minimalist framework based on Merge might itself prove to be formally incoherent. In this book, Matilde Marcolli, Noam Chomsky, and Robert C. Berwick prove that Merge can be described as a very particular kind of highly structured algebra. Additionally, the book shows how Merge can be placed within a consistent framework that includes both a syntactic-semantic interface that realizes Chomsky’s notion of a conceptual-intentional interface, and an externalization system that realizes language-specific constraints. The syntax-semantics interface encompasses many current semantical theories and offers deep insights into the ways that modern “large language models” work, proving that these do not undermine in any way the scientific theories of language based on generative grammar.
Encoding syntactic objects and Merge operations in function spaces
ArXiv.org · 2025-07-17 · 1 citation
preprint · Open access · Senior author
We provide a mathematical argument showing that, given a representation of lexical items as functions (wavelets, for instance) in some function space, it is possible to construct a faithful representation of arbitrary syntactic objects in the same function space. This space can be endowed with a commutative non-associative semiring structure built using the second Rényi entropy. The resulting representation of syntactic objects is compatible with the magma structure. The resulting set of functions is an algebra over an operad, where the operations in the operad model circuits that transform the input waveforms into a combined output that encodes the syntactic structure. The action of Merge on workspaces is faithfully implemented as action on these circuits, through a coproduct and a Hopf algebra Markov chain. The results obtained here provide a constructive argument showing the theoretical possibility of a neurocomputational realization of the core computational structure of syntax. We also present a particular case of this general construction where this type of realization of Merge is implemented as a cross-frequency phase synchronization on sinusoidal waves. This also shows that Merge can be expressed in terms of the successor function of a semiring, thus clarifying the well-known observation of its similarities with the successor function of arithmetic.
Redefining Measures of Career Success
Journal of Student Affairs Inquiry Improvement and Impact · 2025-07-27
article · Open access · 1st author · Corresponding
Traditional measures of career success, primarily salary and job titles, offer a limited and often misleading view of post-graduation outcomes. These narrow metrics fail to capture the complexity of career trajectories and provide little actionable insight for institutions seeking to improve student preparedness. This paper advocates for a holistic approach to measuring career success by incorporating objective indicators, such as cost of living and industry trends, and subjective measures, such as alumni perceptions of job satisfaction and career fulfillment. Examples and strategies for measuring career success beyond salary and first-destination outcomes are provided. Lessons learned from collecting these measures are shared, including leadership commitment, community building, stakeholder engagement, and the use of technology and analytics. Additionally, it is important to integrate data collection into curricula, foster industry collaboration, and establish feedback loops to align academic programs with workforce needs. By redefining career success beyond traditional metrics, this study offers a framework for institutions to assess and enhance graduate outcomes more effectively in an evolving job market.
Parallel Algorithms for Exact Enumeration of Deep Neural Network Activation Regions
arXiv (Cornell University) · 2024-02-29
preprint · Open access
A feedforward neural network using rectified linear units constructs a mapping from inputs to outputs by partitioning its input space into a set of convex regions where points within a region share a single affine transformation. In order to understand how neural networks work, when and why they fail, and how they compare to biological intelligence, we need to understand the organization and formation of these regions. Step one is to design and implement algorithms for exact region enumeration in networks beyond toy examples. In this work, we present parallel algorithms for exact enumeration in deep (and shallow) neural networks. Our work has three main contributions: (1) we present a novel algorithm framework and parallel algorithms for region enumeration; (2) we implement one of our algorithms on a variety of network architectures and experimentally show how the number of regions dictates runtime; and (3) we show, using our algorithm's output, how the dimension of a region's affine transformation impacts further partitioning of the region by deeper layers. To our knowledge, we run our implemented algorithm on networks larger than all of the networks used in the existing region enumeration literature. Further, we experimentally demonstrate the importance of parallelism for region enumeration of any reasonably sized network.
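The idea that points in one activation region share a single affine transformation can be illustrated with a minimal Python sketch. The toy network below (weights `W1`, `b1`, `W2`, `b2`) and the helper names `activation_pattern` and `local_affine_map` are hypothetical choices made for illustration only. This is not the enumeration algorithm from the preprint; it only shows what a region and its affine map are.

```python
# Hypothetical toy network: 2 inputs -> 3 ReLU units -> 1 output.
# All weights are made up for illustration.
W1 = [[1.0, -1.0], [0.5, 2.0], [-1.0, 0.5]]
b1 = [0.0, -1.0, 0.5]
W2 = [[1.0, -2.0, 0.5]]
b2 = [0.25]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def preactivations(x):
    """Hidden-layer values before the ReLU is applied."""
    return [p + b for p, b in zip(matvec(W1, x), b1)]


def forward(x):
    """Ordinary forward pass through the ReLU network."""
    hidden = [max(0.0, p) for p in preactivations(x)]
    return [o + b for o, b in zip(matvec(W2, hidden), b2)]


def activation_pattern(x):
    """Which hidden units are active at x; equal patterns mean same region."""
    return tuple(int(p > 0) for p in preactivations(x))


def local_affine_map(pattern):
    """Return (A, c) with f(x) = A x + c for every x that has this pattern."""
    # Inactive units contribute nothing, so zero out their rows and biases.
    masked_W1 = [[d * w for w in row] for d, row in zip(pattern, W1)]
    masked_b1 = [d * b for d, b in zip(pattern, b1)]
    n_hidden, n_in = len(W1), len(W1[0])
    A = [[sum(W2[i][k] * masked_W1[k][j] for k in range(n_hidden))
          for j in range(n_in)] for i in range(len(W2))]
    c = [sum(W2[i][k] * masked_b1[k] for k in range(n_hidden)) + b2[i]
         for i in range(len(W2))]
    return A, c


if __name__ == "__main__":
    x = [1.0, 0.5]
    pattern = activation_pattern(x)
    A, c = local_affine_map(pattern)
    print("pattern:", pattern)
    print("network output:", forward(x))
    print("affine map output:", [v + ci for v, ci in zip(matvec(A, x), c)])
```

Two inputs with the same activation pattern lie in the same region, so the same `(A, c)` reproduces the network output for both. Exact enumeration, as described in the abstract, would additionally have to find every pattern that is actually realized by some input.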
Merge and the Strong Minimalist Thesis
2023 · 70 citations
- Computer Science
- Artificial Intelligence
The goal of this contribution to the Elements series is to closely examine Merge, its form, its function, and its central role in current linguistic theory. It explores what it does (and does not do), why it has the form it has, and its development over time. The basic idea behind Merge is quite simple. However, Merge interacts, in intricate ways, with other components including the language's interfaces, laws of nature, and certain language-specific conditions. Because of this, and because of its fundamental place in the human faculty of language, this Element's focus on Merge provides insights into the goals and development of generative grammar more generally, and its prospects for the future.
Mathematical Structure of Syntactic Merge
arXiv (Cornell University) · 2023-05-29 · 5 citations
preprint · Open access · Senior author
The syntactic Merge operation of the Minimalist Program in linguistics can be described mathematically in terms of Hopf algebras, with a formalism similar to the one arising in the physics of renormalization. This mathematical formulation of Merge has good descriptive power, as phenomena empirically observed in linguistics can be justified from simple mathematical arguments. It also provides a possible mathematical model for externalization and for the role of syntactic parameters.
Old and New Minimalism: a Hopf algebra comparison
arXiv (Cornell University) · 2023-06-17 · 3 citations
preprint · Open access
In this paper we compare some old formulations of Minimalism, in particular Stabler's computational minimalism, and Chomsky's new formulation of Merge and Minimalism, from the point of view of their mathematical description in terms of Hopf algebras. We show that the newer formulation has a clear advantage purely in terms of the underlying mathematical structure. More precisely, in the case of Stabler's computational minimalism, External Merge can be described in terms of a partially defined operated algebra with binary operation, while Internal Merge determines a system of right-ideal coideals of the Loday-Ronco Hopf algebra and corresponding right-module coalgebra quotients. This mathematical structure shows that Internal and External Merge have significantly different roles in the old formulations of Minimalism, and they are more difficult to reconcile as facets of a single algebraic operation, as desirable linguistically. On the other hand, we show that the newer formulation of Minimalism naturally carries a Hopf algebra structure where Internal and External Merge directly arise from the same operation. We also compare, at the level of algebraic properties, the externalization model of the new Minimalism with proposals for assignments of planar embeddings based on heads of trees.
Syntax-semantics interface: an algebraic model
arXiv (Cornell University) · 2023-11-10 · 2 citations
preprint · Open access
We extend our formulation of Merge and Minimalism in terms of Hopf algebras to an algebraic model of a syntactic-semantic interface. We show that methods adopted in the formulation of renormalization (extraction of meaningful physical values) in theoretical physics are relevant to describe the extraction of meaning from syntactic expressions. We show how this formulation relates to computational models of semantics and we answer some recent controversies about implications for generative linguistics of the current functioning of large language models.
Frontiers in Computer Science · 2023-05-19 · 4 citations
article · Open access
Cognitive computers (κC) are intelligent processors advanced from data and information processing to autonomous knowledge learning and intelligence generation. This work presents a retrospective and prospective review of the odyssey toward κC empowered by transdisciplinary basic research and engineering advances. A wide range of fundamental theories and innovative technologies for κC is explored, and a set of underpinning intelligent mathematics (IM) is created. The architectures of κC for cognitive computing and Autonomous Intelligence Generation (AIG) are designed as a brain-inspired cognitive engine. Applications of κC in autonomous AI (AAI) are demonstrated by pilot projects. This work reveals that AIG will no longer be a privilege restricted only to humans via the odyssey to κC toward training-free and self-inferencing computers.
The Failure of Deep Neural Networks to Capture Human Language’s Cognitive Core
2021-10-29 · 3 citations
article · 1st author · Corresponding
Current deep neural networks have made remarkable advances in their ability to analyze and use natural language, with great apparent engineering success. But how well do these systems mirror the cognitive constraints associated with human language? In this talk we show that there are three essential core computations that characterize human language as an engine of human thought. One is "digital infinity": the fact that we can produce an open-ended, countably infinite number of sentences. The second is that sentences are hierarchically structured, rather than being arranged in a linear array. The third property is that human language computations always admit the possibility of "displacement": a word or phrase can be pronounced at a place distinct from its usual location of semantic interpretation. All three properties can be shown to follow from a single, simple, recursive combinatorial operation. We provide empirical evidence for all three properties, from concrete developmental examples as well as from psycholinguistic and brain imaging experiments.
What about current "deep neural network" systems? Although they perform very well after large-scale training, their success appears to be grounded on accurate table lookup (memorization), without truly mirroring the three key computational principles of human language cognition. By "stress testing" currently available deep neural network processors, we show that they are, perhaps surprisingly, very fragile when presented even with simple examples that deviate modestly from the examples on which they were trained. In particular, they fail to properly represent hierarchical structure, and they cannot reliably reconstruct examples of sentences with "displacement" if the examples go just a bit beyond the complexity of their training set data. For example, while a deep neural network system might work on "Which cookie did Bob want," it fails on "Which cookie did Bob want to eat." Such failures indicate that the neural net systems have not generalized in the same sense that children do, since children can easily handle such examples after receiving much more limited training data.
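As a rough illustration of how a single recursive combinatorial operation can yield open-ended generation, hierarchy, and displacement, here is a minimal Python sketch. It assumes the common textbook simplification in which Merge forms the unordered set {X, Y}. The function names `merge`, `terms`, and `depth` and the example words are hypothetical, and the sketch is not the formalization used in the works listed here.

```python
def merge(x, y):
    """Combine two syntactic objects into the unordered set {x, y}."""
    return frozenset([x, y])


def terms(obj):
    """Yield obj and, recursively, every object contained in it."""
    yield obj
    if isinstance(obj, frozenset):
        for member in obj:
            yield from terms(member)


def depth(obj):
    """Nesting depth: 0 for a lexical item, 1 + deepest member otherwise."""
    if not isinstance(obj, frozenset):
        return 0
    return 1 + max(depth(member) for member in obj)


# External Merge: combine objects that are not part of one another.
wh_phrase = merge("which", "cookie")
verb_phrase = merge("want", wh_phrase)
clause = merge("Bob", verb_phrase)

# Internal Merge: re-merge a term already contained in the clause.
# The phrase now occurs both at the edge and in its original position,
# which is one simplified way to picture "displacement".
question = merge(wh_phrase, clause)

if __name__ == "__main__":
    print("hierarchy depth:", depth(question))
    print("occurrences of the wh-phrase:",
          sum(1 for t in terms(question) if t == wh_phrase))
```

Because `merge` accepts its own outputs as inputs, it can be applied without bound, and the result is nested rather than a flat string of words. Word order and pronunciation are deliberately left out of this sketch.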
Frequent coauthors
- Noam Chomsky · 28 shared
- Johan J. Bolhuis · University of Cambridge · 27 shared
- Kazuo Okanoya · Teikyo University · 12 shared
- Matilde Marcolli · 12 shared
- Gabriël J. L. Beckers · Utrecht University · 11 shared
- Sandiway Fong · University of Arizona · 10 shared
- Partha Niyogi · 9 shared
- Ian Tattersall · 9 shared