
Jeffrey Heinz
Professor · Stony Brook University · Linguistics
Active 1994–2026
About
Jeffrey Heinz is a Professor at Stony Brook University with a joint appointment in the Department of Linguistics and the Institute for Advanced Computational Science. He conducts research in several related areas including theoretical, computational and mathematical linguistics, grammatical inference, computational learning theory, theoretical computer science, robotic planning and control, and artificial intelligence. His research focuses on characterizations of subclasses of regular languages and transductions, algorithms for learning those subclasses, and what those subclasses mean for patterns in language and the "real" world. His research has been published in the Journal of Language Modelling, Linguistic Inquiry, Theoretical Computer Science, the Transactions of the ACL, and Science. He has co-authored a book on grammatical inference for computational linguists, co-edited three books, and co-guest-edited special issues of the journals Machine Learning and Phonology. He obtained his Ph.D. from UCLA in 2007 and spent ten years as a professor at the University of Delaware before coming to Stony Brook in 2017. The Linguistic Society of America recognized Heinz with its 2017 Early Career Award for his "contributions leading to a new computational science of inference and learning as applied to language."
Research topics
- Computer Science
- Artificial Intelligence
- Natural Language Processing
- Neuroscience
- Physical medicine and rehabilitation
- Human–computer interaction
- Arithmetic
- Medicine
- Psychology
- Linguistics
- Mathematics
- Biology
- Theoretical computer science
- Algorithm
Selected publications
2026-01-07
book-chapter · 1st author · Corresponding
The blueprint model of production
Phonology · 2025-01-01 · 1 citation
article · Open access · Senior author
This article introduces the blueprint model of production (BMP), which characterises the phonetics–phonology interface in terms of typed functions. The standard modular feed-forward view to the interface is that the phonetic form of a lexical item is the output of a phonetic module which takes the output of a phonological module as its input. The central idea of the BMP is that the phonetic form is instead the output of a higher-order phonetics function which takes the phonological function as one of multiple inputs. We explain how understanding the production process this way can account for systematic fine-grained variation in phonetic forms while maintaining a discrete phonological grammar. We present one possible instantiation of the model that simulates incomplete neutralisation, some cases of near-merger, and variation in homophone duration. Consequently, these types of systematic fine-grained phonetic patterns do not necessarily provide evidence against discrete, symbolic phonology.
The Computational Power of Harmonic Forms
Oxford University Press eBooks · 2024-10-22 · 1 citation
book-chapter · Senior author
This chapter discusses the nature of vowel harmony (VH) from a computational perspective. In particular, it considers the dependencies present in surface forms of phonological representations obeying various types of harmony. It situates this typological landscape of dependencies within a corresponding landscape of computational functions. The precise nature of these functions gives us a principled window into the characteristic similarities and differences between harmony types. It goes on to show that this landscape is restricted, and that VH is tightly constrained typologically, describing how the interactions of these functions elucidate key properties of harmonic systems, and how these properties enable new avenues for typological and computational work.
Robust Identification in the Limit from Incomplete Positive Data
Lecture notes in computer science · 2023-01-01
book-chapter · Open access · Senior author
MLRegTest: A Benchmark for the Machine Learning of Regular Languages
arXiv (Cornell University) · 2023-04-16 · 1 citation
preprint · Open access · Senior author
Synthetic datasets constructed from formal languages allow fine-grained examination of the learning and generalization capabilities of machine learning systems for sequence classification. This article presents a new benchmark for machine learning systems on sequence classification called MLRegTest, which contains training, development, and test sets from 1,800 regular languages. Different kinds of formal languages represent different kinds of long-distance dependencies, and correctly identifying long-distance dependencies in sequences is a known challenge for ML systems to generalize successfully. MLRegTest organizes its languages according to their logical complexity (monadic second order, first order, propositional, or monomial expressions) and the kind of logical literals (string, tier-string, subsequence, or combinations thereof). The logical complexity and choice of literal provides a systematic way to understand different kinds of long-distance dependencies in regular languages, and therefore to understand the capacities of different ML systems to learn such long-distance dependencies. Finally, the performance of different neural networks (simple RNN, LSTM, GRU, transformer) on MLRegTest is examined. The main conclusion is that performance depends significantly on the kind of test set, the class of language, and the neural network architecture.
An Algebraic Characterization of Total Input Strictly Local Functions
HAL (Le Centre pour la Communication Scientifique Directe) · 2023-06-05
preprint · Open access · Senior author
This paper provides an algebraic characterization of the total input strictly local functions. Simultaneous, noniterative rules of the form A → B / C _ D, common in phonology, are definable as functions in this class whenever CAD represents a finite set of strings. The algebraic characterization highlights a fundamental connection between input strictly local functions and the simple class of definite string languages, as well as connections to string functions studied in the computer science literature, the definite functions and local functions. No effective decision procedure for the input strictly local maps was previously available, but one arises directly from this characterization. This work also shows that, unlike the full class, a restricted subclass is closed under composition. Additionally, some products are defined which may yield new factorization methods.
Why Linguistics Will Thrive in the 21st Century: A Reply to Piantadosi (2023)
arXiv (Cornell University) · 2023-08-06 · 8 citations
preprint · Open access · Senior author
We present a critical assessment of Piantadosi's (2023) claim that "Modern language models refute Chomsky's approach to language," focusing on four main points. First, despite the impressive performance and utility of large language models (LLMs), humans achieve their capacity for language after exposure to several orders of magnitude less data. The fact that young children become competent, fluent speakers of their native languages with relatively little exposure to them is the central mystery of language learning to which Chomsky initially drew attention, and LLMs currently show little promise of solving this mystery. Second, what can the artificial reveal about the natural? Put simply, the implications of LLMs for our understanding of the cognitive structures and mechanisms underlying language and its acquisition are like the implications of airplanes for understanding how birds fly. Third, LLMs cannot constitute scientific theories of language for several reasons, not least of which is that scientific theories must provide interpretable explanations, not just predictions. This leads to our final point: to even determine whether the linguistic and cognitive capabilities of LLMs rival those of humans requires explicating what humans' capacities actually are. In other words, it requires a separate theory of language and cognition; generative linguistics provides precisely such a theory. As such, we conclude that generative linguistics as a scientific discipline will remain indispensable throughout the 21st century and beyond.
2022-03-24 · 1 citation
book-chapter · 1st author · Corresponding
This chapter examines the brief but vibrant history of learnability in phonology. We trace the question of learnability back to the foundational crises in mathematics and computer science, through the synthesis of these fields with linguistics, and onwards to the foundational problems of language, and phonological, learning. We observe this history is mostly one-sided, with many ideas from learning imported to phonology, but rarely the converse. We review some of the most significant interactions between formal learnability and phonology, topics such as the necessity of structured hypothesis spaces, the credit/blame/hidden structure problem, and the subset principle. We finish by discussing several overarching tensions pervading this field: the role of mathematical descriptions versus computational simulations of learning, typological versus learnability concerns in grammar design, and debates on the psychological reality of phonological grammars. As a field, we should not fear rapid change or the many flowering prospects.
Phonological Abstraction in The Mental Lexicon
Oxford University Press eBooks · 2022-02-14 · 1 citation
book-chapter · Open access
In this chapter, we examine the nature of the long-term memory representation of the pronunciations of words. A fundamental question concerns how abstract these representations are vis à vis the physical manifestation of words, both as gestures and as physical percepts. We consider this question and related issues within the traditions of linguistic cognition and generative phonology. We first explore the general nature of abstraction, and then review the arguments in generative phonology for positing that the units of speech stored in long-term memory (so called ‘underlying forms’) abstract away from many phonetic details. Motivations for concepts such as phonemes and distinctive phonological features are given. We then visit the open question regarding how abstract underlying forms may be allowed to be. We conclude by highlighting the contributions that evidence from neuroscience and sign language linguistics brings to these issues of phonological abstraction in the mental lexicon.
Categorical Account of Gradient Acceptability of Word-Initial Polish Onsets
Proceedings of the Annual Meetings on Phonology · 2022-08-05 · 1 citation
article · Open access · Senior author
We examine how well categorical and probabilistic phonotactic learning models extract grammars which predict Polish speakers' acceptability judgments of words with varied initial consonant clusters. Polish is an especially interesting language to look at because of its rich inventory of sonority-sequencing defying consonant clusters, often as a result of yer-deletion. In line with results by Gorman (2013) and Durvasula (2020), we find that the categorical baselines considered here generally outperformed the Hayes and Wilson's (2008) maximum-entropy based phonotactic learner. We conclude that gradient acceptability judgments do not provide unambiguous evidence for gradient, probabilistic grammars.
Recent grants
SCH: GEAR - Grounded Early Adaptive Rehabilitation
NIH · $1.5M · 2015–2020
Frequent coauthors
- 17 shared
Herbert G. Tanner
University of Delaware
- 16 shared
Jane Chandlee
Haverford College
- 14 shared
Jonathan Rawski
San Jose State University
- 12 shared
Adam Jardine
Rutgers University
- 10 shared
Colin de la Higuera
Nantes Université
- 9 shared
Menno van Zaanen
- 8 shared
R. Eyraud
Laboratoire Hubert Curien
- 8 shared
Harry van der Hulst
Education
- 2007
Ph.D., Linguistics
UCLA
- 1997
M.S., Computer and Information Science
University of Pennsylvania
- 1995
B.S., Computer and Information Science
University of Pennsylvania
Awards & honors
- 2017 Early Career Award from the Linguistic Society of Ameri…