
About
Professor Claire Bowern is a linguist on the faculty of Yale University. Her research spans language documentation, linguistic fieldwork, and the study of language evolution and cultural evolution. She has advised numerous students and postdoctoral researchers and has published widely. Her work is noted for its analysis of morphosyntax and its documentation of diverse languages.
Research topics
- Computer Science
- Linguistics
- Philosophy
- Psychology
- Speech recognition
- Programming language
- Artificial Intelligence
- Sociology
- Biology
- Machine Learning
- Acoustics
- Geography
- Anthropology
- Cognitive psychology
- Physics
Selected publications
Zenodo (CERN European Organization for Nuclear Research) · 2026-01-13
dataset · Open access
Cite the source of the dataset as: Kathryn R. Kirby, Russell D. Gray, Simon J. Greenhill, Fiona M. Jordan, Stephanie Gomes-Ng, Hans-Jörg Bibiko, Damián E. Blasi, Carlos A. Botero, Claire Bowern, Carol R. Ember, Dan Leehr, Bobbi S. Low, Joe McCarter, William Divale, and Michael C. Gavin. (2016). D-PLACE: A Global Database of Cultural, Linguistic and Environmental Diversity. PLoS ONE, 11(7): e0158391. doi:10.1371/journal.pone.0158391.
Kinbank: A global database of kinship terminology
Zenodo (CERN European Organization for Nuclear Research) · 2026-02-06
dataset · Open access
The data repository for the Kinbank dataset.
Proceedings of the Royal Society B Biological Sciences · 2026-01-28
article · Open access
Humans collectively use thousands of languages. The number of languages in a region (i.e. 'richness') varies widely. Empirical research has identified social, environmental, geographic and demographic factors associated with language richness. However, our understanding of causal mechanisms and variation in their effects over space has been limited by prior analyses focusing on correlation and assuming stationarity. Here we use process-based, spatially explicit stochastic models to simulate the emergence, expansion, contraction, fragmentation and extinction of language ranges. We varied parameter settings in these computer-simulated experiments to evaluate the extent to which different processes reproduce observed patterns of language richness in North America. We find that the majority of spatial variation in language richness is explained by models in which environmental and social constraints determine population density, random shocks alter population sizes more frequently at higher population densities, and population shocks are more frequently negative than positive. Language diversification occurs when populations split after reaching size limits, and when ranges fragment due to population contractions following negative shocks or due to contact with other groups expanding following positive shocks. These findings support theories arguing that environmental and social conditions, constraints on group sizes, outcomes of contact and shifting demographics all shape language richness.
A Tutorial for Video in Spoken Language Documentation
Edinburgh Research Explorer (University of Edinburgh) · 2026-01-01
article · Open access · Senior author
Spoken language is always accompanied by meaningful visible behavior, such as gesture and eye gaze. But while language use is multimodal, published recommendations and formal training in spoken language documentation tend to focus almost exclusively on the audio part of the signal. Therefore, this tutorial provides a practical guide to using video as part of a spoken language documentation project. We motivate why these projects should consider recording video, and we then describe the equipment needs, recording setups, and post-processing workflow required for collecting transcribable video. We also discuss the unique ethical/privacy concerns raised by video recording and archiving. Overall, our goal is to centralize and formalize the recommendations about video that have long circulated in oral form, or as grey literature, in documentation circles. The scripts in the supplementary materials are maintained at https://github.com/amaliaskilton/auto-ffmpeg.
Comparing Phonological Feature Sets for Low-Resource ASR
University of Massachusetts (UMass) Amherst · 2026-03-14
article · Open access · Senior author
In this paper, we explore an alternative ASR framework in which phonological features are predicted as an explicit intermediate representation, rather than predicting phones directly. Because feature systems encode cross-linguistically meaningful structure, this intermediate representation can reduce sample complexity by constraining what must be learned from limited data, while also enabling rapid adaptation to new languages through changes to the phone-to-feature mapping rather than retraining the model. As a result, this approach is particularly well suited to low-resource settings. We retrained Phonet models on two different feature sets to see the extent to which specific theories of phonological features facilitate better phoneme recognition, using a low-resourced language (Yan-nhangu, Pama-Nyungan) as a testing ground for performance. We use a naïve greedy decoding strategy to isolate the effect of feature set choice, and find that IPA features lead to the best transcription accuracy, followed closely by a featureless baseline.
Zenodo (CERN European Organization for Nuclear Research) · 2025-11-11
dataset · Open access
Cite the source of the dataset as: Bouckaert RR, Bowern C & Atkinson QD. 2018. The origin and expansion of Pama–Nyungan languages across Australia. Nature Ecology and Evolution. 2: 741–749.
Zenodo (CERN European Organization for Nuclear Research) · 2025-11-11
dataset · Open access · 1st author · Corresponding
Cite the source of the dataset as: Bowern C & Atkinson QD. 2012. Computational phylogenetics and the internal structure of Pama-Nyungan. Language, 88(4), 817-845.
Oxford University Press eBooks · 2025-07-22
book-chapter · 1st author · Corresponding
This chapter discusses the linguistic, genetic, and archaeological stories of the Indigenous peoples of the area now known as Australia (the southern portion of Sahul). When attempting to synthesize information from genetics, archaeology, and language for the deep past of Sahul, we are confronted with several seeming contradictions. On the one hand, the picture from genetics emphasizes continuity: rapid and early expansion (more than 40,000 years ago), followed by fairly stable regionalism and some subsequent gene flow. The linguistic picture, however, appears to show a heavy disjunction, with one family, Pama-Nyungan, spreading and replacing most of the languages of almost 90% of the continent within the past 7,000 years. The material record shows a combination of stability, regionalism, and shift. This chapter explores some of these questions.
Linguistically Informed Tokenization Improves ASR for Underresourced Languages
ArXiv.org · 2025-10-07
preprint · Open access · Senior author
Automatic speech recognition (ASR) is a crucial tool for linguists aiming to perform a variety of language documentation tasks. However, modern ASR systems use data-hungry transformer architectures, rendering them generally unusable for underresourced languages. We fine-tune a wav2vec2 ASR model on Yan-nhangu, a dormant Indigenous Australian language, comparing the effects of phonemic and orthographic tokenization strategies on performance. In parallel, we explore ASR's viability as a tool in a language documentation pipeline. We find that a linguistically informed phonemic tokenization system substantially improves WER and CER compared to a baseline orthographic tokenization scheme. Finally, we show that hand-correcting the output of an ASR model is much faster than hand-transcribing audio from scratch, demonstrating that ASR can work for underresourced languages.
Diachronica · 2025-05-23 · 2 citations
article · 1st author · Corresponding
Recent grants
Dynamics of Hunter-Gatherer Language Change
NSF · $718k · 2008–2014
Language as a Window on Prehistory
NSF · $382k · 2014–2020
CAREER: Pama-Nyungan Reconstruction and the Prehistory of Australia
NSF · $407k · 2008–2014
The Language of Bardi (BCJ) Precontact Narratives
NSF · $13k · 2008–2012
Frequent coauthors
- Russell D. Gray, University of Auckland (51 shared)
- Simon J. Greenhill, University of Auckland (48 shared)
- Michael C. Gavin, Colorado State University (46 shared)
- Damián E. Blasi (45 shared)
- Kathryn R. Kirby (42 shared)
- Hannah J. Haynie, Kent State University (32 shared)
- Fiona M. Jordan (31 shared)
- Jakob Lesage (29 shared)
Education
- 2004: Ph.D., Linguistics, Harvard University