Stefan Thomas Gries


University of California, Santa Barbara · Linguistics

Active 1988–2025

h-index: 62

Citations: 15.5k

Papers: 331 · 96 in last 5 years

Funding: not listed



About

Stefan Thomas Gries is a professor in the Department of Linguistics at the University of California, Santa Barbara. He specializes in quantitative corpus linguistics, statistical and computational linguistics, cognitive linguistics, and first and second language acquisition. His research applies statistical and computational methods to linguistic data to study how language is learned and used, drawing on insights from cognitive linguistics.

Research topics

Linguistics
Computer science
Natural language processing
Artificial intelligence
Psychology

Selected publications

Collostructional Analysis
Elsevier eBooks · 2025-01-01
book-chapter · Senior author
Metacognition of frequency, directional association strength, and dispersion of MWEs in first and second language speakers
Studies in Second Language Acquisition · 2025-12-04
article · Open access
Abstract: Statistical regularities can be acquired from usage. To examine language speakers’ statistical metacognition about multiword expressions (MWEs), we collected ratings for frequency, dispersion, and directional association strength of English binomials from L1, advanced and intermediate L2 speakers. Mixed-effects modeling showed all speakers had limited speaker-to-corpus consistency but significant sensitivity to statistical regularities of language, supporting usage-based (Gries & Ellis, 2015) and statistical learning theories (Christiansen, 2019). Their statistical metacognition was also shaped by word-level cues, consistent with the dual-route model (Carrol & Conklin, 2014). Despite similarities, frequency metacognition showed the strongest speaker-to-corpus consistency, while dispersion metacognition was the hardest to develop. Advanced L2 speakers showed the greatest speaker-to-corpus consistency and sensitivity, while lower-proficiency speakers relied more on word-level cues in metacognitive judgments, supporting the shallow-structure hypothesis (Clahsen & Felser, 2006). Overall, L1 and L2 speakers develop diverse statistical metacognition, with L2 speakers not necessarily inferior, suggesting that statistical metacognition is not solely shaped by usage-based experience.
Corpora in World Englishes
The Encyclopedia of Applied Linguistics · 2025-12-02
other · Senior author
Abstract: In this overview, we survey recent and current developments in corpus‐based research on World Englishes. We exemplify current strands of research in both more theoretical and more applied parts of research on varieties of English and conclude with theoretical, methodological, and resource desiderata.
Same, same, but erm sort of different? Comparing three kinds of fluencemes across Australian, British, Canadian, and New Zealand English
Research in Corpus Linguistics · 2025-01-01
article · Open access · Senior author
Abstract: Although L1-English fluency has been extensively studied from many angles, few contrastive studies examine whether fluency develops similarly or differently across L1-varieties while taking sociolinguistic variation into consideration. This paper aims to close this research gap and examines the use of three core strategies of fluency (or fluencemes), i.e. discourse markers, filled pauses and unfilled pauses, across Australian, British, Canadian, and New Zealand English. These fluencemes were extracted and manually disambiguated from the private conversation sections of the respective components of the International Corpus of English (ICE-AUS, ICE-GB, ICE-CAN, and ICE-NZ). The data were normalised per speaker and linked with the sociobiographic metadata of the speakers. Analysis using random forests revealed a consistent fluenceme distribution across the four varieties, with unfilled pauses being the most common, followed by discourse markers, and then filled pauses. This pattern suggests a ‘common fluenceme core’ among L1-English varieties. The influence of sociolinguistic variables (gender, age, education, and occupation) was modest across varieties and exhibited diverse trends. Male speakers tend to use filled pauses more frequently but unfilled pauses less frequently than female speakers. Increasing age did not significantly affect the frequency of these strategies; however, older speakers tend to use discourse markers less frequently. Both education and occupation showed a slight positive correlation with overall fluency.
Corpus Linguistics and Psycholinguistics
Elsevier eBooks · 2025-01-01
book-chapter · 1st author · Corresponding
Closing remarks and outlook
International Journal of Corpus Linguistics · 2025-06-10 · 2 citations
article · 1st author · Corresponding
Incorporating Corpora in Second‐Language Acquisition Research
The Encyclopedia of Applied Linguistics · 2025-12-02
other · 1st author · Corresponding
Abstract: This article discusses the use of corpus‐linguistic methods in second‐language acquisition research. It focuses on measurement applications, specific linguistic case studies, and an evaluation coupled with an outlook over desiderata for the future.
Corpus Linguistics and the Cognitive/Constructional Endeavor
Cambridge University Press eBooks · 2025-01-30 · 1 citation
book-chapter · 1st author · Corresponding
Similative-pretence constructions in language contact situations
Cognitive Linguistic Studies · 2025-11-10 · 1 citation
article · Senior author
Abstract: The present study introduces a method that can be used to explore in a quantitatively rigorous yet less demanding way (both in terms of data and statistical requirements) how constructional templates and their lexical preferences (lexico-syntactic transference) diffuse in language contact situations. The study investigates the influence of Mexican Spanish similative-pretence constructions on Huasteca Nahuatl similative-pretence constructions as a proof-of-concept kind of application for our method. Speakers of Huasteca Nahuatl have borrowed the markers komo ‘like’ and komo si ‘as if’ from Mexican Spanish to express similative (e.g., she swims like a fish) and pretence meanings (e.g., she swims as if she were a fish), respectively. Using a conditional inference forest, the paper demonstrates that speakers of Huasteca Nahuatl have not only borrowed these markers from Mexican Spanish, but also lexical preferences (e.g., verb lemmas) of the constructions in which these markers occur. These findings show that the rigid partition of structural levels that has been adopted by traditional models of language contact proves inadequate for describing complex language situations. The method introduced here provides an integrative, non-modular way to explore language contact from a Usage-Based Construction Grammar perspective.
Cultural Keywords in Varieties Research
Journal of Research Design and Statistics in Linguistics and Communication Science · 2025-07-02
article · 1st author · Corresponding
Abstract: One of the four most central corpus-linguistic methods is keywords/keyness analysis, which is generally the identification and interpretation of word types of a target corpus (T) that, when compared to their occurrence in a reference corpus (R), are key/characteristic for T. In this article, I will (a) apply methods proposed by Gries (2021) to the study of three outer-circle varieties of English to identify cultural keywords in a bottom-up fashion and (b) use the results from that first application to advance two suggestions for how to extend keyness analyses to better understand the keywords from the first step: key collocates, which involves applying keyness to contexts of keywords; and deep key collocates, which involves applying distributional semantics methods like word2vec, GloVe, BERT, etc. to keywords. I will use Mukherjee and Bernaisch's (2015) keyness analysis as a launchpad to identify keywords from comparisons of Indian, Pakistani, and Sri Lankan Englishes (IndE, PakE, and SriE, respectively) and zoom in on the variety-specific differences of the keyword terror. The results not only indicate what terms are key for which of the three varieties; they also allow for a new level of granularity in how keyword use differs across varieties and possibly cultures. For example, in the PakE data, newspaper coverage of terror is mostly discussed with regard to its financial aspects and implications and matters of communication, whereas in IndE and SriE, terror is much more approached from a military and a religious perspective, respectively.

Frequent coauthors

Stefanie Wulff
UiT The Arctic University of Norway
109 shared
Anatol Stefanowitsch
75 shared
Martin Hilpert
University of Neuchâtel
71 shared
Santa Barbara
University of California, Santa Barbara
67 shared
Susanne Flach
Catholic University of Eichstätt-Ingolstadt
66 shared
Anna Birmingham
University of Florida
66 shared
Magali Mccauley
Baidu (China)
66 shared
Neuchâtel Keller
University of Florida
62 shared

Education

Ph.D., Department of British and American Studies
Universität Hamburg
2000
M.A., Department of British and American Studies
Universität Hamburg
1998
