Rob Voigt

Assistant Professor of Linguistics and (by courtesy) Computer Science

Northwestern University · Computer Science

Active 2012–2026

h-index: 13

Citations: 998

Papers: 36 (18 in the last 5 years)

Funding: not listed


About

Rob Voigt is an Assistant Professor of Linguistics and Computer Science (by courtesy) at Northwestern University. His research focuses on computational linguistics, analyzing pragmatic language variability, social stereotypes, and linguistic disparities through natural language processing and multimodal analysis. Voigt has contributed to understanding racial disparities in police communication, gender bias in responses, and the polarization of political speech, among other topics. His work often involves leveraging body-worn camera footage, large language models, and cross-lingual transfer techniques to explore social and linguistic phenomena. In addition to his research, Voigt is engaged in activities related to transparency, accountability, and creating inclusive academic environments. He also maintains a personal interest in music, playing Go, and sharing knowledge about salary transparency and equity.

Research topics

Political Science
Law
Sociology
Psychology
Virology
Biology
History
Linguistics
Criminology
Immunology
Gender studies
Genetics
Social psychology
Political economy

Selected publications

Quantifying racial disparities in media representations of gun violence at scale
Proceedings of the National Academy of Sciences · 2026-01-16
Article · Open access · Senior author · Corresponding author
Previous research has documented racial disparities in gun violence news coverage in limited and small-scale contexts. This study curates and analyzes a large-scale dataset of news articles linked to specific incidents of gun violence to test for systematic race-related differences in representation across the US news media. Using computational techniques, we quantify how much media attention an incident gets, the topics and linguistic style of articles, and how participants in the incidents are framed. We find significant generalized disparities in media coverage and portrayal of incidents depending on whether they occur in neighborhoods that are majority white or majority people of color (POC), including increased media attention on police shootings if they occur in majority POC neighborhoods, greater focus on the people involved in incidents in majority white neighborhoods, and increased racialization and framing related to crime in majority POC neighborhoods.
A multi-method phenotypic study of sex differences in pragmatic language in autism
Frontiers in Psychiatry · 2026-04-22
Article · Open access
Introduction: Autism spectrum disorder (ASD) is characterized in part by differences in pragmatic (i.e., social) language use. However, few studies on pragmatic language have included a meaningful number of autistic females, and even fewer have evaluated pragmatic language profiles for sex-specific differences. The existing literature on pragmatic language in autistic individuals without intellectual disability suggests that females may have stronger social communication skills compared to males, but findings are mixed, and there is not a clear profile of specific pragmatic skills that are liable to sex differences. It is also important to develop novel methodologies, such as computational methods, to characterize pragmatic language in ways less labor-intensive than gold-standard hand-coding methods, which are extremely time consuming and typically not feasible in clinical settings. Methods: The present study examined hand coding of pragmatic language data samples alongside multiple computational linguistic methodological approaches to characterize sex differences in pragmatic language in autistic males and females across narrative and semi-structured conversational tasks that might reveal context-specific patterns across sex and diagnostic groups. Results: Results indicated that most pragmatic domains differed between autistic and non-autistic groups, with autistic males showing the most obvious pragmatic differences, and that differences between diagnostic groups were more pronounced in the semi-structured conversational context. The alignment between computational and hand-coded findings was strongest in domains with clear theoretical overlap (e.g., frequency of emotional words) but was less consistent in areas that were less theoretically aligned (e.g., conversational dynamics and single word function). 
Discussion: These findings support the promise of computational methods for characterizing narrative abilities, though further study including validation against hand-coded approaches is warranted.
The Pragmatic Mind of Machines: Tracing the Emergence of Pragmatic Competence in Large Language Models
Underline Science Inc. · 2026-03-06
Other · Open access
Current large language models (LLMs) have demonstrated emerging capabilities in social intelligence tasks, including implicature resolution and theory-of-mind reasoning, both of which require substantial pragmatic understanding. However, how LLMs acquire this pragmatic competence throughout the training process remains poorly understood. In this work, we introduce ALTPRAG, a dataset grounded in the pragmatic concept of alternatives, to evaluate whether LLMs at different training stages can accurately infer nuanced speaker intentions. Each instance pairs two equally plausible yet pragmatically divergent continuations and requires the model to (i) infer the speaker's intended meaning and (ii) explain when and why a speaker would choose one utterance over its alternative, thus directly probing pragmatic competence through contrastive reasoning. We systematically evaluate 22 LLMs across 3 key training stages: after pre-training, supervised fine-tuning (SFT), and preference optimization, to examine the development of pragmatic competence. Our results show that even base models exhibit notable sensitivity to pragmatic cues, which improves consistently with increases in model and data scale. Additionally, SFT and RLHF contribute further gains, particularly in cognitive-pragmatic scenarios. These findings highlight pragmatic competence as an emergent and compositional property of LLM training and offer new insights for aligning models with human communicative norms.
Schizophrenia Stigma in News Media: A Comprehensive Natural Language Processing Approach
Schizophrenia Bulletin · 2026-04-10
Article · Open access
BACKGROUND AND HYPOTHESIS: Stigma toward individuals with schizophrenia is well documented, yet the extent to which stigmatizing views are present in media remains understudied. This study investigates sentiment and violence-related content in US news coverage. We hypothesize that media containing schizophrenia-related keywords will have higher negative ratings, lower positive ratings, and higher violence ratings than news about illnesses of comparable burden. STUDY DESIGN: We conducted an observational comparative content analysis of 116 866 news transcripts from CNN, Fox News, and MSNBC (2016-2023). Sentiment ratings were derived using a RoBERTa large language model, and violence-related language was measured using a dictionary-based natural language processing tool (LIWC-22). STUDY RESULTS: News related to schizophrenia demonstrated significantly higher negative sentiment (β = 0.078, t(4061) = 6.85, P < .001), lower positive sentiment (β = -0.10, t(4061) = -11.55, P < .001), and higher violent content (β = 0.14, t(4061) = 3.31, P < .001) compared to news on comparable illnesses. Additionally, schizophrenia-related keywords were used in clinical contexts significantly less often (SZ = 48.49%, ILL = 97.84%, χ²(1) = 1276.4, P < .001). CONCLUSIONS: The results indicate a pervasive association between schizophrenia and negative, violent language in media, reinforcing existing stigma. These findings highlight a need for interventions targeting media portrayal to reduce public stigma against individuals with schizophrenia.
Enhancing Procedural Justness of Encounters Through Substantiation (EPJETS): The Atlantic County Randomized Controlled Trial, New Jersey, 2022-2024
ICPSR Data Holdings · 2026-03-11
Dataset · Open access
The Enhancing Procedural Justness of Encounters Through Substantiation (EPJETS) project was a collaborative initiative between researchers from Stockton, Rutgers, and Northwestern Universities and the police departments of Atlantic City and Pleasantville, New Jersey that aimed to test whether incorporating principles of procedural justice, sharing body-worn camera (BWC) footage with drivers following traffic stops, and strategically targeting identified high-traffic crash locations for enforcement could improve public trust and perceptions of police legitimacy. Between October 2022 and June 2024, the study evaluated 1,423 traffic stops conducted for speeding violations by comparing standard enforcement protocols to a novel procedural justice-based intervention. Drivers who were speeding were surveyed immediately after the stop by researchers to determine whether the EPJETS protocol positively affected their perceptions of officer treatment and the effectiveness of BWCs.
Language Choice in Nigerian Social Media Hate Speech
2026-01-01
Article · Open access · Senior author
Language choice in multilingual societies is rarely arbitrary. In Nigeria, English, Nigerian Pidgin (NP), and indigenous languages are strategically deployed in online discourse, yet little is known about how they function in hostile contexts. Here we conduct the first systematic analysis of NP in online hate speech on two platforms, Twitter and Instagram. Using a linguistically enriched annotation scheme, we label each post for class, targeted group, language variety, and hate type. Our results show that NP is disproportionately used in offensive and hateful discourse, particularly against Hausa, women, and LGBTQ+ groups, and that insults are the dominant hate strategy. Cross-domain evaluation further reveals that classifiers trained on Twitter systematically overpredict hate on Instagram, highlighting challenges of domain transfer. These findings underscore NP's role as a linguistic resource for hostility and its sociolinguistic salience in amplifying stereotypes and affect. For NLP, the work demonstrates the need for NP-specific resources, sensitivity to figurative strategies, and domain adaptation across platforms. By bridging sociolinguistics and computational modeling, this study contributes new evidence on how language choice shapes online hate speech in a multilingual African context.
TASER: Table Agents for Schema-guided Extraction and Recommendation
Underline Science Inc. · 2026-03-06
Other · Open access
Real-world financial filings report critical information about an entity's investment holdings, essential for assessing that entity's risk, profitability, and relationship profile. Yet, these details are often buried in messy, multi-page, fragmented tables that are difficult to parse, hindering downstream QA and data normalization. Specifically, 99.4% of the tables in our financial table dataset lack bounding boxes, with the largest table spanning 44 pages. To address this, we present TASER (Table Agents for Schema-guided Extraction and Recommendation), a continuously learning, agentic table extraction system that converts highly unstructured, multi-page, heterogeneous tables into normalized, schema-conforming outputs. Guided by an initial portfolio schema, TASER executes table detection, classification, extraction, and recommendations in a single pipeline. Our Recommender Agent reviews unmatched outputs and proposes schema revisions, enabling TASER to outperform vision-based table detection models such as Table Transformer by 10.1%. Within this continuous learning process, larger batch sizes yield a 104.3% increase in useful schema recommendations and a 9.8% increase in total extractions. To train TASER, we manually labeled 22,584 pages and 3,213 tables covering $731.7 billion in holdings, culminating in TASERTab to facilitate research on real-world financial tables and structured outputs. Our results highlight the promise of continuously learning agents for robust extractions from complex tabular data.
The Pragmatic Mind of Machines: Tracing the Emergence of Pragmatic Competence in Large Language Models
2026-01-01
Article · Open access · Senior author
Kefan Yu, Qingcheng Zeng, Weihao Xuan, Wanxin Li, Jingyi Wu, Rob Voigt. Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers). 2026.
Language Choice in Nigerian Social Media Hate Speech
Underline Science Inc. · 2026-03-14
Other · Open access · Senior author
Language choice in multilingual societies is rarely arbitrary. In Nigeria, English, Nigerian Pidgin (NP), and indigenous languages are strategically deployed in online discourse, yet little is known about how they function in hostile contexts. Here we conduct the first systematic analysis of NP in online hate speech on two platforms, Twitter and Instagram. Using a linguistically enriched annotation scheme, we label each post for class, targeted group, language variety, and hate type. Our results show that NP is disproportionately used in offensive and hateful discourse, particularly against Hausa, women, and LGBTQ+ groups, and that insults are the dominant hate strategy. Cross-domain evaluation further reveals that classifiers trained on Twitter systematically over-predict hate on Instagram, highlighting challenges of domain transfer. These findings underscore NP’s role as a linguistic resource for hostility and its sociolinguistic salience in amplifying stereotypes and affect. For NLP, the work demonstrates the need for NP-specific resources, sensitivity to figurative strategies, and domain adaptation across platforms. By bridging sociolinguistics and computational modeling, this study contributes new evidence on how language choice shapes online hate speech in a multilingual African context.

Frequent coauthors

Leah Platt Boustan
Princeton University
21 shared
Ran Abramitzky
Stanford University
21 shared
Y M C Ruijs
20 shared
Laurence Meyer
Université Paris-Saclay
20 shared
Peter Catron
University of Washington
19 shared
Dylan S. Connor
Arizona State University
19 shared
Peter Reiss
Stichting HIV Monitoring
16 shared
Dan Jurafsky
14 shared

Labs

CANS and CANTS Lab (PI)
Computational Potentials for Multimodality with a Case Study in Head Position

Awards & honors

2nd Place Best Paper Award (2014)
