Resume-aware faculty matching

Find professors who actually fit you

Upload your resume. Four AI agents analyze your background, rank the faculty who fit, inspect their recent research, and help you draft outreach — grounded in their actual work, not templates.

Free to start · No credit card · Cancel anytime
Rob Voigt

Assistant Professor of Linguistics and (by courtesy) Computer Science · Verified

Northwestern University · Computer Science

Active 2012–2026

h-index: 13
Citations: 998
Papers: 36 (18 in the last 5 years)

About

Rob Voigt is an Assistant Professor of Linguistics and Computer Science (by courtesy) at Northwestern University. His research focuses on computational linguistics, analyzing pragmatic language variability, social stereotypes, and linguistic disparities through natural language processing and multimodal analysis. Voigt has contributed to understanding racial disparities in police communication, gender bias in responses, and the polarization of political speech, among other topics. His work often involves leveraging body-worn camera footage, large language models, and cross-lingual transfer techniques to explore social and linguistic phenomena. In addition to his research, Voigt is engaged in activities related to transparency, accountability, and creating inclusive academic environments. He also maintains a personal interest in music, playing Go, and sharing knowledge about salary transparency and equity.

Research topics

  • Political Science
  • Law
  • Sociology
  • Psychology
  • Virology
  • Biology
  • History
  • Linguistics
  • Criminology
  • Immunology
  • Gender studies
  • Genetics
  • Social psychology
  • Political economy

Selected publications

  • Quantifying racial disparities in media representations of gun violence at scale

    Proceedings of the National Academy of Sciences · 2026-01-16

    article · Open access · Senior author · Corresponding

    Previous research has documented racial disparities in gun violence news coverage in limited and small-scale contexts. This study curates and analyzes a large-scale dataset of news articles linked to specific incidents of gun violence to test for systematic race-related differences in representation across the US news media. Using computational techniques, we quantify how much media attention an incident gets, the topics and linguistic style of articles, and how participants in the incidents are framed. We find significant generalized disparities in media coverage and portrayal of incidents depending on whether they occur in neighborhoods that are majority white or majority people of color (POC), including increased media attention on police shootings if they occur in majority POC neighborhoods, greater focus on the people involved in incidents in majority white neighborhoods, and increased racialization and framing related to crime in majority POC neighborhoods.

  • A multi-method phenotypic study of sex differences in pragmatic language in autism

    Frontiers in Psychiatry · 2026-04-22

    article · Open access

    Introduction: Autism spectrum disorder (ASD) is characterized in part by differences in pragmatic (i.e., social) language use. However, few studies on pragmatic language have included a meaningful number of autistic females, and even fewer have evaluated pragmatic language profiles for sex-specific differences. The existing literature on pragmatic language in autistic individuals without intellectual disability suggests that females may have stronger social communication skills compared to males, but findings are mixed, and there is not a clear profile of specific pragmatic skills that are liable to sex differences. It is also important to develop novel methodologies, such as computational methods, to characterize pragmatic language in ways less labor-intensive than gold-standard hand-coding methods, which are extremely time consuming and typically not feasible in clinical settings. Methods: The present study examined hand coding of pragmatic language data samples alongside multiple computational linguistic methodological approaches to characterize sex differences in pragmatic language in autistic males and females across narrative and semi-structured conversational tasks that might reveal context-specific patterns across sex and diagnostic groups. Results: Results indicated that most pragmatic domains differed between autistic and non-autistic groups, with autistic males showing the most obvious pragmatic differences, and that differences between diagnostic groups were more pronounced in the semi-structured conversational context. The alignment between computational and hand-coded findings was strongest in domains with clear theoretical overlap (e.g., frequency of emotional words) but was less consistent in areas that were less theoretically aligned (e.g., conversational dynamics and single word function). Discussion: These findings support the promise of computational methods for characterizing narrative abilities, though further study including validation against hand-coded approaches is warranted.

  • The Pragmatic Mind of Machines: Tracing the Emergence of Pragmatic Competence in Large Language Models

    Underline Science Inc. · 2026-03-06

    other · Open access

    Current large language models (LLMs) have demonstrated emerging capabilities in social intelligence tasks, including implicature resolution and theory-of-mind reasoning, both of which require substantial pragmatic understanding. However, how LLMs acquire this pragmatic competence throughout the training process remains poorly understood. In this work, we introduce ALTPRAG, a dataset grounded in the pragmatic concept of alternatives, to evaluate whether LLMs at different training stages can accurately infer nuanced speaker intentions. Each instance pairs two equally plausible yet pragmatically divergent continuations and requires the model to (i) infer the speaker's intended meaning and (ii) explain when and why a speaker would choose one utterance over its alternative, thus directly probing pragmatic competence through contrastive reasoning. We systematically evaluate 22 LLMs across 3 key training stages: after pre-training, supervised fine-tuning (SFT), and preference optimization, to examine the development of pragmatic competence. Our results show that even base models exhibit notable sensitivity to pragmatic cues, which improves consistently with increases in model and data scale. Additionally, SFT and RLHF contribute further gains, particularly in cognitive-pragmatic scenarios. These findings highlight pragmatic competence as an emergent and compositional property of LLM training and offer new insights for aligning models with human communicative norms.

  • Schizophrenia Stigma in News Media: A Comprehensive Natural Language Processing Approach

    Schizophrenia Bulletin · 2026-04-10

    article · Open access

    BACKGROUND AND HYPOTHESIS: Stigma toward individuals with schizophrenia is well documented, yet the extent to which stigmatizing views are present in media remains understudied. This study investigates sentiment and violence-related content in US news coverage. We hypothesize that media with schizophrenia-related keywords will have higher negative and lower positive ratings, and more violent ratings than news for illnesses of comparable burden. STUDY DESIGN: We conducted an observational comparative content analysis on 116 866 news transcripts from CNN, Fox News, and MSNBC (2016-2023). Sentiment ratings were derived using a roBERTa large language model, and violence-related language was measured using a dictionary-based natural language processing tool (LIWC-22). STUDY RESULTS: News related to schizophrenia demonstrated significantly higher negative sentiment (β = 0.078, t (4061) = 6.85, P < .001), lower positive sentiment (β = -0.10, t (4061) = -11.55, P < .001), and higher violent content (β = 0.14, t (4061) = 3.31, P < .001) compared to news on comparable illnesses. Additionally, schizophrenia-related keywords were used in clinical contexts significantly less often (SZ = 48.49%, ILL = 97.84%, χ²(1) = 1276.4, P < .001). CONCLUSIONS: The results indicate a pervasive association between schizophrenia and negative, violent language in media, reinforcing existing stigma. These findings highlight a need for interventions targeting media portrayal to reduce public stigma against individuals with schizophrenia.

  • Enhancing Procedural Justness of Encounters Through Substantiation (EPJETS): The Atlantic County Randomized Controlled Trial, New Jersey, 2022-2024

    ICPSR Data Holdings · 2026-03-11

    dataset · Open access

    The Enhancing Procedural Justness of Encounters Through Substantiation (EPJETS) project was a collaborative initiative between researchers from Stockton, Rutgers, and Northwestern Universities and the police departments of Atlantic City and Pleasantville, New Jersey that aimed to test whether incorporating principles of procedural justice, sharing body-worn camera (BWC) footage with drivers following traffic stops, and strategically targeting identified high-traffic crash locations for enforcement could improve public trust and perceptions of police legitimacy. Between October 2022 and June 2024, the study evaluated 1,423 traffic stops conducted for speeding violations by comparing standard enforcement protocols to a novel procedural justice-based intervention. Drivers who were speeding were surveyed immediately after the stop by researchers to determine whether the EPJETS protocol positively affected their perceptions of officer treatment and the effectiveness of BWCs.

  • Language Choice in Nigerian Social Media Hate Speech

    2026-01-01

    article · Open access · Senior author

    Language choice in multilingual societies is rarely arbitrary. In Nigeria, English, Nigerian Pidgin (NP) and indigenous languages are strategically deployed in online discourse, yet little is known about how they function in hostile contexts. Here we conduct the first systematic analysis of NP in online hate speech on two platforms, Twitter and Instagram. Using a linguistically enriched annotation scheme, we label each post for class, targeted group, language variety, and hate type. Our results show that NP is disproportionately used in offensive and hateful discourse, particularly against Hausa, women, and LGBTQ+ groups, and that insults are the dominant hate strategy. Cross-domain evaluation further reveals that classifiers trained on Twitter systematically overpredict hate on Instagram, highlighting challenges of domain transfer. These findings underscore NP's role as a linguistic resource for hostility and its sociolinguistic salience in amplifying stereotypes and affect. For NLP, the work demonstrates the need for NP-specific resources, sensitivity to figurative strategies, and domain adaptation across platforms. By bridging sociolinguistics and computational modeling, this study contributes new evidence on how language choice shapes online hate speech in a multilingual African context.

  • TASER: Table Agents for Schema-guided Extraction and Recommendation

    Underline Science Inc. · 2026-03-06

    other · Open access

    Real-world financial filings report critical information about an entity's investment holdings, essential for assessing that entity's risk, profitability, and relationship profile. Yet, these details are often buried in messy, multi-page, fragmented tables that are difficult to parse, hindering downstream QA and data normalization. Specifically, 99.4% of the tables in our financial table dataset lack bounding boxes, with the largest table spanning 44 pages. To address this, we present TASER (Table Agents for Schema-guided Extraction and Recommendation), a continuously learning, agentic table extraction system that converts highly unstructured, multi-page, heterogeneous tables into normalized, schema-conforming outputs. Guided by an initial portfolio schema, TASER executes table detection, classification, extraction, and recommendations in a single pipeline. Our Recommender Agent reviews unmatched outputs and proposes schema revisions, enabling TASER to outperform vision-based table detection models such as Table Transformer by 10.1%. Within this continuous learning process, larger batch sizes yield a 104.3% increase in useful schema recommendations and a 9.8% increase in total extractions. To train TASER, we manually labeled 22,584 pages and 3,213 tables covering $731.7 billion in holdings, culminating in TASERTab to facilitate research on real-world financial tables and structured outputs. Our results highlight the promise of continuously learning agents for robust extractions from complex tabular data.

  • The Pragmatic Mind of Machines: Tracing the Emergence of Pragmatic Competence in Large Language Models

    2026-01-01

    article · Open access · Senior author

    Kefan Yu, Qingcheng Zeng, Weihao Xuan, Wanxin Li, Jingyi Wu, Rob Voigt. Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers). 2026.

  • Language Choice in Nigerian Social Media Hate Speech

    Underline Science Inc. · 2026-03-14

    other · Open access · Senior author

    Language choice in multilingual societies is rarely arbitrary. In Nigeria, English, Nigerian Pidgin (NP) and indigenous languages are strategically deployed in online discourse, yet little is known about how they function in hostile contexts. Here we conduct the first systematic analysis of NP in online hate speech on two platforms, Twitter and Instagram. Using a linguistically enriched annotation scheme, we label each post for class, targeted group, language variety, and hate type. Our results show that NP is disproportionately used in offensive and hateful discourse, particularly against Hausa, women, and LGBTQ+ groups, and that insults are the dominant hate strategy. Cross-domain evaluation further reveals that classifiers trained on Twitter systematically over-predict hate on Instagram, highlighting challenges of domain transfer. These findings underscore NP’s role as a linguistic resource for hostility and its sociolinguistic salience in amplifying stereotypes and affect. For NLP, the work demonstrates the need for NP-specific resources, sensitivity to figurative strategies, and domain adaptation across platforms. By bridging sociolinguistics and computational modeling, this study contributes new evidence on how language choice shapes online hate speech in a multilingual African context.

Labs

  • CANS and CANTS Lab · PI

    Computational Potentials for Multimodality with a Case Study in Head Position

Awards & honors

  • 2nd Place Best Paper Award (2014)