
Dr. Fraser Brown
Carnegie Mellon University · Software and Societal Systems Department
Active 1979–2026
About
Dr. Fraser Brown specializes in verification and security techniques for complex software systems. His research covers program correctness, compiler verification, systems security, cryptographic proof systems, and bug finding. He develops and applies methods for ensuring that software is correct and secure, particularly in challenging domains such as browser just-in-time (JIT) compilers and cryptographic compilers.
Research topics
- Computer Science
- Psychology
- Natural Language Processing
- Linguistics
- Political Science
- Artificial Intelligence
- Sociology
- Philosophy
- Social psychology
- Multimedia
- Gender studies
- Mathematics education
- History
Selected publications
Reconsidering Corpus Methods in HEL
Journal of English Linguistics · 2026-05-09
article · Open access · 1st author · Corresponding
Turning text into numbers—and ultimately into an interpretable result—typically involves a processing pipeline. Whether explicit or hidden inside code, each step in that pipeline involves a host of decisions. This article follows a pipeline and unpacks some of the attendant decisions that analysts face at various stages, highlighting the specific challenges that confront analysts investigating historical data. Data, for example, can be less available; the quality of those data can be poor; and the development of tools and models can lag when compared to what is available for present-day English. In the face of such challenges, options are explored, but no argument is made for a particular approach or method—results and their interpretation will be shaped by analysts’ epistemic commitments and expertise. An argument is made, however, that an effective processing pipeline will be generative. It will be one that not only can be explained and defended but also can help us see the histories of English—its structures, its uses, and its users—in new ways. In that spirit, a processing pipeline can be understood not as a series of immobilizing decisions, but as an invitation.
mda.biber: Functions for Multi-Dimensional Analysis
2025-10-07
dataset · Open access · 1st author · Corresponding
Multi-Dimensional Analysis (MDA) is an adaptation of factor analysis developed by Douglas Biber (1992) <doi:10.1007/BF00136979>. Its most common use is to describe language as it varies by genre, register, and use. This package contains functions for carrying out the calculations needed to describe and plot MDA results: dimension scores, dimension means, and factor loadings.
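As a rough illustration of what a dimension score is, the sketch below standardizes each feature across texts and then, for one dimension, adds the z-scores of features that load positively and subtracts those that load negatively. It is written in Python and is not the mda.biber code (the package itself is in R). The function name `dimension_scores`, the input layout, and the 0.35 loading cutoff are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def dimension_scores(counts, loadings, threshold=0.35):
    """Score each text on one dimension (hypothetical helper).

    counts:   {text_id: {feature: normalized frequency}}
    loadings: {feature: factor loading on this dimension}
    threshold is an assumed cutoff; features with weaker loadings are ignored.
    """
    # Standardize every feature across the texts (z-scores).
    z = {}
    for feature in loadings:
        column = [row[feature] for row in counts.values()]
        mu, sd = mean(column), stdev(column)
        z[feature] = {
            text_id: (row[feature] - mu) / sd if sd else 0.0
            for text_id, row in counts.items()
        }

    # Sum z-scores of positively loading features, subtract negative ones.
    scores = {}
    for text_id in counts:
        positive = sum(z[f][text_id] for f in loadings if loadings[f] >= threshold)
        negative = sum(z[f][text_id] for f in loadings if loadings[f] <= -threshold)
        scores[text_id] = positive - negative
    return scores


# Toy data: two texts, two features with opposite loadings.
toy_counts = {
    "text_a": {"f1": 1.0, "f2": 3.0},
    "text_b": {"f1": 3.0, "f2": 1.0},
}
toy_loadings = {"f1": 0.6, "f2": -0.5}
toy_scores = dimension_scores(toy_counts, toy_loadings)
```

In the toy data, `text_b` has more of the positively loading feature and less of the negatively loading one, so it scores higher than `text_a`. Averaging such scores within a genre or register would give a dimension mean.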
Developing Students’ Statistical Expertise Through Writing in the Age of AI
Journal of Statistics and Data Science Education · 2025-04-28
article · Open access · Senior author
As large language models (LLMs) such as GPT have become more accessible, concerns about their potential effects on students’ learning have grown. In data science education, the specter of students’ turning to LLMs raises multiple issues, as writing is a means not just of conveying information but of developing their statistical reasoning. In our study, we engage with questions surrounding LLMs and their pedagogical impact by: (a) quantitatively and qualitatively describing how select LLMs write report introductions and complete data analysis reports; and (b) comparing patterns in texts authored by LLMs to those authored by students and by published researchers. Our results show distinct differences between machine-generated and human-generated writing, as well as between novice and expert writing. Those differences are evident in how writers manage information, modulate confidence, signal importance, and report statistics. The findings can help inform classroom instruction, whether that instruction is aimed at dissuading the use of LLMs or at guiding their use as a productivity tool. It also has implications for students’ development as statistical thinkers and writers. What happens when they offload the work of data science to a model that doesn’t write quite like a data scientist? Supplementary materials for this article are available online.
spell.replacer: Probabilistic Spelling Correction in a Character Vector
2025-09-03
dataset · Open access · 1st author · Corresponding
Automatically replaces "misspelled" words in a character vector based on their string distance from a list of words sorted by their frequency in a corpus. The default word list provided in the package comes from the Corpus of Contemporary American English. Uses the Jaro-Winkler distance metric for string similarity as implemented in van der Loo (2014) <doi:10.32614/RJ-2014-011>. The word frequency data is derived from Davies (2008-) "The Corpus of Contemporary American English (COCA)" <https://www.english-corpora.org/coca/>.
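The replacement strategy described for spell.replacer can be sketched in Python as follows. This is a minimal illustration and not the package's implementation (which is in R): tokens already in the word list are kept, and any other token is swapped for the closest word by Jaro-Winkler distance, with ties going to the more frequent word. The function names, the 0.15 distance cutoff, and the tiny word list are assumptions made for the example.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and close enough in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1.0 - j)


def replace_misspelled(tokens, words_by_frequency, max_distance=0.15):
    """Replace unknown tokens with the nearest word (hypothetical helper).

    words_by_frequency is assumed to be sorted from most to least frequent;
    max_distance is an assumed cutoff, not a documented package default.
    """
    known = set(words_by_frequency)
    result = []
    for token in tokens:
        if token in known:
            result.append(token)
            continue
        best_word, best_distance = token, max_distance
        for word in words_by_frequency:
            distance = 1.0 - jaro_winkler(token, word)
            # Strict comparison: on a tie the more frequent word is kept.
            if distance < best_distance:
                best_word, best_distance = word, distance
        result.append(best_word)
    return result


word_list = ["the", "because", "receive", "definitely"]
corrected = replace_misspelled(["becuase", "recieve", "because", "qqq"], word_list)
```

Here "becuase" and "recieve" each differ from a listed word by one transposition, so they fall well under the cutoff and are replaced, while "qqq" shares no characters with any listed word and is left unchanged.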
Do LLMs write like humans? Variation in grammatical and rhetorical styles
Proceedings of the National Academy of Sciences · 2025-02-18 · 32 citations
article · Open access · Senior author
Large language models (LLMs) are capable of writing grammatical text that follows instructions, answers questions, and solves problems. As they have advanced, it has become difficult to distinguish their output from human-written text. While past research has found some differences in features such as word choice and punctuation and developed classifiers to detect LLM output, none has studied the rhetorical styles of LLMs. Using several variants of Llama 3 and GPT-4o, we construct two parallel corpora of human- and LLM-written texts from common prompts. Using Douglas Biber’s set of lexical, grammatical, and rhetorical features, we identify systematic differences between LLMs and humans and between different LLMs. These differences persist when moving from smaller models to larger ones and are larger for instruction-tuned models than base models. This observation of differences demonstrates that despite their advanced abilities, LLMs struggle to match human stylistic variation. Attention to more advanced linguistic features can hence detect patterns in their behavior not previously recognized.
Do LLMs write like humans? Variation in grammatical and rhetorical styles
arXiv (Cornell University) · 2024-10-21
preprint · Open access · Senior author
Large language models (LLMs) are capable of writing grammatical text that follows instructions, answers questions, and solves problems. As they have advanced, it has become difficult to distinguish their output from human-written text. While past research has found some differences in surface features such as word choice and punctuation, and developed classifiers to detect LLM output, none has studied the rhetorical styles of LLMs. Using several variants of Llama 3 and GPT-4o, we construct two parallel corpora of human- and LLM-written texts from common prompts. Using Douglas Biber's set of lexical, grammatical, and rhetorical features, we identify systematic differences between LLMs and humans and between different LLMs. These differences persist when moving from smaller models to larger ones, and are larger for instruction-tuned models than base models. This observation of differences demonstrates that despite their advanced abilities, LLMs struggle to match human stylistic variation. Attention to more advanced linguistic features can hence detect patterns in their behavior not previously recognized.
The “chanification” of white supremacist extremism
Computational and Mathematical Organization Theory · 2024-09-18 · 1 citation
article · Open access
Much research has focused on the role of the alt-right in pushing far-right narratives into mainstream discourse. In this work, we focus on the alt-right’s effects on extremist narratives themselves. From 2012 to 2017, we find a rise in alt-right, 4chan-like discourse styles across multiple communication platforms known for white supremacist extremism, such as Stormfront. This discourse style incorporates inflammatory insults, irreverent comments, and talk about memes and online “chan” culture itself. A network analysis of one far-right extremist platform suggests that central users adopt and spread this alt-right style. This analysis has implications for understanding influence and change in online white supremacist extremism, as well as the role of style in white supremacist communications. Warning: This paper contains examples of hateful and offensive language.
Assessing Writing · 2024-03-30 · 1 citation
article · Open access
Recently, formative feedback in writing instruction has been supported by technologies generally referred to as Automated Writing Evaluation tools. However, such tools are limited in their capacity to explore specific disciplinary genres, and they have shown mixed results in student writing improvement. We explore how technology-enhanced writing interventions can positively affect student attitudes toward and beliefs about writing, both reinforcing content knowledge and increasing student motivation. Using a student-facing text-visualization tool called Write & Audit, we hosted revision workshops for students (n = 30) in an introductory-level statistics course at a large North American University. The tool is designed to be flexible: instructors of various courses can create expectations and predefine topics that are genre-specific. In this way, students are offered non-evaluative formative feedback which redirects them to field-specific strategies. To gauge the usefulness of Write & Audit, we used a previously validated survey instrument designed to measure the construct model of student motivation (Ling et al. 2021). Our results show significant increases in student self-efficacy and beliefs about the importance of content in successful writing. We contextualize these findings with data from three student think-aloud interviews, which demonstrate metacognitive awareness while using the tool. Ultimately, this exploratory study is non-experimental, but it contributes a novel approach to automated formative feedback and confirms the promising potential of Write & Audit.
pseudobibeR: Aggregate Counts of Linguistic Features
2024-11-19
dataset · Open access · 1st author · Corresponding
Calculates the lexicogrammatical and functional features described by Biber (1985) <doi:10.1515/ling.1985.23.2.337> and widely used for text-type, register, and genre classification tasks.
Dense and Disconnected: Analyzing the Sedimented Style of ChatGPT-Generated Text at Scale
Written Communication · 2024-08-04 · 19 citations
article
ChatGPT and other LLMs are at the forefront of pedagogical considerations in classrooms across the academy. Many studies have spoken to the technology’s capacity to generate one-off texts in a variety of genres. This study complements those by inquiring into its capacity to generate compelling texts at scale. In this study, we quantitatively and qualitatively analyze a small corpus of generated texts in two genres and gauge it against novice and published academic writers along known dimensions of linguistic variation. Theoretically, we position and historicize ChatGPT as a writing technology and consider the ways in which generated text may not be congruent with established trajectories of writing development in higher education. Our study found that generated texts are more informationally dense than authored texts and often read as dialogically closed, “empty,” and “fluffy.” We close with a discussion of potentially explanatory linguistic features, as well as relevant pedagogical implications.
Frequent coauthors
- Michael Yoder · 6 shared
- Michael Laudenbach · New Jersey Institute of Technology · 6 shared
- Kathleen M. Carley · Carnegie Mellon University · 6 shared
- Alex Reinhart · 4 shared
- Suguru Ishizaki · Carnegie Mellon University · 4 shared
- Peggy Albers · Georgia State University · 3 shared
- Gordon Weinberg · 3 shared
- Ben Markey · Carnegie Mellon University · 2 shared
Education
B.A., English
Stanford
Ph.D., Computer Science
Stanford