AJ Alvero

Assistant Research Professor & Computational Sociologist

Cornell University · Sociology

Active 2019–2025

h-index: 6

Citations: 165

Papers: 38 (35 in the last 5 years)

Funding: —



About

AJ Alvero is an Assistant Research Professor at the Cornell University Center for Data Science for Enterprise and Society, with affiliations in the Departments of Information Science and Sociology and membership in the Future of Learning Lab. He is a computational social scientist whose research addresses sociological questions related to artificial intelligence, culture, language, education, race and ethnicity, and organizational decision making.

His research starts from questions about outward expression and personal essence: whether it is possible to separate the art from the artist, the author from the text, or the speaker from their speech. From these starting points, he examines culture, language, and social processes of evaluation, particularly in high-stakes settings such as college admissions and parole hearings, by analyzing the large volumes of data these settings generate. Drawing on administrative, experimental, social media, and AI-generated synthetic data, he shows how individual and shared identity shape human expression. The broader aim is to understand how demography and culture intersect to produce the information that builds and fuels the technologies used to sort, stratify, and organize social life. His work has increasingly turned toward the tools, methods, and frameworks that researchers use in these settings.

His scholarship has appeared or is forthcoming in venues such as Science Advances, Poetics, the American Journal of Sociology, Sociological Methods & Research, Big Data & Society, and The Oxford Handbook of the Sociology of Machine Learning. He earned his PhD and an MS in statistics from Stanford University. Before entering academia, AJ was a high school English teacher in Miami, Florida.

Research signals


Research topics

Political Science
Computer Science
Sociology
Artificial Intelligence
Psychology
Social Science
Economics
Social psychology
Natural Language Processing
Law
Geography
Mathematics
Machine Learning
Philosophy
Cartography
History
Management
Gender studies
Accounting
Business
Public relations
Epistemology
Linguistics
Statistics

Selected publications

Digital Accents, Homogeneity-by-Design, and the Evolving Social Science of Written Language
2025-01-30
Preprint · Open access · 1st author · Corresponding
Published in the Annual Review of Applied Linguistics: https://www.cambridge.org/core/journals/annual-review-of-applied-linguistics/article/abs/digital-accents-homogeneitybydesign-and-the-evolving-social-science-of-written-language/6F0DF411B71E82778B88F99F6E81FFBD
Human language is increasingly written rather than only spoken, primarily because of the proliferation of digital technology in modern life. This trend has enabled the creation of generative AI trained on corpora containing trillions of words extracted from text on the internet. However, current language theory inadequately addresses the unique characteristics and constraints of digital text communication. This paper systematically analyzes and synthesizes existing literature to map the theoretical landscape of digital language evolution. The evidence shows that, as with spoken language, features of written communication are frequently correlated with the socially constructed demographic identities of writers, a phenomenon we call "digital accents." This conceptualization raises complex ontological questions about the nature of digital text and its relationship to identity. The same line of questioning, together with recent research, shows how generative AI systematically fails to capture the breadth of expression observed in human writing, an outcome we call "homogeneity-by-design." By approaching text-based language through this theoretical framework while acknowledging its inherent limitations, social scientists studying language can strengthen their critical analysis of artificial intelligence systems and contribute meaningful insights to their development and improvement.
Generative AI Meets Open-Ended Survey Responses: Research Participant Use of AI and Homogenization
2025-02-24 · 4 citations
Preprint · Open access · Senior author
The growing popularity of generative AI tools presents new challenges for data quality in online surveys and experiments. This study examines participants’ use of large language models to answer open-ended survey questions and describes empirical tendencies in human-written vs. LLM-generated text responses. In an original survey of participants recruited from a popular online platform for sourcing social science research subjects, 34% reported using LLMs to help them answer open-ended survey questions. Simulations comparing human-written responses from three pre-ChatGPT studies with LLM-generated text reveal that LLM responses are more homogeneous and more positive, particularly when they describe social groups in sensitive questions. These homogenization patterns may mask important underlying social variation in attitudes and beliefs among human subjects, raising concerns about data validity. Our findings shed light on the scope and potential consequences of participants’ LLM use in online research.
Generative AI in Sociological Research: State of the Discipline
Sociological Science · 2025-12-11 · 1 citation
Article · Open access · 1st author · Corresponding
Generative artificial intelligence (GenAI) has garnered considerable attention for its potential utility in research and scholarship, even among those who typically do not rely on computational tools. However, early commentators have also raised concerns that GenAI use carries enormous environmental costs and serious social risks, and that it tends to produce low-quality content. Amid both excitement and skepticism, it is crucial to take stock of how GenAI is actually being used. We focus on sociological research as our site and present findings from a survey of 433 authors of articles published in 50 sociology journals in the past five years. The survey provides an overview of the state of the discipline by answering fundamental questions: how (much) do scholars use the technology for their research; what are their reasons for using it; and how concerned, trustful, and optimistic are they about the technology? Among the roughly one third of respondents who self-report using GenAI at least weekly, the primary use is writing assistance, with comparatively less use in planning, data collection, or data analysis. In both use and attitudes, there are surprisingly few differences between self-identified computational and non-computational researchers. In general, respondents are very concerned about the social and environmental consequences of GenAI. Trust in GenAI outputs is low, regardless of expertise or frequency of use. Although optimism that GenAI will improve is high, scholars are divided on whether GenAI will have a positive impact on the field.
Algorithmic Tradeoffs, Applied NLP, and the State-of-the-Art Fallacy
ArXiv.org · 2025-09-10
Preprint · Open access · 1st author · Corresponding
Computational sociology is growing in popularity, yet the analytic tools it employs differ widely in power, transparency, and interpretability. In computer science, methods gain popularity after surpassing benchmarks of predictive accuracy, becoming the "state of the art." Computer scientists favor novelty and innovation for different reasons, but prioritizing technical prestige over methodological fit could unintentionally limit the scope of sociological inquiry. To illustrate, we focus on computational text analysis and revisit a prior study of college admissions essays, comparing analyses using both older and newer methods. These methods vary in flexibility and opacity, allowing us to compare performance across distinct methodological regimes. We find that newer techniques did not meaningfully outperform prior results. We also find that using the current state of the art, generative AI and large language models, could introduce bias and confounding that are difficult to disentangle. We therefore argue that sociological inquiry benefits from methodological pluralism that aligns analytic choices with theoretical and empirical questions. While we frame this sociologically, scholars in other disciplines may confront what we call the "state-of-the-art fallacy": the belief that whichever tool computer scientists deem best will work across topics, domains, and questions.
Policing the boundaries of Blackness: How Black and White Americans evaluate racial self-identifications
2025-12-11
Article · Open access
How do people assess the authenticity and legitimacy of another person’s racial self-identification? This study explores the racial conceptions held by both Black and White Americans as they decide who they believe can, and cannot, self-identify as Black across a range of contexts. We further examine how a person’s responses compare with their perceptions of how other Americans evaluate racial claims. Using a series of survey experiments, we find that respondents privilege the information contained in genetic ancestry tests over other attributes, such as self-identification. We do not observe meaningful differences by race in the treatment effects, illustrating the shared nature of these schemas. However, we find a discordance between respondents’ beliefs and their perceptions of how other Americans would respond in similar settings, suggesting that the attributes people themselves use to classify others and to judge authenticity differ from their perceptions of the broader social ‘rules’ regarding race.
Generative AI Meets Open-Ended Survey Responses: Research Participant Use of AI and Homogenization
2025-03-18 · 1 citation
Preprint · Open access · Senior author
ChitterChatter: Curriculum-Aligned AI Speaking Partners for Language Learning Classrooms
2025-07-17
Article
Despite the importance of speaking practice in language learning, most students struggle to find low-stakes opportunities for authentic oral communication. ChitterChatter addresses this challenge by providing an AI-powered tool that enables instructors to create curriculum-aligned, voice-enabled conversation activities for students. Built on OpenAI's Realtime API and designed through iterative feedback from language education experts, ChitterChatter offers personalized, adaptive speaking practice while maintaining a judgment-free environment that promotes student comfort and confidence. Our pilot study with university-level Spanish learners shows that students value the platform's ability to provide authentic conversation practice without fear of judgment, although barriers to adoption remain. This paper presents ChitterChatter's design, preliminary evaluation results, and future directions for enhancing the system. Our findings demonstrate the potential of AI conversation partners to support classroom language instruction by increasing both the quantity and quality of speaking practice opportunities for students.
Poor Alignment and Steerability of Large Language Models: Evidence from College Admission Essays
arXiv (Cornell University) · 2025-03-25 · 1 citation
Preprint · Open access
People are increasingly using technologies equipped with large language models (LLMs) to write texts for formal communication, which raises two important questions at the intersection of technology and society: who do LLMs write like (model alignment), and can LLMs be prompted to change who they write like (model steerability)? We investigate these questions in the high-stakes context of undergraduate admissions at a selective university by comparing lexical and sentence variation between essays written by 30,000 applicants and two types of LLM-generated essays: one prompted with only the essay question used by the human applicants, and another prompted with additional demographic information about each applicant. We consistently find that both types of LLM-generated essays are linguistically distinct from human-authored essays, regardless of the specific model and analytical approach. Further, prompting with a specific sociodemographic identity is remarkably ineffective at aligning the model with the linguistic patterns observed in human writing from that identity group. This holds along the key dimensions of sex, race, first-generation status, and geographic location. The demographically prompted and unprompted synthetic texts were also more similar to each other than to the human text, meaning that prompting did not alleviate homogenization. These problems of model alignment and steerability in current LLMs raise concerns about the use of LLMs in high-stakes contexts.
Linguistic Affordances Framework: A linguistic-sociological approach for the social study of language technology
2025-01-01
Preprint · Open access · Senior author
Forthcoming in Social Science Computer Review.
This paper describes a three-part framework for studying how language technologies elucidate and shape linguistic relations in society. Reframing a large body of evidence about language bias in LLMs, we introduce the concept of linguistic affordances to capture how an object can shape social relations through language. First, we contextualize how language ideologies inform social relations in a particular setting. Next, we examine how language ideologies shape the construction of a language technology's linguistic affordances. Finally, we examine how the linguistic affordances of language technologies create new associations that link language and social worth. We describe how this framework can inform both the study of language technologies and their use in social science. We demonstrate the framework with two examples: the use of LLMs in college admissions and the adoption of LLMs in scientific publishing.
The Death of the Author, Reconsidered: Spatial and Demographic Constraints on College Admissions Essay Writing
2025-08-23
Article · Open access · 1st author · Corresponding
Computational text analysis has grown in popularity among social scientists due to the massive influx of digitized data available to study. However, much of this research disconnects patterns observed in text from information about the original authors. Eliding authorship from sociological analysis of text can lead to claims about trends that are detached from the social actors, conditions, interactions, and contexts in which the text was produced. While text analysis without authorship information can yield reasonable inferences about society, complementing that approach with research that explicitly considers the people producing the text could expand the theoretical and empirical scope of work in this area. In this paper, we adapt perspectives from sociolinguistics and explicitly treat authors' categorical identity markers and geography as foundational axes of variation in textual data. We explore these dimensions in a large corpus of college admissions essays (n = 254,820 essays submitted by 83,538 applicants) and metadata about applicant identity, including the ZIP code of their high school. After generating features of the essays using computational methods, we find that author identity markers, such as gender, parental education, and socioeconomic status, are highly salient. ZIP code-level socioeconomic measures are also highly correlated with the writing style and content of local applicants. Finally, individuals whose personal identities are spatially unique, that is, demographically different from others in their immediate context, were the most likely to be misclassified by our models, indicating that writing is shaped both socially and spatially. This work clarifies how authorship characteristics, such as identity and spatial context, constrain the breadth of what we write and how we write, showing strong alignment between texts and their authors that is observable through machine reading.

Frequent coauthors

Anthony Lising Antonio
17 shared
Sonia Giebel
WZB Berlin Social Science Center
14 shared
Ben Gebre-Medhin
Mount Holyoke College
11 shared
Mitchell L. Stevens
Stanford University
8 shared
Benjamin W. Domingue
8 shared
Rebecca Pattichis
6 shared
Courtney Peña
Stanford University
6 shared
Leslie Patricia Luqueño
Stanford University
5 shared

Education

PhD, Sociology of Education; Education Data Science
Stanford University
2022
MS, Statistics
Stanford University
2021
MS, Foreign Language Education
Florida International University
2016
BA, English
University of Miami
2012
