
Diyi Yang
Computational Social Science, Natural Language Processing · Verified · Stanford University · Learning, Design, and Technology
Active 2010–2026
About
Diyi Yang is an assistant professor in the School of Interactive Computing at Georgia Tech. She is broadly interested in Computational Social Science and Natural Language Processing, with the goal of modeling human communication in social context and building socially aware intelligent systems that support social interaction at scale. Diyi received her PhD from the Language Technologies Institute at Carnegie Mellon University and her bachelor's degree from Shanghai Jiao Tong University, China. Her work has been published at leading NLP and HCI conferences and has received award nominations at EMNLP 2015, ICWSM 2016, SIGCHI 2019, and CSCW 2020.
Research topics
- Computer Science
- Artificial Intelligence
- Machine Learning
- Natural Language Processing
- Data Science
- Psychology
- Theoretical Physics
- Geology
- Economics
- Developmental Psychology
- Software Engineering
- Algorithm
- Econometrics
- Physics
- Philosophy
Selected publications
Whose Knowledge Counts? Co-Designing Community-Centered AI Auditing Tools with Educators in Hawai'i
2026-04-13 · 1 citation
article · Open access · Senior author
Although generative AI is being deployed into classrooms with promises of aiding teachers, educators caution that these tools can have unintended pedagogical repercussions, including cultural misrepresentation and bias. These concerns are heightened in low-resource language and Indigenous education settings, where AI systems frequently underperform. We investigate these challenges in Hawai‘i, where public schools operate under a statewide mandate to integrate Hawaiian language and culture into education. Through four co-design workshops with 22 public school educators, we surfaced concerns about using generative AI in educational settings, particularly around cultural misrepresentation, along with corresponding designs for auditing tools that address these issues. We find that educators envision tools grounded in specific Hawaiian cultural values and practices, such as tracing the genealogy of knowledge in source materials. Building on these insights, we conceptualize AI auditing as a community-oriented process rather than the work of isolated individuals, and discuss implications for designing auditing tools.
Verbalizing LLMs' Assumptions About the User to Calibrate Expectations and Reduce Sycophancy
2026-04-13
article · Open access
Conversation often requires inferring the speaker’s underlying goal rather than taking statements at face value. For instance, asking “does my outfit look OK?” may seek reassurance, not objective assessment. When people interact with large language models (LLMs), LLM outputs similarly reflect inferences about users’ intentions, though these inferences are currently opaque. We introduce Verbalized Assumptions, a framework for revealing an LLM’s implicit assumptions. We demonstrate this approach’s utility in three case studies on advice-seeking settings, where verbalized assumptions help us understand and address LLM sycophancy, i.e., LLMs excessively affirming and validating users. First, we empirically show a mismatch of expectations: on queries that are typically validation-seeking in human conversational contexts, people expect LLMs, by contrast, to provide objective information, yet LLMs assume that users are seeking validation. Second, we link sycophancy to LLMs overwhelmingly assuming that users are validation-seeking. Finally, we demonstrate that verbalized assumptions enable probe-based steering to reduce sycophancy.
Mapping the Spiral of Silence: Surveying Unspoken Opinions in Online Communities
2026-04-13
preprint · Open access
We often treat social media as a lens onto society. How might that lens distort the popularity of political and social viewpoints? We examine discrepancies between publicly posted and privately surveyed opinions within communities, contributing a measurement of the “spiral of silence” theory; the theory posits that people are less likely to voice opinions when they believe they hold minority views, creating a reinforcing cycle in which these opinions are expressed less. We surveyed members of politically oriented Reddit communities about their willingness to post on contentious topics, yielding 439 responses across twelve subreddits. Of participants who perceive themselves to be in the minority, 72.1% remain silent, and they are half as likely to post as those who believe their opinion is in the majority. Community design factors, such as perceived diversity, are associated with less self-silencing. We provide recommendations for counteracting self-silencing at the community level (e.g., positive reinforcement, more transparent moderation). Overall, these results reveal gaps between online discourse and broader public opinion.
Optimas: Optimizing Compound AI Systems with Globally Aligned Local Rewards
ArXiv.org · 2025-07-03
preprint · Open access
Compound AI systems integrating multiple components, such as Large Language Models, specialized tools, and traditional machine learning models, are increasingly deployed to solve complex real-world tasks. However, optimizing compound systems remains challenging due to their non-differentiable structures and diverse configuration types across components, including prompts, hyperparameters, and model parameters. To address this challenge, we propose Optimas, a unified framework for effective optimization of compound systems. The core idea of Optimas is to maintain one Local Reward Function (LRF) per component, each satisfying a local-global alignment property, i.e., each component's local reward correlates with the global system performance. In each iteration, Optimas efficiently adapts the LRFs to maintain this property while simultaneously maximizing each component's local reward. This approach enables independent updates of heterogeneous configurations using the designated optimization method, while ensuring that local improvements consistently lead to performance gains. We present extensive evaluations across five real-world compound systems to demonstrate that Optimas outperforms strong baselines by an average improvement of 11.92%, offering a general and effective approach for improving compound systems. Our website is at https://optimas.stanford.edu.
The Practice of Online Peer Counseling and the Potential for AI-Powered Support Tools
Proceedings of the ACM on Human-Computer Interaction · 2025-05-02 · 5 citations
article · Senior author
What challenges do volunteers providing peer support in online mental health platforms (OMHPs) face in operating and growing their communities? How could the HCI community develop human-AI systems to help? Recent work on online peer counseling has led to the development of novel AI tools for conversational interaction, but it remains unknown how such technology can fit into broader practices that include extratherapeutic tasks. In this research, we conducted interviews and design exercises with seventeen peer counselors from 7 Cups of Tea, a large online therapy and counseling platform, to design tools (AI or not) that resolve challenges arising from day-to-day community practices. Participant responses suggest three classes of tools that could improve online peer counseling: real-time decision support, productivity, and management and training. Investigation of design motivations surfaced four practice-based challenges: chat interface limitations, difficulties in support seeker management, fragmented contexts of practice, and lack of visibility due to privacy concerns. Based on counselors' discussion of benefits and risks associated with AI features in the tools they designed, we offer suggestions for research on AI tools embedded within peer counseling practices, and connect our findings with broader implications about online peer counseling as a form of volunteer-based mental health practice.
Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors
ArXiv.org · 2025-05-17
preprint · Open access
Interpretability research now offers a variety of techniques for identifying abstract internal mechanisms in neural networks. Can such techniques be used to predict how models will behave on out-of-distribution examples? In this work, we provide a positive answer to this question. Through a diverse set of language modeling tasks, including symbol manipulation, knowledge retrieval, and instruction following, we show that the most robust features for correctness prediction are those that play a distinctive causal role in the model's behavior. Specifically, we propose two methods that leverage causal mechanisms to predict the correctness of model outputs: counterfactual simulation (checking whether key causal variables are realized) and value probing (using the values of those variables to make predictions). Both achieve high AUC-ROC in distribution and outperform methods that rely on causal-agnostic features in out-of-distribution settings, where predicting model behaviors is more crucial. Our work thus highlights a novel and significant application for internal causal analysis of language models.
Understanding #vent channels on Discord
First Monday · 2025-08-17 · 1 citation
article · Open access
Vent channels on Discord, which are chat channels developed for people to express frustration, can become an informal type of peer support system. This paper is a qualitative study of experiences with vent channels on Discord, examining the experiences of 13 participants through semi-structured interviews. We found that participants were able to meet their needs for social support via vent channels by receiving commiseration, advice, and validation from the responses of others. At the same time, vent channels could lead to frustration when participants have conflicting expectations for their interactions. We suggest ways that Discord or Discord server moderators can provide enhanced structure, clarity, and transparency in order to enable participants to have better experiences in vent channels.
Knoll: Creating a Knowledge Ecosystem for Large Language Models
ArXiv.org · 2025-05-25
preprint · Open access
Large language models are designed to encode general-purpose knowledge about the world from Internet data. Yet a wealth of information that users want models to access falls outside this scope and remains unavailable, ranging from personal preferences to organizational policies, and from community-specific advice to up-to-date news. In this paper, we propose a knowledge ecosystem in which end-users can create, curate, and configure custom knowledge modules that are utilized by language models, such as ChatGPT and Claude. To support this vision, we introduce Knoll, a software infrastructure that allows users to make modules by clipping content from the web or authoring shared documents on Google Docs and GitHub, add modules that others have made, and rely on the system to insert relevant knowledge when interacting with an LLM. We conduct a public deployment of Knoll reaching over 200 users who employed the system for a diverse set of tasks including personalized recommendations, advice-seeking, and writing assistance. In our evaluation, we validate that using Knoll improves the quality of generated responses.
AudioJudge: Understanding What Works in Large Audio Model Based Speech Evaluation
ArXiv.org · 2025-07-17
preprint · Open access · Senior author
Current speech evaluation suffers from two critical limitations: the need for, and difficulty of, designing specialized systems targeting individual audio characteristics, and poor correlation between automatic evaluation methods and human preferences. This work presents a systematic study of using a Large Audio Model (LAM) as a judge, AudioJudge, investigating whether it can provide a unified evaluation framework that addresses both challenges. We systematically explore AudioJudge across audio characteristic detection tasks, including pronunciation, speaking rate, speaker identification, and speech quality, and across system-level human preference simulation for automated benchmarking. We investigate different prompt engineering strategies, finding that audio concatenation combined with in-context learning significantly improves performance on both audio characteristic detection and human preference simulation tasks. We further introduce a multi-aspect ensemble AudioJudge to enable general-purpose multi-aspect audio evaluation. This method decomposes speech assessment into specialized judges for lexical content, speech quality, and paralinguistic features, achieving up to 0.91 Spearman correlation with human preferences on our system ranking benchmark. Robustness analysis reveals that while LAMs maintain strong performance under acoustic noise, they exhibit significant verbosity and positional biases that require careful mitigation.
Quantifying large language model usage in scientific papers
Nature Human Behaviour · 2025-08-04 · 30 citations
article
Frequent coauthors
- William A. Held · 60 shared
- Caleb Ziems · 58 shared
- Jiaao Chen · 27 shared
- Carolyn Penstein Rosé · 22 shared
- Robert E. Kraut · 21 shared
- Yanchen Liu, Shanghai Maritime University · 20 shared
- Omar Shaikh · 19 shared
- Jiaao Chen · 18 shared
Labs
Stanford Natural Language Processing Group · PI
Performing groundbreaking Natural Language Processing research since 1999.
Education
Ph.D.
Language Technologies Institute at Carnegie Mellon University
B.S.
Shanghai Jiao Tong University, China
Awards & honors
- multiple award nominations from EMNLP 2015, ICWSM 2016, SIGC…