David Jurgens
University of Michigan · Information
Active 1999–2025
Research topics
- Computer Science
- Artificial Intelligence
- Sociology
- Social Science
- Machine Learning
- Political Science
- Data science
- World Wide Web
- Geography
- Psychology
- Medicine
- Statistics
- Engineering
- Internet privacy
- Mathematics
Selected publications
Evaluation Framework for AI Systems in "the Wild"
ArXiv.org · 2025-04-23 · 1 citation
preprint · Open access
Generative AI (GenAI) models have become vital across industries, yet current evaluation methods have not adapted to their widespread use. Traditional evaluations often rely on benchmarks and fixed datasets, frequently failing to reflect real-world performance, which creates a gap between lab-tested outcomes and practical applications. This white paper proposes a comprehensive framework for how we should evaluate real-world GenAI systems, emphasizing diverse, evolving inputs and holistic, dynamic, and ongoing assessment approaches. The paper offers guidance for practitioners on how to design evaluation methods that accurately reflect real-time capabilities, and provides policymakers with recommendations for crafting GenAI policies focused on societal impacts, rather than fixed performance numbers or parameter sizes. We advocate for holistic frameworks that integrate performance, fairness, and ethics and the use of continuous, outcome-oriented methods that combine human and automated assessments while also being transparent to foster trust among stakeholders. Implementing these strategies ensures GenAI models are not only technically proficient but also ethically responsible and impactful.
The Impact of Generative AI on Social Media: An Experimental Study
ArXiv.org · 2025-06-17 · 1 citation
preprint · Open access
Generative Artificial Intelligence (AI) tools are increasingly deployed across social media platforms, yet their implications for user behavior and experience remain understudied, particularly regarding two critical dimensions: (1) how AI tools affect the behaviors of content producers in a social media context, and (2) how content generated with AI assistance is perceived by users. To fill this gap, we conduct a controlled experiment with a representative sample of 680 U.S. participants in a realistic social media environment. The participants are randomly assigned to small discussion groups, each consisting of five individuals in one of five distinct experimental conditions: a control group and four treatment groups, each employing a unique AI intervention: chat assistance, conversation starters, feedback on comment drafts, and reply suggestions. Our findings highlight a complex duality: some AI tools increase user engagement and volume of generated content, but at the same time decrease the perceived quality and authenticity of discussion, and introduce a negative spill-over effect on conversations. Based on our findings, we propose four design principles and recommendations aimed at social media platforms, policymakers, and stakeholders: ensuring transparent disclosure of AI-generated content, designing tools with user-focused personalization, incorporating context-sensitivity to account for both topic and user intent, and prioritizing intuitive user interfaces. These principles aim to guide an ethical and effective integration of generative AI into social media.
Roles of Network and Identity in Hashtag Diffusion
2025-04-22 · 1 citation
article · Open access
The diffusion of culture online is theorized to be influenced by many interacting social factors (e.g., network and identity). However, most existing computational cascade models consider just a single factor (e.g., network or identity). This work offers a new framework for teasing apart the mechanisms underlying hashtag cascades. We curate a new dataset of 1,337 hashtags representing cultural innovation online, develop a 10-factor evaluation framework for comparing empirical and simulated cascades, and show that a combined network+identity model better simulates hashtag cascades than network- or identity-only counterfactuals. We also explore heterogeneity in performance: While a combined network+identity model best predicts the popularity of cascades, a network-only model best predicts cascade growth and an identity-only model best predicts adopter composition. The network+identity model has the highest comparative advantage among hashtags used for expressing racial or regional identity and talking about sports or news. In fact, we are able to predict what combination of network and/or identity best models each hashtag and use this to further improve performance. Our results show the utility of models incorporating the interactions of network, identity, and other social factors in the diffusion of hashtags in social media.
Structured Moral Reasoning in Language Models: A Value-Grounded Evaluation Framework
ArXiv.org · 2025-06-17 · 1 citation
preprint · Open access · Senior author
Large language models (LLMs) are increasingly deployed in domains requiring moral understanding, yet their reasoning often remains shallow and misaligned with human reasoning. Unlike humans, whose moral reasoning integrates contextual trade-offs, value systems, and ethical theories, LLMs often rely on surface patterns, leading to biased decisions in morally and ethically complex scenarios. To address this gap, we present a value-grounded framework for evaluating and distilling structured moral reasoning in LLMs. We benchmark 12 open-source models across four moral datasets using a taxonomy of prompts grounded in value systems, ethical theories, and cognitive reasoning strategies. Our evaluation is guided by four questions: (1) Does reasoning improve LLM decision-making over direct prompting? (2) Which types of value/ethical frameworks most effectively guide LLM reasoning? (3) Which cognitive reasoning strategies lead to better moral performance? (4) Can small-sized LLMs acquire moral competence through distillation? We find that prompting with explicit moral structure consistently improves accuracy and coherence, with first-principles reasoning and Schwartz's + care-ethics scaffolds yielding the strongest gains. Furthermore, our supervised distillation approach transfers moral competence from large to small models without additional inference cost. Together, our results offer a scalable path toward interpretable and value-grounded models.
The Muddy Waters of Modeling Empathy in Language: The Practical Impacts of Theoretical Constructs
ArXiv.org · 2025-01-24
preprint · Open access
Conceptual operationalizations of empathy in NLP are varied, with some having specific behaviors and properties, while others are more abstract. How these variations relate to one another and capture properties of empathy observable in text remains unclear. To provide insight into this, we analyze the transfer performance of empathy models adapted to empathy tasks with different theoretical groundings. We study (1) the dimensionality of empathy definitions, (2) the correspondence between the defined dimensions and measured/observed properties, and (3) the conduciveness of the data to represent them, finding they have a significant impact on performance compared to other transfer setting features. Characterizing the theoretical grounding of empathy tasks as direct, abstract, or adjacent further indicates that tasks that directly predict specified empathy components have higher transferability. Our work provides empirical evidence for the need for precise and multidimensional empathy operationalizations.
Unstructured Evidence Attribution for Long Context Query Focused Summarization
arXiv (Cornell University) · 2025-02-20
preprint · Open access · Senior author
Large language models (LLMs) are capable of generating coherent summaries from very long contexts given a user query, and extracting and citing evidence spans helps improve the trustworthiness of these summaries. Whereas previous work has focused on evidence citation with fixed levels of granularity (e.g., sentence, paragraph, document), we propose to extract unstructured (i.e., spans of any length) evidence in order to acquire more relevant and consistent evidence than in the fixed granularity case. We show how existing systems struggle to copy and properly cite unstructured evidence, which also tends to be "lost-in-the-middle". To help models perform this task, we create the Summaries with Unstructured Evidence Text dataset (SUnsET), a synthetic dataset generated using a novel pipeline, which can be used as training supervision for unstructured evidence summarization. We demonstrate across 5 LLMs and 4 datasets spanning human written, synthetic, single, and multi-document settings that LLMs adapted with SUnsET generate more relevant and factually consistent evidence with their summaries, extract evidence from more diverse locations in their context, and can generate more relevant and consistent summaries than baselines with no fine-tuning and fixed granularity evidence. We release SUnsET and our generation code to the public.
The persuasive role of generic-you in online interactions
Scientific Reports · 2025-01-08 · 3 citations
article · Open access
Persuasion plays a crucial role in human communication. Yet, convincing someone to change their mind is often challenging. Here, we demonstrate that a subtle linguistic device, generic-you (i.e., "you" that refers to people in general, e.g., "You win some, you lose some"), is associated with successfully shifting people's pre-existing views in a naturalistic context. Leveraging Large Language Models, we conducted a preregistered study using a large (N = 204,120) online debate dataset. Every use of generic-you in an argument was associated with an up to 14% increase in the odds of successful persuasion. These findings underscore the need to distinguish between the specific and generic uses of "you" in large-scale linguistic analyses, an aspect that has been overlooked in the literature. The robust association between generic-you and persuasion persisted with the inclusion of various covariates, and above and beyond other pronouns (i.e., specific-you, I or we). However, these findings do not imply causality. In Supplementary Experiment 2, arguments with generic-you (vs. first-person singular pronouns, e.g., I) were rated as more persuasive by open-minded individuals. In Supplementary Experiment 3, generic-you (vs. specific-you) arguments did not differentially predict attitude change. We discuss explanations for these results, including differential mechanisms, boundary conditions, and the possibility that people intuitively draw on generic-you when expressing more persuasive ideas. Together, these findings add to a growing literature on the interpersonal implications of broadening one's perspective via a subtle shift in language, while motivating future research on contextual and individual differences that may moderate these effects.
2025-04-24 · 1 citation
article
Mapping the Podcast Ecosystem with the Structured Podcast Research Corpus
2025-01-01 · 3 citations
article · Open access
Geographical Disparities in Navigating Rejection in Science Drive Disparities in its File Drawer
Academy of Management Proceedings · 2025-07-01
article
Scientific progress relies on making research contributions public, typically through journal publication, enabling others to build on them. However, publishing often requires overcoming one or more rejections. This study examines how scientists' differential responses to manuscript rejection shape both published knowledge and the “file drawer” of unpublished research. Analyzing 126K manuscripts rejected by 62 STEM journals published by the Institute of Physics Publishing, we document several new empirical facts. Controlling for manuscript quality (proxied by peer review recommendations) and comparing authors from Western and non-Western countries, we find that authors based in Western countries are 5.9% more likely to publish rejected manuscripts elsewhere, publish 25 days faster, revise 6% less, and change co-authors 11.6% less. Although exploratory surveys of rejected authors are inconclusive, our empirical analyses implicate geographic differences in access to procedural knowledge – how to interpret feedback, revise a manuscript, and resubmit elsewhere. Specifically, post-rejection outcomes are better for corresponding authors with more access to procedural knowledge, such as authors with prior publishing experience or Western co-authors. These findings imply that the “file drawer” contains a disproportionate number of ideas from non-Western countries, partly due to disparities in procedural knowledge.
Recent grants
NSF · $500k · 2020–2025
CRII: RI: Explainable Recognition of Social Relationships from People's Linguistic Interactions
NSF · $207k · 2019–2022
CAREER: Fostering Prosocial Behavior and Well-Being in Online Communities
NSF · $581k · 2022–2027
Frequent coauthors
- 31 shared
Roberto Navigli
Sapienza University of Rome
- 25 shared
Núria Bel
- 25 shared
Sara Mendes
- 25 shared
Silvia Necşulescu
- 25 shared
Scott A. Hale
University of Oxford
- 22 shared
Mattia Samory
- 22 shared
Przemyslaw A. Grabowicz
- 19 shared
Fabian Flöck
Labs
David Jurgens · PI
Education
- 2014
PhD, Computer Science
University of California, Los Angeles