Larry Birnbaum

Professor of Computer Science

Northwestern University · Chemical Engineering

Active 1991–2026

h-index: 16

Citations: 797

Papers778 last 5y

Funding: $504k

About

Larry Birnbaum is a Professor of Computer Science at Northwestern University, affiliated with the Master of Science in Artificial Intelligence program. His research and teaching focus on applied artificial intelligence and human-AI collaboration. He and his students develop, study, and apply new technologies in natural language processing (NLP), conversational interfaces, intelligent information systems, social media data analytics, machine learning, and computational journalism and media. His key areas of research include methods for the automatic generation of content by machine, specifically the automatic generation of narratives from data, and natural human-AI collaboration via conversational interaction. Birnbaum's work also spans intelligent information systems, including models of automatic and contextual search, information diversity, preference prediction, and recommendation using social media data. His research contributes to applications of AI in journalism and media.

Research topics

Artificial Intelligence
Computer Science
Natural Language Processing
Information Retrieval
Machine Learning
Engineering
Mathematics
Psychology
Human–computer interaction

Selected publications

Human, AI, and Hybrid Ensembles for Detection of Adaptive, RL-based Social Bots
arXiv (Cornell University) · 2026-03-25
preprint · Open access
The use of reinforcement learning to dynamically adapt and evade detection is now well-documented in several cybersecurity settings including Covert Social Influence Operations (CSIOs), in which bots try to spread disinformation. While AI bot detectors have improved greatly, they are largely limited to detecting static bots that do not adapt dynamically. We present the first systematic study comparing the ability of humans, AI models, and hybrid Human-AI ensembles in detecting adaptive bots powered by reinforcement learning. Using data from a controlled, IRB-approved, five-day experiment with participants interacting on a social media platform infiltrated by RL-trained bots spreading disinformation to influence participants on 4 topics, we examine factors potentially shaping human detection capabilities: demographic characteristics, temporal learning effects, social network position, engagement patterns, and collective intelligence mechanisms. We first test 13 hypotheses comparing human bot detection performance against state-of-the-art AI approaches utilizing both traditional machine learning and large language models. We further investigate several aggregation strategies that combine human reports of bots with AI predictions, as well as retraining protocols that leverage human supervision. Our findings challenge intuitive assumptions about bot detection, reveal unexpected patterns in how humans identify bots, and show that combining human bot reports with AI predictions outperforms humans alone and AI alone. We conclude with a discussion of the practical implications of these results for industry.
The Impact of Strategic Communication in Coopetitive Multiagent Settings
IEEE Transactions on Computational Social Systems · 2025-01-06 · 1 citation
article
We consider behavior of agents in a long-term multiagent coopetitive setting in which agents vary their cooperative and competitive stances over time. Using the game of Diplomacy as a testbed, we study how successful agents vary their coopetitive behavior, developing a new “style of play” (SoP) characterization of player behavior. We assess five novel SoP hypotheses about successful behavior. We propose two algorithms to automatically compute an agent’s SoP vector and describe the important factors in this computation. As an agent’s SoP depends on the game state and its perception of threat, we develop a novel “means, motive, and opportunity” (MMO) model of threat and show that we can predict threats effectively using this model. We provide novel insights into how agents should behave to more successfully achieve their goals in long-term coopetitive settings.
Selecting Interlacing Committees
ArXiv.org · 2025-09-02
preprint · Open access
Polarization is a major concern for a well-functioning society. Often, mass polarization of a society is driven by polarizing political representation, even when the latter is easily preventable. The existing computational social choice methods for the task of committee selection are not designed to address this issue. We enrich the standard approach to committee selection by defining two quantitative measures that evaluate how well a given committee interconnects the voters. Maximizing these measures aims at avoiding polarizing committees. While the corresponding maximization problems are NP-complete in general, we obtain efficient algorithms for profiles in the voter-candidate interval domain. Moreover, we analyze the compatibility of our goals with other representation objectives, such as excellence, diversity, and proportionality. We identify trade-offs between approximation guarantees, and describe algorithms that achieve simultaneous constant-factor approximations.
Jupybara: Operationalizing a Design Space for Actionable Data Analysis and Storytelling with LLMs
ArXiv.org · 2025-01-28 · 1 citation
preprint · Open access
Mining and conveying actionable insights from complex data is a key challenge of exploratory data analysis (EDA) and storytelling. To address this challenge, we present a design space for actionable EDA and storytelling. Synthesizing theory and expert interviews, we highlight how semantic precision, rhetorical persuasion, and pragmatic relevance underpin effective EDA and storytelling. We also show how this design space subsumes common challenges in actionable EDA and storytelling, such as identifying appropriate analytical strategies and leveraging relevant domain knowledge. Building on the potential of LLMs to generate coherent narratives with commonsense reasoning, we contribute Jupybara, an AI-enabled assistant for actionable EDA and storytelling implemented as a Jupyter Notebook extension. Jupybara employs two strategies -- design-space-aware prompting and multi-agent architectures -- to operationalize our design space. An expert evaluation confirms Jupybara's usability, steerability, explainability, and reparability, as well as the effectiveness of our strategies in operationalizing the design space framework with LLMs.
Selecting Interlacing Committees
2025-05-28
article
Polarization is a major concern for a well-functioning society. Often, mass polarization of a society is driven by polarizing political representation, even when the latter is easily preventable. The existing computational social choice methods for the task of committee selection are not designed to address this issue. We enrich the standard approach to committee selection by defining two quantitative measures that evaluate how well a given committee interconnects the voters. Maximizing these measures aims at avoiding polarizing committees. While the corresponding maximization problems are NP-complete in general, we obtain efficient algorithms for profiles in the voter-candidate interval domain. Moreover, we analyze the compatibility of our goals with other representation objectives, such as excellence, diversity, and proportionality. We identify trade-offs between approximation guarantees, and describe algorithms that achieve simultaneous constant-factor approximations.
Jupybara: Operationalizing a Design Space for Actionable Data Analysis and Storytelling with LLMs
2025-04-25 · 8 citations
article · Open access
Mining and conveying actionable insights from complex data is a key challenge of exploratory data analysis (EDA) and storytelling. To address this challenge, we present a design space for actionable EDA and storytelling. Synthesizing theory and expert interviews, we highlight how semantic precision, rhetorical persuasion, and pragmatic relevance underpin effective EDA and storytelling. We also show how this design space subsumes common challenges in actionable EDA and storytelling, such as identifying appropriate analytical strategies and leveraging relevant domain knowledge. Building on the potential of LLMs to generate coherent narratives with commonsense reasoning, we contribute Jupybara, an AI-enabled assistant for actionable EDA and storytelling implemented as a Jupyter Notebook extension. Jupybara employs two strategies, design-space-aware prompting and multi-agent architectures, to operationalize our design space. An expert evaluation confirms Jupybara's usability, steerability, explainability, and reparability, as well as the effectiveness of our strategies in operationalizing the design space framework with LLMs.
Can Nuanced Language Lead to More Actionable Insights? Exploring the Role of Generative AI in Analytical Narrative Structure
arXiv (Cornell University) · 2024-05-04
preprint · Open access · Senior author
Relevant language describing trends in data can be useful for generating summaries to help with readers' takeaways. However, the language employed in these often template-generated summaries tends to be simple, ranging from describing simple statistical information (e.g., extrema and trends) without additional context and richer language to provide actionable insights. Recent advances in Large Language Models (LLMs) have shown promising capabilities in capturing subtle nuances in language when describing information. This workshop paper specifically explores how LLMs can provide more actionable insights when describing trends by focusing on three dimensions of analytical narrative structure: semantic, rhetorical, and pragmatic. Building on prior research that examines visual and linguistic signatures for univariate line charts, we examine how LLMs can further leverage the semantic dimension of analytical narratives using quantified semantics to describe shapes in trends as people intuitively view them. These semantic descriptions help convey insights in a way that leads to a pragmatic outcome, i.e., a call to action, persuasion, warning vs. alert, and situational awareness. Finally, we identify rhetorical implications for how well these generated narratives align with the perceived shape of the data, thereby empowering users to make informed decisions and take meaningful actions based on these data insights.
MARG: Multi-Agent Review Generation for Scientific Papers
ArXiv.org · 2024-01-08 · 8 citations
preprint · Open access
We study the ability of LLMs to generate feedback for scientific papers and develop MARG, a feedback generation approach using multiple LLM instances that engage in internal discussion. By distributing paper text across agents, MARG can consume the full text of papers beyond the input length limitations of the base LLM, and by specializing agents and incorporating sub-tasks tailored to different comment types (experiments, clarity, impact) it improves the helpfulness and specificity of feedback. In a user study, baseline methods using GPT-4 were rated as producing generic or very generic comments more than half the time, and only 1.7 comments per paper were rated as good overall in the best baseline. Our system substantially improves the ability of GPT-4 to generate specific and helpful feedback, reducing the rate of generic comments from 60% to 29% and generating 3.7 good comments per paper (a 2.2x improvement).
Learning to Perform Complex Tasks through Compositional Fine-Tuning of Language Models
2022 · 5 citations
Senior author · Corresponding
- Computer Science
- Artificial Intelligence
How to usefully encode compositional task structure has long been a core challenge in AI. Recent work in chain of thought prompting has shown that for very large neural language models (LMs), explicitly demonstrating the inferential steps involved in a target task may improve performance over end-to-end learning that focuses on the target task alone. However, chain of thought prompting has significant limitations due to its dependency on huge pretrained LMs. In this work, we present compositional fine-tuning (CFT): an approach based on explicitly decomposing a target task into component tasks, and then fine-tuning smaller LMs on a curriculum of such component tasks. We apply CFT to recommendation tasks in two domains, world travel and local dining, as well as a previously studied inferential task (sports understanding). We show that CFT outperforms end-to-end learning even with equal amounts of data, and gets consistently better as more component tasks are modeled via fine-tuning. Compared with chain of thought prompting, CFT performs at least as well using LMs only 7.4% of the size, and is moreover applicable to task domains for which data are not available during pretraining.

Recent grants

III: Small: An Architecture and Platform for Frictionless Information Systems
NSF · $504k · 2009–2014

Frequent coauthors

Kristian J. Hammond · 23 shared
Doug Downey · 12 shared
Jacob D. Herbst · Baum Consult · 8 shared
Francisco Iacobelli · Loyola University Chicago · 7 shared
Victor S. Bursztyn · 7 shared
Ray Bareiss · 6 shared
Christopher Johnson · Newcastle University · 6 shared
Jiahui Liu · 5 shared

Awards & honors

Best paper award (2013)
