Brent Hecht

· Associate Professor of Computer Science · Verified

Northwestern University · Computer Science

Active 2005–2024

h-index 37

Citations 5.6k

Papers 137 · 38 in last 5y

Funding $1.3M

About

Brent Hecht is an Associate Professor of Computer Science and Communication Studies at Northwestern University. He holds a Ph.D. in computer science from Northwestern University, a Master's degree in geography from UC Santa Barbara, and dual Bachelor's degrees in computer science and geography from Macalester College. His research lies at the intersection of human–computer interaction, social computing, and spatial computing. Active areas include location-aware technologies and understanding and improving the interactions between algorithms and society. Dr. Hecht uses mixed-methods approaches with an emphasis on 'big data' projects, and has collaborated with research institutions such as Google Research, Xerox PARC, and Microsoft Research. His work has been featured by media outlets including NPR, Time Magazine, and The Washington Post. He has received awards for his research at top-tier venues such as ACM CHI, ACM CSCW, ACM Mobile HCI, and COSIT, and has been a keynote speaker at events such as WikiSym.

Research topics

Computer Science
World Wide Web
Psychology
Internet privacy
Information Retrieval
Social Science
Engineering
Political Science
Sociology
Programming language
Communication
Software engineering
Mathematics
Medicine
Social psychology
Knowledge management
Applied psychology
Business
Data science
Economics
Microeconomics

Selected publications

Interpretable User Satisfaction Estimation for Conversational Systems with Large Language Models
2024-01-01 · 11 citations
article · Open access
Ying-Chun Lin, Jennifer Neville, Jack Stokes, Longqi Yang, Tara Safavi, Mengting Wan, Scott Counts, Siddharth Suri, Reid Andersen, Xiaofeng Xu, Deepak Gupta, Sujay Kumar Jauhar, Xia Song, Georg Buscher, Saurabh Tiwary, Brent Hecht, Jaime Teevan. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024.
A Canary in the AI Coal Mine: American Jews May Be Disproportionately Harmed by Intellectual Property Dispossession in Large Language Model Training
2024-05-11 · 4 citations
preprint · Open access
Systemic property dispossession from minority groups has often been carried out in the name of technological progress. In this paper, we identify evidence that the current paradigm of large language models (LLMs) likely continues this long history. Examining common LLM training datasets, we find that a disproportionate amount of content authored by Jewish Americans is used for training without their consent. The degree of over-representation ranges from around 2x to around 6.5x. Given that LLMs may substitute for the paid labor of those who produced their training data, they have the potential to cause even more substantial and disproportionate economic harm to Jewish Americans in the coming years. This paper focuses on Jewish Americans as a case study, but it is probable that other minority communities (e.g., Asian Americans, Hindu Americans) may be similarly affected and, most importantly, the results should likely be interpreted as a “canary in the coal mine” that highlights deep structural concerns about the current LLM paradigm whose harms could soon affect nearly everyone. We discuss the implications of these results for the policymakers thinking about how to regulate LLMs as well as for those in the AI field who are working to advance LLMs. Our findings stress the importance of working together towards alternative LLM paradigms that avoid both disparate impacts and widespread societal harms.
Interpretable User Satisfaction Estimation for Conversational Systems with Large Language Models
arXiv (Cornell University) · 2024-03-19
preprint · Open access
Accurate and interpretable user satisfaction estimation (USE) is critical for understanding, evaluating, and continuously improving conversational systems. Users express their satisfaction or dissatisfaction with diverse conversational patterns in both general-purpose (ChatGPT and Bing Copilot) and task-oriented (customer service chatbot) conversational systems. Existing approaches based on featurized ML models or text embeddings fall short in extracting generalizable patterns and are hard to interpret. In this work, we show that LLMs can extract interpretable signals of user satisfaction from their natural language utterances more effectively than embedding-based approaches. Moreover, an LLM can be tailored for USE via an iterative prompting framework using supervision from labeled examples. The resulting method, Supervised Prompting for User satisfaction Rubrics (SPUR), not only has higher accuracy but is more interpretable as it scores user satisfaction via learned rubrics with a detailed breakdown.
Targeted Training for Multi-organization Recommendation
ACM Transactions on Recommender Systems · 2023-06-03
article
Making recommendations for users in diverse organizations (orgs) is a challenging task for workplace social platforms such as Microsoft Teams and Slack. The current industry-standard model training approaches either use data from all organizations to maximize information or train organization-specific models to minimize noise. Our real-world experiments show that both approaches are poorly suited for the multi-org recommendation setting where different organizations’ interaction patterns vary in their generalizability. We introduce targeted training, which improves on standard practices by automatically selecting a subset of orgs for model development whose data are cleanest and best represent global trends. We demonstrate how and when targeted training improves over global training through theoretical analysis and simulation. Our experiments on large-scale datasets from Microsoft Teams, SharePoint, Stack Exchange, DBLP, and Reddit show that in many cases targeted training can improve mean average precision (MAP) across orgs by 10–15% over global training, is more robust to orgs with lower data quality, and generalizes better to unseen orgs. Our training framework is applicable to a wide range of inductive recommendation models, from simple regression models to graph neural networks (GNNs).
Climate mitigation potentials of teleworking are sensitive to changes in lifestyle and workplace rather than ICT usage
Proceedings of the National Academy of Sciences · 2023-09-18 · 33 citations
article · Open access
The growth in remote and hybrid work catalyzed by the COVID-19 pandemic could have significant environmental implications. We assess the greenhouse gas emissions of this transition, considering factors including information and communication technology, commuting, noncommute travel, and office and residential energy use. We find that, in the United States, switching from working onsite to working from home can reduce up to 58% of work's carbon footprint, and the impacts of IT usage are negligible, while office energy use and noncommute travel impacts are important. Our study also suggests that achieving the environmental benefits of remote work requires proper setup of people's lifestyle, including their vehicle choice, travel behavior, and the configuration of home and work environment.
Causal Effect Estimation under Interference on Hypergraphs
AI Matters · 2023-06-01
article
Hypergraphs offer a powerful abstraction for representing multi-way group interactions, allowing hyperedges to connect any number of nodes. In contrast to prevailing approaches that focus on capturing statistical dependencies, our research explores hypergraphs from a causal perspective. Specifically, we tackle the problem of estimating individual treatment effects (ITE) on hypergraphs, aiming to determine the causal impact of interventions (e.g., wearing face covering) on outcomes (e.g., COVID-19 infection) for each individual node. Existing ITE estimation methods either assume no interference between individuals or consider interference only among connected individuals in regular graphs. However, such assumptions may not hold in real-world hypergraphs. Recognizing this, we propose a novel causality learning framework HyperSCI by modeling high-order interference on hypergraphs. Through extensive experiments on real-world hypergraphs, we validate the effectiveness of HyperSCI and highlight the potential of causal inference in hypergraphs with complex group interactions.
The Dimensions of Data Labor: A Road Map for Researchers, Activists, and Policymakers to Empower Data Producers
2023-06-12 · 30 citations
article · Open access · Senior author
Many recent technological advances (e.g. ChatGPT and search engines) are possible only because of massive amounts of user-generated data produced through user interactions with computing systems or scraped from the web (e.g. behavior logs, user-generated content, and artwork). However, data producers have little say in what data is captured, how it is used, or who it benefits. Organizations with the ability to access and process this data, e.g. OpenAI and Google, possess immense power in shaping the technology landscape. By synthesizing related literature that reconceptualizes the production of data for computing as “data labor”, we outline opportunities for researchers, policymakers, and activists to empower data producers in their relationship with tech companies, e.g. advocating for transparency about data reuse, creating feedback channels between data producers and companies, and potentially developing mechanisms to share data’s revenue more broadly. In doing so, we characterize data labor with six important dimensions - legibility, end-use awareness, collaboration requirement, openness, replaceability, and livelihood overlap - based on the parallels between data labor and various other types of labor in the computing literature.
Learning Causal Effects on Hypergraphs (Extended Abstract)
2023-08-01 · 1 citation
article · Open access
Hypergraphs provide an effective abstraction for modeling multi-way group interactions among nodes, where each hyperedge can connect any number of nodes. Different from most existing studies which leverage statistical dependencies, we study hypergraphs from the perspective of causality. Specifically, we focus on the problem of individual treatment effect (ITE) estimation on hypergraphs, aiming to estimate how much an intervention (e.g., wearing face covering) would causally affect an outcome (e.g., COVID-19 infection) of each individual node. Existing works on ITE estimation either assume that the outcome of one individual should not be influenced by the treatment of other individuals (i.e., no interference), or assume the interference only exists between connected individuals in an ordinary graph. We argue that these assumptions can be unrealistic on real-world hypergraphs, where higher-order interference can affect the ITE estimations due to group interactions. We investigate high-order interference modeling, and propose a new causality learning framework powered by hypergraph neural networks. Extensive experiments on real-world hypergraphs verify the superiority of our framework over existing baselines.
Measuring the Monetary Value of Online Volunteer Work
Proceedings of the International AAAI Conference on Web and Social Media · 2022-05-31 · 30 citations
article · Open access
Online volunteers are a crucial labor force that keeps many for-profit systems afloat (e.g. social media platforms and online review sites). Despite their substantial role in upholding highly valuable technological systems, online volunteers have no way of knowing the value of their work. This paper uses content moderation as a case study and measures its monetary value to make apparent volunteer labor’s value. Using a novel dataset of private logs generated by moderators, we use linear mixed-effect regression and estimate that Reddit moderators worked a minimum of 466 hours per day in 2020. These hours are worth 3.4 million USD based on the median hourly wage for comparable content moderation services in the U.S. We discuss how this information may inform pathways to alleviate the one-sided relationship between technology companies and online volunteers.
Measuring the Monetary Value of Online Volunteer Work
arXiv (Cornell University) · 2022-05-28
preprint · Open access
Online volunteers are a crucial labor force that keeps many for-profit systems afloat (e.g. social media platforms and online review sites). Despite their substantial role in upholding highly valuable technological systems, online volunteers have no way of knowing the value of their work. This paper uses content moderation as a case study and measures its monetary value to make apparent volunteer labor's value. Using a novel dataset of private logs generated by moderators, we use linear mixed-effect regression and estimate that Reddit moderators worked a minimum of 466 hours per day in 2020. These hours amount to 3.4 million USD a year based on the median hourly wage for comparable content moderation services in the U.S. We discuss how this information may inform pathways to alleviate the one-sided relationship between technology companies and online volunteers.

Recent grants

CAREER: Understanding and Addressing Geographic Inequalities in Location-Aware Technologies
NSF · $87k · 2016–2016
III: Small: Collaborative Research: Automatically Generating Contextually-Relevant Visualizations
NSF · $121k · 2014–2017
CHS: Small: Collaborative Research: Human-Centered Semantic Relatedness
NSF · $248k · 2015–2016
CHS: Small: Collaborative Research: Human-Centered Semantic Relatedness
NSF · $248k · 2016–2019
III: Small: Collaborative Research: Automatically Generating Contextually-Relevant Visualizations
NSF · $93k · 2016–2018

Frequent coauthors

Johannes Schöning
University of St. Gallen
40 shared
Nicholas Vincent
University of East Anglia
20 shared
Isaac Johnson
Wikimedia Foundation
19 shared
Loren Terveen
University of Minnesota
19 shared
Jacob Thebault-Spieker
University of Wisconsin–Madison
18 shared
Darren Gergle
Northwestern University
15 shared
Jaime Teevan
15 shared
Hanlin Li
The University of Texas at Austin
12 shared

Awards & honors

awards for his research at top-tier publication venues in hu…
