
Steven Bethard
Assistant Professor, School of Information, University of Arizona · Computer Science
Active 2002–2025
About
Steven Bethard is a professor whose research focuses on natural language processing, information extraction, and related areas within computer science. His work involves developing algorithms and models to improve the extraction and understanding of structured information from unstructured text, leveraging large language models and transformer-based architectures. Bethard's academic background includes supervising a diverse group of students across undergraduate, master's, doctoral, and post-doctoral levels, with dissertations and theses addressing topics such as structured information extraction, geocoding, linguistic knowledge probing, bias detection, and neural network algorithms for ontology-informed information extraction. His contributions aim to advance the capabilities of machine understanding of language and enhance applications in complex reasoning, data quality, and information retrieval.
Research topics
- Artificial Intelligence
- Computer Science
- Natural Language Processing
- Machine Learning
- Data Mining
- World Wide Web
- Speech recognition
- Algorithm
- Psychology
- Statistics
- Mathematics
- Programming language
- Engineering
Selected publications
A Semantic Parsing Framework for End-to-End Time Normalization
ArXiv.org · 2025-07-08
preprint · Open access · Senior author
Time normalization is the task of converting natural language temporal expressions into machine-readable representations. It underpins many downstream applications in information retrieval, question answering, and clinical decision-making. Traditional systems based on the ISO-TimeML schema limit expressivity and struggle with complex constructs such as compositional, event-relative, and multi-span time expressions. In this work, we introduce a novel formulation of time normalization as a code generation task grounded in the SCATE framework, which defines temporal semantics through symbolic and compositional operators. We implement a fully executable SCATE Python library and demonstrate that large language models (LLMs) can generate executable SCATE code. Leveraging this capability, we develop an automatic data augmentation pipeline using LLMs to synthesize large-scale annotated data with code-level validation. Our experiments show that small, locally deployable models trained on this augmented data can achieve strong performance, outperforming even their LLM parents and enabling practical, accurate, and interpretable time normalization.
2025-05-20
peer-review
2025-03-03
peer-review
Applying Transformer Architectures to Detect Cynical Comments in Spanish Social Media
2025-01-01
article · Open access
Detecting cynical comments in online communication poses a significant challenge in human-computer interaction, especially given the massive proliferation of discussions on platforms like YouTube. These comments often include offensive or disruptive patterns, such as sarcasm, negative feelings, specific reasons, and an attitude of being right. To address this problem, we present a web platform for the Spanish language that has been developed and leverages natural language processing and machine learning techniques. The platform detects comments and provides valuable information to users by focusing on analyzing comments. The core models are based on pre-trained architectures, including BETO, SpanBERTa, Multilingual BERT, RoBERTuito, and BERT, enabling robust detection of cynical comments. Our platform was trained and tested with Spanish comments from car analysis channels on YouTube. The results show that models achieve performance above 0.8 F1 for all types of cynical comments in the text classification task but achieve lower performance (around 0.6-0.7 F1) for the more arduous token classification task.
Identifying Task Groupings for Multi-Task Learning Using Pointwise V-Usable Information
SSRN Electronic Journal · 2025-01-01 · 1 citation
preprint · Open access
Environmental Research Letters · 2025-05-30
article · Open access
A citizen’s right to comment on, and criticize, government decisions makes a difference. The U.S. National Environmental Policy Act of 1969 (NEPA) institutionalized public engagement in environmental review in the belief it would lead to better decisions and more sustainable outcomes. But, 50 years later, NEPA’s public comment process has been criticized as costly and slow, while doing little to change outcomes. Data science now makes it possible to track progress and evaluate the influence of public participation. We examined 108 environmental impact statement (EIS) processes spanning 22 years. Our analysis revealed that public comments resulted in substantive decision alterations in 62% of cases, with 64% showing modifications to alternatives, 42% showing modifications to mitigation plans and 11% leading to the selection of an entirely new preferred alternative. When federal agencies changed project alternatives (78 EISs), 88% of the time (69 of the 78 EISs) they credited public comments as the reason. In 45 of the 108 EISs, agencies modified mitigation plans and credited public comments as the reason 100% of the time. Agencies only occasionally selected a new preferred alternative (21 out of 104 EISs), but when they did, they credited public comments as the reason 100% of the time. As the United States and the 190+ states and countries that have adopted NEPA’s example consider how to address environmental change, it is important to assess the role of public participation in environmental decision making. Our data say public comments matter.
Transformer-Based Temporal Information Extraction and Application: A Review
2025-01-01 · 2 citations
article · Open access · Senior author
Identifying task groupings for multi-task learning using pointwise V-usable information
Journal of Biomedical Informatics · 2025-07-16 · 2 citations
article
Improving Toponym Resolution by Predicting Attributes to Constrain Geographical Ontology Entries
2024-01-01 · 2 citations
article · Open access · Senior author
Zeyu Zhang, Egoitz Laparra, Steven Bethard. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers). 2024.
2024-01-01 · 2 citations
article · Open access
Xin Su, Tiep Le, Steven Bethard, Phillip Howard. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2024.
Recent grants
Extended Methods and Software Development for Health NLP
NIH · $5.0M · 2016–2025
Frequent coauthors
- James Pustejovsky · 60 shared
- Guergana Savova · Harvard University · 55 shared
- Leon Derczynski · 43 shared
- Marc Verhagen · Brandeis University · 43 shared
- Timothy A. Miller · 35 shared
- Wei-Te Chen · 25 shared
- Dmitriy Dligach · 25 shared
- Chen Lin · Shanghai Artificial Intelligence Laboratory · 24 shared
Labs
Not provided
Education
Ph.D., Computer Science and Cognitive Science
University of Colorado Boulder