
Owen Rambow
IACS Endowed Chair · Stony Brook University · Linguistics
Active 1988–2025
About
Owen Rambow is the IACS Endowed Chair in the Department of Linguistics at Stony Brook University. His research focuses on natural language processing and computational linguistics, with particular interest in the fine-grained structure of language, such as morphology and syntax, and in how language is used in context. He received a Ph.D. in Computer and Information Sciences from the University of Pennsylvania. He has worked at AT&T Labs Research, spent 15 years as a research scientist at Columbia University, and worked for three years at Elemental Cognition LLC, a startup developing software for deep language understanding. At Columbia, he was part of the Center for Computational Learning Systems and co-founded CADIM, a research group specializing in Arabic natural language processing that licenses advanced NLP tools. His group has also released several resources, including a richly annotated version of the Enron email corpus. He has published extensively in top conferences and journals. He has served as Chair of the North American Chapter of the Association for Computational Linguistics (NAACL) and as program co-chair of the NAACL HLT 2016 conference, and he has been program committee chair or senior program committee member for numerous conferences and workshops.
Research topics
- Sociology
- Artificial Intelligence
- Computer Science
- Natural Language Processing
- Algorithm
- Human–computer interaction
- Psychology
- Epistemology
- Communication
Selected publications
Residualized Similarity for Faithfully Explainable Authorship Verification
2025-01-01
Article · Open access
Responsible use of authorship verification (AV) systems requires not only high accuracy but also interpretable solutions. Specifically, for systems to be deployed in contexts where decisions have real-world consequences, their predictions must be explainable through interpretable features that can be traced to the original text. Neural methods achieve high accuracy, but their representations lack direct interpretability. Furthermore, LLM predictions cannot be explained faithfully: if an explanation is given for a prediction, it does not represent the reasoning process behind the model's prediction. To address this gap, we introduce residualized similarity (RS), a novel method that supplements systems using interpretable features with a neural network to improve their performance while maintaining interpretability. Authorship verification is fundamentally a similarity task, where the goal is to measure how likely two documents are to have been written by the same author. The key idea is to use a neural network to predict a residual similarity, i.e., the error in the similarity predicted by the interpretable system. Our evaluation across four datasets shows that not only can we match the performance of state-of-the-art authorship verification models, but we can also show how and to what degree the final prediction is faithful and interpretable.
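The residualization idea in this abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's system: the interpretable component is assumed to be a cosine over hand-crafted style-feature vectors, the neural network is replaced by a linear regressor fit with SGD, and all function names are hypothetical. The residual model is trained on the gap between the gold similarity and the interpretable similarity, and its output is added back at prediction time.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Assumed interpretable similarity: cosine of two style-feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fit_residual_model(
    pair_feats: List[Vector], targets: List[float], lr: float = 0.1, epochs: int = 2000
) -> Callable[[Vector], float]:
    """Assumption: a linear regressor trained with SGD on squared error stands
    in for the neural network that predicts the residual for a document pair."""
    w = [0.0] * len(pair_feats[0])
    bias = 0.0
    for _ in range(epochs):
        for x, t in zip(pair_feats, targets):
            err = sum(wi * xi for wi, xi in zip(w, x)) + bias - t
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            bias -= lr * err
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + bias


def train_rs(
    style_a: List[Vector], style_b: List[Vector], pair_feats: List[Vector], labels: List[float]
) -> Callable[[Vector], float]:
    """Fit the residual model on the interpretable system's errors (gold - interpretable)."""
    interp = [cosine(a, b) for a, b in zip(style_a, style_b)]
    residual_targets = [y - s for y, s in zip(labels, interp)]
    return fit_residual_model(pair_feats, residual_targets)


def rs_score(
    doc_a: Vector, doc_b: Vector, pair_feat: Vector, residual_model: Callable[[Vector], float]
) -> Tuple[float, float, float]:
    """Return (final score, interpretable part, neural residual part)."""
    s = cosine(doc_a, doc_b)
    r = residual_model(pair_feat)
    return s + r, s, r
```

Returning the interpretable and residual parts separately, rather than only their sum, is one simple way to keep visible how much of each prediction comes from the neural correction.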
2025-01-01 · 1 citation
Article · Open access · Senior author
The paper explores the performance of LLMs in the context of multi-dimensional analytic writing assessments, i.e., their ability to provide both scores and comments based on multiple assessment criteria. Using a corpus of literature reviews written by L2 graduate students and assessed by human experts against 9 analytic criteria, we prompt several popular LLMs to perform the same task under various conditions. To evaluate the quality of feedback comments, we apply a novel feedback comment quality evaluation framework. This framework is interpretable, cost-efficient, scalable, and reproducible, compared to existing methods that rely on manual judgments. We find that LLMs can generate reasonably good and generally reliable multi-dimensional analytic assessments. We release our corpus and code for reproducibility.
Synthetic Audio Helps for Cognitive State Tasks
ArXiv.org · 2025-02-10
Preprint · Open access · Senior author
The NLP community has broadly focused on text-only approaches to cognitive state tasks, but audio can provide vital missing cues through prosody. We posit that text-to-speech models learn to track aspects of cognitive state in order to produce naturalistic audio, and that the signal audio models implicitly identify is orthogonal to the information that language models exploit. We present Synthetic Audio Data fine-tuning (SAD), a framework where we show that 7 tasks related to cognitive state modeling benefit from multimodal training on both text and zero-shot synthetic audio data from an off-the-shelf TTS system. We show an improvement over the text-only modality when adding synthetic audio data to text-only corpora. Furthermore, on tasks and corpora that do contain gold audio, we show our SAD framework achieves competitive performance with text and synthetic audio compared to text and gold audio.
LVLMs are Bad at Overhearing Human Referential Communication
2025-01-01
Article · Open access
During conversation, speakers collaborate on spontaneous referring expressions, which they can then re-use in subsequent conversation with the same partner. Understanding such referring expressions is an important ability for an embodied agent so that it can carry out tasks in the real world. This requires integrating and understanding language, vision, and conversational interaction. We study the capabilities of seven state-of-the-art Large Vision Language Models (LVLMs) as overhearers to a corpus of spontaneous conversations between pairs of human discourse participants engaged in a collaborative object-matching task. We find that such a task remains challenging for current LVLMs, which fail to show a consistent performance improvement as they overhear more conversations from the same discourse participants repeating the same task for multiple rounds. We release our corpus and code for reproducibility and to facilitate future research.
Active Few-Shot Learning for Text Classification
ArXiv.org · 2025-02-26 · 1 citation
Preprint · Open access
The rise of Large Language Models (LLMs) has boosted the use of Few-Shot Learning (FSL) methods in natural language processing, achieving acceptable performance even when working with limited training data. The goal of FSL is to effectively utilize a small number of annotated samples in the learning process. However, the performance of FSL suffers when unsuitable support samples are chosen. This problem arises due to the heavy reliance on a limited number of support samples, which hampers consistent performance improvement even when more support samples are added. To address this challenge, we propose an active learning-based instance selection mechanism that identifies effective support instances from the unlabeled pool and can work with different LLMs. Our experiments on five tasks show that our method frequently improves the performance of FSL. We make our implementation available on GitHub.
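The abstract does not state which selection criterion the proposed mechanism uses. Purely as a generic illustration of active-learning-style selection of support instances from an unlabeled pool, the sketch below uses entropy-based uncertainty sampling; this is a swapped-in textbook technique, not necessarily the paper's method, and `predict_proba` is a hypothetical hook around whatever model or LLM scores the pool.

```python
import math
from typing import Callable, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_support(
    pool: Sequence[str],
    predict_proba: Callable[[str], Sequence[float]],
    k: int,
) -> List[str]:
    """Assumption: pick the k unlabeled texts the current model is least sure
    about; these would then be annotated and added to the few-shot support set.

    sorted() stays stable with reverse=True, so ties keep their pool order.
    """
    return sorted(pool, key=lambda x: entropy(predict_proba(x)), reverse=True)[:k]
```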
Synthetic Audio Helps for Cognitive State Tasks
2025-01-01
Article · Open access · Senior author
Automatically recognizing a human's complete cognitive state from text is a difficult task; from text, a model has to recognize a combination of concepts including belief, emotion, common ground, sentiment, and intention. Humans track and update cognitive state not only from the meaning of words and sentences, but also from paralinguistic cues such as prosody. The NLP community has broadly focused on text-only approaches to cognitive state tasks, but audio can provide vital missing information. We posit that text-to-speech (TTS) models learn to track aspects of cognitive state in order to produce naturalistic audio, and that the signal audio models implicitly identify is orthogonal to the information that language models exploit. We present Synthetic Audio Data fine-tuning (SAD), a framework where we show that seven tasks related to cognitive state modeling benefit from multimodal training on both text and zero-shot synthetic audio data from an off-the-shelf TTS system. We show an improvement over the text-only modality when adding synthetic audio data to text-only corpora. Furthermore, on tasks and corpora that do contain gold audio, we show our SAD framework achieves competitive performance using text and synthetic audio compared to text and gold audio.
Exploring Limitations of LLM Capabilities with Multi-Problem Evaluation
2025-01-01 · 3 citations
Article · Open access · Senior author
We propose using prompts made up of multiple problems to evaluate LLM capabilities, an approach we call multi-problem evaluation. We examine 7 LLMs on 4 related task types constructed from 6 existing classification benchmarks. We find that while LLMs can generally perform multiple homogeneous classifications at once (Batch Classification) as well as when they do so separately, they perform significantly worse on two selection tasks that are conceptually equivalent to Batch Classification and involve selecting indices of text falling into each class label, either independently or altogether. We show that this significant performance drop is due to LLMs' inability to adequately combine index selection with text classification. Surprisingly, the drop is observed across all LLMs tested, under zero-shot, few-shot, and CoT settings, and even with a novel synthetic dataset, potentially reflecting an inherent capability limitation of modern LLMs.
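To make the task types in this abstract concrete, the sketch below builds a Batch Classification prompt and the two index-selection variants (one label at a time, or all labels at once), and converts a Batch Classification answer into the equivalent index-selection answer, which is the sense in which the tasks are conceptually equivalent. The prompt wording and all function names are hypothetical, not the templates used in the paper.

```python
from typing import Dict, List, Optional, Sequence


def _numbered(texts: Sequence[str]) -> List[str]:
    # Number texts from 1 so an answer can refer to them by index.
    return [f"{i}. {t}" for i, t in enumerate(texts, 1)]


def batch_classification_prompt(texts: Sequence[str], labels: Sequence[str]) -> str:
    """Hypothetical multi-problem prompt: one label for every numbered text."""
    header = f"Classify each text below as one of: {', '.join(labels)}."
    footer = "Answer with one label per text, in order."
    return "\n".join([header, *_numbered(texts), footer])


def index_selection_prompt(
    texts: Sequence[str], labels: Sequence[str], target: Optional[str] = None
) -> str:
    """Hypothetical selection prompt asking for text indices instead of labels.

    With `target`, only that label is asked about (one prompt per label, the
    'independent' setting); without it, all labels are asked about together.
    """
    if target is not None:
        header = f"List the numbers of all texts whose label is '{target}'."
    else:
        header = (f"For each label in {', '.join(labels)}, "
                  "list the numbers of the texts with that label.")
    return "\n".join([header, *_numbered(texts)])


def labels_to_index_sets(preds: Sequence[str], labels: Sequence[str]) -> Dict[str, List[int]]:
    """Map a Batch Classification answer to the equivalent selection answer."""
    sets: Dict[str, List[int]] = {lab: [] for lab in labels}
    for i, pred in enumerate(preds, 1):
        sets[pred].append(i)
    return sets
```

Because a correct Batch Classification answer determines the selection answer exactly, a model that handles the first but not the second is failing at combining index selection with classification rather than at classification itself, which is the contrast the abstract draws.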
Active Few-Shot Learning for Text Classification
2025-01-01 · 4 citations
Article · Open access
Saeed Ahmadnia, Arash Yousefi Jordehi, Mahsa Hosseini Khasheh Heyran, Seyed Abolghasem Mirroshandel, Owen Rambow, Cornelia Caragea. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2025.
LVLMs are Bad at Overhearing Human Referential Communication
ArXiv.org · 2025-09-15
Preprint · Open access
During spontaneous conversations, speakers collaborate on novel referring expressions, which they can then re-use in subsequent conversations. Understanding such referring expressions is an important ability for an embodied agent, so that it can carry out tasks in the real world. This requires integrating and understanding language, vision, and conversational interaction. We study the capabilities of seven state-of-the-art Large Vision Language Models (LVLMs) as overhearers to a corpus of spontaneous conversations between pairs of human discourse participants engaged in a collaborative object-matching task. We find that such a task remains challenging for current LVLMs and they all fail to show a consistent performance improvement as they overhear more conversations from the same discourse participants repeating the same task for multiple rounds. We release our corpus and code for reproducibility and to facilitate future research.
ArXiv.org · 2025-02-17
Preprint · Open access · Senior author
The paper explores the performance of LLMs in the context of multi-dimensional analytic writing assessments, i.e., their ability to provide both scores and comments based on multiple assessment criteria. Using a corpus of literature reviews written by L2 graduate students and assessed by human experts against 9 analytic criteria, we prompt several popular LLMs to perform the same task under various conditions. To evaluate the quality of feedback comments, we apply a novel feedback comment quality evaluation framework. This framework is interpretable, cost-efficient, scalable, and reproducible, compared to existing methods that rely on manual judgments. We find that LLMs can generate reasonably good and generally reliable multi-dimensional analytic assessments. We release our corpus and code for reproducibility.
Frequent coauthors
- Nizar Habash · 52 shared
- Mona Diab, Carnegie Mellon University · 26 shared
- Vinodkumar Prabhakaran · 23 shared
- Alexis Nasr · 17 shared
- Jungo Kasai, Toyota Technological Institute at Chicago · 15 shared
- Robert Frank · 15 shared
- Ramy Eskander · 15 shared
- Srinivas Bangalore · 14 shared
Education
- 2000
Ph.D., Computer Science
University of California, San Diego
- 1997
M.S., Computer Science
University of California, San Diego
- 1995
B.S., Computer Science
University of California, San Diego