Douglas Downey

Professor of Computer Science · Verified

Northwestern University · Computer Science

Active 1997–2026

h-index: 31

Citations: 8.2k

Papers: 178 (109 in the last 5 years)

Funding: $1.2M (1 active grant)

Faculty page


About

Douglas Downey is a Professor of Computer Science at Northwestern University, affiliated with the Master of Science in Artificial Intelligence and Master of Science in Robotics programs. His research focuses on natural language processing, machine learning, and artificial intelligence, with particular interest in automatically constructing useful knowledge bases from Web text. He aims to develop techniques and prototypes that extend the state of the art in Web search, and to establish a formal basis for learning from unstructured text without relying on hand-labeled data. Downey also studies how to use human input more effectively in machine learning, through methods such as active learning and semi-supervised learning that make learning systems more efficient and accurate.

Research topics

Computer Science
Artificial Intelligence
Natural Language Processing
Machine Learning
Information Retrieval
Psychology
Mathematics
Engineering
Human–computer interaction

Selected publications

Omakase: proactive assistance with actionable suggestions for evolving scientific research projects
arXiv (Cornell University) · 2026-04-10
articleOpen access
As AI agents become increasingly capable of complex knowledge tasks, the lack of context limits their capability to proactively reason about a user's latent needs throughout a long evolving project. In scientific research, many researchers still manually query a deep research system and compress their rich project contexts into short, targeted queries. Further, a deep research system produces exhaustive reports, making it difficult to identify concrete actions. To explore the opportunities of research assistants that are proactive throughout a research project, we conducted several studies (N=42) with a technology probe and an iterative prototype. The latest iteration of our system, Omakase, is a research assistant that monitors a user's project documents to infer timely queries to a deep research system. Omakase then distills long reports into suggestions contextualized to their evolving projects. Our evaluations showed that participants found the generated queries to be useful and timely, and rated Omakase's suggestions as significantly more actionable than the original reports.
Publisher OA PDF
Omakase: proactive assistance with actionable suggestions for evolving scientific research projects
arXiv (Cornell University) · 2026-04-10
preprintOpen access
Publisher DOI
SciArena: An Open Evaluation Platform for Non-Verifiable Scientific Literature-Grounded Tasks
arXiv (Cornell University) · 2025-07-01
preprintOpen access
We present SciArena, an open and collaborative platform for evaluating foundation models on scientific literature-grounded tasks. Unlike traditional benchmarks for scientific literature understanding and synthesis, SciArena engages the research community directly, following the Chatbot Arena evaluation approach of community voting on model comparisons. By leveraging collective intelligence, SciArena offers a community-driven evaluation of model performance on open-ended scientific tasks that demand literature-grounded, long-form responses. The platform currently supports 47 foundation models and has collected over 20,000 votes from human researchers across diverse scientific domains. Our analysis of the data collected so far confirms its high quality. We discuss the results and insights based on the model ranking leaderboard. To further promote research in building model-based automated evaluation systems for literature tasks, we release SciArena-Eval, a meta-evaluation benchmark based on collected preference data. It measures the accuracy of models in judging answer quality by comparing their pairwise assessments with human votes. Our experiments highlight the benchmark's challenges and emphasize the need for more reliable automated evaluation methods.
Publisher OA PDF DOI
SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature
2025-01-01 · 2 citations
articleOpen access
David Wadden, Kejian Shi, Jacob Morrison, Alan Li, Aakanksha Naik, Shruti Singh, Nitzan Barzilay, Kyle Lo, Tom Hope, Luca Soldaini, Shannon Zejiang Shen, Doug Downey, Hannaneh Hajishirzi, Arman Cohan. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025.
Publisher OA PDF DOI
Ai2 Scholar QA: Organized Literature Synthesis with Attribution
ArXiv.org · 2025-04-15
preprintOpen access
Retrieval-augmented generation is increasingly effective in answering scientific questions from literature, but many state-of-the-art systems are expensive and closed-source. We introduce Ai2 Scholar QA, a free online scientific question answering application. To facilitate research, we make our entire pipeline public: as a customizable open-source Python package and interactive web app, along with paper indexes accessible through public APIs and downloadable datasets. We describe our system in detail and present experiments analyzing its key design decisions. In an evaluation on a recent scientific QA benchmark, we find that Ai2 Scholar QA outperforms competing systems.
Publisher OA PDF DOI
Intent-Aware Schema Generation And Refinement For Literature Review Tables
ArXiv.org · 2025-07-18
preprintOpen access
The increasing volume of academic literature makes it essential for researchers to organize, compare, and contrast collections of documents. Large language models (LLMs) can support this process by generating schemas defining shared aspects along which to compare papers. However, progress on schema generation has been slow due to: (i) ambiguity in reference-based evaluations, and (ii) lack of editing/refinement methods. Our work is the first to address both issues. First, we present an approach for augmenting unannotated table corpora with synthesized intents, and apply it to create a dataset for studying schema generation conditioned on a given information need, thus reducing ambiguity. With this dataset, we show how incorporating table intents significantly improves baseline performance in reconstructing reference schemas. We start by comprehensively benchmarking several single-shot schema generation methods, including prompted LLM workflows and fine-tuned models, showing that smaller, open-weight models can be fine-tuned to be competitive with state-of-the-art prompted LLMs. Next, we propose several LLM-based schema refinement techniques and show that these can further improve schemas generated by these methods.
Publisher OA PDF DOI
Ai2 Scholar QA: Organized Literature Synthesis with Attribution
2025-01-01 · 6 citations
articleOpen access
Amanpreet Singh, Joseph Chee Chang, Dany Haddad, Aakanksha Naik, Jena D. Hwang, Rodney Kinney, Daniel S Weld, Doug Downey, Sergey Feldman. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations). 2025.
Publisher OA PDF DOI
Front Matter
2025-01-01
paratextOpen access
Peter Jansen, Bhavana Dalvi Mishra, Harsh Trivedi, Bodhisattwa Prasad Majumder, Tom Hope, Tushar Khot, Doug Downey, Eric Horvitz. Proceedings of the 1st Workshop on AI and Scientific Discovery: Directions and Opportunities. 2025.
Publisher OA PDF DOI
Preference Learning from Physics-Based Feedback: Tuning Language Models to Design BCC/B2 Superalloys
ArXiv.org · 2025-11-15
preprintOpen access
We apply preference learning to the task of language model-guided design of novel structural alloys. In contrast to prior work that focuses on generating stable inorganic crystals, our approach targets the synthesizability of a specific structural class: BCC/B2 superalloys, an underexplored family of materials with potential applications in extreme environments. Using three open-weight models (LLaMA-3.1, Gemma-2, and OLMo-2), we demonstrate that language models can be optimized for multiple design objectives using a single, unified reward signal through Direct Preference Optimization (DPO). Unlike prior approaches that rely on heuristic or costly human-in-the-loop feedback, our reward signal is derived from thermodynamic phase calculations, offering a scientifically grounded criterion for model tuning. To our knowledge, this is the first demonstration of preference-tuning a language model using physics-grounded feedback for structural alloy design. The resulting framework is general and extensible, providing a path forward for intelligent design-space exploration across a range of physical science domains.
Publisher OA PDF DOI
Intent-aware Schema Generation and Refinement for Literature Review Tables
2025-01-01 · 1 citation
articleOpen access
Publisher OA PDF DOI

Recent grants

RI: Small: Extracting and Representing Commonsense Knowledge Using Language Models
NSF · $470k · 2020–2026
RI: Medium: Collaborative Research: Learning Representations of Language for Domain Adaptation
NSF · $200k · 2011–2016
CAREER: Web Information Extraction: Integration and Scaling
NSF · $563k · 2014–2020

Frequent coauthors

Daniel S. Weld
Allen Institute
89 shared
Kyle Lo
64 shared
Zejiang Shen
Massachusetts Institute of Technology
50 shared
Bailey Kuehl
41 shared
Erin Bransom
34 shared
Luca Soldaini
33 shared
Amanpreet Singh
32 shared
Chandra Bhagavatula
Allen Institute
30 shared
