
Douglas Downey
Professor of Computer Science · Northwestern University · Chemical Engineering
Active 1997–2026
About
Douglas Downey is a Professor of Computer Science at Northwestern University, affiliated with the Master of Science in Artificial Intelligence and Master of Science in Robotics programs. His research focuses on natural language processing, machine learning, and artificial intelligence, with particular interest in the automatic construction of useful knowledge bases from Web text. He aims to develop techniques and prototypes that extend the state of the art in Web search and to establish a formal basis for learning from unstructured text without relying on hand-labeled data. Downey also works on ways to utilize human input more effectively in machine learning, exploring methods such as active learning and semi-supervised learning to improve the efficiency and effectiveness of machine learning systems.
Research topics
- Computer Science
- Artificial Intelligence
- Natural Language Processing
- Machine Learning
- Information Retrieval
- Psychology
- Mathematics
- Engineering
- Human–computer interaction
Selected publications
Omakase: proactive assistance with actionable suggestions for evolving scientific research projects
arXiv (Cornell University) · 2026-04-10
Article · Open access
As AI agents become increasingly capable of complex knowledge tasks, the lack of context limits their capability to proactively reason about a user's latent needs throughout a long evolving project. In scientific research, many researchers still manually query a deep research system and compress their rich project contexts into short, targeted queries. Further, a deep research system produces exhaustive reports, making it difficult to identify concrete actions. To explore the opportunities of research assistants that are proactive throughout a research project, we conducted several studies (N=42) with a technology probe and an iterative prototype. The latest iteration of our system, Omakase, is a research assistant that monitors a user's project documents to infer timely queries to a deep research system. Omakase then distills long reports into suggestions contextualized to their evolving projects. Our evaluations showed that participants found the generated queries to be useful and timely, and rated Omakase's suggestions as significantly more actionable than the original reports.
SciArena: An Open Evaluation Platform for Non-Verifiable Scientific Literature-Grounded Tasks
arXiv (Cornell University) · 2025-07-01
Preprint · Open access
We present SciArena, an open and collaborative platform for evaluating foundation models on scientific literature-grounded tasks. Unlike traditional benchmarks for scientific literature understanding and synthesis, SciArena engages the research community directly, following the Chatbot Arena evaluation approach of community voting on model comparisons. By leveraging collective intelligence, SciArena offers a community-driven evaluation of model performance on open-ended scientific tasks that demand literature-grounded, long-form responses. The platform currently supports 47 foundation models and has collected over 20,000 votes from human researchers across diverse scientific domains. Our analysis of the data collected so far confirms its high quality. We discuss the results and insights based on the model ranking leaderboard. To further promote research in building model-based automated evaluation systems for literature tasks, we release SciArena-Eval, a meta-evaluation benchmark based on collected preference data. It measures the accuracy of models in judging answer quality by comparing their pairwise assessments with human votes. Our experiments highlight the benchmark's challenges and emphasize the need for more reliable automated evaluation methods.
SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature
2025-01-01 · 2 citations
Article · Open access
David Wadden, Kejian Shi, Jacob Morrison, Alan Li, Aakanksha Naik, Shruti Singh, Nitzan Barzilay, Kyle Lo, Tom Hope, Luca Soldaini, Shannon Zejiang Shen, Doug Downey, Hannaneh Hajishirzi, Arman Cohan. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025.
Ai2 Scholar QA: Organized Literature Synthesis with Attribution
ArXiv.org · 2025-04-15
Preprint · Open access
Retrieval-augmented generation is increasingly effective in answering scientific questions from literature, but many state-of-the-art systems are expensive and closed-source. We introduce Ai2 Scholar QA, a free online scientific question answering application. To facilitate research, we make our entire pipeline public: as a customizable open-source Python package and interactive web app, along with paper indexes accessible through public APIs and downloadable datasets. We describe our system in detail and present experiments analyzing its key design decisions. In an evaluation on a recent scientific QA benchmark, we find that Ai2 Scholar QA outperforms competing systems.
Intent-Aware Schema Generation And Refinement For Literature Review Tables
ArXiv.org · 2025-07-18
Preprint · Open access
The increasing volume of academic literature makes it essential for researchers to organize, compare, and contrast collections of documents. Large language models (LLMs) can support this process by generating schemas defining shared aspects along which to compare papers. However, progress on schema generation has been slow due to: (i) ambiguity in reference-based evaluations, and (ii) lack of editing/refinement methods. Our work is the first to address both issues. First, we present an approach for augmenting unannotated table corpora with synthesized intents, and apply it to create a dataset for studying schema generation conditioned on a given information need, thus reducing ambiguity. With this dataset, we show how incorporating table intents significantly improves baseline performance in reconstructing reference schemas. We start by comprehensively benchmarking several single-shot schema generation methods, including prompted LLM workflows and fine-tuned models, showing that smaller, open-weight models can be fine-tuned to be competitive with state-of-the-art prompted LLMs. Next, we propose several LLM-based schema refinement techniques and show that these can further improve schemas generated by these methods.
Ai2 Scholar QA: Organized Literature Synthesis with Attribution
2025-01-01 · 6 citations
Article · Open access
Amanpreet Singh, Joseph Chee Chang, Dany Haddad, Aakanksha Naik, Jena D. Hwang, Rodney Kinney, Daniel S. Weld, Doug Downey, Sergey Feldman. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations). 2025.
2025-01-01
Paratext · Open access
Peter Jansen, Bhavana Dalvi Mishra, Harsh Trivedi, Bodhisattwa Prasad Majumder, Tom Hope, Tushar Khot, Doug Downey, Eric Horvitz. Proceedings of the 1st Workshop on AI and Scientific Discovery: Directions and Opportunities. 2025.
Preference Learning from Physics-Based Feedback: Tuning Language Models to Design BCC/B2 Superalloys
ArXiv.org · 2025-11-15
Preprint · Open access
We apply preference learning to the task of language model-guided design of novel structural alloys. In contrast to prior work that focuses on generating stable inorganic crystals, our approach targets the synthesizability of a specific structural class: BCC/B2 superalloys, an underexplored family of materials with potential applications in extreme environments. Using three open-weight models (LLaMA-3.1, Gemma-2, and OLMo-2), we demonstrate that language models can be optimized for multiple design objectives using a single, unified reward signal through Direct Preference Optimization (DPO). Unlike prior approaches that rely on costly heuristic or human-in-the-loop feedback, our reward signal is derived from thermodynamic phase calculations, offering a scientifically grounded criterion for model tuning. To our knowledge, this is the first demonstration of preference-tuning a language model using physics-grounded feedback for structural alloy design. The resulting framework is general and extensible, providing a path forward for intelligent design-space exploration across a range of physical science domains.
Intent-aware Schema Generation and Refinement for Literature Review Tables
2025-01-01 · 1 citation
Article · Open access
The increasing volume of academic literature makes it essential for researchers to organize, compare, and contrast collections of documents. Large language models (LLMs) can support this process by generating schemas defining shared aspects along which to compare papers. However, progress on schema generation has been slow due to: (i) ambiguity in reference-based evaluations, and (ii) lack of editing/refinement methods. Our work is the first to address both issues. First, we present an approach for augmenting unannotated table corpora with synthesized intents, and apply it to create a dataset for studying schema generation conditioned on a given information need, thus reducing ambiguity. With this dataset, we show how incorporating table intents significantly improves baseline performance in reconstructing reference schemas. We start by comprehensively benchmarking several single-shot schema generation methods, including prompted LLM workflows and fine-tuned models, showing that smaller, open-weight models can be fine-tuned to be competitive with state-of-the-art prompted LLMs. Next, we propose several LLM-based schema refinement techniques and show that these can further improve schemas generated by these methods.
Recent grants
RI: Small: Extracting and Representing Commonsense Knowledge Using Language Models
NSF · $470k · 2020–2026
RI: Medium: Collaborative Research: Learning Representations of Language for Domain Adaptation
NSF · $200k · 2011–2016
CAREER: Web Information Extraction: Integration and Scaling
NSF · $563k · 2014–2020
Frequent coauthors
- Daniel S. Weld (Allen Institute) · 89 shared
- Kyle Lo · 64 shared
- Zejiang Shen (Massachusetts Institute of Technology) · 50 shared
- Bailey Kuehl · 41 shared
- Erin Bransom · 34 shared
- Luca Soldaini · 33 shared
- Amanpreet Singh · 32 shared
- Chandra Bhagavatula (Allen Institute) · 30 shared