Devavrat Shah

Verified

Massachusetts Institute of Technology · Electrical Engineering & Computer Science

Active 1999–2024

h-index 68

Citations 19.4k

Papers 517 · 111 last 5y

Funding $2.5M

Research topics

Computer Science
Machine Learning
Artificial Intelligence
Political Science
Econometrics
Business
Economics
Medicine
Geography
Actuarial science
Mathematics
Engineering
Statistics
Operations research
Data science
Meteorology
World Wide Web
Algorithm
Environmental health

Selected publications

Evaluation of individual and ensemble probabilistic forecasts of COVID-19 mortality in the United States
Proceedings of the National Academy of Sciences · 2022 · 311 citations
- Computer Science
- Political Science
- Artificial Intelligence
Short-term probabilistic forecasts of the trajectory of the COVID-19 pandemic in the United States have served as a visible and important communication channel between the scientific modeling community and both the general public and decision-makers. Forecasting models provide specific, quantitative, and evaluable predictions that inform short-term decisions such as healthcare staffing needs, school closures, and allocation of medical supplies. Starting in April 2020, the US COVID-19 Forecast Hub (https://covid19forecasthub.org/) collected, disseminated, and synthesized tens of millions of specific predictions from more than 90 different academic, industry, and independent research groups. A multimodel ensemble forecast that combined predictions from dozens of groups every week provided the most consistently accurate probabilistic forecasts of incident deaths due to COVID-19 at the state and national level from April 2020 through October 2021. The performance of 27 individual models that submitted complete forecasts of COVID-19 deaths consistently throughout this year showed high variability in forecast skill across time, geospatial units, and forecast horizons. Two-thirds of the models evaluated showed better accuracy than a naïve baseline model. Forecast accuracy degraded as models made predictions further into the future, with probabilistic error at a 20-wk horizon three to five times larger than when predicting at a 1-wk horizon. This project underscores the role that collaboration and active coordination between governmental public-health agencies, academic modeling teams, and industry partners can play in developing modern modeling capabilities to support local, state, and federal response to outbreaks.
The United States COVID-19 Forecast Hub dataset
Scientific Data · 2022 · 126 citations
- Computer Science
- Machine Learning
Academic researchers, government agencies, industry groups, and individuals have produced forecasts at an unprecedented scale during the COVID-19 pandemic. To leverage these forecasts, the United States Centers for Disease Control and Prevention (CDC) partnered with an academic research lab at the University of Massachusetts Amherst to create the US COVID-19 Forecast Hub. Launched in April 2020, the Forecast Hub is a dataset with point and probabilistic forecasts of incident cases, incident hospitalizations, incident deaths, and cumulative deaths due to COVID-19 at county, state, and national levels in the United States. Included forecasts represent a variety of modeling approaches, data sources, and assumptions regarding the spread of COVID-19. The goal of this dataset is to establish a standardized and comparable set of short-term forecasts from modeling teams. These data can be used to develop ensemble models, communicate forecasts to the public, create visualizations, compare models, and inform policies regarding COVID-19 mitigation. These open-source data are available via download from GitHub, through an online API, and through R packages.
Causal Matrix Completion
arXiv (Cornell University) · 2021 · 10 citations
- Computer Science
- Machine Learning
- Mathematics
Matrix completion is the study of recovering an underlying matrix from a sparse subset of noisy observations. Traditionally, it is assumed that the entries of the matrix are "missing completely at random" (MCAR), i.e., each entry is revealed at random, independent of everything else, with uniform probability. This is likely unrealistic due to the presence of "latent confounders", i.e., unobserved factors that determine both the entries of the underlying matrix and the missingness pattern in the observed matrix. For example, in the context of movie recommender systems -- a canonical application for matrix completion -- a user who vehemently dislikes horror films is unlikely to ever watch horror films. In general, these confounders yield "missing not at random" (MNAR) data, which can severely impact any inference procedure that does not correct for this bias. We develop a formal causal model for matrix completion through the language of potential outcomes, and provide novel identification arguments for a variety of causal estimands of interest. We design a procedure, which we call "synthetic nearest neighbors" (SNN), to estimate these causal estimands. We prove finite-sample consistency and asymptotic normality of our estimator. Our analysis also leads to new theoretical results for the matrix completion literature. In particular, we establish entry-wise, i.e., max-norm, finite-sample consistency and asymptotic normality results for matrix completion with MNAR data. As a special case, this also provides entry-wise bounds for matrix completion with MCAR data. Across simulated and real data, we demonstrate the efficacy of our proposed estimator.
Evaluation of individual and ensemble probabilistic forecasts of COVID-19 mortality in the US
medRxiv (Cold Spring Harbor Laboratory) · 2021 · 77 citations
- Computer Science
- Political Science
- Artificial Intelligence
Short-term probabilistic forecasts of the trajectory of the COVID-19 pandemic in the United States have served as a visible and important communication channel between the scientific modeling community and both the general public and decision-makers. Forecasting models provide specific, quantitative, and evaluable predictions that inform short-term decisions such as healthcare staffing needs, school closures, and allocation of medical supplies. Starting in April 2020, the US COVID-19 Forecast Hub (https://covid19forecasthub.org/) collected, disseminated, and synthesized tens of millions of specific predictions from more than 90 different academic, industry, and independent research groups. A multi-model ensemble forecast that combined predictions from dozens of different research groups every week provided the most consistently accurate probabilistic forecasts of incident deaths due to COVID-19 at the state and national level from April 2020 through October 2021. The performance of 27 individual models that submitted complete forecasts of COVID-19 deaths consistently throughout this year showed high variability in forecast skill across time, geospatial units, and forecast horizons. Two-thirds of the models evaluated showed better accuracy than a naïve baseline model. Forecast accuracy degraded as models made predictions further into the future, with probabilistic error at a 20-week horizon 3-5 times larger than when predicting at a 1-week horizon. This project underscores the role that collaboration and active coordination between governmental public health agencies, academic modeling teams, and industry partners can play in developing modern modeling capabilities to support local, state, and federal response to outbreaks.
Significance statement: This paper compares the probabilistic accuracy of short-term forecasts of reported deaths due to COVID-19 during the first year and a half of the pandemic in the US. Results show high variation in accuracy between and within stand-alone models, and more consistent accuracy from an ensemble model that combined forecasts from all eligible models. This demonstrates that an ensemble model provided a reliable and comparatively accurate means of forecasting deaths during the COVID-19 pandemic that exceeded the performance of all of the models that contributed to it. This work strengthens the evidence base for synthesizing multiple models to support public health action.

Recent grants

Learning Graphical Models: Hardness and Tractability
NSF · $300k · 2015–2020
What Do Customers Like: A New Approach That Lets The Data Decide
NSF · $305k · 2010–2015
Revenue Management For Enterprise Users of Cloud Infrastructure
NSF · $360k · 2016–2021
CAREER: Implementable Network Algorithms via Randomization, Belief Propagation and Heavy Traffic
NSF · $450k · 2006–2012
EMT/MISC: Collaborative Research: Harnessing Statistical Physics for Computing and Communication
NSF · $180k · 2008–2012

Frequent coauthors

Muriel Médard
Massachusetts Institute of Technology
114 shared
Abbas El Gamal
103 shared
Giuseppe Caire
Technische Universität Berlin
98 shared
Marc Apter
Institute of Electrical and Electronics Engineers
98 shared
Gerhard Kramer
Technical University of Munich
98 shared
Chris Brantley (IEEE-USA)
Drexel University
98 shared
Albert Guillén
Institute of Electrical and Electronics Engineers
98 shared
Robert Hebner
98 shared
