
Ryan Tibshirani
University of California, Berkeley · Department of Statistics
Active 1991–2024
Research topics
- Computer Science
- Political Science
- Medicine
- Business
- Artificial Intelligence
- Environmental health
- Geography
- Operations research
- Psychology
- Econometrics
- Actuarial science
- Economics
- Engineering
- World Wide Web
- Data science
- Statistics
- Internet privacy
- Sociology
- Public relations
- Machine Learning
- Meteorology
- Virology
- Mathematics
- Finance
Selected publications
Proceedings of the National Academy of Sciences · 2022 · 311 citations
- Computer Science
- Political Science
- Artificial Intelligence
Short-term probabilistic forecasts of the trajectory of the COVID-19 pandemic in the United States have served as a visible and important communication channel between the scientific modeling community and both the general public and decision-makers. Forecasting models provide specific, quantitative, and evaluable predictions that inform short-term decisions such as healthcare staffing needs, school closures, and allocation of medical supplies. Starting in April 2020, the US COVID-19 Forecast Hub (https://covid19forecasthub.org/) collected, disseminated, and synthesized tens of millions of specific predictions from more than 90 different academic, industry, and independent research groups. A multimodel ensemble forecast that combined predictions from dozens of groups every week provided the most consistently accurate probabilistic forecasts of incident deaths due to COVID-19 at the state and national level from April 2020 through October 2021. The performance of 27 individual models that submitted complete forecasts of COVID-19 deaths consistently throughout this year showed high variability in forecast skill across time, geospatial units, and forecast horizons. Two-thirds of the models evaluated showed better accuracy than a naïve baseline model. Forecast accuracy degraded as models made predictions further into the future, with probabilistic error at a 20-wk horizon three to five times larger than when predicting at a 1-wk horizon. This project underscores the role that collaboration and active coordination between governmental public-health agencies, academic modeling teams, and industry partners can play in developing modern modeling capabilities to support local, state, and federal response to outbreaks.
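The abstract does not say how the weekly ensemble combined its member forecasts. Below is a minimal sketch of one common rule. It assumes every model submits predictive quantiles at the same levels, and that the ensemble takes the median across models at each level. The quantile levels, model names, and numbers are hypothetical, and this is not necessarily the Hub's exact method.

```python
from statistics import median

# Hypothetical quantile levels shared by all submitting models.
QUANTILE_LEVELS = [0.025, 0.25, 0.5, 0.75, 0.975]


def median_ensemble(model_quantiles: dict[str, list[float]]) -> list[float]:
    """Combine forecasts by taking, at each quantile level, the median across models.

    Assumes each model's value list is ordered to match QUANTILE_LEVELS.
    """
    if not model_quantiles:
        raise ValueError("need at least one model")
    n_levels = len(QUANTILE_LEVELS)
    for name, qs in model_quantiles.items():
        if len(qs) != n_levels:
            raise ValueError(f"{name}: expected {n_levels} quantiles, got {len(qs)}")
    # For each level i, collect that quantile from every model and take the median.
    return [median(qs[i] for qs in model_quantiles.values()) for i in range(n_levels)]


# Illustrative numbers only, not real Hub submissions.
forecasts = {
    "model_a": [80.0, 95.0, 100.0, 110.0, 130.0],
    "model_b": [60.0, 85.0, 105.0, 120.0, 150.0],
    "model_c": [90.0, 100.0, 115.0, 125.0, 140.0],
}
print(median_ensemble(forecasts))  # [80.0, 95.0, 105.0, 120.0, 140.0]
```

A per-level median keeps one outlying model from dragging the combined quantiles. Because each model's quantiles are nondecreasing, the per-level medians are nondecreasing too.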
The United States COVID-19 Forecast Hub dataset
Scientific Data · 2022 · 126 citations
- Computer Science
- Machine Learning
Academic researchers, government agencies, industry groups, and individuals have produced forecasts at an unprecedented scale during the COVID-19 pandemic. To leverage these forecasts, the United States Centers for Disease Control and Prevention (CDC) partnered with an academic research lab at the University of Massachusetts Amherst to create the US COVID-19 Forecast Hub. Launched in April 2020, the Forecast Hub is a dataset with point and probabilistic forecasts of incident cases, incident hospitalizations, incident deaths, and cumulative deaths due to COVID-19 at county, state, and national levels in the United States. Included forecasts represent a variety of modeling approaches, data sources, and assumptions regarding the spread of COVID-19. The goal of this dataset is to establish a standardized and comparable set of short-term forecasts from modeling teams. These data can be used to develop ensemble models, communicate forecasts to the public, create visualizations, compare models, and inform policies regarding COVID-19 mitigation. These open-source data are available via download from GitHub, through an online API, and through R packages.
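The dataset stores point and probabilistic (quantile) forecasts side by side. Below is a sketch of pulling a point forecast and a central 95% interval from one file. It assumes a long layout with one row per location, target, and quantile. The column names and sample rows are assumptions for illustration and may not match the Hub's actual schema or data.

```python
import csv
import io

# Assumed long-format layout; column names and values are illustrative only.
SAMPLE = """forecast_date,target,target_end_date,location,type,quantile,value
2021-05-03,1 wk ahead inc death,2021-05-08,US,point,NA,4500
2021-05-03,1 wk ahead inc death,2021-05-08,US,quantile,0.025,3600
2021-05-03,1 wk ahead inc death,2021-05-08,US,quantile,0.5,4500
2021-05-03,1 wk ahead inc death,2021-05-08,US,quantile,0.975,5700
"""


def summarize(csv_text, location, target, level=0.95):
    """Return (point, lower, upper) for one location and target.

    lower/upper are the quantiles bounding the central `level` interval;
    any piece missing from the file is returned as None.
    """
    alpha = 1.0 - level
    # Round to avoid float mismatches such as 0.025000000000000022 vs 0.025.
    lo_q, hi_q = round(alpha / 2, 4), round(1 - alpha / 2, 4)
    point = lower = upper = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["location"] != location or row["target"] != target:
            continue
        value = float(row["value"])
        if row["type"] == "point":
            point = value
        elif row["type"] == "quantile":
            q = round(float(row["quantile"]), 4)
            if q == lo_q:
                lower = value
            elif q == hi_q:
                upper = value
    return point, lower, upper


print(summarize(SAMPLE, "US", "1 wk ahead inc death"))  # (4500.0, 3600.0, 5700.0)
```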
Proceedings of the National Academy of Sciences · 2021 · 163 citations
Senior author · Corresponding author
- Environmental health
- Medicine
- Geography
The US COVID-19 Trends and Impact Survey (CTIS) is a large, cross-sectional, internet-based survey that has operated continuously since April 6, 2020. By inviting a random sample of Facebook active users each day, CTIS collects information about COVID-19 symptoms, risks, mitigating behaviors, mental health, testing, vaccination, and other key priorities. The large scale of the survey (over 20 million responses in its first year of operation) allows tracking of trends over short timescales and allows comparisons at fine demographic and geographic detail. The survey has been repeatedly revised to respond to emerging public health priorities. In this paper, we describe the survey methods and content and give examples of CTIS results that illuminate key patterns and trends and help answer high-priority policy questions relevant to the COVID-19 epidemic and response. These results demonstrate how large online surveys can provide continuous, real-time indicators of important outcomes that are not subject to public health reporting delays and backlogs. The CTIS offers high value as a supplement to official reporting data by supplying essential information about behaviors, attitudes toward policy and preventive measures, economic impacts, and other topics not reported in public health surveillance systems.
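The abstract does not describe how CTIS responses become estimates. Below is a minimal sketch of a weighted percentage, assuming each response is coded 0/1 and carries a nonnegative survey weight. It also computes Kish's effective sample size, which shows how much precision the weights cost when slicing by fine demographic or geographic detail. The function names and data are hypothetical.

```python
def weighted_percentage(responses, weights):
    """Weighted percent of respondents answering yes (coded 1) rather than no (0)."""
    if not responses or len(responses) != len(weights):
        raise ValueError("responses and weights must be non-empty and equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return 100.0 * sum(w * r for r, w in zip(responses, weights)) / total


def effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    sum_sq = sum(w * w for w in weights)
    if sum_sq == 0:
        raise ValueError("at least one weight must be nonzero")
    return sum(weights) ** 2 / sum_sq


# Illustrative responses (1 = reports the symptom) and weights.
responses = [1, 0, 0]
weights = [2.0, 1.0, 1.0]
print(weighted_percentage(responses, weights))  # 50.0
print(effective_sample_size(weights))
```

With equal weights the effective sample size equals the number of responses. Unequal weights shrink it, so estimates for small cells become noisier than the raw count suggests.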
Evaluation of individual and ensemble probabilistic forecasts of COVID-19 mortality in the US
medRxiv (Cold Spring Harbor Laboratory) · 2021 · 77 citations
- Computer Science
- Political Science
- Artificial Intelligence
Short-term probabilistic forecasts of the trajectory of the COVID-19 pandemic in the United States have served as a visible and important communication channel between the scientific modeling community and both the general public and decision-makers. Forecasting models provide specific, quantitative, and evaluable predictions that inform short-term decisions such as healthcare staffing needs, school closures, and allocation of medical supplies. Starting in April 2020, the US COVID-19 Forecast Hub (https://covid19forecasthub.org/) collected, disseminated, and synthesized tens of millions of specific predictions from more than 90 different academic, industry, and independent research groups. A multi-model ensemble forecast that combined predictions from dozens of different research groups every week provided the most consistently accurate probabilistic forecasts of incident deaths due to COVID-19 at the state and national level from April 2020 through October 2021. The performance of 27 individual models that submitted complete forecasts of COVID-19 deaths consistently throughout this year showed high variability in forecast skill across time, geospatial units, and forecast horizons. Two-thirds of the models evaluated showed better accuracy than a naïve baseline model. Forecast accuracy degraded as models made predictions further into the future, with probabilistic error at a 20-week horizon 3–5 times larger than when predicting at a 1-week horizon. This project underscores the role that collaboration and active coordination between governmental public health agencies, academic modeling teams, and industry partners can play in developing modern modeling capabilities to support local, state, and federal response to outbreaks.
Significance Statement
This paper compares the probabilistic accuracy of short-term forecasts of reported deaths due to COVID-19 during the first year and a half of the pandemic in the US. Results show high variation in accuracy between and within stand-alone models, and more consistent accuracy from an ensemble model that combined forecasts from all eligible models. This demonstrates that an ensemble model provided a reliable and comparatively accurate means of forecasting deaths during the COVID-19 pandemic that exceeded the performance of all of the models that contributed to it. This work strengthens the evidence base for synthesizing multiple models to support public health action.
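The abstract compares "probabilistic error" across horizons and against a naïve baseline without defining the score. Below is a sketch that assumes the weighted interval score (WIS), a standard score for quantile forecasts. It scores one forecast, then forms a simple ratio of mean scores against a baseline. The paper's exact relative-skill procedure may differ, and the function names are hypothetical.

```python
def interval_score(y, lower, upper, alpha):
    """Interval score for a central (1 - alpha) prediction interval [lower, upper]."""
    penalty_below = (2.0 / alpha) * (lower - y) if y < lower else 0.0
    penalty_above = (2.0 / alpha) * (y - upper) if y > upper else 0.0
    return (upper - lower) + penalty_below + penalty_above


def weighted_interval_score(y, median, intervals):
    """WIS = (0.5*|y - median| + sum_k (alpha_k/2) * IS_alpha_k(y)) / (K + 0.5).

    intervals: list of (alpha, lower, upper) for K central intervals.
    """
    total = 0.5 * abs(y - median)
    for alpha, lower, upper in intervals:
        total += (alpha / 2.0) * interval_score(y, lower, upper, alpha)
    return total / (len(intervals) + 0.5)


def relative_score(model_scores, baseline_scores):
    """Mean model WIS over mean baseline WIS on the same targets; < 1 beats baseline."""
    if not model_scores or len(model_scores) != len(baseline_scores):
        raise ValueError("score lists must be non-empty and aligned")
    # Equal lengths, so the ratio of sums equals the ratio of means.
    return sum(model_scores) / sum(baseline_scores)


# Observed value 10; median 8; 50% interval [6, 9]; 80% interval [4, 12].
print(weighted_interval_score(10.0, 8.0, [(0.5, 6.0, 9.0), (0.2, 4.0, 12.0)]))  # ≈ 1.42
```

Lower WIS is better, so a ratio below 1 means the model outscored the baseline on those targets. Computing the score separately at each horizon is one way to see error grow with lead time.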
Epidemic tracking and forecasting: Lessons learned from a tumultuous year
Proceedings of the National Academy of Sciences · 2021 · 36 citations
Senior author · Corresponding author
- Political Science
- Computer Science
- Artificial Intelligence
Epidemic forecasting has garnered increasing interest in the last decade, nurtured and scaffolded by various forecasting challenges organized by groups within the US federal government, including the Centers for Disease Control and Prevention (CDC) (1–3), Office of Science and Technology Policy (OSTP) (4), and Defense Advanced Research Projects Agency (DARPA) (5), and elsewhere (6, 7). In 2017, after several years of experimentation with flu forecasting in academic groups, the CDC decided to incorporate influenza forecasting into its normal operations, including weekly public communications (8) and briefing to higher-ups. To provide more reliable infrastructure and support for its forecasting needs, the CDC in 2019 designated two national Centers of Excellence for Influenza Forecasting, one at the University of Massachusetts at Amherst (https://reichlab.io/people) and one at Carnegie Mellon University (https://delphi.cmu.edu/about/center-of-excellence/). Not unrelatedly, the last decade has also seen a rise in the importance of digital surveillance streams in public health, with improving epidemic tracking and forecasting models being a key application of these data. Digital streams, such as search and social media trends, have constituted a large part of the focus (9–14); however, even more broadly, data from auxiliary streams that operate outside of traditional public health reporting, such as online surveys, medical devices, or electronic medical records (EMRs), have received considerable attention as well (15–25). The Carnegie Mellon Delphi group, which the two of us colead, has worked in both of these emerging disciplines (epidemic forecasting and building relevant auxiliary signals to aid such forecasting models) since 2012. In 2020, as the pandemic broke out, we struggled like many other groups to find ways to contribute to the national efforts to respond to the pandemic. We ended up shifting our …
¹ To whom correspondence may be addressed. Email: ryantibs{at}cmu.edu.
An open repository of real-time COVID-19 indicators
Proceedings of the National Academy of Sciences · 2021 · 71 citations
Senior author · Corresponding author
- Computer Science
- Internet privacy
- Data science
The COVID-19 pandemic presented enormous data challenges in the United States. Policy makers, epidemiological modelers, and health researchers all require up-to-date data on the pandemic and relevant public behavior, ideally at fine spatial and temporal resolution. The COVIDcast API is our attempt to fill this need: Operational since April 2020, it provides open access to both traditional public health surveillance signals (cases, deaths, and hospitalizations) and many auxiliary indicators of COVID-19 activity, such as signals extracted from deidentified medical claims data, massive online surveys, cell phone mobility data, and internet search trends. These are available at a fine geographic resolution (mostly at the county level) and are updated daily. The COVIDcast API also tracks all revisions to historical data, allowing modelers to account for the frequent revisions and backfill that are common for many public health data sources. All of the data are available in a common format through the API and accompanying R and Python software packages. This paper describes the data sources and signals, and provides examples demonstrating that the auxiliary signals in the COVIDcast API present information relevant to tracking COVID activity, augmenting traditional public health reporting and empowering research and decision-making.
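Tracking revisions lets a query ask what a signal looked like on an earlier date, before backfill arrived. Below is a minimal in-memory sketch of such an "as of" lookup. It assumes each record stores the day being measured, the date that version was issued, and the value. This is not the COVIDcast API or its R/Python clients, and the record layout is hypothetical.

```python
from datetime import date

# Hypothetical records: (time_value, issue, value). Later issues revise
# earlier reports of the same day as backfilled data arrive.
records = [
    (date(2020, 11, 1), date(2020, 11, 2), 100),
    (date(2020, 11, 1), date(2020, 11, 5), 130),
    (date(2020, 11, 1), date(2020, 11, 20), 142),
]


def value_as_of(records, time_value, as_of):
    """Return the latest value for time_value issued on or before as_of, else None."""
    best_issue, best_value = None, None
    for t, issue, value in records:
        if t == time_value and issue <= as_of:
            # Keep the most recent issue that was already available on as_of.
            if best_issue is None or issue > best_issue:
                best_issue, best_value = issue, value
    return best_value


print(value_as_of(records, date(2020, 11, 1), date(2020, 11, 6)))   # 130
print(value_as_of(records, date(2020, 11, 1), date(2020, 11, 30)))  # 142
```

Backtesting a forecaster against as-of values, rather than the final revised values, reproduces what the model would actually have seen in real time.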
Partnering with a global platform to inform research and public policy making
Survey Research Methods · 2020 · 54 citations
Senior author · Corresponding author
- Computer Science
- Political Science
- Business
This paper describes a partnership between Facebook and academic institutions to create a global COVID-19 symptom survey. The survey is available in 56 languages. A representative sample of Facebook users is invited on a daily basis to report on symptoms, social distancing behavior, mental health issues, and financial constraints. Facebook provides weights to reduce nonresponse and coverage bias. Privacy protection and disclosure avoidance mechanisms are implemented by both partners to meet global policy and industry requirements. Country- and region-level statistics are published daily via dashboards, and microdata are available for researchers via data use agreements. Over 1 million responses are collected weekly.
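The abstract says Facebook supplies weights to reduce nonresponse and coverage bias but does not give the procedure. As a simpler stand-in (not Facebook's method), below is a sketch of post-stratification. Each respondent in a demographic cell is weighted by that cell's population share divided by its share of the sample. The cell labels and shares are hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_cells, population_shares):
    """Weight_i = population share of i's cell / sample share of i's cell."""
    n = len(sample_cells)
    if n == 0:
        raise ValueError("empty sample")
    counts = Counter(sample_cells)
    unknown = set(counts) - set(population_shares)
    if unknown:
        raise ValueError(f"no population share for cells: {sorted(unknown)}")
    return [population_shares[c] / (counts[c] / n) for c in sample_cells]


# Hypothetical sample that over-represents younger respondents.
cells = ["18-34", "18-34", "18-34", "55+"]
shares = {"18-34": 0.5, "55+": 0.5}
print(poststratification_weights(cells, shares))
```

After weighting, each cell's weighted share matches its population share. Here 3 × 2/3 = 2 balances 1 × 2, so each cell carries half the total weight. Cells that exist in the population but never appear in the sample cannot be recovered this way.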
Recent grants
Advancing Theory and Computation in Statistical Learning Problems
NSF · $150k · 2013–2017
Flexible and Adaptive Statistical Modeling
NSF · $345k · 2007–2012
Flexible and Adaptive Statistical Modeling
NSF · $500k · 2012–2016
Flexible and Adaptive Statistical Modeling
NSF · $497k · 1999–2004
NSF · $400k · 2016–2021
Frequent coauthors
- Robert Tibshirani · 38 shared
- Richard Lockhart · 25 shared
- Alden Green · 24 shared
- Logan Brooks · Carnegie Mellon University · 21 shared
- Roni Rosenfeld · 20 shared
- Aaditya Ramdas · 20 shared
- Jacob Bien · University of Southern California · 17 shared
- Jonathan Taylor · 16 shared