William S. Cleveland

Shanti S. Gupta Distinguished Professor of Statistics, Professor of Computer Science (Courtesy) · Verified

Purdue University · Statistics

Active 1904–2025

h-index: 53
Citations: 38.9k
Papers: 213 (14 in the last 5 years)
Funding: $815k

About

William S. Cleveland is the Shanti S. Gupta Distinguished Professor of Statistics at Purdue University, where he also holds a courtesy appointment as Professor of Computer Science. His research interests include data visualization, high-performance computing for deep analysis of data both big and small, machine learning, and statistics. Cleveland has made significant contributions to statistical graphics and data analysis, earning awards such as the Lifetime Achievement Award in Graphics and Computing from the American Statistical Association and the Parzen Prize, as well as recognition as a Highly Cited Researcher by the American Society for Information Science and Technology. He is a Fellow of the American Statistical Association, the Institute of Mathematical Statistics, and the American Association for the Advancement of Science, among other honors. Cleveland has supervised many Ph.D. students at Purdue University and other institutions, contributing to the advancement of statistical science and data visualization.

Research topics

  • Computer Science
  • Geography
  • Meteorology
  • Oceanography
  • Environmental science
  • Geology
  • Political Science
  • Climatology
  • Atmospheric sciences
  • Library science
  • Engineering
  • Waste management
  • Law
  • World Wide Web

Selected publications

  • Is There a Future for Stochastic Modeling in Business and Industry in the Era of Machine Learning and Artificial Intelligence?

    Applied Stochastic Models in Business and Industry · 2025-03-01 · 6 citations

    article · Open access

    The paper arises from the experience of Applied Stochastic Models in Business and Industry which has seen, over the years, more and more contributions related to Machine Learning rather than to what was intended as a stochastic model. The very notion of a stochastic model (e.g., a Gaussian process or a Dynamic Linear Model) can be subject to change: What is a Deep Neural Network if not a stochastic model? The paper presents the views, supported by examples, of distinguished researchers in the field of business and industrial statistics. They are discussing not only whether there is a future for traditional stochastic models in the era of Machine Learning and Artificial Intelligence, but also how these fields can interact and gain new life for their development.

  • Climatology and decadal changes of Arctic atmospheric rivers based on ERA5 and MERRA-2

    Environmental Research: Climate · 2023 · 8 citations

    Senior author · Corresponding
    • Environmental science
    • Climatology
    • Atmospheric sciences

    We present the Arctic atmospheric river (AR) climatology based on twelve sets of labels derived from ERA5 and MERRA-2 reanalyses for 1980–2019. The ARs were identified and tracked in the 3-hourly reanalysis data with a multifactorial approach based on either atmospheric column-integrated water vapor (IWV) or integrated water vapor transport (IVT) exceeding one of the three climate thresholds (75th, 85th, and 95th percentiles). Time series analysis of the AR event counts from the AR labels showed overall upward trends from the mid-1990s to 2019. The 75th IVT- and IWV-based labels, as well as the 85th IWV-based labels, are likely more sensitive to Arctic surface warming, therefore, detected some broadening of AR-affected areas over time, while the rest of the labels did not. Spatial exploratory analysis of these labels revealed that the AR frequency of occurrence maxima shifted poleward from over-land in 1980–1999 to over the Arctic Ocean and its outlying Seas in 2000–2019. Regions across the Atlantic, the Arctic, to the Pacific Oceans trended higher AR occurrence, surface temperature, and column-integrated moisture. Meanwhile, ARs were increasingly responsible for the rising moisture transport into the Arctic. Even though the increase of Arctic AR occurrence was primarily associated with long-term Arctic surface warming and moistening, the effects of changing atmospheric circulation could stand out locally, such as on the Pacific side over the Chukchi Sea. The changing teleconnection patterns strongly modulated AR activities in time and space, with prominent anomalies in the Arctic-Pacific sector during the latest decade. Besides, the extreme events identified by the 95th-percentile labels displayed the most significant changes and were most influenced by the teleconnection patterns. The twelve Arctic AR labels and the detailed graphics in the atlas can help navigate the uncertainty of detecting and quantifying Arctic ARs and their associated effects in current and future studies.

  • Climatology and Decadal Changes of Arctic Atmospheric Rivers Based on ERA5 and MERRA-2

    2023-05-25 · 1 citation

    preprint · Open access · Senior author

    We present the Arctic atmospheric river (AR) climatology based on twelve sets of labels derived from ERA5 and MERRA-2 reanalyses for 1980–2019. The ARs were identified and tracked in the 3-hourly reanalysis data with a multifactorial approach based on either atmospheric column-integrated water vapor ($IWV$) or integrated water vapor transport ($IVT$) exceeding one of the three climate thresholds (75th, 85th, and 95th percentiles). Time series analysis of the AR event counts from the AR labels showed overall upward trends from the mid-1990s to 2019. The 75th $IVT$- and $IWV$-based labels, as well as the 85th $IWV$-based labels, are likely more sensitive to Arctic surface warming, therefore, detected some broadening of AR-affected areas over time, while the rest of the labels did not. Spatial exploratory analysis of these labels revealed that the AR frequency of occurrence maxima shifted poleward from over-land in 1980–1999 to over the Arctic Ocean and its outlying Seas in 2000–2019. Regions across the Atlantic, the Arctic, to the Pacific Oceans trended higher AR occurrence, surface temperature, and column-integrated moisture. Meanwhile, ARs were increasingly responsible for the rising moisture transport into the Arctic. Even though the increase of Arctic AR occurrence was primarily associated with long-term Arctic surface warming and moistening, the effects of changing atmospheric circulation could stand out locally, such as on the Pacific side over the Chukchi Sea. The changing teleconnection patterns strongly modulated AR activities in time and space, with prominent anomalies in the Arctic-Pacific sector during the latest decade. Besides, the extreme events identified by the 95th-percentile labels displayed the most significant changes and were most influenced by the teleconnection patterns. The twelve Arctic AR labels and the detailed graphics in the atlas can help navigate the uncertainty of detecting and quantifying Arctic ARs and their associated effects in current and future studies.

  • Atlas of Arctic Atmospheric River Climatology Based on ERA5 and MERRA-2

    2023-03-09 · 1 citation

    preprint · Open access · Senior author

    We present an atlas of Arctic atmospheric river (AR) climatology based on twelve indices derived from ERA5 and MERRA-2 reanalyses for 1980–2019. The ARs were identified and tracked in the 3-hourly reanalysis data with a multifactorial approach based on either atmospheric column-integrated water vapor ($IWV$) or integrated water vapor transport ($IVT$) exceeding one of the three climate thresholds (75th, 85th, and 95th percentiles). Time series analysis of the AR event counts from the AR indices showed overall upward trends from the mid-1990s to 2019. The 75th $IVT$- and $IWV$-based indices, as well as the 85th $IWV$-based indices, are likely more sensitive to Arctic surface warming, therefore, detected some broadening of AR-affected areas over time, while the rest of the indices did not. Spatial exploratory analysis of these indices revealed that the AR frequency of occurrence maxima shifted poleward from over-land in 1980–1999 to over the Arctic Ocean and its outlying Seas in 2000–2019. Regions across the Atlantic, the Arctic, to the Pacific Oceans trended higher AR occurrence, surface temperature, and column-integrated moisture. Meanwhile, ARs were increasingly responsible for the rising moisture transport into the Arctic. Even though the increase of Arctic AR occurrence was primarily associated with long-term Arctic surface warming and moistening, the effects of changing atmospheric circulation could stand out locally, such as on the Pacific side over the Chukchi Sea. The changing teleconnection patterns strongly modulated AR activities in time and space, with prominent anomalies in the Arctic-Pacific sector during the latest decade. Besides, the extreme events identified by the 95th-percentile indices displayed the most significant changes and were most influenced by the teleconnection patterns. 
The twelve Arctic AR indices and the detailed graphics in the atlas can help navigate the uncertainty of detecting and quantifying Arctic ARs and their associated effects in current and future studies.

  • Atlas of Arctic Atmospheric River Climatology Based on ERA5 and MERRA-2

    2022 · 7 citations

    Senior author · Corresponding
    • Computer Science
    • Meteorology
    • Climatology

    Earth and Space Science Open Archive (ESSOAr) preprint, version v1 · Open access. This preprint has been submitted to and is under consideration at Geophysical Research Letters. Authors: Chen Zhang (ORCID 0000-0002-7018-1164), Wen-Wen Tung (corresponding and submitting author, Purdue University, ORCID 0000-0001-8627-1503), William S. Cleveland (Purdue University).

  • Scalable $k$-d trees for distributed data

    arXiv (Cornell University) · 2022-01-20

    preprint · Open access

    Data structures known as $k$-d trees have numerous applications in scientific computing, particularly in areas of modern statistics and data science such as range search in decision trees, clustering, nearest neighbors search, local regression, and so forth. In this article we present a scalable mechanism to construct $k$-d trees for distributed data, based on approximating medians for each recursive subdivision of the data. We provide theoretical guarantees of the quality of approximation using this approach, along with a simulation study quantifying the accuracy and scalability of our proposed approach in practice.

  • In Search of The Optimal Atmospheric River Index for US Precipitation: A Multifactorial Analysis

    2021-03-22 · 2 citations

    preprint · Open access · Senior author

    Atmospheric rivers (ARs) affect surface hydrometeorology in the US West Coast and Midwest. We systematically sought optimal AR indices for expressing surface precipitation impacts within the Atmospheric River Tracking Method Intercomparison Project (ARTMIP) framework. We adopted a multifactorial approach. Four factors—moisture fields, climatological thresholds, shape criteria, and temporal thresholds—collectively generated 81 West Coast AR indices and 81 Midwest indices from January 1980 to June 2017. Two moisture fields were extracted from the MERRA-2 data for ARTMIP: integrated water vapor transport (IVT) and integrated water vapor (IWV). Metrics for precipitation effects included two-way summary statistics relating the concurrence of AR and that of precipitation, per-event averaged precipitation rate, and per-event precipitation accumulation. We found that an optimal AR index for precipitation depends on the types of impact to be addressed, associated physical mechanisms in the affected regions, timing, and duration. In West Coast and Midwest, IWV-based AR indices identified the most abundant AR event time steps, most accurately associated AR to days with precipitation, and represented the presence of precipitation the best. With a lower climatological threshold, they detected the most accumulated precipitation with the longest event duration. Longer duration thresholds also led to higher accumulated precipitation, holding other factors constant. IWV-based indices are the overall choice for Midwest ARs under varying seasonal precipitation drivers. IVT-based indices suitably capture the accumulation of intense orographic precipitation on the West Coast. Indices combining IVT and IWV identify the fewest, shortest, but most intense AR precipitation episodes.

  • Distributed estimation through parallel approximants

    arXiv (Cornell University) · 2021-12-31

    preprint · Open access

    Designing scalable estimation algorithms is a core challenge in modern statistics. Here we introduce a framework to address this challenge based on parallel approximants, which yields estimators with provable properties that operate on the entirety of very large, distributed data sets. We first formalize the class of statistics which admit straightforward calculation in distributed environments through independent parallelization. We then show how to use such statistics to approximate arbitrary functional operators in appropriate spaces, yielding a general estimation framework that does not require data to reside entirely in memory. We characterize the $L^2$ approximation properties of our approach and provide fully implemented examples of sample quantile calculation and local polynomial regression in a distributed computing environment. A variety of avenues and extensions remain open for future work.

  • On the Analytic Power of Divide & Recombine (D&R)

    Proceedings of the International Conference on Statistics, Theory and Applications (ICSTA ...) · 2021-08-01

    article · Open access · 1st author · Corresponding

    In D&R (aka Split & Conquer), the data are divided into subsets. The division serves as a base for analysis of big data and for data visualization. Different analytic processes are applied to the subsets that constitute a recombination of the information in the data. For big data there are three scenarios. (1) The division is based on the subject matter, e.g., financial data for 100 banks; the division is by bank, and the 100 outputs of analytic methods are further analyzed. (2) An analytic method is applied to each subset, and the outputs are recombined with a recombination method applied to get one result for all of the data. This can provide, for all of the data, estimates of parameters or more complex information such as a likelihood function. D&R research consists of finding division and recombination methods that maximize statistical accuracy. Parallel distributed environments like Hadoop and Spark provide high computational performance for (1) and ( For visualization, subsets are created by conditioning on one or more variables of the analysis to create subsets of the other variables in the analysis. The subsets are displayed using the Trellis Display framework of multi-panel display. This provides a very powerful mechanism for exploratory study of multi-dimensional datasets, modeling the data, and understanding the results of analysis.

  • In Search of The Optimal Atmospheric River Index for US Precipitation: A Multifactorial Analysis

    2021-02-09 · 1 citation

    preprint · Open access · Senior author

    Atmospheric rivers (ARs) affect surface hydrometeorology in the US West Coast and Midwest. We systematically sought optimal AR indices for expressing surface precipitation impacts within the Atmospheric River Tracking Method Intercomparison Project (ARTMIP) framework. We adopted a multifactorial approach. Four factors—moisture fields, climatological thresholds, shape criteria, and temporal thresholds—collectively generated 81 West Coast AR indices and 81 Midwest indices from January 1980 to June 2017. Two moisture fields were extracted from the MERRA-2 data for ARTMIP: integrated water vapor transport (IVT) and integrated water vapor (IWV). CPC US Unified Precipitation data were used. Metrics for precipitation effects included two-way summary statistics relating the concurrence of AR and that of precipitation, per-event averaged precipitation rate, and per-event precipitation accumulation. We found that an optimal AR index for precipitation depends on the types of impact to be addressed, associated physical mechanisms in the affected regions, timing, and duration. In West Coast and Midwest, IWV-based AR indices identified the most abundant AR event time steps, most accurately associated AR to days with precipitation, and represented the presence of precipitation the best. With a lower climatological threshold, they detected the most accumulated precipitation with the longest event duration. Longer duration thresholds also led to higher accumulated precipitation, holding other factors constant. IWV-based indices are the overall choice for Midwest ARs under varying seasonal precipitation drivers. IVT-based indices suitably capture the accumulation of intense orographic precipitation on the West Coast. Indices combining IVT and IWV identify the fewest, shortest, but most intense AR precipitation episodes.

Education

  • B.A., Mathematics

    Princeton

  • Ph.D., Statistics

    Yale University

Awards & honors

  • Lifetime Achievement Award in Graphics and Computing from th…
  • Parzen Prize (2016)
  • Highly Cited Researcher by the American Society for Informat…
  • Fellow of the Institute of Mathematical Statistics (1999)
  • Statistician of the Year by the Chicago Chapter of the Ameri…