Resume-aware faculty matching

Find professors who actually fit you

Upload your resume. Four AI agents analyze your background, rank the faculty who fit, inspect their recent research, and help you draft outreach — grounded in their actual work, not templates.

Peter Bickel

Principal Investigators, Professors Emeriti · Verified

University of California, Berkeley · Center for Computational Biology

Active 1964–2025

h-index: 101
Citations: 56.9k
Papers: 471 (36 in the last 5 years)
Funding: $24.9M

About

Peter Bickel is a Professor of Statistics at the University of California, Berkeley. He has served as President of the Bernoulli Society and of the Institute of Mathematical Statistics. His honors include a MacArthur Fellowship and the COPSS prize, and he is a member of both the American Academy of Arts and Sciences and the National Academy of Sciences. He received honorary doctorates from the Hebrew University of Jerusalem in 1986 and from ETH Zurich in 2014.

Research topics

  • Computer Science
  • Data Mining
  • Mathematics
  • Artificial Intelligence
  • Machine Learning
  • Algorithm
  • Pure mathematics
  • Theoretical computer science
  • Applied mathematics
  • Mathematical analysis
  • Computational biology
  • Statistics
  • Genetics
  • Geometry
  • Biology

Selected publications

  • Assessing the role of volumetric brain information in multiple sclerosis progression

    Computational and Structural Biotechnology Journal · 2025-01-01

    article · Open access

    Multiple sclerosis is a chronic autoimmune disease that affects the central nervous system. Understanding multiple sclerosis progression and identifying the implicated brain structures is crucial for personalized treatment decisions. Deformation-based morphometry utilizes anatomical magnetic resonance imaging to quantitatively assess volumetric brain changes at the voxel level, providing insight into how each brain region contributes to clinical progression with regards to neurodegeneration. Utilizing such voxel-level data from a relapsing multiple sclerosis clinical trial, we extend a model-agnostic feature importance metric to identify a robust and predictive feature set that corresponds to clinical progression. These features correspond to brain regions that are clinically meaningful in MS disease research, demonstrating their scientific relevance. When used to predict progression using classical survival models and 3D convolutional neural networks, the identified regions led to the best-performing models, demonstrating their prognostic strength. We also find that these features generalize well to other definitions of clinical progression and can compensate for the omission of highly prognostic clinical features, underscoring the predictive power and clinical relevance of deformation-based morphometry as a regional identification tool.

  • transfactor: Transcription factor activity estimation via probabilistic gene expression deconvolution

    bioRxiv (Cold Spring Harbor Laboratory) · 2025-03-19

    preprint · Open access

    Gene expression is the primary modality being studied to differentiate between biological cells. Contemporary single-cell studies simultaneously measure genome-wide transcription levels for thousands of individual cells in a single experiment. While the characterization of cell population differences has often occurred through differential expression analysis, tiny effect sizes become statistically significant when thousands of cells are available for each population, compromising biological interpretation. Moreover, these large studies have spurred the development of methods to infer gene regulatory networks (GRNs) directly from the data, and GRN databases are becoming more comprehensive. In this work, we propose a statistical model for gene expression measures and an inference method that leverage GRNs to deconvolve transcription factor (TF) activity from gene expression, by probabilistically assigning mRNA molecules to TFs. This shifts the paradigm from investigating gene expression differences to regulatory differences at the level of TF activity, aiding interpretation and allowing prioritization of a limited number of TFs responsible for significant contributions to the observed gene expression differences. The inferred TF activities result in intuitive prioritization of TFs in terms of the (difference in) estimated number of molecules they produce, in contrast to other widely-used methods relying on arbitrary enrichment scores. Our model allows the incorporation of prior information on the regulatory potential between each TF and target gene through prior distributions, and is able to deal with both repressing and activating interactions. We compare our approach to other TF activity estimation methods using two simulation experiments and two case studies.

  • Nonparametric Evaluation of Noisy ICA Solutions

    2024-01-01

    article

    Independent Component Analysis (ICA) was introduced in the 1980's as a model for Blind Source Separation (BSS), which refers to the process of recovering the sources underlying a mixture of signals, with little knowledge about the source signals or the mixing process. While there are many sophisticated algorithms for estimation, different methods have different shortcomings. In this paper, we develop a nonparametric score to adaptively pick the right algorithm for ICA with arbitrary Gaussian noise. The novelty of this score stems from the fact that it just assumes a finite second moment of the data and uses the characteristic function to evaluate the quality of the estimated mixing matrix without any knowledge of the parameters of the noise distribution. In addition, we propose some new contrast functions and algorithms that enjoy the same fast computability as existing algorithms like FASTICA and JADE but work in domains where the former may fail. While these also may have weaknesses, our proposed diagnostic, as shown by our simulations, can remedy them. Finally, we propose a theoretical framework to analyze the local and global convergence properties of our algorithms.

  • Dissecting Gene Expression Heterogeneity: Generalized Pearson Correlation Squares and the K-Lines Clustering Algorithm

    Journal of the American Statistical Association · 2024-04-15 · 2 citations

    article · Open access

    … K-lines clustering algorithm in dissecting complex but interpretable relationships. The estimation and inference procedures are implemented in the R package gR2 (https://github.com/lijy03/gR2).

  • Assessing the Role of Volumetric Brain Information in Multiple Sclerosis Progression

    arXiv (Cornell University) · 2024-12-12

    preprint · Open access

    Multiple sclerosis is a chronic autoimmune disease that affects the central nervous system. Understanding multiple sclerosis progression and identifying the implicated brain structures is crucial for personalized treatment decisions. Deformation-based morphometry utilizes anatomical magnetic resonance imaging to quantitatively assess volumetric brain changes at the voxel level, providing insight into how each brain region contributes to clinical progression with regards to neurodegeneration. Utilizing such voxel-level data from a relapsing multiple sclerosis clinical trial, we extend a model-agnostic feature importance metric to identify a robust and predictive feature set that corresponds to clinical progression. These features correspond to brain regions that are clinically meaningful in MS disease research, demonstrating their scientific relevance. When used to predict progression using classical survival models and 3D convolutional neural networks, the identified regions led to the best-performing models, demonstrating their prognostic strength. We also find that these features generalize well to other definitions of clinical progression and can compensate for the omission of highly prognostic clinical features, underscoring the predictive power and clinical relevance of deformation-based morphometry as a regional identification tool.

  • Nonparametric Evaluation of Noisy ICA Solutions

    arXiv (Cornell University) · 2024-01-16

    preprint · Open access

    Independent Component Analysis (ICA) was introduced in the 1980's as a model for Blind Source Separation (BSS), which refers to the process of recovering the sources underlying a mixture of signals, with little knowledge about the source signals or the mixing process. While there are many sophisticated algorithms for estimation, different methods have different shortcomings. In this paper, we develop a nonparametric score to adaptively pick the right algorithm for ICA with arbitrary Gaussian noise. The novelty of this score stems from the fact that it just assumes a finite second moment of the data and uses the characteristic function to evaluate the quality of the estimated mixing matrix without any knowledge of the parameters of the noise distribution. In addition, we propose some new contrast functions and algorithms that enjoy the same fast computability as existing algorithms like FASTICA and JADE but work in domains where the former may fail. While these also may have weaknesses, our proposed diagnostic, as shown by our simulations, can remedy them. Finally, we propose a theoretical framework to analyze the local and global convergence properties of our algorithms.

  • Semi-parametric estimation of treatment effects in randomised experiments

    Journal of the Royal Statistical Society Series B (Statistical Methodology) · 2023-07-19 · 8 citations

    article · Open access

    We develop new semi-parametric methods for estimating treatment effects. We focus on settings where the outcome distributions may be thick tailed, where treatment effects may be small, where sample sizes are large, and where assignment is completely random. This setting is of particular interest in recent online experimentation. We propose using parametric models for the treatment effects, leading to semi-parametric models for the outcome distributions. We derive the semi-parametric efficiency bound for the treatment effects for this setting, and propose efficient estimators. In the leading case with constant quantile treatment effects, one of the proposed efficient estimators has an interesting interpretation as a weighted average of quantile treatment effects, with the weights proportional to minus the second derivative of the log of the density of the potential outcomes. Our analysis also suggests an extension of Huber’s model and trimmed mean to include asymmetry.

  • Network Inference Using the Hub Model and Variants

    Journal of the American Statistical Association · 2023-02-22

    article

    Statistical network analysis primarily focuses on inferring the parameters of an observed network. In many applications, especially in the social sciences, the observed data is the groups formed by individual subjects. In these applications, the network is itself a parameter of a statistical model. Zhao and Weko propose a model-based approach, called the hub model, to infer implicit networks from grouping behavior. The hub model assumes that each member of the group is brought together by a member of the group called the hub. The set of members which can serve as a hub is called the hub set. The hub model belongs to the family of Bernoulli mixture models. Identifiability of Bernoulli mixture model parameters is a notoriously difficult problem. This article proves identifiability of the hub model parameters and estimation consistency under mild conditions. Furthermore, this article generalizes the hub model by introducing a model component that allows hubless groups in which individual nodes spontaneously appear independent of any other individual. We refer to this additional component as the null component. The new model bridges the gap between the hub model and the degenerate case of the mixture model—the Bernoulli product. Identifiability and consistency are also proved for the new model. In addition, a penalized likelihood approach is proposed to estimate the hub set when it is unknown. Supplementary materials for this article are available online.

  • Interpretable sensitivity analysis for balancing weights

    Journal of the Royal Statistical Society Series A (Statistics in Society) · 2023-03-22 · 18 citations

    article · Open access

    Assessing sensitivity to unmeasured confounding is an important step in observational studies, which typically estimate effects under the assumption that all confounders are measured. In this paper, we develop a sensitivity analysis framework for balancing weights estimators, an increasingly popular approach that solves an optimization problem to obtain weights that directly minimize covariate imbalance. In particular, we adapt a sensitivity analysis framework using the percentile bootstrap for a broad class of balancing weights estimators. We prove that the percentile bootstrap procedure can, with only minor modifications, yield valid confidence intervals for causal effects under restrictions on the level of unmeasured confounding. We also propose an amplification (a mapping from a one-dimensional sensitivity analysis to a higher dimensional sensitivity analysis) to allow for interpretable sensitivity parameters in the balancing weights framework. We illustrate our method through extensive real data examples.

  • Network Inference Using the Hub Model and Variants

    Figshare · 2023-01-01

    dataset · Open access

    Statistical network analysis primarily focuses on inferring the parameters of an observed network. In many applications, especially in the social sciences, the observed data is the groups formed by individual subjects. In these applications, the network is itself a parameter of a statistical model. Zhao and Weko (2019) propose a model-based approach, called the hub model, to infer implicit networks from grouping behavior. The hub model assumes that each member of the group is brought together by a member of the group called the hub. The set of members which can serve as a hub is called the hub set. The hub model belongs to the family of Bernoulli mixture models. Identifiability of Bernoulli mixture model parameters is a notoriously difficult problem. This paper proves identifiability of the hub model parameters and estimation consistency under mild conditions. Furthermore, this paper generalizes the hub model by introducing a model component that allows hubless groups in which individual nodes spontaneously appear independent of any other individual. We refer to this additional component as the null component. The new model bridges the gap between the hub model and the degenerate case of the mixture model – the Bernoulli product. Identifiability and consistency are also proved for the new model. In addition, a penalized likelihood approach is proposed to estimate the hub set when it is unknown.

Frequent coauthors

  • Rainer Friedrich

    University of Stuttgart

    53 shared
  • Aiyou Chen

    53 shared
  • Jerome Sacks

    51 shared
  • Richard A. Berk

    University of Pennsylvania

    50 shared
  • Frederic Paik Schoenberg

    University of California, Los Angeles

    50 shared
  • Elizabeth Kelly

    50 shared
  • Sallie Keller‐McNulty

    United States Census Bureau

    50 shared
  • Katherine Campbell

    University of Miami

    49 shared