Jerome H. Friedman

Professor of Statistics · Verified

Stanford University · Statistics

Active 1955–2024

h-index: 103
Citations: 231.9k
Papers: 327 · 15 last 5y
Funding: $420k
About

Jerome H. Friedman is a Professor Emeritus of Statistics at Stanford University with over 20 years of service in the department. He is recognized as one of the world's leading researchers in statistics and data mining, with a primary research interest in machine learning. His publications span a wide range of data mining topics, including nearest neighbor classification, logistic regression, and high-dimensional data analysis. Dr. Friedman has made significant contributions to data science, and his research continues to influence the development of statistical methods and machine learning techniques.

Research topics

  • Computer Science
  • Machine Learning
  • Data Mining
  • Artificial Intelligence
  • Mathematics
  • Statistics
  • Programming language
  • Econometrics
  • Applied mathematics

Selected publications

  • Function Trees: Transparent Machine Learning

    arXiv (Cornell University) · 2024-03-19 · 2 citations

    preprint · Open access · 1st author · Corresponding

    The output of a machine learning algorithm can usually be represented by one or more multivariate functions of its input variables. Knowing the global properties of such functions can help in understanding the system that produced the data as well as interpreting and explaining corresponding model predictions. A method is presented for representing a general multivariate function as a tree of simpler functions. This tree exposes the global internal structure of the function by uncovering and describing the combined joint influences of subsets of its input variables. Given the inputs and corresponding function values, a function tree is constructed that can be used to rapidly identify and compute all of the function's main and interaction effects up to high order. Interaction effects involving up to four variables are graphically visualized.

  • Questionnaire data

    TIB Data Manager · 2024-01-01

    dataset · Open access · 1st author · Corresponding

  • Lockout: Sparse Regularization of Neural Networks

    arXiv (Cornell University) · 2021-07-15 · 1 citation

    preprint · Open access · Senior author

    Many regression and classification procedures fit a parameterized function $f(x;w)$ of predictor variables $x$ to data $\{x_{i},y_{i}\}_1^N$ based on some loss criterion $L(y,f)$. Often, regularization is applied to improve accuracy by placing a constraint $P(w)\leq t$ on the values of the parameters $w$. Although efficient methods exist for finding solutions to these constrained optimization problems for all values of $t\geq0$ in the special case when $f$ is a linear function, none are available when $f$ is non-linear (e.g. Neural Networks). Here we present a fast algorithm that provides all such solutions for any differentiable function $f$ and loss $L$, and any constraint $P$ that is an increasing monotone function of the absolute value of each parameter. Applications involving sparsity inducing regularization of arbitrary Neural Networks are discussed. Empirical results indicate that these sparse solutions are usually superior to their dense counterparts in both accuracy and interpretability. This improvement in accuracy can often make Neural Networks competitive with, and sometimes superior to, state-of-the-art methods in the analysis of tabular data.

  • Representational Gradient Boosting: Backpropagation in the Space of Functions

    IEEE Transactions on Pattern Analysis and Machine Intelligence · 2021-12-23 · 7 citations

    article · Open access

    The estimation of nested functions (i.e., functions of functions) is one of the central reasons for the success and popularity of machine learning. Today, artificial neural networks are the predominant class of algorithms in this area, known as representational learning. Here, we introduce Representational Gradient Boosting (RGB), a nonparametric algorithm that estimates functions with multi-layer architectures obtained using backpropagation in the space of functions. RGB does not need to assume a functional form in the nodes or output (e.g., linear models or rectified linear units), but rather estimates these transformations. RGB can be seen as an optimized stacking procedure where a meta algorithm learns how to combine different classes of functions (e.g., Neural Networks (NN) and Gradient Boosting (GB)), while building and optimizing them jointly in an attempt to compensate each other's weaknesses. This highlights a stark difference with current approaches to meta-learning that combine models only after they have been built independently. We showed that providing optimized stacking is one of the main advantages of RGB over current approaches. Additionally, due to the nested nature of RGB we also showed how it improves over GB in problems that have several high-order interactions. Finally, we investigate both theoretically and in practice the problem of recovering nested functions and the value of prior knowledge.

  • Principal component‐guided sparse regression

    Canadian Journal of Statistics · 2021-04-16 · 5 citations

    preprint · Open access

    We propose a new method for supervised learning, the “principal components lasso” (“pcLasso”). It combines the lasso ($\ell_1$) penalty with a quadratic penalty that shrinks the coefficient vector toward the feature matrix's leading principal components (PCs). pcLasso can be especially powerful if the features are preassigned to groups. In that case, pcLasso shrinks each group‐wise component of the solution toward the leading PCs of that group. The pcLasso method also carries out selection of feature groups. We provide some theory and illustrate the method on some simulated and real data examples.

  • Lasso and Elastic-Net Regularized Generalized Linear Models [R package glmnet version 4.1-1]

    2021 · 82 citations

    1st author · Corresponding
    • Computer Science
    • Mathematics
  • The Elements of Statistical Learning: Data Mining, Inference, and Prediction 2nd Edition

    2020 · 232 citations

    Senior author · Corresponding
    • Computer Science
    • Artificial Intelligence

  • Reply to Nock and Nielsen: On the work of Nock and Nielsen and its relationship to the additive tree

    Proceedings of the National Academy of Sciences · 2020-04-07

    letter · Open access

    The observation that decision trees are boosting algorithms, as cited in our work (1) and acknowledged by Nock and Nielsen (2), was first established by refs. 3 and 4. This was later used by refs. 5 and 6 to develop, to the best of our knowledge, the first decision tree algorithms based purely on boosting. This work, cited in our article, precedes refs. 7 and 8 cited by Nock and Nielsen (2). The original and important contributions of refs. 7 and 8 as they pertain to this discussion were to theoretically prove convergence rates for decision tree algorithms built with boosting, along with …

  • Predicting Regression Probability Distributions with Imperfect Data Through Optimal Transformations

    arXiv (Cornell University) · 2020-01-27 · 2 citations

    preprint · Open access · 1st author · Corresponding

    The goal of regression analysis is to predict the value of a numeric outcome variable y given a vector of joint values of other (predictor) variables x. Usually a particular x-vector does not specify a repeatable value for y, but rather a probability distribution of possible y-values, p(y|x). This distribution has a location, scale and shape, all of which can depend on x, and are needed to infer likely values for y given x. Regression methods usually assume that training data y-values are perfect numeric realizations from some well-behaved p(y|x). Often actual training data y-values are discrete, truncated and/or arbitrarily censored. Regression procedures based on an optimal transformation strategy are presented for estimating location, scale and shape of p(y|x) as general functions of x, in the possible presence of such imperfect training data. In addition, validation diagnostics are presented to ascertain the quality of the solutions.


Frequent coauthors

  • Stanley M. Flatté

    University of California, Santa Cruz

    70 shared
  • Phillip Kott

    Stanford University

    64 shared
  • Jae Lee

    64 shared
  • Wen‐Hua Ju

    64 shared
  • Patrick Tendick

    64 shared
  • Michael Friendly

    64 shared
  • Gábor J. Székely

    64 shared
  • Carlo di Lauro

    University of Naples Federico II

    64 shared

Awards & honors

  • Named the applied statistics thesis prize for our emeritus c…