Jerome H. Friedman

Professor of Statistics · Verified

Stanford University · Statistics

Active 1955–2024

h-index: 103

Citations: 231.9k

Papers: 327 (15 in the last 5 years)

Funding: $420k


About

Jerome H. Friedman is a Professor Emeritus of Statistics at Stanford University with over 20 years of service in the department. He is recognized as one of the world's leading researchers in statistics and data mining, with a primary research interest in machine learning. His publications span a wide range of data mining topics, including nearest neighbor classification, logistic regression, and high-dimensional data analysis. He has made significant contributions to data science, and his research continues to influence the development of statistical methods and machine learning techniques.

Research topics

Computer Science
Machine Learning
Data Mining
Artificial Intelligence
Mathematics
Statistics
Programming language
Econometrics
Applied mathematics

Selected publications

Function Trees: Transparent Machine Learning
arXiv (Cornell University) · 2024-03-19 · 2 citations
preprint · Open access · 1st author · Corresponding
The output of a machine learning algorithm can usually be represented by one or more multivariate functions of its input variables. Knowing the global properties of such functions can help in understanding the system that produced the data as well as interpreting and explaining corresponding model predictions. A method is presented for representing a general multivariate function as a tree of simpler functions. This tree exposes the global internal structure of the function by uncovering and describing the combined joint influences of subsets of its input variables. Given the inputs and corresponding function values, a function tree is constructed that can be used to rapidly identify and compute all of the function's main and interaction effects up to high order. Interaction effects involving up to four variables are graphically visualized.
Questionnaire data
TIB Data Manager · 2024-01-01
dataset · Open access · 1st author · Corresponding
Lockout: Sparse Regularization of Neural Networks
arXiv (Cornell University) · 2021-07-15 · 1 citation
preprint · Open access · Senior author
Many regression and classification procedures fit a parameterized function $f(x;w)$ of predictor variables $x$ to data $\{x_{i},y_{i}\}_1^N$ based on some loss criterion $L(y,f)$. Often, regularization is applied to improve accuracy by placing a constraint $P(w)\leq t$ on the values of the parameters $w$. Although efficient methods exist for finding solutions to these constrained optimization problems for all values of $t\geq0$ in the special case when $f$ is a linear function, none are available when $f$ is non-linear (e.g. Neural Networks). Here we present a fast algorithm that provides all such solutions for any differentiable function $f$ and loss $L$, and any constraint $P$ that is an increasing monotone function of the absolute value of each parameter. Applications involving sparsity inducing regularization of arbitrary Neural Networks are discussed. Empirical results indicate that these sparse solutions are usually superior to their dense counterparts in both accuracy and interpretability. This improvement in accuracy can often make Neural Networks competitive with, and sometimes superior to, state-of-the-art methods in the analysis of tabular data.
Representational Gradient Boosting: Backpropagation in the Space of Functions
IEEE Transactions on Pattern Analysis and Machine Intelligence · 2021-12-23 · 7 citations
article · Open access
The estimation of nested functions (i.e., functions of functions) is one of the central reasons for the success and popularity of machine learning. Today, artificial neural networks are the predominant class of algorithms in this area, known as representational learning. Here, we introduce Representational Gradient Boosting (RGB), a nonparametric algorithm that estimates functions with multi-layer architectures obtained using backpropagation in the space of functions. RGB does not need to assume a functional form in the nodes or output (e.g., linear models or rectified linear units), but rather estimates these transformations. RGB can be seen as an optimized stacking procedure where a meta algorithm learns how to combine different classes of functions (e.g., Neural Networks (NN) and Gradient Boosting (GB)), while building and optimizing them jointly in an attempt to compensate each other's weaknesses. This highlights a stark difference with current approaches to meta-learning that combine models only after they have been built independently. We showed that providing optimized stacking is one of the main advantages of RGB over current approaches. Additionally, due to the nested nature of RGB we also showed how it improves over GB in problems that have several high-order interactions. Finally, we investigate both theoretically and in practice the problem of recovering nested functions and the value of prior knowledge.
Principal component‐guided sparse regression
Canadian Journal of Statistics · 2021-04-16 · 5 citations
preprint · Open access
We propose a new method for supervised learning, the “principal components lasso” (“pcLasso”). It combines the lasso ($\ell_1$) penalty with a quadratic penalty that shrinks the coefficient vector toward the feature matrix's leading principal components (PCs). pcLasso can be especially powerful if the features are preassigned to groups. In that case, pcLasso shrinks each group‐wise component of the solution toward the leading PCs of that group. The pcLasso method also carries out selection of feature groups. We provide some theory and illustrate the method on some simulated and real data examples.
Lasso and Elastic-Net Regularized Generalized Linear Models [R package glmnet version 4.1-1]
2021 · 82 citations
1st author · Corresponding
- Computer Science
- Mathematics
The Elements of Statistical Learning: Data Mining, Inference, and Prediction 2nd Edition
2020 · 232 citations
Senior author · Corresponding
- Computer Science
- Artificial Intelligence
Reply to Nock and Nielsen: On the work of Nock and Nielsen and its relationship to the additive tree
Proceedings of the National Academy of Sciences · 2020-04-07
letter · Open access
The observation that decision trees are boosting algorithms, as cited in our work (1) and acknowledged by Nock and Nielsen (2), was first established by refs. 3 and 4. This was later used by refs. 5 and 6 to develop, to the best of our knowledge, the first decision tree algorithms based purely on boosting. This work, cited in our article, precedes refs. 7 and 8 cited by Nock and Nielsen (2). The original and important contributions of refs. 7 and 8, as they pertain to this discussion, were to theoretically prove convergence rates for decision tree algorithms built with boosting, along with …
Predicting Regression Probability Distributions with Imperfect Data Through Optimal Transformations
arXiv (Cornell University) · 2020-01-27 · 2 citations
preprint · Open access · 1st author · Corresponding
The goal of regression analysis is to predict the value of a numeric outcome variable y given a vector of joint values of other (predictor) variables x. Usually a particular x-vector does not specify a repeatable value for y, but rather a probability distribution of possible y-values, p(y|x). This distribution has a location, scale and shape, all of which can depend on x, and are needed to infer likely values for y given x. Regression methods usually assume that training data y-values are perfect numeric realizations from some well-behaved p(y|x). Often actual training data y-values are discrete, truncated and/or arbitrarily censored. Regression procedures based on an optimal transformation strategy are presented for estimating location, scale and shape of p(y|x) as general functions of x, in the possible presence of such imperfect training data. In addition, validation diagnostics are presented to ascertain the quality of the solutions.

Recent grants

Topics in Predictive and Descriptive Data Mining
NSF · $420k · 2002–2008

Frequent coauthors

Stanley M. Flatté
University of California, Santa Cruz
70 shared
Phillip Kott
Stanford University
64 shared
Jae Lee
64 shared
Wen‐Hua Ju
64 shared
Patrick Tendick
64 shared
Michael Friendly
64 shared
Gábor J. Székely
64 shared
Carlo di Lauro
University of Naples Federico II
64 shared

Awards & honors

Named the applied statistics thesis prize for our emeritus c…
