
Jerome H. Friedman
Professor of Statistics · Stanford University · Statistics
Active 1955–2024
About
Jerome H. Friedman is a Professor Emeritus of Statistics at Stanford University with over 20 years of service in the department. He is recognized as one of the world's leading researchers in statistics and data mining, with a primary research interest in machine learning. His publications cover a wide range of data mining topics, including nearest neighbor classification, logistic regression, and high-dimensional data analysis. Dr. Friedman has made significant contributions to the field of data science, and his research continues to influence the development of statistical methods and machine learning techniques.
Research topics
- Computer Science
- Machine Learning
- Data Mining
- Artificial Intelligence
- Mathematics
- Statistics
- Programming language
- Econometrics
- Applied mathematics
Selected publications
Function Trees: Transparent Machine Learning
arXiv (Cornell University) · 2024-03-19 · 2 citations
preprint · Open access · 1st author · Corresponding
The output of a machine learning algorithm can usually be represented by one or more multivariate functions of its input variables. Knowing the global properties of such functions can help in understanding the system that produced the data as well as interpreting and explaining corresponding model predictions. A method is presented for representing a general multivariate function as a tree of simpler functions. This tree exposes the global internal structure of the function by uncovering and describing the combined joint influences of subsets of its input variables. Given the inputs and corresponding function values, a function tree is constructed that can be used to rapidly identify and compute all of the function's main and interaction effects up to high order. Interaction effects involving up to four variables are graphically visualized.
TIB Data Manager · 2024-01-01
dataset · Open access · 1st author · Corresponding
Lockout: Sparse Regularization of Neural Networks
arXiv (Cornell University) · 2021-07-15 · 1 citations
preprint · Open access · Senior author
Many regression and classification procedures fit a parameterized function $f(x;w)$ of predictor variables $x$ to data $\{x_{i},y_{i}\}_1^N$ based on some loss criterion $L(y,f)$. Often, regularization is applied to improve accuracy by placing a constraint $P(w)\leq t$ on the values of the parameters $w$. Although efficient methods exist for finding solutions to these constrained optimization problems for all values of $t\geq0$ in the special case when $f$ is a linear function, none are available when $f$ is non-linear (e.g. Neural Networks). Here we present a fast algorithm that provides all such solutions for any differentiable function $f$ and loss $L$, and any constraint $P$ that is an increasing monotone function of the absolute value of each parameter. Applications involving sparsity inducing regularization of arbitrary Neural Networks are discussed. Empirical results indicate that these sparse solutions are usually superior to their dense counterparts in both accuracy and interpretability. This improvement in accuracy can often make Neural Networks competitive with, and sometimes superior to, state-of-the-art methods in the analysis of tabular data.
Representational Gradient Boosting: Backpropagation in the Space of Functions
IEEE Transactions on Pattern Analysis and Machine Intelligence · 2021-12-23 · 7 citations
article · Open access
The estimation of nested functions (i.e., functions of functions) is one of the central reasons for the success and popularity of machine learning. Today, artificial neural networks are the predominant class of algorithms in this area, known as representational learning. Here, we introduce Representational Gradient Boosting (RGB), a nonparametric algorithm that estimates functions with multi-layer architectures obtained using backpropagation in the space of functions. RGB does not need to assume a functional form in the nodes or output (e.g., linear models or rectified linear units), but rather estimates these transformations. RGB can be seen as an optimized stacking procedure where a meta algorithm learns how to combine different classes of functions (e.g., Neural Networks (NN) and Gradient Boosting (GB)), while building and optimizing them jointly in an attempt to compensate each other's weaknesses. This highlights a stark difference with current approaches to meta-learning that combine models only after they have been built independently. We showed that providing optimized stacking is one of the main advantages of RGB over current approaches. Additionally, due to the nested nature of RGB we also showed how it improves over GB in problems that have several high-order interactions. Finally, we investigate both theoretically and in practice the problem of recovering nested functions and the value of prior knowledge.
Principal component‐guided sparse regression
Canadian Journal of Statistics · 2021-04-16 · 5 citations
preprint · Open access
We propose a new method for supervised learning, the “principal components lasso” (“pcLasso”). It combines the lasso (ℓ₁) penalty with a quadratic penalty that shrinks the coefficient vector toward the feature matrix's leading principal components (PCs). pcLasso can be especially powerful if the features are preassigned to groups. In that case, pcLasso shrinks each group‐wise component of the solution toward the leading PCs of that group. The pcLasso method also carries out selection of feature groups. We provide some theory and illustrate the method on some simulated and real data examples.
Lasso and Elastic-Net Regularized Generalized Linear Models [R package glmnet version 4.1-1]
2021 · 82 citations
1st author · Corresponding
- Computer Science
- Mathematics
The Elements of Statistical Learning: Data Mining, Inference, and Prediction 2nd Edition
2020 · 232 citations
Senior author · Corresponding
- Computer Science
- Artificial Intelligence
Reply to Nock and Nielsen: On the work of Nock and Nielsen and its relationship to the additive tree
Proceedings of the National Academy of Sciences · 2020-04-07
letter · Open access
The observation that decision trees are boosting algorithms, as cited in our work (1) and acknowledged by Nock and Nielsen (2), was first established by refs. 3 and 4. This was later used by refs. 5 and 6 to develop, to the best of our knowledge, the first decision tree algorithms based purely on boosting. This work, cited in our article, precedes refs. 7 and 8 cited by Nock and Nielsen (2). The original and important contributions of refs. 7 and 8 as they pertain to this discussion were to theoretically prove convergence rates for decision tree algorithms built with boosting, along with …
Predicting Regression Probability Distributions with Imperfect Data Through Optimal Transformations
arXiv (Cornell University) · 2020-01-27 · 2 citations
preprint · Open access · 1st author · Corresponding
The goal of regression analysis is to predict the value of a numeric outcome variable y given a vector of joint values of other (predictor) variables x. Usually a particular x-vector does not specify a repeatable value for y, but rather a probability distribution of possible y-values, p(y|x). This distribution has a location, scale and shape, all of which can depend on x, and are needed to infer likely values for y given x. Regression methods usually assume that training data y-values are perfect numeric realizations from some well-behaved p(y|x). Often actual training data y-values are discrete, truncated and/or arbitrarily censored. Regression procedures based on an optimal transformation strategy are presented for estimating location, scale and shape of p(y|x) as general functions of x, in the possible presence of such imperfect training data. In addition, validation diagnostics are presented to ascertain the quality of the solutions.
Recent grants
Topics in Predictive and Descriptive Data Mining
NSF · $420k · 2002–2008
Frequent coauthors
- Stanley M. Flatté · University of California, Santa Cruz · 70 shared
- Phillip Kott · Stanford University · 64 shared
- Jae Lee · 64 shared
- Wen‐Hua Ju · 64 shared
- Patrick Tendick · 64 shared
- Michael Friendly · 64 shared
- Gábor J. Székely · 64 shared
- Carlo di Lauro · University of Naples Federico II · 64 shared
Awards & honors
- Named the applied statistics thesis prize for our emeritus c…