XGBoost was neither the the first gradient boosting algorithm or the first implementation of the gradient boosting algorithm.
Precursor Algorithm
The seminal precursor algorithm to XGBoost was the Gradient Boosting Machine algorithm.
This algorithm was described by Jerome Friedman in his 2001 paper:
- Greedy Function Approximation: A Gradient Boosting Machine, The Annals of Statistics Vol. 29, No. 5 (Oct., 2001), pp. 1189-1232 (44 pages).
A pre-print version of the paper is available from Friedman’s personal home page:
The abstract for the paper was as follows:
Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient descent “boosting” paradigm is developed for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification. Special enhancements are derived for the particular case where the individual additive components are regression trees, and tools for interpreting such “TreeBoost” models are presented. Gradient boosting of regression trees produces competitive, highly robust, interpretable procedures for both regression and classification, especially appropriate for mining less than clean data. Connections between this approach and the boosting methods of Freund and Shapire and Friedman, Hastie and Tibshirani are discussed.
Precursor Implementation
The seminal precursor implementation of gradient boosting prior to XGBoost was the “Generalized Boosted Regression Models” or gbm
package in R developed by Greg Ridgeway.
Perhaps first released in 2002, or earlier:
- Generalized boosted modeling (GBM) (internet archive)
The description of the package was as follows:
An implementation of extensions to Freund and Schapire’s AdaBoost algorithm and Friedman’s gradient boosting machine. Includes regression methods for least squares, absolute loss, t-distribution loss, quantile regression, logistic, multinomial logistic, Poisson, Cox proportional hazards partial likelihood, AdaBoost exponential loss, Huberized hinge loss, and Learning to Rank measures (LambdaMart).
An updated version of the package was created called gbm3
and is maintained by Greg Ridgeway.