
Why is XGBoost Fast?

XGBoost, the eXtreme Gradient Boosting algorithm, is renowned for its exceptional speed compared to other gradient boosting implementations.

This performance advantage has made XGBoost a popular choice among data scientists and machine learning practitioners.

Understanding the key optimizations that contribute to XGBoost’s speed can help you appreciate its efficiency and make informed decisions when selecting algorithms for your projects.

Sparse Aware Implementation

One of the primary reasons for XGBoost's speed is its ability to handle sparse data efficiently. In many real-world datasets, a large fraction of feature values are zero or missing. XGBoost exploits this with a sparsity-aware implementation: each tree node learns a default direction for missing entries, so split finding only visits the non-missing values instead of iterating over every cell. This optimization greatly reduces computation time, especially on high-dimensional datasets.
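As a quick illustration, the minimal sketch below (assuming the Python package together with NumPy and SciPy, and a synthetic, mostly-zero dataset) passes a SciPy CSR matrix straight to DMatrix; the implicit zeros are treated as missing and skipped during split finding rather than materialized.

```python
import numpy as np
import xgboost as xgb
from scipy import sparse

# Synthetic, mostly-zero feature matrix (hypothetical data for illustration).
rng = np.random.default_rng(42)
X_dense = rng.random((1000, 50))
X_dense[X_dense < 0.9] = 0.0        # roughly 90% of the entries become zero
X = sparse.csr_matrix(X_dense)      # only the non-zero entries are stored
y = rng.integers(0, 2, size=1000)

# DMatrix accepts the sparse matrix directly; implicit zeros are treated as
# missing values and skipped during split finding instead of being materialized.
dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=10)
```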

Weighted Quantile Sketch

Finding the optimal split points is a crucial step in building decision trees, which are the building blocks of gradient boosting. XGBoost employs a novel approach called the weighted quantile sketch to approximate the best split points. This algorithm is significantly faster than the exact greedy algorithm used in traditional gradient boosting implementations. By using approximations, XGBoost sacrifices a small amount of accuracy for a substantial speed-up in the split finding process.
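The choice of split-finding strategy is exposed through the tree_method parameter. The minimal sketch below (synthetic data; the max_bin setting applies to recent releases) contrasts the exact greedy method with the approximate, quantile-based one.

```python
import numpy as np
import xgboost as xgb

# Synthetic binary-classification data (for illustration only).
rng = np.random.default_rng(0)
X = rng.random((5000, 20))
y = (X[:, 0] + rng.normal(scale=0.1, size=5000) > 0.5).astype(int)
dtrain = xgb.DMatrix(X, label=y)

# "exact" enumerates every candidate split; "approx" proposes candidates from
# weighted quantiles of each feature, trading a little precision for speed.
params_exact = {"objective": "binary:logistic", "tree_method": "exact"}
params_approx = {"objective": "binary:logistic", "tree_method": "approx",
                 "max_bin": 256}  # number of quantile candidates (recent releases)

booster_exact = xgb.train(params_exact, dtrain, num_boost_round=20)
booster_approx = xgb.train(params_approx, dtrain, num_boost_round=20)
```

On large datasets the approximate method typically trains noticeably faster while producing very similar models; on small data like this the difference is negligible.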

Column Block for Parallel Learning

XGBoost introduces the concept of column blocks to enable parallel learning. The data is stored in in-memory blocks, with each column sorted by feature value, so the gradient statistics needed for split finding can be collected for different features independently. By processing the blocks in parallel, XGBoost can take advantage of multi-core processors and distribute the workload, resulting in faster training times.
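The degree of parallelism is controlled by the nthread parameter (n_jobs in the scikit-learn wrapper). A minimal sketch, assuming the scikit-learn API and a synthetic dataset:

```python
import xgboost as xgb
from sklearn.datasets import make_classification

# Synthetic classification data (for illustration only).
X, y = make_classification(n_samples=10_000, n_features=50, random_state=0)

# n_jobs maps to XGBoost's nthread parameter: gradient statistics for the
# column blocks are collected by this many threads in parallel.
model = xgb.XGBClassifier(tree_method="hist", n_jobs=4, n_estimators=100)
model.fit(X, y)
```

The actual speed-up depends on the dataset size and the number of physical cores available.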

Cache-Aware Access

Another optimization technique employed by XGBoost is cache-aware access. During split finding, gradient statistics are fetched in a non-contiguous pattern, which would normally cause frequent cache misses. XGBoost prefetches these statistics into a small internal buffer so that accesses hit the CPU cache rather than expensive main memory. This is particularly beneficial when working with large datasets whose gradient statistics don't fit entirely in the cache.

Blocks for Out-of-Core Computation

XGBoost also includes support for out-of-core computation, which allows it to handle datasets that are too large to fit into memory. It achieves this by loading the data into memory one block at a time. This block-wise loading mechanism enables XGBoost to train on massive datasets that would otherwise be infeasible to process on a single machine. Out-of-core computation extends XGBoost’s scalability and makes it suitable for big data applications.
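The exact external-memory interface has changed across releases: older versions accept a cache suffix appended to the data URI, while recent releases recommend streaming batches through a custom xgboost.DataIter. A minimal sketch of the older URI form, assuming a LIBSVM-format file named train.libsvm (a placeholder path):

```python
import xgboost as xgb

# The "#dtrain.cache" suffix asks XGBoost to build an on-disk cache for the
# file and stream it into memory block by block during training.
dtrain = xgb.DMatrix("train.libsvm?format=libsvm#dtrain.cache")

params = {"objective": "binary:logistic"}
booster = xgb.train(params, dtrain, num_boost_round=50)
```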

Additional Optimizations

It’s worth noting that the XGBoost project is actively maintained and continues to introduce new optimizations and improvements. Recent versions of XGBoost may include additional speed enhancements that are not covered in this overview. Referring to the official XGBoost documentation and release notes can provide insights into the latest performance optimizations.


