XGBoost 2.0 is a major release that brings a wide array of new features, optimizations, and improvements to the popular gradient boosting library.
This release focuses on enhancing performance, efficiency, and user experience, making it an exciting update for data scientists and machine learning practitioners.
In this article, we’ll provide an overview of the most significant changes introduced in XGBoost 2.0.
Multi-Target Trees with Vector-Leaf Outputs
One of the key developments in XGBoost 2.0 is the initial work on vector-leaf tree models for multi-target tasks, such as multi-target regression, multi-label classification, and multi-class classification.
Previously, XGBoost built a separate tree for each target. With this new feature, XGBoost can construct a single tree for all targets, potentially offering benefits like preventing overfitting, producing smaller models, and exploiting the correlation between targets.
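As a minimal sketch, vector-leaf trees are opted into via the `multi_strategy` parameter (the toy data below is illustrative; vector leaves require the `hist` tree method):

```python
import numpy as np
import xgboost as xgb

# Toy multi-target regression data: two correlated targets per row.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))
y = np.stack([X[:, 0] + 0.1 * rng.normal(size=256),
              X[:, 0] - X[:, 1]], axis=1)

# "multi_output_tree" builds one tree with vector leaves for all targets,
# instead of the default of one tree per target.
model = xgb.XGBRegressor(
    tree_method="hist",
    multi_strategy="multi_output_tree",
    n_estimators=32,
)
model.fit(X, y)
print(model.predict(X[:2]).shape)  # (2, 2): one prediction per target
```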
New Device Parameter
XGBoost 2.0 introduces a new `device` parameter that simplifies specifying the device on which to run the library. This single parameter replaces the previous `gpu_id`, `gpu_hist`, `gpu_predictor`, `cpu_predictor`, and `gpu_coord_descent` parameters. Users can now easily select the device using the `device` parameter along with the device ordinal.
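A brief sketch of the new device selection using the scikit-learn interface (the models below are illustrative):

```python
import xgboost as xgb

# Before 2.0: tree_method="gpu_hist", gpu_id=0, predictor="gpu_predictor".
# In 2.0 a single parameter selects the device, optionally with an ordinal:
cpu_model = xgb.XGBClassifier(device="cpu")
gpu_model = xgb.XGBClassifier(device="cuda")      # first visible GPU
gpu1_model = xgb.XGBClassifier(device="cuda:1")   # GPU with ordinal 1
```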
Hist as Default Tree Method
Starting from version 2.0, the `hist` tree method becomes the default choice in XGBoost. In earlier versions, the library would choose between the `approx` and `exact` methods based on the input data and training environment. With `hist` as the default, XGBoost aims to provide more efficient and consistent training across different scenarios.
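In practice this means `tree_method` no longer needs to be set for typical training runs; a minimal sketch with illustrative toy data:

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(1000, 10)
y = np.random.rand(1000)
dtrain = xgb.DMatrix(X, label=y)

# No tree_method given: 2.0 defaults to hist instead of picking
# between approx and exact based on the data and environment.
booster = xgb.train({"objective": "reg:squarederror"}, dtrain, num_boost_round=10)

# The older methods remain available as an explicit opt-in.
exact_booster = xgb.train({"tree_method": "exact"}, dtrain, num_boost_round=10)
```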
Optimized Histogram Size on CPU
To help users control the memory footprint of XGBoost, version 2.0 introduces a new parameter called `max_cached_hist_node`. This parameter allows users to limit the CPU cache size for histograms, preventing XGBoost from caching histograms too aggressively. While this may impact performance, it becomes crucial when growing deep trees. Additionally, XGBoost reduces the memory usage of the `hist` and `approx` tree methods on distributed systems by cutting the cache size in half.
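A minimal sketch of capping the histogram cache; the specific values are illustrative, and smaller caps trade training speed for a lower memory peak:

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(10_000, 32)
y = np.random.rand(10_000)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "tree_method": "hist",
    "max_depth": 12,              # deep trees are where the cache grows large
    # Limit how many tree nodes keep their histograms cached on the CPU.
    "max_cached_hist_node": 128,
}
booster = xgb.train(params, dtrain, num_boost_round=50)
```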
Improved External Memory Support
XGBoost 2.0 brings significant improvements to external memory support, particularly for the `hist` tree method. Although still an experimental feature, its performance has been greatly enhanced by replacing the old file IO logic with memory mapping. This change not only boosts performance but also reduces CPU memory usage. Users are encouraged to try external memory with the `hist` tree method when the memory saving from `QuantileDMatrix` is insufficient.
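For reference, external memory is driven by a user-defined batch iterator. The condensed sketch below follows the pattern from XGBoost's external-memory demo; the `NumpyBatchIter` class and in-memory batches are illustrative stand-ins for batches loaded from disk:

```python
import os
import numpy as np
import xgboost as xgb

class NumpyBatchIter(xgb.DataIter):
    """Feeds data to XGBoost one batch at a time."""

    def __init__(self, batches):
        self._batches = batches  # list of (X, y) pairs
        self._it = 0
        # cache_prefix tells XGBoost where to place its on-disk cache.
        super().__init__(cache_prefix=os.path.join(".", "cache"))

    def next(self, input_data):
        if self._it == len(self._batches):
            return 0  # signal that there are no more batches
        X, y = self._batches[self._it]
        input_data(data=X, label=y)
        self._it += 1
        return 1

    def reset(self):
        self._it = 0

batches = [(np.random.rand(1000, 16), np.random.rand(1000)) for _ in range(4)]
dtrain = xgb.DMatrix(NumpyBatchIter(batches))  # builds the external-memory cache
booster = xgb.train({"tree_method": "hist"}, dtrain, num_boost_round=10)
```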
Enhancements to Learning to Rank
The learning-to-rank task in XGBoost has received a brand-new implementation with a set of advanced features.
These include a new parameter for choosing the pair construction strategy, control over the number of pairs sampled per document, an experimental implementation of unbiased learning-to-rank, support for custom gain functions with `NDCG`, deterministic GPU computation, and more. `NDCG` is now the default objective function, and the performance of metrics has been improved using caches.
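A minimal sketch of the new options through `XGBRanker` (the toy query data is illustrative):

```python
import numpy as np
import xgboost as xgb

# Toy ranking data: qid groups documents into queries.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 10))
y = rng.integers(0, 4, size=120)      # graded relevance labels
qid = np.repeat(np.arange(12), 10)    # 12 queries, 10 documents each

ranker = xgb.XGBRanker(
    objective="rank:ndcg",             # now the default ranking objective
    lambdarank_pair_method="topk",     # pair construction strategy
    lambdarank_num_pair_per_sample=8,  # how many pairs to construct per sample
    ndcg_exp_gain=False,               # linear instead of exponential NDCG gain
)
ranker.fit(X, y, qid=qid)
scores = ranker.predict(X)
```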
Other Notable Updates
XGBoost 2.0 brings several other notable updates and improvements:
- The `base_score` can now be automatically estimated based on input labels for optimal accuracy.
- Quantile regression is now supported, allowing users to minimize the quantile loss (see the sketch after this list).
- The L1 and quantile regression objectives now support the learning rate.
- Users can export the quantile values used for the `hist` tree method.
- Progress has been made on column-based split for federated learning, with support for vertical federated learning.
- The PySpark integration has received optimizations and new features, such as GPU-based prediction and improved data initialization.
- Input handling and performance have been enhanced, particularly for `numpy` and other data structures.
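As a minimal sketch of the new quantile regression support (the data and alpha values are illustrative), a single model can fit several quantiles at once:

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(500, 4)
y = np.random.rand(500)
Xy = xgb.QuantileDMatrix(X, label=y)

params = {
    "objective": "reg:quantileerror",
    "quantile_alpha": [0.1, 0.5, 0.9],  # 10th, 50th, and 90th percentiles
    "learning_rate": 0.1,               # now honored by the quantile objective
}
booster = xgb.train(params, Xy, num_boost_round=50)
pred = booster.predict(xgb.DMatrix(X))  # shape (500, 3): one column per alpha
```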
It’s important to note that XGBoost 2.0 introduces some breaking changes. Users are now required to specify the format for text input, and the `predictor` parameter has been removed. For a complete list of changes and detailed information, please refer to the official XGBoost documentation.