Reducing memory usage when training with XGBoost involves various techniques to manage dataset size and model complexity efficiently.
Here’s a bulleted list of strategies to help decrease memory consumption:
Use Smaller Data Types:
- Convert data to more memory-efficient types, such as changing `float64` to `float32`, or using `int8` for categorical features where possible.
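For example, a pandas DataFrame can be downcast before it is handed to XGBoost. A minimal sketch (the input file name is a placeholder):

```python
import pandas as pd

df = pd.read_csv("train.csv")  # hypothetical input file

# Downcast 64-bit floats to 32-bit and integers to the smallest width that fits.
float_cols = df.select_dtypes(include="float64").columns
int_cols = df.select_dtypes(include="int64").columns
df[float_cols] = df[float_cols].astype("float32")
df[int_cols] = df[int_cols].apply(pd.to_numeric, downcast="integer")

# Compare memory usage before and after downcasting.
print(df.memory_usage(deep=True).sum())
```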
Limit the Depth of Trees:
- Set the `max_depth` parameter to a lower value. Smaller trees use less memory during both training and prediction.
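A minimal sketch using the native `xgb.train` API on synthetic data, with `max_depth` lowered from its default of 6:

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(1000, 20).astype("float32")
y = np.random.randint(0, 2, size=1000)
dtrain = xgb.DMatrix(X, label=y)

# Shallower trees keep both the trained model and the per-node
# statistics built during training smaller.
params = {"objective": "binary:logistic", "max_depth": 4}
booster = xgb.train(params, dtrain, num_boost_round=50)
```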
Increase the Minimum Child Weight:
- Adjust the `min_child_weight` parameter to a higher value, which helps in creating simpler trees and reducing model size.
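A short sketch with the scikit-learn wrapper on synthetic data; the value 10 is illustrative (the default is 1):

```python
import numpy as np
from xgboost import XGBClassifier

X = np.random.rand(1000, 20).astype("float32")
y = np.random.randint(0, 2, size=1000)

# A larger min_child_weight stops splits on small groups of rows,
# which yields fewer nodes per tree and a smaller model.
model = XGBClassifier(min_child_weight=10, n_estimators=100)
model.fit(X, y)
```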
Utilize Feature Subsampling:
- Use the `colsample_bytree`, `colsample_bylevel`, and `colsample_bynode` parameters to limit the number of features used in each tree, reducing the amount of memory needed for training.
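A sketch combining the three column-sampling parameters on synthetic data; the fractions shown are illustrative:

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(1000, 200).astype("float32")
y = np.random.randint(0, 2, size=1000)
dtrain = xgb.DMatrix(X, label=y)

# Each tree sees 50% of the features, each level 80% of those,
# and each node 80% again.
params = {
    "objective": "binary:logistic",
    "colsample_bytree": 0.5,
    "colsample_bylevel": 0.8,
    "colsample_bynode": 0.8,
}
booster = xgb.train(params, dtrain, num_boost_round=50)
```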
Opt for Histogram-based Split Finding:
- Set `tree_method` to `hist` or `gpu_hist` (for GPU). These methods use histograms for split finding, which aggregate continuous feature values into discrete bins, significantly reducing memory usage.
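A minimal sketch on synthetic data; the commented line shows the GPU variant used on recent XGBoost releases, where `device="cuda"` replaces the older `gpu_hist` value:

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(5000, 50).astype("float32")
y = np.random.randint(0, 2, size=5000)
dtrain = xgb.DMatrix(X, label=y)

# Histogram-based split finding bins continuous features instead of
# working with every distinct value.
params = {"objective": "binary:logistic", "tree_method": "hist"}
booster = xgb.train(params, dtrain, num_boost_round=50)

# On recent XGBoost releases, GPU histogram training is requested via:
# params = {"objective": "binary:logistic", "tree_method": "hist", "device": "cuda"}
```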
Reduce Training Data Size:
- Apply `subsample` to randomly sample a fraction of the training instances for building each tree. This reduces the memory footprint during training.
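A short sketch with the scikit-learn wrapper; `subsample=0.5` is an illustrative value meaning each boosting round trains on roughly half of the rows:

```python
import numpy as np
from xgboost import XGBRegressor

X = np.random.rand(10000, 30).astype("float32")
y = np.random.rand(10000)

# Each tree is built on a random 50% sample of the training instances.
model = XGBRegressor(subsample=0.5, n_estimators=200)
model.fit(X, y)
```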
Use Sparse Matrices:
- If your data contains many zeros or missing values, convert it into a sparse format (like CSR), which XGBoost can handle more efficiently in terms of memory.
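A minimal sketch building a DMatrix directly from a SciPy CSR matrix; the density and shape are arbitrary:

```python
import numpy as np
import xgboost as xgb
from scipy.sparse import random as sparse_random

# A mostly-zero matrix stored in CSR format; DMatrix accepts it directly,
# so the data never has to be densified.
X = sparse_random(10000, 500, density=0.01, format="csr", dtype=np.float32)
y = np.random.randint(0, 2, size=10000)
dtrain = xgb.DMatrix(X, label=y)

booster = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=50)
```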
Adjust the Tree Updaters:
- Modify the `updater` parameter to include or exclude specific updaters such as `grow_colmaker` and `prune`, depending on memory constraints and dataset characteristics.
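A hedged sketch of setting the `updater` parameter explicitly; this is an advanced option, the exact updater names depend on the XGBoost version, and in most cases `tree_method` selects the updater sequence for you:

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(1000, 20).astype("float32")
y = np.random.randint(0, 2, size=1000)
dtrain = xgb.DMatrix(X, label=y)

# Explicitly request the exact column-wise grower followed by pruning.
# Normally tree_method chooses the updater sequence automatically.
params = {"objective": "binary:logistic", "updater": "grow_colmaker,prune"}
booster = xgb.train(params, dtrain, num_boost_round=20)
```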
Set a Lower Number of Bins:
- Decrease the `max_bin` parameter to use fewer bins in histogram-based training, which can lead to significant memory savings.
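A minimal sketch lowering `max_bin` from its default of 256; the parameter only takes effect with histogram-based tree methods:

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(5000, 50).astype("float32")
y = np.random.rand(5000)
dtrain = xgb.DMatrix(X, label=y)

# Fewer bins means smaller histograms held in memory during training.
params = {"objective": "reg:squarederror", "tree_method": "hist", "max_bin": 64}
booster = xgb.train(params, dtrain, num_boost_round=50)
```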
Employ Quantile Sketch Algorithm:
- Configure `tree_method` to `approx`, which uses a sketching algorithm to find approximate split points, managing memory better than exact methods on large datasets.
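A minimal sketch on synthetic data:

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(20000, 50).astype("float32")
y = np.random.randint(0, 2, size=20000)
dtrain = xgb.DMatrix(X, label=y)

# The approximate method proposes candidate splits from quantile sketches
# instead of enumerating every distinct feature value.
params = {"objective": "binary:logistic", "tree_method": "approx"}
booster = xgb.train(params, dtrain, num_boost_round=50)
```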
Consider Gradient Compression in Distributed Training:
- In distributed training setups, gradient compression (sometimes called Distributed Gradient Compression, DGC) can reduce the memory and bandwidth used by gradient communication; whether it is available depends on the distributed framework in use rather than on XGBoost itself.
Tune Garbage Collection:
- In Python, ensure efficient garbage collection by manually triggering cleanup or optimizing memory management in your code, especially in long training loops.
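A sketch of one common pattern, assuming a loop that trains several boosters (for example over cross-validation folds): drop references to the large objects and trigger a collection pass before the next iteration allocates its own copies.

```python
import gc

import numpy as np
import xgboost as xgb

for fold in range(5):
    X = np.random.rand(10000, 50).astype("float32")
    y = np.random.randint(0, 2, size=10000)
    dtrain = xgb.DMatrix(X, label=y)

    params = {"objective": "binary:logistic", "tree_method": "hist"}
    booster = xgb.train(params, dtrain, num_boost_round=50)
    booster.save_model(f"model_fold{fold}.json")

    # Release the large objects explicitly and force a collection pass
    # so memory is reclaimed before the next fold starts.
    del X, y, dtrain, booster
    gc.collect()
```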
Implementing these techniques can help manage memory more effectively when using XGBoost, particularly when working with large datasets or on memory-constrained systems.