The `nthread` parameter in XGBoost controls the number of parallel threads used for training, which can significantly speed up the training process on multi-core machines. This parameter is an alias for the `n_jobs` parameter: `nthread` was deprecated in 2017 in favor of `n_jobs`.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Generate synthetic data
X, y = make_classification(n_samples=100000, n_features=20, n_informative=10, n_redundant=5, random_state=42)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the XGBoost classifier with a specific nthread value
model = XGBClassifier(nthread=4, eval_metric='logloss')

# Fit the model
model.fit(X_train, y_train)
```
The `nthread` parameter determines the number of CPU cores used for parallel processing during the training of an XGBoost model. It accepts integer values, with -1 indicating that all available cores should be used. The default value of `nthread` in XGBoost is -1, so all available CPU cores are used by default. It's important to note that `nthread` only affects training speed; it has no impact on the model's predictive performance.
When setting the `nthread` parameter, there is a trade-off between training speed and resource consumption:

- Higher values can significantly reduce training time but may monopolize system resources, potentially slowing down or freezing other applications.
- Lower values (or 1) can be used to limit resource usage, but this comes at the cost of slower training.
Consider the following guidelines when choosing the `nthread` value:

- For dedicated machines or when fast training is a priority, set `nthread` to -1 to use all available cores.
- For shared systems or limited resources, set `nthread` to a lower value (e.g., half of the available cores) to balance training speed and resource usage.
- Setting `nthread` to 1 effectively disables parallelization and may be suitable for low-priority tasks or when resources are scarce.
When working with the `nthread` parameter, keep these practical tips in mind:

- Experiment with different `nthread` values to find the optimal balance between training speed and resource usage for your specific setup.
- Be cautious when using all available cores (`nthread=-1`), as it may slow down or freeze other applications during training.
- Monitor system resources (CPU usage, memory) while training with different `nthread` values to ensure the machine remains responsive.