The nthread parameter in XGBoost controls the number of parallel threads used for training, which can significantly speed up the training process on multi-core machines.
This parameter is an alias for n_jobs: nthread was deprecated in 2017 in favor of n_jobs, but it is still accepted, as the example below shows.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Generate synthetic data
X, y = make_classification(n_samples=100000, n_features=20, n_informative=10, n_redundant=5, random_state=42)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the XGBoost classifier with a specific nthread value
model = XGBClassifier(nthread=4, eval_metric='logloss')

# Fit the model
model.fit(X_train, y_train)
```
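Since nthread is a legacy alias, the same configuration can be written with the current n_jobs name. A minimal equivalent sketch:

```python
from xgboost import XGBClassifier

# n_jobs is the current name for the same setting;
# this is equivalent to XGBClassifier(nthread=4, ...) above
model = XGBClassifier(n_jobs=4, eval_metric='logloss')
```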
The nthread parameter determines the number of CPU cores used for parallel processing during the training of an XGBoost model.
It accepts integer values, with -1 indicating that all available cores should be used.
The default value of nthread in XGBoost is -1, which means all available CPU cores are used by default. It’s important to note that nthread affects only training speed; it has no meaningful impact on the quality of the trained model.
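To illustrate the second point, here is a minimal sketch that trains the same configuration with two different thread counts and compares test accuracy; the scores should be essentially identical, because the thread count changes how fast the model trains, not what it learns:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=10000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the same configuration with two different thread counts;
# the thread count affects training speed, not the learned model
scores = {}
for threads in (1, 4):
    model = XGBClassifier(nthread=threads, eval_metric='logloss', random_state=42)
    model.fit(X_train, y_train)
    scores[threads] = model.score(X_test, y_test)

print(scores)  # the two accuracies should be essentially identical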
When setting the nthread parameter, there is a trade-off between training speed and resource consumption (a timing sketch after this list illustrates it):
- Higher values can significantly reduce training time but may monopolize system resources, potentially slowing down or freezing other applications.
- Lower values (or 1) can be used to limit resource usage, but this comes at the cost of slower training.
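The timing sketch referenced above fits the same model with several nthread values and reports wall-clock training time. Actual timings will vary with your hardware, but higher thread counts should train noticeably faster on a multi-core machine:

```python
import time

from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y = make_classification(n_samples=100000, n_features=20, random_state=42)

# Fit the same model with different thread counts and compare wall time
for threads in (1, 4, -1):
    model = XGBClassifier(nthread=threads, eval_metric='logloss')
    start = time.perf_counter()
    model.fit(X, y)
    print(f"nthread={threads}: {time.perf_counter() - start:.2f}s")
```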
Consider the following guidelines when choosing the nthread value:
- For dedicated machines or when fast training is a priority, set nthread to -1 to use all available cores.
- For shared systems or limited resources, set nthread to a lower value (e.g., half of the available cores, as in the sketch after this list) to balance training speed and resource usage.
- Setting nthread to 1 effectively disables parallelization and may be suitable for low-priority tasks or when resources are scarce.
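For the half-of-the-available-cores suggestion, a minimal sketch using the standard library's os.cpu_count() (which can return None on some platforms, hence the fallback):

```python
import os

from xgboost import XGBClassifier

# Use roughly half the machine's cores, leaving headroom for other work;
# os.cpu_count() may return None, so fall back to a single thread
half_cores = max(1, (os.cpu_count() or 2) // 2)
model = XGBClassifier(nthread=half_cores, eval_metric='logloss')
```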
When working with the nthread parameter, keep these practical tips in mind:
- Experiment with different nthread values to find the optimal balance between training speed and resource usage for your specific setup.
- Be cautious when using all available cores (nthread=-1), as it may slow down or freeze other applications during training.
- Monitor system resources (CPU usage, memory) while training with different nthread values to ensure the machine remains responsive (see the monitoring sketch after this list).
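For the monitoring tip, one possible approach is to sample CPU and memory usage from a background thread while training runs. This sketch assumes the third-party psutil package is installed and that your XGBoost version releases the GIL during fit (recent versions do):

```python
import threading

import psutil  # third-party: pip install psutil
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y = make_classification(n_samples=100000, n_features=20, random_state=42)

def log_usage(stop_event):
    # Print CPU and memory usage roughly once per second until training finishes
    while not stop_event.is_set():
        print(f"CPU: {psutil.cpu_percent(interval=1):.0f}% | "
              f"RAM: {psutil.virtual_memory().percent:.0f}%")

stop = threading.Event()
monitor = threading.Thread(target=log_usage, args=(stop,))
monitor.start()

model = XGBClassifier(nthread=-1, eval_metric='logloss')
model.fit(X, y)

stop.set()
monitor.join()
```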