The `n_jobs` parameter in XGBoost controls the number of parallel threads used for training, which can significantly speed up the training process on multi-core machines.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Generate synthetic data
X, y = make_classification(n_samples=100000, n_features=20, n_informative=10, n_redundant=5, random_state=42)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the XGBoost classifier with a specific n_jobs value
model = XGBClassifier(n_jobs=4, eval_metric='logloss')

# Fit the model
model.fit(X_train, y_train)
```
The `n_jobs` parameter is an alias for the (deprecated) `nthread` parameter in the XGBoost API.
- The `n_jobs` parameter accepts integer values, with -1 indicating that all available cores should be used.
- The default value of `n_jobs` in XGBoost is set to the number of logical CPU cores in the system.
- `n_jobs` only affects the training speed and has no impact on the model's performance.
Choosing the Right “n_jobs” Value
When setting the `n_jobs` parameter, there is a trade-off between training speed and resource consumption:
- Higher values can significantly reduce training time but may monopolize system resources, potentially slowing down or freezing other applications.
- Lower values (or 1) can be used to limit resource usage, but this comes at the cost of slower training.
Consider the following guidelines when choosing the `n_jobs` value:

- For dedicated machines or when fast training is a priority, set `n_jobs` to -1 to use all available cores.
- For shared systems or limited resources, set `n_jobs` to a lower value (e.g., half of the available cores) to balance training speed and resource usage.
- Setting `n_jobs` to 1 effectively disables parallelization and may be suitable for low-priority tasks or when resources are scarce.
You can learn more about configuring the `n_jobs` parameter in the examples:
- Tune XGBoost “n_jobs” Parameter
- XGBoost Configure “n_jobs” for Grid Search
- XGBoost Configure “n_jobs” for Random Search
Practical Tips
- Experiment with different `n_jobs` values to find the optimal balance between training speed and resource usage for your specific setup.
- Be cautious when using all available cores (`n_jobs=-1`), as it may slow down or freeze other applications during training.
- Monitor system resources (CPU usage, memory) while training with different `n_jobs` values to ensure the machine remains responsive.