The `n_jobs` parameter in XGBoost controls the number of parallel threads used for training, which can significantly speed up the training process on multi-core machines.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Generate synthetic data
X, y = make_classification(n_samples=100000, n_features=20, n_informative=10, n_redundant=5, random_state=42)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the XGBoost classifier with a specific n_jobs value
model = XGBClassifier(n_jobs=4, eval_metric='logloss')

# Fit the model
model.fit(X_train, y_train)
```
The `n_jobs` parameter is an alias for the (deprecated) `nthread` parameter in the XGBoost API.
- The `n_jobs` parameter accepts integer values, with -1 indicating that all available cores should be used.
- The default value of `n_jobs` in XGBoost is set to the number of logical CPU cores in the system.
- `n_jobs` only affects the training speed and has no impact on the model's performance.
Choosing the Right “n_jobs” Value
When setting the `n_jobs` parameter, there is a trade-off between training speed and resource consumption:
- Higher values can significantly reduce training time but may monopolize system resources, potentially slowing down or freezing other applications.
- Lower values (or 1) can be used to limit resource usage, but this comes at the cost of slower training.
Consider the following guidelines when choosing the `n_jobs` value:

- For dedicated machines or when fast training is a priority, set `n_jobs` to -1 to use all available cores.
- For shared systems or limited resources, set `n_jobs` to a lower value (e.g., half of the available cores) to balance training speed and resource usage.
- Setting `n_jobs` to 1 effectively disables parallelization and may be suitable for low-priority tasks or when resources are scarce.
You can learn more about configuring the `n_jobs` parameter in the examples:
- Tune XGBoost “n_jobs” Parameter
- XGBoost Configure “n_jobs” for Grid Search
- XGBoost Configure “n_jobs” for Random Search
Practical Tips
- Experiment with different `n_jobs` values to find the optimal balance between training speed and resource usage for your specific setup.
- Be cautious when using all available cores (`n_jobs=-1`), as it may slow down or freeze other applications during training.
- Monitor system resources (CPU usage, memory) while training with different `n_jobs` values to ensure the machine remains responsive.