
XGBoost Compare "num_boost_round" vs "n_estimators" Parameters

When configuring an XGBoost model, you’ll come across two parameters that seem to do the same thing: num_boost_round and n_estimators.

Both of these parameters control the number of boosting rounds (or iterations) performed by the XGBoost algorithm.

Understanding the difference between them is crucial for configuring your XGBoost models effectively. The example below fits equivalent models with both APIs and compares their test accuracy:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import xgboost as xgb

# Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=2, n_features=10, random_state=42)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert data to DMatrix format
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test)

# Initialize the scikit-learn XGBoost classifier (n_estimators sets the boosting rounds)
model_sklearn = xgb.XGBClassifier(n_estimators=100)

# Fit the scikit-learn model and train the native-API model (num_boost_round sets the boosting rounds)
model_sklearn.fit(X_train, y_train)
model_native = xgb.train({'objective': 'binary:logistic'}, dtrain, num_boost_round=100)

# Make predictions using both models on the test data
pred_sklearn = model_sklearn.predict(X_test)
pred_native = model_native.predict(dtest)
# The native API returns probabilities for binary:logistic, so round them to class labels
pred_native = pred_native.round()

# Print the accuracy scores for both models
print(f"Accuracy (scikit-learn API): {accuracy_score(y_test, pred_sklearn):.4f}")
print(f"Accuracy (native API): {accuracy_score(y_test, pred_native):.4f}")

The num_boost_round and n_estimators parameters both determine the number of boosting iterations performed by XGBoost. Each boosting round adds a new tree to the ensemble, incrementally improving the model’s performance. Increasing the number of boosting rounds can lead to better performance but may also increase the risk of overfitting and lengthen training time.
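As a quick sanity check, you can confirm that both models contain the same number of trees. This is a minimal sketch that assumes the fitted model_sklearn and model_native objects from the example above and a reasonably recent XGBoost release (num_boosted_rounds is not available in very old versions):

# Count the trees in each fitted model; both should report 100
n_trees_sklearn = len(model_sklearn.get_booster().get_dump())
n_trees_native = model_native.num_boosted_rounds()

print(f"Trees (scikit-learn API): {n_trees_sklearn}")
print(f"Trees (native API): {n_trees_native}")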

The main difference between these parameters lies in the API being used: n_estimators belongs to the scikit-learn API (XGBClassifier and XGBRegressor), while num_boost_round belongs to the native API (xgb.train).

Using a parameter with the wrong API will result in a warning:

WARNING: Parameters: { "num_boost_round" } are not used.

Or:

WARNING: Parameters: { "n_estimators" } are not used.
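For example, this sketch (reusing the data from the example above; the exact warning text may vary by XGBoost version) passes each parameter to the wrong API:

# Misuse: num_boost_round passed to the scikit-learn API is forwarded as an
# unknown booster parameter and ignored, triggering the warning above
model_wrong_sklearn = xgb.XGBClassifier(num_boost_round=100)
model_wrong_sklearn.fit(X_train, y_train)

# Misuse: n_estimators placed in the native API's params dict is likewise ignored
model_wrong_native = xgb.train({'objective': 'binary:logistic', 'n_estimators': 100}, dtrain)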

Therefore, the choice between these parameters depends on the API you are using and does not affect the model’s performance. It’s important to note that the value assigned to either parameter should be the same to achieve equivalent results.

When setting the number of boosting rounds, it’s recommended to start with a low number (e.g., 50-100) and gradually increase it while monitoring performance on a validation set. You can also use early stopping to prevent overfitting by specifying a validation set and the early_stopping_rounds parameter. Cross-validation can help find the optimal number of boosting rounds for the given dataset and problem.
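As a minimal sketch of these recommendations (reusing the training data from the example above), the native API supports both early stopping and cross-validation:

# Hold out part of the training data as a validation set
X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)
dtr = xgb.DMatrix(X_tr, label=y_tr)
dval = xgb.DMatrix(X_val, label=y_val)

params = {'objective': 'binary:logistic', 'eval_metric': 'logloss'}

# Early stopping: stop adding trees once the validation metric hasn't improved for 10 rounds
model_es = xgb.train(params, dtr, num_boost_round=1000,
                     evals=[(dval, 'validation')], early_stopping_rounds=10)
print(f"Best iteration: {model_es.best_iteration}")

# 5-fold cross-validation to estimate a good number of boosting rounds
cv_results = xgb.cv(params, dtrain, num_boost_round=1000, nfold=5, early_stopping_rounds=10)
print(f"Rounds selected by CV: {len(cv_results)}")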

Keep in mind that the optimal number of boosting rounds varies with dataset size, complexity, and noise level; there is no fixed rule, so it needs to be tuned for each dataset and problem.


