XGBoost Evaluate Model using xgboost.cv() Native API

XGBoost provides a built-in function for performing k-fold cross-validation, which can simplify your code and potentially speed up the evaluation process compared to using an external library like scikit-learn. The cv() function in XGBoost’s native API makes it easy to perform cross-validation with just a few lines of code.

from sklearn.datasets import fetch_california_housing
import xgboost as xgb

# Load the California Housing dataset
X, y = fetch_california_housing(return_X_y=True)

# Create a DMatrix object from the data
data = xgb.DMatrix(X, label=y)

# Specify the XGBoost parameters
params = {
    'objective': 'reg:squarederror',
    'learning_rate': 0.1,
    'max_depth': 6,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
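    'seed': 42
}

# Perform 5-fold cross-validation with RMSE as the evaluation metric
# (num_boost_round=100 is an illustrative value; tune it for your data)
cv_results = xgb.cv(
    params,
    data,
    num_boost_round=100,
    nfold=5,
    metrics='rmse',
    seed=42
)

# Print the per-round cross-validation results
print(cv_results)

# Print the best RMSE and the boosting round at which it occurred
print(f"Best RMSE: {cv_results['test-rmse-mean'].min():.2f} at iteration {cv_results['test-rmse-mean'].idxmin()}")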

Here’s what’s happening:

  1. We load the California Housing dataset and create a DMatrix object from the data, which is the data structure used by XGBoost’s native API.
  2. We specify the XGBoost parameters in a dictionary, including the objective function, learning rate, max depth, subsample, colsample_bytree, and random seed.
  3. We use xgb.cv() to perform 5-fold cross-validation, specifying the parameters, training data, number of boosting rounds, evaluation metric (RMSE), and other settings.
  4. We print the cross-validation results, which contain the mean and standard deviation of the evaluation metric across the folds for each boosting round.
  5. Finally, we print the best RMSE score and the corresponding iteration.

By using XGBoost’s native API for cross-validation, you can take advantage of its optimized implementation and keep your code concise and focused on the XGBoost-specific configuration.

