
Configure XGBoost Model with Parameters Defined in a dict

The XGBoost library provides a scikit-learn compatible API, allowing users to integrate XGBoost models seamlessly with scikit-learn’s tools and workflows.

This example demonstrates how to configure an XGBoost model using the scikit-learn API, with the model's hyperparameters defined in a dictionary.

By leveraging the scikit-learn API, you can take advantage of its familiar interface and utility functions while still benefiting from XGBoost’s performance and flexibility.

Storing model hyperparameters in a dict makes the configuration easy to review, modify, and store.
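
For instance, here is a minimal sketch of that idea, using a hypothetical two-parameter dict and Python's built-in json module to persist the configuration:

import json

# A hypothetical configuration dict for illustration
params = {'max_depth': 3, 'learning_rate': 0.1}

# Review: the dict prints as plain key-value pairs
print(params)

# Modify: adjust a single hyperparameter in place
params['learning_rate'] = 0.05

# Store: serialize the configuration to a JSON file for later reuse
with open('params.json', 'w') as f:
    json.dump(params, f, indent=2)

# Reload the stored configuration when needed
with open('params.json') as f:
    params = json.load(f)

The complete example below puts the dict-based configuration to work in a full training and evaluation workflow: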

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from xgboost import XGBClassifier

# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the XGBoost parameters in a dictionary
params = {
    'objective': 'binary:logistic',
    'max_depth': 3,
    'learning_rate': 0.1,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'random_state': 42
}

# Create an instance of the XGBClassifier with the specified parameters
model = XGBClassifier(**params)

# Perform 5-fold cross-validation and print the mean accuracy
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print(f"Mean cross-validation accuracy: {cv_scores.mean():.4f}")

# Train the XGBoost model on the full training set
model.fit(X_train, y_train)

# Evaluate the model's performance on the test set
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.4f}")

In this example:

  1. We generate a synthetic binary classification dataset using scikit-learn’s make_classification function.

  2. We split the data into training and test sets using train_test_split.

  3. We define the XGBoost parameters in a dictionary called params. This dictionary includes the objective function, maximum tree depth, learning rate, row subsampling ratio (subsample), per-tree column subsampling ratio (colsample_bytree), and the random state for reproducibility.

  4. We create an instance of the XGBClassifier with the specified parameters, using the double-asterisk (**) operator to unpack the dictionary. The same configuration can also be reviewed or updated after construction, as shown in the sketch after this list.

  5. We perform 5-fold cross-validation using scikit-learn’s cross_val_score function to assess the model’s performance and print the mean accuracy across the folds.

  6. We train the XGBoost model on the full training set using the fit method.

  7. Finally, we evaluate the model’s performance on the test set using the score method, which returns the accuracy.
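
Because XGBClassifier follows the scikit-learn estimator interface, the configuration can also be inspected and updated after the model is created, using the standard get_params and set_params methods. A minimal sketch, reusing the model, X_train, and y_train from the example above:

# Review the effective configuration, including defaults not set in params
print(model.get_params()['learning_rate'])

# Modify a hyperparameter on the existing estimator
model.set_params(learning_rate=0.05)

# The updated value takes effect on the next call to fit
model.fit(X_train, y_train)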

By utilizing the scikit-learn API for XGBoost, you can leverage the familiar and consistent interface provided by scikit-learn while still harnessing the power of XGBoost. This approach allows for seamless integration with other scikit-learn tools and simplifies the process of model evaluation and selection.
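
As one illustration of that integration, the same params dict can serve as the starting point for a hyperparameter search with scikit-learn's GridSearchCV. A minimal sketch, assuming the X_train and y_train arrays from the example above (the grid values here are arbitrary choices for illustration):

from sklearn.model_selection import GridSearchCV

# Search over two hyperparameters, starting from the base configuration
grid = GridSearchCV(
    estimator=XGBClassifier(**params),
    param_grid={'max_depth': [3, 5], 'learning_rate': [0.05, 0.1]},
    cv=5,
    scoring='accuracy'
)
grid.fit(X_train, y_train)
print(f"Best parameters: {grid.best_params_}")
print(f"Best cross-validation accuracy: {grid.best_score_:.4f}")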
