How to Use XGBoost XGBClassifier

The xgboost.XGBClassifier class is XGBoost's scikit-learn-compatible estimator for classification. It exposes the familiar fit() and predict() methods, so it works directly with scikit-learn's data-splitting and metrics utilities.

This example demonstrates how to use XGBClassifier to train a model on the breast cancer dataset, showcasing the key steps involved: loading data, splitting into train/test sets, defining model parameters, training the model, and evaluating its performance.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import xgboost as xgb

# Load the breast cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define XGBClassifier model parameters
params = {
    'objective': 'binary:logistic',
    'max_depth': 3,
    'learning_rate': 0.1,
    'n_estimators': 100,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'random_state': 42
}

# Instantiate XGBClassifier with the parameters
model = xgb.XGBClassifier(**params)

# Train the model
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate model performance
accuracy = accuracy_score(y_test, y_pred)
confusion = confusion_matrix(y_test, y_pred)
report = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy:.2f}")
print(f"Confusion Matrix:\n{confusion}")
print(f"Classification Report:\n{report}")

In this example, we first load the breast cancer dataset using sklearn.datasets.load_breast_cancer(). We then split the data into training and test sets using sklearn.model_selection.train_test_split().
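Before training, it can help to check what the loaded data looks like. The sketch below is an optional variation, not part of the example above: it prints the feature matrix shape and the class counts, and passes stratify=y to train_test_split() (an addition of ours) so that both splits keep roughly the same class balance.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X, y = data.data, data.target

# Feature matrix shape: (number of samples, number of features)
print(f"Shape: {X.shape}")

# Class counts by name; label 0 is 'malignant' and label 1 is 'benign'
counts = dict(zip(data.target_names, np.bincount(y)))
print(f"Class counts: {counts}")

# Assumption: stratify=y is our addition; the original example splits without it
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# With stratification, the share of positive labels should be similar in both splits
print(f"Positive share (train): {y_train.mean():.3f}")
print(f"Positive share (test):  {y_test.mean():.3f}")
```

Stratification matters most for strongly imbalanced data; on this dataset the effect is small, but it makes the test-set metrics slightly less dependent on the particular random split.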

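Next, we define the XGBClassifier model parameters in a dictionary. The 'objective' parameter is set to 'binary:logistic' for binary classification. 'max_depth' limits the depth of each tree, 'learning_rate' scales the contribution of each new tree, and 'n_estimators' sets the number of boosting rounds. 'subsample' and 'colsample_bytree' set the fraction of training rows and feature columns sampled for each tree, and 'random_state' fixes the seed so the results are reproducible.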

We instantiate the XGBClassifier with the defined parameters and train the model using the fit() method on the training data. After training, we make predictions on the test set using the predict() method.

Finally, we evaluate the model’s performance using metrics from sklearn.metrics. The accuracy score is the overall fraction of correct predictions, the confusion matrix counts the test samples for each combination of true and predicted class, and the classification report lists precision, recall, and F1-score per class.

By following this example, you can train an XGBoost model for binary classification with xgboost.XGBClassifier, keep explicit control over its hyperparameters, and evaluate it with standard scikit-learn metrics.

See Also