How to Use XGBoost XGBRFClassifier

The xgboost.XGBRFClassifier class is XGBoost’s scikit-learn-compatible random forest classifier. Rather than adding trees one boosting round at a time, it grows the whole forest in a single round, reusing XGBoost’s optimized tree construction. That makes it a practical option when you want a random forest and are already working with XGBoost.

This example demonstrates how to use XGBRFClassifier to train a model on the iris dataset, a classic multi-class classification task. We’ll cover the key steps involved: loading data, splitting into train/test sets, defining model parameters, training the model, and evaluating its performance.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import xgboost as xgb

# Load the iris dataset
data = load_iris()
X, y = data.data, data.target

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define XGBRFClassifier model parameters
params = {
    'n_estimators': 100,
    'subsample': 0.8,
    'colsample_bynode': 0.8,
    'max_depth': 3,
    'random_state': 42
}

# Instantiate XGBRFClassifier with the parameters
model = xgb.XGBRFClassifier(**params)

# Train the model
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate model performance
accuracy = accuracy_score(y_test, y_pred)
confusion = confusion_matrix(y_test, y_pred)
report = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy:.2f}")
print(f"Confusion Matrix:\n{confusion}")
print(f"Classification Report:\n{report}")

In this example, we begin by loading the iris dataset using sklearn.datasets.load_iris(). We then split the data into training and test sets with sklearn.model_selection.train_test_split().
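As a quick sanity check on the split, the sketch below prints the shapes of the two sets and the class counts in the test set. Note that stratify=y is an addition for illustration and is not used in the example above; it asks train_test_split() to keep the class proportions the same in both sets.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y is an assumption added here; the main example splits without it
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Iris has 150 rows and 4 features, so an 80/20 split gives 120 and 30 rows
print("train shape:", X_train.shape)
print("test shape:", X_test.shape)

# Number of test samples per class
test_counts = Counter(y_test.tolist())
print("test class counts:", dict(test_counts))
```

With only 30 test samples, a stratified split keeps one class from being under-represented by chance.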

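Next, we define the XGBRFClassifier model parameters in a dictionary. The 'n_estimators' parameter sets the number of trees in the forest. The 'subsample' parameter is the fraction of training rows sampled for each tree, and 'colsample_bynode' is the fraction of features sampled at each split; together they supply the randomness that makes the trees differ from one another. The 'max_depth' parameter limits the depth of each tree, and 'random_state' fixes the seed so results are reproducible.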

We create an instance of the XGBRFClassifier with the defined parameters and train the model using the fit() method on the training data. After training, we make predictions on the test set using the predict() method.

Finally, we evaluate the model’s performance using metrics from sklearn.metrics. The accuracy score is the fraction of test samples classified correctly. The confusion matrix shows, for each true species, how many samples were assigned to each predicted species, so it reveals which classes get confused with one another. The classification report lists precision, recall, and F1-score for each class.
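To make the relationship between these metrics concrete, here is a small sketch that derives accuracy, recall, and precision directly from a confusion matrix. The labels are made up for illustration and are not output from the model above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels, chosen by hand so the arithmetic is easy to follow
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])
y_hat = np.array([0, 0, 0, 1, 1, 2, 1, 2, 1, 2])

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_hat)
print(cm)

# Accuracy: correct predictions (the diagonal) over all predictions
accuracy = np.trace(cm) / cm.sum()

# Recall per class: diagonal divided by the row totals (true counts)
recall = np.diag(cm) / cm.sum(axis=1)

# Precision per class: diagonal divided by the column totals (predicted counts)
precision = np.diag(cm) / cm.sum(axis=0)

print("accuracy:", accuracy)
print("recall per class:", recall.round(3))
print("precision per class:", precision.round(3))

# Cross-check against scikit-learn's own per-class metrics
sk_recall = recall_score(y_true, y_hat, average=None)
sk_precision = precision_score(y_true, y_hat, average=None)
```

The per-class values computed this way are the same numbers that classification_report() prints in its precision and recall columns.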

The same pattern applies to other multi-class problems: choose the hyperparameters, fit xgboost.XGBRFClassifier on the training split, and evaluate the predictions on held-out data.

See Also