Predict with XGBoost's scikit-learn API

XGBoost provides a scikit-learn-compatible API, so you can make predictions with the same fit() and predict() calls you use for other scikit-learn estimators.

This example trains an XGBRegressor and makes predictions with its predict() method. XGBClassifier follows the same pattern for classification tasks.

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Load the California housing dataset
X, y = fetch_california_housing(return_X_y=True)

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create an XGBRegressor
model = XGBRegressor(n_estimators=100, learning_rate=0.1, random_state=42)

# Fit the model on the training data
model.fit(X_train, y_train)

# Make predictions on the test data
predictions = model.predict(X_test)

print("Predicted values:\n", predictions[:5])  # Print the first 5 predictions

The predict() method takes a feature matrix X as input and returns an array with one predicted value per row of X.

For regression tasks, these are the predicted continuous values.

For classification tasks, predict() returns the predicted class labels.

When using predict() with XGBoost models in scikit-learn, keep in mind:

Ensure that the input data X has the same number of features, in the same column order, as the data used to train the model.
If you performed any data preprocessing steps (e.g., scaling, one-hot encoding) on the training data, apply the same transformations to the input data before making predictions.
The returned predictions will be in the same order as the samples in the input data X.

Because XGBClassifier and XGBRegressor follow the scikit-learn estimator interface, you can use them in existing scikit-learn workflows in the same way as any other estimator.
