XGBoost Evaluate Model using Train-Test Split

Evaluating your XGBoost model’s performance is crucial to ensure it generalizes well to unseen data and isn’t overfitting to your training set.

The train/test split is a standard method for assessing model performance by holding out a portion of your data for testing.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from xgboost import XGBClassifier

# Load the iris dataset
X, y = load_iris(return_X_y=True)

# Create an XGBClassifier
model = XGBClassifier(n_estimators=100, learning_rate=0.1, random_state=42)

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
model.fit(X_train, y_train)

# Make predictions on test data
y_pred = model.predict(X_test)

# Calculate evaluation metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='macro')
recall = recall_score(y_test, y_pred, average='macro')
f1 = f1_score(y_test, y_pred, average='macro')

# Print the evaluation metrics
print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1-score: {f1:.2f}")

Here’s what’s happening:

We load the iris dataset and create an XGBClassifier with specified hyperparameters.
We split the data into training and testing sets using train_test_split(), with 20% of the data reserved for testing.
We train the model on the training data using fit().
We make predictions on the test data using predict().
We calculate evaluation metrics (accuracy, precision, recall, F1-score) using functions from sklearn.metrics.
We print the evaluation metrics to get a sense of how well our model is performing.
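The metrics above are computed with average='macro', which scores each class separately and then takes the unweighted mean across classes. As a rough illustration of that averaging, here is a pure-Python sketch. The helper macro_precision_recall and the toy labels are hypothetical, made up for this example; they are not part of sklearn and are not iris results.

```python
from collections import Counter


def macro_precision_recall(y_true, y_pred):
    """Return (macro precision, macro recall) for two equal-length label lists.

    Hypothetical helper for illustration; in practice use sklearn.metrics.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pred_count = Counter(y_pred)  # how often each class was predicted
    true_count = Counter(y_true)  # how often each class actually occurs
    tp = Counter()                # correct predictions per class
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1

    # Per-class scores; a class with no predictions (or no samples) scores 0.
    precisions = [tp[c] / pred_count[c] if pred_count[c] else 0.0 for c in labels]
    recalls = [tp[c] / true_count[c] if true_count[c] else 0.0 for c in labels]

    # Macro average: every class counts equally, regardless of its size.
    return sum(precisions) / len(labels), sum(recalls) / len(labels)


# Toy labels, made up for illustration
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

precision, recall = macro_precision_recall(y_true, y_pred)
print(f"Macro precision: {precision:.2f}")
print(f"Macro recall: {recall:.2f}")
```

Because every class gets equal weight, macro averaging can differ noticeably from accuracy when classes are imbalanced. On a balanced dataset such as iris, the two tend to be close.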

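By evaluating your model on held-out test data, you get a more realistic estimate of how it will perform on new, unseen data. A training score that is much higher than the test score is a sign of overfitting. If you also use these scores to compare models or tune hyperparameters, hold out a separate validation set or use cross-validation so that the test set stays an unbiased final check.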
