Evaluating your XGBoost model’s performance is crucial to ensure it generalizes well to unseen data and isn’t overfitting to your training set.
The train/test split is a standard method for assessing model performance by holding out a portion of your data for testing.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from xgboost import XGBClassifier
# Load the iris dataset
X, y = load_iris(return_X_y=True)
# Create an XGBClassifier
model = XGBClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train the model
model.fit(X_train, y_train)
# Make predictions on test data
y_pred = model.predict(X_test)
# Calculate evaluation metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='macro')
recall = recall_score(y_test, y_pred, average='macro')
f1 = f1_score(y_test, y_pred, average='macro')
# Print the evaluation metrics
print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1-score: {f1:.2f}")
Here’s what’s happening:
- We load the iris dataset and create an XGBClassifier with specified hyperparameters.
- We split the data into training and testing sets with train_test_split(), reserving 20% of the data for testing.
- We train the model on the training data with fit().
- We make predictions on the test data with predict().
- We calculate evaluation metrics (accuracy, precision, recall, F1-score) using functions from sklearn.metrics, passing average='macro' to the precision, recall, and F1 functions so each of the three iris classes contributes equally to the score.
- We print the evaluation metrics to get a sense of how well the model is performing.
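If you want all of these numbers, plus a per-class breakdown, in a single call, sklearn.metrics also provides classification_report(). A minimal sketch, reusing y_test and y_pred from above; the target_names are the iris dataset's three class labels:

from sklearn.metrics import classification_report
# Per-class precision, recall, and F1-score, plus macro and
# weighted averages, printed as one formatted table
print(classification_report(y_test, y_pred,
                            target_names=["setosa", "versicolor", "virginica"]))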
By evaluating your model on held-out test data, you get a more realistic assessment of how it will perform on new, unseen data. This helps detect overfitting and guides model selection and hyperparameter tuning.
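One refinement worth considering: on a small dataset like iris, a purely random split can leave a class under-represented in the test set, which skews the metrics. A small variation on the split above uses train_test_split()'s stratify parameter to preserve class proportions in both sets:

# Stratified split: train and test keep the same class
# proportions as the full dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)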