
Voting Ensemble With an XGBoost Model

Ensemble methods combine multiple diverse models to make predictions, often resulting in improved performance compared to individual models. In this example, we demonstrate how to use XGBoost as part of a VotingClassifier ensemble in scikit-learn.

By combining XGBoost with other diverse models like KNN and SVM in a voting ensemble, we can potentially leverage the strengths of each model and improve overall classification performance.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_classes=2, n_features=20, n_informative=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define individual models
xgb = XGBClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
knn = KNeighborsClassifier(n_neighbors=5)
svc = SVC(kernel='rbf', probability=True, random_state=42)

# Create the voting ensemble
voting_ensemble = VotingClassifier(estimators=[('xgb', xgb), ('knn', knn), ('svc', svc)], voting='soft')

# Train and evaluate individual models
xgb.fit(X_train, y_train)
knn.fit(X_train, y_train)
svc.fit(X_train, y_train)

y_pred_xgb = xgb.predict(X_test)
y_pred_knn = knn.predict(X_test)
y_pred_svc = svc.predict(X_test)

accuracy_xgb = accuracy_score(y_test, y_pred_xgb)
accuracy_knn = accuracy_score(y_test, y_pred_knn)
accuracy_svc = accuracy_score(y_test, y_pred_svc)
print(f'XGB: {accuracy_xgb:.3f}')
print(f'KNN: {accuracy_knn:.3f}')
print(f'SVC: {accuracy_svc:.3f}')

# Train and evaluate the voting ensemble
voting_ensemble.fit(X_train, y_train)
y_pred_voting = voting_ensemble.predict(X_test)
accuracy_voting = accuracy_score(y_test, y_pred_voting)
print(f'Voting Ensemble: {accuracy_voting:.3f}')

# Visualize the performance comparison
models = ['XGBoost', 'KNN', 'SVC', 'Voting Ensemble']
accuracies = [accuracy_xgb, accuracy_knn, accuracy_svc, accuracy_voting]

plt.figure(figsize=(8, 6))
plt.bar(models, accuracies)
plt.title('Model Performance Comparison')
plt.xlabel('Model')
plt.ylabel('Accuracy')
plt.ylim(0.8, 1.0)
plt.show()

The plot may look like the following:

[Figure: bar chart comparing the accuracy of XGBoost, KNN, SVC, and the voting ensemble]

In this example, we generate a synthetic binary classification dataset using scikit-learn’s make_classification function and split it into train and test sets.
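If you would rather run the ensemble on a real dataset, one option is scikit-learn's built-in breast cancer data; this sketch only swaps the data-loading step and leaves the rest of the example unchanged:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load a real binary classification dataset in place of make_classification
X, y = load_breast_cancer(return_X_y=True)

# The rest of the example (model definitions, fitting, evaluation) stays the same
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)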

We define three individual models: XGBoost, k-Nearest Neighbors (KNN), and an SVM. Then, we create a VotingClassifier ensemble with these models using the 'soft' voting strategy, which averages the predicted class probabilities from each model and selects the class with the highest average probability.
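To see what 'soft' voting does under the hood, the following sketch (which assumes the fitted xgb, knn, svc, and voting_ensemble objects from the code above) averages the three models' predicted probabilities by hand; with equal weights it should agree with the ensemble's predictions:

import numpy as np

# Soft voting averages the class-probability estimates of the member models
avg_proba = (xgb.predict_proba(X_test) +
             knn.predict_proba(X_test) +
             svc.predict_proba(X_test)) / 3

# The ensemble prediction is the class with the highest average probability
manual_pred = np.argmax(avg_proba, axis=1)

# With equal weights this should match the VotingClassifier's output
print(np.array_equal(manual_pred, voting_ensemble.predict(X_test)))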

We train each individual model on the training data and evaluate its performance on the test set using accuracy as the metric. We then train and evaluate the voting ensemble in the same way.
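A single train/test split can give a noisy estimate; if you want a sturdier comparison, a sketch along these lines (reusing the estimators and the full X, y from above) evaluates each model with 5-fold cross-validation:

from sklearn.model_selection import cross_val_score

# Compare the individual members and the ensemble with 5-fold cross-validation
for name, model in [('XGBoost', xgb), ('KNN', knn), ('SVC', svc), ('Voting Ensemble', voting_ensemble)]:
    scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
    print(f'{name}: {scores.mean():.3f} (+/- {scores.std():.3f})')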

Finally, we visualize the performance comparison using a bar plot, which shows the accuracies of the individual models and the voting ensemble.
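If you want the exact accuracy values shown on the bars, a small variation of the plotting code using matplotlib's bar_label (available from matplotlib 3.4 onwards) looks like this:

# Annotate each bar with its accuracy (requires matplotlib >= 3.4)
fig, ax = plt.subplots(figsize=(8, 6))
bars = ax.bar(models, accuracies)
ax.bar_label(bars, fmt='%.3f')
ax.set_title('Model Performance Comparison')
ax.set_ylabel('Accuracy')
ax.set_ylim(0.8, 1.0)
plt.show()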

By incorporating XGBoost into a voting ensemble alongside other diverse models, we can potentially improve the overall classification performance compared to using XGBoost or any other model individually. The voting ensemble leverages the collective wisdom of the models to make more robust predictions.
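The members also do not have to contribute equally: VotingClassifier accepts a weights argument, so a sketch like the following (weights chosen arbitrarily for illustration) would give XGBoost twice the say of the other two models:

# Weighted soft voting: XGBoost gets twice the influence of KNN and SVC
weighted_ensemble = VotingClassifier(
    estimators=[('xgb', xgb), ('knn', knn), ('svc', svc)],
    voting='soft',
    weights=[2, 1, 1]
)
weighted_ensemble.fit(X_train, y_train)
y_pred_weighted = weighted_ensemble.predict(X_test)
print(f'Weighted Voting Ensemble: {accuracy_score(y_test, y_pred_weighted):.3f}')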


