XGBoost for the Sonar Dataset

The Sonar dataset is a classic binary classification benchmark: the task is to distinguish sonar signals bounced off a metal cylinder from those bounced off a roughly cylindrical rock.

In this example, we’ll load the Sonar dataset with scikit-learn’s fetch_openml, tune common XGBoost hyperparameters with GridSearchCV, save the best model, reload it, and use it to make predictions.

from sklearn.datasets import fetch_openml
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier

# Load the Sonar dataset
sonar = fetch_openml('sonar', as_frame=True)
X, y = sonar.data, sonar.target

# Print key information about the dataset
print(f"Dataset shape: {X.shape}")
print(f"Features: {sonar.feature_names}")
print(f"Target variable: {sonar.target_names}")

# Encode target variable
y = LabelEncoder().fit_transform(y)

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Define parameter grid
param_grid = {
    'max_depth': [3, 4, 5],
    'learning_rate': [0.1, 0.01, 0.05],
    'n_estimators': [50, 100, 200],
    'subsample': [0.8, 1.0],
    'colsample_bytree': [0.8, 1.0]
}

# Create XGBClassifier
model = XGBClassifier(objective='binary:logistic', random_state=42, n_jobs=1)

# Perform grid search
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=3, n_jobs=-1)
grid_search.fit(X_train, y_train)

# Print best score and parameters
print(f"Best score: {grid_search.best_score_:.3f}")
print(f"Best parameters: {grid_search.best_params_}")

# Access best model
best_model = grid_search.best_estimator_

# Save best model
best_model.save_model('best_model_sonar.ubj')

# Load saved model
loaded_model = XGBClassifier()
loaded_model.load_model('best_model_sonar.ubj')

# Use loaded model for predictions
predictions = loaded_model.predict(X_test)

# Print accuracy score
accuracy = loaded_model.score(X_test, y_test)
print(f"Accuracy: {accuracy:.3f}")

Running the example produces output similar to the following; the exact scores and best parameters may vary with library versions:

Dataset shape: (208, 60)
Features: ['attribute_1', 'attribute_2', 'attribute_3', 'attribute_4', 'attribute_5', 'attribute_6', 'attribute_7', 'attribute_8', 'attribute_9', 'attribute_10', 'attribute_11', 'attribute_12', 'attribute_13', 'attribute_14', 'attribute_15', 'attribute_16', 'attribute_17', 'attribute_18', 'attribute_19', 'attribute_20', 'attribute_21', 'attribute_22', 'attribute_23', 'attribute_24', 'attribute_25', 'attribute_26', 'attribute_27', 'attribute_28', 'attribute_29', 'attribute_30', 'attribute_31', 'attribute_32', 'attribute_33', 'attribute_34', 'attribute_35', 'attribute_36', 'attribute_37', 'attribute_38', 'attribute_39', 'attribute_40', 'attribute_41', 'attribute_42', 'attribute_43', 'attribute_44', 'attribute_45', 'attribute_46', 'attribute_47', 'attribute_48', 'attribute_49', 'attribute_50', 'attribute_51', 'attribute_52', 'attribute_53', 'attribute_54', 'attribute_55', 'attribute_56', 'attribute_57', 'attribute_58', 'attribute_59', 'attribute_60']
Target variable: ['Class']
Best score: 0.843
Best parameters: {'colsample_bytree': 0.8, 'learning_rate': 0.05, 'max_depth': 4, 'n_estimators': 100, 'subsample': 0.8}
Accuracy: 0.857
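
The accuracy figure is computed on a small hold-out set, so it helps to see how coarse that number is. The sketch below assumes the 208-row shape printed above and the same test_size=0.2; the dummy array and the names dummy_rows, n_test, and step are illustrative stand-ins, not part of the original example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy stand-in with the same number of rows as the Sonar data (assumption: 208)
n_samples = 208
dummy_rows = np.arange(n_samples).reshape(-1, 1)

# Same split fraction as the example; only the resulting sizes matter here
train_rows, test_rows = train_test_split(dummy_rows, test_size=0.2, random_state=42)

n_test = len(test_rows)
step = 1 / n_test  # change in accuracy caused by a single prediction

print(f"Test rows: {n_test}")
print(f"One prediction changes accuracy by {step:.3f}")
print(f"36 correct out of {n_test}: {36 / n_test:.3f}")
```

With this few test rows, an accuracy of 0.857 is consistent with 36 correct predictions, and a single extra mistake moves the score by more than two percentage points, so small differences between runs should be read with caution.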

In this example, we load the Sonar dataset with fetch_openml, print its shape and column names, encode the two class labels as integers, and hold out 20% of the rows as a stratified test set. GridSearchCV then evaluates every combination in the parameter grid with 3-fold cross-validation on the training set. The reported best score is the mean cross-validated accuracy of the winning combination, while the final accuracy is measured on the held-out test set after the best model has been saved and reloaded.
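
The cost of the grid search follows directly from the size of the grid. As a rough check, the sketch below counts the candidate combinations with scikit-learn's ParameterGrid, reusing the grid and cv=3 from the example; the names n_candidates, cv_folds, and total_fits are illustrative.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the example above
param_grid = {
    'max_depth': [3, 4, 5],
    'learning_rate': [0.1, 0.01, 0.05],
    'n_estimators': [50, 100, 200],
    'subsample': [0.8, 1.0],
    'colsample_bytree': [0.8, 1.0]
}

# Number of distinct hyperparameter combinations GridSearchCV will try
n_candidates = len(ParameterGrid(param_grid))

# Each candidate is fitted once per cross-validation fold
cv_folds = 3
total_fits = n_candidates * cv_folds

print(f"Candidates: {n_candidates}")
print(f"Cross-validation fits: {total_fits}")
```

With the default refit=True, GridSearchCV also fits the winning combination once more on the full training set, which is the model exposed as best_estimator_. Adding a value to any list in the grid multiplies the number of fits, so it is worth counting before widening the search.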

By following this approach, you can quickly apply XGBoost to the Sonar dataset, find the best hyperparameters within the grid you define, save and reload the resulting model, and evaluate it on held-out data.

See Also