
XGBoost Compare "seed" vs "random_state" Parameters

Both the seed and random_state parameters in XGBoost control random number generation, ensuring that results are reproducible across runs.

The seed parameter is preferred in the native XGBoost API, while random_state is used in the scikit-learn API.

The first example demonstrates how to use both parameters and compares their behavior with the scikit-learn API.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create two XGBoost classifiers, one using "seed" and the other using "random_state"
model_seed = XGBClassifier(seed=42, eval_metric='logloss')
model_random_state = XGBClassifier(random_state=42, eval_metric='logloss')

# Train both models on the training set
model_seed.fit(X_train, y_train)
model_random_state.fit(X_train, y_train)

# Make predictions on the test set
predictions_seed = model_seed.predict(X_test)
predictions_random_state = model_random_state.predict(X_test)

# Confirm that both models produce identical predictions
assert (predictions_seed == predictions_random_state).all()
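
The example above trains without any stochastic options, so the seed has little visible effect. A sharper check is to enable row subsampling, where the random number generator actually drives training. The sketch below reuses X_train, X_test, and y_train from above and assumes, as the matching assertion suggests, that the wrapper routes both parameter names to the same underlying seed:

# With row subsampling enabled, training is stochastic and the seed matters
stochastic_seed = XGBClassifier(seed=42, subsample=0.8, eval_metric='logloss')
stochastic_random_state = XGBClassifier(random_state=42, subsample=0.8, eval_metric='logloss')

# Train both models on the same data
stochastic_seed.fit(X_train, y_train)
stochastic_random_state.fit(X_train, y_train)

# Identical seeds should drive identical row samples and identical models
assert (stochastic_seed.predict(X_test) == stochastic_random_state.predict(X_test)).all()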

The second example demonstrates the same comparison with the native XGBoost API.

import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert data to DMatrix, which is a native XGBoost data structure
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# Set up parameters for XGBoost
params = {
    'objective': 'binary:logistic',
    'eval_metric': 'logloss',
    'seed': 42  # Using 'seed' here
}

# Train the model using 'seed'
model_seed = xgb.train(params, dtrain, num_boost_round=10)

# Adjust params to use 'random_state'
params = {
    'objective': 'binary:logistic',
    'eval_metric': 'logloss',
    'random_state': 42  # Using 'random_state' here
}

# Train the model using 'random_state'
model_random_state = xgb.train(params, dtrain, num_boost_round=10)

# Predict probabilities on the test set and round them to class labels
predictions_seed = model_seed.predict(dtest).round()
predictions_random_state = model_random_state.predict(dtest).round()

# Confirm that both models produce identical predictions
assert (predictions_seed == predictions_random_state).all()
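
Note that this run is deterministic regardless of the seed, because no stochastic options such as subsample are set. The sketch below, which reuses dtrain and dtest from above, enables row subsampling so the seed genuinely controls the outcome; two runs with the same seed should then produce identical predictions:

# Row subsampling makes training stochastic, so the seed controls the result
stochastic_params = {
    'objective': 'binary:logistic',
    'eval_metric': 'logloss',
    'subsample': 0.5,
    'seed': 42
}

# Two independent training runs with the same seed
run_one = xgb.train(stochastic_params, dtrain, num_boost_round=10)
run_two = xgb.train(stochastic_params, dtrain, num_boost_round=10)

# The same seed should reproduce the same model exactly
assert (run_one.predict(dtest) == run_two.predict(dtest)).all()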

The seed and random_state parameters serve the same purpose in XGBoost: controlling random number generation so that results are reproducible. The main difference is the API in which each is used. The seed parameter belongs to the native XGBoost API, while random_state conforms to the scikit-learn convention, making it more familiar to scikit-learn users.

When working with XGBoost, it is recommended to use seed with the native XGBoost API and random_state with the scikit-learn API.

Set these parameters to a fixed value to ensure the reproducibility of results across different runs. The choice between seed and random_state ultimately depends on the API being used and personal preference.
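
To confirm which seed a trained booster actually recorded, you can inspect its saved configuration. The sketch below calls save_config() on the native model_seed trained above; the learner/generic_param path is where recent XGBoost releases store the seed, although the exact JSON layout may vary between versions:

import json

# Dump the trained booster's configuration as a JSON string
config = json.loads(model_seed.save_config())

# The recorded seed lives under learner/generic_param in recent releases
print(config['learner']['generic_param']['seed'])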


