Both the seed and random_state parameters in XGBoost control random number generation, ensuring reproducible results. The seed parameter is preferred in the native XGBoost API, while random_state is used in the scikit-learn API. This example demonstrates both parameters with the scikit-learn API and confirms that they behave identically.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
# Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create two XGBoost classifiers, one using "seed" and the other using "random_state"
model_seed = XGBClassifier(seed=42, eval_metric='logloss')
model_random_state = XGBClassifier(random_state=42, eval_metric='logloss')
# Train both models on the training set
model_seed.fit(X_train, y_train)
model_random_state.fit(X_train, y_train)
# Make predictions on the test set
predictions_seed = model_seed.predict(X_test)
predictions_random_state = model_random_state.predict(X_test)
# Compare the results
assert (predictions_seed == predictions_random_state).all()
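With the configuration above the trees are grown without row or column subsampling, so the seed has little visible effect. A minimal sketch of where it does matter, assuming a subsample value below 1.0 (an illustrative choice, not part of the recipe above) to make row sampling stochastic: two fits with the same random_state should still produce identical models.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
# Generate and split the same synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# subsample < 1.0 makes row sampling stochastic, so the seed now matters
model_a = XGBClassifier(subsample=0.8, random_state=42, eval_metric='logloss')
model_b = XGBClassifier(subsample=0.8, random_state=42, eval_metric='logloss')
model_a.fit(X_train, y_train)
model_b.fit(X_train, y_train)
# Same data, same parameters, same seed: the fitted models should be identical
assert (model_a.predict(X_test) == model_b.predict(X_test)).all()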
This example demonstrates the same comparison using the native XGBoost API, where recent versions also accept random_state as an alias for seed.
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
# Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Convert data to DMatrix, which is a native XGBoost data structure
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
# Set up parameters for XGBoost
params = {
'objective': 'binary:logistic',
'eval_metric': 'logloss',
'seed': 42 # Using 'seed' here
}
# Train the model using 'seed'
model_seed = xgb.train(params, dtrain, num_boost_round=10)
# Adjust params to use 'random_state'
params = {
'objective': 'binary:logistic',
'eval_metric': 'logloss',
'random_state': 42 # Using 'random_state' here; recent versions treat it as an alias for 'seed'
}
# Train the model using 'random_state'
model_random_state = xgb.train(params, dtrain, num_boost_round=10)
# Make predictions on the test set (predict returns probabilities for binary:logistic, so round to class labels)
predictions_seed = model_seed.predict(dtest).round()
predictions_random_state = model_random_state.predict(dtest).round()
# Compare the results
assert (predictions_seed == predictions_random_state).all()
The seed and random_state parameters serve the same purpose in XGBoost: they control random number generation to make results reproducible. The main difference is the API in which each is used. The seed parameter belongs to the native XGBoost API, while random_state belongs to the scikit-learn API, conforming to scikit-learn convention and making it familiar to scikit-learn users.
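Since the scikit-learn wrapper ultimately delegates to the same native training routine, the two APIs should typically produce the same model when the seed and all hyperparameters match. A quick cross-check sketch, not part of the recipes above, assuming the default hyperparameters line up across both APIs:
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# scikit-learn API: 10 boosting rounds, seeded via random_state
sk_model = XGBClassifier(n_estimators=10, random_state=42, eval_metric='logloss')
sk_model.fit(X_train, y_train)
# Native API: the same 10 rounds, seeded via seed
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
params = {'objective': 'binary:logistic', 'eval_metric': 'logloss', 'seed': 42}
native_model = xgb.train(params, dtrain, num_boost_round=10)
# Compare predicted classes; agreement should typically be 1.0
agreement = (sk_model.predict(X_test) == native_model.predict(dtest).round()).mean()
print(f"Cross-API prediction agreement: {agreement:.3f}")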
When working with XGBoost, it is recommended to use seed with the native XGBoost API and random_state with the scikit-learn API. Set the parameter to a fixed value to ensure reproducible results across runs. Beyond that, the choice between seed and random_state comes down to the API being used and personal preference.
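To confirm that the value you fix actually feeds the randomness, a final sketch: with stochastic sampling enabled, different seeds will usually grow different trees, so otherwise identical runs can disagree. The subsample and colsample_bytree values here are illustrative assumptions, not part of the recipes above.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Stochastic row and column sampling, seeded differently in each model
m1 = XGBClassifier(subsample=0.5, colsample_bytree=0.5, random_state=1, eval_metric='logloss')
m2 = XGBClassifier(subsample=0.5, colsample_bytree=0.5, random_state=2, eval_metric='logloss')
m1.fit(X_train, y_train)
m2.fit(X_train, y_train)
# Count test points where the two seeds lead to different predicted classes
print((m1.predict(X_test) != m2.predict(X_test)).sum(), "of", len(X_test), "predictions differ")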