Measuring the execution time of XGBoost predictions is crucial for understanding the performance of your model and optimizing its efficiency.
In this example, we'll demonstrate how to use the time.perf_counter() function to measure prediction times and compare the performance of single-threaded and multi-threaded predictions.
import time
from sklearn.datasets import make_classification
from xgboost import XGBClassifier
# Generate a synthetic dataset for binary classification
X, y = make_classification(n_samples=1000000, n_features=20, n_informative=10, n_redundant=5, random_state=42)
# Initialize two XGBClassifier models with different numbers of threads
model_single_thread = XGBClassifier(n_jobs=1, nthread=1, random_state=42)
model_multi_thread = XGBClassifier(n_jobs=-1, nthread=4, random_state=42)
# Train both models on the generated dataset
model_single_thread.fit(X, y)
model_multi_thread.fit(X, y)
# Measure prediction time for the single-threaded model
start_time = time.perf_counter()
predictions_single_thread = model_single_thread.predict(X)
end_time = time.perf_counter()
single_thread_time = end_time - start_time
# Measure prediction time for the multi-threaded model
start_time = time.perf_counter()
predictions_multi_thread = model_multi_thread.predict(X)
end_time = time.perf_counter()
multi_thread_time = end_time - start_time
# Print the prediction times
print(f"Single-threaded prediction time: {single_thread_time:.3f} seconds")
print(f"Multi-threaded prediction time: {multi_thread_time:.3f} seconds")
You may see results that look as follows:
Single-threaded prediction time: 1.823 seconds
Multi-threaded prediction time: 0.547 seconds
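To put these two numbers in perspective, the ratio of the timings gives the speedup, and dividing that by the thread count gives a rough parallel efficiency. The short sketch below uses the sample figures above. The thread count of 4 is an assumption taken from nthread=4 (n_jobs=-1 is also set, so the actual count on your machine may differ), and the variable names are illustrative.

```python
# Sample timings copied from the output above (seconds).
single_thread_time = 1.823
multi_thread_time = 0.547

# Assumption: the multi-threaded model effectively used 4 threads (nthread=4).
n_threads = 4

# Speedup: how many times faster the multi-threaded run was.
speedup = single_thread_time / multi_thread_time

# Parallel efficiency: fraction of the ideal (linear) speedup achieved.
efficiency = speedup / n_threads

print(f"Speedup: {speedup:.2f}x")
print(f"Parallel efficiency: {efficiency:.0%}")
```

With the sample figures this works out to a speedup of roughly 3.3x. Your own timings will vary with hardware, dataset size, and model complexity.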
In this example:
- We generate a synthetic dataset for binary classification using sklearn.datasets.make_classification() with 1,000,000 samples and 20 features.
- We initialize two XGBClassifier models: one with a single thread (n_jobs=1, nthread=1) and another with multiple threads (n_jobs=-1, nthread=4). Here n_jobs is the scikit-learn-style name and nthread the native XGBoost name for the thread count. Because the two values disagree for the multi-threaded model, the number of threads actually used may depend on your XGBoost version, so in your own code it is safer to set only one of them.
- We train both models on the generated dataset using the fit() method.
- To measure the prediction time for the single-threaded model, we:
  - Record the start time using time.perf_counter().
  - Make predictions on the training data using the predict() method.
  - Record the end time using time.perf_counter().
  - Calculate the prediction time by subtracting the start time from the end time.
- We repeat the same process to measure the prediction time for the multi-threaded model.
- Finally, we print the prediction times for both models to compare their performance.
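A single measurement can be noisy, and the first call may include one-time setup cost. As a minimal sketch, the timing steps above can be wrapped in a helper that runs a warm-up call and then repeats the measurement. The name time_call and its parameters are hypothetical, not part of XGBoost or scikit-learn. The helper works with any callable, such as model_single_thread.predict.

```python
import statistics
import time


def time_call(func, *args, repeats=5, warmup=1):
    """Time func(*args) several times; return (best, median) in seconds."""
    # Warm-up calls are executed but not timed.
    for _ in range(warmup):
        func(*args)

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)

    return min(timings), statistics.median(timings)


# Stand-in workload so the sketch runs on its own; with the models above
# you would call time_call(model_single_thread.predict, X) instead.
best, median = time_call(sum, range(100_000))
print(f"best: {best:.6f} s, median: {median:.6f} s")
```

Reporting the best time reduces the influence of background activity on the machine, while the median gives a sense of typical latency.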
By using time.perf_counter(), you can accurately measure the execution time of XGBoost predictions and assess the impact of using different numbers of threads. This information can help you make informed decisions about the optimal configuration for your model, balancing prediction speed and resource utilization.
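To explore the balance between prediction speed and resource utilization, one option is to time predictions across several thread counts. The following is a sketch only: sweep_thread_counts and make_model are hypothetical names, and the helper accepts any factory that returns an object with fit() and predict() methods. The commented usage at the bottom assumes that n_jobs alone is used to set the thread count of XGBClassifier.

```python
import time


def sweep_thread_counts(make_model, X, y, thread_counts=(1, 2, 4)):
    """Fit one model per thread count and time a single predict() call each."""
    results = {}
    for n_threads in thread_counts:
        # make_model is a caller-supplied factory: thread count -> model.
        model = make_model(n_threads)
        model.fit(X, y)

        start = time.perf_counter()
        model.predict(X)
        results[n_threads] = time.perf_counter() - start
    return results


# Example usage with XGBoost (assumes xgboost is installed and that X and y
# are the arrays generated earlier):
#
# from xgboost import XGBClassifier
#
# results = sweep_thread_counts(
#     lambda n: XGBClassifier(n_jobs=n, random_state=42), X, y
# )
# for n, seconds in results.items():
#     print(f"{n} thread(s): {seconds:.3f} seconds")
```

Timings often stop improving beyond a certain thread count, so a sweep like this can help identify the point at which extra threads no longer pay off on a given machine.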