Hyperparameter tuning is an essential tool in the data scientist's toolbox for improving model performance. But hyperparameter tuning can be very costly and time-consuming, forcing data scientists to compromise on the number of hyperparameter combinations they evaluate or on how often they tune.
The DataHeroes Data Structure can be used to perform hyperparameter tuning, orders of magnitude faster and cheaper, on a single machine or on a cluster.
Convert your dataset into a DataHeroes Data Structure from a NumPy array, a Pandas DataFrame, or one or more files, with a few simple commands:
# Initialize the service for training.
# Change the number of instances 'n_instances' to match your dataset.
service_obj = CoresetTreeServiceDTC(
    data_params=data_params,
    optimized_for="training",
    n_instances=XXX,
    model_cls=XGBClassifier,
)

# Build the Coreset tree.
# The Coreset tree uses the local file system to store its data.
# After this step you will have a new directory, .dataheroes_cache.
service_obj.build(X=X, y=y)
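The `X` and `y` passed to `build` are plain arrays. As a minimal sketch of preparing them from a Pandas DataFrame (using a synthetic dataset and hypothetical column names here, since the original data isn't shown):

```python
import pandas as pd
from sklearn.datasets import make_classification

# Synthetic stand-in for the real dataset (hypothetical column names).
X_arr, y_arr = make_classification(n_samples=1000, n_features=5, random_state=0)
df = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(5)])
df["target"] = y_arr

# Split the DataFrame into the X and y arrays expected by build().
X = df.drop(columns=["target"]).to_numpy()
y = df["target"].to_numpy()
```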
Once the DataHeroes Data Structure is built, perform hyperparameter tuning using your favorite library:
# To hyperparameter tune, use the library's built-in grid_search function,
# which runs dramatically faster than GridSearchCV.
# Adjust the hyperparameters and scoring function to your needs
# (or use the model's default scoring by setting scoring=None).
param_grid = {
    'learning_rate': [0.1, 0.01],
    'n_estimators': [250, 500, 1000],
    'max_depth': [4, 6],
}

from sklearn.metrics import balanced_accuracy_score, make_scorer
scoring = make_scorer(balanced_accuracy_score)

optimal_hyperparameters, trained_model = service_obj.grid_search(
    param_grid=param_grid,
    scoring=scoring,
    refit=True,
    verbose=2,
)
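For comparison, the full-dataset baseline that this replaces would look roughly like the following GridSearchCV call. This is a sketch under stated assumptions: it uses a scikit-learn classifier, a synthetic dataset, and a smaller illustrative grid, since XGBoost and the original data aren't assumed here:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score, make_scorer
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the real dataset.
X, y = make_classification(n_samples=1000, random_state=0)

# Every combination in the grid is evaluated against the full dataset
# across all CV folds, which is what makes this slow at scale.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [4, 6],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    scoring=make_scorer(balanced_accuracy_score),
    cv=3,
    refit=True,
)
search.fit(X, y)
best_params = search.best_params_
```

The coreset `grid_search` follows the same param-grid/scoring interface, but evaluates each combination on the much smaller coreset rather than the full dataset.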
The following charts demonstrate hyperparameter tuning on a single instance with 16 CPUs and 64GB RAM, using the full dataset to evaluate 144 hyperparameter combinations, compared to using the DataHeroes Data Structure to evaluate both 144 and 864 combinations.
Save Time and Money
Using the DataHeroes Data Structure, the time and cost of hyperparameter tuning was reduced by 97%! And evaluating 864 combinations still saved 80% in terms of time and cost!
Improve Model Quality
More importantly, the DataHeroes Data Structure allowed us to find a better model, improving our balanced accuracy by 9.4%!