Hyperparameter tuning is an essential tool in the data scientist's toolbox for improving model performance. But hyperparameter tuning can be very costly and time-consuming, forcing data scientists to compromise on the number of hyperparameter combinations they evaluate or on how often they tune.
The DataHeroes Data Structure can be used to perform hyperparameter tuning, orders of magnitude faster and cheaper, on a single machine or on a cluster.
Convert your dataset into a DataHeroes Data Structure from a NumPy array, a Pandas DataFrame, or one or more files, with a few simple commands:
# Initialize the service for training.
# Change the number of instances 'n_instances' to match your dataset.
service_obj = CoresetTreeServiceDTC(
    data_params=data_params,
    optimized_for="training",
    n_instances=XXX,
    model_cls=XGBClassifier,
)

# Build the Coreset tree.
# The Coreset tree uses the local file system to store its data.
# After this step you will have a new directory, .dataheroes_cache.
service_obj.build(X=X, y=y)
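The `X` and `y` passed to `build` are plain arrays. As a minimal sketch of preparing them from a Pandas DataFrame (using a synthetic dataset and hypothetical column names here, since the original data isn't shown):

```python
import pandas as pd
from sklearn.datasets import make_classification

# Synthetic stand-in for the real dataset (hypothetical column names).
X_arr, y_arr = make_classification(n_samples=1000, n_features=5, random_state=0)
df = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(5)])
df["target"] = y_arr

# Split the DataFrame into the X and y arrays expected by build().
X = df.drop(columns=["target"]).to_numpy()
y = df["target"].to_numpy()
```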
Once the DataHeroes Data Structure is built, perform hyperparameter tuning using your favorite library:
# To hyperparameter tune, use the library's built-in grid_search function,
# which runs dramatically faster than GridSearchCV.
# Adjust the hyperparameters and scoring function to your needs
# (or use the model's default scoring by setting scoring=None).
param_grid = {
    'learning_rate': [0.1, 0.01],
    'n_estimators': [250, 500, 1000],
    'max_depth': [4, 6],
}

from sklearn.metrics import balanced_accuracy_score, make_scorer
scoring = make_scorer(balanced_accuracy_score)

optimal_hyperparameters, trained_model = service_obj.grid_search(
    param_grid=param_grid,
    scoring=scoring,
    refit=True,
    verbose=2,
)
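For comparison, the full-dataset baseline that this replaces would look roughly like the following GridSearchCV call. This is a sketch under stated assumptions: it uses a scikit-learn classifier, a synthetic dataset, and a smaller illustrative grid, since XGBoost and the original data aren't assumed here:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score, make_scorer
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the real dataset.
X, y = make_classification(n_samples=1000, random_state=0)

# Every combination in the grid is evaluated against the full dataset
# across all CV folds, which is what makes this slow at scale.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [4, 6],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    scoring=make_scorer(balanced_accuracy_score),
    cv=3,
    refit=True,
)
search.fit(X, y)
best_params = search.best_params_
```

The coreset `grid_search` follows the same param-grid/scoring interface, but evaluates each combination on the much smaller coreset rather than the full dataset.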
The following charts demonstrate hyperparameter tuning on a single instance with 16 CPUs and 64GB RAM, using the full dataset to evaluate 144 hyperparameter combinations, compared to using the DataHeroes Data Structure to evaluate both 144 and 864 combinations.
Save Time and Money
Using the DataHeroes Data Structure, the time and cost of hyperparameter tuning was reduced by 97%! And evaluating 864 combinations still saved 80% in terms of time and cost!
Improve Model Quality
More importantly, the DataHeroes Data Structure allowed us to find a better model, improving our balanced accuracy by 9.4%!