Training your model more frequently helps overcome drift and improves model quality, but it is often expensive and time-consuming. The DataHeroes Data Structure allows you to retrain your model frequently and evaluate many hyperparameter combinations without increasing costs.
Convert your ML model to a DataHeroes Data Structure, built from a NumPy array, a pandas DataFrame, or one or more files, with a few simple commands:
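from dataheroes import CoresetTreeServiceDTC
from xgboost import XGBClassifier

# Initialize the service for training.
# Change the number of instances 'n_instances' to match your dataset.
# Assumes `data_params` and the pandas DataFrame `df` are already defined.
service_obj = CoresetTreeServiceDTC(
    data_params=data_params,
    optimized_for="training",
    n_instances=XXX,
    model_cls=XGBClassifier,
)

# Build the Coreset tree
service_obj.build_from_df(datasets=df)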
Easily add new production data to an existing DataHeroes Data Structure and update the model in minutes, without introducing bias.
# Add additional data to the Coreset tree.
# The Coreset tree is automatically updated to reflect both the old and the
# newly added data, so models can be retrained quickly to overcome model
# degradation. `df_2` holds the newly collected production data.
service_obj.partial_build_from_df(datasets=df_2)
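How can a structure absorb new data without a full rebuild? The DataHeroes internals are not described on this page, so the following is only a toy sketch of the general merge-and-reduce idea from the coreset literature, not the library's implementation. Each incoming chunk is reduced to `k` weighted points; whenever two summaries land on the same level they are merged and reduced again, like carrying in a binary counter, so adding a chunk touches only a logarithmic number of nodes. `ToyCoresetTree` and `reduce_sample` are hypothetical names, and the reduce step here is plain weight-proportional sampling rather than the importance (sensitivity) sampling that real coreset constructions use.

```python
import random
from typing import Any, List, Optional, Tuple

Weighted = List[Tuple[Any, float]]  # list of (point, weight) pairs


def reduce_sample(points: Weighted, k: int, rng: random.Random) -> Weighted:
    """Shrink a weighted sample to k points while preserving its total weight.

    Toy reduce step (hypothetical): points are drawn with probability
    proportional to their weight, and each draw gets weight total / k, so
    weighted sums over the result are unbiased estimates of the input's.
    """
    if len(points) <= k:
        return list(points)
    total = sum(w for _, w in points)
    drawn = rng.choices(points, weights=[w for _, w in points], k=k)
    return [(p, total / k) for p, _ in drawn]


class ToyCoresetTree:
    """Hypothetical merge-and-reduce tree; NOT the DataHeroes implementation."""

    def __init__(self, k: int, seed: int = 0) -> None:
        self.k = k
        self.rng = random.Random(seed)
        # levels[i], when occupied, summarizes 2**i consecutive chunks.
        self.levels: List[Optional[Weighted]] = []

    def add_chunk(self, chunk: List[Any]) -> None:
        """Summarize a new chunk, then merge it upward like a binary-counter carry."""
        node = reduce_sample([(p, 1.0) for p in chunk], self.k, self.rng)
        i = 0
        while i < len(self.levels):
            occupied = self.levels[i]
            if occupied is None:
                break
            node = reduce_sample(occupied + node, self.k, self.rng)
            self.levels[i] = None
            i += 1
        if i == len(self.levels):
            self.levels.append(node)
        else:
            self.levels[i] = node

    def coreset(self) -> Weighted:
        """Union of all occupied levels: a weighted summary of every point added."""
        return [pw for node in self.levels if node is not None for pw in node]


tree = ToyCoresetTree(k=50)
for start in range(0, 10_000, 1_000):          # initial build: 10 chunks
    tree.add_chunk(list(range(start, start + 1_000)))
tree.add_chunk(list(range(10_000, 11_000)))     # new production data, no full rebuild
summary = tree.coreset()
print(len(summary), round(sum(w for _, w in summary)))  # → 150 11000
```

Because every merge preserves total weight, the 150 weighted points still account for all 11,000 inputs, and adding the eleventh chunk only touched the lowest level of the tree.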
Train your model on the DataHeroes Data Structure an order of magnitude faster.
from sklearn.metrics import balanced_accuracy_score

# Fit a gradient-boosted decision tree model with XGBoost directly on the Coreset tree.
# Pass the same parameters to fit, predict and predict_proba as you would
# pass to XGBoost (e.g. adjusting n_estimators).
# Assumes the held-out test set `X_test`, `y_test` is already defined.
coreset_model = service_obj.fit(level=0, n_estimators=500)
y_pred = service_obj.predict(X_test)
coreset_score = balanced_accuracy_score(y_test, y_pred)
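Fitting "directly on the Coreset tree" presumably means training on a small weighted summary rather than on the raw rows. In the coreset literature, each summary point carries a weight saying how many original points it stands in for, so the learner has to honor those weights. The toy, hypothetical `fit_stump` below (a depth-1 decision tree on a single feature, in pure Python) shows the effect: a point of weight 5 counts like five copies, and ignoring the weights produces a different split. Libraries such as XGBoost and scikit-learn accept per-row weights through the `sample_weight` argument of `fit`; how the DataHeroes service passes weights internally is not described here.

```python
from typing import List, Tuple


def fit_stump(xs: List[float], ys: List[int], ws: List[float]) -> Tuple[float, int, int]:
    """Fit a depth-1 tree on one feature, minimizing *weighted* misclassification.

    Labels are 0/1. Returns (threshold, left_label, right_label): predict
    left_label when x <= threshold, otherwise right_label.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    right = [0.0, 0.0]  # weighted class totals on the right of the split
    for y, w in zip(ys, ws):
        right[y] += w
    left = [0.0, 0.0]   # weighted class totals on the left of the split

    # Baseline: no split, every point goes right and gets the heavier class.
    best_err = min(right)
    best = (float("-inf"), 0, int(right[1] > right[0]))

    for pos, i in enumerate(order):
        left[ys[i]] += ws[i]
        right[ys[i]] -= ws[i]
        # Only split between distinct x values.
        if pos + 1 < len(order) and xs[order[pos + 1]] == xs[i]:
            continue
        err = min(left) + min(right)  # each side predicts its heavier class
        if err < best_err:
            best_err = err
            best = (xs[i], int(left[1] > left[0]), int(right[1] > right[0]))
    return best


xs, ys = [1.0, 2.0, 3.0, 4.0], [0, 1, 0, 1]
print(fit_stump(xs, ys, [1, 1, 5, 1]))  # → (3.0, 0, 1)
print(fit_stump(xs, ys, [1, 1, 1, 1]))  # → (1.0, 0, 1)
```

Since ties in `x` are never split, a point with weight 5 and five duplicated copies of it yield the same stump, which is exactly the "stands in for" reading of a coreset weight.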
Retrain and Tune Frequently while Lowering Compute Costs
The following chart demonstrates the importance of frequent retraining. It compares, over a 6-month period, the accuracy degradation of a model trained once (the red line) with that of a model retrained and tuned weekly on the same dataset with the same features (the blue line). Overall, weekly retraining improved model accuracy by 10.6% and F1 score by 20.3%!
Furthermore, the model trained weekly using the DataHeroes Data Structure consumes less compute over a full year than the baseline model trained every 6 months. Even though the weekly model is trained 26 times as often as the baseline model (52 runs versus 2 per year), it consumes 22% less compute.
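The two yearly figures in the previous paragraph also bound the cost of a single run. Under a simplifying assumption of our own, not stated in the source, that every run within a schedule costs the same, a quick back-of-the-envelope check puts one weekly run at roughly 3% of one baseline run:

```python
# Back-of-the-envelope check of the yearly compute figures.
# Simplifying assumption (not from the source): every run within a schedule costs the same.
WEEKS_PER_YEAR = 52

weekly_runs = WEEKS_PER_YEAR   # retrained and tuned every week
baseline_runs = 2              # retrained every 6 months
print(weekly_runs / baseline_runs)  # → 26.0

# The weekly schedule uses 22% less total compute than the baseline schedule.
total_ratio = 1 - 0.22                                     # weekly total / baseline total
per_run_ratio = total_ratio * baseline_runs / weekly_runs  # one weekly run / one baseline run
print(round(per_run_ratio, 3))  # → 0.03
```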
The DataHeroes Python library allows data scientists to easily convert any existing ML model to a DataHeroes Data Structure, and move to Real-Time Machine Learning today!