A machine learning algorithm studies a set of input data and generates predictions for a target variable based on what it learned from previous sets of historical data. As long as the relationship between the input data and the target variable remains constant, the model will maintain accuracy.
However, the ever-changing real world sometimes shifts that relationship, causing a previously successful machine learning model to degrade in performance. The underlying shift is known as concept drift, and AI developers must address it to keep a model productive and useful over time.
Concept Drift vs. Data Drift
Both concept and data drift are inevitable changes in the real world that negatively impact ML model performance. But data scientists and developers still treat them separately, as they call for different remediation strategies.
Concept drift is a change in the relationship between the input data and the target variable. For instance, an email spam detection AI might look for keywords, phrases, and links that suggest an email is spam. But if the email user changes preferences and suddenly takes an interest in messages previously considered spam, concept drift has occurred: the inputs look the same, but the correct label has changed, so the rules the model learned no longer apply.
Data drift is a change in the statistical distribution of the input data itself. An ML model trained on an older distribution of input data will likely make erroneous predictions when it starts reading new data. Sales and marketing teams are familiar with this type of drift since it occurs whenever product demand changes with the seasons or consumer preferences shift unexpectedly.
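The distinction can be made concrete with a small synthetic sketch. Everything in it is an assumption chosen for illustration: a single numeric input drawn from a normal distribution, a true labelling rule of the form "1 when x is above a threshold," and a hypothetical `frozen_model` that learned the original threshold of 0. Data drift moves the input distribution while the rule stays put; concept drift moves the rule while the inputs stay put.

```python
import random
import statistics


def make_batch(n, input_mean, threshold, rng):
    """Draw n inputs from N(input_mean, 1) and label each one 1 when x > threshold."""
    xs = [rng.gauss(input_mean, 1.0) for _ in range(n)]
    ys = [1 if x > threshold else 0 for x in xs]
    return xs, ys


def frozen_model(x):
    """Stand-in for a trained model that learned the original rule: 1 when x > 0."""
    return 1 if x > 0.0 else 0


def accuracy(xs, ys):
    """Share of samples where the frozen model agrees with the true label."""
    return sum(frozen_model(x) == y for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)  # fixed seed so the run is repeatable
batches = {
    "reference": make_batch(5000, 0.0, 0.0, rng),
    "data drift": make_batch(5000, 1.5, 0.0, rng),     # inputs moved, rule unchanged
    "concept drift": make_batch(5000, 0.0, 1.0, rng),  # inputs unchanged, rule moved
}

for name, (xs, ys) in batches.items():
    print(f"{name:14s} input mean={statistics.mean(xs):+.2f} "
          f"accuracy={accuracy(xs, ys):.3f}")
```

In this toy setup the data drift batch shows a shifted input mean, while the concept drift batch shows an unchanged input mean and a drop in accuracy. Because the toy model matches the original rule exactly, data drift alone costs it nothing here; real models are usually only reliable where their training data was dense, so in practice a shifted input distribution tends to hurt as well.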
How Concept Drift Happens
Handling drift starts with determining when it has occurred so that developers can respond with model retraining. Concept drift can occur in one of several ways.
- Sudden drift is a disruptive and long-lasting shift in the relationship between the target variable and the input data. Many data scientists see the COVID-19 pandemic as the catalyst for various sudden concept drifts in machine learning worldwide.
- Incremental drift is a steady transition from one concept to another. The drift may follow a smooth trend line or be staggered, with the new concept becoming more common over time until it takes over.
- Recurring drift is a new concept that takes hold temporarily before the old concept returns. Seasonal demand for certain products is a classic example of recurring drift. If the new concept shows up for only a very short time, the drift is known as a blip.
AI developers can prepare for inevitable model drift by designing their detection systems according to the type of concept drift they expect. For example, a data set that is likely to undergo incremental drift benefits most from a detection mechanism that looks for slow, sustained change rather than abrupt jumps.
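One way to picture the drift types above is as the probability, at each time step, that a new sample follows the new concept rather than the old one. The sketch below is illustrative only: the function name `new_concept_probability` and the parameters `start`, `width`, `period`, and `blip_length` are hypothetical, and the linear ramp is just one possible shape for incremental drift.

```python
def new_concept_probability(t, kind, start=100, width=100, period=100, blip_length=5):
    """Probability that the sample at time step t follows the new concept."""
    if kind == "sudden":
        # The new concept takes over at `start` and stays.
        return 1.0 if t >= start else 0.0
    if kind == "incremental":
        # Linear ramp from 0 to 1 over `width` steps, beginning at `start`.
        return min(1.0, max(0.0, (t - start) / width))
    if kind == "recurring":
        # Old and new concepts alternate in stretches of `period` steps.
        return 1.0 if (t // period) % 2 == 1 else 0.0
    if kind == "blip":
        # The new concept appears for only `blip_length` steps, then vanishes.
        return 1.0 if start <= t < start + blip_length else 0.0
    raise ValueError(f"unknown drift kind: {kind}")


for kind in ("sudden", "incremental", "recurring", "blip"):
    profile = [new_concept_probability(t, kind) for t in (50, 102, 150, 250)]
    print(f"{kind:12s} {profile}")
```

A profile like this can drive a synthetic data stream for testing a detector before it is pointed at production data: a detector tuned for sudden drift should fire quickly on the first profile, while one meant for incremental drift has to pick up the slow ramp.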
Handling Concept Drift in Machine Learning
Companies that rely on machine learning accuracy have policies in place to address concept drift.
Drift Detection
Because concept drift concerns how the input data and the target variable relate to each other, studying the input data alone is not sufficient for detecting it.
Instead, developers look for the consequences of concept drift by tracking the model’s performance over time. A sustained deterioration in accuracy, whether sudden or gradual, can be a sign of concept drift. Alternatively, ML developers can look at the model’s confidence scores for each prediction. A marked shift in confidence scores on new data is another sign of concept drift.
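Performance tracking of this kind can be sketched as a rolling-window monitor. This is a minimal illustration, not a production detector: the class name `AccuracyDriftMonitor`, the window of 200 predictions, and the 10-point tolerance are all assumptions, and the approach only works when true labels eventually become available for comparison.

```python
import random
from collections import deque


class AccuracyDriftMonitor:
    """Flags possible drift when rolling accuracy falls well below a baseline."""

    def __init__(self, baseline_accuracy, window=200, tolerance=0.10):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # True for each correct prediction

    def update(self, prediction, actual):
        """Record one labelled prediction; return True if drift is suspected."""
        self.outcomes.append(prediction == actual)
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # wait until the window is full
        rolling = sum(self.outcomes) / len(self.outcomes)
        return rolling < self.baseline - self.tolerance


# Synthetic stream: the true rule changes at step 500, the model does not.
rng = random.Random(1)
monitor = AccuracyDriftMonitor(baseline_accuracy=1.0)
first_alarm = None
for t in range(1000):
    x = rng.gauss(0.0, 1.0)
    threshold = 0.0 if t < 500 else 1.0   # the concept changes here
    actual = 1 if x > threshold else 0
    prediction = 1 if x > 0.0 else 0      # frozen model keeps the old rule
    if monitor.update(prediction, actual) and first_alarm is None:
        first_alarm = t

print("first alarm at step", first_alarm)
```

The window size is a trade-off: a short window reacts faster to sudden drift but is noisier, while a long window smooths out noise at the cost of a delayed alarm. The same structure could track an average confidence score instead of correctness when labels arrive late.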
Developers also commonly keep a static model as a benchmark against which to compare newer models.
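A benchmark comparison might look like the following sketch, which scores a frozen model and a newer model on the same recent labelled samples. The helper names and the two threshold "models" are hypothetical stand-ins; any callable that maps an input to a predicted label would fit.

```python
def accuracy(model, samples):
    """Share of (x, y) samples the model labels correctly."""
    return sum(model(x) == y for x, y in samples) / len(samples)


def compare_to_benchmark(benchmark, candidate, recent_samples):
    """Score both models on the same recent data and report the difference."""
    bench = accuracy(benchmark, recent_samples)
    cand = accuracy(candidate, recent_samples)
    return {"benchmark": bench, "candidate": cand, "gain": cand - bench}


def static_model(x):
    """Frozen benchmark that still applies the old rule (1 when x > 0)."""
    return 1 if x > 0.0 else 0


def retrained_model(x):
    """Newer model that has picked up the current rule (1 when x > 1)."""
    return 1 if x > 1.0 else 0


# Recent samples labelled under the current rule.
recent = [(-1.0, 0), (0.5, 0), (1.5, 1), (2.0, 1)]
print(compare_to_benchmark(static_model, retrained_model, recent))
```

A growing gap between the two scores suggests the concept has moved since the benchmark was trained; a candidate that fails to beat the benchmark is a hint that the retraining itself needs a second look.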
Model Retraining
When it becomes apparent that concept drift has occurred, the next step is to retrain and update the ML model promptly so that it reflects current conditions.
The frequency of retraining depends on the industry and the volatility of the input data. Some organizations retrain roughly every month, whereas others might do so twice a year.
Building Drift-Compatible Models
Data scientists can also assign higher weights to newer data points so that the training algorithm prioritizes them, which helps mitigate concept drift.
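As a rough sketch of recency weighting, the example below gives each sample a weight that halves every `half_life` steps into the past and then fits a one-parameter model by weighted least squares. The exponential decay, the half-life of 20, and the no-intercept model y = a * x are assumptions picked to keep the example small; they are not the only way to weight recent data.

```python
def recency_weights(n, half_life):
    """Weights for n samples ordered oldest to newest; the newest gets weight 1."""
    return [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]


def weighted_slope(xs, ys, weights):
    """Weighted least-squares slope for the no-intercept model y = a * x."""
    numerator = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    denominator = sum(w * x * x for w, x in zip(weights, xs))
    return numerator / denominator


# Synthetic history: the true slope is 2 for the first 150 samples, then 3.
xs = [float(1 + i % 10) for i in range(200)]
ys = [(2.0 if i < 150 else 3.0) * x for i, x in enumerate(xs)]

uniform = weighted_slope(xs, ys, [1.0] * len(xs))
recent_first = weighted_slope(xs, ys, recency_weights(len(xs), half_life=20))

print(f"uniform weights: slope={uniform:.2f}")
print(f"recency weights: slope={recent_first:.2f}")
```

With uniform weights the fitted slope is dominated by the older, larger block of data, while the recency-weighted fit lands much closer to the current slope of 3. Many training libraries accept per-sample weights in a similar spirit, for example the `sample_weight` argument that a number of scikit-learn estimators take in `fit`.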
Companies sometimes also start from scratch and build entirely new models in response to sudden concept drift. This strategy was popular in the wake of the COVID-19 pandemic.