What is Transfer Learning?
Transfer learning is a deep learning technique that reuses a pre-trained machine learning model for a purpose that’s different from, yet related to, its original function. The basic idea is that the knowledge the model developed on its original task can be reapplied to help the AI system learn the new task more effectively. Because training large models from scratch requires considerable data and processing power, transfer learning is most commonly applied in natural language processing and computer vision.
How Does Transfer Learning Work?
Transfer learning typically begins with a model trained on a large dataset for a ‘source task’; this pre-training essentially forms the core of the machine learning model.
Once this pre-training is complete, programmers then identify which parts of the model pertain to the new task and which parts need to be retrained. For example, if a machine learning model were trained to identify a motorcycle, the components of that model that pertain to image recognition could be used to train the model to identify shoes. Instead of having to retrain an entire system from scratch, programmers can simply build upon the foundation of the original system.
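A minimal sketch of this workflow, assuming PyTorch and torchvision (with an ImageNet-trained ResNet-18 standing in for the pre-trained model, and a hypothetical two-class shoe classifier as the new task), might look like this:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a network pre-trained on a large dataset (the 'source task', here ImageNet).
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the pre-trained layers so their general image-recognition
# features are kept as-is.
for param in model.parameters():
    param.requires_grad = False

# Replace the final classification layer with a new head for the target task
# (a hypothetical two-class shoe classifier).
model.fc = nn.Linear(model.fc.in_features, 2)

# Only the parameters of the new head are updated during training.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```

From here, training proceeds as usual on the new, typically much smaller, dataset; because the rest of the network is frozen, only the new layer needs to be learned.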
Benefits of Transfer Learning
The most significant benefit of transfer learning is that it saves considerable time and resources compared to training a model from scratch. It’s also valuable in scenarios where labeled data is limited or only unlabeled datasets are available.
Other benefits of transfer learning include:
- Reducing the amount of training data required. Instead of needing a large set of labeled data for every new model, programmers can achieve comparable results with smaller datasets.
- The ability to train models in simulated rather than real-world environments.
- Potentially improved performance compared to traditional machine learning.
- The ability to more efficiently train and deploy multiple models simultaneously.
Transfer learning also helps address another long-standing problem in machine learning, known as overfitting.
Transfer Learning and Overfitting
Machine learning models are often only accurate within environments related to their original training data. Any change to their operating environment directly and adversely affects their accuracy. A significant enough change may require full retraining.
Although machine learning has become significantly more advanced over the years, this remains a persistent problem, closely related to another well-known issue: overfitting.
Essentially, overfitting occurs when a machine learning model cannot generalize, instead fitting too closely to its training data. This can happen for several reasons:
- The training data set is too small to accurately train the model.
- There’s too much irrelevant data within the data set.
- The model has trained for too long on a single sample data set.
- The model makes inaccurate associations due to its complexity and the noise within its training data.
For instance, a machine learning model trained to recognize oranges using only photos that show the fruit in a bowl might treat the bowl as a characteristic of an orange and struggle to identify an orange growing on a tree in the wild.
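To make the symptom concrete, here is a minimal sketch in Python (using scikit-learn on a small synthetic dataset rather than a deep network, purely for illustration): an unconstrained model fitted to a small, noisy dataset scores nearly perfectly on its training data but noticeably worse on data it has never seen, which is the telltale signature of overfitting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# A small, noisy synthetic dataset: the conditions under which overfitting thrives.
X, y = make_classification(n_samples=100, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained decision tree memorizes the training data, noise and all.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

print("Training accuracy:", model.score(X_train, y_train))  # typically ~1.0
print("Test accuracy:", model.score(X_test, y_test))        # noticeably lower
```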
Provided the pre-trained model was built on an adequately sized, high-quality dataset, transfer learning largely avoids the issues above. That does not, however, mean it is flawless. There are certain circumstances in which transfer learning can fail, including:
- When there’s a domain mismatch between the source and target datasets. This can leave the model stuck in a local minimum and unable to provide accurate predictions.
- If programmers relied too heavily on data augmentation during training. This can result in a pre-trained model whose performance is worse than a model trained entirely from scratch.
- If the dataset provided to the pre-trained model is too small to support accurate predictions.