Transfer Learning
Last Updated on December 17, 2020 by Editorial Team
Author(s): Johar M. Ashfaque
Transfer learning is a machine learning method where a model developed for one task is reused as the starting point for a model on a second task. It is a popular approach in deep learning, where pre-trained models are used as the starting point for computer vision and natural language processing tasks, given the vast compute and time resources required to develop neural network models for these problems and the huge jumps in skill they provide on related problems.
What is Transfer Learning?
Transfer learning is a machine learning technique where a model trained on one task is re-purposed on a second, related task.
Transfer learning and domain adaptation refer to the situation where what has been learned in one setting … is exploited to improve generalization in another setting
– Page 526, Deep Learning, 2016.
Transfer learning is an optimization that allows rapid progress or improved performance when modeling the second task.
Transfer learning is the improvement of learning in a new task through the transfer of knowledge from a related task that has already been learned.
– Chapter 11: Transfer Learning, Handbook of Research on Machine Learning Applications, 2009.
Transfer learning is related to problems such as multi-task learning and concept drift and is not exclusively an area of study for deep learning. Nevertheless, transfer learning is popular in deep learning given the enormous resources required to train deep learning models and the large and challenging datasets on which they are trained.
Transfer learning only works in deep learning if the model features learned from the first task are general.
In transfer learning, we first train a base network on a base dataset and task, and then we repurpose the learned features, or transfer them, to a second target network to be trained on a target dataset and task. This process will tend to work if the features are general, meaning suitable to both base and target tasks, instead of specific to the base task.
– How transferable are features in deep neural networks?
This form of transfer learning used in deep learning is called inductive transfer. This is where the scope of possible models (model bias) is narrowed in a beneficial way by using a model fit on a different but related task.
How to Use Transfer Learning?
You can use transfer learning on your own predictive modeling problems. The two common approaches are as follows:
- Develop a Model Approach
- Pre-trained Model Approach
Develop a Model Approach
- Select Source Task. You must select a related predictive modeling problem with an abundance of data where there is some relationship in the input data, output data, and/or concepts learned during the mapping from input to output data.
- Develop Source Model. Next, you must develop a skillful model for this first task. The model must be better than a naive model to ensure that some feature learning has been performed.
- Reuse Model. The model fit on the source task can then be used as the starting point for a model on the second task of interest. This may involve using all or parts of the model, depending on the modeling technique used.
- Tune Model. Optionally, the model may need to be adapted or refined on the input-output pair data available for the task of interest.
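As a rough illustration of these four steps, the minimal Keras sketch below trains a small source network, reuses everything up to its penultimate layer, and adds a fresh output layer for the target task. The shapes, class counts, and dataset names (x_source, y_source, x_target, y_target) are placeholders, not recommendations.

```python
# A minimal sketch of the Develop a Model approach in Keras.
from tensorflow.keras import layers, models

# Develop Source Model: a small network trained on the data-rich source task.
source_model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.Flatten(),
    layers.Dense(128, activation="relu", name="features"),
    layers.Dense(100, activation="softmax"),  # 100 source-task classes (placeholder)
])
source_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# source_model.fit(x_source, y_source, epochs=10)

# Reuse Model: keep everything except the source-specific output layer.
base = models.Model(source_model.input,
                    source_model.get_layer("features").output)

# Tune Model: add a new head for the target task and refine it on the target data.
target_model = models.Sequential([
    base,
    layers.Dense(10, activation="softmax"),  # 10 target-task classes (placeholder)
])
target_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# target_model.fit(x_target, y_target, epochs=5)
```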
Pre-trained Model Approach
- Select Source Model. A pre-trained source model is chosen from available models. Many research institutions release models trained on large and challenging datasets that may be included in the pool of candidate models from which to choose.
- Reuse Model. The pre-trained model can then be used as the starting point for a model on the second task of interest. This may involve using all or parts of the model, depending on the modeling technique used.
- Tune Model. Optionally, the model may need to be adapted or refined on the input-output pair data available for the task of interest.
This second type of transfer learning is common in the field of deep learning.
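A minimal sketch of this approach, assuming a Keras/TensorFlow environment: a ResNet50 pre-trained on ImageNet is selected, its weights are frozen for reuse, and a new classification head is tuned on the data for the task of interest. The class count and dataset names are placeholders.

```python
# A minimal sketch of the Pre-trained Model approach in Keras.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

# Select Source Model: ResNet50 pre-trained on ImageNet, without its 1000-class head.
base = ResNet50(weights="imagenet", include_top=False,
                input_shape=(224, 224, 3), pooling="avg")

# Reuse Model: freeze the pre-trained weights so only the new layers are trained.
base.trainable = False

# Tune Model: add a head for the task of interest and train it on the new data.
model = models.Sequential([
    base,
    layers.Dense(256, activation="relu"),
    layers.Dense(5, activation="softmax"),  # e.g. a hypothetical 5-class target task
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(target_images, target_labels, epochs=5)
```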
Examples of Transfer Learning with Deep Learning
Let’s make this concrete with two common examples of transfer learning with deep learning models.
Transfer Learning with Image Data
It is common to perform transfer learning with predictive modeling problems that use image data as input. This may be a prediction task that takes photographs or video data as input. For these types of problems, it is common to use a deep learning model pre-trained for a large and challenging image classification task such as the ImageNet 1000-class photograph classification competition.
Three examples of models of this type include:
- Oxford VGG Model
- Google Inception Model
- Microsoft ResNet Model
This approach is effective because these models were trained on a large corpus of photographs and were required to make predictions over a relatively large number of classes, which in turn required them to learn to extract features from photographs efficiently in order to perform well on the problem.
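As a concrete sketch, one of the models above (the Oxford VGG model, VGG16 in Keras) can be reused as an off-the-shelf feature extractor for photographs. The image path below is a placeholder.

```python
# A minimal sketch of extracting image features with a pre-trained VGG16.
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.preprocessing import image

# Load VGG16 trained on ImageNet, dropping the 1000-class output layer.
extractor = VGG16(weights="imagenet", include_top=False, pooling="avg")

# Load and preprocess a photograph exactly as the model expects.
img = image.load_img("photo.jpg", target_size=(224, 224))  # placeholder path
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

# One 512-dimensional feature vector per image, ready for a small downstream classifier.
features = extractor.predict(x)
print(features.shape)  # (1, 512)
```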
In their Stanford course on Convolutional Neural Networks for Visual Recognition, the authors caution that you should carefully choose how much of the pre-trained model to use in your new model.
[Convolutional Neural Networks] features are more generic in early layers and more original-dataset-specific in later layers
– Transfer Learning, CS231n Convolutional Neural Networks for Visual Recognition
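The sketch below illustrates that advice in Keras terms: the early, more generic blocks of a pre-trained VGG16 stay frozen, while only the final convolutional block is left trainable for fine-tuning. The cutoff point is illustrative, not prescriptive.

```python
# A minimal sketch of reusing generic early layers and fine-tuning later ones.
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

# Keep blocks 1-4 frozen (generic features); let block 5 adapt to the new dataset.
for layer in base.layers:
    layer.trainable = layer.name.startswith("block5")

# A new classification head would then be added on top, and the whole model
# trained with a small learning rate so the unfrozen layers change only gently.
```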
Transfer Learning with Language Data
It is common to perform transfer learning with natural language processing problems that use text as input or output. For these types of problems, a word embedding is used: a mapping of words to a high-dimensional continuous vector space in which words with similar meanings have similar vector representations. Efficient algorithms exist to learn these distributed word representations.
Two examples of models of this type include:
- Google’s word2vec Model
- Stanford’s GloVe Model
These distributed word representation models can be downloaded and incorporated into deep learning language models in either the interpretation of words as input or the generation of words as output from the model.
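As a minimal sketch, assuming Stanford’s downloadable glove.6B.100d.txt file and a toy two-word vocabulary, the pre-trained GloVe vectors can be copied into a Keras Embedding layer on the input side of a language model.

```python
# A minimal sketch of reusing pre-trained GloVe vectors in a Keras Embedding layer.
# The file name and the tiny word_index are placeholders for a real tokenizer vocabulary.
import numpy as np
from tensorflow.keras import layers
from tensorflow.keras.initializers import Constant

embedding_dim = 100
word_index = {"movie": 1, "film": 2}  # normally produced by a tokenizer on your corpus

# Read the pre-trained vectors into a lookup table.
glove = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        values = line.split()
        glove[values[0]] = np.asarray(values[1:], dtype="float32")

# Copy the vectors for the words in our vocabulary into an embedding matrix.
embedding_matrix = np.zeros((len(word_index) + 1, embedding_dim))
for word, i in word_index.items():
    if word in glove:
        embedding_matrix[i] = glove[word]

# Initialise an Embedding layer with the pre-trained weights; trainable=False keeps them fixed.
embedding_layer = layers.Embedding(
    len(word_index) + 1,
    embedding_dim,
    embeddings_initializer=Constant(embedding_matrix),
    trainable=False,
)
```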
In his book on Deep Learning for Natural Language Processing, Yoav Goldberg cautions:
… one can download pre-trained word vectors that were trained on very large quantities of text […] differences in training regimes and underlying corpora have a strong influence on the resulting representations, and that the available pre-trained representations may not be the best choice for [your] particular use case.
– Page 135, Neural Network Methods in Natural Language Processing, 2017.
Some Kagglebooks Links:
- Style Transfer in Action – Tiger – JMA
- Style Transfer in Action – LOTR Great Eagles- JMA
- Neural Style Transfer – Just For Fun – JMA
Bio: Johar Ashfaque is a Trainee Clinical Coder and Data Scientist at NHS EKHUFT. Prior to this Johar was a Post-Doc at MPI-SWS in Germany. He earned his Ph.D. in Theoretical Physics (String Theory and Phenomenology) from the University of Liverpool and his BSc. from the University of Kent at Canterbury, UK.