Create Indonesian Recipe Generator by Fine-tuning T5, BART, and GPT-2
Last Updated on July 26, 2023 by Editorial Team
Author(s): Haryo Akbarianto Wibowo
Originally published on Towards AI.
An Indonesian recipe generator: a Deep Learning model trained by fine-tuning pre-trained models such as T5, BART, and GPT-2
Hello everyone! Welcome to my first technical post on my personal blog! In this post, I will write about one of my fun projects, an Indonesian Recipe Generator. This is the continuation of my previous Medium post, with a more modern approach.
This post describes the details of my experiment in creating an Indonesian recipe generator.
This post is reposted from https://haryoa.github.io/posts/id-recipe-generator/. You can visit it for a better reading experience.
Repository and Demo
I will provide the best model and also the code to train it.
Feel free to try and download it :).
- 🤗 Hugging Face Space (demo): Space
- 🤗 Hugging Face Model (download the model): Model
- ✨ Repository (to train the model): GitHub repository
Introduction
In the past, I created an Indonesian recipe generator (described in my Medium post) using seq2seq Deep Learning approaches such as the Gated Recurrent Unit (GRU) and the Transformer. I wanted to revisit that work and improve it, so this post presents several improvements over the previous one.
The model in my previous post was trained from scratch, not pre-trained, so it didn't have any prior knowledge to use during training. One of the improvements I mentioned in my previous blog that can increase the quality of the model is using a pre-trained model. Many state-of-the-art research works use pre-training to improve model quality over non-pre-trained baselines. Moreover, we are currently in the era of pre-trained models in Deep Learning: many pre-trained models achieve outstanding results when they are trained again (fine-tuned) on the target dataset. Therefore, it's intriguing to try this for my recipe generator project.
In this experiment, I used off-the-shelf, publicly available pre-trained models. Since the data is in Indonesian, I needed models pre-trained on Indonesian data. They also needed to handle sequence generation, since the problem I want to tackle is text generation. I searched and only found T5, BART, and GPT models, so I decided to experiment with these.
Data, Preprocessing, and Exploratory Data Analysis
Since this is the continuation of my previous project, I used the same data as before. Feel free to read more details about it in my Medium post below.
🍖🍲 Recibrew! Predicting Food Ingredients with Deep Learning!! 🍲🍖
▶️▶️ A step-by-step story about my fun self-project on predicting food ingredients using seq2seq in Deep Learning…
pub.towardsai.net
Method
In this section, I will describe the models I tried, as mentioned above. All of the models I used are transformer-based; I will describe each of them briefly.
BART
BART is a transformer-based model that is pre-trained by learning to reconstruct corrupted input. It has an encoder-decoder architecture like the original Transformer, with a few modifications such as a different activation function. The BART authors tried several corruption schemes; the final released model is trained with sentence shuffling and token masking. The idea is to make BART learn to undo these perturbations while also gaining the capability of causal language modeling. To apply it to my system, I fine-tuned the model on my data. Here is an illustration of how pre-trained BART was built.
In this experiment, I used IndoBART as the pre-trained model, which was released in this paper. The model is pre-trained on the Indo4B dataset, Common Crawl, and Wikipedia, covering Indonesian, Sundanese, and Javanese. It is publicly available as a Hugging Face model.
T5
T5 is also a transformer-based model that is pre-trained on corrupted input, using token masking. Different from BART, which uses the data for causal language model training, T5 casts its training data as seq2seq problems; the data may contain translation, question answering, and classification tasks. The model learns those tasks in addition to learning to reconstruct corrupted input. Here is an illustration of how the model is pre-trained.
I used a T5 model that is available on Hugging Face; it is pre-trained on the Indonesian subset of the mC4 dataset.
GPT
GPT-2 is an auto-regressive model that is pre-trained through causal language modeling, with no perturbation of the input (unlike BART and T5). The pre-trained model is then fine-tuned on our data. I used the IndoGPT model, which was released together with IndoBART in the same paper and is pre-trained on the same data as IndoBART.
Since the model does not have an encoder-decoder architecture, we need to reshape our input and frame the task as a language modeling problem.
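To make this concrete, here is a minimal sketch of how these three kinds of pre-trained checkpoints can be loaded with Hugging Face Transformers. The checkpoint names are assumptions for illustration (the exact ones I used are in the repository), and IndoBART/IndoGPT ship with a custom tokenizer from the indobenchmark toolkit, so loading may differ slightly in practice.

```python
# Minimal sketch of loading the three kinds of pre-trained checkpoints with
# Hugging Face Transformers. Checkpoint names below are assumptions for
# illustration; see the Hub / the repository for the exact ones.
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,   # encoder-decoder models (BART, T5)
    AutoModelForCausalLM,    # decoder-only models (GPT-2)
)

# IndoBART (encoder-decoder). Its custom tokenizer comes from the
# indobenchmark toolkit, so AutoTokenizer may need that package installed.
bart_model = AutoModelForSeq2SeqLM.from_pretrained("indobenchmark/indobart-v2")

# Indonesian T5 (encoder-decoder), pre-trained on Indonesian mC4 (assumed checkpoint).
t5_tok = AutoTokenizer.from_pretrained("Wikidepia/IndoT5-base")
t5_model = AutoModelForSeq2SeqLM.from_pretrained("Wikidepia/IndoT5-base")

# IndoGPT (decoder-only), fine-tuned as a causal language model.
gpt_model = AutoModelForCausalLM.from_pretrained("indobenchmark/indogpt")
```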
Setup
I will split this section into code technical setup, model setup, and hyperparameter setup.
Code Technical Setup
To build the training script, I used PyTorch as the deep learning framework and wrapped it with PyTorch Lightning. I used PyTorch Lightning's implementations of model checkpointing, early stopping, and 16-bit precision.
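As a rough illustration (not the exact training script), here is a minimal Lightning sketch of those three utilities; `RecipeModule` and `recipe_data` are hypothetical stand-ins for the actual module and datamodule in the repository.

```python
# Minimal sketch: checkpoint on the best validation loss, early stopping,
# and 16-bit (mixed) precision training with PyTorch Lightning.
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint, EarlyStopping

checkpoint_cb = ModelCheckpoint(monitor="val_loss", mode="min", save_top_k=1)
early_stop_cb = EarlyStopping(monitor="val_loss", mode="min", patience=5)

trainer = pl.Trainer(
    max_epochs=50,                     # illustrative value
    precision=16,                      # 16-bit mixed precision (AMP)
    callbacks=[checkpoint_cb, early_stop_cb],
)
# trainer.fit(RecipeModule(...), datamodule=recipe_data)  # hypothetical objects
```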
For metric calculation, I used the BLEU score, a popular metric for sequence-to-sequence problems. I used the off-the-shelf BLEU implementation from the sacrebleu Python package.
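For reference, here is a minimal sketch of the BLEU computation with sacrebleu (the strings below are made-up examples, not data from the experiment).

```python
# Minimal sketch of corpus-level BLEU with sacrebleu.
from sacrebleu.metrics import BLEU

hypotheses = ["2 siung bawang putih, 1 sdt garam"]          # model outputs
references = [["2 siung bawang putih, 1/2 sdt garam"]]      # one reference stream

bleu = BLEU()
score = bleu.corpus_score(hypotheses, references)
print(score.score)  # BLEU score as a float
```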
Model Setup
I applied several modifications to the inputs of the models. For the architectures, I used the off-the-shelf implementations that Hugging Face provides.
For GPT, since it needs a single input sequence, I concatenated the food name and the recipe into one input with the special separator symbol `>>>`.
Input: <FOOD> >>> <INGREDIENTS>
Output: the input shifted by one token (e.g., Input: `Apple Fruit end_of_token`, Output: `Fruit end_of_token`)
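Below is a minimal sketch of this formatting with Hugging Face Transformers; the checkpoint name is an assumption, and Hugging Face's causal-LM heads shift the labels internally, so passing the input ids as labels yields exactly the next-token objective described above.

```python
# Minimal sketch (assumed checkpoint) of framing the recipe data as causal
# language modeling: food name and ingredients joined with ">>>", labels equal
# to the input ids (the model shifts them internally).
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("indobenchmark/indogpt")   # may need the indobenchmark toolkit
model = AutoModelForCausalLM.from_pretrained("indobenchmark/indogpt")

food = "nasi goreng"
ingredients = "nasi putih, bawang merah, bawang putih, kecap manis"
text = f"{food} >>> {ingredients}{tokenizer.eos_token}"

batch = tokenizer(text, return_tensors="pt")
outputs = model(**batch, labels=batch["input_ids"])  # labels shifted inside the model
print(outputs.loss)
```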
T5 has a seq2seq architecture, so I made only a small modification to the input. From what I've read, T5 is pre-trained with a "prompting" style of input, for example `summarize: <ARTICLE>`. So, I followed this convention and changed the data accordingly. Below is how I present the input and output to the model:
Input: resep: <FOOD>
Output: <INGREDIENTS>
I didn't make any changes for the BART model, so I provide the input and the output as-is (a short preprocessing sketch for both T5 and BART follows the examples below).
Input: <FOOD>
Output: <INGREDIENTS>
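Here is a minimal sketch of this seq2seq preprocessing for both T5 and BART; checkpoint names are assumptions, and newer versions of the tokenizer API accept `text_target` to build the labels.

```python
# Minimal sketch of seq2seq preprocessing: T5 gets the "resep: <FOOD>" prompt
# prefix, BART receives the food name as-is; the ingredients are the target.
from transformers import AutoTokenizer

food = "nasi goreng"
ingredients = "nasi putih, bawang merah, bawang putih, kecap manis"

# T5-style input with a task prefix (assumed checkpoint)
t5_tok = AutoTokenizer.from_pretrained("Wikidepia/IndoT5-base")
t5_batch = t5_tok("resep: " + food, text_target=ingredients, return_tensors="pt")

# BART-style input without any prefix (assumed checkpoint; custom tokenizer
# may require the indobenchmark toolkit)
bart_tok = AutoTokenizer.from_pretrained("indobenchmark/indobart-v2")
bart_batch = bart_tok(food, text_target=ingredients, return_tensors="pt")

# Each batch now holds input_ids, attention_mask, and labels, ready to be
# passed to the corresponding AutoModelForSeq2SeqLM.
```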
Hyperparameter Setup
I used AdamW as the optimizer. The learning rate varies depending on the architecture: I handpicked several candidate values based on several resources and tried some of them, settling on 1e-4, 1e-5, and 1e-4 for GPT, BART, and T5, respectively. I used early stopping to avoid overfitting: training stops if the validation loss doesn't improve for 5 epochs. To pick the best model, I used the checkpoint with the lowest validation loss.
To make training faster, I used the Automatic Mixed Precision (AMP) that PyTorch provides. Unfortunately, T5 can't use AMP, so I didn't use AMP when I fine-tuned the T5 model.
Following my past article, to make a fair comparison I used greedy decoding as the decoding strategy to predict the output of each model. You can see the details of how greedy decoding works in my past blog post.
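With Hugging Face models, greedy decoding simply means generating with a single beam and no sampling; here is a minimal sketch using the (assumed) T5 checkpoint.

```python
# Minimal sketch of greedy decoding: at every step, take the highest-probability
# token (no sampling, no beam search). Checkpoint name is an assumption.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

t5_tok = AutoTokenizer.from_pretrained("Wikidepia/IndoT5-base")
t5_model = AutoModelForSeq2SeqLM.from_pretrained("Wikidepia/IndoT5-base")

inputs = t5_tok("resep: nasi goreng", return_tensors="pt")
generated = t5_model.generate(
    **inputs,
    do_sample=False,    # greedy: always take the argmax token
    num_beams=1,        # no beam search
    max_new_tokens=128,
)
print(t5_tok.decode(generated[0], skip_special_tokens=True))
```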
Experiment Results
Below is the result of my experiment.
With my setup, IndoBART outperforms the other models. T5, IndoBART, and IndoGPT all achieve higher BLEU scores than the vanilla (non-pre-trained) Transformer, which indicates that a pre-trained seq2seq model may help increase performance. The models pre-trained on Indobenchmark's data (IndoBART and IndoGPT) outperform the model pre-trained on mC4 (T5). It's interesting to see the potential of each pre-trained model.
Analysis
Please visit my blog for the best experience viewing this section.
Conclusion
In this post, I experimented with an Indonesian recipe generator using pre-trained models. IndoBART outperforms the other models based on the BLEU score. We can also conclude that fine-tuning a pre-trained model is generally better than training a non-pre-trained one. It is interesting to see that it really works!
There are still many things left to explore here. For example, it would be interesting to compare the pre-trained and non-pre-trained versions of BART, T5, and GPT. I also need to do a more rigorous analysis of the trained models. Sadly, because of my limited resources, I cannot do that for now.
In the future, I plan to write about the current progress of seq2seq models. Many interesting new papers were published at Machine Learning conferences in 2022; I will study them and write about them on my blog.
Published via Towards AI