

Create Indonesian Recipe Generator by Fine-tuning T5, BART, and GPT-2

Last Updated on July 26, 2023 by Editorial Team

Author(s): Haryo Akbarianto Wibowo

Originally published on Towards AI.

An Indonesian recipe generator: a Deep Learning model trained by fine-tuning pre-trained models such as T5, BART, and GPT-2

Preview Image is from Unsplash by Brooke Lark

Hello everyone! Welcome to my first technical post on my personal blog! In this post, I will write about one of my fun projects, the Indonesian Recipe Generator. It is a continuation of my previous Medium post, with a more modern approach.

This post details my experiment in creating an Indonesian recipe generator.

This post is reposted from https://haryoa.github.io/posts/id-recipe-generator/; you can visit it there for a better reading experience.

Repository and Demo

I provide the best model as well as the code to train it.

Feel free to try and download it :).

  • 🤗 Hugging Face Space (demo): Space
  • 🤗 Hugging Face Model (download the model): Model
  • ✨ Repository (to train the model): GitHub repository

Introduction

Previously, I created an Indonesian recipe generator in my Medium post using seq2seq Deep Learning approaches such as the Gated Recurrent Unit (GRU) and the Transformer. I wanted to revisit that work and improve it, so this post presents some improvements over the previous one.

The model in my previous post was trained from scratch rather than pre-trained, so it had no prior knowledge to draw on during training. One of the improvements I mentioned in my previous blog that can increase the model's quality is using a pre-trained model. Many state-of-the-art research works use pre-training to improve quality over non-pre-trained models. Moreover, we are currently in the era of pre-trained models in Deep Learning: many pre-trained models achieve outstanding results when trained again (fine-tuned) on a target dataset. Therefore, it was intriguing to try this for my recipe generator project.

In this experiment, I used off-the-shelf, publicly available pre-trained models. Since the data is in Indonesian, I needed models pre-trained on Indonesian data. They also needed to handle sequence generation, since the problem I want to tackle is text generation. I searched and found only T5, BART, and GPT models, so I decided to experiment with these.

Data, Preprocessing, and Exploratory Data Analysis

Since this is a continuation of my previous project, I use the same data that I used previously. Feel free to read more details about it in my Medium post below.

🍖🍲 Recibrew! Predicting Food Ingredients with Deep Learning!!🍲🍖

▶️▶️ Step by step story about my fun self project on predicting food’s ingredients using seq2seq in Deep Learning…

pub.towardsai.net

Method

In this section, I briefly describe the models mentioned above. All of the models I used are Transformer-based.

BART

BART is a Transformer-based model that is pre-trained by learning to reconstruct corrupted input. It has an encoder-decoder architecture like the original Transformer, with a few modifications such as a different activation function. The BART authors tried several corruption schemes; the released final model is trained with sentence shuffling and token masking. The idea is to make BART learn to undo these perturbations while also gaining the capability of causal language modeling. To apply it to my system, I fine-tuned the model on my data. Here is an illustration of how pre-trained BART was built.

BART pre-training. It uses corrupted input (sentence shuffling and token masking) to predict the uncorrupted one

In this experiment, I used IndoBART as the pre-trained model, which was released in this paper. The model is pre-trained on the Indo4B dataset, Common Crawl, and Wikipedia, covering the Indonesian, Sundanese, and Javanese languages. It is publicly available on the Hugging Face model hub.
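As a rough illustration of how such a checkpoint is pulled in, here is a minimal sketch using the Hugging Face transformers library. The hub ID and the use of AutoTokenizer are my assumptions; the official IndoBART release provides its own tokenizer class, so the exact loading code may differ.

```python
# Minimal sketch: loading an IndoBART checkpoint with Hugging Face transformers.
# Assumptions: the hub ID "indobenchmark/indobart-v2" and AutoTokenizer compatibility;
# the official release provides its own tokenizer class, so adjust as needed.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "indobenchmark/indobart-v2"  # assumed model ID
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
```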

T5

T5 is also a Transformer-based model that is pre-trained on corrupted input, in its case through token masking. Different from BART, which uses its data for causal language model training, T5 frames its training data as seq2seq problems. The data may contain translation, question answering, and classification tasks, and the model learns these tasks in addition to reconstructing the corrupted input. Here is an illustration of how the model is pre-trained.

T5 pre-training. It uses several tasks with a prompting-style input to the model.

I used a T5 model that is available on the Hugging Face hub. It is pre-trained on the Indonesian mC4 dataset.

GPT

GPT-2 is an auto-regressive model that is pre-trained through causal language modeling with no perturbation of the input (unlike BART and T5). The pre-trained model is then fine-tuned on our data. I used the IndoGPT model, which was released together with IndoBART in the same paper and pre-trained on the same data.

Since the model does not have an encoder-decoder architecture, we need to reshape our input to turn the task into a language modeling problem.

Setup

I will split this section into code technical setup, model setup, and hyperparameter setup.

Code Technical Setup

For the training script, I used PyTorch as the deep learning framework and wrapped it with PyTorch Lightning. I used PyTorch Lightning's implementations of model checkpointing, early stopping, and 16-bit precision.
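A rough sketch of how those pieces fit together is shown below. The callbacks and 16-bit precision mirror the setup described here and in the hyperparameter section, while RecipeGenerator is a hypothetical LightningModule (sketched later) and max_epochs is an assumed upper bound.

```python
# Sketch of the PyTorch Lightning training setup: model checkpointing, early stopping,
# and 16-bit precision. "RecipeGenerator" is a hypothetical LightningModule.
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint, EarlyStopping

checkpoint_cb = ModelCheckpoint(monitor="val_loss", mode="min", save_top_k=1)
early_stop_cb = EarlyStopping(monitor="val_loss", mode="min", patience=5)

trainer = pl.Trainer(
    max_epochs=50,   # assumed upper bound; early stopping usually ends training sooner
    precision=16,    # 16-bit mixed precision (dropped for the T5 runs, see below)
    callbacks=[checkpoint_cb, early_stop_cb],
    accelerator="gpu",
    devices=1,
)
# trainer.fit(RecipeGenerator(...), train_dataloader, val_dataloader)
```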

For metric calculation, I used the BLEU score, a popular metric for sequence-to-sequence problems. I used the off-the-shelf implementation from the sacrebleu Python package.
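For reference, the corpus-level BLEU computation with sacrebleu looks roughly like this; the hypothesis and reference strings below are made-up examples, not real model outputs.

```python
# Corpus-level BLEU with sacrebleu. The strings are illustrative only.
import sacrebleu

hypotheses = ["2 butir telur, garam, minyak goreng"]               # generated recipes
references = [["2 butir telur, garam secukupnya, minyak goreng"]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(round(bleu.score, 2))  # BLEU on a 0-100 scale
```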

Model Setup

I applied several modifications to the inputs of the models. For the architectures, I used the off-the-shelf implementations that Hugging Face provides.

For GPT, since it takes a single input sequence, I concatenated the food name and the recipe into one input, separated by the special symbol >>> (a sketch follows the format below).

Input: <FOOD> >>> <INGREDIENTS>
Output: Shift the input (e.g.: Input: `Apple Fruit end_of_token`, Output: `Fruit end_of_token`)
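A minimal sketch of that concatenation is shown below. The >>> separator follows the format above, while the tokenizer argument and the maximum length are assumptions on my part.

```python
# Sketch: turning a (food, ingredients) pair into one causal-LM training example.
# The ">>>" separator matches the format above; max_length is an assumption.
def build_gpt_example(food: str, ingredients: str, tokenizer) -> dict:
    text = f"{food} >>> {ingredients}{tokenizer.eos_token}"
    encoded = tokenizer(text, truncation=True, max_length=512)
    # For causal language modeling, the labels are the input ids themselves; the model
    # shifts them by one position internally when computing the loss.
    encoded["labels"] = encoded["input_ids"].copy()
    return encoded
```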

T5 has a seq2seq architecture, so I made a small modification to the input. From what I've read, T5 is pre-trained with a 'prompting' style of input, for example: summarize: <ARTICLE>. So, I followed that convention and changed the data accordingly. Below is how I present the input and output of the model:

Input: resep: <FOOD>
Output: <INGREDIENTS>
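The same pair in T5's prompting format looks roughly like the sketch below; the checkpoint ID is a placeholder, and the maximum lengths are assumptions.

```python
# Sketch: building a T5 training example in the "resep: <FOOD>" prompting style.
# The checkpoint ID is a placeholder for the Indonesian T5 model used.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/indonesian-t5")  # placeholder ID

def build_t5_example(food: str, ingredients: str) -> dict:
    encoded = tokenizer(f"resep: {food}", truncation=True, max_length=64)
    target = tokenizer(text_target=ingredients, truncation=True, max_length=512)
    encoded["labels"] = target["input_ids"]
    return encoded
```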

I didn't make any changes for the BART model, so I provide the input and the output as-is.

Input: <FOOD>
Output: <INGREDIENTS>
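Putting the three input formats side by side for a single dish makes the differences clear; the dish name and ingredient list below are purely illustrative.

```python
# Illustration: how one (food, ingredients) pair is presented to each model.
food = "nasi goreng"
ingredients = "nasi putih, bawang merah, bawang putih, telur, kecap manis"

formats = {
    # GPT (decoder-only): one concatenated sequence separated by ">>>".
    "gpt": f"{food} >>> {ingredients}",
    # T5 (encoder-decoder): a "resep:" prompt on the input side.
    "t5": {"input": f"resep: {food}", "output": ingredients},
    # BART (encoder-decoder): food name and ingredients used as-is.
    "bart": {"input": food, "output": ingredients},
}
print(formats)
```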

Hyperparameter Setup

I used AdamW as the optimizer. The learning rate varies depending on the architecture: I handpicked several candidate values based on several resources and tried some of them, settling on 1e-4, 1e-5, and 1e-4 for GPT, BART, and T5 respectively. I used an early stopping criterion to avoid overfitting: training stops if the validation loss does not improve for 5 epochs. To pick the best model, I used the checkpoint with the lowest validation loss.
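As a sketch of how this connects to the Lightning setup from earlier, the hypothetical RecipeGenerator module below wires AdamW and the per-architecture learning rate into configure_optimizers; the class name, defaults, and batch format are my own placeholders.

```python
# Sketch of a hypothetical LightningModule used with the Trainer shown earlier.
# AdamW and the learning-rate choices follow the description above.
import pytorch_lightning as pl
import torch

class RecipeGenerator(pl.LightningModule):
    def __init__(self, model, lr: float = 1e-4):
        super().__init__()
        self.model = model   # a Hugging Face seq2seq or causal-LM model
        self.lr = lr         # e.g. 1e-4 for GPT/T5, 1e-5 for BART

    def training_step(self, batch, batch_idx):
        loss = self.model(**batch).loss
        self.log("train_loss", loss)
        return loss

    def validation_step(self, batch, batch_idx):
        loss = self.model(**batch).loss
        self.log("val_loss", loss)  # monitored by early stopping and checkpointing

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=self.lr)
```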

To make training faster, I used the Automatic Mixed Precision (AMP) that PyTorch provides. Unfortunately, T5 can't use AMP, so I didn't use it when fine-tuning the T5 model.

Following my past article, to make a fair comparison I used greedy decoding as the decoding strategy to predict the output of each model. You can see the details of how greedy decoding works in my past blog post.
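With the Hugging Face generate API, greedy decoding simply means a single beam with sampling turned off, as in the sketch below; the checkpoint directory and maximum length are placeholders.

```python
# Sketch: greedy decoding of a recipe for one food name with a fine-tuned seq2seq model.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_dir = "./indobart-recipe"  # assumed path to the fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)

inputs = tokenizer("nasi goreng", return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_length=256,   # assumed cap on recipe length
    num_beams=1,      # a single beam ...
    do_sample=False,  # ... without sampling = greedy decoding
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```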

Experiment Results

Below is the result of my experiment.

With my setup, IndoBART outperforms the other models. T5, IndoBART, and IndoGPT all achieve higher BLEU scores than the vanilla Transformer, which indicates that a pre-trained model may help increase performance. The models pre-trained on the Indobenchmark data outperform the model pre-trained on the mC4 corpus (T5). It's interesting to see the potential of each pre-trained model.

Analysis

For the best reading experience of this section, please visit my blog.

Conclusion

Random cat~. Photo by Manja Vitolic on Unsplash.

In this post, I experimented with an Indonesian recipe generator built on pre-trained models. IndoBART outperforms the other models based on the BLEU score. We can also conclude that fine-tuning a pre-trained model generally works better than training a non-pre-trained one. It is interesting to see that it really works!

There are still many things to explore here. For example, it would be interesting to see the effect of the pre-trained versus non-pre-trained versions of BART, T5, and GPT. I also need to do a more rigorous analysis of the trained models. Sadly, because of my limited resources, I cannot do that for now.

In the future, I plan to write about the current progress of seq2seq models. Many interesting new papers were published at Machine Learning conferences in 2022; I will study them and write about them on my blog.

Source: Pixabay by Geralt


Published via Towards AI
