A Journey into the Fabulous Applications of Transformers — Part 2
Last Updated on December 14, 2022 by Editorial Team
Author(s): Dr. Dharini R
Demo with Emphasis on NLP using Python, Hugging Face
The Transformer architecture is widely used in Natural Language Processing and has been a major enabler of today's much-needed Large Language Models (LLMs). With the rise of transfer learning in NLP using LLMs, many models have been built and shared with the research community on Hugging Face.
We are going to see some of the intriguing applications of transformers in NLP with demos and explanations. This article is a continuation of a previous one linked below.
The Hugging Face models used in this article can also be used as plug-and-play models through the Accelerated Inference API. Kindly refer to the article below to learn more about how to utilize a model through its API.
Plug-and-Play ML Models with ‘Accelerated Inference API’ from Hugging Face
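As a quick illustration of that approach, the snippet below sends a request to a hosted model over plain HTTP using the requests library. This is a minimal sketch: the model name is the translation model we use later in this article, and YOUR_API_TOKEN is a placeholder for an access token from your Hugging Face account.
import requests
# Hypothetical minimal call to the Hugging Face Inference API;
# replace YOUR_API_TOKEN with your own access token.
API_URL = "https://api-inference.huggingface.co/models/Helsinki-NLP/opus-mt-en-it"
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}
response = requests.post(API_URL, headers=headers, json={"inputs": "This text is translated from English to Italian"})
print(response.json())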
The demo code for all the applications discussed, along with the Colab link, is given in this GitHub link. As this is an introductory article, details about the models discussed are linked in the respective sections. For all the applications to work (except for sentence similarity), we need to install the transformers library using pip install transformers.
The applications we are going to see in this article are:
6. Translation
7. Token Classification
8. Sentence Similarity
9. Zero-shot classification
10. Fill Mask
6. Translation
An interesting and useful application of transformers is the ability to translate text between languages. Several models are available for the translation task, depending on the language pairs used during training.
In our example, we are going to translate English text into Italian. To find a model that can perform this translation, we have to explore the Hugging Face Hub. The model Helsinki-NLP/opus-mt-en-it performs this translation, and we will use it for our demo.
The first step is to instantiate the pipeline with the translation task. The task name varies according to the languages being dealt with. More information on translation between different languages, the corresponding models, and the datasets can be found at this link.
In our case, the task is translation_en_to_it and is given along with the model name as follows.
from transformers import pipeline
text_translator = pipeline("translation_en_to_it", model="Helsinki-NLP/opus-mt-en-it")
The variable input_text holds the English sentence that we wish to translate.
input_text = "This text is translated from English to Italian"
In the following line, we pass the input to our model and store the output in italian_text. Printing it gives the translated version of our input.
italian_text = text_translator(input_text, clean_up_tokenization_spaces=True)
print(italian_text)
Output:
[{'translation_text': "Questo testo è tradotto dall'inglese all'italiano"}]
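The same pattern works for other language pairs by changing the task name and the model. For instance, the sketch below (assuming the Helsinki-NLP/opus-mt-en-fr checkpoint, which follows the same naming convention) translates English to French.
text_translator_fr = pipeline("translation_en_to_fr", model="Helsinki-NLP/opus-mt-en-fr")
# The output is a list with a 'translation_text' field, just like the English-to-Italian example
print(text_translator_fr("This text is translated from English to French"))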
7. Token Classification
Token classification is the task of assigning a tag/label to the words present in a text. An individual word in the sentence is called a token. Examples of token classification are Named Entity Recognition (NER) and Part-of-Speech (PoS) tagging.
We covered NER in the first part of this article, so here we will look at the second task, PoS tagging. Part-of-Speech tagging is the process of identifying the part of speech of each word in a sentence and labeling it with a tag such as noun, pronoun, adjective, or adverb.
We begin by importing pipeline and instantiating it with the task token-classification.
from transformers import pipeline
pos_classifier = pipeline("token-classification", model = "vblagoje/bert-english-uncased-finetuned-pos")
As given above, the model we have used is vblagoje/bert-english-uncased-finetuned-pos. Now we shall classify the tokens and display them.
tags = pos_classifier("Hello I am happy when I see waterfalls.")
print(tags)
Output:
[{'entity': 'INTJ', 'score': 0.9958117, 'index': 1, 'word': 'hello', 'start': 0, 'end': 5}, {'entity': 'PRON', 'score': 0.9995704, 'index': 2, 'word': 'i', 'start': 6, 'end': 7}, {'entity': 'AUX', 'score': 0.9966484, 'index': 3, 'word': 'am', 'start': 8, 'end': 10}, {'entity': 'ADJ', 'score': 0.99829704, 'index': 4, 'word': 'happy', 'start': 11, 'end': 16}, {'entity': 'ADV', 'score': 0.99861526, 'index': 5, 'word': 'when', 'start': 17, 'end': 21}, {'entity': 'PRON', 'score': 0.9995561, 'index': 6, 'word': 'i', 'start': 22, 'end': 23}, {'entity': 'VERB', 'score': 0.99941325, 'index': 7, 'word': 'see', 'start': 24, 'end': 27}, {'entity': 'NOUN', 'score': 0.9958626, 'index': 8, 'word': 'waterfalls', 'start': 28, 'end': 38}, {'entity': 'PUNCT', 'score': 0.9996631, 'index': 9, 'word': '.', 'start': 38, 'end': 39}]
To see the output clearly, we shall import pandas, create a DataFrame from the result, and view the output as shown below.
import pandas as pd
df_tags = pd.DataFrame(tags)
print(df_tags)
Output:
entity score index word start end
0 INTJ 0.995812 1 hello 0 5
1 PRON 0.999570 2 i 6 7
2 AUX 0.996648 3 am 8 10
3 ADJ 0.998297 4 happy 11 16
4 ADV 0.998615 5 when 17 21
5 PRON 0.999556 6 i 22 23
6 VERB 0.999413 7 see 24 27
7 NOUN 0.995863 8 waterfalls 28 38
8 PUNCT 0.999663 9 . 38 39
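If we only need each word together with its predicted tag, a simple list comprehension over the pipeline output is enough. The sketch below is just one way of extracting those two fields from the dictionaries shown above.
# Pair each token with its part-of-speech tag
word_tags = [(tag['word'], tag['entity']) for tag in tags]
print(word_tags)
# [('hello', 'INTJ'), ('i', 'PRON'), ('am', 'AUX'), ('happy', 'ADJ'), ('when', 'ADV'), ('i', 'PRON'), ('see', 'VERB'), ('waterfalls', 'NOUN'), ('.', 'PUNCT')]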
8. Sentence Similarity
An interesting application of transformers is the ability to extract sentence embeddings that capture semantic information. A sentence embedding is a numerical representation of a whole sentence and can be used as input to downstream models. In our example, we will use a sentence transformer to generate sentence embeddings and use them to identify the similarity between two sentences.
For this example, we have to install the sentence-transformers library from Hugging Face, unlike the other applications, which only need the transformers library. There are many models available for the task of sentence similarity, and we are going to use the model sentence-transformers/all-MiniLM-L6-v2 in our example.
The first step is to import SentenceTransformer and util as shown below. The util package provides the cosine similarity function used to measure how close the two embeddings are.
!pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util
In the next step, we load input_sentences with the sentences we would like to try the model on and pass the specific model name to SentenceTransformer.
similarity_model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
input_sentences = ["I'm happy", "I'm not sad"]
Next, we extract the embedding of each sentence and store the two embeddings in separate variables.
embedding_1= similarity_model.encode(input_sentences[0], convert_to_tensor=True)
embedding_2 = similarity_model.encode(input_sentences[1], convert_to_tensor=True)
In the last step, we utilize the pytorch_cos_sim function from util to calculate the cosine similarity between the two embeddings.
print(util.pytorch_cos_sim(embedding_1, embedding_2))
Output:
tensor([[0.4624]])
The closer the value is to 1, the more similar the sentences are. If the value is closer to 0, the sentences are not similar.
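The same approach extends to more than two sentences: encode accepts a list, and the cosine similarity function accepts the resulting matrices, returning a score for every pair. A minimal sketch (the third sentence is added here only for illustration):
sentences = ["I'm happy", "I'm not sad", "It is raining today"]
embeddings = similarity_model.encode(sentences, convert_to_tensor=True)
# Entry [i][j] holds the cosine similarity between sentence i and sentence j
print(util.pytorch_cos_sim(embeddings, embeddings))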
9. Zero-shot classification
Zero-shot classification has its roots in zero-shot learning, which basically means a classifier is trained on one set of labels and tested on a different set of labels that the model has never seen.
This means we can take a zero-shot classification model, give it an input, and also supply labels of our own. The model will classify the text according to the labels we provide, even though it was not trained on any of them. Now we will try out the same in our demo. Let us start by instantiating the pipeline with the zero-shot-classification task.
from transformers import pipeline
zero_shot_classifier = pipeline("zero-shot-classification")
Output:
No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
As we have not given a specific model, the model facebook/bart-large-mnli is selected by the library, and the model weights are downloaded and referenced by zero_shot_classifier.
The next step is to define the input text and the labels for the classification. Using zero_shot_classifier, we pass the input text and the labels to be classified.
input_sentence = "I am hungry and angry. I think an ice cream will make me feel good"
labels = ['food', 'travel','entertainment','sad','happy','neutral']
results = zero_shot_classifier(input_sentence, labels)
print(results)
Output:
{'sequence': 'I am hungry and angry. I think an ice cream will make me feel good', 'labels': ['food', 'sad', 'entertainment', 'travel', 'neutral', 'happy'], 'scores': [0.8720405101776123, 0.07575708627700806, 0.023996146395802498, 0.010163746774196625, 0.009797507897019386, 0.00824508722871542]}
To see the output clearly, we shall import the pandas library and view the result in a data frame. As shown in the data frame below, the labels food and sad score higher than the other labels. The given sentence relates to the label food more than to any other label, so it gets the highest score.
import pandas as pd
df_result = pd.DataFrame(results)
print(df_result.loc[:, ['labels','scores']])
Output:
labels scores
0 food 0.872041
1 sad 0.075757
2 entertainment 0.023996
3 travel 0.010164
4 neutral 0.009798
5 happy 0.008245
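By default, the pipeline treats the candidate labels as mutually exclusive, which is why the scores above sum to 1. If a text may belong to several labels at once, the multi_label argument scores each label independently. A quick sketch with the same inputs:
# Score each label independently instead of as competing classes
results_multi = zero_shot_classifier(input_sentence, labels, multi_label=True)
print(results_multi['labels'])
print(results_multi['scores'])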
10. Fill Mask
The concept of Masked Language Modeling (MLM) deals with predicting words to replace purposefully masked words in the data. This helps improve the model's statistical understanding of the language it is trained on and leads to better text representations. The merit of MLM is its self-supervised pretraining, which does not need labeled data.
The fill-mask task thus deals with masking a word in a given sentence and exploring the various suggestions the model offers for replacing the mask.
Let us start by instantiating the pipeline with the fill-mask task as shown below. Since we have not given a specific model, the default model (distilroberta-base) is downloaded.
from transformers import pipeline
fill_mask_clf = pipeline("fill-mask")
Output:
No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Now we give the model a sentence in which one word is masked with the <mask> token. On printing the output, we see five different candidate words for the mask, along with the corresponding score, the token representation, and the complete sentence with the predicted word.
print(fill_mask_clf("artificial intelligence is going to be <mask> in the future"))
Output:
[{'score': 0.061495572328567505,
'token': 30208,
'token_str': ' commonplace',
'sequence': 'artificial intelligence is going to be commonplace in the future'},
{'score': 0.05322642996907234,
'token': 25107,
'token_str': ' ubiquitous',
'sequence': 'artificial intelligence is going to be ubiquitous in the future'},
{'score': 0.0344822071492672,
'token': 5616,
'token_str': ' useful',
'sequence': 'artificial intelligence is going to be useful in the future'},
{'score': 0.033160075545310974,
'token': 6128,
'token_str': ' everywhere',
'sequence': 'artificial intelligence is going to be everywhere in the future'},
{'score': 0.025654001161456108,
'token': 956,
'token_str': ' needed',
'sequence': 'artificial intelligence is going to be needed in the future'}]
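By default, the pipeline returns five suggestions. The number of candidates can be controlled with the top_k argument, as in the sketch below.
# Ask for only the top 3 candidate words for the masked position
print(fill_mask_clf("artificial intelligence is going to be <mask> in the future", top_k=3))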
Summary
In these two parts of the article, we went through the 10 most important applications achievable with transformers. The takeaway is that, with the rise of transformers and large language models, complex tasks that would once have required heavy cost and time are now easy to use. It also has to be noted that the same model can be used for different tasks.
The idea is to understand the possibilities of a model and fine-tune it for your own purpose. The more we try, the more we learn. Proceed and Succeed!!!
Please find more articles related to NLP on this page.
Thank you!!