Exploring the Power of the Transformers Library for Natural Language Processing
Last Updated on August 1, 2023 by Editorial Team
Author(s): Rafay Qayyum
Originally published on Towards AI.
Natural language processing (NLP) is a branch of artificial intelligence that deals with giving computers the ability to understand text and spoken words in much the same way human beings can. NLP has made significant advances recently thanks to developments in deep learning models. The Transformer is an architecture that has revolutionized many NLP tasks. In this article, we will explore the basics of Transformers for NLP and walk through some code examples to get started.
Introduction:
Hugging Face's Transformers library for Python is open source and offers a complete set of tools and pre-trained models for a variety of NLP applications. It is built around the Transformer architecture, which has emerged as the de facto model for sequential data modeling in NLP. The library provides a pipeline object, which makes it easy to use for inference.
In this article, we will dive into various applications of the transformers library in Natural Language Processing (NLP). We will explore how this library can be utilized for tasks like Sentiment Analysis, Text Generation, Masked/AutoEncoding Language Model, Named Entity Recognition, Zero-Shot Classification, Neural Machine Translation, Question Answering, and Text Summarization.
Sentiment Analysis
Sentiment analysis is the task of determining the sentiment or emotion expressed in a piece of text. Transformers provide a convenient way to perform sentiment analysis using pre-trained models. Let's dive into some code examples:
!pip install transformers
from transformers import pipeline
classifier = pipeline("sentiment-analysis")
result = classifier("This is not such a great movie")
print(result)
# OUTPUT: [{'label': 'NEGATIVE', 'score': 0.9989928603172302}]
# You can also pass more than one sample
result = classifier(['This is such a good movie',
'I dont like it'])
print(result)
# OUTPUT: [{'label': 'POSITIVE', 'score': 0.9998687505722046},
# {'label': 'NEGATIVE', 'score': 0.9909656047821045}]
In the above code, we install the transformers library and import the pipeline function. We then create a sentiment analysis classifier by calling pipeline with the task name. We pass first a single text and then a list of texts to the classifier, and it predicts the sentiment of each input.
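Because the classifier returns a plain list of dictionaries with label and score keys, the results can be post-processed with ordinary Python. The following is a minimal sketch that works on the sample outputs shown above without loading a model; the helper name confident_labels and the thresholds are illustrative assumptions, not part of the library.

```python
# Sample results copied from the outputs shown above; no model is loaded here.
results = [
    {"label": "POSITIVE", "score": 0.9998687505722046},
    {"label": "NEGATIVE", "score": 0.9909656047821045},
]


def confident_labels(results, threshold=0.95):
    """Return each label, or "UNCERTAIN" when its score is below the threshold.

    Hypothetical helper for illustration; the threshold is an arbitrary choice.
    """
    return [r["label"] if r["score"] >= threshold else "UNCERTAIN" for r in results]


print(confident_labels(results))
print(confident_labels(results, threshold=0.995))
```

A stricter threshold turns borderline predictions into an explicit "UNCERTAIN" value, which can be easier to handle downstream than a low-confidence label.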
Text Generation
Text generation is another exciting application of Transformers. It involves predicting the next words or sequences of text based on the input. Generating coherent text using deep learning can be a challenging task. Deep learning models, such as recurrent neural networks (RNNs) and transformer-based models like GPT (Generative Pre-trained Transformer), have shown significant progress in generating coherent and contextually relevant text.
Let's take a look at how to generate text using Transformers:
from transformers import pipeline
generator = pipeline('text-generation')
prompt = 'Neural networks with attention have been used with great success.'
result = generator(prompt)
print(result)
# OUTPUT:
#[{'generated_text': 'Neural networks with attention have been used with great success.\
# The primary goal of this article is to explore what types of neural networks have been
# used to measure attention and are particularly suitable for research of this kind, such as working memory, learning tasks,'}]
result = generator(prompt, num_return_sequences=3)
print(result)
# OUTPUT: [{'generated_text': 'Neural networks with attention have been used with great success.\
# They can be designed that use magnetic stripes and magnetometers to visualize a human brain\
# and give it special attention, or they can show it at a low-level. Their goal is to give'},
#{'generated_text': 'Neural networks with attention have been used with great success. Here we show\
# that the neural network with attention is trained against an object using a non-neuronal\
# approach that uses the idea of training instead of stimulus-dependent training (RDS)'},
#{'generated_text': 'Neural networks with attention have been used with great success. There's one\
# big caveat, however: They're just very much one thing too many.That's why this is our\
# "Best Case" article. We're not just talking about'}]
result = generator(prompt, max_length=30)
print(result)
# OUTPUT:
#[{'generated_text': 'Neural networks with attention have been used with great success.\
# Thus, we can expect to make significant improvement in the next few years. A'}]
In the above code, we create a text generation pipeline using the pipeline function. We pass in a prompt, and the model generates the next words or sequences of text based on the prompt. We can control the number of generated sequences using the num_return_sequences parameter, and limit the maximum length of the generated text using the max_length parameter.
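In the outputs above, each generated_text begins with the prompt itself. If only the continuation is needed, it can be separated with a small helper. This is a sketch on a sample result shaped like the pipeline output (the text is abbreviated); continuation_only is a hypothetical name introduced here for illustration.

```python
prompt = "Neural networks with attention have been used with great success."

# Sample result shaped like the pipeline output above; no model is loaded here.
result = [
    {
        "generated_text": prompt
        + " Thus, we can expect to make significant improvement in the next few years. A"
    }
]


def continuation_only(result, prompt):
    """Hypothetical helper: drop the echoed prompt from each generated text."""
    continuations = []
    for item in result:
        text = item["generated_text"]
        if text.startswith(prompt):
            text = text[len(prompt):]
        continuations.append(text.strip())
    return continuations


print(continuation_only(result, prompt))
```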
Masking/AutoEncoding (Article Spinning)
Masking/AutoEncoding is another technique available in the Transformers library. Certain words or tokens in the input sequence are randomly masked or replaced with a special token (e.g., <mask>). The model's objective is to predict the original masked tokens based on the surrounding context. This approach helps the model learn the relationships and dependencies between words in a sentence.
Let's see how we can generate a missing word using the transformers library:
from transformers import pipeline
mlm = pipeline('fill-mask')
result = mlm('Consumers drive French <mask>')
print(result)
# OUTPUT:
#[{'score': 0.20347557961940765, 'token': 1677,
# 'token_str': ' cars', 'sequence': 'Consumers drive French cars'},
# {'score': 0.021551305428147316, 'token': 866,
# 'token_str': ' economy', 'sequence': 'Consumers drive French economy'}]
# Adjacent string literals are joined, so the space after "months" is kept explicitly
result = mlm("France's economic growth accelerated in the last three months "
    "of 2004, driven by <mask> spending, a report shows.")
print(result)
# OUTPUT:
# [{'score': 0.2391912043094635, 'token': 2267, 'token_str': ' consumer',
# 'sequence': "France's economic growth accelerated in the last three months\
# of 2004, driven by consumer spending, a report shows."},
# {'score': 0.028679588809609413, 'token': 168, 'token_str': ' government',
# 'sequence': "France's economic growth accelerated in the last three months\
# of 2004, driven by government spending, a report shows."}]
In the above code, we create a pipeline for masked language modeling using the pipeline function. We provide a sentence with a masked word represented by <mask>, and the model returns a list of dictionaries. Each dictionary contains a token, score, token_str, and sequence. The token is the vocabulary id of the predicted word, token_str is its text, and sequence is the full sentence with the mask filled in. The score represents the model's confidence in that particular token.
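The article-spinning idea boils down to two steps: mask a word, then take the model's best replacement. The sketch below shows both steps on the sample predictions from the first output above, without loading a model. The helper names mask_word and best_fill are assumptions for illustration, and the mask token depends on the model's tokenizer (BERT-style models, for example, use [MASK] rather than <mask>).

```python
def mask_word(sentence, word, mask_token="<mask>"):
    """Replace the first occurrence of `word` with the mask token (hypothetical helper)."""
    return sentence.replace(word, mask_token, 1)


def best_fill(predictions):
    """Return the stripped token string of the highest-scoring prediction."""
    top = max(predictions, key=lambda p: p["score"])
    return top["token_str"].strip()


# Predictions copied from the first output above.
predictions = [
    {"score": 0.20347557961940765, "token": 1677, "token_str": " cars",
     "sequence": "Consumers drive French cars"},
    {"score": 0.021551305428147316, "token": 866, "token_str": " economy",
     "sequence": "Consumers drive French economy"},
]

print(mask_word("Consumers drive French cars", "cars"))  # Consumers drive French <mask>
print(best_fill(predictions))  # cars
```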
Named Entity Recognition (NER)
Named Entity Recognition (NER) is a subtask of natural language processing (NLP) that focuses on identifying and classifying named entities in text. Named entities are specific words or phrases that represent entities such as persons, organizations, locations, dates, and more. This can be useful for information extraction from a given text.
Transformers can be used for NER tasks as well. Let's take a look at an example:
from transformers import pipeline
# device=0 runs the model on the first GPU; omit the argument to run on the CPU
ner = pipeline('ner', aggregation_strategy='simple', device=0)
result = ner('Apple Inc. was founded by Steve Jobs and Steve Wozniak.')
print(result)
# OUTPUT: [{'entity_group': 'ORG', 'score': 0.9996176,
# 'word': 'Apple Inc', 'start': 0, 'end': 9},
# {'entity_group': 'PER', 'score': 0.9946492,
# 'word': 'Steve Jobs', 'start': 26, 'end': 36},
# {'entity_group': 'PER', 'score': 0.88902617,
# 'word': 'Steve Wozniak', 'start': 41, 'end': 54}]
result = ner('The Eiffel Tower is located in Paris, France.')
print(result)
# OUTPUT: [{'entity_group': 'MISC', 'score': 0.6937692,
# 'word': 'Eiffel Tower', 'start': 4, 'end': 16},
# {'entity_group': 'LOC', 'score': 0.99946004,
# 'word': 'Paris', 'start': 31, 'end': 36},
# {'entity_group': 'LOC', 'score': 0.9993007,
# 'word': 'France', 'start': 38, 'end': 44}]
In the above code, we create a named entity recognition pipeline using the pipeline function. The aggregation strategy "simple" will attempt to group entities following the default schema:
(A, B-TAG), (B, I-TAG), (C, I-TAG), (D, B-TAG2), (E, B-TAG2) will end up being
[{"word": "ABC", "entity": "TAG"},
{"word": "D", "entity": "TAG2"},
{"word": "E", "entity": "TAG2"}]
We pass in a sentence, and the model identifies and classifies the named entities present in the text. The output is a list of dictionaries. Each dictionary contains the matched word(s), a score, an entity group, and start and end offsets. entity_group is the entity type the model assigned to the word(s), score is the model's confidence, start is the index of the entity's first character in the text, and end is the index just past its last character.
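The grouping schema described above can be sketched in plain Python. This is a simplified illustration, not the library's implementation: it works on whole words and joins them with a space, whereas the real pipeline merges sub-word tokens using character offsets. The function name group_entities and the sample tags are assumptions made for the example. The last lines also show that start and end index directly into the input string.

```python
def group_entities(tagged_tokens):
    """Merge (word, tag) pairs: a B- tag starts an entity, an I- tag extends it."""
    groups = []
    open_label = None  # label of the entity currently being extended, if any
    for word, tag in tagged_tokens:
        if tag == "O":  # "outside" any entity: close the current group
            open_label = None
            continue
        prefix, _, label = tag.partition("-")
        if prefix == "I" and open_label == label:
            groups[-1]["word"] += " " + word
        else:
            groups.append({"entity_group": label, "word": word})
            open_label = label
    return groups


# Hypothetical token-level tags for illustration.
tagged = [("Steve", "B-PER"), ("Jobs", "I-PER"), ("visited", "O"),
          ("Paris", "B-LOC"), ("France", "B-LOC")]
print(group_entities(tagged))

# start/end from the first output above are character offsets into the text.
text = "Apple Inc. was founded by Steve Jobs and Steve Wozniak."
entity = {"entity_group": "PER", "word": "Steve Jobs", "start": 26, "end": 36}
print(text[entity["start"]:entity["end"]])  # Steve Jobs
```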
Zero-Shot Classification
Zero-shot classification allows us to classify text into multiple predefined categories without the need for specific training examples for each category. Traditionally, in supervised learning, a model is trained on labeled data with specific classes or categories. However, in zero-shot classification, the model is trained to understand the relationship between textual inputs and a set of general-purpose "meta-labels" or "attribute labels" that describe different characteristics or properties of the data.
This approach allows for more flexible and generalized classification tasks, as the model can transfer its knowledge across related classes without the need for explicit training on every specific class.
Let's dive into the code:
from transformers import pipeline
classifier = pipeline("zero-shot-classification",device=0)
result = classifier("I am looking for a new smartphone.",
candidate_labels=["technology", "sports"])
print(result)
# OUTPUT: {'sequence': 'I am looking for a new smartphone.',
# 'labels': ['technology', 'sports'],
# 'scores': [0.9867954254150391, 0.008458604104816914]}
result = classifier("This is a great Movie",
candidate_labels=['positive','negative'])
print(result)
# OUTPUT: {'sequence': 'This is a great Movie',
# 'labels': ['positive', 'negative'],
# 'scores': [0.9972434043884277, 0.002756547648459673]}
In the above code, we use the pipeline function from the Transformers library to create a zero-shot classification pipeline. Given a text input and a list of candidate labels, the model scores every label and ranks the most suitable one first. This approach allows us to classify text without requiring specific training data for each label.
Note that the labels in the output of the zero-shot classification pipeline may not always be in the same order as the input. This is because the ordering of the labels is determined by the scores of the labels. They are arranged in descending order.
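Since the returned labels are sorted by score rather than by input order, it is safer to pair labels with scores than to rely on list positions. A minimal sketch using the first result above (the candidate_labels order here is chosen for illustration, and no model is loaded):

```python
# Result copied from the first output above.
result = {
    "sequence": "I am looking for a new smartphone.",
    "labels": ["technology", "sports"],
    "scores": [0.9867954254150391, 0.008458604104816914],
}

# Order in which a caller might have supplied the labels (illustrative).
candidate_labels = ["sports", "technology"]

# Pair labels with scores so lookups do not depend on the returned order.
score_by_label = dict(zip(result["labels"], result["scores"]))

# Labels come back sorted by descending score, so the first is the top prediction.
top_label = result["labels"][0]

# Scores re-aligned to the caller's original label order.
scores_in_input_order = [score_by_label[label] for label in candidate_labels]

print(top_label)  # technology
print(scores_in_input_order)
```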
Neural Machine Translation
Neural Machine Translation (NMT) involves translating text from one language to another using neural network models. Transformers have greatly improved the quality and efficiency of NMT systems.
You will need to install Transformers and sentencepiece using the following command; if you're using Google Colab, you'll also need to restart the runtime.
!pip install transformers sentencepiece transformers[sentencepiece]
Let's see an example of translating English to Spanish using the Transformers library:
from transformers import pipeline
translator=pipeline("translation",model='Helsinki-NLP/opus-mt-en-es',device=0)
result = translator("Hello, how are you?")
print(result)
# OUTPUT:
# [{'translation_text': 'Hola, ¿cómo estás?'}]
result = translator("I love natural language processing.")
print(result)
# OUTPUT:
# [{'translation_text': 'Me encanta el procesamiento del lenguaje natural.'}]
In the code snippet above, we create a translation pipeline using the pipeline function from Transformers. We specify the task "translation" and the English-to-Spanish model Helsinki-NLP/opus-mt-en-es. The pipeline handles tokenization, generation, and decoding for each input sentence and returns the translated text.
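The model name used above ends in a source and target language code. The sketch below formats names with the same pattern and unpacks the pipeline's list-of-dictionaries output. Both helpers are hypothetical and only manipulate strings; whether a checkpoint exists for a given language pair is an assumption you would need to verify on the model hub.

```python
def opus_mt_model_name(source_lang, target_lang):
    """Build a checkpoint name following the pattern of the model used above.

    Hypothetical helper: it only formats a string and does not check that a
    checkpoint for the requested language pair has actually been published.
    """
    return f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"


def translation_texts(results):
    """Pull the plain strings out of the pipeline's list-of-dicts output."""
    return [r["translation_text"] for r in results]


# Sample result copied from the second output above.
results = [{"translation_text": "Me encanta el procesamiento del lenguaje natural."}]

print(opus_mt_model_name("en", "es"))  # Helsinki-NLP/opus-mt-en-es
print(translation_texts(results))
```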
Question Answering
Question Answering (QA) systems aim to automatically answer questions based on a given context or document. Transformers have shown exceptional performance in QA tasks, including the ability to understand the context and provide precise answers. The model's output is actually just a slice of the input context string. As a result, the model will return wrong answers for questions whose answer does not appear in the context, although its confidence in those answers will be low.
Let's take a look at an example:
from transformers import pipeline
question_answerer = pipeline("question-answering")
context = "The Eiffel Tower is a wrought-iron lattice tower located in Paris, France."
question = "Where is the Eiffel Tower located?"
result = question_answerer(question=question, context=context)
print(result)
# OUTPUT: {'score': 0.846054196357727, 'start': 60,
# 'end': 73, 'answer': 'Paris, France'}
context='Today, I made a peanut butter sandwich.'
question='What did I do today?'
result = question_answerer(question=question, context=context)
print(result)
# OUTPUT: {'score': 0.7968316674232483, 'start': 9,
# 'end': 38, 'answer': 'made a peanut butter sandwich'}
In the code example above, we create a question-answering pipeline using the pipeline function. We provide a context (a piece of text containing relevant information) and a question related to the context. The model processes the context and extracts the span that best answers the question. The start and end values are the character indexes in the context string where the answer begins and where it ends (end is exclusive).
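Because the answer is a slice of the context, start and end can be checked directly against the input string, and the score can be used to discard unreliable answers. The sketch below uses the first result above without loading a model; answer_or_none and its 0.5 cutoff are illustrative assumptions, not a recommended setting.

```python
context = "The Eiffel Tower is a wrought-iron lattice tower located in Paris, France."

# Result copied from the first output above.
result = {"score": 0.846054196357727, "start": 60, "end": 73, "answer": "Paris, France"}

# The answer is exactly the substring of the context between start and end.
extracted = context[result["start"]:result["end"]]
print(extracted)  # Paris, France


def answer_or_none(result, min_score=0.5):
    """Hypothetical helper: return the answer only if the model is confident enough."""
    return result["answer"] if result["score"] >= min_score else None


print(answer_or_none(result))
```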
Text Summarization
Text summarization involves condensing a larger piece of text into a shorter, more concise version while preserving its key information. Transformers have shown remarkable capabilities in generating high-quality summaries. There are two primary approaches to text summarization:
- Extractive Summarization: In extractive summarization, the summary is created by selecting and combining the most relevant sentences or phrases from the original text. Extractive summarization does not involve generating new sentences but rather extracts and rearranges parts of the original text.
- Abstractive Summarization: Abstractive summarization involves generating new sentences that convey the essence of the original text in a more concise form. This approach requires a deeper understanding of the text, as the model must interpret and rephrase the information in a way that captures the main ideas.
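To make the extractive approach concrete, here is a toy sketch that scores each sentence by the document-wide frequency of its words and keeps the top-scoring sentences in their original order. It is a simple heuristic chosen for illustration (and it favors long sentences); it is not how the summarization pipeline works, and the function name and sample text are assumptions.

```python
import re
from collections import Counter


def extractive_summary(text, num_sentences=1):
    """Pick the highest-scoring sentences, keeping their original order.

    Toy heuristic: a sentence's score is the sum of the document-wide
    frequencies of its words.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    word_counts = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        return sum(word_counts[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    # Rank sentence indexes by score, then restore document order for output.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)


sample = "Cats sleep a lot. Cats and dogs are popular pets, and cats are independent. Birds sing."
print(extractive_summary(sample, num_sentences=1))
```

Every sentence in the result is copied verbatim from the input, which is the defining property of extractive summarization. An abstractive model, by contrast, generates new wording.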
Let's see how it can be done with the Transformers library:
from transformers import pipeline
summarizer = pipeline("summarization",device=0)
text = "Text summarization involves condensing a larger piece of text into a \
shorter, more concise version while preserving its key information. \
Transformers have shown remarkable capabilities in generating high-quality \
summaries. There are primary approaches to Text Summarization: \
Extractive Summarization: In extractive summarization, the summary is created \
by selecting and combining the most relevant sentences or phrases from the \
original text. Extractive summarization does not involve generating new \
sentences but rather extracts and rearranges parts of the original text. \
Abstractive Summarization: Abstractive summarization involves generating new \
sentences that convey the essence of the original text in a more concise form. \
This approach requires a deeper understanding of the text, as the model must \
interpret and rephrase the information in a way that captures the main ideas."
result = summarizer(text)
print(result)
# OUTPUT:
# [{'summary_text': ' Text summarization involves condensing a larger piece of\
# text into a shorter, more concise version while preserving its key\
# information . Transformers have shown remarkable capabilities in generating\
# high-quality summaries . There are primary approaches to Text Summarization:\
# extractive summarization and abstractive summarizing .'}]
In the above code, we create a summarization pipeline using the pipeline function. We pass in a piece of text that we want to summarize. The model then processes the text and produces a condensed version with the key information intact. You can also specify the max_length and min_length parameters to control the length of the summary, but in some cases the summary will be cut off mid-sentence because the maximum length has been reached.
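One way to cope with a summary that stops mid-sentence is to trim it back to the last complete sentence after generation. The helper below is a hypothetical post-processing sketch in plain Python, not a feature of the library, and the sample strings are made up for illustration.

```python
def trim_to_last_sentence(summary):
    """Drop a trailing fragment that does not end in sentence punctuation."""
    summary = summary.strip()
    last_end = max(summary.rfind("."), summary.rfind("!"), summary.rfind("?"))
    if last_end == -1:
        return summary  # no complete sentence found; leave the text unchanged
    return summary[: last_end + 1]


truncated = "Transformers generate summaries. There are two primary appro"
print(trim_to_last_sentence(truncated))  # Transformers generate summaries.
```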
Conclusion
Hugging Face's Transformers library has emerged as a game-changer in the field of natural language processing. It provides a variety of pre-trained Transformer models and tools for creating cutting-edge NLP applications.
We have gone through many use cases for the Transformers library throughout this article, such as sentiment analysis, text generation, masked/autoencoding language models, named entity recognition, zero-shot classification, neural machine translation, question answering, and text summarization.
Published via Towards AI