Automating Words: How GRUs Power the Future of Text Generation
Author(s): Tejashree_Ganesan
Originally published on Towards AI.
Isn't it incredible how far language technology has come? Natural Language Processing, or NLP, used to be about just getting computers to follow basic commands. Now, though, we're seeing computers actually start to understand language and even respond in ways that feel surprisingly human. Think about the difference this makes! Instead of just getting literal responses, we're moving toward a world where machines genuinely "get" what we're saying. And it's not just about understanding us; it's about generating responses too, making our interactions feel smoother and more natural. It's wild to think about where this could take us next!
Among these breakthroughs, one exciting area is text generation. Imagine a machine writing anything from emails to stories or generating entire conversations. This is what we'll be exploring here. To achieve such fluent and coherent text generation, researchers use various models, each with unique strengths. One powerful tool for this purpose is the Gated Recurrent Unit (GRU) network. GRUs have gained popularity because they balance two key aspects: they capture long-term dependencies in text (helping the machine remember relevant information across sentences) and they do so efficiently, keeping things fast and manageable.
So, let's dive in and see how GRUs work their magic to make machines a little more "human" in the way they generate language.
What is Text Generation?
With applications growing across industries, text generation has become a crucial area of research in natural language processing. Many organizations need to create large volumes of content such as personalized product descriptions, customer support responses, or social media posts. Creating this content manually is time-consuming and labor-intensive, and automating text generation is a practical way to address that challenge.
Before getting into how text generation works, let's start with a brief overview.
Text generation is a branch of natural language processing (NLP) focused on automatically creating coherent, contextually relevant text. The process uses algorithms and models to generate written content from a given input, which can be a prompt, a set of keywords, or a specific context. The generated text can vary in length and complexity, depending on the requirements of the task and the capabilities of the underlying model.
Understanding Gated Recurrent Unit
So, have you heard of GRUs? They're called Gated Recurrent Units, and they're basically an upgraded type of neural network that came out in 2014. Think of GRUs as a lighter, simpler alternative to a model called LSTM (Long Short-Term Memory). Both LSTM and GRU are used to help computers "remember" important info when working with sequences, like predicting words in a sentence. But GRUs keep things quick and efficient by cutting out some extra steps that LSTMs have. So, if you want something that's powerful but won't slow things down, GRUs are a great pick!
GRUs can retain long-term dependencies, which makes them well suited to sequential data such as time series, text, and speech.
A Gated Recurrent Unit (GRU) has two special "gates" that control the flow of information as it processes data: the update gate and the reset gate.
The update gate decides how much of the old information (from previous steps) should be kept and how much new information should be added. The reset gate helps the GRU forget information that is no longer relevant.
Breakdown of how these gates work:
The GRU takes two things as input: the current input and the hidden state, which carries information forward from the previous step.
Gate Calculations:
For each gate (update and reset), the GRU multiplies the current input and the previous hidden state by that gate's weights and sums the results.
Activation Function:
The result is then passed through a sigmoid activation function, which squashes it to a value between 0 and 1. That number tells the GRU how much information the gate should keep or forget.
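To make the gate math concrete, here is a minimal NumPy sketch of a single GRU step. The sizes and the randomly initialized weight matrices (W_z, U_z, and so on) are made up purely for illustration; in a real network these weights are learned during training, and Keras handles all of this internally.
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Toy sizes, for illustration only
input_dim, hidden_dim = 4, 3
rng = np.random.default_rng(0)

# Randomly initialized weights (in a real model these are learned)
W_z, U_z = rng.normal(size=(hidden_dim, input_dim)), rng.normal(size=(hidden_dim, hidden_dim))
W_r, U_r = rng.normal(size=(hidden_dim, input_dim)), rng.normal(size=(hidden_dim, hidden_dim))
W_h, U_h = rng.normal(size=(hidden_dim, input_dim)), rng.normal(size=(hidden_dim, hidden_dim))

def gru_step(x, h_prev):
    z = sigmoid(W_z @ x + U_z @ h_prev)                   # update gate: how much new information to let in
    r = sigmoid(W_r @ x + U_r @ h_prev)                   # reset gate: how much of the old state to forget
    h_candidate = np.tanh(W_h @ x + U_h @ (r * h_prev))   # candidate hidden state
    return (1 - z) * h_prev + z * h_candidate             # blend old state and candidate

x_t = rng.normal(size=input_dim)   # current input
h_prev = np.zeros(hidden_dim)      # previous hidden state
h_t = gru_step(x_t, h_prev)
print(h_t)                         # new hidden state, one value per hidden unit
When the update gate is close to 0 the unit mostly keeps its old state, and when it is close to 1 it mostly adopts the new candidate (papers differ on which direction z points, but the idea is the same).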
Example of GRU
Let's understand GRU with an example. Suppose we want to teach a GRU model to write short sentences based on some input words. Imagine we are training the model to generate sentences about the weather.
1. Training the Model:
At first, we provide the GRU model with lots of examples of sentences about the weather, like
It is sunny today. The weather is rainy. It might be cloudy tomorrow.
The model learns the patterns from these sentences. It starts to understand which words commonly follow each other.
2. Generating Text:
Once the GRU model is trained, we can give it a starting word, and it will generate the rest of the sentence. For example:
If we give the model the word "It," the model predicts the next word could be "is." Now, the input becomes "It is," and the model predicts the next word is "sunny." The model continues to generate more words, like "today."
So, for the starting word "It," the output might be:
"It is sunny today"
3. What's Happening Inside:
The GRU model works by remembering important information (like common word pairs) and forgetting the less important information as it moves through the sentence. This helps it create sentences that make sense.
How GRU works for text generation: a practical example
Step 1: Import the necessary libraries
- NumPy: We use NumPy to convert our sequences of tokenized words into arrays so the model can process them.
- TensorFlow and Keras: We use TensorFlow's Keras API to define, compile, and train the GRU model. The layers used from Keras include GRU for the GRU architecture, Dense for the fully connected output layer, and Embedding for representing words as dense vectors.
- Tokenizer and text preprocessing: Tokenizer converts the training text into sequences of numbers, and pad_sequences makes sure all sequences have the same length before they are fed into the model.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense, Embedding
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
Step 2: Create the Input data
Now we create the input text. This longer passage serves as the training data for the GRU model.
data = [
"it was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness,",
"it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness,",
"it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us,",
"we were all going direct to Heaven, we were all going direct the other way β in short, the period was so far like the present period,",
"that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only."]
Step 3: Tokenization
We turn the sentences into numbers using a tokenizer. It's like giving each word its own ID so the computer can understand the text better. This way, instead of words, we're working with numbers that the model can actually use.
tokenizer = Tokenizer()
tokenizer.fit_on_texts(data)
total_words = len(tokenizer.word_index)+1
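As a quick, optional sanity check (not part of the original snippet), you can print the vocabulary size and a few entries of the word-to-ID mapping; the Keras Tokenizer assigns the smallest IDs to the most frequent words.
# Optional check: inspect the vocabulary the tokenizer has built
print(total_words)                               # vocabulary size + 1 (index 0 is reserved for padding)
print(list(tokenizer.word_index.items())[:5])    # a few (word, id) pairs, most frequent words first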
Step 4: Creating Sequences
Sequences of words are created where each sequence contains one more word than the previous. This teaches the model how one word follows another in the passage.
input_sequences = []
for sentence in data:
    token_list = tokenizer.texts_to_sequences([sentence])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i+1]
        input_sequences.append(n_gram_sequence)
Step 5: Padding Sequences
Since sentences can be all sorts of lengths, we add padding to make sure they're all the same size. This way, the GRU model can process them smoothly without any issues.
max_sequence_len = max([len(x) for x in input_sequences])
input_sequences = np.array(pad_sequences(input_sequences, maxlen = max_sequence_len, padding = 'pre'))
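As a small illustrative check (not in the original code), you can look at one padded row: a short n-gram sequence ends up with zeros at the front and the real token IDs at the end.
# Illustrative check: sequences are left-padded with zeros up to max_sequence_len
print(max_sequence_len)      # length of the longest n-gram sequence
print(input_sequences[0])    # shortest n-gram: mostly zeros, with the real token IDs at the end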
Step 6: Splitting into Input and Output
We split the sequences into input (X) and output (y).
- input (X): the first few words of the sentence
- output (y): the next word that needs to be predicted
X, y = input_sequences[:,:-1], input_sequences[:,-1]
y = np.array(y)
Step 7: One-hot encoding
We now convert the output (y) into one-hot vectors. So what does that do?
It represents each word as a vector of 0s and 1s:
- 1: the position corresponding to the correct word
- 0: all other positions
y = np.eye(total_words)[y]
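The np.eye trick just selects, for each label, the matching row of an identity matrix. A tiny illustration with a made-up vocabulary of four words:
# Tiny illustration with a made-up vocabulary of 4 words:
# label 2 becomes the vector [0, 0, 1, 0]
print(np.eye(4)[2])        # [0. 0. 1. 0.]
print(np.eye(4)[[1, 3]])   # also works for an array of labels, one row per label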
Step 8: GRU model architecture
We define a GRU model with an embedding layer that converts words into dense vectors, a GRU layer that processes the sequential data, and a final dense layer with a softmax activation that predicts the next word.
model = Sequential()
model.add(Embedding(total_words, 100, input_length=max_sequence_len-1))
model.add(GRU(150))
model.add(Dense(total_words, activation='softmax'))
#Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Step 9: Model Training
The model is trained on the input data to learn the patterns of word sequences in the text. The categorical cross-entropy loss measures how far each prediction is from the true next word, and the Adam optimizer speeds up convergence.
model.fit(X, y, epochs = 100, verbose = 1)
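Optionally, and not part of the original snippet, you can assign the return value of fit to a variable: Keras returns a History object whose history dictionary records the loss and accuracy for every epoch, which is handy for checking convergence.
# Optional variant: keep the History object that fit() returns
history = model.fit(X, y, epochs=100, verbose=1)
print(history.history['loss'][-1])       # final training loss
print(history.history['accuracy'][-1])   # final training accuracy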
Step 10: Text Generation
After the model is trained, we can give it any word or phrase to start with, and it'll come up with the next word based on what it learned. So, if we start with something like "we were", it might continue with "all going to Heaven..." and keep building from there.
def generate_text(seed_text, next_words):
    # Repeatedly predict the next word and append it to the seed text
    for _ in range(next_words):
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_sequence_len-1, padding='pre')
        predicted = np.argmax(model.predict(token_list), axis=-1)
        # Look up the word that corresponds to the predicted index
        output_word = ""
        for word, index in tokenizer.word_index.items():
            if index == predicted:
                output_word = word
                break
        seed_text += " " + output_word
    return seed_text
#Generate a sentence
print(generate_text("it was the best", 20))
Output
it was the best of times it was the worst of times it was the age of wisdom it was the age of foolishness
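You can call the same function with other seed phrases from the passage. Note that with this setup any word the tokenizer has never seen is simply dropped from the seed, since the Tokenizer was fitted only on the training text.
# Try a different seed phrase from the training passage
print(generate_text("we were all", 15))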
The Future of Text Generation
1. Better Language Models:
In the future, language models like GPT will get even better. They will create text that sounds even more like it was written by a human.
These models will understand context and emotions better, making their writing more natural.
2. Combining Text with Images and Audio:
Text generation won't just be about writing. Models will learn to work with text, pictures, and sound at the same time.
For example, a model could create a description for a picture or write a story based on a video.
3. Personalized Text:
Text generation will get more personalized, meaning the AI will create text that fits a personβs preferences and style.
This will be useful for things like customized product descriptions or personalized chatbots.
4. Text for All Languages:
Future models will focus on generating text in different languages, including low-resource languages that don't have a lot of data.
This will help expand AIβs use globally, making text generation possible for more people.
5. Specialized Text Writing:
Text generation will become more focused on specific areas, like legal writing, medical reports, or technical documentation.
This will help create more accurate and professional text for different industries.
Conclusion
Text generation is super helpful in natural language processing (NLP) because it lets us create text automatically. GRUs, or Gated Recurrent Units, are great for this: they're good at remembering important information, which helps them produce sentences that actually make sense. GRUs are also simpler and faster than comparable models, making them a good choice for generating text. As the technology improves, GRUs and other models will become even more useful for tasks like generating content and making communication easier.