Automating Words: How GRUs Power the Future of Text Generation
Author(s): Tejashree_Ganesan
Originally published on Towards AI.
Isn't it incredible how far language technology has come? Natural Language Processing, or NLP, used to be about just getting computers to follow basic commands. Now, though, we're seeing computers actually start to understand language and even respond in ways that feel surprisingly human. Think about the difference this makes! Instead of just getting literal responses, we're moving toward a world where machines genuinely "get" what we're saying. And it's not just about understanding us; it's about generating responses too, making our interactions feel smoother and more natural. It's wild to think about where this could take us next!
Among these breakthroughs, one exciting area is text generation. Imagine a machine writing anything from emails to stories or generating entire conversations. This is what we'll be exploring here. To achieve such fluent and coherent text generation, researchers use various models, each with unique strengths. One powerful tool for this purpose is the Gated Recurrent Unit (GRU) network. GRUs have gained popularity because they balance two key aspects: they capture long-term dependencies in text (helping the machine remember relevant information across sentences) and they do so efficiently, keeping things fast and manageable.
So, let's dive in and see how GRUs work their magic to make machines a little more "human" in the way they generate language.
What is Text Generation?
With applications growing across industries, text generation has become a crucial area of research in natural language processing. Many organizations need to create large volumes of content such as personalized product descriptions, customer support responses, or social media posts. Creating this content manually is time-consuming and labor-intensive, and automating text generation is a practical way to address that challenge.
Before getting into how text generation works, let's start with a brief overview.
Text generation is a branch of natural language processing (NLP) focused on automatically creating coherent, contextually relevant text. The process uses algorithms and models to generate written content from a given input, which can be a prompt, a set of keywords, or a specific context. The generated text can vary in length and complexity, depending on the requirements of the task and the capabilities of the underlying model.
Understanding Gated Recurrent Unit
So, have you heard of GRUs? They're called Gated Recurrent Units, and they're basically an upgraded type of neural network that came out in 2014. Think of GRUs as a lighter, simpler alternative to a model called LSTM (Long Short-Term Memory). Both LSTM and GRU are used to help computers "remember" important info when working with sequences, like predicting words in a sentence. But GRUs keep things quick and efficient by cutting out some extra steps that LSTMs have. So, if you want something that's powerful but won't slow things down, GRUs are a great pick!
GRUs can retain long-term dependencies, which makes them well suited to sequential data such as time series, text, and speech.
A Gated Recurrent Unit (GRU) has two special "gates" that control the flow of information as it processes data: the update gate and the reset gate.
The update gate decides how much of the old information (from previous steps) should be kept and how much new information should be added. The reset gate helps the GRU forget information that is no longer relevant.
Breakdown of how these gates work:
The GRU takes two things as input: the current input and the hidden state, which carries information forward from the previous step.
Gate Calculations:
For each gate (update and reset), the GRU multiplies the current input and the previous hidden state by that gate's weights and sums the results.
Activation Function:
The result is then passed through a sigmoid activation function, which squashes it to a value between 0 and 1. That number tells the GRU how much information the gate should keep or forget.
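To make the gate math concrete, here is a minimal NumPy sketch of a single GRU step. The sizes and the randomly initialized weight matrices (W_z, U_z, and so on) are made up purely for illustration; in a real network these weights are learned during training, and Keras handles all of this internally.
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Toy sizes, for illustration only
input_dim, hidden_dim = 4, 3
rng = np.random.default_rng(0)

# Randomly initialized weights (in a real model these are learned)
W_z, U_z = rng.normal(size=(hidden_dim, input_dim)), rng.normal(size=(hidden_dim, hidden_dim))
W_r, U_r = rng.normal(size=(hidden_dim, input_dim)), rng.normal(size=(hidden_dim, hidden_dim))
W_h, U_h = rng.normal(size=(hidden_dim, input_dim)), rng.normal(size=(hidden_dim, hidden_dim))

def gru_step(x, h_prev):
    z = sigmoid(W_z @ x + U_z @ h_prev)                   # update gate: how much new information to let in
    r = sigmoid(W_r @ x + U_r @ h_prev)                   # reset gate: how much of the old state to forget
    h_candidate = np.tanh(W_h @ x + U_h @ (r * h_prev))   # candidate hidden state
    return (1 - z) * h_prev + z * h_candidate             # blend old state and candidate

x_t = rng.normal(size=input_dim)   # current input
h_prev = np.zeros(hidden_dim)      # previous hidden state
h_t = gru_step(x_t, h_prev)
print(h_t)                         # new hidden state, one value per hidden unit
When the update gate is close to 0 the unit mostly keeps its old state, and when it is close to 1 it mostly adopts the new candidate (papers differ on which direction z points, but the idea is the same).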
Example of GRU
Let's understand GRU with an example. Suppose we want to teach a GRU model to write short sentences based on some input words. Imagine we are training the model to generate sentences about the weather.
1. Training the Model:
At first, we provide the GRU model with lots of examples of sentences about the weather, like
It is sunny today. The weather is rainy. It might be cloudy tomorrow.
The model learns the patterns from these sentences. It starts to understand which words commonly follow each other.
2. Generating Text:
Once the GRU model is trained, we can give it a starting word, and it will generate the rest of the sentence. For example:
If we give the model the word "It," the model predicts the next word could be "is." Now, the input becomes "It is," and the model predicts the next word is "sunny." The model continues to generate more words, like "today."
So, for the starting word "It," the output might be:
"It is sunny today"
3. What's Happening Inside:
The GRU model works by remembering important information (like common word pairs) and forgetting the less important information as it moves through the sentence. This helps it create sentences that make sense.
How GRU works for text generation: a practical example
Step 1: Import the necessary libraries
- NumPy: We use NumPy to convert our sequences of tokenized words into arrays so the model can process them.
- TensorFlow and Keras: We use TensorFlow's Keras API to define, compile, and train the GRU model. The layers used from Keras include GRU for the GRU architecture, Dense for the fully connected output layer, and Embedding for representing words as dense vectors.
- Tokenizer and text preprocessing: Tokenizer converts the training text into sequences of numbers, and pad_sequences makes sure all sequences have the same length before they are fed into the model.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense, Embedding
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
Step 2: Create the Input data
Now we create the input text. This longer passage serves as the training data for the GRU model.
data = [
"it was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness,",
"it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness,",
"it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us,",
"we were all going direct to Heaven, we were all going direct the other way β in short, the period was so far like the present period,",
"that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only."]
Step 3: Tokenization
We turn the sentences into numbers using a tokenizer. It's like giving each word its own ID so the computer can understand the text better. This way, instead of words, we're working with numbers that the model can actually use.
tokenizer = Tokenizer()
tokenizer.fit_on_texts(data)
total_words = len(tokenizer.word_index)+1
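As a quick, optional sanity check (not part of the original snippet), you can print the vocabulary size and a few entries of the word-to-ID mapping; the Keras Tokenizer assigns the smallest IDs to the most frequent words.
# Optional check: inspect the vocabulary the tokenizer has built
print(total_words)                               # vocabulary size + 1 (index 0 is reserved for padding)
print(list(tokenizer.word_index.items())[:5])    # a few (word, id) pairs, most frequent words first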
Step 4: Creating Sequences
Sequences of words are created where each sequence contains one more word than the previous. This teaches the model how one word follows another in the passage.
input_sequences = []
for sentence in data:
    token_list = tokenizer.texts_to_sequences([sentence])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i+1]
        input_sequences.append(n_gram_sequence)
Step 5: Padding Sequences
Since sentences can be all sorts of lengths, we add padding to make sure they're all the same size. This way, the GRU model can process them smoothly without any issues.
max_sequence_len = max([len(x) for x in input_sequences])
input_sequences = np.array(pad_sequences(input_sequences, maxlen = max_sequence_len, padding = 'pre'))
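As a small illustrative check (not in the original code), you can look at one padded row: a short n-gram sequence ends up with zeros at the front and the real token IDs at the end.
# Illustrative check: sequences are left-padded with zeros up to max_sequence_len
print(max_sequence_len)      # length of the longest n-gram sequence
print(input_sequences[0])    # shortest n-gram: mostly zeros, with the real token IDs at the end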
Step 6: Splitting into Input and Output
We split the sequences into input (X) and output (y).
- input (X): the first few words of the sentence
- output (y): the next word that needs to be predicted
X, y = input_sequences[:,:-1], input_sequences[:,-1]
y = np.array(y)
Step 7: One-hot encoding
We now convert the output (y) into one-hot vectors. So what does that do?
It represents each word as a vector of 0s and 1s:
- 1: the position corresponding to the correct word
- 0: all other positions
y = np.eye(total_words)[y]
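The np.eye trick just selects, for each label, the matching row of an identity matrix. A tiny illustration with a made-up vocabulary of four words:
# Tiny illustration with a made-up vocabulary of 4 words:
# label 2 becomes the vector [0, 0, 1, 0]
print(np.eye(4)[2])        # [0. 0. 1. 0.]
print(np.eye(4)[[1, 3]])   # also works for an array of labels, one row per label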
Step 8: GRU model architecture
We define a GRU model with an embedding layer that converts words into dense vectors, a GRU layer that processes the sequential data, and a final dense layer with a softmax activation that predicts the next word.
model = Sequential()
model.add(Embedding(total_words, 100, input_length=max_sequence_len-1))
model.add(GRU(150))
model.add(Dense(total_words, activation='softmax'))
#Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Step 9: Model Training
The model is trained on the input data to learn the patterns of word sequences in the text. The categorical cross-entropy loss measures how far each prediction is from the true next word, and the Adam optimizer speeds up convergence.
model.fit(X, y, epochs = 100, verbose = 1)
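Optionally, and not part of the original snippet, you can assign the return value of fit to a variable: Keras returns a History object whose history dictionary records the loss and accuracy for every epoch, which is handy for checking convergence.
# Optional variant: keep the History object that fit() returns
history = model.fit(X, y, epochs=100, verbose=1)
print(history.history['loss'][-1])       # final training loss
print(history.history['accuracy'][-1])   # final training accuracy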
Step 10: Text Generation
After the model is trained, we can give it any word or phrase to start with, and it'll come up with the next word based on what it learned. So, if we start with something like "we were", it might continue with "all going to Heaven..." and keep building from there.
def generate_text(seed_text, next_words):
    # Repeatedly predict the next word and append it to the seed text
    for _ in range(next_words):
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_sequence_len-1, padding='pre')
        predicted = np.argmax(model.predict(token_list), axis=-1)
        # Look up the word that corresponds to the predicted index
        output_word = ""
        for word, index in tokenizer.word_index.items():
            if index == predicted:
                output_word = word
                break
        seed_text += " " + output_word
    return seed_text
#Generate a sentence
print(generate_text("it was the best", 20))
Output
it was the best of times it was the worst of times it was the age of wisdom it was the age of foolishness
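You can call the same function with other seed phrases from the passage. Note that with this setup any word the tokenizer has never seen is simply dropped from the seed, since the Tokenizer was fitted only on the training text.
# Try a different seed phrase from the training passage
print(generate_text("we were all", 15))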
The Future of Text Generation
1. Better Language Models:
In the future, language models like GPT will get even better. They will create text that sounds even more like it was written by a human.
These models will understand context and emotions better, making their writing more natural.
2. Combining Text with Images and Audio:
Text generation won't just be about writing. Models will learn to work with text, pictures, and sound at the same time.
For example, a model could create a description for a picture or write a story based on a video.
3. Personalized Text:
Text generation will get more personalized, meaning the AI will create text that fits a personβs preferences and style.
This will be useful for things like customized product descriptions or personalized chatbots.
4. Text for All Languages:
Future models will focus on generating text in different languages, including low-resource languages that don't have a lot of data.
This will help expand AIβs use globally, making text generation possible for more people.
5. Specialized Text Writing:
Text generation will become more focused on specific areas, like legal writing, medical reports, or technical documentation.
This will help create more accurate and professional text for different industries.
Conclusion
Text generation is super helpful in natural language processing (NLP) because it lets us create text automatically. GRUs, or Gated Recurrent Units, are great for this: they're good at remembering important information, which helps them produce sentences that actually make sense. GRUs are also simpler and faster than comparable models, making them a good choice for generating text. As the technology improves, GRUs and other models will become even more useful for tasks like generating content and making communication easier.