
Automating Words: How GRUs Power the Future of Text Generation

Author(s): Tejashree_Ganesan

Originally published on Towards AI.


Isn’t it incredible how far language technology has come? Natural Language Processing, or NLP, used to be about just getting computers to follow basic commands. Now, though, we’re seeing computers actually starting to understand language and even respond in ways that feel surprisingly human. Think about the difference this makes! Instead of just getting literal responses, we’re moving toward a world where machines genuinely “get” what we’re saying. And it’s not just about understanding us, it’s about generating responses too and making our interactions feel smoother and more natural. It’s wild to think about where this could take us next!

Among these breakthroughs, one exciting area is text generation. Imagine a machine writing anything from emails to stories or generating entire conversations. This is what we’ll be exploring here. To achieve such fluent and coherent text generation, researchers use various models, each with unique strengths. One powerful tool for this purpose is the Gated Recurrent Unit (GRU) network. GRUs have gained popularity because they balance two key aspects: they capture long-term dependencies in text (helping the machine remember relevant information across sentences) and they do so efficiently, keeping things fast and manageable.

So, let’s dive in and see how GRUs work their magic to make machines a little more “human” in the way they generate language.

Image from Pexels

What is Text Generation?

With growing applications across different industries, text generation has become a crucial area of research in natural language processing. Many organizations need to create large volumes of content such as personalized product descriptions, customer support responses, or social media posts. Producing this content manually is time-consuming and labor-intensive, and automating text generation is a practical way to address that challenge.

Before getting into how text generation works, let’s start with a brief overview.

Text generation is a branch of natural language processing (NLP) focused on automatically creating coherent, contextually relevant text. The process uses algorithms and models to produce written content from a given input, for instance a prompt, a set of keywords, or a specific context. The generated text can vary in length and complexity, depending on the requirements of the task and the capabilities of the underlying model.

Understanding Gated Recurrent Unit

So, have you heard of GRUs? They’re called Gated Recurrent Units, and they’re a type of recurrent neural network introduced in 2014. Think of GRUs as a lighter, simpler alternative to a model called LSTM (Long Short-Term Memory). Both LSTMs and GRUs help computers “remember” important information when working with sequences, like predicting words in a sentence. But GRUs keep things quick and efficient by cutting out some of the extra steps that LSTMs have. So, if you want something that’s powerful but won’t slow things down, GRUs are a great pick!

GRUs can retain long-term dependencies, which makes them well suited to processing sequential data like time series, text, and speech.

A Gated Recurrent Unit (GRU) has two special “gates” that help control the flow of information as it processes data. These gates are called the update gate and the reset gate.

Memory unit of GRU (created by the author)

The update gate decides how much of the old information (from previous steps) should be kept and how much new information should be added. The reset gate helps the GRU forget irrelevant information that is no longer needed.

Breakdown of how these gates work:

The GRU takes two things as input: the current data (input) and a set of information from the previous step (hidden state).

Gate Calculations:

For each gate (update and reset), the GRU combines the current input and the previous hidden state using the weights assigned to that gate: it multiplies each number by its corresponding weight and adds the results together (a weighted sum).

Activation Function:

After this weighted sum, the GRU applies a sigmoid activation function, which squashes each number into a value between 0 and 1. That value is how each gate decides how much information to keep or forget.
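
To make the gate mechanics concrete, here is a minimal NumPy sketch of a single GRU step. This is my own illustration, not the article’s Keras code: the weight matrices are random placeholders, biases are omitted for brevity, and papers and libraries differ slightly on which term the update gate scales.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    # Update gate: values near 1 keep more of the previous state
    z = sigmoid(Wz @ x_t + Uz @ h_prev)
    # Reset gate: values near 0 drop more of the previous state
    r = sigmoid(Wr @ x_t + Ur @ h_prev)
    # Candidate state, built from the input and the reset-scaled previous state
    h_tilde = np.tanh(Wh @ x_t + Uh @ (r * h_prev))
    # New hidden state: a blend of the old state and the candidate,
    # controlled by the update gate (one common convention)
    return z * h_prev + (1 - z) * h_tilde

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3                       # toy sizes for illustration
Wz, Wr, Wh = (rng.normal(size=(hidden_dim, input_dim)) for _ in range(3))
Uz, Ur, Uh = (rng.normal(size=(hidden_dim, hidden_dim)) for _ in range(3))

h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):        # a sequence of 5 random "word vectors"
    h = gru_step(x_t, h, Wz, Uz, Wr, Ur, Wh, Uh)
print(h)                                           # final hidden state summarizing the sequence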

Example of GRU

Let’s understand GRU with an example. Suppose we want to teach a GRU model to write short sentences based on some input words. Imagine we are training the model to generate sentences about the weather.

1. Training the Model:

At first, we provide the GRU model with lots of examples of sentences about the weather, like

It is sunny today. The weather is rainy. It might be cloudy tomorrow.

The model learns the patterns from these sentences. It starts to understand which words commonly follow each other.

2. Generating Text:

Once the GRU model is trained, we can give it a starting word, and it will generate the rest of the sentence. For example:

If we give the model the word “It,” the model predicts the next word could be “is.” Now, the input becomes “It is,” and the model predicts the next word is “sunny.” The model continues to generate more words, like “today.”

So the output for the input “It” might be:

“It is sunny today”

3. What’s Happening Inside:

The GRU model works by remembering important information (like common word pairs) and forgetting the less important information as it moves through the sentence. This helps it create sentences that make sense.
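
Conceptually, this word-by-word generation is just a loop; a tiny sketch is shown below. Here, predict_next_word is a hypothetical stand-in for any trained next-word model; the actual Keras version appears in Step 10 of the walkthrough that follows.

def generate(seed_text, num_words, predict_next_word):
    # Greedy generation: repeatedly append the single most likely next word.
    text = seed_text
    for _ in range(num_words):
        text += " " + predict_next_word(text)
    return text

# e.g. generate("It", 3, predict_next_word) could build up "It is sunny today"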

How GRU works for text generation: a practical example

Step 1: Import the necessary libraries

  • NumPy: We use NumPy to convert our sequences of tokenized words into arrays so the model can process them.
  • TensorFlow and Keras: We use TensorFlow’s Keras API to define, compile, and train the GRU model. The layers used from Keras include GRU for the GRU architecture, Dense for the fully connected output layer, and Embedding for representing words as dense vectors.
  • Tokenizer and Text Preprocessing: The Tokenizer converts the training text into sequences of numbers, and pad_sequences makes sure all sequences have the same length before they are fed into the model.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense, Embedding
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

Step 2: Create the Input data

Now we create the input text. This longer passage is used as the training data for the GRU model.

data = [
    "it was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness,",
    "it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness,",
    "it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us,",
    "we were all going direct to Heaven, we were all going direct the other way – in short, the period was so far like the present period,",
    "that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only."
]

Step 3: Tokenization

We turn the sentences into numbers using a tokenizer. It’s like giving each word its own ID so the computer can understand the text better. This way, instead of words, we’re working with numbers that the model can actually use.

tokenizer = Tokenizer()
tokenizer.fit_on_texts(data)
total_words = len(tokenizer.word_index)+1
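
As an optional sanity check (not part of the original walkthrough), you can inspect the vocabulary the tokenizer has learned; the exact IDs depend on word frequencies in the data.

print(total_words)                             # vocabulary size plus 1 (index 0 is reserved for padding)
print(list(tokenizer.word_index.items())[:5])  # most frequent words get the smallest IDs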

Step 4: Creating Sequences

Sequences of words are created where each sequence contains one more word than the previous. This teaches the model how one word follows another in the passage.

input_sequences = []
for sentence in data:
    token_list = tokenizer.texts_to_sequences([sentence])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i+1]
        input_sequences.append(n_gram_sequence)
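
To see what the loop produces, note that every prefix of a sentence becomes its own training sequence. The token IDs in the comment below are made up purely for illustration.

# Illustration: if a sentence tokenizes to [12, 7, 3, 9], the loop generates
# [12, 7], [12, 7, 3], and [12, 7, 3, 9].
print(input_sequences[:3])   # inspect the first few generated sequences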

Step 5: Padding Sequences

Since sentences can be all sorts of lengths, we add padding to make sure they’re all the same size. This way, the GRU model can process them smoothly without any issues.

max_sequence_len = max([len(x) for x in input_sequences])
input_sequences = np.array(pad_sequences(input_sequences, maxlen = max_sequence_len, padding = 'pre'))
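
Another quick check, if you want to see the effect of padding (the exact numbers depend on the tokenized data):

print(max_sequence_len)          # length of the longest sequence
print(input_sequences.shape)     # (number of sequences, max_sequence_len)
print(input_sequences[0])        # shorter sequences are left-padded with zeros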

Step 6: Splitting into Input and Output

We split the sequences into input (X) and output (y).

  • input (X): the first words of each sequence
  • output (y): the next word to be predicted
X, y = input_sequences[:,:-1], input_sequences[:,-1]
y = np.array(y)

Step 7: One-hot encoding

We now convert the output (y) into one-hot vectors, so each word is represented as a vector of 0s and 1s:

  • 1: the position corresponding to the correct word
  • 0: all other positions
y = np.eye(total_words)[y]
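
An equivalent way to do this is with Keras’ built-in helper, which produces the same one-hot encoding as the np.eye line above:

from tensorflow.keras.utils import to_categorical
y = to_categorical(input_sequences[:, -1], num_classes=total_words)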

Step 8: GRU model architecture

We define a GRU model with an embedding layer that converts words into dense vectors, a GRU layer that processes the sequential data, and a final dense layer with a softmax activation that predicts the next word.

model = Sequential()
model.add(Embedding(total_words, 100, input_length=max_sequence_len-1))
model.add(GRU(150))
model.add(Dense(total_words, activation='softmax'))
#Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
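
Before training, it can be helpful to print a summary to confirm the layer output shapes and parameter counts (these depend on total_words and max_sequence_len):

model.summary()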

Step 9: Model Training

The model is trained on the input data to learn the patterns of word sequences in the text. It uses the categorical cross-entropy loss function and the Adam optimizer for faster convergence.

model.fit(X, y, epochs = 100, verbose = 1)

Step 10: Text Generation

After the model is trained, we can give it any word or phrase to start with, and it’ll come up with the next word based on what it learned. So, if we start with something like “we were”, it might continue with “all going to Heaven…” and keep building from there.

def generate_text(seed_text, next_words):
    for _ in range(next_words):
        # Tokenize and pad the current text the same way the training data was prepared
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_sequence_len-1, padding='pre')
        # Pick the word with the highest predicted probability
        predicted = np.argmax(model.predict(token_list), axis=-1)
        output_word = ""
        for word, index in tokenizer.word_index.items():
            if index == predicted:
                output_word = word
                break
        seed_text += " " + output_word
    return seed_text

# Generate a sentence
print(generate_text("it was the best", 20))

Output

it was the best of times it was the worst of times it was the age of wisdom it was the age of foolishness

The Future of Text Generation

1. Better Language Models:

In the future, language models like GPT will get even better. They will create text that sounds even more like it was written by a human.

These models will understand context and emotions better, making their writing more natural.

2. Combining Text with Images and Audio:

Text generation won’t just be about writing. Models will learn to work with text, pictures, and sound at the same time.

For example, a model could create a description for a picture or write a story based on a video.

3. Personalized Text:

Text generation will get more personalized, meaning the AI will create text that fits a person’s preferences and style.

This will be useful for things like customized product descriptions or personalized chatbots.

4. Text for All Languages:

Future models will focus on generating text in different languages, including low-resource languages that don’t have a lot of data.

This will help expand AI’s use globally, making text generation possible for more people.

5. Specialized Text Writing:

Text generation will become more focused on specific areas, like legal writing, medical reports, or technical documentation.

This will help create more accurate and professional text for different industries.

Conclusion

Text generation is super helpful in natural language processing (NLP) because it lets us create text automatically. GRUs, or Gated Recurrent Units, are great for this: they’re good at remembering important information, which helps them come up with sentences that actually make sense. GRUs are also simpler and faster than heavier models like LSTMs, making them a good choice for generating text. As technology improves, GRUs and other models will become even more helpful for tasks like generating content and making communication easier.


Published via Towards AI
