
Classifying the Unstructured: A Guide to Text Classification with Representation and Generative Models

Last Updated on January 15, 2025 by Editorial Team

Author(s): Shivam Dattatray Shinde

Originally published on Towards AI.

This article delves into the various methodologies for performing text classification with transformer-based models, explaining their principles and applications. We’ll explore both representation-focused and generative approaches, leveraging the flexibility and power of transformer architectures to tackle unstructured text data.

Photo by Natalia Y. on Unsplash

Agenda

  1. What are representation language models?
  2. What are generative language models?
  3. Text Classification Methods
  4. Text classification using representation language models
  5. Text classification using generative language models

What are Representation Language Models?

The original transformer architecture was designed as an encoder-decoder model primarily for machine translation tasks. However, it was not well-suited for other tasks like text classification.

To address this limitation, a new architecture called Bidirectional Encoder Representations from Transformers (BERT) was introduced. BERT focuses on text representation and is derived from the encoder component of the original transformer. Unlike the original transformer, BERT does not include a decoder.

BERT is specifically designed to create contextualized embeddings, which outperform traditional embeddings generated by models like Word2Vec. Contextualized embeddings take into account the context in which words appear, resulting in more meaningful and versatile representations of text.
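As a quick illustration (not part of the original article), the sketch below uses the Hugging Face transformers library to show that BERT’s embeddings are contextual: the same word receives different vectors in different sentences, something a static Word2Vec-style vector cannot do. The model checkpoint and example sentences are illustrative choices.

```python
# A minimal sketch (illustrative, not from the article): the same word "bank"
# gets noticeably different BERT embeddings depending on its sentence context.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed_word(sentence, word):
    """Return the contextual hidden state of `word` within `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]
    word_id = tokenizer.convert_tokens_to_ids(word)
    position = (inputs.input_ids[0] == word_id).nonzero()[0].item()
    return hidden[position]

v1 = embed_word("She sat by the river bank.", "bank")
v2 = embed_word("He deposited cash at the bank.", "bank")
print(torch.cosine_similarity(v1, v2, dim=0))  # noticeably below 1.0
```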

How is BERT Trained?

BERT uses a masked language modeling technique during training. This involves masking certain words in a sentence and training the model to predict the masked words based on the surrounding context.

For example, consider the input:
“The lake is ____.”
The model is trained to predict words such as “beautiful,” “serene,” or “cool” based on the context provided by the rest of the sentence.
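Here is a minimal sketch of this masked-word prediction using the transformers fill-mask pipeline with a pretrained BERT checkpoint; the model name is an illustrative choice, not one specified in the article.

```python
# A minimal sketch of masked language modeling with a pretrained BERT model.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT's mask token is [MASK]; the model predicts plausible fillers from context.
predictions = fill_mask("The lake is [MASK].")
for p in predictions:
    print(f"{p['token_str']:>12}  score={p['score']:.3f}")
```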

What are Generative Language Models?

Decoder-only architectures, much like the encoder-only BERT architecture, are highly effective for specific applications. One of the most notable examples of a decoder-only architecture is the Generative Pre-trained Transformer (GPT).

Generative language models operate by taking text as input and predicting the next word in the sequence. While their primary training objective is to predict the next word, this functionality alone is not particularly useful in isolation. However, these models become significantly more powerful when adapted for tasks such as serving as a chatbot.

Here’s how a chatbot built on a generative language model functions:
When a user provides input text, the generative language model predicts the next word in the sequence. This predicted word is appended to the user’s original input, forming a new, extended text sequence. The model then uses this updated sequence to predict the next word. This process repeats iteratively, generating responses word by word.
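The sketch below illustrates this word-by-word (strictly, token-by-token) loop with GPT-2 and greedy decoding. It is a simplified illustration under stated assumptions; real chatbots add sampling strategies, instruction tuning, and chat formatting on top.

```python
# A minimal sketch of autoregressive generation: predict the next token,
# append it to the input, and repeat. Greedy decoding is used for simplicity.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("The lake is", return_tensors="pt").input_ids

for _ in range(10):  # append ten more tokens, one at a time
    with torch.no_grad():
        logits = model(input_ids).logits        # scores over the whole vocabulary
    next_id = logits[0, -1].argmax()            # greedily pick the most likely next token
    input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=-1)

print(tokenizer.decode(input_ids[0]))
```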

Text Classification Methods

Text classification using representation language models

Using Task-Specific Models
A task-specific model is a representation model, such as BERT, that is trained (fine-tuned) directly for a specific task, such as text classification.
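For instance, here is a minimal sketch using a BERT-style checkpoint that has already been fine-tuned for sentiment classification; the specific checkpoint is an illustrative assumption, not one prescribed by the article.

```python
# A minimal sketch of classification with a task-specific (fine-tuned) model.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("Best movie ever!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```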

Using Embedding Models

Using a Classification Model
This approach involves converting input text tokens into contextual embeddings using representation models like BERT. These embeddings are then fed into a classification model.

Source: Hands-On Large Language Models By Jay Alammar, Maarten Grootendorst

This process has two parts: the BERT model generates the embeddings, and a separate classification model is trained on top of them. Only the classification model is trainable; BERT itself remains frozen during training.
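A minimal sketch of this setup, here using a sentence-transformers embedding model in place of raw BERT and a logistic-regression classifier on top; the checkpoint and the tiny training set are illustrative assumptions.

```python
# A minimal sketch: a frozen embedding model produces contextual embeddings,
# and only a lightweight classifier on top of them is trained.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

embedder = SentenceTransformer("all-MiniLM-L6-v2")   # stays frozen

train_texts = ["Best movie ever!", "Terrible plot and worse acting."]
train_labels = [1, 0]                                # 1 = positive, 0 = negative

# Step 1: generate embeddings (no gradient updates to the embedding model).
X_train = embedder.encode(train_texts)

# Step 2: train only the classification model on those embeddings.
clf = LogisticRegression().fit(X_train, train_labels)

print(clf.predict(embedder.encode(["What a wonderful film"])))
```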

Using Cosine Similarity

Source: Hands-On Large Language Models By Jay Alammar, Maarten Grootendorst

This method entails generating embeddings for both the input text to be classified and the classification labels. Next, the cosine similarity between the input text embedding and each label embedding is calculated. The input text is then assigned to the label with the highest similarity score.

Source: Hands-On Large Language Models By Jay Alammar, Maarten Grootendorst
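A minimal sketch of this label-embedding approach, with an illustrative embedding model and illustrative label phrasings:

```python
# A minimal sketch: embed the input text and each label description,
# then assign the label whose embedding is most similar to the text's.
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

embedder = SentenceTransformer("all-MiniLM-L6-v2")

labels = ["a positive movie review", "a negative movie review"]
text = "Best movie ever!"

label_embeddings = embedder.encode(labels)
text_embedding = embedder.encode([text])

# Cosine similarity between the input text and each label; highest score wins.
scores = cosine_similarity(text_embedding, label_embeddings)[0]
print(labels[scores.argmax()])
```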

Text classification using generative language models

Text classification using generative language models differs significantly from classification with representation language models. Generative models are sequence-to-sequence models: they produce output in the form of text or sentences rather than directly assigning labels.

For example:
If the input text is “Best movie ever!”, a generative language model might predict “The sentiment of the movie is positive.” However, unlike representation models, generative models don’t automatically provide labels without explicit instructions.

If you simply input “Best movie ever!” into a generative model, it won’t inherently understand what to do. To classify the sentiment of the input, you need to provide a clear instruction, such as “Classify the input movie sentiment as Positive or Negative.”

Moreover, the model’s classification accuracy heavily depends on the clarity of the instruction. Ambiguous or unclear instructions can lead to incorrect or irrelevant outputs.

Explore how varying prompts lead to different classification outputs from the generative language model in the diagram below.

Source: Hands-On Large Language Models By Jay Alammar, Maarten Grootendorst
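As a runnable sketch of this prompt-based classification, the example below uses a small instruction-tuned model (google/flan-t5-small) as a stand-in for a larger generative LLM; it is an encoder-decoder model chosen only because it is small enough to run locally, and the checkpoint and prompt wording are illustrative assumptions.

```python
# A minimal sketch of classification via an explicit instruction in the prompt.
from transformers import pipeline

generator = pipeline("text2text-generation", model="google/flan-t5-small")

prompt = (
    "Classify the input movie sentiment as Positive or Negative.\n"
    "Input: Best movie ever!\n"
    "Sentiment:"
)
result = generator(prompt, max_new_tokens=5)
print(result[0]["generated_text"])   # expected to be something like "Positive"
```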

Outro

Thank you so much for reading. If you liked this article, don’t forget to press that clap icon. Follow me on Medium and LinkedIn for more such articles.

Are you struggling to choose what to read next? Don’t worry, I have got you covered.

From Words to Vectors: Exploring Text Embeddings

This article will guide you through the various techniques for transforming text into formats that machines can…

pub.towardsai.net

and more …

Beyond Labels: The Magic of Autoencoders in Unsupervised Learning

In a world where labeled data is often scarce, autoencoders provide a powerful solution for extracting insights from…

pub.towardsai.net

Have a great day!

References

Hands-On Large Language Models by Jay Alammar and Maarten Grootendorst, O'Reilly Media (learning.oreilly.com)


Published via Towards AI
