From Text to Beyond Words

Last Updated on July 25, 2023 by Editorial Team

Author(s): Akash Rawat

Originally published on Towards AI.

A brief history of the Large Language Models (LLMs)

Photo by Andy Kelly on Unsplash

It seems probable that once the machine thinking method had started, it would not take long to outstrip our feeble powers… They would be able to converse with each other to sharpen their wits. At some stage therefore, we should have to expect the machines to take control.

— Alan Turing (1912–1954)

Hello readers. Today we live in the age of Large Language Models (LLMs), which power software such as GPT-4 and ChatGPT, alongside related generative AI systems such as DALL·E. These technologies are behind some of the most significant technological breakthroughs of our time, and we may be on the verge of a major societal shift. Soon, possibly within our lifetime, the AI systems that we develop and widely use may become vastly more intelligent than all humans combined. This could be a blessing for mankind at one end, while the other end awaits a curse.

It can be called a blessing because of the countless possibilities, both discovered and yet to be discovered, that hold the potential to empower humanity: liberating us from widespread poverty and suffering, and bringing us closer to that timeless human aspiration, ‘happiness’.

Call it a curse because of the power wielded by a super-intelligent AGI (Artificial General Intelligence), which could intentionally or unintentionally wipe out the entire human civilization. This threat could manifest as Orwellian totalitarianism, as depicted in the novel “1984,” or as the dystopia of Huxley’s novel “Brave New World,” whose vision Neil Postman summarized as: “People will come to love their oppression, to adore the technologies that undo their capacities to think.”

We are currently experiencing a rapid and profound transition from one phase of existence to another, and we are well aware of the fate that befalls species that fail to adapt to a changing world: they face extinction. Hence, it is important for us to study these subjects wholeheartedly. By immersing ourselves in their exploration, we gain the knowledge and insight necessary to navigate the extraordinary path that lies before us. Let us begin our journey of exploration through this article, “From Text to Beyond Words: A Brief History of Large Language Models”.


Imagine having a clever friend who can understand what you’re saying and respond in a way that makes sense. Language models are like those clever friends but in the form of computer programs. They use advanced techniques to learn from a lot of text and become really good at understanding and generating language. They can do things like completing sentences, translating languages, answering questions, and analyzing the sentiment or emotion in the text.

The Origin: Rise of the Large Language Models

Studying early language models was important because they laid the foundation for later advancements. They taught us more about how language works and how computers can learn from it. But they couldn’t fully understand the complexities of human language. They used different approaches to make sense of words and sentences.

One approach was using rules, which were like instructions for how to process language. These rules were created by experts and told the computer how to analyze and generate language. But these rule-based systems struggled with the complexities of human language and often couldn’t understand the full meaning.

Another approach was using statistics, which means looking at patterns in lots of language examples. Computers would learn from these patterns and make guesses about what words should come next. While this approach was better at handling some language complexities, it still had limitations in understanding context and generating meaningful sentences.
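As a rough illustration of the statistical approach, the sketch below counts which word follows which in a tiny made-up corpus and then guesses the most frequent follower. The function names and the corpus are assumptions for illustration only; real statistical language models used longer word histories and smoothing.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            counts[current_word][next_word] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy corpus, invented for this example.
corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]
model = train_bigram_model(corpus)

print(predict_next(model, "sat"))      # on
print(predict_next(model, "the"))      # cat
print(predict_next(model, "unicorn"))  # None
```

Because this model looks only one word back, it has no notion of wider context, which is the limitation described above.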

Later, a more advanced model came along, using new techniques that made it much better at understanding and generating language. This new model could capture the connections between words and understand context much more effectively. It was called the Transformer.

The Transformer: A Breakthrough for Language Models

Photo by Praswin Prakashan on Unsplash

Well, of course, we are not talking about Bumblebee here; we are talking about a deep learning model. In sequence-to-sequence problems like neural machine translation, early proposals used RNNs (Recurrent Neural Networks) in an encoder-decoder architecture. However, these architectures struggled to retain information from the beginning of long sequences as new elements were added. The encoder's hidden state was most strongly influenced by the most recent words in the input sentence. Consequently, if the decoder relied only on the last hidden state, it would lose important information about the initial elements. To address this limitation, the attention mechanism was introduced.
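The fading of early information can be seen in a toy recurrent encoder. This is only a sketch: the single-number hidden state and the weights `w_input` and `w_hidden` are hypothetical values chosen for illustration, not trained parameters or a real translation model.

```python
import math


def rnn_encode(inputs, w_input=0.5, w_hidden=0.8):
    """Run a one-unit RNN over a sequence and return every hidden state.

    The scalar weights are hypothetical, not learned values.
    """
    hidden = 0.0
    states = []
    for x in inputs:
        # Each step needs the previous hidden state, so steps must run in order.
        hidden = math.tanh(w_input * x + w_hidden * hidden)
        states.append(hidden)
    return states


# Only the first element carries a signal; the rest are zeros.
states = rnn_encode([1.0, 0.0, 0.0, 0.0, 0.0])
print([round(h, 3) for h in states])
```

The trace left by the first element shrinks at every step, so a decoder that sees only the last entry of `states` receives a weakened version of it.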

Instead of relying solely on the last state of the encoder, the attention mechanism enables the decoder to access all states of the encoder, capturing information from the entire input sequence. This involves computing a weighted sum of the encoder states, allowing the decoder to assign importance to each element of the input when predicting the next output element. However, this approach still has a limitation: the sequence must be processed one element at a time. Both the encoder and decoder need to wait for the first t-1 steps to complete before processing the t-th step. Consequently, when dealing with large datasets, this approach becomes time-consuming and computationally inefficient.
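A minimal sketch of this weighted sum for a single decoder step follows. The dot-product scoring, the function names, and the small hand-written vectors are assumptions for illustration; the text above does not say how the scores are computed, and real systems learn these vectors during training.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    max_score = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoder step.

    Scoring by dot product is an assumption of this sketch.
    """
    scores = [dot(decoder_state, h) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context vector is the weighted sum of all encoder states.
    context = [
        sum(w * h[i] for w, h in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Hypothetical 2-dimensional hidden states for a three-word input.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]

weights, context = attend(decoder_state, encoder_states)
print([round(w, 3) for w in weights])
print([round(c, 3) for c in context])
```

Encoder states that align with the current decoder state receive larger weights, so the decoder is no longer limited to whatever survived in the final hidden state.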

The Transformer model uses a self-attention mechanism to extract features for each word, determining how important every other word in the sentence is to it. Unlike recurrent units, this feature extraction involves only weighted sums and activations, with no step-by-step dependency between positions, making it highly parallelizable and efficient.
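The idea can be sketched in a few lines, under a deliberate simplification: a real Transformer first maps each word embedding to separate query, key, and value vectors using learned weight matrices, whereas here the embedding itself plays all three roles. The function names and the example vectors are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Scaled dot-product self-attention without learned projections."""
    dim = len(embeddings[0])
    scale = math.sqrt(dim)
    outputs = []
    # No position depends on another position's output, so in practice
    # all positions are computed at once; a plain loop is used here.
    for query in embeddings:
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in embeddings
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * value[i] for w, value in zip(weights, embeddings))
            for i in range(dim)
        ])
    return outputs


# Hypothetical 2-dimensional embeddings for a three-word sentence.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for vector in self_attention(sentence):
    print([round(x, 3) for x in vector])
```

Each output vector is a blend of every word in the sentence, weighted by how similar that word is to the one being processed.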

This use of the attention mechanism was introduced in the paper “Attention Is All You Need” (Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser, and Polosukhin, 2017) [1]. The paper marked a significant breakthrough: attention was the central building block of the model it proposed, known as the Transformer.

The most famous models that have since emerged in NLP consist of dozens of stacked Transformer blocks; GPT-2 is one such variant.

Predecessors to Large Language Models

Here, we’ll explore two influential models, Word2Vec and GloVe, which revolutionized the representation of words in NLP. Additionally, we’ll delve into recurrent neural networks (RNNs) and their ability to process sequential data. Let’s uncover the key aspects of these models and their contributions to the field of language processing.

  • Word2Vec: A popular model introduced in 2013. It represents words as dense vectors in a high-dimensional space, capturing word meanings. By training on large text data, it learns to predict surrounding words given a target word. Word2Vec transformed word representation in natural language processing, enabling a better understanding of word meanings.
  • GloVe: Introduced in 2014, GloVe is another influential model. It represents words as vectors in a continuous space and uses global statistics about word co-occurrence. By considering the context of words, GloVe captures both semantic and syntactic relationships, enhancing language understanding.
  • Recurrent Neural Networks (RNNs): RNNs are neural networks that process sequential data like sentences. They maintain an internal memory to capture previous information. RNNs excel at generating relevant output based on the input sequence but struggle with long-term dependencies and grasping extensive context.

These models demonstrated the importance of learning distributed representations of words, capturing semantic relationships, and modeling sequential data. This laid the foundation for advanced large-scale language models such as GPT-3 and beyond, pushing the boundaries of language processing.
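To make the raw material behind Word2Vec and GloVe more concrete, the sketch below builds the (target, context) pairs that a skip-gram style model is trained to predict, and the co-occurrence counts that a GloVe style model starts from. The function names, the window size, and the example sentence are assumptions for illustration; the actual training objectives and any weighting schemes are omitted.

```python
from collections import Counter


def skipgram_pairs(tokens, window=2):
    """List (target, context) pairs for every word within `window` positions."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


def cooccurrence_counts(tokens, window=2):
    """Count how often each ordered word pair appears within the window."""
    return Counter(skipgram_pairs(tokens, window))


tokens = "the quick brown fox".split()

# Skip-gram view: predict each context word from the target word.
pairs = skipgram_pairs(tokens, window=1)
print(pairs)

# Co-occurrence view: aggregate the same pairs into global counts.
counts = cooccurrence_counts("the cat and the dog and the bird".split(), window=1)
print(counts[("and", "the")])  # 2
```

Both views rest on the same observation: words that appear in similar contexts tend to have related meanings.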

Evolution of large-scale models

Tracing the timeline of advancements in large-scale language models, from GPT-1 to GPT-3 and beyond.

  • GPT-1 (Generative Pre-Trained Transformer 1): In 2018, OpenAI introduced GPT-1, a pioneering language model based on transformers, with 117 million parameters. It was pre-trained on BooksCorpus, a large collection of unpublished books, and showed impressive language skills across various tasks after fine-tuning.
  • GPT-2 (Generative Pre-Trained Transformer 2): Released in 2019, GPT-2 elevated large-scale language models to new levels. With a larger dataset than GPT-1 and 1.5 billion parameters, it showcased exceptional text-generation abilities. Although the full model was initially withheld due to concerns about misuse, OpenAI later made it accessible to the public.
  • GPT-3 (Generative Pre-Trained Transformer 3): Unveiled in 2020, GPT-3 represented a groundbreaking advance in large-scale language modeling. With 175 billion parameters, it was one of the largest models ever created at the time of its release. GPT-3 demonstrated extraordinary language generation skills and delivered exceptional performance across diverse tasks, from answering questions to code generation and lifelike conversations. ChatGPT, released in late 2022, was fine-tuned from a model in the GPT-3.5 series, a successor to GPT-3, and is designed for interactive conversations and dialogue systems.

GPT-4 followed in March 2023. OpenAI has not disclosed its parameter count, and Sam Altman, the CEO of OpenAI, has publicly dismissed the widely circulated rumor that it would have around 100 trillion parameters. Even so, it marks another leap in the capabilities of large language models.

Will Jobs be Affected?

Well, we do not doubt that this massive leap in the field of Artificial Intelligence is going to create new jobs. But does that also mean that some of the jobs that we see around the world today may not exist tomorrow?

Let's see how Sam Altman answered a similar question in one of his interviews.

“A big category that can be massively impacted, I guess I would say, customer service category that I could see there are just way fewer jobs relatively soon. I am not certain about that, but I could believe it. I want to be clear; I think these systems will make a lot of jobs just go away. Every technological revolution does. They will enhance many jobs and make them much better and much more fun and much higher paid, and they will create new jobs that are difficult for us to imagine even if we start to see the first glimpses of them.

“I think, we, as a society, are confused about whether we want to work more or work less. And certainly, about whether most people like their jobs and get value out of their jobs or not. Some people do. I love my job; I suspect you do too. That’s a real privilege, not everybody gets to say that. If we can move more of the world to better jobs and work to something that can be a broader concept, not something that you have to do to be able to eat but something you do as a creative expression and a way to find fulfillment and happiness and whatever else. Even if those jobs look extremely different from the jobs of today, I think that’s great.”

— Sam Altman, CEO of OpenAI

So, this pretty much sums up my article. Apologies if it was too long; I hope you liked it. We talked about the rise of LLMs and witnessed their journey “From Text to Beyond Words.” One thing that we know for sure is that these models will continue to improve at an ever faster rate. But that does not guarantee that they will not affect our lives negatively. There will be new jobs, but some present jobs will disappear too; we will be able to do 10x more, but then there will be 10x more to do. There is only one truth: the world will not be the same again.

Published via Towards AI
