Pre-train, Prompt, and Predict — Part2

Last Updated on July 17, 2023 by Editorial Team

Author(s): Harshit Sharma

Originally published on Towards AI.

A 2-step approach to prompting

(This is Part 2 of a multi-part series describing the prompting paradigm in NLP. The content is inspired by this paper, a survey explaining prompting methods in NLP.)

(Source: Image from Paper) Prompting paradigms

In Part 1, we went over the 4 Paradigms of NLP, namely:

  • Fully Supervised (non-neural network)
  • Fully Supervised (neural network)
  • Pre-train and Fine-Tune
  • Pre-train, Prompt, and Predict

We are interested in the 4th paradigm (Pre-train, Prompt, and Predict), which took the entire NLP landscape by storm. If you haven't already, I suggest going through Part 1 to appreciate the beauty of this paradigm.

In this article, we will be going through the basics of prompting — what a prompt is, how we get desired results through prompting, and its mathematical essence.

Let's get started ✌

Let's first understand the difference between Supervised NLP and Prompting:

Supervised NLP: We take an input x and predict a target y. Learning the model's parameters takes labeled pairs (x, y).

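Prompting: This one is different. Here, we don’t need labeled pairs of x and y. Instead, prompting steers the behavior of a pre-trained language model to get the desired output. It's a 2-step process: first, a template turns the input x into a prompt x’; then the prompt's answer slot is filled with candidate answers, and the model tells us which one is most probable.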

Quick points before we move on:

[X] is the input slot, where the input x is placed
[Z] is the answer slot, which is filled in later with an answer

Each template contains both slots.

But I still don’t get it... what is a PROMPT here?

The prompt is x’, the text obtained by applying the template function to x.
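As a minimal sketch of this step, the template function can be a plain string substitution. The function name `apply_template` is hypothetical, and the template text is only an illustrative sentiment template in the style of the survey paper, not something prescribed by this article.

```python
def apply_template(x: str, template: str = "[X] Overall, it was a [Z] movie.") -> str:
    """Build the prompt x' by placing x in the input slot [X].

    The answer slot [Z] is deliberately left unfilled at this stage.
    The default template is an illustrative assumption.
    """
    return template.replace("[X]", x)


prompt = apply_template("I love this movie.")
print(prompt)  # I love this movie. Overall, it was a [Z] movie.
```

Note that the prompt still contains [Z]; nothing has been predicted yet.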

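Moreover, depending on where [Z] lies in x’, there are two types of prompts: cloze prompts, where [Z] sits in the middle of the text, and prefix prompts, where the entire text comes before [Z].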
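In the survey's terminology, a prompt with text after [Z] is a cloze prompt, and a prompt where [Z] comes last is a prefix prompt. The check below is a rough sketch of that distinction; the helper name `prompt_type` and the example templates are assumptions made for illustration.

```python
def prompt_type(template: str) -> str:
    """Return 'prefix' if nothing follows the answer slot [Z], else 'cloze'."""
    _, slot, after = template.partition("[Z]")
    if not slot:
        raise ValueError("template has no [Z] answer slot")
    # Only whitespace after [Z] means the whole text precedes the answer.
    return "prefix" if after.strip() == "" else "cloze"


print(prompt_type("[X] Overall, it was a [Z] movie."))  # cloze
print(prompt_type("[X] TL;DR: [Z]"))                    # prefix
```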

As we will see soon, [Z] is what we use to get our answer, which in this case is the sentiment.

It's time for the magic. Let's use [Z] to elicit the desired response from the pre-trained Large Language Model (LLM), by replacing the answer slot with possible answer candidates:

What we just did was leverage the knowledge our pre-trained LLM already had. We filled the prompt's answer slot with each possible candidate answer z and asked the LLM for the probability of seeing each filled prompt.

In generative tasks, the candidates for z can span the entire vocabulary of English (or any other language of interest), whereas in classification tasks they are limited to a handful of labels such as excellent, good, horrible, bad, etc.
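The fill-and-score idea can be sketched as follows. Everything named here is hypothetical: `fill`, `search_answer`, and especially `toy_score`, which is a hand-written stand-in for a pre-trained LLM and not a real model. In practice the score would be the probability an actual language model assigns to the filled prompt.

```python
from typing import Callable, Iterable


def fill(prompt: str, z: str) -> str:
    """Fill the answer slot [Z] of a prompt with candidate answer z."""
    return prompt.replace("[Z]", z)


def search_answer(prompt: str, candidates: Iterable[str],
                  score: Callable[[str], float]) -> str:
    """Return the candidate whose filled prompt gets the highest score."""
    return max(candidates, key=lambda z: score(fill(prompt, z)))


# Stand-in scorer (an assumption, NOT a language model): it scores a text
# higher when its sentiment words agree with each other.
POSITIVE = {"love", "great", "excellent", "good"}
NEGATIVE = {"hate", "boring", "horrible", "bad"}


def toy_score(text: str) -> float:
    words = [w.strip(".,!").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return float(abs(pos - neg))


prompt = "I love this movie. Overall, it was a [Z] movie."
print(search_answer(prompt, ["excellent", "good", "horrible", "bad"], toy_score))
```

With this toy scorer, a positive review ends up with a positive label and a negative review with a negative one. Swapping `toy_score` for a real model's probability leaves `search_answer` unchanged.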

Mathematically, we did the following:
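One way to write this step down is to borrow the notation of the survey paper. These symbols are assumptions here, since the article itself does not define them: Z is the set of candidate answers, f_fill(x', z) is the prompt x' with its answer slot filled by z, and θ stands for the parameters of the pre-trained LLM.

```latex
\hat{z} = \operatorname*{search}_{z \in \mathcal{Z}} \; P\bigl(f_{\text{fill}}(x', z);\, \theta\bigr)
```

For classification, the search can simply be an argmax over the handful of labels.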

We took the example of classification and saw how the appropriate label could be derived from the LLM. Below are many other example tasks that can be performed using this paradigm:

(Image from Paper) Examples of input, template, and answer for different tasks

That's all for now! In the next part, we will dive into the design considerations of prompting, including prompt and answer engineering.

Follow me and Subscribe so that you don’t miss out on the Prompting series and the upcoming articles on ML/NLP.

Follow Intuitive Shorts, to read quick and intuitive summaries of ML/NLP/DS concepts.


Published via Towards AI
