Pre-train, Prompt, and Predict: Part 2
Last Updated on July 17, 2023 by Editorial Team
Author(s): Harshit Sharma
Originally published on Towards AI.
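A 2-step approach to prompting
(This is Part 2 of a multi-part series describing the prompting paradigm in NLP. The content is inspired by the survey paper "Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing" by Liu et al., which explains the prompting methods in NLP.)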
In Part 1, we went over the four paradigms of NLP, namely:
- Fully Supervised (non-neural network)
- Fully Supervised (neural network)
- Pre-train and Fine-Tune
- Pre-train, Prompt, and Predict
We are interested in the 4th paradigm (Pre-train, Prompt, and Predict), which took the entire NLP landscape by storm. If you haven't already, I suggest going through Part 1 to appreciate the beauty of this paradigm.
In this article, we will go through the basics of prompting: what a prompt is, how we get the desired results through prompting, and its mathematical essence.
Let's get started ✌
Let's first understand the difference between Supervised NLP and Prompting:
Supervised NLP: We take an input x and predict a target y. The model's parameters are learned from a dataset of labeled (x, y) pairs.
Prompting: This guy is different. Here, we don't need any pairs of x and y. Rather, it steers a pre-trained language model toward the desired output by reformulating the input. It's a 2-step process: first, a template function turns the input x into a prompt x'; then, the language model fills in the answer slot of that prompt.
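For contrast, the supervised setup is commonly written as maximum-likelihood training of a model P(y | x; θ) on a labeled dataset D of (x, y) pairs. This is the standard textbook formulation rather than something specific to this series:

$$
\hat{\theta} = \operatorname*{arg\,max}_{\theta} \sum_{(x,\,y) \in \mathcal{D}} \log P(y \mid x; \theta)
$$

In the basic prompting setup described next, no such labeled dataset is needed.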
Quick points before we move on. Each template consists of two slots:
- [X]: the input slot
- [Z]: the answer slot
But I still don't get it... what exactly is a PROMPT here?
The x' obtained by applying the template function to x is known as the prompt.
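Here is a minimal Python sketch of that first step. The function name f_prompt follows the survey's notation, but the sentiment template "[X] The movie was [Z]." and the input sentence are hypothetical examples chosen for illustration:

```python
INPUT_SLOT = "[X]"
ANSWER_SLOT = "[Z]"


def f_prompt(x: str, template: str = "[X] The movie was [Z].") -> str:
    """Apply a template to the input x, producing the prompt x'.

    The input slot [X] is filled with x; the answer slot [Z] is left
    empty so that the language model can fill it later.
    """
    if INPUT_SLOT not in template or ANSWER_SLOT not in template:
        raise ValueError("template must contain both [X] and [Z] slots")
    return template.replace(INPUT_SLOT, x)


# Hypothetical input sentence for a sentiment task.
prompt = f_prompt("I love this movie.")
print(prompt)
```

Note that [Z] is deliberately left unfilled: filling it is the language model's job.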
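Moreover, depending on where [Z] lies in x', there are two types of prompts:
- Cloze prompts: [Z] is a blank inside the text, e.g., "I love this movie. Overall, it was a [Z] movie."
- Prefix prompts: the input comes entirely before [Z], so the model simply continues the text, e.g., "English: I love this movie. French: [Z]"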
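The distinction can be approximated in a few lines of Python. This is only a heuristic sketch (an assumption for illustration, not a rule from the survey): a template counts as a prefix prompt when nothing but whitespace or punctuation follows [Z], and as a cloze prompt otherwise. The helper name prompt_type is hypothetical.

```python
import string

ANSWER_SLOT = "[Z]"


def prompt_type(template: str) -> str:
    """Classify a template as a 'cloze' or 'prefix' prompt (heuristic)."""
    if template.count(ANSWER_SLOT) != 1:
        raise ValueError("expected exactly one [Z] answer slot")
    # Everything that comes after the answer slot.
    tail = template.split(ANSWER_SLOT, 1)[1]
    # Only whitespace/punctuation after [Z]: the model just continues the text.
    if tail.strip(string.whitespace + string.punctuation) == "":
        return "prefix"
    # Real words after [Z]: it is a blank in the middle of the text.
    return "cloze"


print(prompt_type("[X] Overall, it was a [Z] movie."))
print(prompt_type("English: [X] French: [Z]"))
```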
Eventually, as we will see soon, we will use [Z] to get our answer, which in this example is the sentiment.
It's time for the magic. Let's use [Z] to elicit the desired response from the pre-trained Large Language Model (LLM) by replacing the answer slot with possible answer candidates.
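As a sketch of this filling step, the hypothetical helper below (named f_fill after the survey's notation) substitutes each candidate answer into the answer slot of an example prompt. The prompt is an assumption for illustration; the candidates are the label words mentioned in this article.

```python
ANSWER_SLOT = "[Z]"


def f_fill(prompt: str, z: str) -> str:
    """Fill the answer slot [Z] of the prompt x' with a candidate answer z."""
    if ANSWER_SLOT not in prompt:
        raise ValueError("prompt has no [Z] answer slot to fill")
    return prompt.replace(ANSWER_SLOT, z)


# Hypothetical prompt x' built from the input "I love this movie."
prompt = "I love this movie. The movie was [Z]."
# Candidate answers z.
candidates = ["excellent", "good", "horrible", "bad"]

# One filled prompt per candidate answer.
filled_prompts = {z: f_fill(prompt, z) for z in candidates}
for z, filled in filled_prompts.items():
    print(f"{z}: {filled}")
```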
What we just did was leverage the knowledge stored in our pre-trained LLM. We filled the answer slot with each possible candidate z, passed each filled prompt (a prompt whose answer slot holds a candidate answer) to the LLM, and asked it for the probability of seeing that filled prompt. The candidate with the highest probability is our answer.
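To make the scoring step concrete without downloading a real model, the sketch below uses a toy add-one-smoothed bigram language model, trained on a few made-up sentences, as a stand-in for the pre-trained LLM. The class and function names are hypothetical. Because a bigram model only looks at the previous word, it cannot really use the sentiment of the input the way an LLM would; the point is only the mechanics: score every filled prompt and keep the most probable candidate.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercased words only (punctuation dropped), wrapped in sentence markers.
    return ["<s>"] + re.findall(r"[a-z']+", text.lower()) + ["</s>"]


class BigramLM:
    """Toy add-one-smoothed bigram LM standing in for a pre-trained LLM."""

    def __init__(self, corpus: list[str]) -> None:
        self.context_counts = Counter()  # how often each token precedes another
        self.bigram_counts = Counter()   # how often each (a, b) pair occurs
        self.vocab = set()
        for sentence in corpus:
            tokens = tokenize(sentence)
            self.vocab.update(tokens)
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens, tokens[1:]))

    def log_prob(self, text: str) -> float:
        """Return log P(text) as a sum of smoothed bigram log-probabilities."""
        tokens = tokenize(text)
        v = len(self.vocab)
        return sum(
            math.log((self.bigram_counts[(a, b)] + 1) / (self.context_counts[a] + v))
            for a, b in zip(tokens, tokens[1:])
        )


# Made-up toy training text, for illustration only.
corpus = [
    "I love this movie. The movie was excellent.",
    "I loved the acting. The movie was excellent.",
    "I hate this movie. The movie was horrible.",
    "The plot was bad and the movie was bad.",
    "The movie was good but long.",
]
lm = BigramLM(corpus)

prompt = "I love this movie. The movie was [Z]."
candidates = ["excellent", "good", "horrible", "bad"]

# Answer search: fill [Z] with each candidate and score the filled prompt.
scores = {z: lm.log_prob(prompt.replace("[Z]", z)) for z in candidates}
best = max(scores, key=scores.get)
print(best)
```

With a real LLM, log_prob would instead sum the model's log-probabilities of the filled prompt's tokens; the search over candidates stays the same.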
In the case of generative tasks, the candidates z range over the entire vocabulary of the language of interest (English or any other language).
In the case of classification tasks, by contrast, z is limited to a handful of label words, such as excellent, good, horrible, and bad.
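For classification, the winning answer word usually still has to be turned into a task label; the survey calls this answer mapping. A minimal sketch, assuming a hypothetical positive/negative mapping for the label words above (ANSWER_TO_LABEL and map_answer are illustrative names):

```python
# Hypothetical mapping from answer words z to task labels y.
ANSWER_TO_LABEL = {
    "excellent": "positive",
    "good": "positive",
    "horrible": "negative",
    "bad": "negative",
}

# For classification, the answer space Z is just this small set of words.
answer_space = list(ANSWER_TO_LABEL)


def map_answer(z_hat: str) -> str:
    """Map the highest-scoring answer word z_hat to a task label."""
    try:
        return ANSWER_TO_LABEL[z_hat]
    except KeyError:
        raise ValueError(f"{z_hat!r} is not in the answer space") from None


print(answer_space)
print(map_answer("excellent"))
```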
Mathematically, we did the following:
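In the notation of the survey this series follows, write f_prompt for the template function, f_fill for the function that puts a candidate z into the answer slot, Z for the set of permitted answers, and θ for the pre-trained LLM's parameters. The two steps then read:

$$
\begin{aligned}
x' &= f_{\text{prompt}}(x) \\
\hat{z} &= \operatorname*{arg\,max}_{z \in \mathcal{Z}} \; P\big(f_{\text{fill}}(x', z); \theta\big)
\end{aligned}
$$

Taking the argmax is the natural choice for classification; for generative tasks, the search over Z can instead sample from the model's distribution.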
We took the example of classification and saw how the appropriate label could be derived from the LLM. Below are many other example tasks that can be performed using this paradigm:
That's all for now!! In the next part, we will dive into the design considerations of prompting, including prompt engineering and answer engineering.
Follow me and subscribe so that you don't miss out on the Prompting series and the upcoming articles on ML/NLP.
Follow Intuitive Shorts to read quick and intuitive summaries of ML/NLP/DS concepts.