
Pre-train, Prompt, and Predict – Part1

Last Updated on March 4, 2023

Author(s): Harshit Sharma


Originally published on Towards AI.


The 4 Paradigms in NLP

(This is a multi-part series describing the prompting paradigm in NLP. The content is inspired by this paper, a survey explaining prompting methods in NLP.)

(Source: Image from Paper) Prompting paradigms

I came across this wonderful paper on prompting while going through this amazing course on Advanced NLP (UMass). Being a survey paper, it gives a holistic explanation of this latest paradigm in NLP.

Over multiple articles, we will discuss the key highlights from the paper and learn why prompting is considered to be "The Second Sea Change in NLP".

To appreciate what prompting is, Part 1 starts by discussing the 4 major paradigms that have emerged in NLP over the past years.

Let's get started!

Fully-Supervised Learning (Non-Neural Network)
Feature Engineering

  • Supervised learning requires input-output examples to train the model.
  • In the pre-neural-network era, NLP models relied on Feature Engineering: researchers used domain knowledge to extract features from limited data and infuse inductive bias into the model.
  • There was no relationship between language models and the downstream tasks being solved. Each task needed its own trained model.
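
To make the idea of feature engineering concrete, here is a minimal sketch in Python. It is not taken from the paper: the word lists, the feature names, and the `extract_features` helper are all hypothetical, chosen only to show how domain knowledge gets encoded by hand before any model sees the data.

```python
# Hypothetical, hand-picked word lists: this is the "domain knowledge" step.
POSITIVE_WORDS = {"great", "happy", "love"}
NEGATIVE_WORDS = {"missed", "sad", "hate"}


def extract_features(text):
    """Turn raw text into a small dictionary of hand-designed features."""
    # Naive whitespace tokenization, with surrounding punctuation stripped.
    tokens = [token.strip(".,!?").lower() for token in text.split()]
    return {
        "num_tokens": len(tokens),
        "num_positive": sum(token in POSITIVE_WORDS for token in tokens),
        "num_negative": sum(token in NEGATIVE_WORDS for token in tokens),
        "has_exclamation": "!" in text,
    }


features = extract_features("I missed the bus today")
print(features)
```

A task-specific classifier would then be trained on dictionaries like this one rather than on the raw text, which is why every new task needed its own features and its own model.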

Fully-Supervised Learning (Neural Network)
Architecture Engineering

  • Neural networks arrived, and with them the automatic learning of features from training data. Manual feature engineering was no longer necessary.
  • The focus shifted to Architecture Engineering, where neural network architectures were designed to provide the appropriate inductive bias to the model.
  • Again, there was no relationship between training language models and solving downstream tasks. Each task was solved using its own model architecture.

----- The First Sea Change -----

Pre-train and Fine-Tune
Objective Engineering

  • This was the first time a language model was pre-trained on massive data and later adapted to downstream tasks via fine-tuning with task-specific objectives.
  • The focus shifted to Objective Engineering: designing the training objectives used in both the pre-training and fine-tuning stages.
  • The diagram below shows how language models play a central role in this paradigm. Unsupervised training of LMs is combined with task-specific supervised fine-tuning.

(Source: Paper, modified by Author) Relationship between Language Model pre-training and various downstream tasks.

----- The Second Sea Change -----

Prompt Engineering

  • Instead of adapting the LM to a specific task via objective engineering, the downstream task is reformulated using a textual prompt.
    E.g., to find the emotion of "I missed the bus today", feed the model "I missed the bus today. I felt so _____". The trained LM will try to fill the blank with an appropriate emotion word, which gives us the emotion of the input.
  • This doesn't require any task-specific training.
  • This calls for a focus on Prompt Engineering, since the prompt needs to be designed carefully to elicit the desired response from the model.
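
The bus example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_fill_blank` is a hypothetical stand-in for a real pre-trained LM, and the template string, the `ANSWER_TO_LABEL` mapping, and the function names are made up for this sketch rather than taken from the paper.

```python
# Hypothetical prompt template: the input text plus a slot for the LM to fill.
PROMPT_TEMPLATE = "{text}. I felt so {blank}"

# Hypothetical mapping from the word the LM fills in to an emotion label.
ANSWER_TO_LABEL = {
    "sad": "negative",
    "angry": "negative",
    "happy": "positive",
    "excited": "positive",
}


def build_prompt(text, blank="_____"):
    """Reformulate the classification input as a fill-in-the-blank prompt."""
    return PROMPT_TEMPLATE.format(text=text.rstrip("."), blank=blank)


def toy_fill_blank(prompt):
    """Stand-in for a pre-trained LM (assumption: a real model would predict the word)."""
    return "sad" if "missed" in prompt.lower() else "happy"


def predict_emotion(text, fill_blank=toy_fill_blank):
    """Build the prompt, let the LM fill the blank, and map the answer to a label."""
    answer = fill_blank(build_prompt(text))
    return ANSWER_TO_LABEL.get(answer, "unknown")


print(build_prompt("I missed the bus today"))
print(predict_emotion("I missed the bus today"))
```

Note that nothing here is trained for the emotion task. All of the task-specific work sits in the template and the answer mapping, which is where the engineering effort goes in this paradigm.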

That's all for Part 1! In Part 2, we will dive into prompting: its basics, its applications, the considerations that go into designing prompts, and more.



