Master LLMs with our FREE course in collaboration with Activeloop & Intel Disruptor Initiative. Join now!


Don’t Be Overwhelmed by NLP
Natural Language Processing

Don’t Be Overwhelmed by NLP

Last Updated on August 24, 2020 by Editorial Team

Author(s): Mukul Malik

Natural Language Processing

Don’t Be Overwhelmed by NLP Research

How to cope with the volume of ongoing research in NLP which is probably infeasible for you

-by under CC-BY-3.0

What is going on?

NLP is the new Computer Vision

With enormous amount go textual datasets available; giants like Google, Microsoft, Facebook etc have diverted their focus towards NLP.

Models using thousands of super-costly TPUs/GPUs, making them infeasible for most.

This gave me anxiety! (we’ll come back to that)

Let’s these Tweets put things into perspective:

Tweet 1:

Tweet 2: (read the trailing tweet)


In about last one-year following knowledge became mainstream:

  • Transformers was followed by Reformer, Longformer, GTrXL, Linformer, and others.
  • BERT was followed by XLNet, RoBERTa, AlBERT, Electra, BART, T5, Big Bird, and others.
  • Model Compression was extended by DistilBERT, TinyBERT, BERT-of-Theseus, Huffman Coding, Movement Pruning, PrunBERT, MobileBERT, and others.
  • Even new tokenizations were introduced: Byte-Pair encoding (BPE), Word-Piece Encoding (WPE), Sentence-Piece Encoding (SPE), and others.

This is barely the tip of the iceberg.

So while you were trying to understand and implement a model, a bunch of new lighter and faster models were already available.

How to Cope with it?

The answer is short:

you don’t need to know it all, know only what is necessary


I read them all to realize most of the research is re-iteration of similar concepts.

At the end of the day (vaguely speaking):

  • the reformer is hashed version of the transformers and longfomer is a convolution-based counterpart of the transformers
  • all compression techniques are trying to consolidate information
  • everything from BERT to GPT3 is just a language model

Priorities -> Pipeline over Modules

Learn to use what’s available, efficiently, before jumping on to what else can be used

In practice, these models are a small part of a much bigger pipeline.

Take an example of Q&A Systems. Given millions of documents, for this task, something like ElasticSearch is way more essential to the pipeline than a new Q&A model (comparatively).

In production success of you, the pipeline will not (only) be determined by how awesome is your Deep Learning model but also by:

  • the latency of the inference time (read about Onnx, quantization)
  • predictability of the results and boundary cases
  • the ease of fine-tuning
  • the ease of reproducing the model on a similar dataset

Personal Experience

I was working on an Event Extraction pipeline, which used:

  • 4 different transformer-based models
  • 1 RNN-based model

But. At the heart of the entire pipeline were:

  • WordNet
  • FrameNet
  • Word2Vec
  • Regular-Expressions

And. Most of my team’s focus was on:

  • Extraction of text from PPTs, images & tables
  • Cleaning & preprocessing text
  • Visualization of results
  • Optimization of ElasticSearch
  • Format of info for Neo4J

It is more essential to have an average performing pipeline than to have a non-functional pipeline with a few brilliant modules.

Neither Christopher Manning nor Andrew NG knows it all. They just know what is required and when it is required; well enough.

So, have realistic expectations of yourself.

Thank you!

Don’t Be Overwhelmed by NLP was originally published in Towards AI — Multidisciplinary Science Journal on Medium, where people are continuing the conversation by highlighting and responding to this story.

Published via Towards AI

Feedback ↓