

The NLP Cypher | 12.13.20


Last Updated on July 24, 2023 by Editorial Team

Author(s): Ricky Costa

Originally published on Towards AI.

Soldiers in a Mountain Gorge, with a Storm | Vernet



NeurIPS Aftermath

Hey, welcome back! Another week and another conference (NeurIPS) goes by, and many a thing has been trickling down the NLP pipe.

First things first, the GPT-3 paper won a trophy:

Also, if you need to play catch-up, here’s a list of NLP-centric papers found at NeurIPS:

NeurIPS 2020: Key Research Papers in Natural Language Processing (NLP) & Conversational AI

Here are the most interesting NLP and conversational AI research papers introduced at NeurIPS 2020.


Blurred Secrets

This is Depix, a library for recovering passwords from pixelized screenshots. If you are using pixelization to safeguard sensitive info, you need a new method. The library has already accrued 10K stars on GitHub 😭😭. Also, beurtschipper’s profile pic gets nominated for headshot of the week. This is awesome work, by the way! (P.S., it requires that the pixelized images were created with a linear box filter.)
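The attack works because pixelization with a linear box filter is deterministic: every output block is just the average of the pixels it covers, so an attacker can render candidate text, pixelize it the same way, and match blocks against the screenshot. A toy sketch of that forward step on a grayscale image (illustrative pure Python, not Depix's actual code):

```python
def box_filter(img, block):
    """Pixelize a grayscale image (list of rows of 0-255 ints) by
    replacing each block x block tile with the average of its pixels --
    the linear box filter that Depix assumes was used."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            tile = [img[y][x]
                    for y in range(by, min(by + block, h))
                    for x in range(bx, min(bx + block, w))]
            avg = sum(tile) // len(tile)
            for y in range(by, min(by + block, h)):
                for x in range(bx, min(bx + block, w)):
                    out[y][x] = avg
    return out

# The filter is deterministic: identical inputs always produce identical
# pixelized blocks, which is exactly what a matching attack exploits.
assert box_filter([[0, 255], [255, 0]], 2) == [[127, 127], [127, 127]]
```

Gaussian or other smoothing filters break this block-matching assumption, which is why the repo calls out the linear box filter specifically.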



Depix is a tool for recovering passwords from pixelized screenshots. This implementation works on pixelized images that…


340 Cipher Goes Bye-Bye

If you like ciphers (besides the NLP Cypher 😁), the Zodiac Killer’s 340 cipher was decrypted this week 👀. For background, the Zodiac Killer was a cold-blooded serial killer who rampaged through California in the late ’60s and early ’70s and became famous for sending encrypted messages to authorities. His first cipher was decrypted early on, but the infamous 340 cipher remained a mystery all these years (51, to be exact). Until now… On December 5th, the deciphered Zodiac message was sent to the FBI by a small group of private citizens. To learn how they cracked it, watch this:


PyTorch Lightning | Sharded Training

You can now get huge memory savings by adding a single flag to your Lightning trainer. PyTorch Lightning now offers this feature for those who wish to shard their training jobs across multiple GPUs. They include an easy-to-use sample for training a language model (from NVIDIA’s NeMo library; btw, you can find several NeMo notebooks on the Super Duper NLP Repo 😁) on the WikiText dataset.
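Under the hood, sharded training is a ZeRO-style idea: instead of every GPU holding a full replica of the optimizer state, each rank keeps only its 1/N slice and gathers what it needs on the fly. A toy sketch of the partitioning that drives the memory savings (illustrative only, not Lightning's implementation):

```python
def shard_params(params, world_size):
    """Round-robin-partition a flat list of parameters so that each
    rank owns roughly 1/world_size of the optimizer state (ZeRO-style).
    Real sharded DDP partitions tensors and syncs them during training."""
    shards = [[] for _ in range(world_size)]
    for i, p in enumerate(params):
        shards[i % world_size].append(p)
    return shards

params = [f"w{i}" for i in range(8)]   # stand-ins for parameter tensors
shards = shard_params(params, 4)
# each of the 4 ranks now stores optimizer state for only 2 tensors
assert [len(s) for s in shards] == [2, 2, 2, 2]
```

In Lightning itself, per the 1.1 release, enabling this is roughly a one-liner along the lines of `Trainer(gpus=8, plugins='ddp_sharded')`; check the linked docs for the exact flag in your version.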

Introducing PyTorch Lightning Sharded: Train SOTA Models, With Half The Memory

Lightning 1.1 reveals Sharded Training — train deep learning models on multiple GPUs saving over 50% on memory, with no…


If you want the tech stuff plus more on model parallelism from Lightning:

Multi-GPU training – PyTorch Lightning 1.1.0 documentation

To train on CPU/GPU/TPU without changing your code, we need to build a few good habits 🙂 Delete any calls to .cuda()…


A Tale of Slopes and Expectations

Found this great resource while perusing NeurIPS merch. It includes videos and slides on all things math w/r/t machine learning.

  1. Overview video
  2. Introduction to Integration video slides
  3. Numerical Integration video slides
  4. Monte Carlo Integration video slides
  5. Normalizing Flows video slides
  6. Inference in Time Series video slides
  7. Backpropagation and Automatic Differentiation video slides
  8. Forward Backward Algorithm video slides
  9. Implicit Function Theorem video slides
  10. Method of Adjoints video slides
  11. Method of Lagrange video slides
  12. Stochastic Gradient Estimators video slides

There and Back Again: A Tale of Slopes and Expectations

Companion webpage to the book "Mathematics for Machine Learning". Copyright 2020 by Marc Peter Deisenroth, A. Aldo…



A new Microsoft pretrained model, MPNet, out of NeurIPS, combines the advantages of masked language modeling (MLM, aka BERT-style) and permuted language modeling (PLM, aka XLNet-style). Their GitHub also includes scripts for pretraining and for downstream tasks such as SQuAD and the GLUE benchmark. Their blog post gives some background on the advantages and disadvantages of both training objectives, and benchmarks MPNet against other models. (Found on HF’s model hub as well.)
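The difference between the two objectives boils down to what context each predicted token gets to see: MLM conditions every masked position on all unmasked tokens (but predicts masked tokens independently), while PLM conditions each token on the ones before it in a sampled permutation (capturing dependency among predictions, but losing full position information). A toy sketch of the visible context under each objective (illustrative, not MPNet's actual code):

```python
def mlm_context(tokens, masked):
    """MLM: every masked position sees the same full set of unmasked tokens."""
    visible = [t for i, t in enumerate(tokens) if i not in masked]
    return {i: visible for i in masked}

def plm_context(tokens, perm):
    """PLM: each position sees only the tokens that precede it
    in the sampled permutation order."""
    return {pos: [tokens[p] for p in perm[:k]]
            for k, pos in enumerate(perm)}

toks = ["the", "cat", "sat", "down"]
# MLM with positions 1 and 3 masked: both see the same full context
assert mlm_context(toks, {1, 3}) == {1: ["the", "sat"], 3: ["the", "sat"]}
# PLM under permutation (2, 0, 3, 1): position 3 sees "sat" then "the"
assert plm_context(toks, (2, 0, 3, 1))[3] == ["sat", "the"]
```

MPNet's contribution is to keep PLM's dependency among predicted tokens while also exposing the position information of the full sentence, getting the best of both.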

MPNet combines strengths of masked and permuted language modeling for language understanding …

Pretrained language models have been a hot research topic in natural language processing. These models, such as BERT…


Graph Mining | NeurIPS

16 video talks from Google w/r/t graph mining at NeurIPS 🔥🔥

They highlight applications of graph-based learning and graph algorithms for a wide range of areas such as detecting fraud and abuse, query clustering and duplication detection, image and multi-modal data analysis, privacy-respecting data mining and recommendation, and experimental design under interference.

Graph Mining @ NeurIPS

The Graph Mining team at Google is excited to be presenting at the 2020 NeurIPS Conference. Please join us on Sunday…


Repo Cypher 👨‍💻

A collection of recently released repos that caught our 👁


CTRLsum

CTRLsum is a generic, controllable summarization system that manipulates text summaries via control tokens in the form of keywords or a prefix.

You can also generate text out of the box, and they included training/eval scripts.
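Since the control interface is just text prepended to the source document, steering the summary amounts to a string-building step before generation. A hypothetical sketch of that input format (the exact separator token is defined by the CTRLsum repo's preprocessing, so treat `sep` here as a placeholder):

```python
def build_ctrlsum_input(keywords, document, sep=" => "):
    """Prepend control keywords to the source text, CTRLsum-style.
    `sep` and the ' | ' keyword delimiter are illustrative assumptions;
    the real tokens come from the repo's preprocessing scripts."""
    return " | ".join(keywords) + sep + document

inp = build_ctrlsum_input(["earnings", "Q3"], "Acme reported results today.")
assert inp == "earnings | Q3 => Acme reported results today."
```

Swapping the keyword list changes which aspects of the document the summary focuses on, with no retraining required.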


This is PyTorch implementation of the paper: CTRLsum: Towards Generic Controllable Text Summarization Junxian He…


Topical Change

Using transformers for topic-change detection on the Terms-of-Service (ToS) dataset.

“Topic change,” as described in this repo, is at the paragraph level, not the sentence level.
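At its simplest, paragraph-level topic-change detection can be framed as scoring consecutive paragraphs for similarity and flagging a boundary when the score drops below a threshold. A toy cosine-similarity sketch of that framing (illustrative; the repo uses transformer models rather than raw vectors):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def topic_boundaries(paragraph_vecs, threshold=0.5):
    """Flag index i as starting a new topic when paragraph i's embedding
    is dissimilar from the previous paragraph's."""
    return [i for i in range(1, len(paragraph_vecs))
            if cosine(paragraph_vecs[i - 1], paragraph_vecs[i]) < threshold]

vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # toy paragraph embeddings
assert topic_boundaries(vecs) == [2]  # topic shifts at the third paragraph
```

In practice the paragraph vectors would come from a sentence-transformer or similar encoder, and the threshold would be tuned on labeled ToS sections.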


by Dennis Aumiller*, Satya Almasian*, Sebastian Lackner and Michael Gertz *Equal Contribution. This repository contains…



UBAR

Repo for training GPT-2 on goal-oriented dialogue tasks. In the paper mentioned in the repo, it was benchmarked on response generation, policy optimization (act and response generation), end-to-end modeling (belief state, act, and response generation), and dialog state tracking on the MultiWOZ 2.0 dataset.
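The modeling trick in UBAR is to flatten an entire dialog session into one token sequence — user utterance, belief state, system act, and response, turn after turn — and train GPT-2 on it left to right. A sketch of that flattening (the `<user>`/`<belief>`/`<act>`/`<resp>` tags are hypothetical placeholders, not the repo's actual special tokens):

```python
def flatten_session(turns):
    """Concatenate each turn's user utterance, belief state, system act,
    and response into one training sequence, UBAR-style. The tag strings
    are illustrative stand-ins for the repo's special tokens."""
    parts = []
    for t in turns:
        parts += [f"<user> {t['user']}", f"<belief> {t['belief']}",
                  f"<act> {t['act']}", f"<resp> {t['resp']}"]
    return " ".join(parts)

seq = flatten_session([{"user": "book a cab", "belief": "taxi",
                        "act": "request(time)", "resp": "what time?"}])
assert seq == "<user> book a cab <belief> taxi <act> request(time) <resp> what time?"
```

Because the whole session is one sequence, the model conditions on previously generated belief states and acts rather than only on raw utterances, which is what "fully end-to-end" refers to in the paper's title.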


This is the code and data for the AAAI 2021 paper "UBAR: Towards Fully End-to-End Task-Oriented Dialog System with…


Knowledge Graph Enhanced Relation Extraction

Improves the performance of relation extraction models by jointly training on relation extraction and knowledge graph link prediction tasks.
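Joint training here means optimizing a weighted sum of the two objectives, so gradients from the link-prediction task regularize the shared encoder used for relation extraction. A schematic of the combined loss (illustrative; the weighting scheme and symbol `lam` are assumptions, not the repo's code):

```python
def joint_loss(re_loss, kg_loss, lam=1.0):
    """Weighted sum of the relation-extraction loss and the knowledge-graph
    link-prediction loss; `lam` trades off the auxiliary KG objective."""
    return re_loss + lam * kg_loss

# e.g. a batch where the KG objective is down-weighted by half
assert joint_loss(0.8, 0.4, lam=0.5) == 1.0
```

Setting `lam=0` recovers plain relation-extraction training, which makes the KG term easy to ablate.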

Training script included:


Knowledge Graph Enhanced Relation Extraction George Stoica, Emmanouil Antonios Platanios, and Barnabás Póczos NeurIPS…


Dataset of the Week: CrossNER

A named entity recognition (NER) dataset spanning five diverse domains (politics, natural science, music, literature, and artificial intelligence).


Where is it?


CrossNER: Evaluating Cross-Domain Named Entity Recognition ( Accepted in AAAI-2021) [PDF] CrossNER is a fully-labeled…


Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.

For complete coverage, follow our Twitter: @Quantum_Stat

Quantum Stat


Published via Towards AI
