NLP News Cypher | 02.16.20

Last Updated on July 27, 2023 by Editorial Team

Author(s): Ricky Costa

Originally published on Towards AI.

Photo by David Arakelyan on Unsplash

Natural Language Processing (NLP) Weekly Newsletter

The Great Gig in the Sky

And… We’re back! How was your week?

Last week was intensely fun and adventurous: many new datasets, studies, and NLP research papers were shot out of a cannon!

Also, the AAAI conference took place in NYC. There, the Turing Award winners (LeCun, Bengio, and Hinton) came together for some exciting talks.

Yoshua was so excited he even started a blog… his first words:

Yoshua Bengio's blog – first words – Yoshua Bengio

I often write comments and posts on social media but these tend to be only temporarily visible, so I thought I needed a…


Yann gave a talk and shared slides:



Yann’s self-supervised learning talk shows how much NLP has brought us in the past few years. You may have heard of “masking” from BERT and other transformers. This key concept, hiding part of the input so that the model has to learn to fill in the missing information, is the crux of the slides, and its consequences are far-reaching. Just ask the peeps in Computer Vision.
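For anyone who has not seen masking up close, here is a minimal sketch of the idea in plain Python. It is a simplification and not BERT's actual preprocessing: the function name `mask_tokens`, the whitespace tokenizer, and the example sentence are all made up for illustration, and the 15% rate is the commonly cited BERT default rather than anything taken from the slides.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Hide a random subset of tokens and return (masked_tokens, targets).

    Hypothetical helper for illustration only; real pipelines work on
    subword ids and apply extra replacement rules.
    """
    rng = random.Random(seed)
    # Always hide at least one token so there is something to predict.
    n_to_mask = max(1, round(len(tokens) * mask_prob))
    positions = sorted(rng.sample(range(len(tokens)), n_to_mask))

    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]  # what the model must recover
        masked[pos] = MASK_TOKEN    # what the model actually sees
    return masked, targets


tokens = "the turing award winners gave talks in new york".split()
masked, targets = mask_tokens(tokens)
print(masked)   # input with one position hidden
print(targets)  # {position: original token} used as the training label
```

The model only ever sees `masked`, and the training signal comes from how well it reconstructs `targets`. No human labels are needed, which is what makes the setup self-supervised.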

Last but not least, at the AAAI conference, the outstanding paper award went to the authors of the WinoGrande dataset from the Allen Institute. Special thanks to Chandra for forwarding us the dataset a couple of weeks ago. It’s been added to the vault of the Big Bad NLP Database. 👍

This Week:

The Galkin Graphs Cometh

Compressing BERT

DeepMind Keeps It PG-19

DeepSpeed and ZeRO Cool

Open Domain QA Strikes Back!

Why Models Fail

Kaggle, Colab, and Mr. Burns

Dataset of the Week: WinoGrande

The Galkin Graphs Cometh

“This year AAAI got 1591 accepted papers among which about 140 are graph-related.”

Can’t have a conference without mentioning Galkin’s coverage of knowledge graphs!

What’s popular:

Dumping knowledge graphs on Language Models…

Entity matching over knowledge graphs with different schemas…

Temporal knowledge graphs aka dynamic graphs…

And for those building goal-oriented bots 👇, check out the Schema-Guided Dialogue State Tracking workshop paper:


Knowledge Graphs @ AAAI 2020

The first major AI event of 2020 is already here! Hope you had a nice holiday break 🎄, or happy New Year if your…


Compressing BERT

Peeps dropped a new BERT on Hugging Face’s community library. It’s a compressed version of BERT that outperforms the distilled version on 6 GLUE tasks (it’s actually comparable to the base model)! This is great for those looking to save money on compute (like me 😁)!

canwenxu/BERT-of-Theseus-MNLI · Hugging Face

See our paper "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing". BERT-of-Theseus is a new compressed…


DeepMind Keeps It PG-19

The research giant released a new transformer, the Compressive Transformer, and a new dataset, PG-19, for language modeling.

The dataset consists of 28,000 books from Project Gutenberg published before 1919.

The transformer aims to ease the memory constraints of current transformers by maintaining long-range context across book-length text.

“The Compressive Transformer is able to produce narrative in a variety of styles, from multi-character dialogue, first-person diary entries, or third-person prose.”


A new model and dataset for long-range memory

This blog introduces a new long-range memory model, the Compressive Transformer, alongside a new benchmark for…


DeepSpeed and ZeRO Cool

Microsoft had to remind big tech it had a few aces up its sleeve. It released a new library called DeepSpeed, which includes the ZeRO optimizer. What is ZeRO?

“…[It] is a new parallelized optimizer that greatly reduces the resources needed for model and data parallelism while massively increasing the number of parameters that can be trained. Researchers have used these breakthroughs to create Turing Natural Language Generation (Turing-NLG), the largest publicly known language model at 17 billion parameters, which you can learn more about in this accompanying blog post.”

Yes, you read that correctly: 17 billion params. And it’s compatible with PyTorch. The new library can train even larger transformers, and do so more efficiently!
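To get a feel for why 17 billion parameters is a memory problem, here is a rough back-of-the-envelope sketch. The byte counts are assumptions borrowed from the accounting in the ZeRO paper for mixed-precision Adam (2 bytes for fp16 weights, 2 for fp16 gradients, 12 for optimizer state), not numbers from the quote above. The 64-GPU count is an arbitrary illustration, not Turing-NLG's actual setup. Activations and buffers are ignored, and `model_state_gb_per_gpu` is a made-up helper, not a DeepSpeed API.

```python
# Bytes per parameter under mixed-precision Adam (assumed accounting).
FP16_WEIGHTS = 2
FP16_GRADS = 2
OPTIMIZER_STATE = 12  # fp32 weight copy + momentum + variance


def model_state_gb_per_gpu(n_params, n_gpus, stage):
    """Approximate model-state memory per GPU in GB (1e9 bytes).

    stage 0: plain data parallelism, everything replicated
    stage 1: optimizer state partitioned across GPUs
    stage 2: optimizer state and gradients partitioned
    stage 3: optimizer state, gradients, and weights partitioned
    """
    if stage == 0:
        per_param = FP16_WEIGHTS + FP16_GRADS + OPTIMIZER_STATE
    elif stage == 1:
        per_param = FP16_WEIGHTS + FP16_GRADS + OPTIMIZER_STATE / n_gpus
    elif stage == 2:
        per_param = FP16_WEIGHTS + (FP16_GRADS + OPTIMIZER_STATE) / n_gpus
    elif stage == 3:
        per_param = (FP16_WEIGHTS + FP16_GRADS + OPTIMIZER_STATE) / n_gpus
    else:
        raise ValueError("stage must be 0, 1, 2, or 3")
    return n_params * per_param / 1e9


N_PARAMS = 17_000_000_000  # Turing-NLG size quoted above
N_GPUS = 64                # illustrative only

for stage in range(4):
    gb = model_state_gb_per_gpu(N_PARAMS, N_GPUS, stage)
    print(f"stage {stage}: {gb:.2f} GB of model state per GPU")
```

Under these assumptions, fully replicated training needs roughly 272 GB of model state on every GPU, far more than any single card holds. Partitioning that state across the data-parallel workers is what brings the per-GPU footprint back into range.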

ZeRO & DeepSpeed: New system optimizations enable training models with over 100 billion parameters…

The latest trend in AI is that larger natural language models provide better accuracy; however, larger models are…


Open Domain QA Strikes Back!

Open Domain QA has pretty much stalled over the last few years. Ever since Facebook dropped DrQA, the field hasn’t seen much improvement, until a few days ago when a new model from Google achieved SOTA on the Natural Questions benchmark by several points!


A thread written by @kelvin_guu

New from Google Research! REALM: http://realm.page.link/paper We pretrain an LM that sparsely attends over all of…


P.S. they plan to open-source it:



Why Models Fail

Oh boy, sometimes things don’t go as expected when releasing models into the wild. We all know the angst of watching a SOTA model underperform because the environment it was deployed in keeps shifting (the real world is adaptive, not static).

Hady Elsahar discusses the problem of domain shift and its effects on model performance in this intuitive Medium piece:

Predicting when ML models fail in production

To Annotate or Not? Predicting Performance Drop under Domain Shift. An #EMNLP2019 paper by Hady Elsahar and Matthias…


Kaggle, Colab, and Mr. Burns

Now on Kaggle, you can use up to 30 hours of TPU time per week, with up to 3 hours in a single session.

Tensor Processing Units (TPUs) Documentation

Kaggle is the world's largest data science community with powerful tools and resources to help you achieve your data…



Introducing Colab Pro: for $10 a month, it delivers better GPUs and longer runtimes:

Google introduces Colab Pro w/ faster GPUs, more memory – 9to5Google

Google Colab is a useful tool for data scientists and AI researchers sharing work online. The company this week quietly…


Google be like:

Mr. Burns

Dataset of the Week: WinoGrande

What is it?

Formulated as a fill-in-a-blank task with binary options, the goal is to choose the right option for a given sentence which requires commonsense reasoning.


Sentence: Katrina had the financial means to afford a new car while Monica did not, since _ had a high paying job.

Option1: Katrina

Option2: Monica
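Here is a small sketch of how the example above turns into two candidate sentences that a model can compare. The dictionary keys, `fill_blank`, `choose_option`, and especially `score_fn` are hypothetical names for illustration. `score_fn` stands in for whatever model you would use to rate how plausible a sentence is.

```python
def fill_blank(sentence, option, blank="_"):
    """Substitute one option into the single blank of the sentence."""
    if sentence.count(blank) != 1:
        raise ValueError("expected exactly one blank in the sentence")
    return sentence.replace(blank, option)


def choose_option(example, score_fn):
    """Return 1 or 2, whichever filled-in sentence score_fn rates higher.

    score_fn is a placeholder: any callable mapping a sentence to a number.
    """
    scores = [
        score_fn(fill_blank(example["sentence"], example[key]))
        for key in ("option1", "option2")
    ]
    return 1 if scores[0] >= scores[1] else 2


# Key names are assumed for this sketch.
example = {
    "sentence": (
        "Katrina had the financial means to afford a new car while "
        "Monica did not, since _ had a high paying job."
    ),
    "option1": "Katrina",
    "option2": "Monica",
}

for key in ("option1", "option2"):
    print(fill_blank(example["sentence"], example[key]))
```

Both filled-in sentences are grammatical, so surface cues alone do not settle it. Picking the right one requires knowing that a high-paying job explains being able to afford the car.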

Where is it?


Version 1.1 (Dec 2nd, 2019) Download dataset by download_winogrande.sh ./data/ ├── train_[xs,s,m,l,xl].jsonl # training…


Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.

If you enjoyed this article, help us out and share it with friends or on social media!

For complete coverage, follow us on Twitter: @Quantum_Stat


Published via Towards AI
