NLP News Cypher | 06.28.20

The Final Frontier

Geoff Hinton backpropagated himself and cancelled a bunch of meetings. He realized that his idea on perceptual learning had a central flaw and went scorched earth on his planned talks. Hinton doesn’t mess around.

Last week was a blur…

…oh, and there’s a new supercomputer in Japan:

The World's New Fastest Supercomputer Is an Exascale Machine for AI

Twice a year, the world's fastest supercomputers take a test to see which is top of class. These hundred-million-dollar…

The Big Bad NLP Database was updated 😎. We added 63 new datasets, taking us to 545 total! Thank you to all contributors: Erik Chan, Ramya Tekumalla, Monisha Jegadeesan, and Purva Tendulkar!

P.S. there’s a 🚀 button at the bottom of the page if you want to achieve escape velocity.

The Big Bad NLP Database – Quantum Stat

Datasets for various tasks in Natural Language Processing – Quantum Stat

Lastly, we updated our homepage, inspired by Darkwing Duck and Jean-Luc Picard. If you need NLP software, drop us a note! 😁

We Build NLP for the Bravehearts

This Week:

Deep Learning Drizzle

Colab Tricks and Treats

Colab of the Week

Interactive Tabular Data

Another AI Survey

Dataset of the Week: ClarQ

Colab Tricks and Treats

In this industrious blog post, Amit Chaudhary shows us all the magic tricks that Colab has to offer. In total, there are 17 awesome hacks you can try out, such as Jupyter notebook keyboard shortcuts, running Flask apps from Colab, and managing Colab from the command line (just to name a few!).

Google Colab Tips for Power Users

Colab is one of the best products to come from Google. It has made GPUs freely accessible to learners and practitioners…

Colab of the Week

While we’re on the subject of Colab, Jason Phang built a great Colab for multi-task training with the Hugging Face libraries. The project uses 3 datasets for 3 separate tasks: semantic textual similarity, natural language inference (NLI) and multiple choice QA. Jason highlights how to create 3 different models that all share the same encoder. And this is awesome because “the shared encoder ensures that during training, all updates will update the same encoder weights, and also does not consume any additional GPU memory.”

Google Colaboratory

Multi-Task Training
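
If you want a feel for the shared-encoder idea without opening the Colab, here is a toy sketch in plain Python. It is not Jason's code and it uses no deep learning library: `Encoder`, `TaskModel`, `head_scale` and `train_step` are hypothetical names made up for illustration, and the "training" step is a placeholder nudge rather than real backpropagation. The only point is that three task models hold a reference to one encoder object, so an update made through any task is seen by all of them and no second copy of the weights exists.

```python
class Encoder:
    """Toy stand-in for a transformer encoder: just a list of weights."""

    def __init__(self, size):
        self.weights = [0.0] * size

    def encode(self, features):
        # Dot product of the weights with the input features.
        return sum(w * x for w, x in zip(self.weights, features))


class TaskModel:
    """A task-specific head sitting on top of a (possibly shared) encoder."""

    def __init__(self, encoder, head_scale):
        self.encoder = encoder        # a reference to the encoder, not a copy
        self.head_scale = head_scale  # hypothetical task-specific parameter

    def predict(self, features):
        return self.head_scale * self.encoder.encode(features)

    def train_step(self, features, lr=0.1):
        # Placeholder update (NOT real gradient descent): it modifies the
        # encoder weights in place, so every model sharing them sees it.
        for i, x in enumerate(features):
            self.encoder.weights[i] += lr * x


# One encoder, three task models, mirroring the three tasks in the Colab.
shared = Encoder(size=3)
models = {
    "sts": TaskModel(shared, head_scale=1.0),
    "nli": TaskModel(shared, head_scale=2.0),
    "multiple_choice": TaskModel(shared, head_scale=3.0),
}

# Training on one task changes what the other two tasks see.
models["sts"].train_step([1.0, 0.0, 2.0])
nli_prediction = models["nli"].predict([1.0, 0.0, 0.0])
```

In a real setup the heads would be classification or regression layers and the update would come from an optimizer, but the sharing works the same way: every task model points at the same encoder parameters.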

Deep Learning Drizzle

If you like to watch YouTube tutorials/lectures, there’s a major database that aggregates such videos by domain. Enjoy!

Deep Learning Drizzle

“Read enough so you start developing intuitions and then trust your intuitions and go for it!” ​ Prof.

Interactive Tabular Data

When Wolfram is not searching for a Theory of Everything, he’s gotta run a business. A new Wolfram blog post explores new capabilities for manipulating Datasets (tabular dataframes) in Mathematica 12.1.

New 12.1 Dataset Interactive Controls and Formatting Options-Wolfram Blog

June 23, 2020 – Christopher Carlson, Senior User Interface Developer, User Interfaces In his blog post announcing the…

Another AI Survey

Several hundred C-suite peeps were surveyed regarding their use of AI in the enterprise.

- CTOs make all the decisions.

- Azure is the #1 AI cloud provider.

- Text is the most widely used data type.

- The biggest bottleneck in working with AI is ‘lack of resources/talent’.


Dataset of the Week: ClarQ

What is it?

A dataset for clarification question generation, consisting of ∼2M examples distributed across 173 StackExchange domains.


Where is it?


This dataset is meant for training and evaluation of Clarification Question Generation Systems. The details and the…

