NLP News Cypher | 02.16.20
Last Updated on July 27, 2023 by Editorial Team
Author(s): Ricky Costa
Originally published on Towards AI.
Natural Language Processing (NLP) Weekly Newsletter
NLP News Cypher | 02.16.20
The Great Gig in the Sky
And… We're back! How was your week?
Last week was intensely fun and adventurous: new datasets, studies, and NLP research were shot out of a cannon!
Also, the AAAI conference happened in NYC, where the Turing Award winners (LeCun, Bengio, and Hinton) came together for some exciting talks.
Yoshua was so excited he even started a blog… his first words:
Yoshua Bengio's blog – first words – Yoshua Bengio
I often write comments and posts on social media but these tend to be only temporarily visible, so I thought I needed a…
yoshuabengio.org
Yann gave a talk and shared slides:
lecun-20200209-aaai.pdf
drive.google.com
Yann's self-supervised learning talk shows how much NLP has given us in the past few years. You may have heard of "masking" from BERT and other transformers; this key concept of hiding parts of the input so models learn to cope with missing information is the crux of the slides, and its consequences are far-reaching. Just ask the peeps in Computer Vision.
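If "masking" is new to you, here's a minimal sketch of the BERT-style version of the idea: hide a random subset of tokens so the model must reconstruct them from the surrounding context. (Simplified; the real recipe also sometimes keeps or randomly swaps the selected tokens.)

```python
import random

def mask_tokens(tokens, mask_token="[MASK]", mask_prob=0.15, seed=0):
    """BERT-style masking: hide a random subset of tokens so the model
    must reconstruct them from the surrounding context."""
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            targets.append(tok)    # the model is trained to predict these
        else:
            masked.append(tok)
            targets.append(None)   # no loss on unmasked positions
    return masked, targets

tokens = "the cat sat on the mat and looked at the dog".split()
masked, targets = mask_tokens(tokens)
print(masked)  # tokens with roughly 15% replaced by [MASK]
```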
Last but not least, at the AAAI conference, the Outstanding Paper Award went to the authors of the WinoGrande dataset from the Allen Institute. Special thanks to Chandra for forwarding us the dataset a couple of weeks ago. It's been added to the vault of the Big Bad NLP Database. 👍
This Week:
The Galkin Graphs Cometh
Compressing BERT
DeepMind Keeps It PG-19
DeepSpeed and ZeRO Cool
Open Domain QA Strikes Back!
Why Models Fail
Kaggle, Colab, and Mr. Burns
Dataset of the Week: WinoGrande
The Galkin Graphs Cometh
"This year AAAI got 1591 accepted papers among which about 140 are graph-related."
Can't have a conference without mentioning Galkin's coverage of knowledge graphs!
What's popular:
Dumping knowledge graphs on Language Models…
Entity matching over knowledge graphs with different schemas…
Temporal knowledge graphs aka dynamic graphs…
And for those building goal-oriented bots 👇, check out the Schema-Guided Dialogue State Tracking workshop paper:
Knowledge Graphs @ AAAI 2020
The first major AI event of 2020 is already here! Hope you had a nice holiday break 🎄, or happy New Year if your…
medium.com
Compressing BERT
Peeps dropped a new BERT on Hugging Face's community library. It's a compressed version of BERT, and it outperforms the distilled version on 6 GLUE tasks (it's actually comparable to the base model)! This is great for those looking to save money on compute! (like me 😁)
canwenxu/BERT-of-Theseus-MNLI · Hugging Face
See our paper "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing". BERT-of-Theseus is a new compressed…
huggingface.co
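If you want to try it, the checkpoint loads like any other model on the hub. A quick sketch with the transformers library; the model ID comes from the card above, but the rest of the wiring (and the interpretation of the three NLI labels) is my assumption:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# BERT-of-Theseus fine-tuned on MNLI, from the model card above
model_id = "canwenxu/BERT-of-Theseus-MNLI"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # probabilities over the three NLI labels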
DeepMind Keeps It PG-19
The research giant released a new transformer, the Compressive Transformer, and a new dataset for language modeling, PG-19.
The dataset consists of 28,000 books from Project Gutenberg published before 1919.
The new model tackles the memory constraints of current transformers by maintaining long-range context across book-length text.
"The Compressive Transformer is able to produce narrative in a variety of styles, from multi-character dialogue, first-person diary entries, or third-person prose."
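The mechanism, roughly: where Transformer-XL discards activations once they fall out of its memory window, the Compressive Transformer squashes them into a coarser secondary memory and keeps attending over both. A toy sketch of that bookkeeping, using mean-pooling as the compression function (the paper explores several; all shapes and rates here are illustrative, not the paper's code):

```python
import torch

def update_memories(memory, comp_memory, new_acts, mem_len=4, comp_rate=2):
    """Toy Compressive Transformer bookkeeping for one layer.
    memory:      recent activations kept at full resolution, (n, dim)
    comp_memory: older activations compressed by `comp_rate`, (m, dim)
    new_acts:    activations from the current segment, (seq, dim)
    """
    memory = torch.cat([memory, new_acts], dim=0)
    if memory.size(0) > mem_len:
        overflow = memory[:-mem_len]   # oldest activations fall out
        memory = memory[-mem_len:]
        # Compress by mean-pooling every `comp_rate` steps (one of the
        # compression functions considered in the paper).
        n = overflow.size(0) // comp_rate * comp_rate
        pooled = overflow[:n].reshape(-1, comp_rate, overflow.size(1)).mean(1)
        comp_memory = torch.cat([comp_memory, pooled], dim=0)
    return memory, comp_memory

dim = 8
memory, comp_memory = torch.zeros(0, dim), torch.zeros(0, dim)
for _ in range(3):  # three segments of length 4
    memory, comp_memory = update_memories(memory, comp_memory, torch.randn(4, dim))
# Attention at each step would span [comp_memory, memory, current segment].
print(memory.shape, comp_memory.shape)
```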
Blog:
A new model and dataset for long-range memory
This blog introduces a new long-range memory model, the Compressive Transformer, alongside a new benchmark for…
deepmind.com
DeepSpeed and ZeRO Cool
Microsoft had to remind big tech it still has a few aces up its sleeve. It released a new library called DeepSpeed. What is it?
"…[It] is a new parallelized optimizer that greatly reduces the resources needed for model and data parallelism while massively increasing the number of parameters that can be trained. Researchers have used these breakthroughs to create Turing Natural Language Generation (Turing-NLG), the largest publicly known language model at 17 billion parameters, which you can learn more about in this accompanying blog post."
Yes, you read that correctly: 17 billion parameters. And it's compatible with PyTorch. The new library makes it possible to train even larger transformers, and to do so more efficiently!
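Integration is designed to sit lightly on existing PyTorch code: wrap the model once and let the engine own the training step. A hedged sketch of that wrapping (the model is a stand-in, the config values are illustrative, and the script would be launched with the deepspeed CLI):

```python
import torch
import deepspeed

# Stand-in model; in practice this would be your transformer.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 1024),
)

# ds_config.json carries batch size, fp16, and ZeRO settings, e.g.:
# {"train_batch_size": 32,
#  "fp16": {"enabled": true},
#  "zero_optimization": {"stage": 1},
#  "optimizer": {"type": "Adam", "params": {"lr": 1e-4}}}
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",
)

# The engine replaces the usual forward / backward / step calls:
#   loss = compute_loss(model_engine(batch))
#   model_engine.backward(loss)
#   model_engine.step()
```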
ZeRO & DeepSpeed: New system optimizations enable training models with over 100 billion parameters…
The latest trend in AI is that larger natural language models provide better accuracy; however, larger models are…
www.microsoft.com
Open Domain QA Strikes Back!
Open Domain QA has pretty much stalled over the last few years. Ever since Facebook dropped DrQA, the field hasn't seen much improvement. That changed a few days ago, when a new model from Google beat the previous SOTA on the Natural Questions benchmark by several points!
A thread written by @kelvin_guu
New from Google Research! REALM: http://realm.page.link/paper We pretrain an LM that sparsely attends over all of…
threader.app
P.S. They plan to open-source it.
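The retrieve-then-predict loop the thread describes boils down to: embed the query, take inner products against a pre-computed document index, keep the top-k (that's the sparsity), and marginalize the answer distribution over those documents. A toy numpy sketch with a stand-in reader model (nothing here is the released code):

```python
import numpy as np

rng = np.random.default_rng(0)
num_docs, dim = 1000, 128

doc_embeds = rng.standard_normal((num_docs, dim))  # pre-computed document index
query = rng.standard_normal(dim)                   # encoded query

# Dense retrieval: inner-product scores over the whole corpus,
# softmax-normalized over the top-k documents only.
scores = doc_embeds @ query
topk = np.argsort(scores)[-5:]
p_z = np.exp(scores[topk] - scores[topk].max())
p_z /= p_z.sum()

def p_answer_given_doc(doc_id):
    """Stand-in for the reader model p(y | x, z)."""
    rng_d = np.random.default_rng(doc_id)
    probs = rng_d.random(10)  # 10 candidate answers
    return probs / probs.sum()

# Marginalize over retrieved documents: p(y|x) = sum_z p(y|x,z) * p(z|x)
p_y = sum(w * p_answer_given_doc(int(z)) for w, z in zip(p_z, topk))
print(p_y.argmax())  # predicted answer index
```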
Why Models Fail
Oh boy, sometimes things don't go as expected when releasing models into the wild. We all know the angst of watching SOTA models underperform because the environment has shifted (the real world is adaptive, not static).
Hady Elsahar discusses domain shift and its effect on model performance in this intuitive Medium piece:
Predicting when ML models fail in production
To Annotate or Not? Predicting Performance Drop under Domain Shift. An #EMNLP2019 paper by Hady Elsahar and Matthias…
medium.com
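One simple early-warning signal in this spirit: track how the model's softmax confidence shifts between held-out source data and live target data. A toy sketch (this is a generic proxy, not necessarily the exact estimator from the paper):

```python
import numpy as np

def mean_confidence(probs):
    """Average max-softmax confidence over a batch of predictions."""
    return float(np.max(probs, axis=1).mean())

def confidence_drop(source_probs, target_probs):
    """Positive values suggest the target domain is harder for the model,
    a cheap early warning for performance drop under domain shift."""
    return mean_confidence(source_probs) - mean_confidence(target_probs)

rng = np.random.default_rng(0)
# Stand-in softmax outputs: peaked in-domain, flatter out-of-domain.
source = rng.dirichlet(alpha=[5, 1, 1], size=500)
target = rng.dirichlet(alpha=[2, 2, 2], size=500)
print(f"confidence drop: {confidence_drop(source, target):.3f}")
```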
Kaggle, Colab, and Mr. Burns
Now on Kaggle, you can use up to 30 hours per week of TPUs and up to 3 hours at a time in a single session.
Tensor Processing Units (TPUs) Documentation
Kaggle is the world's largest data science community with powerful tools and resources to help you achieve your dataβ¦
www.kaggle.com
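Hooking a notebook up to the accelerator follows the stock TensorFlow TPU recipe. A sketch using the standard tf.distribute calls (exact API names have moved between TF versions, so treat this as illustrative):

```python
import tensorflow as tf

# Standard TPU initialization; on Kaggle the resolver finds the TPU
# attached to the notebook session.
try:
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)
    print("Replicas:", strategy.num_replicas_in_sync)  # 8 cores on a v3-8
except ValueError:
    strategy = tf.distribute.get_strategy()  # fall back to CPU/GPU
    print("No TPU found; using default strategy.")

with strategy.scope():
    # Any model built here is replicated across the TPU cores.
    model = tf.keras.Sequential([tf.keras.layers.Dense(2, activation="softmax")])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```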
Colab:
Introducing Colab Pro: for $10 a month, it delivers better GPUs and longer runtimes:
Google introduces Colab Pro w/ faster GPUs, more memory – 9to5Google
Google Colab is a useful tool for data scientists and AI researchers sharing work online. The company this week quietlyβ¦
9to5google.com
Google be like:
Dataset of the Week: WinoGrande
What is it?
WinoGrande is a fill-in-the-blank task with binary options: the goal is to choose the right option for a given sentence, and doing so requires commonsense reasoning.
Sample:
Sentence: Katrina had the financial means to afford a new car while Monica did not, since _ had a high paying job.
Option1: Katrina
Option2: Monica
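A common zero-shot baseline for this format: drop each option into the blank and let a language model score the two completed sentences. A sketch with GPT-2 via the transformers library (my choice of scorer for illustration, not the dataset's official baseline):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sentence_logprob(text):
    """Total log-probability of a sentence under the LM."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    return -out.loss.item() * (ids.size(1) - 1)  # loss is mean NLL per token

sentence = ("Katrina had the financial means to afford a new car while "
            "Monica did not, since _ had a high paying job.")
options = ["Katrina", "Monica"]
scores = [sentence_logprob(sentence.replace("_", o)) for o in options]
print(options[int(torch.tensor(scores).argmax())])  # expected: Katrina
```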
Where is it?
allenai/winogrande
Version 1.1 (Dec 2nd, 2019) Download dataset by download_winogrande.sh ./data/ └── train_[xs,s,m,l,xl].jsonl # training…
github.com
Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.
If you enjoyed this article, help us out and share with friends or social media!
For complete coverage, follow us on Twitter: @Quantum_Stat
Join over 80,000 data leaders on the AI newsletter and keep up to date with the latest developments in AI, from research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI