
NLP News Cypher | 07.19.20

Last Updated on July 27, 2023 by Editorial Team

Author(s): Ricky Costa

Originally published on Towards AI.

Photo by mohammad alizade on Unsplash

NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER

NLP News Cypher | 07.19.20

Modularity

Twitter’s Help Desk had a productive work week. How productive? 👇

very productive

In case you missed it, celebrity Twitter accounts were hacked in a Bitcoin scam. As Twitter and co. scrambled to put the fire out, they temporarily locked all blue-check accounts. A glitch in the matrix!

A couple of days later, select Cloudflare servers went dark; the company blamed bad routing for the drop in service 🧐 (sorry, Discord). To say the least, tech had a rough week.

But that didn’t stop me from randomly browsing the darknet to find out more about the recent hacking. And I found nothing! Yay! However, I did discover that the US Secret Service purchased a 4-year contract for crypto software from Coinbase, a digital currency exchange. Yep, it’s even in the US Gov’s public filings:

beta.sam.gov

Why does that matter? According to Benzinga, “Coinbase also collects private user data as part of the anti-money-laundering requirements on its platforms.” 🙈

Also, ICML happened! Here are some awesome papers:

Stanford AI Lab Papers and Talks at ICML 2020

The International Conference on Machine Learning (ICML) 2020 is being hosted virtually from July 13th – July 18th…

ai.stanford.edu

Highlight: Graph-based, Self-Supervised Program Repair from Diagnostic Feedback

Google at ICML 2020

Machine learning is a key strategic focus at Google, with highly active groups pursuing research in virtually all…

ai.googleblog.com

Highlight: REALM: Retrieval-Augmented Language Model Pre-Training

Carnegie Mellon University at ICML 2020

Carnegie Mellon University is proud to present 44 papers at the 37th International Conference on Machine Learning (ICML…

blog.ml.cmu.edu

Highlight: XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

Facebook Research at ICML 2020

Machine learning experts from around the world are gathering virtually for the 2020 International Conference on Machine…

ai.facebook.com

Highlight: Aligned Cross Entropy for Non-Autoregressive Machine Translation

Workshop Highlight: Language in Reinforcement Learning:

LaReL 2020

Language is one of the most impressive human accomplishments and is believed to be the core to our ability to learn…

larel-ws.github.io

Honorable Mention

⚡ Super Duper NLP Repo ⚡

FYI: Another 52 notebooks were added, bringing us to 233 total NLP Colabs. Thank you for contributing: Manu Romero, Abhishek Mishra, Nikhil Narayan, Oleksii Trekhleb, Chris Tran, Prasanna Kumar & Cristiano De Nobili.

The Super Duper NLP Repo

Colab notebooks for various tasks in NLP

notebooks.quantumstat.com

This Week

Adapters, AdapterHub and Modularity (w/ a Top Secret Interview)

GPT-3 Aftermath

Hyperparameter Optimization Using Simple Transformers

UIs for Machine Learning Prototyping

Visualization: Set for Stun

Graph Based Deep Learning Repo

Open-Domain Conversational AI

Dataset of the Week: Critical Role Dungeons and Dragons Dataset (CRD3)

Adapters, AdapterHub and Modularity

Once in a while, cool things happen, and this past week, the AdapterHub framework dropped. In the next evolution of NLP transfer learning, adapters deliver a new (and more modular) architecture.

Research Paper (easy read) | GitHub

The Hub:

AdapterHub – 175 adapters for 21 text tasks and 32 languages

Loading existing adapters from our repository is as simple as adding one additional line of code: model =…

adapterhub.ml

Oh, we assumed most of you would say “WTF are adapters?!” So we were really excited to speak with AdapterHub’s author Jonas Pfeiffer to get up to speed on everything adapters and their framework: 👇

😎
  1. Hi Jonas, congrats on your new and awesome framework AdapterHub! For those out of the loop, how would you simply describe adapters?

“Adapters are small modular units encapsulated within every layer of a transformer model, which learn to store task or language specific information. This is achieved by training *only* the newly introduced adapter weights, while keeping the rest of the pre-trained model fixed. The most fascinating concept about adapters is their modularity which opens up many possibilities of combining the knowledge from many adapters trained on a multitude of tasks. In order to make training adapters and subsequently sharing them as easy as possible, we have proposed the AdapterHub framework.”
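To make Jonas’s description concrete, here is a minimal NumPy sketch of a bottleneck adapter layer: a down-projection, a nonlinearity, an up-projection, and a residual connection. The dimensions and initialization below are illustrative assumptions, not AdapterHub’s actual code:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

class BottleneckAdapter:
    """Down-project -> nonlinearity -> up-project, plus a residual connection.

    Only these two small matrices would be trained; the surrounding
    transformer weights stay frozen.
    """
    def __init__(self, hidden_dim, bottleneck_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W_down = rng.normal(0.0, 0.01, size=(hidden_dim, bottleneck_dim))
        self.W_up = rng.normal(0.0, 0.01, size=(bottleneck_dim, hidden_dim))

    def __call__(self, hidden_states):
        # The residual add keeps the pre-trained representation intact;
        # the adapter only learns a small task-specific correction.
        return hidden_states + relu(hidden_states @ self.W_down) @ self.W_up

# One adapter per transformer layer; for BERT-base (hidden size 768), a
# bottleneck of 64 gives 2 * 768 * 64 = 98,304 trainable weights per layer.
adapter = BottleneckAdapter(hidden_dim=768, bottleneck_dim=64)
out = adapter(np.ones((2, 768)))  # a toy batch of 2 token vectors
print(out.shape)
```

Because the output stays the same shape as the input, these units can be dropped into every transformer layer without touching the rest of the architecture, which is what makes them so easy to swap, stack, and share.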

2. What are some advantages of adapters vs. traditional fine-tuning of pretrained models?

“There are many advantages for both NLP engineers in industry as well as researchers. For practitioners, maybe the most interesting concept is an adapter’s small storage space. Adapters only require 3.5MB (sharing >99% of the parameters across all tasks), and still achieve state-of-the-art performance. This means that in order to store 125 adapter models on a device, you require as much space as 2 fully fine-tuned BERT models. The biggest advantage provided by adapters is their modularity. By freezing pre-trained model weights, traditional multi-task learning problems such as catastrophic forgetting and catastrophic interference between tasks no longer exist. Adapters can thus be trained separately on all kinds of tasks, and subsequently composed or stacked to combine the information stored within them.”
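The storage figures quoted above are easy to sanity-check. Assuming a fully fine-tuned BERT-base checkpoint of roughly 440MB (an approximation on our part; the interview doesn’t give the exact figure), 125 tasks stored as adapters come out to about the size of two full models:

```python
bert_mb = 440.0      # rough size of one fine-tuned BERT-base checkpoint (assumption)
adapter_mb = 3.5     # per-task adapter size quoted in the interview
n_tasks = 125

full_finetuning = n_tasks * bert_mb             # one full model per task
with_adapters = bert_mb + n_tasks * adapter_mb  # one shared backbone + 125 adapters

print(full_finetuning)          # 55000.0 MB for 125 fully fine-tuned models
print(with_adapters)            # 877.5 MB, i.e. roughly 2 BERT-sized models
print(with_adapters / bert_mb)
```

That’s about a 60x reduction in disk footprint for the same 125 tasks, which is the practical payoff of sharing >99% of the parameters.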

3. When training an adapter, how does its training time compare with traditional fine-tuning?

“So far, we have observed that training adapters is often faster than full fine-tuning. For some setups we see gains of up to 30% in the time required to perform a single training step. This is because we do not need to perform a backward pass through the entire model (e.g., the BERT embedding matrix), but also due to PyTorch optimization strategies. Unfortunately, for smaller datasets, adapters require more steps than full fine-tuning due to the random weight initialization. We believe that efficiency is a crucial property relevant to many practical applications. This is why we are currently investigating the computational efficiency of different architectures on a larger scale, including several training and inference scenarios.”

4. You have created AdapterHub for the community to find, train and/or use adapters; where can one go to find more information on how they can help?

“Adapters have only been introduced recently, so the research field is quite new. We have tried to summarize our vision about adapters in our paper which we have published together with the AdapterHub framework. For us the AdapterHub is a long term project which we are hoping that the NLP community will be able to leverage in order to develop new research directions, building on adapters and their amazing modular capabilities.”

5. Do you view adapters as the next important step for transfer learning in NLP?

“Sharing information across tasks has a longstanding history in machine learning where multi-task learning has arguably received the most attention, coming with many issues. By first encapsulating the stored information within frozen parameters and then combining it, we are able to mitigate several of these issues. Modularity of knowledge components which can be combined on-the-fly, and at will, is extremely desirable and impactful. So yes, from my perspective adapters are a very important and promising direction for transfer learning and I strongly believe that they have the capacity to speed up research in this field.”

Fin 👀

As you can see from Jonas’s answers, this is a remarkable advancement in transfer learning and model architecture. The AdapterHub framework is built on top of Hugging Face’s library and only requires 1–2 lines of code (on top of the usual code you’re used to in the Transformers library) to initialize an adapter.

To show how easy it is to get started (with inference), we created a Colab with BERT stacked with an SST-2 adapter (binary sentiment analysis). Give it a whirl and don’t forget to check out AdapterHub and train those adapters! And thank you to Jonas for the great intro!

Colab of the Week 🤟

Google Colaboratory

colab.research.google.com

GPT-3 Aftermath

GPT-3’s getting a lot of feedback this week. On his blog, Max Woolf opines on GPT-3’s impressive abilities and where the language model falls short.

The TL;DR:

  • Blackbox issues continue.
  • Model is slow.
  • Model output still needs to be cherry-picked but at better rates than GPT-2.
  • Insensitive outputs still a problem.

Blog:

Tempering Expectations for GPT-3 and OpenAI's API

On May 29th, OpenAI released a paper on GPT-3, their next iteration of Transformers-based text generation neural…

minimaxir.com

Honorable mention: Yoav Goldberg’s interactions with GPT-3 are also worth checking out on his Twitter feed: https://twitter.com/yoavgo

Hyperparameter Optimization Using Simple Transformers

From the author of the Simple Transformers library, Thilina Rajapakse explores hyperparameter optimization on the Recognizing Textual Entailment (RTE) task. An intuitive step-by-step guide (code included), with visualizations from the W&B Sweeps integration in his library.
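The post drives the search through W&B Sweeps and Simple Transformers; the core idea — sample configurations from a search space and keep the best-scoring one — can be sketched in a few lines of plain Python. The search space and scoring function below are stand-ins for illustration, not the post’s actual setup:

```python
import random

random.seed(0)

# Hypothetical search space for fine-tuning on RTE
search_space = {
    "learning_rate": [1e-5, 2e-5, 3e-5, 5e-5],
    "num_train_epochs": [2, 3, 4],
    "train_batch_size": [8, 16, 32],
}

def sample_config(space):
    """Random search: draw one value per hyperparameter."""
    return {name: random.choice(values) for name, values in space.items()}

def evaluate(config):
    # Stand-in for training + scoring; in the post this is
    # model.train_model(...) followed by evaluation on the RTE dev set.
    return random.random()

trials = [(evaluate(cfg), cfg) for cfg in (sample_config(search_space) for _ in range(10))]
best_score, best_config = max(trials, key=lambda t: t[0])
print(best_config)
```

A sweep tool adds the useful parts around this loop — parallel agents, early stopping, and the parameter-importance plots shown in the article — but the underlying search is this simple.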

Hyperparameter Optimization for Optimum Transformer Models

How to tune your hyperparameters with Simple Transformers for better Natural Language Processing.

towardsdatascience.com

UIs for Machine Learning Prototyping

Want to add a quick UI to visualize your transformer model? Say hello to Gradio. The library includes Colab/Jupyter support so you can tunnel your inferences from Colab directly to the browser. It includes TensorFlow and PyTorch support, and can be used for CV and NLP demos alike.

FYI, 2 Gradio notebooks are included in the latest update of the Super Duper NLP Repo! Head there (or here) for a quick demo of its capabilities.

gradio-app/gradio

Quickly create customizable UI components around your TensorFlow or PyTorch models, or even arbitrary Python functions…

github.com

Paper:

LINK

Visualization: Set for Stun

A new Python visualization library is out. And the optics are impressive. If you want to venture out of the matplotlib nerd world, check out Multiplex for its stunning visuals.

All it takes to draw a simple text visualization is 10 lines of code:

  1. 3 lines to import matplotlib, Multiplex and the visualization style;
  2. 3 lines to set up the visualization object, load the data and set the style;
  3. 4 lines to draw and show the visualization, including a title and caption.

NicholasMamo/multiplex-plot

Visualizations should tell a story, and tell it in a beautiful way. Multiplex is a visualization library for Python…

github.com

Graph Based Deep Learning Repo

This is a handy resource if you want the lowdown on graphs & deep learning. The repo contains research literature/survey reviews indexed by year and conference. 👀

naganandy/graph-based-deep-learning-literature

The repository contains links to publications in graph-based deep learning. The links to conference publications are arranged in the…

github.com

Open-Domain Conversational AI

Facebook AI and its ParlAI library are world-class when it comes to open-domain conversational agents (remember Blender?). They released an illustrative overview of what it takes to build great conversational agents, current research, and future directions.

LINK

Dataset of the Week: Critical Role Dungeons and Dragons Dataset (CRD3)

What is it?

The dataset was collected from 159 Critical Role Dungeons & Dragons episodes transcribed to text dialogues, consisting of 398,682 turns. It also includes corresponding abstractive summaries collected from the Fandom wiki.

Sample

Where is it?

RevanthRameshkumar/CRD3

This paper describes the Critical Role Dungeons and Dragons Dataset (CRD3) and related analyses. Critical Role is an…

github.com

Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.

If you enjoyed this article, help us out and share with friends!

For complete coverage, follow our Twitter: @Quantum_Stat

www.quantumstat.com


Published via Towards AI
