NLP News Cypher | 07.19.20
Last Updated on July 24, 2023 by Editorial Team
Author(s): Ricky Costa
Originally published on Towards AI.
NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER
NLP News Cypher | 07.19.20
Modularity
Twitter's Help Desk had a productive work week. How productive? 👇
In case you missed it, celebrity Twitter accounts were hacked in a bitcoin Ponzi scheme. As Twitter and co. scrambled to put the fire out, they deactivated all blue-check accounts. A glitch in the matrix!
A couple of days later, select Cloudflare servers went dark as the company blamed bad routing for a drop in services 🧐 (sorry, Discord). To say the least, tech had a rough week.
But that didn't stop me from randomly browsing the darknet to find out more about the recent hacking. And I found nothing! Yay! However, I did discover that the US Secret Service purchased a 4-year contract for crypto software from Coinbase, a digital currency exchange. Yep, it's even in the US Gov's public filings:
beta.SAM.gov
beta.sam.gov
Why does that matter? According to Benzinga, "Coinbase also collects private user data as part of the anti-money-laundering requirements on its platforms." 🙈
Also, ICML happened! Here are some awesome papers:
Stanford AI Lab Papers and Talks at ICML 2020
The International Conference on Machine Learning (ICML) 2020 is being hosted virtually from July 13th – July 18th…
ai.stanford.edu
Highlight: Graph-based, Self-Supervised Program Repair from Diagnostic Feedback
Google at ICML 2020
Machine learning is a key strategic focus at Google, with highly active groups pursuing research in virtually all…
ai.googleblog.com
Highlight: REALM: Retrieval-Augmented Language Model Pre-Training
Carnegie Mellon University at ICML 2020
Carnegie Mellon University is proud to present 44 papers at the 37th International Conference on Machine Learning (ICML…
blog.ml.cmu.edu
Highlight: XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
Facebook Research at ICML 2020
Machine learning experts from around the world are gathering virtually for the 2020 International Conference on Machine…
ai.facebook.com
Highlight: Aligned Cross Entropy for Non-Autoregressive Machine Translation
Workshop Highlight: Language in Reinforcement Learning
LaReL 2020
Language is one of the most impressive human accomplishments and is believed to be the core to our ability to learn…
larel-ws.github.io
⚡ Super Duper NLP Repo ⚡
FYI: Another 52 notebooks were added bringing us to 233 total NLP Colabs. Thank you for contributing: Manu Romero, Abhishek Mishra, Nikhil Narayan, Oleksii Trekhleb, Chris Tran, Prasanna Kumar & Cristiano De Nobili.
The Super Duper NLP Repo
Colab notebooks for various tasks in NLP
notebooks.quantumstat.com
This Week
Adapters, AdapterHub and Modularity (w/ a Top Secret Interview)
GPT-3 Aftermath
Hyperparameter Optimization Using Simple Transformers
UIs for Machine Learning Prototyping
Visualization: Set for Stun
Graph Based Deep Learning Repo
Open-Domain Conversational AI
Dataset of the Week: Critical Role Dungeons and Dragons Dataset (CRD3)
Adapters, AdapterHub and Modularity
Once in a while, cool things happen, and this past week, the AdapterHub framework dropped. In the next evolution of NLP transfer learning, adapters deliver a new (and more modular) architecture.
Research Paper (easy read) | GitHub
The Hub:
AdapterHub – 175 adapters for 21 text tasks and 32 languages
Loading existing adapters from our repository is as simple as adding one additional line of code: model =…
adapterhub.ml
Oh, we assumed most of you would say "WTF are adapters?!" As a result, we were really excited to speak with AdapterHub's author Jonas Pfeiffer to get us up to speed on everything adapters and their framework: 👇
1. Hi Jonas, congrats on your new and awesome framework AdapterHub! For those out of the loop, how would you simply describe adapters?
"Adapters are small modular units encapsulated within every layer of a transformer model, which learn to store task- or language-specific information. This is achieved by training *only* the newly introduced adapter weights, while keeping the rest of the pre-trained model fixed. The most fascinating concept about adapters is their modularity, which opens up many possibilities of combining the knowledge from many adapters trained on a multitude of tasks. In order to make training adapters and subsequently sharing them as easy as possible, we have proposed the AdapterHub framework."
2. What are some advantages of adapters vs. traditional fine-tuning of pretrained models?
"There are many advantages for both NLP engineers in industry as well as researchers. For practitioners, maybe the most interesting aspect is an adapter's small storage space. Adapters only require 3.5MB (sharing >99% of the parameters across all tasks), and still achieve state-of-the-art performance. This means that, in order to store 125 adapter models on a device, you require as much space as 2 fully fine-tuned BERT models. The biggest advantage provided by adapters is due to their modularity. By freezing pre-trained model weights, traditional multi-task learning problems such as catastrophic forgetting and catastrophic interference between tasks no longer exist. Adapters can thus be trained separately on all kinds of tasks, and subsequently composed or stacked to combine the information stored within them."
3. When training an adapter, how does its training time compare with traditional fine-tuning?
"So far, we have observed that training adapters is often faster than full fine-tuning. For some setups we can see gains of up to 30% in the time required to perform a single training step. This is because we do not need to perform a backward pass through the entire model, such as the BERT embedding matrix, but also due to PyTorch optimization strategies. Unfortunately, for smaller data sets, adapters require more steps than full fine-tuning due to the random weight initialization. We believe that efficiency is a crucial property relevant to many practical applications. This is why we are currently investigating the computational efficiency of different architectures on a larger scale, including several training and inference scenarios."
4. You have created AdapterHub for the community to find, train and/or use adapters; where can one go to find more information on how they can help?
"Adapters have only been introduced recently, so the research field is quite new. We have tried to summarize our vision for adapters in the paper we published together with the AdapterHub framework. For us, AdapterHub is a long-term project which we hope the NLP community will be able to leverage in order to develop new research directions, building on adapters and their amazing modular capabilities."
5. Do you view adapters as the next important step for transfer learning in NLP?
"Sharing information across tasks has a longstanding history in machine learning, where multi-task learning has arguably received the most attention, but it comes with many issues. By first encapsulating the stored information within frozen parameters and then combining it, we are able to mitigate several of these issues. Modularity of knowledge components which can be combined on-the-fly, and at will, is extremely desirable and impactful. So yes, from my perspective, adapters are a very important and promising direction for transfer learning, and I strongly believe that they have the capacity to speed up research in this field."
Fin 👀
As you can see from Jonas's answers, this is a remarkable advancement in transfer learning and model architecture. The AdapterHub framework is built on top of Hugging Face's library and only requires 1–2 lines of code (on top of the usual code you're used to in the Transformers library) to initialize an adapter.
To show how easy it is to get started (with inference), we created a Colab with BERT stacked with an SST-2 adapter (binary sentiment analysis). Give it a whirl and don't forget to check out AdapterHub and train those adapters! And thank you to Jonas for the great intro!
Colab of the Week 🤟
Google Colaboratory
colab.research.google.com
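If you want to see the shape of the code before opening the Colab, here is a minimal inference sketch in the same spirit. It assumes the adapter-transformers package that backs AdapterHub (a drop-in replacement for Hugging Face Transformers); the adapter identifier and label order below are assumptions, so check adapterhub.ml for the exact names.
```python
# pip install adapter-transformers  -- AdapterHub's drop-in fork of Transformers
import torch
from transformers import BertTokenizer, BertModelWithHeads

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModelWithHeads.from_pretrained("bert-base-uncased")

# Pull a pre-trained SST-2 sentiment adapter (and its head) from the Hub and activate it.
# The identifier is an assumption based on the Hub's examples; verify it on adapterhub.ml.
adapter_name = model.load_adapter("sentiment/sst-2@ukp")
model.set_active_adapters(adapter_name)

# Binary sentiment inference with the base model plus the loaded adapter and head.
inputs = tokenizer("AdapterHub makes modular transfer learning painless.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs)[0]
print(logits.argmax(dim=-1).item())  # label order depends on the adapter's head
```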
GPT-3 Aftermath
GPT-3 is getting a lot of feedback this week. On his blog, Max Woolf opines on GPT-3's impressive abilities and where the language model falls short.
The TL;DR:
- Black-box issues continue.
- The model is slow.
- Model output still needs to be cherry-picked, but at better rates than GPT-2.
- Insensitive outputs are still a problem.
Blog:
Tempering Expectations for GPT-3 and OpenAI's API
On May 29th, OpenAI released a paper on GPT-3, their next iteration of Transformers-based text generation neural…
minimaxir.com
An honorable mention: Yoav Goldberg's interactions with GPT-3 are also worth checking out on his Twitter feed: https://twitter.com/yoavgo
Hyperparameter Optimization Using Simple Transformers
From the author of the Simple Transformers library, Thilina Rajapakse explores hyperparameter optimization on the Recognizing Textual Entailment (RTE) task. It's an intuitive, step-by-step guide (code included), with visualizations from the W&B Sweeps integration in his library; a rough sketch of the setup follows the link below.
Hyperparameter Optimization for Optimum Transformer Models
How to tune your hyperparameters with Simple Transformers for better Natural Language Processing.
towardsdatascience.com
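Here is a hedged sketch of what that kind of sweep can look like with Simple Transformers and W&B, not the author's exact script: the swept ranges, project name, and TSV paths are placeholders, and the RTE data is assumed to already be in the DataFrame format Simple Transformers expects.
```python
import pandas as pd
import wandb
from simpletransformers.classification import ClassificationArgs, ClassificationModel

# Define the search space for W&B Sweeps (ranges are illustrative).
sweep_config = {
    "method": "bayes",
    "metric": {"name": "eval_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 1e-5, "max": 4e-4},
        "num_train_epochs": {"min": 1, "max": 10},
    },
}
sweep_id = wandb.sweep(sweep_config, project="rte-hyperparameter-search")

# Placeholder paths; the RTE data should have the columns Simple Transformers expects.
train_df = pd.read_csv("rte_train.tsv", sep="\t")
eval_df = pd.read_csv("rte_dev.tsv", sep="\t")

def train():
    wandb.init()
    model_args = ClassificationArgs(
        learning_rate=wandb.config.learning_rate,
        num_train_epochs=int(wandb.config.num_train_epochs),
        overwrite_output_dir=True,
        wandb_project="rte-hyperparameter-search",
    )
    model = ClassificationModel("roberta", "roberta-base", args=model_args)
    model.train_model(train_df, eval_df=eval_df)
    model.eval_model(eval_df)
    wandb.join()  # wandb.finish() in newer wandb versions

wandb.agent(sweep_id, train)
```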
UIs for Machine Learning Prototyping
Want to add a quick UI to visualize your transformer model? Say hello to Gradio. The library includes Colab/Jupyter support so you can tunnel your inferences from Colab directly to the browser. It includes TensorFlow and PyTorch support, and can be used for CV and NLP demos alike.
FYI, 2 Gradio notebooks are included in the latest update of the Super Duper NLP Repo! Head there (or here) for a quick demo of its capabilities; a minimal sketch also follows the link below.
gradio-app/gradio
Quickly create customizable UI components around your TensorFlow or PyTorch models, or even arbitrary Python functions…
github.com
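As a taste of how little code a demo takes, here is a small sketch that wraps a Hugging Face sentiment pipeline in a Gradio interface; the model choice is illustrative and not tied to a specific notebook from the repo.
```python
import gradio as gr
from transformers import pipeline

# Any Python callable works with Gradio; here, a stock sentiment-analysis pipeline.
classifier = pipeline("sentiment-analysis")

def predict(text):
    result = classifier(text)[0]
    # Return a {label: confidence} dict so Gradio renders it as a label widget.
    return {result["label"]: float(result["score"])}

iface = gr.Interface(fn=predict, inputs="text", outputs="label")
# share=True tunnels the local demo to a temporary public URL, handy from Colab.
iface.launch(share=True)
```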
Visualization: Set for Stun
A new Python visualization library is out. And the optics are impressive. If you want to venture out of the matplotlib nerd world, check out Multiplex for its stunning visuals.
All it takes to draw a simple text visualization is 10 lines of code (a rough sketch follows the list below):
- 3 lines to import matplotlib, Multiplex and the visualization style;
- 3 lines to set up the visualization object, load the data and set the style;
- 4 lines to draw and show the visualization, including a title and caption.
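Roughly in that spirit, here is a sketch of the recipe. The class and method names below (Drawable, draw_text_annotation) and the style-file path are assumptions based on the repository's description, so double-check the project's examples for the exact API.
```python
# Imports and the visualization style (the style path is an assumption).
import matplotlib.pyplot as plt
from multiplex import drawable
plt.style.use("styles/multiplex.style")

# Set up the visualization object and the data to annotate.
viz = drawable.Drawable(plt.figure(figsize=(10, 2)))
text = "Multiplex builds on matplotlib to turn plain text into a visualization."

# Draw and show the visualization, including a title and caption.
viz.draw_text_annotation(text, align="justify")
viz.set_title("A simple text visualization")
viz.set_caption("Drawn with Multiplex on top of matplotlib.")
viz.show()
```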
NicholasMamo/multiplex-plot
Visualizations should tell a story, and tell it in a beautiful way. Multiplex is a visualization library for Python…
github.com
Graph Based Deep Learning Repo
This is a handy resource if you want the lowdown on graphs & deep learning. This repo contains research literature and survey reviews indexed by year and conference. 👀
naganandy/graph-based-deep-learning-literature
The repository contains links to publications in graph-based deep learning. The links to conference publications are arranged in the…
github.com
Open-Domain Conversational AI
Facebook AI and its ParlAI library are world-class when it comes to open-domain conversational agents (remember Blender?). They released an illustrative overview of what it takes to build great conversational agents, covering current research and future directions.
Dataset of the Week: Critical Role Dungeons and Dragons Dataset (CRD3)
What is it?
The dataset is collected from 159 Critical Role Dungeons and Dragons episodes transcribed to text dialogues, consisting of 398,682 turns. It also includes corresponding abstractive summaries collected from the Fandom wiki.
Sample
Where is it?
RevanthRameshkumar/CRD3
This paper describes the Critical Role Dungeons and Dragons Dataset (CRD3) and related analyses. Critical Role is an…
github.com
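If you would rather not clone the repo, one hedged option is the Hugging Face datasets library, assuming the corpus is mirrored there under the id "crd3"; the split and field names depend on that mirror, so inspect the loaded object before relying on them.
```python
from datasets import load_dataset

# Assumes a "crd3" mirror exists on the Hugging Face hub; otherwise, work with the
# JSON files in the RevanthRameshkumar/CRD3 repository directly.
crd3 = load_dataset("crd3")
print(crd3)              # available splits and features
print(crd3["train"][0])  # one dialogue-chunk / summary example
```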
Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.
If you enjoyed this article, help us out and share with friends!
For complete coverage, follow our Twitter: @Quantum_Stat
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI