NLP News Cypher | 02.23.20
Author(s): Ricky Costa
Originally published on Towards AI.
NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER
Dacă nu riști nu câștigi
If you are wondering what the subtitle means, it's Romanian for "If you don't risk, you don't win."
How was your week?
Last week we updated The Big Bad NLP Database once more, thank you to Ayman Alhelbawy for contributing!
This Week:
Los Trends
The Truth Hurts
Summarize Me
Hugging NER
Hey Neighbor, I'm a Language Model
Tip of the (Red) Hat
The Annotated GPT-2
BERT Learns to Code
Dataset of the Week: Dialogue NLI
Los Trends
O'Reilly, the creator of all those books we read, analyzed their online learning platform to gain insight into the most popular trends in tech.
TL;DR:
- Python is killing it (R.I.P. to R)
- Cloud usage is killing it (micro-services and their containers)
- Interest in NLP grew 22% in 2019. 👀
Full Picture:
5 key areas for tech leaders to watch in 2020
O'Reilly online learning contains information about the trends, topics, and issues tech leaders need to watch and…
www.oreilly.com
The Truth Hurts
The Allen Institute has plenty of demos for you to play with. And recently, one demo caught my eye on Twitter. This model is able to judge whether a statement is true or false based on the conditions you assign it in natural language. According to AI2:
"The ROVER (Reasoning Over Rules) model reads in a new rulebase of natural language facts and rules, and decides the truth value of new statements under closed world (and negation as failure) assumptions."
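To get a feel for the closed-world and negation-as-failure semantics ROVER learns, here is a toy symbolic version. The facts and rules are made up for illustration; ROVER itself does this end-to-end over natural language with a transformer, not a symbolic engine.

```python
# Toy closed-world reasoning with negation as failure (illustrative only;
# not AI2's ROVER, which reads the rules as natural language).
facts = {"the nail is metal"}
rules = [
    # (premises that must hold, premises that must FAIL, conclusion)
    ({"the nail is metal"}, set(), "the nail conducts electricity"),
    (set(), {"the feather is metal"}, "the feather is an insulator"),
]

def closure(facts, rules):
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for pos, neg, concl in rules:
            # Negation as failure: a negated premise holds when the fact
            # is absent from everything derived so far (assumes the rules
            # are stratified so this stays sound).
            if pos <= derived and not (neg & derived) and concl not in derived:
                derived.add(concl)
                changed = True
    return derived

kb = closure(facts, rules)
print("the nail conducts electricity" in kb)  # True
print("the nail can fly" in kb)               # False: closed world -> false
```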
Demo:
ROVER: Reasoning Over Rules
rule-reasoning.apps.allenai.org
Summarize Me
Summarizing neat and structured text is hard. But summarizing forum/conversational data is even harder. Peeps out of Microsoft Asia came out with a new paper on extractive summarization discussing the use of attention.
The model achieves SOTA ROUGE scores on the Trip Advisor forum discussion dataset!
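For intuition, here's a generic attention-style recipe for extractive summarization: score each sentence by how much the rest of the thread "attends" to it, then keep the top-k. A hedged sketch of the general idea, not necessarily this paper's exact model:

```python
# Hedged sketch: rank sentences by the softmax attention they receive
# from all other sentences, then extract the top-k as the summary.
import numpy as np

def extract_summary(sent_embs, k=2):
    """sent_embs: [n_sentences, dim] array of sentence embeddings
    (from any encoder); returns indices of the k extracted sentences."""
    sims = sent_embs @ sent_embs.T                        # pairwise similarities
    np.fill_diagonal(sims, -np.inf)                       # no self-attention
    attn = np.exp(sims - sims.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)               # row-wise softmax
    centrality = attn.sum(axis=0)                         # attention received
    return np.argsort(-centrality)[:k]
```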
Hugging NER
Hey, wanna do some named entity recognition using SOTA transformers via Hugging Face's library? Want the code? 👇
Colab:
Google Colaboratory
colab.research.google.com
GitHub:
huggingface/transformers
🤗 Transformers: State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch. …
github.com
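If you want a taste before opening the Colab, here's a minimal sketch using the library's pipeline API. The default pretrained model it downloads and the output fields are the library's choices; this isn't the notebook's code.

```python
# Minimal NER with Hugging Face transformers (sketch, not the Colab).
from transformers import pipeline

ner = pipeline("ner")  # downloads a default pretrained NER model

text = "Hugging Face is based in New York City."
for entity in ner(text):
    # Each prediction carries the token, its entity tag, and a score.
    print(entity["word"], entity["entity"], round(entity["score"], 3))
```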
Hey Neighbor, Iβm a Language Model
Applying k-Nearest Neighbors (kNN) retrieval to a language model can give a boost in performance, according to new research from Facebook AI Research and Stanford. On the WikiText-103 language modeling benchmark, this new hack gives a 2.9-point improvement in perplexity without any additional training!
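The recipe, roughly: store (context embedding → next token) pairs from the training data, and at test time interpolate the base LM's next-token distribution with one built from the k nearest stored contexts. Here's a hedged NumPy sketch of that interpolation; all names and defaults are illustrative, not the paper's code.

```python
# kNN-LM style interpolation (sketch). keys/values form the datastore:
# keys are context embeddings, values the token ids that followed them.
import numpy as np

def knn_lm_probs(p_lm, query, keys, values, vocab_size, k=8, lam=0.25, temp=1.0):
    dists = np.linalg.norm(keys - query, axis=1)   # distance to stored contexts
    nn = np.argsort(dists)[:k]                     # k nearest neighbors
    weights = np.exp(-dists[nn] / temp)            # closer neighbors weigh more
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    for w, v in zip(weights, values[nn]):          # aggregate votes per token
        p_knn[v] += w
    # Interpolate the retrieval distribution with the base LM's distribution.
    return lam * p_knn + (1.0 - lam) * p_lm
```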
Paper:
Tip of the (Red) Hat
Red Hat came out with a new report on the state of open source software! According to the report, enterprise use of proprietary software is expected to plunge over the next few years while open-source adoption grows. They also share why open source is so popular (besides being free) and its top areas of adoption in the enterprise. If you sell open-source software, you should check this out:
The Annotated GPT-2
Hey, remember The Annotated Transformer? That was when Mr. Rush annotated the famous "Attention Is All You Need" paper with code. Well, it looks like we now have one for GPT-2, from Mr. Arora. Enjoy!
GPT-2:
The Annotated GPT-2
Introduction Prerequisites Language Models are Unsupervised Multitask Learners Abstract Model Architecture (GPT-2)…
amaarora.github.io
BERT Learns to Code
Microsoft is using BERT to code 👀. In a paper released this month, their research team pretrained a model on natural language text and code from several programming languages. Specifically, it was trained with both natural language–code pairs and with unimodal data (code without paired natural language).
The model achieves SOTA on downstream tasks such as natural language code search and code-to-documentation generation.
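For intuition, here's a hedged sketch of how a bimodal (natural language, code) pair might be packed into a single masked-language-model training example; the tokens and masking details are illustrative, not Microsoft's actual preprocessing.

```python
# Sketch: pack an NL-code pair into one masked-LM training example.
import random

def pack_pair(nl_tokens, code_tokens, mask_prob=0.15):
    seq = ["[CLS]"] + nl_tokens + ["[SEP]"] + code_tokens + ["[SEP]"]
    labels = [None] * len(seq)                 # prediction targets
    for i, tok in enumerate(seq):
        if tok in ("[CLS]", "[SEP]"):
            continue
        if random.random() < mask_prob:        # mask a fraction of tokens...
            labels[i] = tok                    # ...remembering the original
            seq[i] = "[MASK]"
    return seq, labels

tokens, targets = pack_pair(
    ["return", "the", "maximum", "value"],
    ["def", "max_val", "(", "xs", ")", ":", "return", "max", "(", "xs", ")"])
```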
Paper:
Dataset of the Week: Dialogue NLI
What is it?
The dataset consists of sentence pairs labeled as entailment, neutral, or contradiction for the task of natural language inference.
Where is it?
Dialogue Natural Language Inference
Abstract: Consistency is a long standing issue faced by dialogue models. In this paper, we frame the consistency of…
wellecks.github.io
Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.
If you enjoyed this article, help us out and share it with friends or on social media!
For complete coverage, follow our Twitter: @Quantum_Stat
Published via Towards AI