

NLP News Cypher | 02.23.20


Last Updated on July 24, 2023 by Editorial Team

Author(s): Ricky Costa

Originally published on Towards AI.

Photo by Massimiliano Morosinotto on Unsplash

NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER


Dacă nu riști nu câștigi

If you are wondering what the subtitle means, it’s Romanian: “if you don’t risk, you don’t win.” In other words, nothing ventured, nothing gained.

How was your week?

Last week we updated The Big Bad NLP Database once more; thank you to Ayman Alhelbawy for contributing!

This Week:

Los Trends

The Truth Hurts

Summarize Me

Hugging NER

Hey Neighbor, I’m a Language Model

Tip of the (Red) Hat

The Annotated GPT-2

BERT Learns to Code

Dataset of the Week: Dialogue NLI

Los Trends

O’Reilly, the creator of all those books we read, analyzed their online learning platform to gain insight into the most popular trends in tech.

TL;DR:

  1. Python is killing it (R.I.P. to R)
  2. Cloud usage is killing it (micro-services and their containers)
  3. Interest in NLP grew +22% in 2019. 👀

Full Picture:

5 key areas for tech leaders to watch in 2020

O'Reilly online learning contains information about the trends, topics, and issues tech leaders need to watch and…

www.oreilly.com

The Truth Hurts

The Allen Institute has plenty of demos for you to play with. And recently, one demo caught my eye on Twitter. This model is able to judge whether a statement is true or false based on the conditions you assign it in natural language. According to AI2:

“The ROVER (Reasoning Over Rules) model reads in a new rulebase of natural language facts and rules, and decides the truth value of new statements under closed world (and negation as failure) assumptions.”

Demo:

ROVER: Reasoning Over Rules


rule-reasoning.apps.allenai.org
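The closed-world, negation-as-failure semantics that ROVER is trained to emulate can be sketched with a tiny forward-chaining rule engine. This is a toy illustration in plain Python with made-up facts and rules, not AI2’s neural model, which reads the rulebase as natural language:

```python
# Facts and rules are made-up atoms of the form (predicate, entity).
facts = {("kind", "alan")}

# Each rule: (body literals, head atom). A body literal ("not", atom)
# holds when the atom cannot be derived -- negation as failure.
rules = [
    ([("kind", "alan")], ("nice", "alan")),
    ([("nice", "alan"), ("not", ("rich", "alan"))], ("happy", "alan")),
]

def holds(lit, known):
    if lit[0] == "not":
        return lit[1] not in known  # negation as failure
    return lit in known

def closure(facts, rules):
    # Forward-chain until no rule adds a new fact.
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in known and all(holds(l, known) for l in body):
                known.add(head)
                changed = True
    return known

known = closure(facts, rules)
# Closed world: anything not derived is judged false.
print(("happy", "alan") in known)  # True: alan is nice and not provably rich
print(("rich", "alan") in known)   # False under the closed-world assumption
```

The demo does exactly this kind of judgment, except the rulebase and queries are free-form English sentences rather than symbolic atoms.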

Summarize Me

Summarizing neat and structured text is hard, but summarizing forum/conversational data is even harder. Folks out of Microsoft Research Asia came out with a new paper on extractive summarization discussing the use of attention.

The model achieves SOTA ROUGE scores on the Trip Advisor forum discussion dataset!

LINK
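For intuition, here is extractive summarization at its simplest: score each sentence against the whole document and keep the top scorers. This toy centroid-similarity baseline is not the paper’s attention-based model, and the forum thread below is invented:

```python
from collections import Counter
import math

def bow(text):
    # Bag-of-words vector as a Counter over lowercase tokens.
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def extract_summary(sentences, k=1):
    # Rank sentences by similarity to the document centroid,
    # then keep the top-k in their original order.
    doc = bow(" ".join(sentences))
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(bow(sentences[i]), doc),
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]

forum_thread = [
    "The hotel pool was closed for repairs during our stay.",
    "Has anyone stayed at the downtown location recently?",
    "Staff were friendly and the pool issue was the only complaint.",
]
print(extract_summary(forum_thread, k=1))
```

Systems like the one in the paper are evaluated with ROUGE, which compares the extracted sentences against human-written reference summaries.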

Hugging NER

Hey, wanna do some named entity recognition using SOTA transformers via Hugging Face’s library? Want the code? 👇

Colab:

Google Colaboratory


colab.research.google.com

GitHub:

huggingface/transformers

🤗 Transformers: State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch. …

github.com
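As a taste of what NER post-processing involves, here is the BIO-tag grouping step that turns token-level predictions into entity spans. Hugging Face’s pipeline handles this aggregation for you; this plain-Python sketch with hand-made tags just shows the idea:

```python
def group_bio(tokens, tags):
    """Collapse token-level BIO tags into (entity_text, entity_type) spans."""
    entities, current, ctype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:  # flush the previous entity
                entities.append((" ".join(current), ctype))
            current, ctype = [tok], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == ctype:
            current.append(tok)  # continue the current entity
        else:
            if current:
                entities.append((" ".join(current), ctype))
            current, ctype = [], None
    if current:
        entities.append((" ".join(current), ctype))
    return entities

# Hand-made example: tags as a token classifier might emit them.
tokens = ["Hugging", "Face", "is", "based", "in", "New", "York"]
tags   = ["B-ORG", "I-ORG", "O", "O", "O", "B-LOC", "I-LOC"]
print(group_bio(tokens, tags))
# [('Hugging Face', 'ORG'), ('New York', 'LOC')]
```

In the library itself, the transformer labels each subword token and the pipeline aggregates them into spans like these.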

Hey Neighbor, I’m a Language Model

Applying k-nearest-neighbors (kNN) retrieval to a language model can give a boost in performance, according to new research from Facebook Research and Stanford. On the Wikitext-103 language modeling dataset, this new hack improves perplexity by 2.9 points without any additional training!

Paper:

LINK
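The trick can be sketched in a few lines: retrieve the k nearest stored (context vector, next token) pairs, turn their distances into a distribution over the vocabulary, and interpolate with the base LM’s distribution. Everything below (the array sizes, the `lam` value, the random datastore) is a toy stand-in for the paper’s setup, where the datastore holds transformer hidden states:

```python
import numpy as np

def knn_lm_probs(query, keys, vals, p_lm, k=3, lam=0.25, vocab_size=5):
    """Interpolate a kNN next-token distribution with a base LM's."""
    # Distance from the query context vector to every stored context.
    d = np.linalg.norm(keys - query, axis=1)
    nn = np.argsort(d)[:k]
    # Softmax over negative distances of the k neighbors.
    w = np.exp(-d[nn])
    w /= w.sum()
    # Scatter neighbor weights onto their stored next tokens.
    p_knn = np.zeros(vocab_size)
    for weight, tok in zip(w, vals[nn]):
        p_knn[tok] += weight
    # Final distribution: lambda * kNN + (1 - lambda) * LM.
    return lam * p_knn + (1 - lam) * p_lm

rng = np.random.default_rng(0)
keys = rng.normal(size=(10, 4))          # 10 cached context vectors
vals = rng.integers(0, 5, size=10)       # the token that followed each
p_lm = np.full(5, 0.2)                   # uniform base LM over 5 tokens
p = knn_lm_probs(keys[0], keys, vals, p_lm)
print(round(p.sum(), 6))  # 1.0 -- still a valid probability distribution
```

The “no additional training” claim follows directly from the structure: the base model is frozen, and only the datastore lookup and the interpolation weight are added at inference time.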

Tip of the (Red) Hat

Red Hat came out with a new report on the state of open source software! According to the report, use of proprietary software is expected to decline over the next few years while use of open-source software is expected to grow. They also share why open source is so popular (besides being free) and the top areas of enterprise adoption. If you sell open-source software, you should check this out:

LINK

The Annotated GPT-2

Hey, remember the Annotated Transformer? That was when Mr. Rush annotated the famous “Attention Is All You Need” paper with code. Well, it looks like we now have one for GPT-2 from Mr. Arora. Enjoy!

GPT-2:

The Annotated GPT-2

Introduction Prerequisites Language Models are Unsupervised Multitask Learners Abstract Model Architecture (GPT-2)…

amaarora.github.io

BERT Learns to Code

Microsoft is using BERT to code 👀. In a paper released this month, their research team pretrained a model on natural language text and code from several programming languages. Specifically, it was trained on both bimodal data (natural language paired with code) and unimodal data (code without natural language pairs).

The model achieves SOTA on downstream tasks such as natural language code search and code-to-documentation generation.

Paper:

LINK
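The bimodal/unimodal distinction comes down to how pretraining inputs are assembled. This sketch uses BERT-style special tokens and whitespace tokenization for readability; the actual model uses a learned subword vocabulary, so treat it as an illustration of the input layout only:

```python
def bimodal_example(nl_tokens, code_tokens):
    # Paired input: [CLS] natural language [SEP] code [SEP]
    return ["[CLS]"] + nl_tokens + ["[SEP]"] + code_tokens + ["[SEP]"]

def unimodal_example(code_tokens):
    # Code-only input, for code that has no paired docstring.
    return ["[CLS]"] + code_tokens + ["[SEP]"]

nl = "return the maximum of two numbers".split()
code = "def max2 ( a , b ) : return a if a > b else b".split()

print(bimodal_example(nl, code)[:4])  # ['[CLS]', 'return', 'the', 'maximum']
```

Tasks like natural language code search then reduce to scoring how well an NL query and a code snippet fit together in this paired format.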

Dataset of the Week: Dialogue NLI

What is it?

The dataset consists of sentence pairs labeled as entailment, neutral, or contradiction for the task of natural language inference.


Where is it?

Dialogue Natural Language Inference

Abstract: Consistency is a long standing issue faced by dialogue models. In this paper, we frame the consistency of…

wellecks.github.io
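A record in this style of dataset is just a sentence pair plus one of the three labels. The field names and sentences below are illustrative, not the dataset’s actual schema or contents:

```python
from collections import Counter

# Hypothetical NLI-style records: each pair gets exactly one label.
pairs = [
    {"sentence1": "I have two dogs.", "sentence2": "I own pets.",
     "label": "entailment"},
    {"sentence1": "I have two dogs.", "sentence2": "I live in Ohio.",
     "label": "neutral"},
    {"sentence1": "I have two dogs.", "sentence2": "I have no animals.",
     "label": "contradiction"},
]

# A quick sanity check you'd run on any NLI corpus: label balance.
label_counts = Counter(p["label"] for p in pairs)
print(label_counts["contradiction"])  # 1
```

In the dialogue setting, catching the contradiction cases is the point: a persona-grounded chatbot should not assert a sentence that contradicts what it said earlier.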

Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.

If you enjoyed this article, help us out and share with friends or social media!

For complete coverage, follow our Twitter: @Quantum_Stat

www.quantumstat.com

Join over 80,000 data leaders on the AI newsletter and keep up to date with the latest developments in AI, from research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI
