

Author(s): Ricky Costa

Originally published on Towards AI.

Ibrahim Jabbar-Beik

NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER

NLP News Cypher | 09.20.20

EMNLP and Graphs 😵

☝ Persian art is pretty. Welcome back for another week of the Cypher. Yesterday, we made another weekly update to the Big Bad NLP Database and the Super Duper NLP Repo. We added 10 datasets and 6 new notebooks. This update was a good one since we added PyTorch Geometric notebooks for graph neural networks in case you all are feeling a bit adventurous. 🙈

BTW, if you enjoy this newsletter please share it or give it a 👏👏!

Detour: I’ve been experimenting with ONNX Runtime inference for BERT question answering. Latency improves significantly with ONNX: running on “okish” cloud CPUs, it stays in the 170–240 ms range. Here’s the demo:

ONNX Runtime Inference | Quantum Stat

BERT Question Answering

onnx.quantumstat.com
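
If you want to try something similar, here’s a minimal sketch of what that inference loop can look like, assuming you’ve already exported a SQuAD-finetuned BERT to ONNX. The model and file names below are illustrative, not the ones behind the demo:

```python
# Minimal sketch: CPU inference for extractive QA with ONNX Runtime.
# Assumes a SQuAD-finetuned BERT was exported to "bert_qa.onnx" with
# outputs (start_logits, end_logits); names are illustrative.
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "bert-large-uncased-whole-word-masking-finetuned-squad"
)
session = ort.InferenceSession("bert_qa.onnx")  # hypothetical exported model

question = "What improves latency?"
context = "Running BERT through ONNX Runtime significantly improves latency on CPUs."
enc = tokenizer(question, context, return_tensors="np")

# Feed only the tensors the graph declares as inputs, as int64.
graph_inputs = {i.name for i in session.get_inputs()}
feed = {k: v.astype(np.int64) for k, v in enc.items() if k in graph_inputs}
start_logits, end_logits = session.run(None, feed)

# The answer span is the argmax of the start/end logits.
start, end = int(np.argmax(start_logits)), int(np.argmax(end_logits))
print(tokenizer.decode(enc["input_ids"][0, start : end + 1]))
```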

FYI, several EMNLP-accepted papers were circulating this week ahead of the November conference. Before we go there, here’s a quick appetizer from the paper “Message Passing for Hyper-Relational Knowledge Graphs”, which compares the traditional knowledge triple vs. a hyper-relational graph.

hype

Use the Force LUKE (preprint not out yet 😥)

declassified

GNN Resources

Found this thread from Petar Veličković (DeepMind) highlighting top graph neural network resources. Enjoy:

A thread written by @PetarV_93

As requested, here are a few non-exhaustive resources I'd recommend for getting started with Graph Neural Nets (GNNs)…

threader.app

NeurIPS Fun n’ Games:


Wordplay: When Language Meets Games @ NeurIPS 2020. Date and time: Full day workshop on Fri Dec 11th or Sat the 12th…

wordplay-workshop.github.io

This Week

Dialog Ranking Pretrained Transformers

TensorFlow Lite and NLP

Indonesian NLU Benchmark

CoDEx

RECOApy for Speech Preprocessing

Survey on the ‘X-Formers’

Dataset of the Week: ASSET

Dialog Ranking Pretrained Transformers

Another one accepted at EMNLP from Microsoft Research: using transformers (GPT-2) to figure out whether a reply to a comment is likely to get engagement. Pretty interesting, huh? Their dialog ranking models were trained on 133M pairs of human feedback data from Reddit.

So what does it really do? Here’s an example from their demo: for the statement “I love NLP!”, responding with “Here’s a free textbook (URL) in case anyone needs it.” is more likely to be up-voted than responding “Me too!” (meaning the former gets a higher ranking score).

Additionally, their Colab lets you run several models at once to distinguish (a scoring sketch follows this list):

updown… which gets more upvotes?

width… which gets more direct replies?

depth… which gets a longer follow-up thread?
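
If you’d rather score replies programmatically, here’s a short sketch following the usage pattern from their repo, with the updown checkpoint released on the Hugging Face hub (the width and depth checkpoints load the same way):

```python
# Sketch: rank a candidate reply to a context with DialogRPT-updown.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "microsoft/DialogRPT-updown"  # width/depth variants follow the same pattern
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

def score(context: str, reply: str) -> float:
    """Return an engagement score for `reply` given `context`."""
    # Context and reply are joined with GPT-2's end-of-text token.
    ids = tokenizer.encode(context + "<|endoftext|>" + reply, return_tensors="pt")
    with torch.no_grad():
        logits = model(ids).logits
    return torch.sigmoid(logits).item()

print(score("I love NLP!", "Here's a free textbook (URL) in case anyone needs it."))
print(score("I love NLP!", "Me too!"))  # should score lower, per the example above
```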

Colab of the Week

Thank you to author Xiang Gao for forwarding; you can also find it in the Super Duper NLP Repo ✌…

Google Colaboratory


colab.research.google.com

GitHub:

golsun/DialogRPT

How likely is a dialog response to be upvoted 👍 and/or get replies 💬? This is what DialogRPT learns to predict. It is…

github.com

Paper: https://arxiv.org/pdf/2009.06978.pdf

TensorFlow Lite and NLP

From their blog post this past week: TensorFlow Lite has new NLP features, including new pre-trained NLP models and better support for converting TensorFlow NLP models to the TensorFlow Lite format.

TensorFlow Lite Model Maker

The TensorFlow Lite Model Maker library simplifies the process of training a TensorFlow Lite model using custom…

www.tensorflow.org

FYI, their TF Lite Task Library has three text APIs:

  • NLClassifier: classifies input text into a set of known categories.
  • BertNLClassifier: classifies text with models optimized for the BERT family.
  • BertQuestionAnswerer: answers questions based on the content of a given passage, using BERT-family models.

Keep in mind these models run natively on the phone (i.e., they don’t need an internet connection to a cloud server).
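
For a feel of the API, here’s a rough sketch using the Task Library’s Python bindings from the tflite-support package. The reference examples are Android/Java, and the .tflite file names below are hypothetical:

```python
# Rough sketch of the TF Lite Task Library text APIs (Python bindings).
from tflite_support.task import text

# NLClassifier: generic text classification with a compatible .tflite model.
classifier = text.NLClassifier.create_from_file("text_classifier.tflite")
print(classifier.classify("What a great movie!"))

# BertQuestionAnswerer: on-device extractive QA with a BERT-family model.
answerer = text.BertQuestionAnswerer.create_from_file("bert_qa.tflite")
passage = "TF Lite models run natively on the phone, with no cloud round-trip."
print(answerer.answer(passage, "Where do TF Lite models run?"))
```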

What's new in TensorFlow Lite for NLP

September 16, 2020 – Posted by Tian Lin, Yicheng Fan, Jaesung Chung and Chen Cen TensorFlow Lite has been widely…

blog.tensorflow.org

Indonesian NLU Benchmark

Check out the new Indonesian NLU benchmark. It includes a BERT-based model, IndoBERT, and its ALBERT alternative, IndoBERT-lite. The benchmark also includes datasets for 12 downstream tasks spanning single-sentence classification, single-sentence sequence tagging, sentence-pair classification, and sentence-pair sequence labeling.

And finally, a large corpus for language modeling containing 4 billion words (250M sentences) 🔥🔥.
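
Since the checkpoints live on the Hugging Face hub, loading IndoBERT is a couple of lines; a minimal sketch, assuming the base checkpoint name under the indobenchmark org:

```python
# Minimal sketch: pulling IndoBERT from the Hugging Face hub.
# The checkpoint name is an assumption based on the indobenchmark org's naming.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("indobenchmark/indobert-base-p1")
model = AutoModel.from_pretrained("indobenchmark/indobert-base-p1")

inputs = tokenizer("Saya suka NLP!", return_tensors="pt")  # "I love NLP!"
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # one contextual vector per token
```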

IndoNLU Benchmark

The IndoNLU benchmark is a collection of resources for training, evaluating, and analyzing natural language…

www.indobenchmark.com

Paper:

LINK

CoDEx

More from EMNLP 😎:

“CoDEx offers three rich knowledge graph datasets that contain positive and hard negative triples, entity types, entity and relation descriptions, and Wikipedia page extracts for entities.”

In addition, they provide pretrained models that can be used with the LibKGE library for link prediction and triple classification tasks.

The total data dump contains 1,156,222 triples.
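
Here’s a rough sketch of what scoring CoDEx triples with one of those pretrained models looks like, following LibKGE’s documented loading pattern; the checkpoint filename below is hypothetical (grab the real ones from the repo):

```python
# Sketch: triple scoring with a pretrained CoDEx checkpoint in LibKGE.
import torch
from kge.model import KgeModel
from kge.util.io import load_checkpoint

checkpoint = load_checkpoint("codex-m-complex.pt")  # hypothetical filename
model = KgeModel.create_from(checkpoint)

# Score (subject, predicate, object) triples given as entity/relation indices.
s = torch.tensor([0, 2])
p = torch.tensor([0, 1])
o = torch.tensor([1, 3])
print(model.score_spo(s, p, o))
```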

GitHub:

tsafavi/codex

CoDEx is a set of knowledge graph Completion Datasets Extracted from Wikidata and Wikipedia. As introduced and…

github.com

RECOApy for Speech Preprocessing

RECOApy is a new tool that offers devs a UI for recording and phonetically transcribing data for speech apps, along with grapheme-to-phoneme conversion. Currently, it supports transcription in 8 languages: Czech, English, French, German, Italian, Polish, Romanian, and Spanish.

GitHub:

adrianastan/recoapy

RECOApy streamlines the steps of data recording and pre-processing required in end-to-end speech-based applications…

github.com

Survey on the ‘X-Formers’

‘X-formers’ (e.g., Longformer and Reformer), as the Google authors dub them, are the new and very memory-efficient transformer variants that have come on the scene in 2020. In this paper, the authors give a holistic view of these architectures, their techniques, and current trends.

Paper: https://arxiv.org/pdf/2009.06732.pdf

Dataset of the Week: ASSET

What is it?

A dataset for tuning and evaluating automatic sentence simplification models. ASSET consists of 23,590 human simplifications of the 2,359 original sentences from TurkCorpus.
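
One quick way to poke at it, assuming the mirror on the Hugging Face hub; the “asset” dataset id, config, and field names below are assumptions, and the canonical release is the GitHub repo linked under “Where is it?”:

```python
# Sketch: loading ASSET via the datasets library (hub mirror assumed).
from datasets import load_dataset

asset = load_dataset("asset", "simplification", split="validation")
example = asset[0]
print(example["original"])         # a TurkCorpus source sentence
print(example["simplifications"])  # its ten crowd-sourced simplifications
```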


Where is it?

facebookresearch/asset

ASSET is a dataset for evaluating Sentence Simplification systems with multiple rewriting transformations, as described…

github.com

Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.

If you enjoyed this article, help us out and share with friends!

For complete coverage, follow our Twitter: @Quantum_Stat

www.quantumstat.com


Published via Towards AI
