
NLP News Cypher | 07.12.20

Last Updated on July 27, 2023 by Editorial Team

Author(s): Ricky Costa

Originally published on Towards AI.

Photo by Laura Colquitt on Unsplash

NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER


Negative Ghost Rider, the Pattern is Full

Let’s get our James Bond swag on, shall we? 🕵️‍♀️

Defense departments worldwide are betting on AI to deliver the next generation of advanced military technology, and the US is no different. In the US of A, this strategy is being orchestrated by the Joint Artificial Intelligence Center (JAIC), a department under the umbrella of the Department of Defense (DoD) led by Acting Director Nand Mulchandani, who recently gave his first press conference. And good news…

NLP will play a bigger role in the future of JAIC strategy 🔥. They are working on their own virtual assistant called the Fire Support Cognitive Assistant (think Siri with Patriot missiles), software that sorts through incoming communications such as calls for artillery or air support. 🧐

This may come as a surprise to many, because when we think of national security and AI, it’s hard not to picture T-1000 SkyNet robots marauding your local 7-Eleven convenience store. But in reality, at least by what they are telling us, NLP will be an important player in their AI investment!

Me walking into the Pentagon with the Big Bad NLP Database like:


Oh, and ACL happened. Here’s a list of the best papers presented at the conf:

Best Paper Awards at ACL 2020

Congratulations to the recipients of the best paper awards at ACL 2020! Beyond Accuracy: Behavioral Testing of NLP…

acl2020.org

Best Paper

Beyond Accuracy: Behavioral Testing of NLP Models with CheckList Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin and Sameer Singh https://aclweb.org/anthology/2020.acl-main.442/…

Also:

Several tech groups revealed their paper lists prior to last week’s newsletter, so if you are interested in catching up:

Stanford AI Lab Papers and Talks at ACL 2020

The 58th annual meeting of the Association for Computational Linguistics is being hosted virtually this week. We're…

ai.stanford.edu

Salesforce Research at ACL 2020

The 58th Association for Computational Linguistics (ACL) Conference kicked off this week and runs from Sunday, Jul 5 to…

blog.einstein.ai

Google at ACL 2020

This week, the 58th Annual Meeting of the Association for Computational Linguistics ( ACL 2020), a premier conference…

ai.googleblog.com

FYI, Google’s TAPAS model for table parsing will be included in our upcoming update of the Super Duper NLP Repo. 🔥🔥

This Week:

Knowledge Graphs at ACL

Facebook’s HUGE ASR Model

Scaling BERT in Deployment on CPUs

Scispacy Update

To the Cloud, with 1 Line of Code

AI Survey, this Time From Hugging Face

Dataset of the Week: The Semantic Scholar Open Research Corpus

Knowledge Graphs at ACL

All the way from ACL, Galkin had to remind us that his knowledge graph research is immortal. And research on using natural language over tables and adapting transformers to knowledge graphs is increasing in popularity:

Here’s the TOC:

Knowledge Graphs in Natural Language Processing @ ACL 2020

This post commemorates the first anniversary of the series where we examine advancements in NLP and Graph ML powered by…

medium.com

Facebook’s HUGE ASR Model

Training a multilingual speech recognition model is important for improving coverage of low-resource languages and the scalability of models in production. In recent research, Facebook evaluated a 1-billion-parameter model trained on 51 languages, showing efficiency gains in training time and reductions in word error rate (WER).

“This model improves WER by 9.1% on average for high-resource languages, by 12.44% for mid-resource languages, and by 28.76% for low-resource languages.”
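For readers new to the metric, WER is just word-level edit distance normalized by the length of the reference transcript. A minimal pure-Python sketch (not Facebook's evaluation code, just the standard definition):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / number of reference words."""
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(h) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return d[-1][-1] / len(r)

# One substitution (sat -> sit) plus one deletion (the), over 6 reference words:
print(wer("the cat sat on the mat", "the cat sit on mat"))  # ≈ 0.333
```

A "12.44% improvement" in the paper's phrasing means a relative reduction of this ratio against the monolingual baseline.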

Paper:

Scaling BERT in Deployment on CPUs

Great article on what it takes to improve latency/throughput in production when using CPUs to power the transformer. TL;DR: optimization came down to using the distilled version of BERT, quantization, caching the most frequent responses, and horizontal scaling.
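The response-caching piece of that TL;DR can be sketched with the standard library's `functools.lru_cache`; `predict` here is a hypothetical stand-in for the real model call, not the article's code:

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def predict(text: str) -> int:
    # Hypothetical stand-in for an expensive BERT forward pass.
    # Repeated inputs are served from the cache instead of re-running the model.
    return len(text.split())

predict("what is my order status")  # computed
predict("what is my order status")  # served from cache
print(predict.cache_info().hits)    # 1
```

In a real service you would key the cache on the normalized request text and size it to cover the head of the request distribution, which is where most of the repeated traffic lives.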

Blog:

How We Scaled Bert To Serve 1+ Billion Daily Requests on CPUs

Here’s a classic chicken-and-egg problem for data scientists and machine learning engineers: when developing a new…

medium.com

Scispacy Update

For those in the science domain, spaCy’s cousin scispacy released an update that includes four new entity linkers and knowledge bases (KBs) for Medical Subject Headings (MeSH), RxNorm, the Gene Ontology, and the Human Phenotype Ontology.

GitHub:

allenai/scispacy

This repository contains custom pipes and models related to using spaCy for scientific documents. In particular, there…

github.com

Demo:

Streamlit


scispacy.apps.allenai.org

While we’re on the subject of SpaCy…

Colab of the Week:

A lightning tour of the spaCy library with quick snippets of awesome go-to code blocks for various NLP tasks:

Google Colaboratory


colab.research.google.com

To the Cloud, with 1 Line of Code

Want to run your Keras model in the cloud but don’t want to rewrite everything? Now you can take your model from development to the cloud by adding just one line of code. You will need TensorFlow Cloud for this (GitHub below).

Cool thing: it also works from Colab 🔥🔥. Courtesy of Mr. Chollet.

tensorflow/cloud

The TensorFlow Cloud repository provides APIs that will allow to easily go from debugging and training your Keras and…

github.com

AI Survey, this Time From Hugging Face

Hugging Face surveyed the dev community regarding their library. The results show that most use it for work and not play:


What do they want to see prioritized?

They want more real-world examples.


Crazy stat: One-third of the respondents have been using the library for less than 3 months!

Full Survey Results:

Transformers Huge Community Feedback

So last week we shared the first feedback request on 🤗 transformers. The community was pretty amazingly involved in…

discuss.huggingface.co

Dataset of the Week: S2ORC: The Semantic Scholar Open Research Corpus

What is it?

The dataset is a large corpus of 81.1M English-language academic papers spanning many academic disciplines. It consists of rich metadata, paper abstracts, resolved bibliographic references, and structured full text for 8.1M open-access papers.
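Corpora of this shape typically ship as newline-delimited JSON, so a first pass over the metadata can be done with the standard library alone. A minimal sketch (the field names here are illustrative; check the repo's schema docs for the exact keys in each release):

```python
import json

def load_metadata(lines):
    """Parse newline-delimited JSON records into (paper_id, title) pairs.

    Field names are illustrative stand-ins, not a guaranteed S2ORC schema.
    """
    papers = []
    for line in lines:
        rec = json.loads(line)
        papers.append((rec.get("paper_id"), rec.get("title")))
    return papers

sample = ['{"paper_id": "1", "title": "Beyond Accuracy"}']
print(load_metadata(sample))  # [('1', 'Beyond Accuracy')]
```

Streaming line by line like this matters at S2ORC scale: tens of millions of records won't fit comfortably in memory as one parsed blob.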

Where is it?

allenai/s2orc

S2ORC is a general-purpose corpus for NLP and text mining research over scientific papers. We've curated a unified…

github.com

Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.

If you enjoyed this article, help us out and share with friends!

For complete coverage, follow our Twitter: @Quantum_Stat

www.quantumstat.com


Published via Towards AI
