NLP News Cypher | 07.12.20

Last Updated on July 24, 2023 by Editorial Team

Author(s): Ricky Costa

Originally published on Towards AI.

NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER


Negative Ghost Rider, the Pattern is Full

Let’s get our James Bond swag on, shall we? 🕵️‍♀️

Defense departments worldwide are betting on AI to deliver the next generation of advanced military technology, and the US is no different. In the US of A, this strategy is being orchestrated by the Joint Artificial Intelligence Center (JAIC), a department under the umbrella of the Department of Defense (DoD) led by Acting Director Nand Mulchandani. He recently gave his first press conference, and good news…

NLP will play a bigger role in the future of JAIC strategy 🔥. They are working on their own virtual assistant called Fire Support Cognitive Assistant (think Siri with Patriot missiles), software that sorts through incoming communications such as calls for artillery or air support. 🧐

This may come as a surprise to many because when we think of national security and AI, it’s hard not to dream of T-1000 SkyNet robots marauding through your local 7-Eleven convenience store. In reality, at least from what they are telling us, NLP will be an important player in their AI investment!

Me walking into the Pentagon with the Big Bad NLP Database like:

declassified

Oh, and ACL happened. Here’s a list of the best papers presented at the conf:

Best Paper Awards at ACL 2020

Congratulations to the recipients of the best paper awards at ACL 2020! Beyond Accuracy: Behavioral Testing of NLP…

acl2020.org

Best Paper

Beyond Accuracy: Behavioral Testing of NLP Models with CheckList Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin and Sameer Singh https://aclweb.org/anthology/2020.acl-main.442/…

Also:

Several tech groups revealed their paper lists prior to last week’s newsletter, so if you are interested in catching up:

Stanford AI Lab Papers and Talks at ACL 2020

The 58th annual meeting of the Association for Computational Linguistics is being hosted virtually this week. We're…

ai.stanford.edu

Salesforce Research at ACL 2020

The 58th Association for Computational Linguistics (ACL) Conference kicked off this week and runs from Sunday, Jul 5 to…

blog.einstein.ai

Google at ACL 2020

This week, the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), a premier conference…

ai.googleblog.com

FYI, Google’s TAPAS model for table parsing will be included in our upcoming update of the Super Duper NLP Repo. 🔥🔥

This Week:

Knowledge Graphs at ACL

Facebook’s HUGE ASR Model

Scaling BERT in Deployment on CPUs

Scispacy Update

To the Cloud, with 1 Line of Code

AI Survey, this Time From Hugging Face

Dataset of the Week: The Semantic Scholar Open Research Corpus

Knowledge Graphs at ACL

All the way from ACL, Galkin had to remind us his knowledge graph research is immortal. Research on natural language over tables and on adapting transformers to knowledge graphs is growing in popularity.

Here’s the TOC:

Knowledge Graphs in Natural Language Processing @ ACL 2020

This post commemorates the first anniversary of the series where we examine advancements in NLP and Graph ML powered by…

medium.com

Facebook’s HUGE ASR Model

Training a multilingual speech recognition model matters both for extending coverage to low-resource languages and for scaling models in production. In recent research, Facebook evaluated a 1-billion-parameter model trained on 51 languages, reporting gains in training efficiency and reductions in word error rate (WER).

“This model improves WER by 9.1% on average for high-resource languages, by 12.44% for mid-resource languages, and by 28.76% for low-resource languages.”
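For readers who have not computed WER by hand, here is a minimal sketch of the standard definition: word-level edit distance divided by the number of reference words. The function names are hypothetical and are not taken from Facebook's paper. The second helper shows a relative reduction; whether the percentages quoted above are relative or absolute is not stated here, so treat that reading as an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Improvement expressed as a fraction of the baseline WER (an assumed reading)."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

Calling `word_error_rate("the cat sat on the mat", "the cat sat on mat")` counts one deleted word against six reference words.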

Paper:

Scaling BERT in Deployment on CPUs

Great article on what it takes to improve latency and throughput in production when CPUs power the transformer. TL;DR: optimization came down to using the distilled version of BERT, quantization, caching the most frequent responses, and horizontal scaling.

Blog:

How We Scaled Bert To Serve 1+ Billion Daily Requests on CPUs

Here’s a classic chicken-and-egg problem for data scientists and machine learning engineers: when developing a new…

medium.com

Scispacy Update

For those in the science domain, spaCy’s cousin Scispacy released an update that includes 4 new entity linkers and KBs: Medical Subject Headings (MeSH), RxNorm, Gene Ontology and the Human Phenotype Ontology.

GitHub:

allenai/scispacy

This repository contains custom pipes and models related to using spaCy for scientific documents. In particular, there…

github.com

Demo:

Streamlit

scispacy.apps.allenai.org

While we’re on the subject of spaCy…

Colab of the Week:

A lightning tour of the spaCy library with quick snippets of awesome go-to code blocks for various NLP tasks:

Google Colaboratory

colab.research.google.com

To the Cloud, with 1 Line of Code

Want to run your Keras model in the cloud but don’t want to rewrite everything? Well, now you can take your model from development to the cloud by adding just 1 line of code. You will need to use TensorFlow Cloud for this (GitHub below).

Cool thing: it also works from Colab 🔥🔥. Courtesy of Mr. Chollet.

tensorflow/cloud

The TensorFlow Cloud repository provides APIs that will allow to easily go from debugging and training your Keras and…

github.com

AI Survey, this Time From Hugging Face

Hugging Face surveyed the dev community regarding their library. The results show that most use it for work and not play:

What do they want to see prioritized?

They want more real-world examples.

Crazy stat: One-third of the respondents have been using the library for less than 3 months!

Full Survey Results:

Transformers Huge Community Feedback

So last week we shared the first feedback request on 🤗transformers. The community was pretty amazingly involved in…

discuss.huggingface.co

Dataset of the Week: S2ORC: The Semantic Scholar Open Research Corpus

What is it?

The dataset is a large corpus of 81.1M English-language academic papers spanning many academic disciplines. The corpus consists of rich metadata, paper abstracts, resolved bibliographic references, as well as structured full text for 8.1M open access papers.

Where is it?

allenai/s2orc

S2ORC is a general-purpose corpus for NLP and text mining research over scientific papers. We've curated a unified…

github.com

Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.

If you enjoyed this article, help us out and share with friends!

For complete coverage, follow our Twitter: @Quantum_Stat


Published via Towards AI
