NLP News Cypher | 07.12.20
Last Updated on July 24, 2023 by Editorial Team
Author(s): Ricky Costa
Originally published on Towards AI.
NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER
NLP News Cypher | 07.12.20
Negative Ghost Rider, the Pattern is Full
Let's get our James Bond swag on, shall we? 🕵️‍♀️
Defense departments worldwide are betting on AI to deliver the next generation of advanced military technology, and the US is no different. In the US of A, this strategy is being orchestrated by the Joint Artificial Intelligence Center (JAIC), a department under the umbrella of the Department of Defense (DoD) led by Acting Director Nand Mulchandani, who recently gave his first press conference. And good news…
NLP will play a bigger role in the future of JAIC strategy 🔥. They are working on their own virtual assistant called Fire Support Cognitive Assistant (think Siri with Patriot missiles), software that sorts through incoming communications such as calls for artillery or air support. 🧐
This may come as a surprise to many because when we think of national security and AI, it's hard not to dream of T-1000 SkyNet robots marauding your local 7-Eleven convenience store, but in reality, at least by what they are telling us, NLP will be an important player in their AI investment!
Me walking into the Pentagon with the Big Bad NLP Database like:
Oh, and ACL happened. Here's a list of the best papers presented at the conf:
Best Paper Awards at ACL 2020
Congratulations to the recipients of the best paper awards at ACL 2020! Beyond Accuracy: Behavioral Testing of NLP…
acl2020.org
Best Paper
Beyond Accuracy: Behavioral Testing of NLP Models with CheckList Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin and Sameer Singh https://aclweb.org/anthology/2020.acl-main.442/…
Also:
Several tech groups revealed their paper lists prior to last week's newsletter, so if you are interested in catching up:
Stanford AI Lab Papers and Talks at ACL 2020
The 58th annual meeting of the Association for Computational Linguistics is being hosted virtually this week. We're…
ai.stanford.edu
Salesforce Research at ACL 2020
The 58th Association for Computational Linguistics (ACL) Conference kicked off this week and runs from Sunday, Jul 5 to…
blog.einstein.ai
Google at ACL 2020
This week, the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), a premier conference…
ai.googleblog.com
FYI, Google's TAPAS model for table parsing will be included in our upcoming update of the Super Duper NLP Repo. 🔥🔥
This Week:
Knowledge Graphs at ACL
Facebook's HUGE ASR Model
Scaling BERT in Deployment on CPUs
Scispacy Update
To the Cloud, with 1 Line of Code
AI Survey, this Time From Hugging Face
Dataset of the Week: The Semantic Scholar Open Research Corpus
Knowledge Graphs at ACL
All the way from ACL, Galkin had to remind us his knowledge graph research is immortal. Research on natural language over tables and on adapting transformers to knowledge graphs is also growing in popularity.
Here's the TOC:
- Question Answering over Structured Data
- KG Embeddings: Hyperbolic and Hyper-relational
- Data-to-text NLG: Prepare your Transformer
- Conversational AI: Improving Goal-Oriented Bots
- Information Extraction: OpenIE and Link Prediction
- Conclusion
Knowledge Graphs in Natural Language Processing @ ACL 2020
This post commemorates the first anniversary of the series where we examine advancements in NLP and Graph ML powered by…
medium.com
Facebook's HUGE ASR Model
Training a multilingual speech recognition model is important for extending coverage to low-resource languages and for scaling models in production. In recent research, Facebook evaluated a 1-billion-parameter model trained on 51 languages, reporting efficient training time and a reduced word error rate (WER).
“This model improves WER by 9.1% on average for high-resource languages, by 12.44% for mid-resource languages, and by 28.76% for low-resource languages.”
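For anyone who wants to see what sits behind those numbers, here is a minimal sketch of how WER is usually computed: the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The function names are made up for illustration, and the helper for relative reduction assumes the quoted percentages are relative improvements, which the excerpt above does not state.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative improvement of new_wer over baseline_wer, in percent (an assumed reading)."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# One dropped word out of four reference words.
print(word_error_rate("call for air support", "call for support"))  # 0.25
```

Under that relative reading, a baseline WER of 20.0 dropping to 18.0 would count as a 10% improvement.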
Paper:
Scaling BERT in Deployment on CPUs
Great article on what it takes to improve latency and throughput in production when CPUs power the transformer. TL;DR: optimization came down to using the distilled version of BERT, quantization, caching the most frequent responses, and horizontal scaling.
Blog:
How We Scaled Bert To Serve 1+ Billion Daily Requests on CPUs
Here's a classic chicken-and-egg problem for data scientists and machine learning engineers: when developing a new…
medium.com
Scispacy Update
For those in the science domain, SpaCy's cousin Scispacy released an update that includes 4 new entity linkers and KBs: Medical Subject Headings (MeSH), RxNorm, Gene Ontology, and the Human Phenotype Ontology.
GitHub:
allenai/scispacy
This repository contains custom pipes and models related to using spaCy for scientific documents. In particular, there…
github.com
Demo:
Streamlit
scispacy.apps.allenai.org
While we're on the subject of SpaCy…
Colab of the Week:
A lightning tour of the SpaCy library with quick snippets of awesome go-to code blocks for various NLP tasks:
Google Colaboratory
colab.research.google.com
To the Cloud, with 1 Line of Code
Want to run your Keras model in the cloud but don't want to rewrite everything? Well, now you can take your model from development to the cloud by adding just 1 line of code. You will need to use TensorFlow Cloud for this (GitHub below).
Cool thing: it also works from Colab 🔥🔥. Courtesy of Mr. Chollet.
tensorflow/cloud
The TensorFlow Cloud repository provides APIs that will allow to easily go from debugging and training your Keras and…
github.com
AI Survey, this Time From Hugging Face
Hugging Face surveyed the dev community regarding their library. The results show that most use it for work and not play:
What do they want to see prioritized?
They want more real-world examples.
Crazy stat: One-third of the respondents have been using the library for less than 3 months!
Full Survey Results:
Transformers Huge Community Feedback
So last week we shared the first feedback request on 🤗transformers. The community was pretty amazingly involved in…
discuss.huggingface.co
Dataset of the Week: S2ORC: The Semantic Scholar Open Research Corpus
What is it?
The dataset is a large corpus of 81.1M English-language academic papers spanning many academic disciplines. The corpus consists of rich metadata, paper abstracts, resolved bibliographic references, as well as structured full text for 8.1M open access papers.
Where is it?
allenai/s2orc
S2ORC is a general-purpose corpus for NLP and text mining research over scientific papers. We've curated a unified…
github.com
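With a corpus this size, you will want to stream records instead of loading everything into memory. Here is a small sketch of that pattern. It assumes the shards are gzip-compressed JSON Lines files (one paper per line) and that each record carries an `abstract` field; both the file layout and the field name are assumptions here, so check the repo's README for the actual schema before relying on them.

```python
import gzip
import json
from typing import Iterator


def iter_records(path: str) -> Iterator[dict]:
    """Yield one JSON object per non-empty line of a gzip-compressed JSON Lines file."""
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def papers_with_abstract(path: str, keyword: str) -> Iterator[dict]:
    """Yield records whose 'abstract' field (an assumed field name) mentions the keyword."""
    needle = keyword.lower()
    for record in iter_records(path):
        # Treat a missing or null abstract as empty text.
        abstract = record.get("abstract") or ""
        if needle in abstract.lower():
            yield record
```

Because both functions are generators, something like `for paper in papers_with_abstract("shard.jsonl.gz", "knowledge graph")` (with a hypothetical shard name) only ever holds one record in memory at a time.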
Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.
If you enjoyed this article, help us out and share with friends!
For complete coverage, follow our Twitter: @Quantum_Stat