NLP News Cypher | 05.24.20

Last Updated on July 24, 2023 by Editorial Team

NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER

One for the Road

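ACL 2020 is coming in July to a computer near you. Papers and accepted demos/frameworks are already on display:

Accepted Papers

Note that the titles/authors may change and papers may be withdrawn. For the final titles/authors, please refer to the…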

Facebook AI released their Blender chatbot earlier this month. I was playing around with the 2.7B model and it talks fairly well 👇. A common problem with chit-chat dialogue systems is that the model disregards a user's earlier statement, which disrupts continuity. But as you can see, Blender does a great job of maintaining continuity.

Here’s the 90M param model to run inference via Colab (the larger models make Colab explode):

Google Colaboratory

colab.research.google.com

Paper:

LINK

Happy Memorial Day Weekend! 🌭

Eid Mubarak

This Week:

The Stormy Seas of Deployment

BERTweet

Microsoft Build Recap

HackerEarth Survey

Hugging Reformer

Dataset of the Week: ATOMIC

The Stormy Seas of Deployment

AI2 and company put out great research in NLP. And this week, one of their engineers wrote a blog post on how they manage to deploy all their great demos. They let us into their world of mass deployment and show how they run such a complex back-end to serve real-time inference at scale. You don't get to see inside the machine very often, so have a look at their architecture (Kubernetes alert 💥):

Skiff: Taming the Stormy Seas of the Modern Web

Whereas many research organizations are hidden behind closed doors inherited by the stringent restrictions placed upon…

medium.com

BERTweet

About time someone managed to pre-train BERT on massive amounts of English tweets. Having dealt with tweet data for my own demos, I expect this to come in handy, as Twitter is one of the obvious choices for text data harvesting. By the way, the corpus includes COVID-related tweets, which helps if your model needs to be aware of COVID vocabulary.

Details:

850M English Tweets (16B word tokens ~ 80GB), containing 845M Tweets streamed from 01/2012 to 08/2019 and 5M Tweets related to the COVID-19 pandemic
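To put those corpus figures in perspective, here is a quick back-of-the-envelope sketch in Python. It uses only the numbers quoted above; the function name `corpus_stats` is made up for illustration, and treating 1 GB as 10^9 bytes is an assumption, so read the results as rough estimates rather than official BERTweet statistics.

```python
# Figures as quoted above (rounded values from the BERTweet description).
TOTAL_TWEETS = 850_000_000
GENERAL_TWEETS = 845_000_000      # streamed 01/2012 to 08/2019
COVID_TWEETS = 5_000_000          # related to the COVID-19 pandemic
WORD_TOKENS = 16_000_000_000
CORPUS_BYTES = 80 * 10**9         # assumption: 1 GB = 10^9 bytes


def corpus_stats():
    """Derive a few rough ratios from the quoted corpus figures."""
    return {
        # Do the two sub-corpora add up to the quoted total?
        "tweets_add_up": GENERAL_TWEETS + COVID_TWEETS == TOTAL_TWEETS,
        # Average number of word tokens per tweet.
        "tokens_per_tweet": WORD_TOKENS / TOTAL_TWEETS,
        # Average raw bytes per word token.
        "bytes_per_token": CORPUS_BYTES / WORD_TOKENS,
        # Fraction of the corpus that is COVID-related.
        "covid_share": COVID_TWEETS / TOTAL_TWEETS,
    }


if __name__ == "__main__":
    for name, value in corpus_stats().items():
        print(f"{name}: {value}")
```

So the COVID-related slice is well under 1% of the corpus, and the average tweet works out to just under 19 word tokens.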

GitHub:

VinAIResearch/BERTweet

Table of contents BERTweet is the first public large-scale language model pre-trained for English Tweets. BERTweet is…

github.com

Microsoft Build Recap

One of the biggest developer conferences took place this week, and Microsoft unveiled some really cool features, like code auto-completion in VS Code and a supercomputer for OpenAI running on Azure. Find the highlights here:

The most important announcements from Microsoft Build, its annual conference for software…

The coronavirus didn't stop Microsoft from issuing a flood of news in its annual Build conference this week. Now in its…

www.cnbc.com

The supercomputer developed for OpenAI is a single system with more than 285,000 CPU cores, 10,000 GPUs and 400 gigabits per second of network connectivity for each GPU server.

Microsoft announces new supercomputer, lays out vision for future AI work – The AI Blog

Microsoft has built one of the top five publicly disclosed supercomputers in the world, making new infrastructure…

blogs.microsoft.com

HackerEarth Survey

So what are developers up to nowadays? Let’s see what 16,655 developers from 76 countries have to say:

Highlights:

Among students (29%) and experienced developers (32%), Go has emerged as the clear winner for the most sought-after programming language.

YouTube is the 2nd most used resource for learning about development, among both students and professionals.

Survey:

Hugging Reformer

Hugging Face added the Reformer to their growing list of transformers this past week, and even included a nice Colab for text generation. If you are having trouble keeping up with the list of all these transformers, you are not alone; it's a good problem to have. 🤗 documentation

Colab of the Week:

Google Colaboratory

colab.research.google.com

Dataset of the Week: ATOMIC

What is it?

ATOMIC is a knowledge graph of 877K textual description triples of inferential knowledge.
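To make the "triples" idea concrete, here is a minimal Python sketch of how if-then knowledge like this could be stored and queried. The relation labels (`xIntent`, `xNeed`, `oReact`, `oWant`) follow the naming used in the ATOMIC paper, but the rows below are hypothetical examples written for illustration, not records copied from the dataset, and the helper names `build_index` and `infer` are my own.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    event: str      # the "if" part, e.g. "PersonX pays PersonY a compliment"
    relation: str   # the inference type, e.g. xIntent, oReact
    inference: str  # the "then" part, free text


# Hypothetical rows for illustration only; not taken from the real dataset.
TRIPLES = [
    Triple("PersonX pays PersonY a compliment", "xIntent", "to be nice"),
    Triple("PersonX pays PersonY a compliment", "oReact", "flattered"),
    Triple("PersonX pays PersonY a compliment", "oWant", "to return the compliment"),
    Triple("PersonX makes PersonY's coffee", "xNeed", "to boil water"),
]


def build_index(triples):
    """Group inferences by event, then by relation."""
    index = defaultdict(lambda: defaultdict(list))
    for t in triples:
        index[t.event][t.relation].append(t.inference)
    return index


def infer(index, event, relation):
    """Return all stored inferences for (event, relation), or an empty list."""
    return index.get(event, {}).get(relation, [])


if __name__ == "__main__":
    idx = build_index(TRIPLES)
    print(infer(idx, "PersonX pays PersonY a compliment", "oReact"))
```

A nested dictionary keeps lookups simple here; for the full graph you would more likely load the released files into a dataframe or a graph library.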

Sample:

Where is it?

ATOMIC Knowledge Graph Browser

For some events, annotations are quite diverse, does this mean the data is noisy? Importantly, some events invoke…

homes.cs.washington.edu

Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.

If you enjoyed this article, help us out and share with friends!

For complete coverage, follow our Twitter: @Quantum_Stat

Published via Towards AI

Author(s): Ricky Costa
