The NLP Cypher | 12.06.20

Last Updated on July 24, 2023 by Editorial Team

NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER

Orion

Hey, welcome back! Plenty of NLP to discuss this week as NeurIPS takes off today. Over the last couple of days, the usual suspects opened the research paper firehose. Have a look 👇

Carnegie Mellon University at NeurIPS 2020

Carnegie Mellon University is proud to present 88 papers at the 34th Conference on Neural Information Processing…

blog.ml.cmu.edu

Microsoft at NeurIPS 2020 – Microsoft Research

Microsoft is delighted to sponsor and attend the 34th Annual Conference on Neural Information Processing System…

www.microsoft.com

Salesforce Research at NeurIPS 2020

This year marks the 34th annual conference on Neural Information Processing Systems (NeurIPS) reimagined for the first…

blog.einstein.ai

Super Duper NLP Repo ✌

We recently updated the Super Duper NLP Repo with 47 new notebooks, bringing us to 313 total! The additions include a decent selection of notebooks relating to adapters, the NeMo library, GeDi GPT-2, and PERIN for semantic parsing. Want to thank Abhilash Majumder & Eyal Gruss for their awesome contributions! 😎

Oh, and EMNLP isn’t quite over yet: Eric Wallace et al. released their slides from the conference on interpreting the predictions of NLP models.

Jraph | DeepMind’s GNN Lib

When DeepMind isn’t solving age-old problems in protein folding, they’re releasing GNN libraries: Jraph, written in JAX, just came out. It probably flew under everyone’s radar…

Here’s a basic script for working with graph tuples:
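For a feel of what a graph tuple holds, here is a dependency-free sketch. It is not code from the Jraph repo: the `GraphTuple` class and the `aggregate_incoming` helper are hypothetical stand-ins, and the field names only mirror those of `jraph.GraphsTuple` as I understand them, so check the repo for the real API.

```python
from typing import List, NamedTuple


class GraphTuple(NamedTuple):
    """Hypothetical stand-in for a graph tuple (not the jraph class)."""
    nodes: List[float]    # one scalar feature per node
    edges: List[float]    # one scalar feature per edge
    senders: List[int]    # source node index of each edge
    receivers: List[int]  # target node index of each edge
    n_node: List[int]     # number of nodes in each graph of the batch
    n_edge: List[int]     # number of edges in each graph of the batch


def aggregate_incoming(graph: GraphTuple) -> List[float]:
    """Sum, per node, the sender feature weighted by the edge feature."""
    totals = [0.0] * len(graph.nodes)
    for sender, receiver, edge in zip(graph.senders, graph.receivers, graph.edges):
        totals[receiver] += graph.nodes[sender] * edge
    return totals


# A single directed triangle: 0 -> 1 -> 2 -> 0
graph = GraphTuple(
    nodes=[1.0, 2.0, 3.0],
    edges=[1.0, 1.0, 0.5],
    senders=[0, 1, 2],
    receivers=[1, 2, 0],
    n_node=[3],
    n_edge=[3],
)
print(aggregate_incoming(graph))  # [1.5, 1.0, 2.0]
```

The point of the flat sender/receiver index lists is that one message-passing step becomes a gather followed by a per-node sum, which is the kind of operation that vectorizes well.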

El GitHub

deepmind/jraph

Jraph (pronounced giraffe) is a lightweight library for working with graph neural networks in jax. It provides a data…

github.com

Kaggle Data Science and ML 2020 Survey

Everyone’s favorite data science survey was released:

TL;DR

Coursera is the most popular learning resource.

A lot of data scientists work in small companies (fewer than 50 employees).

Wow, Jupyter is the go-to IDE in data science (😬).

Only 15% say transformers are the most commonly used model architecture.

AWS leads cloud, but Google comes in 2nd (that was a surprise; I would’ve guessed Azure).

TensorBoard is more popular than I thought.

Survey

State of Data Science and Machine Learning 2020

Download our executive summary for a profile of today's working data scientist and their tools

www.kaggle.com

Data Flow

A blog from Google Cloud (with code snippets) discussing how to create data pipelines for your ML models. It focuses on batching, the singleton model pattern, and dealing with threading/processing. A helpful read for those deploying in the enterprise.
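To make the batching and singleton ideas concrete, here is a minimal plain-Python sketch. It does not use the Dataflow or Apache Beam APIs, and every name in it (`ModelHolder`, `batched`, `run_inference`, the toy "model") is a hypothetical placeholder rather than anything from the Google Cloud post.

```python
import threading
from itertools import islice


class ModelHolder:
    """Loads the (placeholder) model once per process, even across threads."""
    _model = None
    _lock = threading.Lock()
    load_count = 0  # only here to show the load happens once

    @classmethod
    def get(cls):
        if cls._model is None:            # fast path without the lock
            with cls._lock:
                if cls._model is None:    # re-check once the lock is held
                    cls._model = cls._load()
        return cls._model

    @classmethod
    def _load(cls):
        cls.load_count += 1
        # Placeholder "model": returns the length of each input string.
        return lambda batch: [len(text) for text in batch]


def batched(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def run_inference(records, batch_size=3):
    """Call the shared model once per batch instead of once per record."""
    model = ModelHolder.get()
    results = []
    for batch in batched(records, batch_size):
        results.extend(model(batch))
    return results


print(run_inference(["a", "bb", "ccc", "dddd"]))  # [1, 2, 3, 4]
```

The double check around the lock is what keeps several worker threads from each loading their own copy of a large model.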

ML inference in Dataflow pipelines | Google Cloud Blog

In this blog, we covered some of the patterns for running remote/local inference calls, including; batching, the…

cloud.google.com

MSFP | Data Type for Efficient Inference

Microsoft invented a new data type for representing model data, designed to cut latency during inference, called… MSFP.

[MSFP] enables dot product operations — the core of the matrix-matrix and matrix-vector multiplication operators critical to DNN inference — to be performed nearly as efficiently as with integer data types, but with accuracy comparable to floating point.

Apparently MS uses MSFP in Project Brainwave, their system for real-time, production-scale DNN inference in the cloud. As models get bigger, big tech is getting smarter about how to deal with scale and inference in production.
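The quote above is easier to picture with a toy example. The sketch below shows generic block floating point, where a block of values shares one exponent and each value keeps only a small integer mantissa, so the dot product itself is integer arithmetic. That MSFP works along these lines is my assumption; the bit widths, function names, and rounding here are illustrative and not MSFP's actual layout.

```python
import math


def quantize_block(values, mantissa_bits=7):
    """Quantize a block to integer mantissas that share one power-of-two scale.

    Returns (mantissas, exponent) with value ~= mantissa * 2**exponent.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0] * len(values), 0
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1
    _, shared_exp = math.frexp(max_abs)
    exponent = shared_exp - mantissa_bits
    scale = 2.0 ** exponent
    limit = 2 ** mantissa_bits - 1
    mantissas = [max(-limit, min(limit, round(v / scale))) for v in values]
    return mantissas, exponent


def block_dot(a, b, mantissa_bits=7):
    """Dot product with an integer inner loop and a single final rescale."""
    mant_a, exp_a = quantize_block(a, mantissa_bits)
    mant_b, exp_b = quantize_block(b, mantissa_bits)
    integer_sum = sum(x * y for x, y in zip(mant_a, mant_b))  # integers only
    return integer_sum * 2.0 ** (exp_a + exp_b)


a = [0.5, -1.25, 3.0, 0.125]
b = [2.0, 0.75, -0.5, 4.0]
exact = sum(x * y for x, y in zip(a, b))
print(exact, block_dot(a, b))
```

With these particular inputs every value fits the shared scale exactly, so the two results agree; in general the quantized result carries a small rounding error that shrinks as `mantissa_bits` grows.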

A Microsoft custom data type for efficient inference – Microsoft Research

AI is taking on an increasingly important role in many Microsoft products, such as Bing and Office 365. In some cases…

www.microsoft.com

Recommenders Update

When we first spoke about TensorFlow’s Recommenders library several newsletters ago, I was really excited. Now TF has upped the ante by building deep learning recommender models “that can retrieve the best candidates out of millions in milliseconds.” 👀

It uses Google’s ScaNN library, released this past summer. You can check out the repo here: https://github.com/google-research/google-research/tree/master/scann
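For context on what "retrieval" means here: the model scores a query embedding against every candidate embedding and keeps the top few. The sketch below does that exactly, by brute force, in plain Python. It does not call ScaNN or TensorFlow Recommenders; as I understand it, libraries like ScaNN exist to approximate this search much faster at scale. The function name and toy embeddings are made up for illustration.

```python
import heapq


def top_k_candidates(query, candidates, k=2):
    """Exact maximum-inner-product search over (id, embedding) pairs."""
    def score(item):
        _, embedding = item
        return sum(q * c for q, c in zip(query, embedding))

    best = heapq.nlargest(k, candidates, key=score)
    return [candidate_id for candidate_id, _ in best]


# Toy 2-d embeddings for three hypothetical items
candidates = [
    ("a", [1.0, 0.0]),
    ("b", [0.0, 1.0]),
    ("c", [0.9, 0.9]),
]
print(top_k_candidates([1.0, 0.5], candidates))  # ['c', 'a']
```

Brute force costs one dot product per candidate per query, which is exactly the cost that stops being practical once the catalog reaches millions of items.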

The second part of their update leverages Deep & Cross Network (DCN) models.
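As a rough illustration of what a cross layer computes, here is a plain-Python sketch of the update used in the DCN-v2 formulation, x_next = x0 * (W @ xl + b) + xl, with the product taken element-wise. It is not the TensorFlow Recommenders implementation; the function name and the tiny weights are assumptions chosen so the arithmetic is easy to follow.

```python
def cross_layer(x0, xl, weights, bias):
    """One cross layer: x0 * (W @ xl + b) + xl, element-wise product with x0."""
    projected = [
        sum(w * x for w, x in zip(row, xl)) + b
        for row, b in zip(weights, bias)
    ]
    return [x0_i * p_i + xl_i for x0_i, p_i, xl_i in zip(x0, projected, xl)]


x0 = [1.0, 2.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
zero_bias = [0.0, 0.0]

# With identity weights, the first layer yields x_i**2 + x_i for each feature.
print(cross_layer(x0, x0, identity, zero_bias))  # [2.0, 6.0]
```

Multiplying by the original input `x0` at every layer is what lets stacked cross layers build explicit higher-order feature interactions, while the `+ xl` term keeps a residual path.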

TensorFlow Recommenders: Scalable retrieval and feature interaction modelling

November 30, 2020 – Posted by Ruoxi Wang, Phil Sun, Rakesh Shivanna and Maciej Kula (Google) In September, we…

blog.tensorflow.org

Repo Cypher 👨‍💻

A collection of repos/papers that caught our 👁

DframCy

DframCy provides clean APIs to convert spaCy’s linguistic annotations, as well as Matcher and PhraseMatcher information, to a Pandas DataFrame.

yash1994/dframcy

DframCy is a light-weight utility module to integrate Pandas Dataframe to spaCy's linguistic annotation and training…

github.com

Wolfram’s Model Stash

Wolfram has his own Deep Learning model hub. Just stumbled upon this one when I saw one of Wolfram’s tweets earlier this week. 🙈

Wolfram Neural Net Repository

The Wolfram Neural Net Repository is a public resource that hosts an expanding collection of trained and untrained…

resources.wolframcloud.com

Novel2Graph

The algorithm takes a book as input and discovers its main characters and the main relations between them.

Oldie but goodie.

IDSIA/novel2graph

The algorithm receives a book and it discovers main characters, main relations between characters and more powerful…

github.com

EdgeBERT

A new research paper on reducing memory use and latency for BERT inference, using several compression and model-architecture techniques. The authors boast of “achieving up to 2.4× and 13.4× inference latency and memory savings, respectively, with less than 1%-pt. drop in accuracy.” 👀

Paper: https://arxiv.org/pdf/2011.14203.pdf

OCR and Deep Learning

A couple of weeks ago, I posted a question on LinkedIn about current OCR techniques, which led to a great discussion with my connections. This week, I found this 👇. WINNING!

Paper: https://arxiv.org/pdf/2011.13534.pdf

Long Text Classification with BERT

Looking to classify text documents with more than 250 words per doc?

Notebook (🔥)
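One common way to handle documents that are too long for a single pass is to cut them into overlapping word windows and classify each window separately. The sketch below assumes that approach; I have not confirmed it matches the notebook, and the function name, window size, and overlap are illustrative choices.

```python
def split_into_windows(text, window=200, overlap=50):
    """Split text into word windows where neighbours share `overlap` words."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    words = text.split()
    step = window - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + window]))
        if start + window >= len(words):
            break  # this window already reached the end of the document
    return chunks


document = " ".join(f"w{i}" for i in range(10))
for chunk in split_into_windows(document, window=4, overlap=1):
    print(chunk)
```

Each chunk can then go through the classifier on its own, with the per-chunk predictions combined afterwards (for example by averaging scores) to label the whole document.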

Dataset of the Week: XED

What is it?

A multilingual dataset of emotion-annotated movie subtitles from OPUS, used for sentiment analysis. The task is formulated as multi-label classification.
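Multi-label means one subtitle line can carry several emotions at once, so targets are usually encoded as a multi-hot vector rather than a single class index. Here is a small sketch using Plutchik's eight basic emotions; the ordering of the vector, the function name, and the example labels are my own assumptions, not the dataset's file format.

```python
# Plutchik's eight basic emotions; the alphabetical order is an arbitrary choice.
EMOTIONS = [
    "anger", "anticipation", "disgust", "fear",
    "joy", "sadness", "surprise", "trust",
]


def to_multi_hot(labels):
    """Encode a collection of emotion names as a 0/1 vector over EMOTIONS."""
    unknown = set(labels) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unknown emotion labels: {sorted(unknown)}")
    return [1 if emotion in labels else 0 for emotion in EMOTIONS]


print(to_multi_hot(["joy", "trust"]))  # [0, 0, 0, 0, 1, 0, 0, 1]
```

A classifier for this setup typically emits one independent probability per emotion instead of a softmax over all eight.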

Where is it?

Helsinki-NLP/XED

This is the XED dataset. The dataset consists of emotion annotated movie subtitles from OPUS. We use Plutchik's 8 core…

github.com

Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.

For complete coverage, follow our Twitter: @Quantum_Stat

Published via Towards AI

Author(s): Ricky Costa

OpenAI at NeurIPS 2020

Live demos and discussions at our virtual booth.

ArmandDS/bert_for_long_text

Using BERT For Classifying Documents with Long Texts

How to fine-tuning Bert for inputs longer than a few words or sentences
