The NLP Cypher | 03.07.21

The Crow’s Nest

Hey, welcome back! Had a loooong, busy weekend, so this week’s newsletter will be less wordy than usual, but we’ll be back to normal next week.

Oh and by the way,

Maybe… the universe is just a giant neural network… 🤷‍♂️

At least that’s the new theory from physicist Vitaly Vanchurin of the University of Minnesota Duluth. FYI, it sounds eerily similar to Stephen Wolfram’s graph approach to physics.

The only question I have is, who’s running the compute? 🤷‍♂️

The Universe Might Be One Big Neural Network, Study Finds

One scientist says the universe is a giant neural net. The wild concept uses neural net theory to unify quantum and…

FYI, we added 25 new notebooks to the Super Duper NLP Repo!! 👇


OpenChat is an awesome repo that lets you interact with top-tier dialogue models with just one line of code. Currently, it supports:

  • Microsoft’s DialoGPT: small, medium, large.
  • Facebook’s BlenderBot: small, medium, large, xlarge.


OpenChat is opensource chatting framework for generative models. You can talk with AI with only one line of code…

AI Index 2021

The comprehensive yearly report on AI is out. Its scope is more global and strategic. For NLP-focused content, start on page 62. The report is 200+ pages long 🙈.

AI Index 2021

The 2021 AI Index report is one of the most comprehensive reports about artificial intelligence to date. This latest…

OpenAI’s Reflection on its Latest Multi-Modal Models

They go deep on CLIP’s neurons and their representations, and also analyze where the model can go wrong.

Multimodal Neurons in Artificial Neural Networks

We've discovered neurons in CLIP that respond to the same concept whether presented literally, symbolically, or…

Mastering Python | The OverFlow

Last week I shared part II of this series; here are parts III and IV.

Level Up: Mastering statistics with Python – part 3 – Stack Overflow Blog

Welcome back! This is the third class in our Level Up series on statistics with Python. If you're just tuning in, you…

Level Up: Mastering statistics with Python — part 4 — Stack Overflow Blog

code-for-a-living March 2, 2021 While many introductory statistics classes teach the CLT, very few actually attempt to…

YAMNet | Transfer Learning for Audio

YAMNet (“Yet another Audio Mobilenet Network”) is a pretrained model that predicts 521 audio events based on the AudioSet corpus.

Transfer Learning for Audio Data with YAMNet

March 02, 2021 – Posted by Luiz GUStavo Martins, Developer Advocate Transfer learning is a popular machine learning…

Several Methods for Updating Neural Networks

Here are the methods discussed:

  • Update Model on New Data Only
  • Update Model on Old and New Data
  • Ensemble Model With Model on New Data Only
  • Ensemble Model With Model on Old and New Data

How to Update Neural Network Models With More Data – Machine Learning Mastery

Deep learning neural network models used for predictive modeling may need to be updated.
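To make the four strategies above a bit more concrete, here is a minimal sketch in plain Python. It swaps the neural network for a tiny linear model trained with gradient descent, and the helper names (`fit`, `predict`, `ensemble_predict`), the synthetic data, and the learning rates are all assumptions made for illustration, not code from the linked tutorial.

```python
import random

def fit(data, w=0.0, b=0.0, lr=0.5, epochs=1000):
    """Fit y ~ w*x + b with batch gradient descent, starting from (w, b)."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in data) / n
        grad_b = sum((w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(model, x):
    w, b = model
    return w * x + b

def ensemble_predict(models, x):
    """Average the predictions of several models."""
    return sum(predict(m, x) for m in models) / len(models)

# Synthetic data (hypothetical): the slope drifts from 2.0 to 2.5.
rng = random.Random(0)
old_data = [(x, 2.0 * x + 1.0) for x in (rng.uniform(0, 1) for _ in range(50))]
new_data = [(x, 2.5 * x + 1.0) for x in (rng.uniform(0, 1) for _ in range(50))]

old_model = fit(old_data)

# 1. Update on new data only: keep training the old weights, gently.
updated_new = fit(new_data, *old_model, lr=0.1, epochs=50)

# 2. Update on old and new data: same idea, on the combined set.
updated_all = fit(old_data + new_data, *old_model, lr=0.1, epochs=50)

# 3. Ensemble the old model with a fresh model fit on new data only.
fresh_new = fit(new_data)
ensemble_new = [old_model, fresh_new]

# 4. Ensemble the old model with a fresh model fit on old and new data.
fresh_all = fit(old_data + new_data)
ensemble_all = [old_model, fresh_all]

print(predict(updated_new, 1.0), ensemble_predict(ensemble_new, 1.0))
```

The smaller learning rate and epoch count in the two update variants are one common way to nudge existing weights without wiping out what the model already learned; the right values depend on your data.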

Top Data Labeling Software

In-depth analysis of 10 data labeling tools for machine learning datasets.

Data Labeling Software: Best Tools for Data Labeling in 2021 –

In machine learning and AI development, the aspects of data labeling are essential. You need a structured set of…

Repo Cypher 👨‍💻

A collection of recently released repos that caught our 👁

Gradual Finetune

If you are fine-tuning your model only once, you may be missing out.


Gradually fine-tuning in a multi-step process can yield substantial further gains and can be applied without modifying…

Connected Papers 📈

Forte | NLP Pipeline Toolkit

A multi-purpose platform for searching documents, information extraction and language generation.


Forte is a toolkit for building Natural Language Processing pipelines, featuring cross-task interaction, adaptable…

Connected Papers 📈

Meta-Curriculum Learning for Machine Translation

Improving the meta-learning (teacher model) of MT for low-resource languages


Meta-Curriculum Learning for Domain Adaptation in Neural Machine Translation (AAAI 2021) Please cite as…

Connected Papers 📈


ANEA

Automatically annotates named entities.


ANEA is a tool to automatically annotate named entities in unlabeled text based on entity lists for the use as distant…
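For a feel of what annotating from entity lists looks like, here is a minimal sketch in plain Python. It is not ANEA's implementation: the `annotate` function, the tokenizer regex, the labels, and the example entity lists are all hypothetical, and it simply tags the longest case-insensitive match against the lists.

```python
import re

TOKEN_RE = r"\w+|[^\w\s]"  # words, or single punctuation marks

def annotate(text, entity_lists):
    """Label each token with an entity type if it matches a list entry.

    Longest match wins; tokens that match nothing get the label "O".
    """
    tokens = re.findall(TOKEN_RE, text)
    lowered = [t.lower() for t in tokens]

    # Map each entity name (as a tuple of lowercased tokens) to its label.
    lookup = {}
    for label, names in entity_lists.items():
        for name in names:
            lookup[tuple(re.findall(TOKEN_RE, name.lower()))] = label
    max_len = max((len(key) for key in lookup), default=0)

    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = lookup.get(tuple(lowered[i:i + n]))
            if label:
                labels[i:i + n] = [label] * n
                i += n
                break
        else:
            i += 1  # no entity starts at this token
    return list(zip(tokens, labels))

# Hypothetical entity lists and sentence, for illustration only.
entity_lists = {"PER": ["Stephen Wolfram"], "ORG": ["OpenAI"]}
tagged = annotate("Stephen Wolfram has not joined OpenAI.", entity_lists)
for token, label in tagged:
    print(token, label)
```

Labels produced this way are noisy (ambiguous names, missing entries), which is why they are typically used as distant supervision rather than as gold annotations.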

Connected Papers 📈


RuSentEval

Evaluation toolkit for Russian sentence embeddings.


RuSentEval is an evaluation toolkit for sentence embeddings for Russian. In this repo you can find the data and scripts…

Connected Papers 📈

Learning Chess Blindfolded

Training language models on chess notation. 🔥🔥


Chess as a testbed for evaluating language models on world state tracking. Pretrained model released via Huggingface…

Connected Papers 📈


RAGA

Using graph attention for the entity alignment task.


Relation-aware Graph Attention Networks for Global Entity Alignment – zhurboo/RAGA

Connected Papers 📈

Dataset of the Week: Wikipedia-based Image Text (WIT) Dataset

What is it?

A multimodal, multilingual dataset. WIT is composed of a curated set of 37.6 million entity-rich image-text examples with 11.5 million unique images across 108 Wikipedia languages.


Where is it?


Wikipedia-based Image Text (WIT) Dataset is a large multimodal multilingual dataset. WIT is composed of a curated set…

