

NLP News Cypher | 04.19.20

Last Updated on July 27, 2023 by Editorial Team

Author(s): Ricky Costa

Originally published on Towards AI.

Photo by Pawel Nolbert on Unsplash




The universe runs on a simple rule, and it could be that the framework of this rule can be interpreted by computing hypergraphs. This past week, while a pandemic engulfed the planet, Stephen Wolfram unveiled his vision of what governs our universe: the possible source code that initializes all the fundamental laws of physics.

An eerie parallel can be drawn between this week’s event and what happened in 1665, when another physicist retreated to his childhood home for private study to avoid the plague. The aftermath was the law of gravity and calculus.

Back when I first read Wolfram’s insights into cellular automata (and their consequences for computation), I was fascinated. So when I heard he released his “theory of everything” this week, I was really excited for Stephen and all of physics. I hope it is as fruitful as the theories that stemmed from the mind of that other dude from Cambridge U.

FYI, I have Rule 30 on my business card 😁.

(declassified) cellular automaton Rule 30
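
If you have never seen Rule 30 drawn out, here is a minimal sketch in Python. It is not taken from Wolfram's materials; the function names (`rule30_step`, `rule30_rows`, `render`) are made up for illustration, and cells beyond the edges are assumed to be dead. The update rule itself is the standard one: a cell's next state is its left neighbor XOR (itself OR its right neighbor).

```python
def rule30_step(cells):
    """Apply one Rule 30 update to a list of 0/1 cells.

    Assumption: cells beyond the left and right edges count as 0.
    """
    padded = [0] + cells + [0]
    # Rule 30: new cell = left XOR (center OR right)
    return [
        padded[i - 1] ^ (padded[i] | padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


def rule30_rows(width, steps):
    """Return `steps` rows, starting from a single live cell in the middle."""
    row = [0] * width
    row[width // 2] = 1
    rows = [row]
    for _ in range(steps - 1):
        row = rule30_step(row)
        rows.append(row)
    return rows


def render(rows):
    """Draw live cells as '#' and dead cells as '.'."""
    return "\n".join("".join("#" if c else "." for c in row) for row in rows)


if __name__ == "__main__":
    print(render(rule30_rows(width=31, steps=15)))
```

Running the script prints the familiar lopsided triangle: regular stripes on the left edge, apparent randomness on the right.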

If you want to catch Wolfram’s theory, travel here:

The Wolfram Physics Project: Upcoming Livestreams

Upcoming livestreams of Stephen Wolfram's project to discover the fundamental theory of physics. Recordings of previous…



Last week I opined on our new demo, RABBIT. If you aren’t caught up, it’s a real-time finance tweet classifier running on two distilled transformers. By real-time I mean it classifies tweets as they stream in; it’s not batch (except for when you land on the page). The best time to experience the demo is during stock market trading hours on weekdays. The stream rate spikes around 8:00 AM.

For a peek, you can travel here:


RABBIT is a state-of-the-art AI web app that uses transformer models to classify finance-related tweets in real-time…


BTW, we were seeing weird inaccuracies on select topics. As a result, we added 1,000 more tweets, retrained the models, and relaunched. On a P100 GPU, data wrangling and fine-tuning the 2 models took 45 minutes in total. This is one of the luxuries of modern NLP stacks: fine-tuning SOTA models doesn’t take long.

I have a new surprise release lined up for this week, stay tuned 👀!

How was your week? 😎

This Week:


Trivial BERT

The Poisoned Pawn

Synthetic Data

Scaling Your Back-End 🤣


Dataset of the Week: The SimpleQuestions Dataset


One of the most important reasons for creating the Big Bad NLP Database was to bring more attention to low-resource languages. So, as you may expect, I was really excited this week when a new multilingual benchmark was released. XTREME is meant to XTREMELY evaluate your multilingual model by looking at 4 NLP objectives: sentence classification, structured prediction, sentence retrieval, and question answering. Not bad, right? Except that it expects your model to generalize to a subset of 40 languages per task (and there are 9 tasks!). 😁

Which ones?

af, ar, bg, bn, de, el, en, es, et, eu, fa, fi, fr, he, hi, hu, id, it, ja, jv, ka, kk, ko, ml, mr, ms, my, nl, pt, ru, sw, ta, te, th, tl, tr, ur, vi, yo, and zh



This repository contains information about XTREME, code for downloading data, and implementations of baseline systems…


Trivial BERT

The McCormick chronicles continue their hunt to discover the inner workings of BERT. This time they looked at the factoids BERT learns during pre-training by asking it fill-in-the-blank questions. You can follow them down the rabbit hole here:

Trivial BERsuiT – How much trivia does BERT know?

As I've been doing all of this research into BERT, I've been really curious-just how much trivia does BERT know? We use…


The Poisoned Pawn

Peeps at CMU can hack SOTA AI models 🙈. Essentially, they highlight the dangers of the community sharing pre-trained weights (which has become a recent trend). What they discovered is that, by poisoning pre-trained weights, they can plant a backdoor that survives fine-tuning, enabling…

“the attacker to manipulate the model prediction simply by injecting an arbitrary keyword.”

System engineers be like:

GitHub (BERT looks like a sith lord in the picture below):


This repository contains the code to implement experiments from the paper "Weight Poisoning Attacks on Pre-trained…




Synthetic Data

Have you heard of synthetic data? Well, if you’ve dealt with class imbalance in your training set, you’ll relate to this article. Synthetic data is the product of several techniques for generating data, ranging from goodies like over-sampling (found in the library imbalanced-learn) all the way to GANs!

Synthetic Data

The future standard for Data Science development
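
To make the over-sampling idea concrete, here is a minimal sketch of the simplest variant, random over-sampling, in plain Python. It is not the imbalanced-learn API and not code from the linked article; the function name `random_oversample` and the toy tweets and labels are made up for illustration. It duplicates randomly chosen minority-class samples until every class is as large as the biggest one.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate random minority-class samples until all classes are the same size."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable

    # Group the samples by their class label.
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_label.values())

    out_samples, out_labels = [], []
    for label, group in by_label.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


if __name__ == "__main__":
    # Hypothetical toy data: four "markets" tweets, one "deals" tweet.
    tweets = ["stocks up", "rates cut", "bonds flat", "oil slides", "merger talk"]
    topics = ["markets", "markets", "markets", "markets", "deals"]
    _, balanced = random_oversample(tweets, topics)
    print(Counter(balanced))
```

Plain duplication adds no new information, which is why libraries like imbalanced-learn also offer interpolation methods such as SMOTE, and why people reach for generative models like GANs.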


Scaling Your Back-End 🤣

Want to know how Kaggle scaled their back-end from a single Kubernetes cluster to a multi-cluster architecture? Well, this detailed article explains one of the most difficult areas in AI deployment: load-balancing your servers across several Kubernetes clusters (in this example, they use Google Kubernetes Engine (GKE), with gRPC as the communication protocol). We rarely get a look at production-level architecture, so this is a must-read if you are serious about deploying your models like the pros.

A multi-cluster gRPC architecture on GKE

This post explains how to load-balance a gRPC application across many GKE clusters in different regions to increase…



This paper caught my eye because when I think of transformers and dialogue, my knee-jerk reaction goes to chit-chat dialogue. But in this paper, Salesforce Research introduces ToD-BERT, which is BERT pre-trained on 9 task-oriented dialogue datasets. The new model was compared with regular BERT that was only fine-tuned on the downstream task-oriented dialogue tasks, and ToD-BERT outperformed it. The downstream tasks in question were: intention detection, dialogue state tracking, dialogue act prediction, and response selection. According to the authors, code will be released soon.



Dataset of the Week: The SimpleQuestions Dataset

What is it?

“The SimpleQuestions dataset consists of a total of 108,442 questions written in natural language by human English-speaking annotators each paired with a corresponding fact, formatted as (subject, relationship, object), that provides the answer but also a complete explanation.”


* What American cartoonist is the creator of Andy Lippincott?
Fact: (andy_lippincott, character_created_by, garry_trudeau)
* Which forest is Fires Creek in?
Fact: (fires_creek, containedby, nantahala_national_forest)
* What does Jimmy Neutron do?
Fact: (jimmy_neutron, fictional_character_occupation, inventor)
* What dietary restriction is incompatible with kimchi?
Fact: (kimchi, incompatible_with_dietary_restrictions, veganism)
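
The (subject, relationship, object) shape maps neatly onto a small data structure. The sketch below parses the "Fact: (…)" lines exactly as they are displayed above. That display format is an assumption for illustration: the dataset's actual file layout may differ, and the names `Fact`, `FACT_PATTERN`, and `parse_fact` are made up here.

```python
import re
from typing import NamedTuple


class Fact(NamedTuple):
    """One (subject, relationship, object) triple."""
    subject: str
    relationship: str
    obj: str  # "object" in the dataset description; renamed to avoid the builtin


# Matches lines shaped like: Fact: (subject, relationship, object)
FACT_PATTERN = re.compile(r"^Fact:\s*\(([^,]+),\s*([^,]+),\s*([^,)]+)\)$")


def parse_fact(line):
    """Parse one displayed fact line into a Fact, or raise ValueError."""
    match = FACT_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"not a fact line: {line!r}")
    return Fact(*(part.strip() for part in match.groups()))


if __name__ == "__main__":
    question = "What does Jimmy Neutron do?"
    fact = parse_fact("Fact: (jimmy_neutron, fictional_character_occupation, inventor)")
    # The answer to the question is the object of the paired fact.
    print(question, "->", fact.obj)
```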

Where is it?


This page gather resources related to the bAbI project of Facebook AI Research which is organized towards the goal of…


Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.

If you enjoyed this article, help us out and share with friends!

For complete coverage, follow our Twitter: @Quantum_Stat



Published via Towards AI
