

NLP News Cypher | 05.10.20

Last Updated on July 24, 2023 by Editorial Team

Author(s): Ricky Costa

Originally published on Towards AI.

Photo by Hendrik Cornelissen on Unsplash




And we’re back. We’ve released another update to the Big Bad NLP Database! Another 50 datasets taking us past 400 total and yet, still so many left to go. I would like to thank all contributors: Martin Schmitt, Rachel Bawden, Devamanyu Hazarika, Panagiotis Simakis, and Andrew Thompson.

Oh, someone gifted me an award on Reddit, and I’m not sure what this means. But I have a teddy bear now (it’s the brown-looking thing below); it’s called a Hugz award 🤷‍♂️ Cheers!


Also, 🛸s continue to exist.


An awesome peep on Reddit showed how they enhanced the video quality of the 3 UFO videos released several weeks ago. I couldn’t see much of a delta in the video quality, but it’s still interesting to know their workflow. Outline below:

Happy Momma’s Day 👩‍👦‍👦!


FYI, a surprise coming this week, stay tuned!

This Week:

From RoBERTa Import Scratch

InferKit | Bringing AutoML to NLP

Papers w/ Code Has A Paper w/ Code

The Keras Site

TL;DR Summarization

Dataset of the Week: WikiTableQuestions

From RoBERTa Import Scratch

If you want to pre-train a SOTA model like RoBERTa from scratch, check out this codebase (it also covers fine-tuning)! The blog is easy to follow because code blocks annotate the author’s workflow, and there’s a Colab too! It goes over data, tokenizers, and model handling.

FastHugs: Language Modelling with Tranformers and Fastai

This aims to be an end-to-end description with code of how to train a transformer language model using fastai (v2) and…
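For a feel of what "pre-training from scratch" involves on the data side, here is a minimal sketch of the masked-language-modelling step that RoBERTa-style models train on. It is not the code from the linked post: `mask_tokens`, the `<mask>` string, and the toy vocabulary are illustrative names, and the 15% / 80-10-10 split follows the standard BERT recipe. It assumes the text has already been tokenized.

```python
import random

MASK = "<mask>"  # placeholder mask symbol, chosen for this sketch


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Return (inputs, labels) for one masked-language-modelling example.

    labels[i] holds the original token at selected positions and None elsewhere.
    """
    if rng is None:
        rng = random.Random()
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must predict this token
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK)  # 80%: replace with the mask symbol
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: replace with a random token
            else:
                inputs.append(tok)  # 10%: keep the token unchanged
        else:
            inputs.append(tok)
            labels.append(None)  # position is not scored
    return inputs, labels


tokens = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(tokens))
inputs, labels = mask_tokens(tokens, vocab, rng=random.Random(0))
print(inputs)
print(labels)
```

Calling `mask_tokens` again each time a sentence is revisited produces a fresh mask, which is the "dynamic masking" idea RoBERTa is known for.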

Colab of the Week:

Google Colaboratory


InferKit | Bringing AutoML to NLP

From the maker of TalkWithTransformer, Adam King shares his latest ML project: InferKit! What is it? For now, it allows you to do state-of-the-art text classification WITHOUT any code, and it’s super simple to use. There’s no need for hyper-parameter tuning: you just drop your CSV in the browser, click train, and InferKit’s cloud architecture does the rest. After training is done, you get an email alert; follow the link and your model comes shipped with its own API endpoint 🔥🔥. I’ve already tried it and it was seamless. Soon, InferKit will also be able to do text generation!

FYI, anyone who signs up gets $25 of free credits. Live dangerously, give it a whirl.



Train state-of-the-art machine learning models to categorize your data with custom labels, no coding required. Use the…

Papers w/ Code Has A Paper and Code

A great update from Paperswithcode. Their database now holds over 2,500 leaderboards! In addition, they have a new extraction model, AxCell, that allows you to extract table results from an ML research paper!

Surprise, their model is open-sourced:


This repository is the official implementation of AxCell: Automatic Extraction of Results from Machine Learning Papers…

The Keras Site

New site for Keras. Not a surprise as Mr. Chollet has recently been dropping gems on Twitter along with many Colab notebooks (I’ve included one below). The site comes with a new batch of guidelines and examples.


Developer guides

Our developer guides are deep-dives into specific topics such as layer subclassing, fine-tuning, or model saving…


Keras: the Python deep learning API

Keras is an API designed for human beings, not machines. Keras follows best practices for reducing cognitive load: it…

Enjoy the neural network hallucinations in this Colab:

Google Colaboratory


TL;DR Summarization

Allen Institute's demo for summarizing computer science research papers is here. In addition, they’ve released SCITLDR, a new dataset with 3,935 TLDRs of author-written summaries. 😎

FYI, on SCITLDR, their model outperforms BART! For best results, you can feed it the abstract, intro, and conclusion of the papers in your test set. 👇






Dataset of the Week: WikiTableQuestions

What is it?

The dataset is for the task of question answering on semi-structured HTML tables.
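To picture the task, here is a toy sketch of answering one question over a small table. The table, the question, and the `last_year_hosted` helper are all invented for illustration; none of them come from WikiTableQuestions, and a real system would have to work out the query from the question text instead of having it hand-written.

```python
# Toy table, invented for this sketch (NOT taken from the dataset).
table = [
    {"Year": 1996, "Host": "Avalon", "Teams": 12},
    {"Year": 2000, "Host": "Brightport", "Teams": 16},
    {"Year": 2004, "Host": "Avalon", "Teams": 20},
    {"Year": 2008, "Host": "Caldera", "Teams": 24},
]


def last_year_hosted(rows, host):
    """Answer 'In which year did <host> last host?' by filtering, then taking the max."""
    years = [row["Year"] for row in rows if row["Host"] == host]
    return max(years) if years else None


question = "In which year did Avalon last host?"
print(question, "->", last_year_hosted(table, "Avalon"))
```

The point is that the answer is not a span of text to copy out: it takes a small computation over rows and columns (a filter followed by a maximum here), which is what makes question answering over tables hard.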


Where is it?


Version 1.0.2 (October 4, 2016) The WikiTableQuestions dataset is for the task of question answering on semi-structured…

Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.

If you enjoyed this article, help us out and share with friends!

For complete coverage, follow our Twitter: @Quantum_Stat


Published via Towards AI
