NLP News Cypher | 05.10.20
Last Updated on July 24, 2023 by Editorial Team
Author(s): Ricky Costa
Originally published on Towards AI.
NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER
NLP News Cypher | 05.10.20
Traveler
And we're back. We've released another update to the Big Bad NLP Database! Another 50 datasets, taking us past 400 total, and yet still so many left to go. I would like to thank all contributors: Martin Schmitt, Rachel Bawden, Devamanyu Hazarika, Panagiotis Simakis, and Andrew Thompson.
Oh, someone gifted me an award on Reddit, not sure what this means. But I have a teddy bear now (it's the brown-looking thing below). It's called a Hugz award. 🤷‍♂️ Cheers!
Also, 🛸's continue to exist.
An awesome peep on Reddit showed how they enhanced the video quality of the 3 UFO videos released several weeks ago. I couldn't see much of a delta in the video quality, but it's still interesting to know their workflow. Outline below:
Happy Momma's Day 👩‍👦‍👦!
&
FYI, a surprise coming this week, stay tuned!
This Week:
From RoBERTa Import Scratch
InferKit | Bringing AutoML to NLP
Papers w/ Code Has A Paper w/ Code
The Keras Site
TL;DR Summarization
Dataset of the Week: WikiTableQuestions
From RoBERTa Import Scratch
If you want to pre-train a SOTA model like RoBERTa from scratch, check out this codebase (it also includes fine-tuning)! The blog is really intuitive because code blocks annotate the author's workflow, and there's a Colab too. It goes over data, tokenizers, and model handling.
FastHugs: Language Modelling with Tranformers and Fastai
This aims to be an end-to-end description with code of how to train a transformer language model using fastai (v2) and…
www.ntentional.com
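For a feel of what pre-training involves, here is a minimal, library-free sketch of the masked-language-modeling step that BERT-style models such as RoBERTa are trained on. It is not taken from the FastHugs codebase: the function name mask_tokens, the toy vocabulary, and the "<mask>" string are hypothetical, and the 15% masking rate with an 80/10/10 split is assumed from the standard BERT recipe.

```python
import random

MASK = "<mask>"  # placeholder mask token (assumption for this sketch)


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (inputs, labels) for one masked-language-modeling example.

    labels[i] holds the original token wherever the model must predict it,
    and None everywhere else.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # this position becomes a prediction target
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK)  # 80%: swap in the mask token
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: swap in a random token
            else:
                inputs.append(tok)  # 10%: leave the token unchanged
        else:
            inputs.append(tok)
            labels.append(None)  # not a prediction target
    return inputs, labels


tokens = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(tokens))
inputs, labels = mask_tokens(tokens, vocab, seed=0)
print(inputs)
print(labels)
```

Calling mask_tokens again with a different seed gives a different mask for the same sentence, which is roughly the idea behind re-sampling masks during training instead of fixing them once.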
Colab of the Week:
Google Colaboratory
colab.research.google.com
InferKit | Bringing AutoML to NLP
From the maker of TalkWithTransformer, Adam King shares his latest ML project: InferKit! What is it? For now, it allows you to do state-of-the-art text classification WITHOUT any code, and it's super simple to use. No need for hyper-parameter tuning: you just drop your CSV in the browser, click train, and InferKit's cloud architecture does the rest. After training is done, you get an email alert; follow the link and your model comes shipped with its own API endpoint 🔥🔥. I've already tried it and it was seamless. Soon, InferKit will also be able to do text generation!
FYI, anyone who signs up gets $25 of free credits. Live dangerously, give it a whirl.
App:
InferKit
Train state-of-the-art machine learning models to categorize your data with custom labels - no coding required. Use the…
inferkit.com
Papers w/ Code Has A Paper w/ Code
A great update from Paperswithcode. Their database now holds over 2,500 leaderboards! In addition, they have a new extraction model, AxCell, that allows you to extract table results from an ML research paper!
Surprise, their model is open-sourced:
paperswithcode/axcell
This repository is the official implementation of AxCell: Automatic Extraction of Results from Machine Learning Papers…
github.com
The Keras Site
New site for Keras. Not a surprise, as Mr. Chollet has recently been dropping gems on Twitter along with many Colab notebooks (I've included one below). The site comes with a new batch of guides and examples.
Guides:
Developer guides
Our developer guides are deep-dives into specific topics such as layer subclassing, fine-tuning, or model saving…
keras.io
Site:
Keras: the Python deep learning API
Keras is an API designed for human beings, not machines. Keras follows best practices for reducing cognitive load: it…
keras.io
Enjoy the neural network hallucinations in this Colab:
Google Colaboratory
colab.research.google.com
TL;DR Summarization
The Allen Institute's demo for summarizing computer science research papers is here. In addition, they've released SCITLDR, a new dataset with 3,935 TLDRs of author-written summaries. 😎
FYI, on SCITLDR, their model outperforms BART! For best results, you can feed it the abstract, intro, and conclusion of your test set. 👇
Demo:
SciTLDR
scitldr.apps.allenai.org
Paper:
Dataset of the Week: WikiTableQuestions
What is it?
The dataset is for the task of question answering on semi-structured HTML tables.
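To make the task concrete, here is a toy sketch in Python. The table, the question, and the helper answer_max are all invented for illustration and are not drawn from WikiTableQuestions. A real system has to work out for itself which table operation a free-form question calls for; this sketch hard-codes that step.

```python
# Toy table, invented for illustration (not a sample from the dataset).
table = [
    {"Year": 2001, "City": "Lisbon", "Medals": 3},
    {"Year": 2005, "City": "Oslo", "Medals": 7},
    {"Year": 2009, "City": "Quito", "Medals": 5},
]


def answer_max(rows, target_col, by_col):
    """Return the target_col value from the row with the largest by_col value."""
    best = max(rows, key=lambda row: row[by_col])
    return best[target_col]


# Question: "Which city is listed for the year with the most medals?"
print(answer_max(table, "City", "Medals"))  # prints "Oslo"
```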
Sample:
Where is it?
ppasupat/WikiTableQuestions
Version 1.0.2 (October 4, 2016) The WikiTableQuestions dataset is for the task of question answering on semi-structured…
github.com
Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.
If you enjoyed this article, help us out and share with friends!
For complete coverage, follow our Twitter: @Quantum_Stat
Published via Towards AI