NLP News Cypher | 05.03.20

Last Updated on July 27, 2023 by Editorial Team

Author(s): Ricky Costa

Originally published on Towards AI.

Photo by Dwinanda Nurhanif Mujito on Unsplash

NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER

Freedom

So much happened last week. ICLR had a great virtual turnout, and the NLP community dropped tons of papers and code. As a result, this week’s newsletter is loaded to the gills.

And just when you thought we could rest from AI conferences…

Facebook be like:

Facebook Research at ICASSP 2020

Facebook AI researchers are presenting their work virtually at the 45th International Conference on Acoustics, Speech…

ai.facebook.com

BTW, Ubuntu says hi!

Ubuntu 20.04 LTS arrives | Ubuntu

April 23rd 2020: Canonical, the publisher of Ubuntu, today announced the general availability of Ubuntu 20.04 LTS, with…

ubuntu.com

KDnuggets, we ❤ you too:

The Super Duper NLP Repo: 100 Ready-to-Run Colab Notebooks – KDnuggets

There are 2 major components of a machine learning modeling project of any kind: the data, and the algorithms (and…

www.kdnuggets.com

Oh, and meanwhile, back at the ranch, 🛸s are real:

declassified 🤣

This Week:

ICLR Highlights

Meena’s Heart in a Blender

Text-2-Tabular Data

BLINK

A Mosaic

Stanford’s Knowledge Graphs

Wolfman Cometh via YouTube

Dataset of the Week: HybridQA

ICLR Highlights

For the TL;DR crowd:

ICLR 2020 Roundup

Firstly, commiserations, again, that Addis Ababa didn't get 1000's of global AI researchers visiting this week but I'd…

www.linkedin.com

Knowledge Graphs Are A’Boomin

Michael Galkin dropped 🔥🔥🔥 this week. Per usual, after every major AI conference, Michael sums up the cream of the crop w/r/t graphs and NLP.

His TOC:

  1. Neural Reasoning for Complex QA with KGs
  2. KG-augmented Language Models
  3. KG Embeddings: Temporal and Inductive Inference
  4. Entity Matching with GNNs
  5. Bonus: KGs in Text RPGs!

Knowledge Graphs @ ICLR 2020

👋 Hello, I hope you are all doing well during the lockdown. ICLR 2020 went fully virtual, and here is a fully virtual…

medium.com

Check out this code for using Wikipedia KGs to answer open-domain QA:

AkariAsai/learning_to_retrieve_reasoning_paths

This is the official implementation of the following paper: Akari Asai, Kazuma Hashimoto, Hannaneh Hajishirzi, Richard…

github.com

My fav topic from Galkin’s review: KGs in text RPGs 👇.

Paper:

LINK

GitHub:

rajammanabrolu/KG-A2C

Goal driven language generation using knowledge graph A2C agents. This code accompanies the paper Graph Constrained…

github.com

Meena’s Heart in a Blender

A couple of months ago, Google made headlines when it unveiled Meena, billed as “the world’s best” chatbot. Its code was never open-sourced. Well, now it’s Facebook’s turn, and they’ve open-sourced their chatbot in 3 model sizes: 90M, 2.7B, and 9.4B parameters. It’s called Blender.

You can try the 90M model here with our Colab of the week:

Google Colaboratory

colab.research.google.com

Text-2-Tabular Data

Google drops BERT on question answering over tabular data. The takeaway is that instead of translating questions into traditional text-2-SQL type queries (an approach that is difficult to scale across various tables), the model, called TAPAS, uses BERT to encode the question and the table together as input. (FYI, rows, columns, and ranks get their own embeddings!) This allows for better generalization.
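To make the embedding remark concrete, here is a minimal sketch (not the TAPAS codebase) of how a question and a table might be flattened into one token sequence in which every token carries segment, column, row, and rank indices. The `Token` class, the `flatten` function, and the toy table are hypothetical names and data chosen for illustration. Whitespace splitting stands in for BERT's tokenizer, and as a simplifying assumption a column only gets ranks when every one of its cells parses as a number.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    segment: int  # 0 = question token, 1 = table token
    column: int   # 0 for question tokens, 1-based column index otherwise
    row: int      # 0 for question and header tokens, 1-based row index for cells
    rank: int     # 0 = not comparable, otherwise 1-based numeric rank within the column


def _to_float(value):
    """Return the cell as a float, or None if it is not numeric."""
    try:
        return float(value)
    except ValueError:
        return None


def flatten(question, header, rows):
    """Flatten a question plus table into one sequence of indexed tokens."""
    tokens = [Token(w, 0, 0, 0, 0) for w in question.split()]

    # Header tokens belong to a column but to no data row (row index 0).
    for col, name in enumerate(header, start=1):
        tokens += [Token(w, 1, col, 0, 0) for w in name.split()]

    # Assumption: rank a column only if all of its cells are numeric.
    ranks = []
    for col in range(len(header)):
        values = [_to_float(r[col]) for r in rows]
        if all(v is not None for v in values):
            ordered = sorted(set(values))
            ranks.append({v: i for i, v in enumerate(ordered, start=1)})
        else:
            ranks.append({})

    # Cells are appended row by row, each tagged with its coordinates.
    for row_idx, row in enumerate(rows, start=1):
        for col_idx, cell in enumerate(row, start=1):
            rank = ranks[col_idx - 1].get(_to_float(cell), 0)
            tokens += [Token(w, 1, col_idx, row_idx, rank) for w in cell.split()]
    return tokens


if __name__ == "__main__":
    # Toy table, purely illustrative.
    for tok in flatten("who scored most", ["Name", "Score"],
                       [["Ann", "7"], ["Bob", "3"]]):
        print(tok)
```

In a real model, each of these indices would select a learned embedding that is added to the token embedding, so the encoder can tell which cell a token came from without the table ever being turned into a query language.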

How does it perform?

On the SQA dataset, it takes SOTA accuracy from 55.1 to 67.2!

On the WikiSQL and WikiTQ datasets, it performs on par with the SOTA.

GitHub:

google-research/tapas

Code and checkpoints for training the transformer-based Table QA models introduced in the paper TAPAS: Weakly…

github.com

Blog:

Using Neural Networks to Find Answers in Tables

Much of the world's information is stored in the form of tables, which can be found on the web or in databases and…

ai.googleblog.com

BLINK

If you’re looking for an entity linking Python library, you should check out Facebook’s BLINK. It is a two-stage model: first, a retrieval bi-encoder embeds the mention context and the entity descriptions independently to fetch candidates, and then a cross-encoder re-ranks those candidates in the 2nd stage. The library uses the 2019/08/01 Wikipedia dump as a knowledge base, which means it takes up a hell of a lot of disk space. The codebase is easy to follow and set up.
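The retrieve-then-rerank pattern can be sketched in a few lines. This is a toy illustration, not BLINK's API: the function names, the three-entity knowledge base, and both scoring functions are assumptions made for the example. A bag-of-words vector with cosine similarity stands in for the BERT bi-encoder, and a token-overlap (Jaccard) score stands in for the BERT cross-encoder.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in encoder: a bag-of-words count vector (a real system would use BERT)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(mention_context, entities, k):
    """Stage 1, bi-encoder style: embed both sides independently, keep the top k."""
    query = embed(mention_context)
    # Entity vectors do not depend on the query, so they could be precomputed.
    index = {name: embed(desc) for name, desc in entities.items()}
    ranked = sorted(index, key=lambda name: cosine(query, index[name]), reverse=True)
    return ranked[:k]


def cross_score(mention_context, description):
    """Stage 2 stand-in: score the pair jointly (here, Jaccard token overlap)."""
    a = set(mention_context.lower().split())
    b = set(description.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def rerank(mention_context, candidates, entities):
    """Pick the candidate with the best joint score."""
    return max(candidates, key=lambda name: cross_score(mention_context, entities[name]))


if __name__ == "__main__":
    # Hypothetical mini knowledge base, for illustration only.
    entities = {
        "Paris (city)": "capital city of France",
        "Paris (mythology)": "prince of Troy in Greek mythology",
        "Berlin": "capital city of Germany",
    }
    context = "Paris is the capital of France"
    candidates = retrieve(context, entities, k=2)
    print(candidates, "->", rerank(context, candidates, entities))
```

The split matters for cost: the cheap first stage scans the whole knowledge base, while the more expensive joint scorer only ever sees the handful of candidates that survive it.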

GitHub:

facebookresearch/BLINK

BLINK is an Entity Linking python library that uses Wikipedia as the target knowledge base. The process of linking…

github.com

A Mosaic

The Allen Institute is doing some of the most interesting work in the reading comprehension and commonsense areas of NLP. They have great demos, and I want to share COMeT with you, their engine for building event/commonsense knowledge graphs. It’s pretty good: I queried “I went to the doctor’s office” and the graph generated intuitive reasons as to “why” I would go to the office. You should give it a whirl.

Mosaic Knowledge Graphs

Demo of COMeT, a knowledge base construction engine that learns to produce new nodes and connections in commonsense…

mosaickg.apps.allenai.org

GitHub:

atcbosselut/comet-commonsense

To run a generation experiment (either conceptnet or atomic), follow these instructions: First clone, the repo: git…

github.com

Stanford’s Knowledge Graphs

Seminar on Knowledge Graphs 😎, videos included.

CS 520

How should AI explicitly represent knowledge? Department of Computer Science, Stanford University, Spring 2020 Tuesdays…

web.stanford.edu

Wolfman Cometh via YouTube

A crisp and clear talk about the current state of NLP and future trends from everyone’s favorite and 🤗’s very own: Thomas Wolf. Last week they dropped an educational video…

Here are the topic/time stamps from the video:

declassified

Video:

Dataset of the Week: HybridQA

What is it?

The dataset supports multi-hop QA over tables and the text passages linked to them. It contains over 70K question-answer pairs based on 13,000 tables; each table is linked, on average, to 44 passages.
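As a rough picture of what a table-to-text hop looks like, here is a small sketch. The dictionary layout, the `hop` function, and the placeholder table and passage are all hypothetical; they are not the repository's actual file format or data.

```python
def hop(table, passages, key_column, key_value, target_column):
    """Find a row by one column, then follow the target cell's links to text."""
    cols = table["header"]
    k, t = cols.index(key_column), cols.index(target_column)
    for row in table["rows"]:
        if row[k]["text"] == key_value:  # hop 1: locate the row in the table
            cell = row[t]
            # hop 2: move from the table cell to its linked passages
            return cell["text"], [passages[p] for p in cell["links"]]
    return None, []


if __name__ == "__main__":
    # Placeholder data, invented purely to show the structure.
    table = {
        "header": ["Player", "Team"],
        "rows": [
            [{"text": "Player A", "links": []},
             {"text": "Team X", "links": ["passage_x"]}],
            [{"text": "Player B", "links": []},
             {"text": "Team Y", "links": []}],
        ],
    }
    passages = {"passage_x": "Team X is based in City Q."}
    print(hop(table, passages, "Player", "Player A", "Team"))
```

A question such as "Where is Player A's team based?" cannot be answered from the table or the passage alone: the table supplies the team, and the linked passage supplies the location.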

Sample:

Where is it?

wenhuchen/HybridQA

This repository contains the dataset and code for the paper HybridQA: A Dataset of Multi-Hop Question Answering over…

github.com

Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.

If you enjoyed this article, help us out and share with friends!

For complete coverage, follow our Twitter: @Quantum_Stat

www.quantumstat.com
