The NLP Cypher | 11.22.20
Last Updated on July 24, 2023 by Editorial Team
Author(s): Ricky Costa
Originally published on Towards AI.
NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER
The NLP Cypher | 11.22.20
Ultima Ratio Regum
Hey, welcome back! EMNLP happened this week 👀. Tons of research came out, and this newsletter won't do justice to all of the great research conducted by institutions worldwide. But first…
We will be releasing an update to the Big Bad NLP Database this week and a large update to the Super Duper NLP Repo after Thanksgiving. These updates will be delivered via our email newsletter; if interested, you can sign up on our homepage.
As always, if you enjoy this read, please give it a 👏👏 and share with your enemies. 😁
Ok, knowledge graphs time: Once again, Michael Galkin released his incredibly detailed round-up newsletter 🔥🔥. After a strong start in 2019 for knowledge-augmented language models, it seems they continue to be the hot ticket this year. Below is the ToC and a link to the full blog post (*warning*: it's extensive and awesome):
ToC
- KG-Augmented Language Models: Empower your Transformer
1.1 Autoencoders
1.2 Autoregressive
- Natural Language Generation: New Folks in Datasetlandia
- Entity Linking: Massive and Multilingual
- Relation Extraction: OpenIE 6 and Neural Extractors
- KG Representation Learning: Temporal KGC and Successor to FB15K-237
- ConvAI + KGs: On the Shoulders of OpenDialKG
- Wrapping Up
Knowledge Graphs in NLP @ EMNLP 2020
Your guide to the KG-related research in NLP, November edition.
mgalkin.medium.com
High Performance NLP at EMNLP (SLIDES)
Slides from Google and the University of Washington that explore the current state of scaling NLP models to handle large volumes of text, along with cost, software, and hardware considerations. The tutorial discusses current and possible future directions for improving NLP efficiency in these key areas.
GraphGym
An awesome, recently released library built on top of PyTorch Geometric. It makes data loading easy to configure and can launch various GNN configurations in parallel. This is a good library to start with if you feel Geometric on its own is too intimidating. 😎
snap-stanford/GraphGym
GraphGym is a platform for designing and evaluating Graph Neural Networks (GNN). 1. Highly modularized pipeline for GNN…
github.com
Paper: https://arxiv.org/pdf/2011.08843.pdf
KeyBERT
This fella allows you to extract keywords and keyphrases from text using BERT embeddings. It's pretty straightforward: to run inference, you only need 3 lines of code. It's fairly good; I tested it on abstracts from arXiv and I may use it to index the papers I read. ✌
MaartenGr/KeyBERT
KeyBERT is a minimal and easy-to-use keyword extraction technique that leverages BERT embeddings to create keywords and…
github.com
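To make the idea concrete, here is a minimal sketch of embedding-based keyword ranking: embed the document, embed each candidate word, and keep the candidates closest to the document by cosine similarity. This is not KeyBERT's code. The character-trigram `embed` function is a toy stand-in for BERT embeddings, and `rank_keywords`, `STOPWORDS`, and the sample text are hypothetical names chosen for illustration.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# Tiny illustrative stopword list (an assumption, not taken from KeyBERT).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on", "is", "are", "with"}


def embed(text: str) -> Counter:
    """Toy embedding: character-trigram counts (stand-in for BERT vectors)."""
    vec: Counter = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        padded = f"#{word}#"
        for i in range(len(padded) - 2):
            vec[padded[i:i + 3]] += 1
    return vec


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_keywords(doc: str, top_n: int = 3) -> List[Tuple[str, float]]:
    """Score every non-stopword in the document against the document vector."""
    doc_vec = embed(doc)
    candidates = {w for w in re.findall(r"[a-z]+", doc.lower()) if w not in STOPWORDS}
    scored = [(word, cosine(embed(word), doc_vec)) for word in candidates]
    # Highest similarity first; alphabetical order breaks ties deterministically.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


if __name__ == "__main__":
    abstract = "Graph neural networks learn graph structure for graph classification."
    for word, score in rank_keywords(abstract):
        print(f"{word}: {score:.3f}")
```

With real sentence embeddings, the same ranking step would favor words that are semantically close to the document rather than ones that merely share characters with it.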
Linformer
Linformer is the “first theoretically proven linear-time transformer” out of Facebook AI (it came out this summer). In a nutshell, the amount of compute grows linearly with input length, unlike your typical transformer, where self-attention grows quadratically. 👇
This is great news for practitioners as this would allow one to really scale models in production, especially if you have to do millions of computations in a short amount of time. Apparently FB already has it running in production.
How Facebook uses super-efficient AI models to detect hate speech
Building AI that can analyze complicated text isn't enough to protect people from harmful content. We need systems that…
ai.facebook.com
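Where the linear scaling comes from can be sketched in a few lines. Standard self-attention builds an n × n score matrix, while Linformer first projects the keys and values from n rows down to a fixed k rows, so the score matrix is only n × k. The pure-Python code below is a toy illustration under stated assumptions, not Facebook's implementation: the projection matrices `e` and `f` are learned in the real model but random here, and all function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (r x m) and b (m x c)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q: Matrix, k: Matrix, v: Matrix) -> Tuple[Matrix, int]:
    """Scaled dot-product attention; also returns the score-matrix size."""
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    return matmul(softmax_rows(scores), v), len(scores) * len(scores[0])


def linformer_attention(q: Matrix, k: Matrix, v: Matrix,
                        e: Matrix, f: Matrix) -> Tuple[Matrix, int]:
    """Project keys and values from n rows to k rows, then attend as usual."""
    return attention(q, matmul(e, k), matmul(f, v))


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Random Gaussian matrix; stands in for learned weights in this sketch."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    n, d, k = 8, 4, 2  # sequence length, head dimension, projected length
    q, key, v = (random_matrix(n, d, rng) for _ in range(3))
    e, f = random_matrix(k, n, rng), random_matrix(k, n, rng)
    _, full_cost = attention(q, key, v)
    _, lin_cost = linformer_attention(q, key, v, e, f)
    print("score entries, full vs projected:", full_cost, lin_cost)
```

With n = 8 and k = 2 the score matrix shrinks from 8 × 8 = 64 entries to 8 × 2 = 16. Because k stays fixed as n grows, the cost grows linearly in n instead of quadratically.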
Legal Search Engine
judyrecords is the largest search engine of United States court cases on the Internet.
Although the search engine boasts a huge catalogue, not all court documents can be made available online, as some can only be requested in person at specific courthouses. Still an awesome feat. When is the API coming out? 😁
judyrecords
www.judyrecords.com
Podcast Search Engine
Interested in keeping up with podcasts? Here's an API that allows you to search the metadata of podcasts and episodes by people, places, or topics. The API is free as long as you stay within 2,500 requests per month.
Podcast API: Podcast Search & Directory API
We have a transparent and simple pricing model for Listen API. You can start with FREE plan without entering your…
www.listennotes.com
Tabular Transformers
A great Medium article on using the Multimodal Transformers library, which wraps the Hugging Face Transformers library, to combine text with tabular data! Currently the library supports 3 models: BERT, DistilBERT and RoBERTa. For training, you can use the Trainer class from the Transformers library.
Documentation: https://multimodal-toolkit.readthedocs.io/en/latest/modules/model.html#module-multimodal_transformers.model.tabular_transformers
Blog:
How to Incorporate Tabular Data with HuggingFace Transformers
[Colab] [Github]
medium.com
youtube-dl returns
Devs put up a good fight. It's back?
GitHub explanation for reinstating youtube-dl repo:
Standing up for developers: youtube-dl is back – The GitHub Blog
Today we reinstated youtube-dl, a popular project on GitHub, after we received additional information about the project…
github.blog
Systematic Comparison of Open Information Extraction Techniques
In this paper from EMNLP, the authors evaluated current deep learning systems for open information extraction (OIE), that is, automatically extracting (subject, predicate, object) triples from sentences. They explored different training scenarios for OIE and compared existing OIE models. A good introductory paper if you are new to this space.
Paper: https://www.aclweb.org/anthology/2020.emnlp-main.690.pdf
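For readers new to OIE, the target output looks roughly like the sketch below. It only illustrates the triple format: the extraction is written by hand and is not the output of any system from the paper, and `Triple` is a hypothetical name.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One open-IE extraction: the subject, the relation phrase, and the object."""
    subject: str
    predicate: str
    obj: str


# Hand-written illustration, not produced by an actual OIE model.
sentence = "GraphGym is built on top of PyTorch Geometric."
extraction = Triple("GraphGym", "is built on top of", "PyTorch Geometric")

print(f"{sentence!r} -> {tuple(extraction)}")
```

Unlike classic relation extraction, the predicate here is a free-text phrase taken from the sentence rather than a label from a fixed schema.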
Repo Cypher 👨‍💻
A collection of recently released repos that caught our 👁
DiagNNose
This library contains a set of modules that can be used to analyze the activations of neural networks, with a focus on NLP architectures such as LSTMs and Transformers.
i-machine-think/diagNNose
Paper: https://arxiv.org/abs/2011.06819 Demo: Documentation: https://diagnnose.readthedocs.io This library contains a…
github.com
NLPGym
NLPGym is a toolkit to bridge the gap between applications of RL and NLP. This aims at facilitating research and benchmarking of DRL application on natural language processing tasks.
rajcscw/nlp-gym
NLPGym is a toolkit to bridge the gap between applications of RL and NLP. This aims at facilitating research and…
github.com
Entity Recognition and Relation Extraction from Scientific and Technical Texts in Russian
Datasets and models for information extraction tasks in Russian.
iis-research-team/ner-rc-russian
Contribute to iis-research-team/ner-rc-russian development by creating an account on GitHub.
github.com
WikiAsp
WikiAsp is a multi-domain, aspect-based summarization dataset in the encyclopedic domain. In this task, models are asked to summarize cited reference documents of a Wikipedia article into aspect-based summaries.
neulab/wikiasp
This repository contains the dataset from the paper "WikiAsp: A Dataset for Multi-domain Aspect-based Summarization"…
github.com
Dataset of the Week: GrailQA
What is it?
Dataset used for knowledge base question answering (KBQA) containing 64,331 crowdsourced questions involving up to 4 relations and functions like counting, comparatives, and superlatives. The dataset covers all of the 86 domains in Freebase Commons.
Where is it?
Strongly Generalizable Question Answering Dataset
Strongly Generalizable Question Answering Dataset (GrailQA) is a new large-scale, high-quality dataset for question…
dki-lab.github.io
Paper: https://arxiv.org/pdf/2011.07743.pdf
Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.
For complete coverage, follow our Twitter: @Quantum_Stat