Last Updated on July 24, 2023 by Editorial Team
Author(s): Ricky Costa
Originally published on Towards AI.
NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER
The NLP Cypher | 11.22.20
Ultima Ratio Regum
Hey, welcome back! EMNLP happened this week 👀. Tons of research came out, and this newsletter won’t do justice to all of the great work conducted by institutions worldwide. But first…
We will be releasing an update to the Big Bad NLP Database this week and a large update to the Super Duper NLP Repo after Thanksgiving. These updates will be delivered via our email newsletter; if interested, you can sign up on our homepage.
As always, if you enjoy this read, please give it a 👏👏 and share with your enemies. 😁
Ok, knowledge graphs time: Once again, Michael Galkin released his incredibly detailed round-up newsletter 🔥🔥. After a strong start in 2019 for knowledge-augmented language models, it seems they continue to be the hot ticket this year. Below is the TOC and a link to the full blog post (*warning* it's extensive and awesome):
- KG-Augmented Language Models: Empower your Transformer
- Natural Language Generation: New Folks in Datasetlandia
- Entity Linking: Massive and Multilingual
- Relation Extraction: OpenIE 6 and Neural Extractors
- KG Representation Learning: Temporal KGC and Successor to FB15K-237
- ConvAI + KGs: On the Shoulders of OpenDialKG
- Wrapping Up
Knowledge Graphs in NLP @ EMNLP 2020
Your guide to the KG-related research in NLP, November edition.
High Performance NLP at EMNLP (SLIDES)
Slides from Google and the University of Washington that explore the current state of scaling NLP models to deal with large volumes of text, along with cost, software, and hardware considerations. The tutorial discusses current and possible future directions for improving NLP efficiency in these key areas.
An awesome library that recently came out and is built on top of PyTorch Geometric. It allows for easy configuration of data loading and can be easily initialized for various GNN configurations in parallel. This is a good library to start with if you feel Geometric on its own is too intimidating. 😎
GraphGym is a platform for designing and evaluating Graph Neural Networks (GNN). 1. Highly modularized pipeline for GNN…
This fella allows you to extract keywords and keyphrases from text by using BERT embeddings. It’s pretty straightforward: to conduct inference, you only need 3 lines of code. It’s fairly good. I tested it on abstract summaries from arXiv, and I may use it to index the papers I read. ✌
KeyBERT is a minimal and easy-to-use keyword extraction technique that leverages BERT embeddings to create keywords and…
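To make the idea concrete, here is a minimal sketch of embedding-based keyword ranking: embed the document, embed each candidate word, and rank candidates by cosine similarity to the document. This is not KeyBERT's code. The `toy_embed` function (character-trigram counts), the stopword list, and the sample text are all hypothetical stand-ins so the example runs without downloading a model; the real library uses BERT embeddings in that slot.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stopword list used only for this illustration.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with"}


def toy_embed(text):
    """Hypothetical stand-in for a BERT embedding: character-trigram counts."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def extract_keywords(doc, top_n=3, embed=toy_embed):
    """Rank candidate words by similarity between their embedding and the document's."""
    words = re.findall(r"[a-z]+", doc.lower())
    candidates = sorted({w for w in words if w not in STOPWORDS and len(w) > 2})
    doc_vec = embed(doc)
    scored = [(word, cosine(embed(word), doc_vec)) for word in candidates]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


sample = "Graph neural networks learn node embeddings. Graph models scale to large graph data."
for word, score in extract_keywords(sample):
    print(f"{word}: {score:.3f}")
```

Swapping `toy_embed` for a real sentence-embedding model is the only change needed to move from this toy toward the approach described above; the ranking step stays the same.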
Linformer is the “first theoretically proven linear-time transformer” out of Facebook AI (it came out this summer). In a nutshell, the amount of compute grows linearly with input length, unlike your typical transformer, where self-attention cost grows quadratically. 👇
This is great news for practitioners as this would allow one to really scale models in production, especially if you have to do millions of computations in a short amount of time. Apparently FB already has it running in production.
How Facebook uses super-efficient AI models to detect hate speech
Building AI that can analyze complicated text isn't enough to protect people from harmful content. We need systems that…
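A quick back-of-the-envelope sketch of the linear vs. quadratic difference: standard self-attention builds an n × n score matrix for a sequence of length n, while Linformer projects keys and values down to a fixed length k, giving an n × k score matrix. The function names and the default `proj_dim=256` below are illustrative assumptions, and the count covers only score-matrix entries per head, not a full FLOP estimate.

```python
def full_attention_scores(seq_len):
    """Entries in the n x n score matrix of standard self-attention."""
    return seq_len * seq_len


def linformer_attention_scores(seq_len, proj_dim=256):
    """Entries in the n x k score matrix after projecting keys/values to length k.

    proj_dim (k) is an assumed, illustrative value.
    """
    return seq_len * proj_dim


# Each doubling of the input quadruples the full score matrix
# but only doubles the projected one.
for n in (512, 1024, 2048, 4096):
    full = full_attention_scores(n)
    projected = linformer_attention_scores(n)
    print(f"n={n}: full={full}, projected={projected}, ratio={full / projected:.1f}x")
```

The gap between the two widens as inputs get longer, which is why this matters most for long documents and high-volume production workloads.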
Legal Search Engine
judyrecords is the largest search engine of United States court cases on the Internet.
Although the search engine boasts a huge catalogue, not all court documents can be made available online, as some documents can only be requested in person at specific courthouses. Still an awesome feat. When is the API coming out? 😁
Podcast Search Engine
Interested in keeping up with podcasts? Here’s an API that allows you to search the metadata of podcasts and episodes by people, places, or topics. The API is free as long as you stay within 2,500 requests per month.
Podcast API: Podcast Search & Directory API
We have a transparent and simple pricing model for Listen API. You can start with FREE plan without entering your…
A great Medium article highlighting how to use the Multimodal Transformers library, which is built on top of the Transformers library, for tabular data! Currently the library supports 3 models: BERT, DistilBERT and RoBERTa. For training, you can use the Trainer class from the Transformers lib.
How to Incorporate Tabular Data with HuggingFace Transformers
Devs put up a good fight. It’s back?
GitHub explanation for reinstating youtube-dl repo:
Standing up for developers: youtube-dl is back – The GitHub Blog
Today we reinstated youtube-dl, a popular project on GitHub, after we received additional information about the project…
Systematic Comparison of Open Information Extraction Techniques
In this paper from EMNLP, the authors evaluated current deep learning systems for open information extraction (OIE), that is, automatically extracting (subject, predicate, object) triples from sentences. They explored different training scenarios for OIE and compared existing OIE models. A good introductory paper if you are new to this space.
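For readers new to OIE, here is a small sketch of what the output looks like as a data structure. The `Triple` type, the sentence, and the triples are hand-written for illustration; they are not the output of any system evaluated in the paper.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One (subject, predicate, object) fact extracted from a sentence."""
    subject: str
    predicate: str
    obj: str


# Hand-written example: one sentence can yield several triples.
sentence = "Facebook built Linformer and runs it in production."
triples = [
    Triple("Facebook", "built", "Linformer"),
    Triple("Facebook", "runs", "it in production"),
]

for t in triples:
    print(f"({t.subject}; {t.predicate}; {t.obj})")
```

Unlike classic relation extraction, the predicate here is free text taken from the sentence rather than a label from a fixed schema.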
Repo Cypher 👨‍💻
A collection of recently released repos that caught our 👁
This library contains a set of modules that can be used to analyze the activations of neural networks, with a focus on NLP architectures such as LSTMs and Transformers.
Paper: https://arxiv.org/abs/2011.06819 Demo: Documentation: https://diagnnose.readthedocs.io This library contains a…
NLPGym is a toolkit to bridge the gap between applications of RL and NLP. It aims to facilitate research and benchmarking of DRL applications on natural language processing tasks.
Entity Recognition and Relation Extraction from Scientific and Technical Texts in Russian
Datasets and models for information extraction tasks in Russian.
Contribute to iis-research-team/ner-rc-russian development by creating an account on GitHub.
WikiAsp is a multi-domain, aspect-based summarization dataset in the encyclopedic domain. In this task, models are asked to summarize cited reference documents of a Wikipedia article into aspect-based summaries.
This repository contains the dataset from the paper "WikiAsp: A Dataset for Multi-domain Aspect-based Summarization"…
Dataset of the Week: GrailQA
What is it?
Dataset used for knowledge base question answering (KBQA) containing 64,331 crowdsourced questions involving up to 4 relations and functions like counting, comparatives, and superlatives. The dataset covers all of the 86 domains in Freebase Commons.
Where is it?
Strongly Generalizable Question Answering Dataset
Strongly Generalizable Question Answering Dataset (GrailQA) is a new large-scale, high-quality dataset for question…
Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.
For complete coverage, follow our Twitter: @Quantum_Stat
Published via Towards AI