The NLP Cypher | 05.09.21
Last Updated on July 24, 2023 by Editorial Team
Author(s): Ricky Costa
Originally published on Towards AI.
NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER
Lost Tales
I mostly know dark.fail as an onion site with a great collection of URLs for parasailing Tor-land (aka the darknet). To be honest, I didn't even know dark.fail had a clearnet site. And very recently, its clearnet mirror was phished for a total of 4–5 days. 👀
Apparently, a threat actor presented a fake court order to dark.fail's domain registrar and, in return, obtained access to dark.fail's hosting and rerouted traffic to the bad actor's mirrored web page. The mirror replaced the page's URLs with phishing links, with the intention of fooling people into thinking they were buying products on the dark markets when instead the bad actor(s) were pocketing their bitcoin. This has caused a big uproar in the hacking community given dark.fail's popularity. 🥶
The anonymous owner of dark.fail appeared on a hacker podcast this past weekend to discuss the hijacking, speaking through text-to-speech software to protect their voice identity. You can watch/listen here:
And in other news…
ICLR Residuals…
Google at ICLR 2021
The 9th International Conference on Learning Representations (ICLR 2021), a virtual conference focused on deep…
ai.googleblog.com
Stanford AI Lab Papers and Talks at ICLR 2021
The International Conference on Learning Representations (ICLR) 2021 is being hosted virtually from May 3rd – May 7th…
ai.stanford.edu
Galkin's Knowledge Graph Review from ICLR
Couldn't have a conference without getting a Galkin knowledge graph review!
TOC:
- Reasoning in Knowledge Graphs: Simpler than you thought
- Temporal Logics and KGs
- NLP Perspective: PMI & Relations, Entity Linking
- Complex Question Answering: More Modalities
- Lookback
Knowledge Graphs @ ICLR 2021
Your guide to the KG-related research in ML, May edition
mgalkin.medium.com
THE NLP Index Update
Since last week, we've added ~750 new repos to the index, and I've included GitHub stars and the programming language for each repo.
In addition, we added nearly 1,000 introductory videos for select assets. Thank you to Amit Chaudhary for the data! 🐱‍👤
Check it out here:
The NLP Index
Top NLP Code Repositories – Quantum Stat
index.quantumstat.com
A Commonsense Knowledge Base Construction
Check out how the Max Planck Institute for Informatics is building commonsense knowledge bases.
This paper introduces 3 systems:
Quasimodo: "an open-source commonsense knowledge base designed to get relevant properties about entities." site
Dice: "a reasoning framework for deriving refined and expressive commonsense knowledge from existing CSK collections." site
Ascent: "a pipeline for automatically collecting, extracting and consolidating commonsense knowledge (CSK) from the web." site
A Large Netflix Dataset
"This dataset combines data sources from Netflix, Rotten Tomatoes, IMDb, posters, box office information, trailers on YouTube, and more using a variety of APIs." Netflix doesn't have its own API, so the devs just went nuclear on triangulating Netflix's data via other sources. 🙉
Last updated April 2021, according to the authors.
Latest Netflix data with 26+ joined attributes
Latest, complete Netflix movie dataset created from 4 APIs
www.kaggle.com
Awesome Self-Supervised Learning
Index for all things Self-Supervised Learning across different domains such as vision, NLP, graphs and more.
jason718/awesome-self-supervised-learning
A curated list of awesome self-supervised methods. Contribute to jason718/awesome-self-supervised-learning development…
github.com
For an intuitive intro to self-supervised learning, check out Sergey Ivanov's blog:
GML In-Depth: three forms of self-supervised learning
Hello and welcome to the graph ML newsletter! This in-depth post is about self-supervised learning (SSL) and its…
graphml.substack.com
Repo Cypher 👨‍💻
A collection of recently released repos that caught our 👁
SUPERB Benchmark for Speech
A collection of benchmarking resources to evaluate the capability of a universal shared representation for speech processing. SUPERB consists of the following:
- A benchmark of ten speech processing tasks built on established public datasets
- A benchmark toolkit designed to evaluate and analyze pretrained model performance on various downstream tasks following the conventional evaluation protocols from speech communities
- A public leaderboard for submissions and performance tracking on the benchmark
SUPERB: Speech processing Universal PERformance Benchmark
A comprehensive and reproducible benchmark for Self-supervised Speech Representation Learning
superbbenchmark.org
Associated repo:
s3prl/s3prl
April 2021: Support SUPERB: Speech processing Universal PERformance Benchmark, submitted to Interspeech 2021 Jan 2021…
github.com
Connected Papers 📈
Explainable Text VQA
A dataset containing ground truth visual and multi-reference textual explanations that can be leveraged during both training and evaluation.
Dataset not officially out yet, but keep track of this repo for updates.
amzn/explainable-text-vqa
We will shortly release the TextVQA-X dataset accompanying the A First Look: Towards Explainable TextVQA Models via…
github.com
Connected Papers 📈
Rare Disease Identification
Using ontologies and weak supervision to identify rare diseases from clinical notes.
acadTags/Rare-disease-identification
This repository presents an approach using ontologies and weak supervision to identify rare diseases from clinical…
github.com
Connected Papers 📈
The Carleton Benchmark Suite (CBench)
A benchmarking framework for evaluating question answering systems over knowledge graphs.
aorogat/CBench
CBench is an extensible and more informative benchmarking framework for evaluating question answering systems over…
github.com
Connected Papers 📈
AMR Parser with Action-Pointer Transformer
Abstract Meaning Representation (AMR) parsing is a sentence-to-graph prediction task where target nodes are not explicitly aligned to sentence tokens.
Authors used a transformer that handles the generation of arbitrary graph constructs.
IBM/transition-amr-parser
Transition-based parser for Abstract Meaning Representation (AMR) in Pytorch. The code includes two fundamental…
github.com
Connected Papers 📈
ADAM
ADAM is a demonstration of "grounded language acquisition," which is to say learning (some amount of) language from observing how language is used in concrete situations, like infants (presumably) do. 👀
This work is under DARPA's Grounded Artificial Intelligence Language Acquisition (GAILA) program. 🛸👽
isi-vista/adam
ADAM is ISI's effort under DARPA's Grounded Artificial Intelligence Language Acquisition (GAILA) program. Background…
github.com
Connected Papers 📈
Knover | Knowledge Grounded Dialogue Generation
Knover is a toolkit for knowledge grounded dialogue generation based on PaddlePaddle. Knover allows researchers and developers to carry out efficient training/inference of large-scale dialogue generation models.
PaddlePaddle/Knover
Knover is a toolkit for knowledge grounded dialogue generation based on PaddlePaddle. Knover allows researchers and…
github.com
Connected Papers 📈
Dataset of the Week: Ascent
What is it?
A pipeline for automatically collecting, extracting and consolidating commonsense knowledge (CSK) from the web.
Where is it?
AscentKB
Ascent (Advanced Semantics for Commonsense Knowledge Extraction) is a pipeline for automatically collecting…
ascent.mpi-inf.mpg.de
Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.
For complete coverage, follow our Twitter: @Quantum_Stat
Published via Towards AI