The NLP Cypher | 05.09.21
Last Updated on July 24, 2023 by Editorial Team
Last Updated on May 11, 2021 by Editorial Team
Author(s): Quantum Stat
NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER
Lost Tales
I mostly know dark.fail as an onion site with a great collection of URLs for parasailing tor-land (aka darknet). To be honest, I didn't even know dark.fail had a clearnet site. And very recently, its clearnet mirror was phished for a total of 4–5 days.
Apparently a threat actor presented a fake court order to dark.fail's domain registrar. In return, they obtained access to dark.fail's hosting and rerouted traffic to the bad actor's mirrored web page. It phished the page's URLs with the intention of fooling people into thinking they were buying products on the dark markets when instead the bad actor(s) were pocketing their bitcoin. This has caused a big uproar in the hacking community given dark.fail's popularity.
The anonymous owner of dark.fail appeared on a hacker podcast this past weekend to discuss the hijacking and spoke via text-to-speech software to protect their voice identity. You can watch/listen here:
And in other news…
ICLR Residuals…
Galkin's Knowledge Graph Review from ICLR
Couldn't have a conference without getting a Galkin knowledge graph review!
TOC:
- Reasoning in Knowledge Graphs: Simpler than you thought
- Temporal Logics and KGs
- NLP Perspective: PMI & Relations, Entity Linking
- Complex Question Answering: More Modalities
- Lookback
THE NLP Index Update
Since last week, we've added ~750 new repos to the index and I've included GitHub stars and programming language for each repo.
In addition, we also added nearly 1,000 introductory videos for select assets. Thank you to Amit Chaudhary for the data!
Check it out here:
A Commonsense Knowledge Base Construction
Check out how the Max Planck Institute for Informatics is building commonsense knowledge bases.
This paper introduces 3 systems:
Quasimodo: "an open-source commonsense knowledge base designed to get relevant properties about entities." site
Dice: "a reasoning framework for deriving refined and expressive commonsense knowledge from existing CSK collections." site
Ascent: "a pipeline for automatically collecting, extracting and consolidating commonsense knowledge (CSK) from the web." site
A Large Netflix Dataset
"This dataset combines data sources from Netflix, Rotten Tomatoes, IMBD, posters, box office information, trailers on YouTube, and more using a variety of APIs." Netflix doesn't have its own API so the devs just went nuclear on triangulating Netflix's data via other sources.
Last updated April 2021 according to the authors.
Latest Netflix data with 26+ joined attributes
Awesome Self-Supervised Learning
Index for all things Self-Supervised Learning across different domains such as vision, NLP, graphs and more.
jason718/awesome-self-supervised-learning
For an intuitive intro into self-supervised learning, check out Sergey Ivanov's blog:
GML In-Depth: three forms of self-supervised learning
Repo Cypher
A collection of recently released repos that caught our eye.
SUPERB Benchmark for Speech
A collection of benchmarking resources to evaluate the capability of a universal shared representation for speech processing. SUPERB consists of the following:
- A benchmark of ten speech processing tasks built on established public datasets,
- A benchmark toolkit designed to evaluate and analyze pretrained model performance on various downstream tasks following the conventional evaluation protocols from speech communities,
- A public leaderboard for submissions and performance tracking on the benchmark.
SUPERB: Speech processing Universal PERformance Benchmark
Associated repo:
Explainable Text VQA
A dataset containing ground truth visual and multi-reference textual explanations that can be leveraged during both training and evaluation.
The dataset is not officially out yet, but keep track of this repo for updates.
Rare Disease Identification
Using ontologies and weak supervision to identify rare diseases from clinical notes.
acadTags/Rare-disease-identification
The Carleton Benchmark Suite (CBench)
A benchmarking framework for evaluating question answering systems over knowledge graphs.
AMR Parser with Action-Pointer Transformer
Abstract Meaning Representation (AMR) parsing is a sentence-to-graph prediction task where target nodes are not explicitly aligned to sentence tokens.
The authors use a transformer that handles the generation of arbitrary graph constructs.
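To make the "nodes are not aligned to tokens" point concrete, here is a minimal sketch of what an AMR graph looks like as plain data. It uses the textbook sentence "The boy wants to go"; the variable names, the sense labels, and the `incoming_count` helper are illustrative assumptions and have nothing to do with the authors' parser or its code.

```python
# A toy AMR graph for "The boy wants to go", stored as plain Python data.
# This only illustrates the target structure; it is NOT the Action-Pointer
# Transformer and does no parsing.
sentence_tokens = ["The", "boy", "wants", "to", "go"]

# Graph nodes are concepts, not tokens: "wants" maps to a sense-tagged
# concept, while "The" and "to" produce no node at all.
# (Sense labels such as want-01 follow common AMR examples and are assumed.)
nodes = {"w": "want-01", "b": "boy", "g": "go-02"}

# Edges as (source, relation, target) triples.
edges = [
    ("w", ":ARG0", "b"),  # the boy is the one who wants
    ("w", ":ARG1", "g"),  # the thing wanted is the going
    ("g", ":ARG0", "b"),  # re-entrancy: the boy is also the one who goes
]


def incoming_count(node_id, edge_list):
    """Count edges pointing at a node (hypothetical helper for this sketch)."""
    return sum(1 for _, _, target in edge_list if target == node_id)


# Nodes with more than one incoming edge make this a graph rather than a tree.
reentrant = [n for n in nodes if incoming_count(n, edges) > 1]
print(reentrant)
```

The graph has fewer nodes than the sentence has tokens, and one node has two parents, which is why a parser for this task has to generate graph structure rather than tag tokens one by one.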
ADAM
ADAM is a demonstration of "grounded language acquisition," which is to say learning (some amount of) language from observing how language is used in concrete situations, like infants (presumably) do.
This work is under DARPA's Grounded Artificial Intelligence Language Acquisition (GAILA) program.
Knover | Knowledge Grounded Dialogue Generation
Knover is a toolkit for knowledge grounded dialogue generation based on PaddlePaddle. Knover allows researchers and developers to carry out efficient training/inference of large-scale dialogue generation models.
Dataset of the Week: Ascent
What is it?
A pipeline for automatically collecting, extracting and consolidating commonsense knowledge (CSK) from the web.
Where is it?
Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.
For complete coverage, follow our Twitter: @Quantum_Stat
The NLP Cypher | 05.09.21 was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
Published via Towards AI