The NLP Cypher | 04.11.21
Last Updated on July 24, 2023 by Editorial Team
Author(s): Ricky Costa
Originally published on Towards AI.
NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER
It's Dark, and NLP is Hot
One small step for man…
One giant leap for monkeys playing Pong with their minds…
Welcome back. This week we have a good one for you. But first… Neuralink, Elon Musk's brain chip company, implanted a chip in a monkey's skull so it could play Pong wirelessly. Winning! And if you're the type to build next-level Star Trek tech, they have open positions available. 🙉
The first fully-implanted 1000+ channel brain-machine interface
In a 2019 white paper, we outlined the design of our novel electrodes and our unique surgical approach, along with…
neuralink.com
Graphbrain | Semantic Hypergraphs
Excited to announce that the Graphbrain library had a major update this past week. It now includes more extensive documentation, with tutorials and notebooks for quick experimentation.
Recap…
Graphbrain is a library for constructing semantic hypergraphs from text. A hypergraph is like a normal graph, except that an edge is not limited to 2 vertices: it can connect 3 or more 😎. This gives it the flexibility to extract knowledge in a hierarchical structure. The library is built on top of spaCy and uses Hugging Face's NeuralCoref library for coreference resolution.
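To make the "more than 2 vertices" idea concrete, here is a minimal sketch in plain Python. It assumes a hyperedge can be modeled as a tuple whose elements are either atoms (strings) or other hyperedges; the nesting is what gives the hierarchical structure. This is an illustration only, not Graphbrain's internal representation or API, and the example edges are hypothetical.

```python
# Minimal hypergraph sketch (NOT Graphbrain's API): a hyperedge is a tuple of
# any length whose elements are atoms (strings) or other hyperedges (tuples).


def atoms(edge):
    """Collect all atoms (leaf strings) of a possibly nested hyperedge, in order."""
    if isinstance(edge, str):
        return [edge]
    result = []
    for item in edge:
        result.extend(atoms(item))
    return result


def depth(edge):
    """Nesting depth: 0 for an atom, 1 + the deepest child for a hyperedge."""
    if isinstance(edge, str):
        return 0
    return 1 + max(depth(item) for item in edge)


# An ordinary graph edge links exactly 2 vertices...
pair = ("alice", "bob")
# ...while a hyperedge can link 3 or more, and can contain other hyperedges.
claim = ("says", "mary", ("is", ("the", "sky"), "blue"))

print(len(pair), len(claim))  # 2 3
print(atoms(claim))           # ['says', 'mary', 'is', 'the', 'sky', 'blue']
print(depth(claim))           # 3
```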
If you are new to the library, its notation may be a bit intimidating at first. FYI, this is what the notation looks like… 👇
I know… to the untrained eye it's a bit funky, but it's worth exploring further because it offers a fresh look at NLP tasks from an architecture other than a pure deep learning approach. To get familiar with all the tasks, check out their paper below.
FYI, here is their manual for getting familiar with the notation the model outputs:
Semantic Hypergraph notation – Graphbrain 0.4.0 documentation
SH notation is based on two simple principles: Every hyperedge belongs to one of eight basic types. The first element…
graphbrain.net
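Structurally, the notation is a parenthesized, Lisp-like nesting of atoms, where each atom carries a type code after a slash. Below is a minimal parser sketch that turns such a string into nested tuples, assuming atoms of the form `root/type` and whitespace-separated elements. The example string is illustrative rather than copied from the manual, and the parser ignores the finer points of the notation.

```python
import re


def tokenize(text):
    """Split on parentheses and whitespace; anything else is an atom."""
    return re.findall(r"\(|\)|[^\s()]+", text)


def parse(text):
    """Parse a parenthesized hyperedge string into nested tuples of atom strings."""
    tokens = tokenize(text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        if token == ")":
            raise ValueError("unexpected ')'")
        if token != "(":
            return token  # an atom such as 'sky/C'
        items = []
        while pos < len(tokens) and tokens[pos] != ")":
            items.append(read())
        if pos >= len(tokens):
            raise ValueError("missing ')'")
        pos += 1  # consume ')'
        return tuple(items)

    edge = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after hyperedge")
    return edge


def split_atom(atom):
    """'sky/C' -> ('sky', 'C'): the root and its type code."""
    root, _, type_code = atom.partition("/")
    return root, type_code


edge = parse("(is/P (the/M sky/C) blue/C)")
print(edge)                 # ('is/P', ('the/M', 'sky/C'), 'blue/C')
print(split_atom(edge[0]))  # ('is', 'P')
```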
Documentation:
Graphbrain – Language, Knowledge, Cognition – Graphbrain 0.4.0 documentation
Graphbrain is an Artificial Intelligence open-source software library and scientific research tool. Its aim is to…
graphbrain.net
Code:
graphbrain/graphbrain
Graphbrain is an Artificial Intelligence open-source software library and scientific research tool. Its aim is to…
github.com
It's the Wild Wild West on Reddit 😬
Chirpy: Stanfordβs Open Source Chatbot
Stanford open-sourced Chirpy, the chatbot that won 2nd place in the Alexa Prize. It's a chit-chat bot with a broad range of response generators, ranging from fully rule-based to fully neural.
Types of generators:
Music Response Generator
Personal Chat Response Generator
Wiki Response Generator
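A bot built from several response generators needs a way to choose between them on each turn. The sketch below assumes a simple scheme where every generator may return a candidate with a priority and the highest priority wins. The generator names mirror the list above, but the rules, priority values, and replies are hypothetical and not Chirpy's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    generator: str
    priority: int  # higher wins; a made-up scale, not Chirpy's
    text: str


def music_rg(utterance: str) -> Optional[Candidate]:
    """Rule-based: fires only when the user mentions music."""
    if "music" in utterance or "song" in utterance:
        return Candidate("music", 2, "What kind of music do you like?")
    return None


def wiki_rg(utterance: str) -> Optional[Candidate]:
    """Rule-based stand-in for a knowledge-grounded generator."""
    if "tell me about" in utterance:
        topic = utterance.split("tell me about", 1)[1].strip(" ?.!")
        return Candidate("wiki", 3, f"Here's something I know about {topic}.")
    return None


def personal_chat_rg(utterance: str) -> Optional[Candidate]:
    """Always answers, with the lowest priority, so there is a fallback."""
    return Candidate("personal_chat", 1, "How has your day been?")


GENERATORS = [music_rg, wiki_rg, personal_chat_rg]


def respond(utterance: str) -> Candidate:
    """Run every generator and keep the highest-priority candidate."""
    text = utterance.lower()
    candidates = [c for c in (g(text) for g in GENERATORS) if c is not None]
    return max(candidates, key=lambda c: c.priority)


print(respond("Tell me about Stanford!").text)  # Here's something I know about stanford.
print(respond("I love this song").generator)    # music
print(respond("hi").generator)                  # personal_chat
```

A fixed priority scale keeps the selection logic trivial; a real system would also need to track conversation state across turns.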
Inside Chirpy Cardinal: Stanford's Open-Source Social Chatbot that Won 2nd place in the Alexa Prize
Last year, Stanford won 2nd place in the Alexa Prize Socialbot Grand Challenge 3 for social chatbots. In this post, we…
ai.stanford.edu
Running PyTorch on Apple's M1 Chip? 👇
GPU acceleration for Apple's M1 chip? · Issue #47702 · pytorch/pytorch
🚀 Feature Hi, I was wondering if we could evaluate PyTorch's performance on Apple's new M1 chip. I'm also wondering…
github.com
Kgextension: From Knowledge Graphs to Pandas
"The kgextension package allows one to access and use Linked Open Data to augment existing datasets. It enables one to incorporate knowledge graph information in pandas.DataFrames"
Examples of Linked Open Data sources: DBpedia, Wikidata, and the EU Open Data Portal.
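The underlying idea, matching a column of entity names against a knowledge graph and appending the facts found as new columns, can be sketched offline. In this pure-Python sketch, rows are dicts standing in for DataFrame rows and a few hand-written triples stand in for a real Linked Open Data source. `augment` is a hypothetical helper, not part of kgextension's API.

```python
# Toy knowledge graph: (subject, predicate, object) triples standing in for
# Linked Open Data such as DBpedia or Wikidata (illustrative, hand-written).
TRIPLES = [
    ("Berlin", "country", "Germany"),
    ("Berlin", "continent", "Europe"),
    ("Paris", "country", "France"),
    ("Paris", "continent", "Europe"),
]


def augment(rows, key, triples):
    """Return copies of `rows` with one new column per predicate, looked up by row[key].
    Unknown entities get None, like a left join."""
    facts = {}
    for subject, predicate, obj in triples:
        facts.setdefault(subject, {})[predicate] = obj
    predicates = sorted({predicate for _, predicate, _ in triples})
    augmented = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay untouched
        entity_facts = facts.get(row[key], {})
        for predicate in predicates:
            new_row[f"{key}_{predicate}"] = entity_facts.get(predicate)
        augmented.append(new_row)
    return augmented


rows = [
    {"city": "Berlin", "sales": 10},
    {"city": "Paris", "sales": 7},
    {"city": "Atlantis", "sales": 0},
]
for row in augment(rows, "city", TRIPLES):
    print(row)
```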
om-hb/kgextension
The kgextension package allows to access and use Linked Open Data to augment existing datasets. It enables to…
github.com
Colab of the Week
Google Colaboratory
colab.research.google.com
Visualize BERT
Attention is all you need… to see a transformer bust a move.
"BertViz is a tool for visualizing attention in the Transformer model, supporting all models from the transformers library (BERT, GPT-2, XLNet, RoBERTa, XLM, CTRL, etc.)"
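What BertViz draws for each layer and head is a matrix of attention weights, where row i shows how strongly token i attends to every token. Here is a minimal pure-Python sketch of how such a matrix is computed (scaled dot-product attention, softmax(QKᵀ/√d)) on made-up vectors; it involves no real model and is not BertViz code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """Row i holds how much query i attends to each key: softmax(q . k / sqrt(d))."""
    scale = math.sqrt(len(keys[0]))
    return [
        softmax([sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys])
        for q in queries
    ]


tokens = ["the", "cat", "sat"]
# Toy 2-d vectors; a real model derives queries and keys from learned projections.
vectors = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights = attention_weights(vectors, vectors)

# A crude text "heatmap": one row per attending token.
print("     " + " ".join(f"{t:>5}" for t in tokens))
for token, row in zip(tokens, weights):
    print(f"{token:>5}" + " ".join(f"{w:5.2f}" for w in row))
```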
jessevig/bertviz
BertViz is a tool for visualizing attention in the Transformer model, supporting all models from the transformers…
github.com
GECToR – Grammatical Error Correction
Besides a bunch of YouTube ads, Grammarly also has a bunch of transformer models 😬. Their grammatical error correction models were pre-trained on synthetic data and then fine-tuned in two stages: first on error-filled corpora, and second on a combination of error-filled and error-free parallel corpora.
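Under the hood, GECToR frames correction as sequence tagging rather than rewriting (the paper's subtitle is "Tag, Not Rewrite"): a model predicts one edit tag per token, and applying the tags yields the corrected sentence. The sketch below covers only that applying step, using basic tags named after the paper's ($KEEP, $DELETE, $REPLACE_x, $APPEND_x). The tags are hand-written rather than model output, `apply_tags` is a hypothetical helper, and the repo's real tag set is larger.

```python
def apply_tags(tokens, tags):
    """Apply one edit tag per token and return the corrected token list."""
    if len(tokens) != len(tags):
        raise ValueError("need exactly one tag per token")
    corrected = []
    for token, tag in zip(tokens, tags):
        if tag == "$KEEP":
            corrected.append(token)
        elif tag == "$DELETE":
            continue
        elif tag.startswith("$REPLACE_"):
            corrected.append(tag[len("$REPLACE_"):])
        elif tag.startswith("$APPEND_"):
            corrected.extend([token, tag[len("$APPEND_"):]])
        else:
            raise ValueError(f"unknown tag: {tag}")
    return corrected


tokens = ["She", "go", "to", "to", "school", "yesterday"]
# Hand-written tags; in GECToR a tagging model would predict these.
tags = ["$KEEP", "$REPLACE_went", "$KEEP", "$DELETE", "$KEEP", "$APPEND_."]
print(" ".join(apply_tags(tokens, tags)))  # She went to school yesterday .
```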
grammarly/gector
This repository provides code for training and testing state-of-the-art models for grammatical error correction with…
github.com
PyTorch Geometric Temporal
A temporal graph neural network extension library for PyTorch Geometric. If you're into epidemiological forecasting or web traffic prediction, have a look:
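The kind of data this library targets is a graph whose node values change over time. Below is a pure-Python sketch of that layout (a fixed graph plus one snapshot of node values per time step) with a naive neighbour-averaging forecast. The regions and numbers are made up, and this is neither the library's API nor a neural model; it is just the sort of simple baseline a temporal GNN would be compared against.

```python
def forecast_next(snapshots, neighbors):
    """Naive one-step forecast: each node's next value is the mean of its own
    value and its neighbours' values in the most recent snapshot."""
    latest = snapshots[-1]
    return {
        node: (latest[node] + sum(latest[n] for n in nbrs)) / (1 + len(nbrs))
        for node, nbrs in neighbors.items()
    }


# A static graph of three regions (A-B-C in a line)...
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
# ...with one node-value snapshot per time step (e.g. weekly case counts).
snapshots = [
    {"A": 10, "B": 20, "C": 30},
    {"A": 12, "B": 18, "C": 36},
]
print(forecast_next(snapshots, neighbors))  # {'A': 15.0, 'B': 22.0, 'C': 27.0}
```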
benedekrozemberczki/pytorch_geometric_temporal
PyTorch Geometric Temporal is a temporal (dynamic) extension library for PyTorch Geometric. The library consists of…
github.com
NLP Use-Cases
In these slides, Andrei Lopatenko, an ML engineer, describes some of the top NLP use cases in the business world that he's encountered over the past 15 years.
Locust | Load Testing
Need to load test your website or API endpoint with open-source software? Check out Locust… 🐜
Locust – A modern load testing framework
locust.io
Python Packages Anyone?
How to make an awesome Python package in 2021…
How to make an awesome Python package in 2021
If you are like me, every once in a while you write a useful python utility and want to share it with your colleagues…
antonz.org
The Annoy Library
Very fast approximate nearest neighbor search. Spotify uses it for their music recommendations.
Features (from their repo):
- Euclidean distance, Manhattan distance, cosine distance, Hamming distance, or Dot (Inner) Product distance
- Cosine distance is equivalent to Euclidean distance of normalized vectors = sqrt(2-2*cos(u, v))
- Works better if you don't have too many dimensions (like <100) but seems to perform surprisingly well even up to 1,000 dimensions
- Small memory usage
- Lets you share memory between multiple processes
- Index creation is separate from lookup (in particular you cannot add more items once the tree has been created)
- Native Python support, tested with 2.7, 3.6, and 3.7.
- Build index on disk to enable indexing big datasets that won't fit into memory (contributed by Rene Hollander)
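The cosine/Euclidean relationship in the list above is easy to check numerically: for unit-length vectors, the squared Euclidean distance equals 2 - 2*cos(u, v). Here is a small pure-Python check with arbitrary example vectors; it does not use Annoy itself.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def normalize(v):
    """Scale v to unit length."""
    n = norm(v)
    return [x / n for x in v]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


u, v = [3.0, 4.0, 0.0], [1.0, 2.0, 2.0]
lhs = euclidean(normalize(u), normalize(v))  # Euclidean distance after normalizing
rhs = math.sqrt(2 - 2 * cosine(u, v))        # the formula quoted in the list
print(lhs, rhs)
```

Here cos(u, v) = 11/15, so both sides come out to sqrt(8/15), roughly 0.73.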
spotify/annoy
Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that…
github.com
Multi-Document Summarization
Aylien created the Wikipedia Current Events Portal (WCEP) dataset for multi-document summarization. In their blog, they discuss how their approach differs from recent SOTA models like PEGASUS and BART, which are unable to take multiple documents as input. 🥶🥶
"It is based on the Wikipedia Current Events Portal (WCEP) where Wikipedia editors write concise summaries of important current events, usually in 1 or 2 sentences, and provide links to news articles as sources for each summary."
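What makes WCEP a multi-document task is that each short summary is paired with a whole cluster of source articles. As a hedged illustration of working across a cluster, here is a toy extractive baseline that picks the sentence whose content words recur most across all the documents. The scoring scheme, stop-word list, and example cluster are all hypothetical, and this is not Aylien's approach or a neural model.

```python
import re
from collections import Counter

# A tiny stop-word list so that words like "the" don't dominate the scores.
STOPWORDS = {"a", "an", "the", "on", "and", "of", "to", "in", "was", "were"}


def content_words(text):
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(documents, n=1):
    """Return the n sentences whose content words are most frequent across ALL documents."""
    sentences = [
        s.strip()
        for doc in documents
        for s in re.split(r"(?<=[.!?])\s+", doc)
        if s.strip()
    ]
    freq = Counter(w for doc in documents for w in content_words(doc))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    return sorted(sentences, key=score, reverse=True)[:n]


cluster = [
    "A storm hit the coast on Monday. Schools were closed.",
    "The storm hit the coast and flooded roads.",
    "Officials said the storm caused flooding on the coast.",
]
print(summarize(cluster))  # ['A storm hit the coast on Monday.']
```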
Blog:
Adventures in Multi-Document Summarisation: The Wikipedia Current Events Portal Dataset
01 Apr, 2021 Demian Gholipour 13 Min Read In this post we give a brief overview of multi-document summarization (MDS)…
aylien.com
Google Colaboratory
colab.research.google.com
Repo Cypher 👨‍💻
A collection of recently released repos that caught our 👁
CodeTrans
State-of-the-art pre-trained models for source code. CodeTrans was trained on several Nvidia RTX 8000 GPUs and a couple of Google TPUs using various state-of-the-art transformer models.
agemagician/CodeTrans
CodeTrans is providing state of the art pre-trained models for source code. CodeTrans was trained on several Nvidia RTX…
github.com
Connected Papers 📈
Layout Parser
OCR 😱😱😱😱😱 and document image analysis
Layout-Parser/layout-parser
Layout Parser is a deep learning based tool for document image layout analysis tasks. Use pip or conda to install the…
github.com
Connected Papers 📈
Conversational Question Answering over Knowledge Graphs with Transformer and Graph Attention Networks
A transformer architecture extended with Graph Attention Networks for multi-task neural semantic parsing.
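For readers new to Graph Attention Networks: each node scores its neighbours, turns the scores into weights with a softmax, and takes the weighted average of their features. Here is a minimal single-head sketch in pure Python, assuming the learned projection W is the identity and using a hand-picked attention vector `a`. The node names only hint at a question/knowledge-graph setting, and this is not LASAGNE's implementation.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_layer(features, neighbors, a):
    """One single-head graph-attention step, assuming the projection W is the identity.

    features:  node -> feature vector (list of floats)
    neighbors: node -> nodes it attends to (include the node itself)
    a:         attention vector of length 2 * feature dimension
    """
    out = {}
    for i, nbrs in neighbors.items():
        # Unnormalized score e_ij = LeakyReLU(a . [h_i || h_j])
        scores = [
            leaky_relu(sum(w * x for w, x in zip(a, features[i] + features[j])))
            for j in nbrs
        ]
        # Softmax over the neighbourhood gives the attention coefficients alpha_ij.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # New representation: attention-weighted average of neighbour features.
        dim = len(features[i])
        out[i] = [sum(alpha * features[j][k] for alpha, j in zip(alphas, nbrs)) for k in range(dim)]
    return out


features = {"question": [1.0, 0.0], "entity1": [0.0, 1.0], "entity2": [1.0, 1.0]}
neighbors = {
    "question": ["question", "entity1", "entity2"],
    "entity1": ["entity1", "question"],
    "entity2": ["entity2", "question"],
}
print(gat_layer(features, neighbors, a=[0.0, 0.0, 1.0, 1.0]))
```

With an all-zero `a`, every score is equal and the layer reduces to a plain neighbourhood mean; a trained `a` is what lets the model focus on the most relevant neighbours.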
endrikacupaj/LASAGNE
This paper addresses the task of (complex) conversational question answering over a knowledge graph. For this task, we…
github.com
Connected Papers 📈
EXPATS: A Toolkit for Explainable Automated Text Scoring
A framework for automated text scoring (ATS) tasks, such as automated essay scoring and readability assessment. The toolkit also provides seamless integration with the Language Interpretability Tool (LIT) so that one can interpret and visualize models and their predictions.
octanove/expats
EXPATS is an open-source framework for automated text scoring (ATS) tasks, such as automated essay scoring and…
github.com
Connected Papers 📈
GrammarTagger – A Neural Multilingual Grammar Profiler for Language Learning
Given an input text, it identifies grammatical features useful for language education.
octanove/grammartagger
GrammarTagger – A Neural Multilingual Grammar Profiler for Language Learning – octanove/grammartagger
github.com
Connected Papers 📈
MMBERT: Multimodal BERT Pretraining for Improved Medical VQA
A multimodal transformer for the visual question answering task in the medical domain. It achieves new state-of-the-art performance on two VQA datasets for radiology images: VQA-Med 2019 and VQA-RAD.
VirajBagal/MMBERT
Yash Khare*, Viraj Bagal*, Minesh Mathew, Adithi Devi, U Deva Priyakumar, CV Jawahar Abstract: Images in the medical…
github.com
Connected Papers 📈
Dataset of the Week: HumAID
What is it?
A dataset consisting of ~77K human-labeled tweets, sampled from a pool of ~24 million tweets across 19 disaster events that happened between 2016 and 2019. Disaster events include earthquakes/cyclones, floods, hurricanes, and wildfires.
Where is it?
CrisisNLP
Description of the dataset The HumAID Twitter dataset consists of several thousands of manually annotated tweets that…
crisisnlp.qcri.org
Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.
For complete coverage, follow our Twitter: @Quantum_Stat
Published via Towards AI