The NLP Cypher | 02.07.21
Last Updated on July 24, 2023 by Editorial Team
Author(s): Ricky Costa
Originally published on Towards AI.
NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER
The Short Squeeze
The plebeian beats the market…
Wall Street (with its historical knack for favoring the frothingly rich, summer-home-traveling bourgeoisie) was outsmarted by a network of Reddit day-trading Millennial computer nerds. The outcome: billions in hedge fund capital thrown to the wind.
When the COVID saga began, those looking to make some extra scratch leaned on commission-free online trading platforms like Robinhood to make ends meet. And with this meteoric rise of a new retail investor gambling on Wall Street's hallowed grounds, no one, not even the institutional investors, foresaw the events of the last few weeks.
So what happened?
The Reddit traders began buying shares of stocks that were heavily shorted by industry investors ("shorting" a stock means betting that its price will go down). When the share price began to climb due to their buying, hedge funds began losing loads of money on their short positions and, as a result, started buying the same stocks they had shorted in order to cover their losses. The end result is a vicious melt-up of the stock price, aka the short squeeze.
FYI, you can use the PRAW library to stream live activity from Reddit, so you get the pleasure of watching the WallStreetBets phenomenon in real time. 😁
PRAW: The Python Reddit API Wrapper – PRAW 7.1.5.dev0 documentation
praw.readthedocs.io
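If you want to try the live stream yourself, here is a minimal sketch. It assumes you have registered a Reddit app and have your own `client_id`, `client_secret`, and `user_agent`; the helper names `format_comment` and `stream_wsb_comments` are made up for this example and are not part of PRAW.

```python
def format_comment(author, body, width=80):
    """Collapse whitespace and truncate a comment so it fits on one line."""
    text = " ".join(body.split())
    if len(text) > width:
        text = text[: width - 3] + "..."
    return f"{author}: {text}"


def stream_wsb_comments(client_id, client_secret, user_agent, limit=10):
    """Yield up to `limit` new r/wallstreetbets comments as they are posted.

    Hypothetical helper: needs `pip install praw` and real Reddit API credentials.
    """
    import praw  # third-party; imported lazily so format_comment works without it

    reddit = praw.Reddit(
        client_id=client_id,
        client_secret=client_secret,
        user_agent=user_agent,
    )
    subreddit = reddit.subreddit("wallstreetbets")
    # skip_existing=True ignores the backlog and only yields comments posted from now on
    for count, comment in enumerate(subreddit.stream.comments(skip_existing=True)):
        if count >= limit:
            break
        yield format_comment(str(comment.author), comment.body)
```

Looping over `stream_wsb_comments(...)` with your own credentials and printing each item should give a rolling one-line-per-comment feed; the stream blocks while it waits for new comments.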
ArXiv Revisited | Graphs | Video
ArXiv released a new feature that allows one to use "Connected Papers" to generate a graph of related research papers from the open-sourced platform. FYI, I hacked it this week, so I'll add "connected papers" to the Repo Cypher every week. 😁
Speaking of arXiv, there's a new feature called "papers-with-video" created by Amit Chaudhary. It's a web browser extension which provides a link to a video relating to the arXiv paper in view. It currently covers 3.7K ML papers. 🔥🔥
NMT
The makers of Sentence Transformers have introduced a new machine translation library (it comes with language detection too).
Deets:
- Easy installation and usage: Use state-of-the-art machine translation with 3 lines of code
- Automatic download of pre-trained machine translation models
- Translation between 150+ languages
- Automatic language detection for 170+ languages
- Sentence and document translation
- Multi-GPU and multi-process translation
UKPLab/EasyNMT
This package provides easy to use, state-of-the-art machine translation for more than 100+ languages. The highlights of…
github.com
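For a feel of the "3 lines of code" claim, here is a small sketch based on the usage shown in the EasyNMT README. The wrapper name `translate_to_english` is made up for this example, and the `opus-mt` model name is one of the model families the README lists; treat both as assumptions and check the repo for current options.

```python
def translate_to_english(sentences, model_name="opus-mt"):
    """Translate a string or list of strings into English.

    Hypothetical wrapper: needs `pip install easynmt`, and the first call
    downloads the pre-trained model weights.
    """
    from easynmt import EasyNMT  # third-party; imported lazily

    model = EasyNMT(model_name)
    # The source language is detected automatically when it is not given
    return model.translate(sentences, target_lang="en")
```

Calling `translate_to_english("Dies ist ein Satz in Deutsch.")` should return the English translation as a string; passing a list returns a list.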
BudgetML
A great library if you want to deploy your model on Google Cloud and get a nice API endpoint running on top of FastAPI and GCP's preemptible instances. Since preemptible machines can be taken down anytime, they have a mechanism in place to auto-start them to avoid downtime. ✌
ebhy/budgetml
Give us a GitHub star to show your love! BudgetML is perfect for practitioners who would like to quickly deploy their…
github.com
The GPT-3 List of Projects/Startups
The web that GPT-3 currently weaves. Here's a nice table of current projects and startups riding the GPT-3 gravy train. OpenAI's inference API has spun up an entire industry 😵.
FastStylometry
A stylometry library for comparing writing styles. It uses the Burrows' Delta algorithm.
"The Burrows' delta is a statistic which expresses the distance between two authors' writing styles. A high number like 3 implies that the two authors are very dissimilar, whereas a low number like 0.2 would imply that two books are very likely to be by the same author."
The author mentions that most stylometry libraries mainly produce graphs, but he wanted his library to output probabilities as well. In addition, faststylometry includes "unknown" books for testing purposes. Pretty cool.
Feds use this type of tech to catch perps on the dark web by correlating writing styles to get warrants (random fact). They also use time correlations, but that's another story…
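To make the statistic concrete, here is a rough pure-Python sketch of the classic Burrows' Delta: take the most frequent words in the corpus, z-score each word's relative frequency across the texts, and average the absolute z-score differences between two texts. This is not faststylometry's API or code; the function names and the toy corpus are made up for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_freqs(tokens, vocab):
    """Relative frequency of each vocabulary word in one tokenized text."""
    counts = Counter(tokens)
    return [counts[word] / len(tokens) for word in vocab]


def burrows_delta(corpus, name_a, name_b, top_n=50):
    """Mean absolute z-score difference between two texts in `corpus`.

    `corpus` maps a text name to its list of tokens (hypothetical format).
    """
    totals = Counter()
    for tokens in corpus.values():
        totals.update(tokens)
    vocab = [word for word, _ in totals.most_common(top_n)]
    freqs = {name: relative_freqs(tokens, vocab) for name, tokens in corpus.items()}

    diffs = []
    for i in range(len(vocab)):
        column = [f[i] for f in freqs.values()]
        sigma = pstdev(column)
        if sigma == 0:
            continue  # word is used equally often everywhere, so it carries no signal
        mu = mean(column)
        z_a = (freqs[name_a][i] - mu) / sigma
        z_b = (freqs[name_b][i] - mu) / sigma
        diffs.append(abs(z_a - z_b))
    return mean(diffs) if diffs else 0.0


# Toy corpus (made-up text, far too small for real stylometry)
TOY_CORPUS = {
    "a": "the cat sat on the mat the end".split(),
    "b": "the cat sat on the mat the end".split(),
    "c": "a dog ran in a park a lot".split(),
}
```

On the toy corpus, the two identical texts get a delta of zero and the unrelated one gets a larger value. Real use needs long texts and a sensible `top_n`, which is where a dedicated library earns its keep.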
Blog:
Fast Stylometry Tutorial – Freelance Data Scientist | Thomas Wood
I'm introducing a Python library I've written, called faststylometry, which allows you to compare authors of texts by…
freelancedatascientist.net
GitHub:
fastdatascience/faststylometry
By Thomas Wood, Fast Data Science Source code at https://github.com/woodthom2/faststylometry Tutorial at…
github.com
Using CLIP for Unsplash Search
Someone threw OpenAI's CLIP model on top of Unsplash for searching pictures via natural language. Includes a Colab. 😎
haltakov/natural-language-image-search
Search photos on Unsplash using natural language descriptions. The search is powered by OpenAI's CLIP model and the…
github.com
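The general idea behind this kind of search can be sketched in a few lines: embed the query and every photo into the same vector space, then rank photos by cosine similarity to the query. The sketch below assumes that approach (the repo's exact method is not described here) and uses made-up three-dimensional toy vectors in place of real CLIP embeddings; all names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_photos(query_vec, photo_vecs, top_k=3):
    """Return the ids of the `top_k` photos most similar to the query."""
    scored = [(cosine(query_vec, vec), photo_id) for photo_id, vec in photo_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [photo_id for _, photo_id in scored[:top_k]]


# Toy stand-ins: real vectors would come from CLIP's image and text encoders
TOY_PHOTOS = {
    "dog_on_beach": [0.9, 0.1, 0.0],
    "city_at_night": [0.0, 0.2, 0.9],
    "puppy_in_park": [0.8, 0.3, 0.1],
}
TOY_QUERY = [1.0, 0.2, 0.0]  # pretend this encodes the text "a dog playing outside"
```

With real embeddings the photo vectors are computed once and cached, so each new query only costs one text encoding plus the similarity ranking.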
VS Code Chat
"Chat with your Slack and Discord teams within VS Code"
One less open tab on your browser. Winning!
vsls-contrib/chat
0.34.0: With this release, the integration with VS Live Share has now moved into the core VS Live Share extension…
github.com
GitHub Live Tracker
"ghtop provides a number of views of all current public activity from all users across the entire GitHub platform"
One more terminal window open. Winning!
(Headshot of the Week 🏆)
nat/ghtop
See what's happening on GitHub in real time (also helpful if you need to use up your API quota as quickly as possible)…
github.com
Colab of the Week
Using Transformers with Weights and Biases:
Google Colaboratory
colab.research.google.com
A100 vs V100 GPU Benchmarks
Want to know the PyTorch training speed difference between the A100 and V100 GPUs for language models 👇? FYI, Lambda now carries the big boy, the A100. More in the blog:
A100 vs V100 Deep Learning Benchmarks | Lambda
Lambda is now shipping A100 servers. In this post, we benchmark the PyTorch training speed of the Tesla A100 and V100…
lambdalabs.com
Star Trek Dialogue Scripts in JSON
If you need your GPT-3 to speak in Klingon 🤣.
Example JSON:
"line": "On Stardate 43997, Captain Jean-Luc Picard of the Federation Starship Enterprise was kidnapped for six days by an invading force known as the Borg. Surgically altered, he was forced to lead an assault on Starfleet at Wolf 359."
jkingsman/Star-Trek-Script-Programmatics
A collection of Star Trek scripts dumped to JSON. A bit of a messy repo from my work but better the data be out there…
github.com
RackSpace AI/ML Survey
Total respondents = 1,870 | IT Professionals Worldwide
TL;DR
"$1.06M: What the average company spends annually on AI and machine learning initiatives."
Leading Use of "AI" is as a "Component of data analytics…"
Regarding current plans: 46% say they "want to improve the speed and efficiency of existing processes…"
Leading Challenge with 27% of respondents: "Shortage of skilled AI/ML talent"
Get your copy here 👇
AI and machine learning research report | Rackspace Technology
To learn more about how IT leaders are adopting and using AI and machine learning, we surveyed 1500+ IT leaders in…
www.rackspace.com
Repo Cypher 👨‍💻
📈 📈 Added the new ConnectedPapers feature 📈 📈
PAWLS | PDF Annotations
Software that allows one to collect annotations associated with a PDF document.
allenai/pawls
PDF Annotations with Labels and Structure is software that makes it easy to collect a series of annotations associated…
github.com
Connected Papers 📈
Multi-Document Driven Dialogue (MD3)
A new dialogue task where an agent can guess the target document that the user is interested in by leading a dialogue.
laddie132/MD3
This is the code for AAAI2021 paper Converse, Focus and Guess – Towards Multi-Document Driven Dialogue. We build a…
github.com
Connected Papers 📈
SkillNER
A named entity recognition system that extracts soft skills from text.
nicolamelluso/SkillNER
A Named Entity Recognition system that extracts soft skills from text
github.com
Connected Papers 📈
WeNet
A toolkit for automatic speech recognition (ASR).
mobvoi/wenet
We share neural Net together. The main motivation of WeNet is to close the gap between research and production…
github.com
Connected Papers 📈
Tabular K-BERT | Tabular Scenario Based Question Answering
Repo for tabular scenario question answering where a model is tasked to answer multiple-choice questions based on a passage and associated tables.
nju-websoft/TSQA
Source code for "TSQA: Tabular Scenario Based Question Answering"; the implementation is based on K-BERT. We thank the authors of…
github.com
Connected Papers 📈
Open Information Extraction Dataset
A large dataset for open information extraction, plus training scripts for your own model using AI2's library, which runs on PyTorch.
Jacobsolawetz/large-scale-oie
In this repository, you will find the data published in the paper Scaling Up Supervised Information Extraction, along…
github.com
Connected Papers 📈
Dataset of the Week: Urban Dictionary (UD) Dataset
Dataset contains 2.5 million phrases from Urban Dictionary, including their definitions and votes.
Content
CSV Rows: 2,580,925
Column 1: word_id – for usage in the Urban Dictionary API
Column 2: word – the text being defined
Column 3: up_votes – thumbs up count as of May 2016
Column 4: down_votes – thumbs down count as of May 2016
Column 5: author – hash of username of submitter
Column 6: definition – text with possible UTF-8 chars, double semi-colon denotes a newline
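Given the column layout above, loading the file could look roughly like the sketch below. It assumes a comma-delimited CSV with no header row and quoted definitions, which the listing does not confirm; the sample row is made up, and `parse_rows` is a hypothetical helper name.

```python
import csv
import io

# Column names taken from the listing above
COLUMNS = ["word_id", "word", "up_votes", "down_votes", "author", "definition"]


def parse_rows(handle):
    """Yield one dict per CSV row, with votes as ints and ';;' turned into newlines."""
    reader = csv.DictReader(handle, fieldnames=COLUMNS)  # assumes no header row
    for row in reader:
        row["up_votes"] = int(row["up_votes"])
        row["down_votes"] = int(row["down_votes"])
        # A double semi-colon in the definition denotes a newline
        row["definition"] = row["definition"].replace(";;", "\n")
        yield row


# Made-up sample row standing in for the real file
SAMPLE = io.StringIO('7,yeet,120,15,ab12cd,"To throw something;;with force"\n')
ROWS = list(parse_rows(SAMPLE))
```

For the real file, swap the `io.StringIO` sample for `open(path, newline="", encoding="utf-8")`, and skip the first row if it turns out to contain a header.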
Where is it?
Here's the dataset in a bonus repo for generating slang 🔥.
zhewei-sun/slanggen
This is the github repository for the TACL paper "A Computation Framework for Slang Generation". The dataset is a…
github.com
Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.
For complete coverage, follow our Twitter: @Quantum_Stat
Published via Towards AI