The NLP Cypher | 02.21.21
Last Updated on July 24, 2023 by Editorial Team
Author(s): Ricky Costa
Originally published on Towards AI.
NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER
🎉 1T or bust my dudes 🎉
There's a group of ML hackers attempting to recreate GPT-3 on their own.
Earlier this year, EleutherAI set data nerds buzzing when they released the paper for the Pile, an 825 GB English text corpus targeted at training large-scale language models. This breakthrough takes care of the data problem; now all they need is the compute.
They are building it using TensorFlow's Mesh library. We wish them the best of luck. Or as it states on their repo: 1T or bust my dudes.
EleutherAI/gpt-neo
An implementation of model & data parallel GPT2 & GPT3-like models, with the ability to scale up to full GPT3 sizes…
github.com
Their discord server:
Join the EleutherAI Discord Server!
Check out the EleutherAI community on Discord – hang out with 3,168 other members and enjoy free voice and text chat.
discord.com
Oh, and Hello Mars! 👽
If you enjoy the read, help us out by giving it a 👏👏 and sharing it with friends 🙈.
PyTorch | Ray and Distributed Training
If you want to stay on top of the latest distributed training with PyTorch and Ray, this is a healthy intro:
Getting Started with Distributed Machine Learning with PyTorch and Ray
Ray is a popular framework for distributed Python that can be paired with PyTorch to rapidly scale machine learning…
medium.com
Transformers Interpret
"Transformers interpret allows any transformers model to be explained in just two lines. It even supports visualizations in both notebooks and as savable html files."
For example, if you were doing sentiment analysis on the sentence below:
"I love you, I like you"
This output 👇 would tell you which words have the biggest impact on the prediction.
[('BOS_TOKEN', 0.0),
('I', 0.46820529249283205),
('love', 0.46061853275727177),
('you', 0.566412765400519),
(',', -0.017154456486408547),
('I', -0.053763869433472),
('like', 0.10987746237531228),
('you', 0.48221682341218103),
('EOS_TOKEN', 0.0)]
Then you visualize it with 1 line of code:
cls_explainer.visualize("distilbert_viz.html")
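The attribution scores are easier to read once they are sorted. As a minimal sketch in plain Python (not part of the transformers-interpret API), the snippet below copies the attribution list from the example output and ranks tokens by the absolute size of their score. The helper name `top_tokens` is made up for illustration.

```python
# Word attributions copied from the example output in this section.
word_attributions = [
    ("BOS_TOKEN", 0.0),
    ("I", 0.46820529249283205),
    ("love", 0.46061853275727177),
    ("you", 0.566412765400519),
    (",", -0.017154456486408547),
    ("I", -0.053763869433472),
    ("like", 0.10987746237531228),
    ("you", 0.48221682341218103),
    ("EOS_TOKEN", 0.0),
]


def top_tokens(attributions, k=3):
    """Return the k (token, score) pairs with the largest absolute score.

    Hypothetical helper for illustration; not provided by the library.
    """
    return sorted(attributions, key=lambda pair: abs(pair[1]), reverse=True)[:k]


# Print the most influential tokens with a signed, rounded score.
for token, score in top_tokens(word_attributions):
    print(f"{token:>5}  {score:+.3f}")
```

For this sentence, both occurrences of "you" and the first "I" rank highest, while the comma and the second "I" carry small negative scores.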
cdpierse/transformers-interpret
Transformers Interpret is a model explainability tool designed to work exclusively with 🤗 transformers. In line with…
github.com
ConvLab-2
"ConvLab-2 is an open-source toolkit that enables researchers to build task-oriented dialog systems with state-of-the-art models, perform an end-to-end evaluation, and diagnose the weakness of systems."
ConvLab-2
ConvLab-2 is an open-source toolkit that enables researchers to build task-oriented dialog systems with…
convlab.github.io
Question Generation Tutorial on Udemy
The creator of the QuestGen library, Ramsri Golla, has a new course on Udemy!
I also have a discount coupon you can use for his course. Here's a description of what you'll learn, in case you are interested:
- Generate assessments like MCQs, True/False questions, etc. from any content using state-of-the-art natural language processing techniques.
- Apply recent advancements like BERT, OpenAI GPT-2, and T5 transformers to solve real-world problems in edtech.
- Use NLP libraries like spaCy, NLTK, AllenNLP, Hugging Face Transformers, etc.
- Use the Google Colab environment to run all these algorithms.
- 4 hours of on-demand video 🤖
25% Off Coupon:
Question Generation using Natural Language processing
This course focuses on using state-of-the-art Natural Language processing techniques to solve the problem of question…
www.udemy.com
MIT CS Courses
Electrical Engineering and Computer Science courses at MIT.
Electrical Engineering and Computer Science
Graduates of MIT's electrical engineering and computer science department work in diverse industries and conduct…
ocw.mit.edu
Wiki's API
An article describing the genesis of Wikipedia's API: the Wikimedia Foundation (WMF) originally had no holistic API strategy, and the article explains how they solved that problem. The API was completed in December 2020.
The New API for Wikipedia
I recently left my job at the Wikimedia Foundation (WMF) to head up engineering at MTTR. I'm proud of the hard work my…
evanprodromou.wordpress.com
Source Code:
wikimedia/apiclient-wiki
Sample client for the Wikimedia API Platform. Contribute to wikimedia/apiclient-wiki development by creating an account…
github.com
Docker Swarm Implementation
Includes code… Hope you like YML files. 😁
Container Orchestration With Docker Swarm
NLP Cloud is a service I have contributed to recently. It is using several interesting technologies under the hood so I…
juliensalinas.com
Papers Without Code 😬
Where unreproducible papers come to live…
Papers without code – where unreproducible papers come to live
www.paperswithoutcode.com
Repo Cypher 👨‍💻
A collection of recently released repos that caught our 👁
65 Million Probably Asked Questions and New Retriever Model
A new QA-pair retriever model, RePAQ, to complement Probably Asked Questions (PAQ), a resource of 65M automatically-generated QA-pairs.
facebookresearch/PAQ
This repository contains code and models to support the research paper PAQ: 65 Million Probably-Asked Questions and…
github.com
Connected Papers 📈
Fact Check Summarization
Abstractive Summarization using two methods:
1. JAENS: joint entity and summary generation
2. Summary-worthy entity classification with summarization (multi-task learning)
This approach targets the factual consistency of entities in abstractive summarization (AS), which is an ongoing research problem.
*runs on fairseq*
amazon-research/fact-check-summarization
We provide the code for the paper "Entity-level Factual Consistency of Abstractive Text Summarization", by Feng Nan…
github.com
Connected Papers 📈
Emoji Transfer
Training transformers for sentiment analysis with emoji data.
uds-lsv/emoji-transfer
This is the repository for Emoji-Based Transfer Learning for Sentiment Tasks. https://arxiv.org/abs/2102.06423 Datasets…
github.com
Connected Papers 📈
Relation Extraction Over Universal Graph
Distantly Supervised Relation Extraction (DS-RE) over knowledge graph and textual data.
baodaiqin/UGDSRE
Codes and datasets for our paper "Two Training Strategies for Improving Relation Extraction over Universal Graph" We…
github.com
Connected Papers 📈
Apache Log Generator
Automates the parsing of Apache logs by formulating it as a machine translation (MT) task.
WulffHunter/log_generator
This repository contains tools used for generating synthetic Apache logs and the tools needed to parse reference…
github.com
Connected Papers 📈
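To make the "log parsing as translation" framing concrete, here is a small sketch of what one source/target training pair could look like. This is an assumption for illustration only, not the repository's actual data format or method: the regex covers the Apache Common Log Format, and `make_pair`, `FIELDS`, and the `key=value` target layout are all hypothetical.

```python
import re

# Apache Common Log Format: host ident user [time] "request" status size
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)$'
)

# Field order used when writing the (hypothetical) target string.
FIELDS = ("host", "ident", "user", "time", "request", "status", "size")


def make_pair(log_line):
    """Build a (source, target) pair: raw log line in, labelled fields out.

    The regex only produces the reference target here; in an MT setup a
    model would be trained to emit the target from the raw line.
    """
    match = CLF_PATTERN.match(log_line)
    if match is None:
        raise ValueError(f"not a Common Log Format line: {log_line!r}")
    target = " | ".join(f"{name}={match.group(name)}" for name in FIELDS)
    return log_line, target


line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
source, target = make_pair(line)
print(target)
```

The appeal of the translation framing is that a trained model would not need a hand-written pattern for every log layout; the regex above only stands in as the source of reference targets for one known format.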
NoiseQA
A new question answering evaluation benchmark that takes into consideration how the deployment of a QA model can impact performance. For example, QA interfaces such as speech, text, or translation can each introduce unique errors at inference time that most evaluation benchmarks don't consider.
NoiseQA
All materials for the paper
noiseqa.github.io
Connected Papers 📈
Optimizing Inference on CPU for Transformers
An empirical analysis of the scalability and performance of Transformer-based model inference on CPUs.
Optimizing Inference Performance of Transformers on CPUs
The Transformer architecture revolutionized the field of natural language processing (NLP). Transformers-based models…
arxiv.org
Connected Papers 📈
Exploring Transformers for NLG
A pithy introduction to the GPT, BERT, and XLNet transformers for NLG.
Connected Papers 📈
Dataset of the Week: ArtEmis
A dataset that associates human emotions with artworks and contains explanations in natural language of the rationale behind each triggered emotion.
Where is it?
ArtEmis
ArtEmis: Affective Language for Art Stanford University 1 LIX, Ecole Polytechnique 2 King Abdullah University of…
www.artemisdataset.org
Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.
For complete coverage, follow our Twitter: @Quantum_Stat
Published via Towards AI