The NLP Cypher | 11.29.20

Last Updated on July 24, 2023 by Editorial Team

Author(s): Ricky Costa

Originally published on Towards AI.

The NLP Cypher | 11.29.20 — found it on @vboykis’s twitter

NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER

Hand of God

Hey, welcome back, just returned from the holidays. And Happy Thanksgiving for those celebrating. It’s been a slow week given the holiday break so the newsletter will be a bit shorter than usual, but that doesn’t mean we can’t discuss alien monoliths…

If you haven’t heard, in a national park in Utah, an unknown monolith was discovered. At the moment, no one knows where it came from.

And it didn’t take long for someone to loot it 😭.

Software Updates

Release TensorFlow 2.4.0-rc3 · tensorflow/tensorflow

tf.distribute introduces experimental support for asynchronous training of Keras models via the…

github.com

You can now parallelize models on the Transformers library!

Oh and by the way, earlier this week we added 50 new datasets to the Big Bad NLP Database: highlights include the IndoNLU benchmark and several datasets from EMNLP, thank you to Ulrich Schäfer and Neea Rusch for contributing!

P.S. If you enjoy today’s article, don’t hesitate to give a 👏👏! Thank you!

GNN Book

Hey, want an awesome introduction to graph neural networks? I found this pre-publication version of William Hamilton’s “Graph Representation Learning” book.

It is very well written and illustrates this burgeoning topic in machine learning with elegant simplicity.

ToC

Chapter 1: Introduction and Motivations [Draft. Updated September 2020.]
Chapter 2: Background and Traditional Approaches [Draft. Updated September 2020.]

Part I: Node Embeddings

Chapter 3: Neighborhood Reconstruction Methods [Draft. Updated September 2020.]
Chapter 4: Multi-Relational Data and Knowledge Graphs [Draft. Updated September 2020.]

Part II: Graph Neural Networks

Chapter 5: The Graph Neural Network Model [Draft. Updated September 2020.]
Chapter 6: Graph Neural Networks in Practice [Draft. Updated September 2020.]
Chapter 7: Theoretical Motivations [Draft. Updated September 2020.]

Part III: Generative Graph Models

Chapter 8: Traditional Graph Generation Approaches [Draft. Updated September 2020.]
Chapter 9: Deep Generative Models [Draft. Updated September 2020.]

Graph Representation Learning Book

The field of graph representation learning has grown at an incredible (and sometimes unwieldy) pace over the past seven…

www.cs.mcgill.ca

PDF Graph Representation Learning

Language Explanations

Can language help us to train models better?

“In the same way that we might take an input x, and extract features (e.g. the presence of certain words) to train a model, we can use explanations to provide additional features.”

In a new blog post, Stanford AI discusses why it’s so hard to teach models knowledge via language, along with possible solutions from an NLP perspective (i.e. their ExpBERT paper from earlier this year) and a computer vision perspective (i.e. their visual perceptions paper).
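To make the quoted idea concrete, here is a toy sketch of "explanations as extra features". Everything in it is an assumption for illustration: the function names, the keywords, the explanations, and the input sentence are hypothetical, and the word-overlap score is only a crude stand-in for a learned model that judges whether an explanation applies to an input. It is not the ExpBERT implementation.

```python
import re

def tokens(text):
    """Return the set of lowercase word tokens in a string."""
    return set(re.findall(r"[a-z']+", text.lower()))

def keyword_features(x, keywords):
    """Classic features: 1.0 if the keyword occurs in the input, else 0.0."""
    words = tokens(x)
    return [1.0 if k in words else 0.0 for k in keywords]

def explanation_features(x, explanations):
    """One feature per explanation: the fraction of its words found in x.

    ASSUMPTION: word overlap replaces a learned scoring model here.
    """
    words = tokens(x)
    feats = []
    for e in explanations:
        e_words = tokens(e)
        feats.append(len(words & e_words) / len(e_words))
    return feats

def featurize(x, keywords, explanations):
    """Concatenate keyword features with explanation-derived features."""
    return keyword_features(x, keywords) + explanation_features(x, explanations)

# Hypothetical inputs, not taken from the blog post or the paper.
keywords = ["married", "wife"]
explanations = ["they went on a honeymoon", "they have a daughter"]
x = "Ann and Bob went on a honeymoon in Utah"

features = featurize(x, keywords, explanations)
print(features)
```

Neither keyword appears in the sentence, yet the first explanation still yields a strong feature, which is the kind of extra signal the quote describes.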

Learning from Language Explanations

Imagine you're a machine learning practitioner and you want to solve some classification problem, like classifying…

ai.stanford.edu

ExpBERT’s GitHub discussed in the blog:

MurtyShikhar/ExpBERT

This repository contains code, scripts, data and checkpoints for running experiments in the following paper: Shikhar…

github.com

DataLoader PyTorch

An interesting blog post from Paperspace walks through the DataLoader class in PyTorch. It covers this handy class for anyone who wants to load preexisting datasets or build a custom dataset from numerical or text data. ToC:

Working on Datasets
Data Loading in PyTorch
Looking at the MNIST Dataset in-Depth
Transforms and Rescaling the Data
Creating Custom Datasets in PyTorch
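For readers new to the pattern, here is a minimal sketch of the idea behind a custom dataset plus a loader. It uses only the standard library so it runs anywhere. In PyTorch itself, the corresponding classes are `torch.utils.data.Dataset` (a map-style dataset implements `__len__` and `__getitem__`) and `torch.utils.data.DataLoader` (which takes arguments such as `batch_size` and `shuffle`). The names `SquaresDataset` and `iterate_batches` are hypothetical, and the toy data is made up.

```python
import random

class SquaresDataset:
    """Map-style dataset: item i is the pair (i, i * i). Hypothetical toy data."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        if not 0 <= idx < self.n:
            raise IndexError(idx)
        return idx, idx * idx

def iterate_batches(dataset, batch_size, shuffle=False, seed=0):
    """Yield lists of samples, roughly what a data loader does for one epoch."""
    indices = list(range(len(dataset)))
    if shuffle:
        # A seeded generator keeps the shuffled order reproducible.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]

batches = list(iterate_batches(SquaresDataset(5), batch_size=2))
print(batches)
```

The real `DataLoader` goes further than this sketch: it collates samples into tensors and can load data in parallel worker processes.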

Blog:

Complete Guide to the DataLoader Class in PyTorch | Paperspace Blog

In this post, we'll deal with one of the most challenging problems in the fields of Machine Learning and Deep Learning…

blog.paperspace.com

Repo Cypher 👨‍💻

A collection of recently released repos that caught our 👁

Neural Acoustic

A library for modeling English speech data with varied accents using Transformers.

Bartelds/neural-acoustic-distance

Code associated with the paper: Neural Representations for Modeling Variation in English Speech. git clone…

github.com

The Speech Accent Archive

RELVM

Repo used for training a latent variable generative model on pairs of entities and contexts (i.e. sentences) in which the entities occur. Their model can be used to perform both mention-level and pair-level classification.

BenevolentAI/RELVM

This repository contains the code accompanying the paper "Learning Informative Representations of Biomedical Relations…

github.com

Paper

GLGE Benchmark

A new natural language generation (NLG) benchmark composed of 8 language generation tasks, including Abstractive Text Summarization (CNN/DailyMail, Gigaword, XSUM, MSNews), Answer-aware Question Generation (SQuAD 1.1, MSQG), Conversational Question Answering (CoQA), and Personalizing Dialogue (Personachat).

microsoft/glge

This repository contains information about the general language generation evaluation benchmark GLGE, which is composed…

github.com

In addition, Microsoft highlights ProphetNet, a new pre-trained language model for sequence-to-sequence learning that uses a novel self-supervised objective called future n-gram prediction.
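As a rough illustration of what "future n-gram prediction" means at the data level, the sketch below pairs each prefix of a sequence with the next n tokens that follow it, instead of only the single next token. The function name `future_ngram_targets` and the example sentence are hypothetical. This shows target construction only, under the assumption that the targets are simply the next n tokens. It does not reproduce ProphetNet's model or loss.

```python
def future_ngram_targets(tokens, n):
    """Pair each prefix tokens[:t] with the next n tokens (fewer near the end)."""
    pairs = []
    for t in range(1, len(tokens)):
        context = tokens[:t]
        future = tokens[t:t + n]  # slicing truncates safely at the sequence end
        pairs.append((context, future))
    return pairs

# Hypothetical example sentence.
pairs = future_ngram_targets(["the", "cat", "sat", "down"], n=2)
for context, future in pairs:
    print(context, "->", future)
```

With n=1 this reduces to ordinary next-token targets, which makes the contrast with standard language modeling easy to see.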

microsoft/ProphetNet

This repo provides the code for reproducing the experiments in ProphetNet: Predicting Future N-gram for…

github.com

OpenTQA

OPENTQA is an open framework for the textbook question answering task. Textbook Question Answering (TQA) is where one should answer a diagram/non-diagram question given a large multi-modal context consisting of abundant essays and diagrams.

keep-smile-001/opentqa

OPENTQA is a open framework of the textbook question answering.

github.com

Dataset of the Week: Question Answering for Artificial Intelligence (QuAIL)

What is it?

QuAIL contains 15K multiple-choice questions in texts 300–350 tokens long across 4 domains (news, user stories, fiction, blogs).

Sample

Where is it?

text-machine-lab/quail

This repository contains the main and challenge data for QuAIL reading comprehension dataset. QuAIL contains 15K…

github.com

Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.

For complete coverage, follow our Twitter: @Quantum_Stat


Published via Towards AI


Note: Article content contains the views of the contributing authors and not Towards AI.
