NLP News Cypher | 02.09.20

Last Updated on July 27, 2023 by Editorial Team

Author(s): Ricky Costa

Originally published on Towards AI.

Photo by Jeremy Bishop on Unsplash

Natural Language Processing (NLP) Weekly Newsletter

To Hell and Back…

DARPA, the Defense Advanced Research Projects Agency, a.k.a. the Agency that Builds 👽 Spacecraft (ABAS), really loves NLP. More specifically, they really like building multi-modal models for enhancing knowledge graphs. Apparently, they also have their own YouTube channel called DARPAtv. 🤷‍♂️

declassified

Halfway through the video above, the fellow dives into a word sense disambiguation problem regarding the word “tank” in the sentence “There is a tank outside my house.” 🤣🤣

And I thought I had big problems with semantics. Guess DARPA tops me.

So, how was your week?

This week we added 25 new datasets to the Big Bad NLP Database. We had several user contributors: Philip Vollet, Arthit Suriyawongkul, Talha Anwar, and Gabriel Altay. Thank you very much!

This Week:

The Missing Semester

StreamingLighting SpaCy

Questioning Meaning

Research From Scratch

COTA: Customer Obsession Ticket Assistant

Multi-Lingual Datasets Stand Among Giants

Investing in AI for Investment

Dataset of the Week: MultiLingual Question Answering (MLQA)

The Missing Semester

MIT has more secrets. Apparently, MIT has a hidden Konami cheat code for learning about computer science that few know about. While searching their website, I found this:

The Missing Semester of Your CS Education

Classes teach you all about advanced topics within CS, from operating systems to machine learning, but there's one…

missing.csail.mit.edu

StreamingLighting SpaCy

I thought spaCy couldn’t get any more visually stunning, but apparently it can. With the help of Streamlit, you can show off all the NLP goodies that spaCy has to offer in an interactive app. You can even recreate it with Prasanna’s code (GitHub), inspired by Ines Montani.

If you haven’t checked out Streamlit, here’s their site:

Streamlit – The fastest way to build custom ML tools

Streamlit is an open-source app framework for Machine Learning and Data Science teams. Create beautiful data apps in…

www.streamlit.io

Questioning Meaning

The Allen Institute for AI (AI2) released an awesome blog post reflecting on question understanding. While we usually focus on whether an AI model can answer a question, AI2 sends us off on a journey to understand its antecedent: can the model understand the question in the first place?

They do this via decomposition. Here’s an example:

“A system could potentially answer “Name the political parties of the most densely populated country”, by first returning “the most densely populated country” using a DB query, then “the political parties of #1” using a QA model for text.”
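To make the decomposition in the quote concrete, here is a minimal sketch of how two steps could be chained, with the "#1" reference filled in by the answer to the first step. Everything in it is an assumption for illustration: the country and party names are made up, and the two lookup functions are hypothetical stand-ins for the "DB query" and the "QA model for text" mentioned in the quote. This is not the Break dataset's format or AI2's code.

```python
import re

# Toy, made-up data (hypothetical; not taken from the Break dataset).
POPULATION_DENSITY = {"Country A": 120.0, "Country B": 950.5, "Country C": 310.2}
PARTIES_BY_COUNTRY = {
    "Country A": ["Party X"],
    "Country B": ["Party Y", "Party Z"],
    "Country C": ["Party W"],
}


def db_most_densely_populated(_unused=None):
    """Stand-in for a DB query: return the country with the highest density."""
    return max(POPULATION_DENSITY, key=POPULATION_DENSITY.get)


def qa_political_parties(country):
    """Stand-in for a text QA model: look up the parties of a country."""
    return PARTIES_BY_COUNTRY[country]


# Each step pairs its natural-language form with the operator that answers it.
# "#k" inside a step refers to the answer of step k (1-indexed).
STEPS = [
    ("the most densely populated country", db_most_densely_populated),
    ("the political parties of #1", qa_political_parties),
]

REFERENCE = re.compile(r"#(\d+)")


def execute(steps):
    """Run the steps in order, passing earlier answers into later steps."""
    answers = []
    for text, operator in steps:
        match = REFERENCE.search(text)
        argument = answers[int(match.group(1)) - 1] if match else None
        answers.append(operator(argument))
    return answers


answers = execute(STEPS)
print(answers[-1])
```

The point of the sketch is only the control flow: each sub-question is simple enough to hand to a specialized component, and the final answer is the output of the last step.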

Blog:

Break: Mapping Natural Language Questions to their Meaning Representation

Joint work by a team of NLP researchers at Tel Aviv University and Allen Institute for AI.

medium.com

Research From Scratch

Edward Raff asks a central question:

“How reproducible is the latest ML research, and can we begin to quantify what impacts its reproducibility?”

Finding 1: Having fewer equations per page makes a paper more reproducible.

Finding 2: Empirical papers may be more reproducible than theory-oriented papers.

Finding 3: Sharing code is not a panacea.

Finding 4: Having detailed pseudo code is just as reproducible as having no pseudo code.

Finding 5: Creating simplified example problems does not appear to help with reproducibility.

Finding 6: Please check your email (reply to emailed questions about your paper).

Quantifying Independently Reproducible Machine Learning

Peer review has been an integral part of scientific research for more than 300 years. But even before peer review was…

thegradient.pub

COTA: Customer Obsession Ticket Assistant

Welcome to Uber, homies! In their blog, Uber goes over how they built an in-house customer service ticket system to help when peeps are mad at their drivers (jk). But seriously, Uber shows how they use simple yet effective techniques, like TF-IDF and cosine similarity (word2vec be like 👀), to help scale their services! So you were wondering how private industry is using AI? Well, you can read about it here:

COTA: Improving Uber Customer Care with NLP & Machine Learning

Enter COTA, our Customer Obsession Ticket Assistant, a tool that uses machine learning and natural language processing…

eng.uber.com
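For readers who have not seen TF-IDF and cosine similarity side by side, here is a minimal, standard-library sketch of the two techniques named above. It is a generic illustration and not Uber's COTA pipeline: the ticket texts are invented, the whitespace tokenizer is deliberately naive, and the smoothed IDF formula is one common variant chosen as an assumption.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    # Smoothed inverse document frequency (an assumed, common variant).
    idf = {
        term: math.log((1 + n_docs) / (1 + count)) + 1
        for term, count in doc_freq.items()
    }

    vectors = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        vectors.append(
            {term: (count / len(tokens)) * idf[term]
             for term, count in term_counts.items()}
        )
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical support tickets: two known ones and one new one to route.
tickets = [
    "driver took a longer route and the fare was too high",
    "i was charged a cancellation fee by mistake",
    "my fare was too high because the driver took a wrong route",
]
vectors = tfidf_vectors(tickets)
new_ticket = vectors[2]
scores = [cosine_similarity(new_ticket, known) for known in vectors[:2]]
best_match = scores.index(max(scores))
print(best_match, scores)
```

In this toy setup the new ticket lands closest to the first known ticket, since they share most of their fare and route vocabulary. A real system would fit the IDF weights on historical tickets only and use a proper tokenizer.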

Multi-Lingual Datasets Stand Among Giants

For some reason, lots of datasets dropped this week. Facebook and Google got in on the action on the multilingual side of things. And yes, I plan to add them to the database this week.

(Surprise, Google’s dataset is already in the database 😁)

Investing in AI for Investment

The World Economic Forum and Cambridge U. investigated the usage of AI for financial services in a recently released report.

Noteworthy Highlights:

Top area for AI adoption: Risk Management

Top area for AI adoption among AI leaders: Customer Service

Top AI use-cases in the data analytics domain: Sales Analytics

LINK

Dataset of the Week: MultiLingual Question Answering (MLQA)

What is it?

Dataset for evaluating cross-lingual question answering performance in English, Arabic, German, Spanish, Hindi, Vietnamese and Simplified Chinese.

Where is it?

facebookresearch/MLQA

MLQA (MultiLingual Question Answering) is a benchmark dataset for evaluating cross-lingual question answering…

github.com

Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.

If you enjoyed this article, help us out and share it with friends or on social media!

For complete coverage, follow us on Twitter: @Quantum_Stat

Published via Towards AI

Note: Article content contains the views of the contributing authors and not Towards AI.
