

NLP News Cypher | 02.09.20


Last Updated on July 27, 2023 by Editorial Team

Author(s): Ricky Costa

Originally published on Towards AI.

Photo by Jeremy Bishop on Unsplash

Natural Language Processing (NLP) Weekly Newsletter


To Hell and Back…

DARPA, the Defense Advanced Research Projects Agency, a.k.a. the Agency that Builds 👽 Spacecraft (ABAS), really loves NLP. More specifically, they really like building multi-modal models for enhancing knowledge graphs. Apparently, they also have their own YouTube channel called DARPAtv. 🤷‍♂️


Halfway through the video, the fellow dives into a word sense disambiguation problem regarding the word “tank” in the sentence “There is a tank outside my house” 🤣🤣.

And I thought I had big problems with semantics. Guess DARPA tops me.

So, how was your week?

This week we added 25 new datasets to the Big Bad NLP Database. We had several user contributors: Philip Vollet, Arthit Suriyawongkul, Talha Anwar, and Gabriel Altay. Thank you very much!

This Week:

The Missing Semester

StreamingLighting SpaCy

Questioning Meaning

Research From Scratch

COTA: Customer Obsession Ticket Assistant

Multi-Lingual Datasets Stand Among Giants

Investing in AI for Investment

Dataset of the Week: MultiLingual Question Answering (MLQA)

The Missing Semester

MIT has more secrets. Apparently, MIT has a hidden Konami cheat code for learning about computer science that few know about. While searching their website, I found this:

The Missing Semester of Your CS Education

Classes teach you all about advanced topics within CS, from operating systems to machine learning, but there's one…



StreamingLighting SpaCy

I thought spaCy couldn’t get any more visually stunning. But apparently, it can. With the help of Streamlit, you can show off all the NLP goodies that spaCy has to offer in an interactive web app. You can even recreate it with Prasanna’s code (GitHub), inspired by Ines Montani.

If you haven’t checked out Streamlit, here’s their site:

Streamlit – The fastest way to build custom ML tools

Streamlit is an open-source app framework for Machine Learning and Data Science teams. Create beautiful data apps in…


Questioning Meaning

The Allen Institute for AI (AI2) released an awesome blog post with an interesting reflection on question understanding. While we usually focus on whether an AI model can answer a question, AI2 sends us off on a journey to understand its antecedent: can it understand a question?

They do this via decomposition. Here’s an example:

“A system could potentially answer “Name the political parties of the most densely populated country”, by first returning “the most densely populated country” using a DB query, then “the political parties of #1” using a QA model for text.”
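To make the quoted example concrete, here is a minimal Python sketch of running a decomposed question step by step, where `#1` refers to the answer of step 1. Everything in it is an assumption for illustration: `db_query`, `qa_model`, and the placeholder answers are hypothetical stand-ins, not AI2's code or real-world facts.

```python
import re

# Hypothetical stand-ins for a DB query and a text QA model.
# The answers are placeholders, not real-world facts.
TOY_DB = {"the most densely populated country": "Country X"}
TOY_QA = {"the political parties of Country X": ["Party A", "Party B"]}


def db_query(step):
    """Pretend database lookup (assumption for this sketch)."""
    return TOY_DB[step]


def qa_model(step):
    """Pretend QA model over text (assumption for this sketch)."""
    return TOY_QA[step]


def substitute(step, answers):
    """Replace each '#k' reference with the answer to step k (1-indexed)."""
    return re.sub(r"#(\d+)", lambda m: str(answers[int(m.group(1)) - 1]), step)


def run_decomposition(steps):
    """Run the steps in order, feeding earlier answers into later steps."""
    answers = []
    for text, backend in steps:
        resolved = substitute(text, answers)
        answers.append(backend(resolved))
    return answers[-1]


steps = [
    ("the most densely populated country", db_query),
    ("the political parties of #1", qa_model),
]
result = run_decomposition(steps)
print(result)
```

The point of the sketch is only the control flow: each sub-question is simple enough to route to a different backend, and the `#k` references stitch the partial answers together.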



Break: Mapping Natural Language Questions to their Meaning Representation

Joint work by a team of NLP researchers at Tel Aviv University and Allen Institute for AI.


Research From Scratch

Edward Raff asks a central question:

“How reproducible is the latest ML research, and can we begin to quantify what impacts its reproducibility?”

Finding 1: Having fewer equations per page makes a paper more reproducible.

Finding 2: Empirical papers may be more reproducible than theory-oriented papers.

Finding 3: Sharing code is not a panacea.

Finding 4: Having detailed pseudo code is just as reproducible as having no pseudo code.

Finding 5: Creating simplified example problems does not appear to help with reproducibility.

Finding 6: Please check your email (reply to emailed questions about your paper).

Quantifying Independently Reproducible Machine Learning

Peer review has been an integral part of scientific research for more than 300 years. But even before peer review was…


COTA: Customer Obsession Ticket Assistant

Welcome to Uber, homies! In their blog, Uber goes over how they built an in-house customer service ticket system to help when peeps are mad at their drivers (jk). But seriously, Uber shows how they use simple yet efficacious techniques, like TF-IDF and cosine similarity (word2vec be like 👀), to help scale their services! So you were wondering how private industry is using AI? Well, you can read about it here:

COTA: Improving Uber Customer Care with NLP & Machine Learning

Enter COTA, our Customer Obsession Ticket Assistant, a tool that uses machine learning and natural language processing…
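For a feel of how TF-IDF and cosine similarity can match a new ticket to similar past ones, here is a minimal, standard-library-only Python sketch. It is not Uber's implementation: the ticket texts, the whitespace tokenizer, and the smoothed IDF formula are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse {term: weight} TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF so that no term gets a zero or negative weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (cnt / len(toks)) * idf[t] for t, cnt in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical past tickets and a new incoming ticket.
tickets = [
    "driver took a longer route and charged extra",
    "charged twice for the same trip",
    "left my phone in the car",
]
new_ticket = "I was charged extra for a longer route"

vectors = tfidf_vectors(tickets + [new_ticket])
query = vectors[-1]
scores = [cosine(query, v) for v in vectors[:-1]]
best = max(range(len(tickets)), key=scores.__getitem__)
print(tickets[best])
```

In practice a library such as scikit-learn would handle tokenization and weighting, but the hand-rolled version shows how little machinery the technique needs.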


Multi-Lingual Datasets Stand Among Giants

For some reason, lots of datasets dropped this week. Facebook and Google got in on the action on the Multi-Lingual side of things. And yes, I plan to add them this week to the database.

(Surprise, Google’s dataset is already in the database 😁)

Investing in AI for Investment

The World Economic Forum and Cambridge U. investigated the usage of AI for financial services in a recently released report.

Noteworthy Highlights:

Top area for AI adoption: Risk Management

Top area for AI adoption among AI leaders: Customer Service

Top AI use-cases in the data analytics domain: Sales Analytics


Dataset of the Week: MultiLingual Question Answering (MLQA)

What is it?

Dataset for evaluating cross-lingual question answering performance in English, Arabic, German, Spanish, Hindi, Vietnamese and Simplified Chinese.


Where is it?


MLQA (MultiLingual Question Answering) is a benchmark dataset for evaluating cross-lingual question answering…


Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.

If you enjoyed this article, help us out and share with friends or social media!

For complete coverage, follow our twitter: @Quantum_Stat



Published via Towards AI
