NLP News Cypher | 02.09.20

Last Updated on July 24, 2023 by Editorial Team

Author(s): Ricky Costa

Originally published on Towards AI.

Natural Language Processing (NLP) Weekly Newsletter


To Hell and Back…

DARPA, the Defense Advanced Research Projects Agency, a.k.a. the Agency that Builds 👽 Spacecraft (ABAS), really loves NLP. More specifically, they really like building multi-modal models for enhancing knowledge graphs. Apparently, they also have their own YouTube channel called DARPAtv. 🤷‍♂️

declassified

Halfway through the video above, the fellow dives into a word sense disambiguation problem regarding the word “tank” in the sentence “There is a tank outside my house.” 🤣🤣

And I thought I had big problems with semantics. I guess DARPA tops me.

So, how was your week?

This week we added 25 new datasets to the Big Bad NLP Database. We had several user contributors: Philip Vollet, Arthit Suriyawongkul, Talha Anwar, and Gabriel Altay. Thank you very much!

This Week:

The Missing Semester

StreamingLighting SpaCy

Questioning Meaning

Research From Scratch

COTA: Customer Obsession Ticket Assistant

Multi-Lingual Datasets Stand Among Giants

Investing in AI for Investment

Dataset of the Week: MultiLingual Question Answering (MLQA)

The Missing Semester

MIT has more secrets. Apparently, MIT has a hidden Konami cheat code for learning about computer science that few know about. While searching their website, I found this:

The Missing Semester of Your CS Education

Classes teach you all about advanced topics within CS, from operating systems to machine learning, but there's one…

missing.csail.mit.edu

Video:

StreamingLighting SpaCy

I thought SpaCy couldn’t get any more visually stunning. But apparently, it can. With the help of Streamlit, you can get all the NLP goodies that SpaCy has to offer in one app. You can even recreate it with Prasanna’s code (GitHub), inspired by Ines Montani.

If you haven’t checked out Streamlit, here’s their site:

Streamlit – The fastest way to build custom ML tools

Streamlit is an open-source app framework for Machine Learning and Data Science teams. Create beautiful data apps in…

www.streamlit.io

Questioning Meaning

The Allen Institute for AI (AI2) released an awesome blog post reflecting on question understanding. While we usually focus on whether an AI model can answer a question, AI2 sends us off on a journey to understand its antecedent: can it understand a question?

They do this via decomposition. Here’s an example:

“A system could potentially answer “Name the political parties of the most densely populated country”, by first returning “the most densely populated country” using a DB query, then “the political parties of #1” using a QA model for text.”
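The decomposition in that quote can be sketched in a few lines of Python. This is only a toy illustration, not AI2's implementation: the country data, the `db_query` and `text_qa` stand-ins, and the `execute` helper are all hypothetical names invented here. The one idea taken from the quote is that a later step can point at an earlier answer with a reference like “#1”.

```python
import re

# Made-up data for illustration only; none of it comes from the blog post.
DENSITY = {"Aland": 120.0, "Borduria": 950.5, "Cascadia": 40.2}
PARTIES = {
    "Aland": ["Blue Party"],
    "Borduria": ["Moustache Party", "Reform Union"],
    "Cascadia": ["Green Bloc"],
}


def db_query(step):
    """Hypothetical stand-in for a structured database query."""
    if step == "the most densely populated country":
        return max(DENSITY, key=DENSITY.get)
    raise ValueError("db_query cannot handle: " + step)


def text_qa(step):
    """Hypothetical stand-in for a QA model that reads text."""
    prefix = "the political parties of "
    if step.startswith(prefix):
        return PARTIES[step[len(prefix):]]
    raise ValueError("text_qa cannot handle: " + step)


def execute(steps):
    """Run steps in order, replacing '#k' with the answer to step k (1-indexed)."""
    results = []
    for handler, template in steps:
        resolved = re.sub(
            r"#(\d+)",
            lambda m: str(results[int(m.group(1)) - 1]),
            template,
        )
        results.append(handler(resolved))
    return results


# The decomposed form of "Name the political parties of the most
# densely populated country".
STEPS = [
    (db_query, "the most densely populated country"),
    (text_qa, "the political parties of #1"),
]

if __name__ == "__main__":
    answers = execute(STEPS)
    print(answers[0])   # answer to step 1
    print(answers[-1])  # final answer
```

Each step is routed to whichever backend suits it, which is the point of the quote: the hard question never has to be answered in one shot.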

Blog:

Break: Mapping Natural Language Questions to their Meaning Representation

Joint work by a team of NLP researchers at Tel Aviv University and Allen Institute for AI.

medium.com

Research From Scratch

Edward Raff asks a central question:

“How reproducible is the latest ML research, and can we begin to quantify what impacts its reproducibility?”

Finding 1: Having fewer equations per page makes a paper more reproducible.

Finding 2: Empirical papers may be more reproducible than theory-oriented papers.

Finding 3: Sharing code is not a panacea.

Finding 4: Having detailed pseudo code is just as reproducible as having no pseudo code.

Finding 5: Creating simplified example problems does not appear to help with reproducibility.

Finding 6: Please, check your email (reply to email questions about paper)

Quantifying Independently Reproducible Machine Learning

Peer review has been an integral part of scientific research for more than 300 years. But even before peer review was…

thegradient.pub

COTA: Customer Obsession Ticket Assistant

Welcome to Uber, homies! In their blog, Uber goes over how they built an in-house customer service ticket system to help when peeps are mad at their drivers (jk). But seriously, Uber shows how they use simple yet efficacious techniques, like TF-IDF and cosine similarity (word2vec be like 👀), to help scale their services! So you were wondering how private industry is using AI? Well, you can read about it here:

COTA: Improving Uber Customer Care with NLP & Machine Learning

Enter COTA, our Customer Obsession Ticket Assistant, a tool that uses machine learning and natural language processing…

eng.uber.com
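For a feel of how TF-IDF and cosine similarity can route a ticket, here is a minimal pure-Python sketch. It is not Uber's pipeline: the ticket texts, the category labels, and every function name (`build_idf`, `tfidf`, `cosine`, `rank`) are assumptions made up for this example, and the smoothed IDF formula is just one common variant.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; a real system would do much more."""
    return text.lower().split()


def build_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(text, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in the corpus are dropped."""
    counts = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical example tickets, one per made-up category.
TICKETS = {
    "payment": "i was charged twice for my ride refund the extra charge",
    "lost_item": "i left my phone in the car help me contact the driver",
    "account": "i cannot log in to my account password reset fails",
}


def rank(query, tickets):
    """Return (similarity, label) pairs, most similar category first."""
    idf = build_idf(list(tickets.values()))
    q = tfidf(query, idf)
    scored = [(cosine(q, tfidf(text, idf)), label) for label, text in tickets.items()]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    for score, label in rank("the driver has my phone", TICKETS):
        print(label, round(score, 3))
```

Rare words like “driver” and “phone” get a higher IDF weight than words that show up in every ticket, so they dominate the similarity score and pull the query toward the right category.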

Multi-Lingual Datasets Stand Among Giants

For some reason, lots of datasets dropped this week. Facebook and Google got in on the action on the Multi-Lingual side of things. And yes, I plan to add them this week to the database.

(Surprise, Google’s dataset is already in the database 😁)

Investing in AI for Investment

The World Economic Forum and Cambridge U. investigated the usage of AI for financial services in a recently released report.

Noteworthy Highlights:

Top area for AI adoption: Risk Management

Top area for AI adoption among AI leaders: Customer Service

Top AI use-cases in the data analytics domain: Sales Analytics

LINK

Dataset of the Week: MultiLingual Question Answering (MLQA)

What is it?

Dataset for evaluating cross-lingual question answering performance in English, Arabic, German, Spanish, Hindi, Vietnamese and Simplified Chinese.

Sample:

Where is it?

facebookresearch/MLQA

MLQA (MultiLingual Question Answering) is a benchmark dataset for evaluating cross-lingual question answering…

github.com

Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.

If you enjoyed this article, help us out and share it with friends or on social media!

For complete coverage, follow us on Twitter: @Quantum_Stat


Published via Towards AI
