NLP News Cypher | 02.09.20
Last Updated on July 24, 2023 by Editorial Team
Author(s): Ricky Costa
Originally published on Towards AI.
Natural Language Processing (NLP) Weekly Newsletter
To Hell and Back…
DARPA, the Defense Advanced Research Projects Agency, a.k.a. the Agency that Builds 👽 Spacecraft (ABAS), really loves NLP. More specifically, they really like building multi-modal models for enhancing knowledge graphs. Apparently, they also have their own YouTube channel called DARPAtv. 🤷‍♂️
Halfway through the video above, the fellow dives into a word sense disambiguation problem regarding the word “tank” in the sentence “There is a tank outside my house” 🤣🤣.
And I thought I had big problems with semantics, guess DARPA tops me.
So, how was your week?
This week we added 25 new datasets to the Big Bad NLP Database. We had several user contributors: Philip Vollet, Arthit Suriyawongkul, Talha Anwar, and Gabriel Altay. Thank you very much!
This Week:
The Missing Semester
StreamingLighting SpaCy
Questioning Meaning
Research From Scratch
COTA: Customer Obsession Ticket Assistant
Multi-Lingual Datasets Stand Among Giants
Investing in AI for Investment
Dataset of the Week: MultiLingual Question Answering (MLQA)
The Missing Semester
MIT has more secrets. Apparently, MIT has a hidden Konami cheat code for learning about computer science that few know about. While searching their website, I found this:
The Missing Semester of Your CS Education
Classes teach you all about advanced topics within CS, from operating systems to machine learning, but there's one…
missing.csail.mit.edu
StreamingLighting SpaCy
I thought SpaCy couldn't get any more visually stunning. But apparently, it can. With the help of Streamlit, you can achieve all the NLP goodies that SpaCy has to offer. You can even recreate it with Prasanna's code (GitHub) inspired by Ines Montani.
If you haven't checked out Streamlit, here's their site:
Streamlit – The fastest way to build custom ML tools
Streamlit is an open-source app framework for Machine Learning and Data Science teams. Create beautiful data apps in…
www.streamlit.io
Questioning Meaning
The Allen Institute for AI (AI2) released an awesome blog post reflecting on question understanding. While we usually focus on whether an AI model can answer a question, AI2 sends us off on a journey to examine what comes first: can the model understand the question?
They do this via decomposition. Here's an example:
“A system could potentially answer ‘Name the political parties of the most densely populated country’, by first returning ‘the most densely populated country’ using a DB query, then ‘the political parties of #1’ using a QA model for text.”
Blog:
Break: Mapping Natural Language Questions to their Meaning Representation
Joint work by a team of NLP researchers at Tel Aviv University and Allen Institute for AI.
medium.com
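To make the decomposition idea from the quote a bit more concrete, here is a minimal Python sketch of running a question as a chain of steps, where a later step refers to an earlier answer with `#1`. Everything in it is an assumption for illustration: `db_query`, `qa_model`, `run_decomposition`, and the placeholder answers ("Country X", "Party A, Party B") are hypothetical stand-ins, not AI2's code or data.

```python
import re
from typing import Callable, Dict, List, Tuple

# Toy stand-ins for a real database and a real QA model (placeholder values only).
TOY_DB: Dict[str, str] = {"the most densely populated country": "Country X"}
TOY_QA: Dict[str, str] = {"the political parties of Country X": "Party A, Party B"}


def db_query(step: str) -> str:
    """Hypothetical DB lookup: returns the stored answer for a step."""
    return TOY_DB[step]


def qa_model(step: str) -> str:
    """Hypothetical text QA model: returns the stored answer for a step."""
    return TOY_QA[step]


def run_decomposition(steps: List[Tuple[str, Callable[[str], str]]]) -> List[str]:
    """Run steps in order, replacing each '#k' with the answer to step k."""
    answers: List[str] = []
    for template, executor in steps:
        # Substitute references such as '#1' with the earlier step's answer.
        resolved = re.sub(
            r"#(\d+)", lambda m: answers[int(m.group(1)) - 1], template
        )
        answers.append(executor(resolved))
    return answers


# The two steps from the quoted example, each paired with its executor.
steps = [
    ("the most densely populated country", db_query),
    ("the political parties of #1", qa_model),
]
answers = run_decomposition(steps)
print(answers[-1])
```

The point of the sketch is only that each sub-question can be routed to a different tool, and that the final answer depends on resolving the reference to step 1 first.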
Research From Scratch
Edward Raff asks a central question:
“How reproducible is the latest ML research, and can we begin to quantify what impacts its reproducibility?”
Finding 1: Having fewer equations per page makes a paper more reproducible.
Finding 2: Empirical papers may be more reproducible than theory-oriented papers.
Finding 3: Sharing code is not a panacea.
Finding 4: Having detailed pseudo code is just as reproducible as having no pseudo code.
Finding 5: Creating simplified example problems does not appear to help with reproducibility.
Finding 6: Please, check your email (reply to email questions about your paper).
Quantifying Independently Reproducible Machine Learning
Peer review has been an integral part of scientific research for more than 300 years. But even before peer review was…
thegradient.pub
COTA: Customer Obsession Ticket Assistant
Welcome to Uber, homies! In their blog, Uber goes over how they built an in-house customer service ticket system to help when peeps are mad at their drivers (jk). But seriously, Uber shows how they use simple yet efficacious techniques, like TF-IDF and cosine similarity (word2vec be like 👀), to help scale their services! So you were wondering how private industry is using AI? Well, you can read about it here:
COTA: Improving Uber Customer Care with NLP & Machine Learning
Enter COTA, our Customer Obsession Ticket Assistant, a tool that uses machine learning and natural language processing…
eng.uber.com
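For a feel of how TF-IDF and cosine similarity work together on ticket-like text, here is a small standard-library Python sketch. It is not Uber's COTA pipeline: the ticket texts are made up, the function names (`tokenize`, `tfidf_vectors`, `cosine_similarity`) are hypothetical, and the smoothed IDF formula is just one common variant chosen for the example.

```python
import math
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (deliberately naive)."""
    return text.lower().split()


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up support tickets, for illustration only.
tickets = [
    "driver was late to the pickup",
    "my driver arrived late for pickup",
    "i was charged twice for one trip",
]
vecs = tfidf_vectors(tickets)
sim_related = cosine_similarity(vecs[0], vecs[1])
sim_unrelated = cosine_similarity(vecs[0], vecs[2])
print(sim_related > sim_unrelated)
```

The two late-driver tickets share more weighted terms than the late-driver and billing tickets, so their similarity comes out higher. That ranking signal is the kind of thing a ticket-routing system could build on.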
Multi-Lingual Datasets Stand Among Giants
For some reason, lots of datasets dropped this week. Facebook and Google got in on the action on the Multi-Lingual side of things. And yes, I plan to add them this week to the database.
(Surprise, Google's dataset is already in the database 😁)
Investing in AI for Investment
The World Economic Forum and Cambridge U. investigated the usage of AI for financial services in a recently released report.
Noteworthy Highlights:
Top area for AI adoption: Risk Management
Top area for AI adoption among AI leaders: Customer Service
Top AI use-cases in the data analytics domain: Sales Analytics
Dataset of the Week: MultiLingual Question Answering (MLQA)
What is it?
Dataset for evaluating cross-lingual question answering performance in English, Arabic, German, Spanish, Hindi, Vietnamese and Simplified Chinese.
Where is it?
facebookresearch/MLQA
MLQA (MultiLingual Question Answering) is a benchmark dataset for evaluating cross-lingual question answering…
github.com
Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.
If you enjoyed this article, help us out and share with friends or social media!
For complete coverage, follow us on Twitter: @Quantum_Stat
Published via Towards AI