NLP News Cypher | 07.26.20
Natural Language Processing   Newsletter

Last Updated on July 24, 2023 by Editorial Team

Author(s): Quantum Stat

Photo by Will Truettner on Unsplash

NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER

Primus

The Liber Primus is unsolved to this day. It is a book of 58 pages written in runes, and its bewildering encryption continues to haunt hacker gunslingers around the globe, who choose to communicate and study its content only via IRC (Internet Relay Chat).

The cryptic book arrived on the internet in the mid-2010s, courtesy of the now wildly popular but mysterious internet group 3301. While the group's identity remains hidden, it is speculated that they are a remnant of the cypherpunk activist movement (born somewhere out of Berkeley in the '80s). At least, this is the most plausible explanation given to us by one of the few known hackers who has made it inside the clandestine group: Marcus Wanner. But who knows...

3301's Cicada project started with a random 4chan post in 2012 that led many thrill seekers, who formed a cult-like following, on a puzzle hunt encompassing everything from steganography to cryptography. While most of their puzzles were eventually solved, the very last one, the Liber Primus, is still (mostly) encrypted. The last known comms from 3301 came in April 2017 via a Pastebin post. It reads:

Message from 3301/Cicada – Pastebin.com

FYI, there's a standard PGP (Pretty Good Privacy) key for all 3301 posts. If you see an online 3301 post without their PGP signature, don't trust it (there are plenty of troll accounts out there).

For a Summary/Timeline:

Uncovering Cicada Wiki

Visit Nox's YouTube channel if you are interested in understanding how they cracked the Cicada puzzles that preceded the Liber Primus.

Meanwhile, back at the ranch...

Luckily, I found my way to creating a training script for adapters (the modular add-ons discussed in last week's blog). The script works for the GLUE datasets. I will keep everyone updated as new events unfold regarding AdapterHub. I'm very excited about this new framework; once again, thanks to Jonas for nudging me in the right direction.

Stay Frosty ✌✌

This Week

SimpleTOD

TurboTransformers

NLP & Audio Pretrained Models

NERtwork

AllenNLP Library Step-by-Step

Search Engining is Hard Bruh

Dataset of the Week: ODSQA

SimpleTOD

Task-oriented dialogue systems, including those chatbots we all dream of one day building, have typically been built with a standard modular pipeline (similar to what you find in the Rasa framework). However, Salesforce Research recently released a unidirectional language model called SimpleTOD that attempts to solve all the sub-tasks end-to-end. It was built with Transformers and trained on the MultiWOZ dataset.
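
To make the "one model for every sub-task" idea concrete, here is a minimal sketch of how a single dialogue turn could be flattened into one string for a left-to-right language model. The delimiter tokens, the `serialize_turn` helper, and the sample turn are illustrative assumptions, not the project's actual preprocessing code; check the repo for the real format.

```python
def serialize_turn(history, belief, actions, response):
    """Flatten one dialogue turn into a single training string.

    The delimiter tokens are hypothetical placeholders chosen for this sketch.
    """
    context = " ".join(f"<|{speaker}|> {text}" for speaker, text in history)
    belief_str = ", ".join(f"{domain} {slot} {value}" for domain, slot, value in belief)
    action_str = ", ".join(actions)
    return (
        f"<|context|> {context} <|endofcontext|> "
        f"<|belief|> {belief_str} <|endofbelief|> "
        f"<|action|> {action_str} <|endofaction|> "
        f"<|response|> {response} <|endofresponse|>"
    )


# Made-up turn in the style of a hotel-booking dialogue.
example = serialize_turn(
    history=[("user", "i need a cheap hotel in the north")],
    belief=[("hotel", "pricerange", "cheap"), ("hotel", "area", "north")],
    actions=["hotel request stars"],
    response="sure , do you have a star rating in mind ?",
)
print(example)
```

With a layout like this, belief tracking, action selection, and response generation all reduce to next-token prediction: at inference time the model would be given only the context segment and asked to generate the rest.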

Blog:

SimpleTOD: A Simple Language Model for Task-Oriented Dialogue

Paper

GitHub:

salesforce/simpletod

TurboTransformers

A recent transformer inference runtime library, TurboTransformers, came to my attention. It optimizes for what everyone wants in production: lower latency. They claim:

It brings 1.88x acceleration to the WeChat FAQ service, 2.11x acceleration to the public cloud sentiment analysis service, and 13.6x acceleration to the QQ recommendation system.

The selling point is that it supports input sequences of varying lengths without preprocessing, which reduces computational overhead.
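
Some quick arithmetic shows why variable-length support matters. The sketch below does not use the TurboTransformers API; it only counts how many token slots are padding when a batch is padded to its longest sequence. The `padding_waste` helper and the sample lengths are made up for illustration.

```python
def padding_waste(lengths):
    """Return the fraction of token slots that are padding when every
    sequence in the batch is padded to the longest one."""
    if not lengths:
        return 0.0
    padded_slots = max(lengths) * len(lengths)
    real_tokens = sum(lengths)
    return (padded_slots - real_tokens) / padded_slots


# Hypothetical request lengths (in tokens) arriving at an inference service.
batch = [12, 48, 7, 128, 33]
print(f"{padding_waste(batch):.1%} of the padded batch is padding")
```

When request lengths vary a lot, most of a padded batch can end up being filler, which is the overhead a runtime that handles variable lengths directly aims to avoid.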

GitHub:

Tencent/TurboTransformers

NLP & Audio Pretrained Models

A nice collection of pretrained model libraries found on GitHub. These two repos cover NLP and speech modeling. Conveniently, the models are indexed by framework, and each includes a brief description.

NLP

balavenkatesh3322/NLP-pretrained-model

Speech/Audio

balavenkatesh3322/audio-pretrained-model

NERtwork

Awesome new shell/Python script that graphs a network of co-occurring entities from plain text!

It combines Stanford NER for the model, OpenRefine for data normalization (e.g., "B. Obama" and "Barack" are the same entity), and NetworkX for graph creation.
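
The pipeline described above can be sketched in a few lines of standard-library Python. This is not NERtwork's code: the entity lists stand in for NER output, the `ALIASES` dictionary stands in for the normalization step, and a plain `Counter` stands in for the graph. All names and sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Stand-in for the normalization step: map surface forms to one canonical name.
ALIASES = {"B. Obama": "Barack Obama", "Barack": "Barack Obama"}


def cooccurrence_edges(docs):
    """Count how often each pair of canonical entities shares a document."""
    edges = Counter()
    for entities in docs:
        # Deduplicate and sort so each pair has a single, stable key.
        canonical = sorted({ALIASES.get(e, e) for e in entities})
        for a, b in combinations(canonical, 2):
            edges[(a, b)] += 1
    return edges


# Stand-in for NER output: one list of entity mentions per document.
docs = [
    ["B. Obama", "Chicago"],
    ["Barack", "Chicago", "Harvard"],
]
edges = cooccurrence_edges(docs)
for (a, b), weight in edges.items():
    print(a, "--", b, weight)
```

Each `(pair, weight)` entry could then be loaded into a graph library as a weighted edge for layout and plotting.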

Blog: http://brandontlocke.com/2020/07/22/announcing-nertwork.html

GitHub (Profile photo of the week):

brandontlocke/NERtwork

AllenNLP Library Step-by-Step

The best step-by-step guide to AllenNLP's library to date. It is lengthy but worthwhile, with code pasted along the way. The demo covers building and training an LSTM NER model.

Blog:

Part 0 – Setup

Search Engining is Hard Bruh

A research scientist from AI2 discusses the hardships of building the Semantic Scholar search engine, which currently indexes 190M scientific papers.

It uses a two-model architecture: sparse search via Elasticsearch, followed by an ML ranking model.
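
A two-stage setup like this can be sketched as follows. The code does not call Elasticsearch or Semantic Scholar's ranker: `sparse_retrieve` is a toy keyword-overlap stand-in for the first stage, and `rerank` accepts any scoring function in place of the ML model. The corpus and the citation-count scorer are invented for illustration.

```python
def sparse_retrieve(query, corpus, k=3):
    """Stage 1: cheap keyword match; keep the k docs sharing the most terms."""
    terms = set(query.lower().split())
    scored = []
    for doc in corpus:
        overlap = len(terms & set(doc["title"].lower().split()))
        if overlap:
            scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def rerank(query, candidates, score_fn):
    """Stage 2: reorder the small candidate set with a more expensive scorer."""
    return sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)


# Hypothetical corpus; a real ranker would use learned features, not raw citations.
corpus = [
    {"title": "Neural ranking for search", "citations": 10},
    {"title": "Search engine evaluation", "citations": 250},
    {"title": "Protein folding methods", "citations": 900},
]
candidates = sparse_retrieve("search ranking", corpus)
results = rerank("search ranking", candidates, lambda q, d: d["citations"])
print([d["title"] for d in results])
```

The design choice is about cost: the first stage is cheap enough to run over the whole index, so the expensive model only ever sees a short candidate list.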

The blog goes in-depth into the challenges they faced while building the search engine, such as data complexity and evaluation problems. It offers a ton of detail, more than I can cover in this post and still do it justice, so give it a read if you are interested in search.

Building a Better Search Engine for Semantic Scholar

Dataset of the Week: ODSQA

What is it?

ODSQA is a Chinese dataset for extractive spoken question answering. It contains 3,654 question-answer pairs.
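
For readers new to the term, "extractive" means the answer is a span copied out of the source passage rather than generated freely. The sketch below shows the idea with a hypothetical SQuAD-style record; it is written in English for readability, is not an actual ODSQA entry, and the field names are assumptions rather than the dataset's schema.

```python
# Hypothetical SQuAD-style record; not taken from ODSQA.
record = {
    "context": "The Liber Primus is a book of 58 pages written in runes.",
    "question": "How many pages does the Liber Primus have?",
    "answer_text": "58",
    "answer_start": 30,  # character offset of the answer inside the context
}


def answer_span(rec):
    """Slice the answer out of the context using its start offset."""
    start = rec["answer_start"]
    return rec["context"][start:start + len(rec["answer_text"])]


# In an extractive dataset, the labeled answer must appear verbatim in the context.
print(answer_span(record) == record["answer_text"])
```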

Paper: https://arxiv.org/pdf/1808.02280.pdf

Where is it?

chiahsuan156/ODSQA

Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.

If you enjoyed this article, help us out and share with friends!

For complete coverage, follow our Twitter: @Quantum_Stat

www.quantumstat.com


NLP News Cypher | 07.26.20 was originally published in Towards AI - Multidisciplinary Science Journal on Medium.

Published via Towards AI
