NLP News Cypher | 07.26.20

Last Updated on July 24, 2023 by Editorial Team

Author(s): Ricky Costa

Originally published on Towards AI.

NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER


Primus

The Liber Primus remains unsolved to this day. It is a 58-page book written in runes, and its bewildering encryption continues to haunt hacker gunslingers around the globe, who communicate about and study its content only over IRC (Internet Relay Chat).

The cryptic book appeared on the internet in the mid-2010s, released by the now wildly popular but mysterious internet group 3301. While the group’s identity remains hidden, it is speculated that they are a remnant of the cypherpunk activist movement (born somewhere around Berkeley in the 80s). At least, this is the most plausible explanation offered by one of the few known hackers who have made it inside the clandestine group: Marcus Wanner. But who knows…

3301’s Cicada project started with a random 4chan post in 2012, leading a cult-like following of thrill seekers on a puzzle hunt that encompassed everything from steganography to cryptography. While most of the puzzles were eventually solved, the very last one, the Liber Primus, is still (mostly) encrypted. The last known communication from 3301 came in April 2017 via a Pastebin post. It reads:

Message from 3301/Cicada – Pastebin.com

-----BEGIN PGP SIGNED MESSAGE----- Beware false…

pastebin.com

FYI, 3301 signs all of its posts with the same PGP (Pretty Good Privacy) key. If you see a 3301 post online without a valid signature from that key, don’t trust it (there are plenty of troll accounts out there).

For a summary and timeline:

Uncovering Cicada Wiki


uncovering-cicada.fandom.com

Visit Nox’s YouTube channel if you want to understand how the earlier Cicada puzzles, those preceding the Liber Primus, were cracked.

Meanwhile back at the ranch…

I managed to write a training script for adapters (the modular add-ons discussed in last week’s newsletter); it works on the GLUE datasets. I’ll keep everyone updated as new developments unfold around AdapterHub. I’m very excited about this new framework; once again, thanks to Jonas for nudging me in the right direction.

Stay Frosty ✌✌

This Week

SimpleTOD

TurboTransformers

NLP & Audio Pretrained Models

NERtwork

AllenNLP Library Step-by-Step

Search Engining is Hard Bruh

Dataset of the Week: ODSQA

SimpleTOD

Task-oriented dialogue systems, like the chatbots we all dream of building one day, have traditionally used a modular pipeline (similar to what you find in the Rasa framework). Salesforce Research has recently released SimpleTOD, a unidirectional (causal) language model that tackles all of these sub-tasks end to end. It was built with Transformers on the MultiWOZ dataset.
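SimpleTOD’s core idea is to recast task-oriented dialogue as plain causal language modeling: each training example becomes a single sequence holding the dialogue context, belief state, system action, and response, so one left-to-right model predicts them all in order. The sketch below illustrates that flattening in plain Python. The delimiter tokens (`<|context|>`, `<|belief|>`, and so on), the `Turn` type, and the slot/action string formats are assumptions for illustration, not necessarily the exact format used in the salesforce/simpletod repo.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical delimiter tokens; the real SimpleTOD vocabulary may differ.
CTX, END_CTX = "<|context|>", "<|endofcontext|>"
BELIEF, END_BELIEF = "<|belief|>", "<|endofbelief|>"
ACTION, END_ACTION = "<|action|>", "<|endofaction|>"
RESP, END_RESP = "<|response|>", "<|endofresponse|>"


@dataclass
class Turn:
    speaker: str  # "user" or "system"
    text: str


def _context(history: List[Turn]) -> str:
    # Prefix each utterance with a speaker tag (also an assumed format).
    return " ".join(f"<|{t.speaker}|> {t.text}" for t in history)


def serialize_example(
    history: List[Turn],
    belief: List[Tuple[str, str, str]],  # (domain, slot, value) triples
    actions: List[str],
    response: str,
) -> str:
    """Flatten one dialogue turn into a single training sequence."""
    belief_str = ", ".join(f"{d} {s} {v}" for d, s, v in belief)
    return " ".join([
        CTX, _context(history), END_CTX,
        BELIEF, belief_str, END_BELIEF,
        ACTION, ", ".join(actions), END_ACTION,
        RESP, response, END_RESP,
    ])


def inference_prompt(history: List[Turn]) -> str:
    """At inference only the context is given; the model then generates
    the belief state, action, and response left to right."""
    return f"{CTX} {_context(history)} {END_CTX}"


if __name__ == "__main__":
    print(serialize_example(
        history=[Turn("user", "i need a cheap hotel in the north")],
        belief=[("hotel", "pricerange", "cheap"), ("hotel", "area", "north")],
        actions=["hotel inform choice", "hotel request stars"],
        response="there are two options . any star preference ?",
    ))
```

Because the inference prompt is exactly the prefix of every training sequence, a single decoding pass can produce the belief state, the action, and the response, which is what lets one model replace the separate pipeline stages.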

Blog:

SimpleTOD: A Simple Language Model for Task-Oriented Dialogue

We propose recasting task-oriented dialogue as a simple, causal (unidirectional) language modeling task. We show that…

blog.einstein.ai

Paper

LINK

GitHub:

salesforce/simpletod

Authors: Ehsan Hosseini-Asl, Bryan McCann, Chien-Sheng Wu, Semih Yavuz, and Richard Socher Task-oriented dialogue (TOD)…

github.com

TurboTransformers

TurboTransformers, a recent runtime library for transformer inference, came to my attention. It optimizes for what everyone wants in production: lower latency. The authors claim:

It brings 1.88x acceleration to the WeChat FAQ service, 2.11x acceleration to the public cloud sentiment analysis service, and 13.6x acceleration to the QQ recommendation system.

The selling point is that it supports variable-length input sequences without preprocessing, which reduces computational overhead. 🧐
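One common source of preprocessing overhead is padding every request in a batch to the longest sequence, since the padded positions still get computed. The toy calculation below compares the token slots processed with padding against the tokens actually needed. It is a back-of-the-envelope sketch under stated assumptions: it only counts tokens (ignoring, e.g., attention’s quadratic cost in sequence length), the request lengths are made up, and it does not model how TurboTransformers works internally.

```python
from typing import List


def padded_tokens(lengths: List[int]) -> int:
    """Token slots processed when every sequence is padded to the batch max."""
    return max(lengths) * len(lengths) if lengths else 0


def real_tokens(lengths: List[int]) -> int:
    """Tokens processed if each sequence keeps its true length."""
    return sum(lengths)


def padding_waste(lengths: List[int]) -> float:
    """Fraction of the padded batch spent on pad tokens."""
    total = padded_tokens(lengths)
    return 0.0 if total == 0 else 1.0 - real_tokens(lengths) / total


if __name__ == "__main__":
    # Hypothetical request lengths (in tokens) arriving in one batch.
    batch = [12, 40, 7, 128, 33]
    print(padded_tokens(batch))  # 640
    print(real_tokens(batch))    # 220
    print(f"{padding_waste(batch):.1%} of token slots are padding")
```

With one long request in the batch, roughly two thirds of the padded work goes to pad tokens, which is the kind of waste a runtime that handles variable lengths directly can avoid.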

GitHub:

Tencent/TurboTransformers

Make transformers serving fast by adding a turbo to your inference engine! Transformer is the most critical algorithm…

github.com

NLP & Audio Pretrained Models

A nice collection of pretrained models found on GitHub. These two repos cover NLP and speech modeling. Conveniently, the models are indexed by framework, and each includes a brief description.

NLP

balavenkatesh3322/NLP-pretrained-model

A pre-trained model is a model created by someone else to solve a similar problem. Instead of building a model from…

github.com

Speech/Audio

balavenkatesh3322/audio-pretrained-model

A pre-trained model is a model created by someone else to solve a similar problem. Instead of building a model from…

github.com

NERtwork

An awesome new set of shell/Python scripts that graphs a network of co-occurring named entities from plain text!

It combines Stanford NER for entity extraction, OpenRefine for data normalization (e.g., recognizing that “B. Obama” and “Barack” are the same entity), and NetworkX for graph creation.
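The graph-building step is simple enough to sketch. Assuming entities have already been extracted per document (NERtwork uses Stanford NER for that) and normalized with an alias table (the role OpenRefine plays), every pair of entities appearing in the same document gets an edge, weighted by how many documents they share. This is a minimal pure-Python illustration, not NERtwork’s actual code; the `ALIASES` table and the document-level co-occurrence window are assumptions (a real pipeline might use sentences or paragraphs instead).

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List

# Hypothetical normalization table standing in for OpenRefine clustering.
ALIASES: Dict[str, str] = {
    "B. Obama": "Barack Obama",
    "Barack": "Barack Obama",
    "Obama": "Barack Obama",
}


def normalize(entity: str) -> str:
    """Map an entity mention to its canonical name, if one is known."""
    name = entity.strip()
    return ALIASES.get(name, name)


def cooccurrence_edges(docs: List[List[str]]) -> Counter:
    """Count how many documents each unordered entity pair shares."""
    edges: Counter = Counter()
    for entities in docs:
        unique = sorted({normalize(e) for e in entities})  # dedupe per doc
        for a, b in combinations(unique, 2):  # sorted, so (a, b) is canonical
            edges[(a, b)] += 1
    return edges


if __name__ == "__main__":
    docs = [
        ["B. Obama", "Joe Biden", "Washington"],
        ["Barack", "Joe Biden"],
        ["Washington", "Joe Biden"],
    ]
    for (a, b), weight in cooccurrence_edges(docs).most_common():
        print(f"{a} -- {b}: {weight}")
```

The resulting weighted edge list can then be loaded into NetworkX (e.g., `graph.add_edge(a, b, weight=w)` on an `nx.Graph`) for layout and export. Without the normalization step, “B. Obama” and “Barack” would show up as two separate nodes with split edge weights.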

Blog: http://brandontlocke.com/2020/07/22/announcing-nertwork.html

GitHub (Profile photo of the week):

brandontlocke/NERtwork

NERtwork is a collection of scripts to help you create a network graph of co-occurring named entities using open source…

github.com

AllenNLP Library Step-by-Step

The best step-by-step guide to the AllenNLP library to date. It’s lengthy but worthwhile, with code provided along the way. The demo builds and trains an LSTM model for NER.

Blog:

Part 0 – Setup

This series is my AllenNLP tutorial that goes from installation through building a state-of-the-art (or nearly) named…

jbarrow.ai

Search Engining is Hard Bruh

A research scientist from AI2 discusses the challenges of building the Semantic Scholar search engine, which currently indexes 190M scientific papers. 👀

It uses a two-stage architecture: sparse retrieval via Elasticsearch, followed by a machine-learned ranker.
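The two-stage design can be illustrated with a toy pipeline: a cheap sparse scorer (a minimal Okapi BM25 here, standing in for Elasticsearch) pulls a candidate shortlist, and a second model reorders only those candidates. The reranker below is a hypothetical hand-weighted linear function of two features (the BM25 score and a log citation count), a stand-in for the learned ranker the post describes; the corpus, features, and weights are all made up.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Toy corpus: doc id -> (text, citation count). Entirely made up.
CORPUS: Dict[str, Tuple[str, int]] = {
    "p1": ("attention is all you need transformer", 50000),
    "p2": ("transformer models for search ranking", 40),
    "p3": ("convolutional networks for images", 900),
    "p4": ("neural ranking for academic search", 15),
}


def bm25_scores(query: str, k1: float = 1.5, b: float = 0.75) -> Dict[str, float]:
    """Stage 1: sparse lexical scoring with a minimal Okapi BM25."""
    docs = {d: text.split() for d, (text, _) in CORPUS.items()}
    n = len(docs)
    avgdl = sum(len(t) for t in docs.values()) / n
    df = Counter(term for toks in docs.values() for term in set(toks))
    scores: Dict[str, float] = {}
    for d, toks in docs.items():
        tf = Counter(toks)
        s = 0.0
        for q in query.split():
            if q not in tf:
                continue
            idf = math.log(1 + (n - df[q] + 0.5) / (df[q] + 0.5))
            norm = tf[q] + k1 * (1 - b + b * len(toks) / avgdl)
            s += idf * tf[q] * (k1 + 1) / norm
        if s > 0:  # only lexical matches become candidates
            scores[d] = s
    return scores


def rerank(candidates: Dict[str, float], w_text: float = 1.0,
           w_cite: float = 0.1) -> List[str]:
    """Stage 2: hypothetical linear reranker over candidate features."""
    def score(d: str) -> float:
        return w_text * candidates[d] + w_cite * math.log1p(CORPUS[d][1])
    return sorted(candidates, key=score, reverse=True)


def search(query: str, top_k: int = 3) -> List[str]:
    scores = bm25_scores(query)
    # Keep only the top_k lexical matches before the costlier reranker.
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return rerank(dict(top))


if __name__ == "__main__":
    print(search("transformer search"))
```

The design encodes a trade-off: the reranker can only reorder what stage 1 returns. With `top_k=2`, the highly cited `p1` never reaches the reranker at all, which is why candidate recall matters as much as ranker quality.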

The blog goes in depth into the challenges they faced while building the search engine, such as data complexity and evaluation problems. It offers far more detail than I can do justice to in this post, so give it a read if you are interested in search.

Building a Better Search Engine for Semantic Scholar

A “tell-all” account of improving our academic search engine.

medium.com

Dataset of the Week: ODSQA

What is it?

ODSQA is a Chinese dataset for extractive spoken question answering. It contains 3,654 question-answer pairs.

Paper: https://arxiv.org/pdf/1808.02280.pdf

Where is it?

chiahsuan156/ODSQA

This repository contains the dataset for the IEEE SLT 2018 paper: ODSQA: OPEN-DOMAIN SPOKEN QUESTION ANSWERING DATASET…

github.com

Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.

If you enjoyed this article, help us out and share with friends!

For complete coverage, follow our Twitter: @Quantum_Stat


Published via Towards AI
