The NLP Cypher | 05.02.21

Last Updated on July 24, 2023 by Editorial Team

Author(s): Quantum Stat

Beware the Beautiful Witch | O’Malley

NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER

The NLP Index

As an applied machine learning engineer (aka hacker, aka flying ninja), I’m constantly looking for better and faster ways to stay on top of the deep learning and software development circuit. After comparing various sources for research, code, and apps, I’ve discovered that a significant amount of awesome NLP code is not on arXiv and not all NLP research is on GitHub. To obtain a wider scope of current NLP research and code, I’ve created the NLP Index: a search-as-you-type search engine containing over 3,000 NLP repositories (updated weekly). Each entry contains the research paper, a ConnectedPapers link for a graph of related papers, and its GitHub repo.

The NLP Index

The intent of this platform is for researchers and hackers to obtain information quickly and comprehensively about all things NLP. And not just from research papers, but from awesome apps that are created on top of this research.

We’ve included the option of open search (as opposed to only serving pre-defined categories) because of inter-dependencies among subject areas. A paper/repo can be about both “knowledge graphs” and “datasets” at the same time, so it’s difficult to discretize topics. We prefer giving the user the option of openly searching the database across all domains/sectors simultaneously. We also included pre-defined queries with dozens of topics in NLP via the sidebar for convenience.

The index has several attributes such as: search as you type, typo tolerance, and synonym detection.

Synonym Detection

For example, if you search for “dataset”, the index also searches for “corpus” and “corpora” at the same time, so no relevant asset is missed.
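To make the idea concrete, here is a minimal sketch of query-side synonym expansion. It is not how the NLP Index is implemented; the synonym table, function names, and sample documents are all made up for illustration.

```python
# Hypothetical synonym groups; the real index's synonym table is not shown in this post.
SYNONYM_GROUPS = [
    {"dataset", "corpus", "corpora"},
]


def expand_query(term: str) -> set[str]:
    """Return the term plus every synonym that shares a group with it."""
    term = term.lower()
    expanded = {term}
    for group in SYNONYM_GROUPS:
        if term in group:
            expanded |= group
    return expanded


def search(term: str, documents: list[str]) -> list[str]:
    """Return documents containing the term or any of its synonyms."""
    terms = expand_query(term)
    return [doc for doc in documents if any(t in doc.lower() for t in terms)]


# Made-up sample entries.
docs = [
    "A new corpus for question answering",
    "Parallel corpora for low-resource translation",
    "A faster tokenizer for transformers",
]
print(search("dataset", docs))
```

Searching for “dataset” here returns the first two entries even though neither contains the word itself.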

Typo Tolerance

If you search for “gpt2”, the results will also include “gpt-2”.

Search as you type

Results update on every character you type, in real time, taking only a couple of milliseconds (thank you, memory mapping).
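One common way to get this behavior is to accept any indexed term within a small edit distance of the query. The sketch below assumes a Levenshtein distance with a one-typo budget; that choice, the vocabulary, and the function names are illustrative assumptions, not the index's actual matching rules.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_match(query: str, vocabulary: list[str], max_typos: int = 1) -> list[str]:
    """Return vocabulary terms within max_typos edits of the query."""
    query = query.lower()
    return [w for w in vocabulary if edit_distance(query, w.lower()) <= max_typos]


# Made-up vocabulary of indexed terms.
vocab = ["gpt-2", "gpt-3", "bert", "t5"]
print(fuzzy_match("gpt2", vocab))
```

With a budget of one edit, “gpt2” reaches “gpt-2” (one inserted hyphen) but not “gpt-3”, which needs two edits.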
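As a rough sketch of the search-as-you-type idea, the snippet below runs a prefix lookup over a sorted list on every keystroke. The titles and function name are hypothetical, and a binary search over an in-memory list is my stand-in here, not the memory-mapped engine behind the Index.

```python
from bisect import bisect_left

# Made-up titles; sorting lets all matches for a prefix sit next to each other.
TITLES = sorted(["bertopic", "bert", "bart", "gpt-2", "mdetr", "xlm-t"])


def prefix_search(prefix: str, titles: list[str], limit: int = 5) -> list[str]:
    """Return up to `limit` titles starting with `prefix` (titles must be sorted)."""
    prefix = prefix.lower()
    start = bisect_left(titles, prefix)  # first title >= prefix
    results = []
    for title in titles[start:]:
        if not title.startswith(prefix):
            break  # sorted order means no later title can match
        results.append(title)
        if len(results) == limit:
            break
    return results


# Simulate typing "berto" one character at a time.
typed = "berto"
for end in range(1, len(typed) + 1):
    print(typed[:end], "->", prefix_search(typed[:end], TITLES))
```

Each keystroke costs one binary search plus a short scan over the matches, which is why results can keep up with typing.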

I also want to mention that the Big Bad NLP Database has already been merged with the NLP Index! For the most up-to-date compendium of NLP datasets, you can go to the “data” section of the sidebar and click dataset, or openly search for a specific dataset/task. Eventually, I will sunset the BBND URL and redirect it to the Index.

I want to thank everyone for the support I’ve received over the past week after taking the NLP Index live. Thank you to Philip Vollet for sharing his dataset with hundreds of NLP repos. You can find his posts in the “Uncharted” section.

More features coming soon. Stay tuned.

BERT, Explain Yourself!

Discover why BERT makes an inference using SHAP (SHapley Additive exPlanations), a game-theoretic approach to explaining the output of any machine learning model. It leverages the Transformers pipeline.
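For intuition on the game-theoretic part, here is a brute-force Shapley value computation on a toy scorer. This is not the SHAP library or the linked notebook's code: the word-weight table and the “not” + “great” interaction are invented stand-ins for a real model, and exact enumeration like this only works for a handful of tokens.

```python
from itertools import combinations
from math import factorial


def shapley_values(tokens, score):
    """Exact Shapley value of each token, where score() takes a list of kept tokens."""
    n = len(tokens)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without = [tokens[j] for j in subset]
                with_i = [tokens[j] for j in sorted(subset + (i,))]
                total += weight * (score(with_i) - score(without))
        values.append(total)
    return values


# Hypothetical stand-in for a sentiment model.
WEIGHTS = {"great": 2.0, "boring": -1.5}


def toy_score(tokens):
    s = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    if "not" in tokens and "great" in tokens:
        s -= 3.0  # made-up interaction: negation flips the positive word
    return s


tokens = ["not", "great", "movie"]
for token, value in zip(tokens, shapley_values(tokens, toy_score)):
    print(f"{token}: {value:+.2f}")
```

The attributions always add up to the gap between the score of the full input and the score of the empty input, which is the property that makes Shapley values attractive for explanations.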

ml6team/quick-tips

Colab of the Week

Google Colaboratory

Explainable AI Cheat Sheet

Includes a graphic, a YouTube vid, and several links to papers/books discussing the topic of explainable AI.

Explainable AI Guide

StyleCLIP is Too Much Fun!

Awesome introduction from Max Woolf on using StyleCLIP (via Colab notebooks) to manipulate headshot pics via text prompts. You can even add your own pictures, and the quality is pretty good. For example, take a look at the generation after the text prompt: “Face after using the NLP index”

Easily Transform Portraits of People into AI Aberrations Using StyleCLIP | Max Woolf's Blog

Software Updates

AdapterHub

New version includes BART and GPT-2 models

Adapters for Generative and Seq2Seq Models in NLP

BERTopic

(Semi-)supervised topic modeling by leveraging supervised options in UMAP

  • model.fit(docs, y=target_classes)

Backends:

  • Added spaCy, Gensim, USE (TFHub)
  • Use a different backend for document embeddings and word embeddings
  • Create your own backends with bertopic.backend.BaseEmbedder
  • Click here for an overview of all new backends

Calculate and visualize topics per class

  • Calculate: topics_per_class = topic_model.topics_per_class(docs, topics, classes)
  • Visualize: topic_model.visualize_topics_per_class(topics_per_class)
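Conceptually, a per-class breakdown starts with counting how often each topic is assigned within each class. The sketch below shows only that counting step in plain Python. It is not BERTopic's implementation (which, as far as I know, also derives topic words per class), and the topic ids, class labels, and function name are made up.

```python
from collections import Counter, defaultdict


def topic_counts_per_class(topics, classes):
    """Count topic assignments within each class (one topic id and one label per doc)."""
    if len(topics) != len(classes):
        raise ValueError("topics and classes must have one entry per document")
    counts = defaultdict(Counter)
    for topic, label in zip(topics, classes):
        counts[label][topic] += 1
    return {label: dict(counter) for label, counter in counts.items()}


# Made-up assignments for six documents.
topics = [0, 0, 1, 2, 1, 0]
classes = ["sports", "sports", "politics", "politics", "politics", "sports"]
print(topic_counts_per_class(topics, classes))
```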

Release Major Release v0.7 · MaartenGr/BERTopic

Repo Cypher

A collection of recently released repos that caught our eye

Gradient-based Adversarial Attacks against Text Transformers

A general-purpose framework, GBDA (Gradient-based Distributional Attack), for gradient-based adversarial attacks, applied against transformer models on text data.

facebookresearch/text-adversarial-attack

Connected Papers

Easy and Efficient Transformer

PyTorch inference plugin for transformers with large model sizes and long sequences. Currently supports GPT-2 and BERT models.

NetEase-FuXi/EET

Connected Papers

MDETR: Modulated Detection for End-to-End Multi-Modal Understanding

Code and links to pre-trained models for MDETR (Modulated DETR) for pre-training on data having aligned text and images with box annotations, as well as fine-tuning on tasks requiring fine-grained understanding of image and text.

ashkamath/mdetr

Connected Papers

XLM-T: A Multilingual Language Model Toolkit for Twitter

Continues pre-training the XLM-RoBERTa-base model on a large corpus of tweets in multiple languages. Includes 4 Colab notebooks.

cardiffnlp/xlm-t

Connected Papers

FRANK: Factuality Evaluation Benchmark

A typology of factual errors for fine-grained analysis of factuality in summarization systems.

artidoro/frank

Connected Papers

Legal Document Similarity

A collection of state-of-the-art document representation methods for the task of retrieving semantically related US case law. Text-based (e.g., fastText, Transformers), citation-based (e.g., DeepWalk, Poincaré), and hybrid methods were explored.

malteos/legal-document-similarity

Connected Papers

Dataset of the Week: Shellcode_IA32

What is it?

Shellcode_IA32 is a dataset containing 20 years of shellcodes from a variety of sources and is the largest collection of shellcodes in assembly available to date. It consists of 3,200 examples of instructions in assembly language for IA-32 (the 32-bit version of the x86 Intel Architecture) from publicly available security exploits. The dataset is used for automatically generating shellcode (a code generation task). The assembly programs used to generate shellcode were collected from exploit-db and shell-storm.

paper

Where is it?

dessertlab/Shellcode_IA32

Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.

For complete coverage, follow our Twitter: @Quantum_Stat

Quantum Stat


The NLP Cypher | 05.02.21 was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Published via Towards AI
