
Mini NLP Cypher | Mini Year Review


Author(s): Ricky Costa

Originally published on Towards AI.

Das Eismeer | Friedrich

__init__.py


👋👋 2020 – The Year That Never Was

Good riddance to the woeful 12 months that made up the year 2020. We spent the entire time wearing masks and nervously watching the news for vaccine updates. And while the Earth stood still for a full calendar year, software (and hardware) marched forward and never stopped. Even as the year wound down, and all was quiet, maybe too quiet, we couldn’t help but witness Microsoft and Google go head-to-head once more in the never-ending SuperGLUE battle:

Microsoft added DeBERTa to supersede Google’s T5 at the top of the benchmark, only to be superseded 12 hours later by a new deployment of T5 + Meena (what?). 🤣

At Quantum Stat, we kept moving forward as well. We added 800+ datasets and 300+ notebooks to our inventories, in addition to thousands of inference code snippets for NLP models. 😵 Thank you to all contributors who made it possible!

OK, so what does NLP look like in 2021? A bifurcation of SUPER large models vs. smaller compressed models? Advancements in sparsity for pretrained models? Or models small enough to run natively on the edge finally getting closer to reality?

Maybe all of the above. Additionally, we’ll probably see graphs and deep learning finally get married; 2021 will be their honeymoon. Several libraries are already out there and have been maturing for several years, such as PyTorch Geometric, DGL, and DeepMind’s Graph Nets (a minimal PyTorch Geometric sketch appears after the chart). Here are their GitHub star growth trajectories over the years:

Chart via the Paperspace blog
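To make the graph-meets-deep-learning point concrete, here is a minimal sketch of a single graph-convolution pass with PyTorch Geometric. The toy graph, feature dimensions, and GCNConv layer are illustrative choices, not something prescribed in this review.

import torch
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv

# Toy graph: 3 nodes connected by 2 undirected edges (each edge listed in both directions).
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)
x = torch.randn(3, 8)  # 8-dimensional random node features
data = Data(x=x, edge_index=edge_index)

# One graph-convolution layer mapping 8-dim node features to 4-dim node embeddings.
conv = GCNConv(in_channels=8, out_channels=4)
out = conv(data.x, data.edge_index)
print(out.shape)  # torch.Size([3, 4])

DGL and Graph Nets expose similar message-passing building blocks, so the same toy example could be written in any of the three libraries.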

With regard to model architecture, we are also seeing a few alternatives offering memory savings, better handling of longer text sequences, and improved training objectives (a Longformer sketch follows the list). A few examples:

Longformer

Reformer

ELECTRA
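As a concrete illustration of the longer-sequence theme, here is a minimal sketch of running Longformer through the Hugging Face transformers library. The checkpoint name and the choice of global attention on only the first token are common defaults, assumed here for illustration.

import torch
from transformers import LongformerModel, LongformerTokenizer

# allenai/longformer-base-4096 handles sequences up to 4,096 tokens by replacing
# full self-attention with a sliding-window (local) attention pattern.
name = "allenai/longformer-base-4096"
tokenizer = LongformerTokenizer.from_pretrained(name)
model = LongformerModel.from_pretrained(name)

long_text = "A very long document. " * 400  # stand-in for a long document
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=4096)

# Mark the first token for global attention; every other position uses local attention,
# which is what keeps memory roughly linear in sequence length.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

with torch.no_grad():
    outputs = model(**inputs, global_attention_mask=global_attention_mask)

print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)

Reformer and ELECTRA attack the efficiency problem differently (LSH attention and a replaced-token-detection training objective, respectively), but the usage pattern from transformers looks much the same.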

Also, domain-specific adaptation of NLP models will continue to proliferate. And by domain, I’m referring to three dimensions: language, textual format (e.g., Twitter text vs. formal text), and sector (e.g., legal or healthcare).

A few examples (a loading sketch follows the list):

Language-Focused: BERTurk, CamemBERT, AlBERTo, MBERT

Text-Focused: BERTweet, CharBERT

Sector-Focused: BioBERT, FinBERT, Legal-BERT
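Since most of these domain-specific models live on the Hugging Face Hub, switching domains is usually just a matter of swapping the checkpoint string. A minimal sketch, with illustrative Hub IDs for a few of the models listed above:

from transformers import AutoModel, AutoTokenizer

# Illustrative Hub checkpoint IDs for some of the models mentioned above.
checkpoints = {
    "tweets": "vinai/bertweet-base",             # BERTweet
    "biomedical": "dmis-lab/biobert-v1.1",       # BioBERT
    "legal": "nlpaueb/legal-bert-base-uncased",  # Legal-BERT
}

name = checkpoints["biomedical"]
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

inputs = tokenizer("The patient was administered 5 mg of warfarin.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, num_tokens, hidden_size)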

Inference optimization was a big winner this past year, with several libraries being released. This focus area will help continue to bridge the performance gap between research and the enterprise, so expect more from it in the upcoming year. Here are a few libraries that help with optimizing transformers (a small export-to-ONNX sketch follows the list):

FastFormers

TurboTransformers

DeLight

ONNX Transformers
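As one concrete example of the inference-optimization workflow, here is a sketch of exporting a Hugging Face model to ONNX with torch.onnx.export and running it with ONNX Runtime. The checkpoint, file path, and opset version are illustrative assumptions, not a recipe from any of the libraries above.

import torch
from transformers import AutoModel, AutoTokenizer

name = "distilbert-base-uncased"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)
model.eval()

inputs = tokenizer("Optimize me for inference.", return_tensors="pt")

# Export the forward pass to an ONNX graph; dynamic axes let batch size and
# sequence length vary at inference time.
torch.onnx.export(
    model,
    (inputs["input_ids"], inputs["attention_mask"]),
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "seq"},
        "attention_mask": {0: "batch", 1: "seq"},
        "last_hidden_state": {0: "batch", 1: "seq"},
    },
    opset_version=14,
)

# Run the exported graph with ONNX Runtime, typically faster on CPU than eager PyTorch.
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")
ort_outputs = session.run(
    None,
    {
        "input_ids": inputs["input_ids"].numpy(),
        "attention_mask": inputs["attention_mask"].numpy(),
    },
)
print(ort_outputs[0].shape)  # (1, num_tokens, hidden_size)

From here, libraries such as FastFormers and TurboTransformers layer on further tricks like quantization, pruning, and fused kernels.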

BERT already seems far away now, with so many new model architectures and novel use cases making 2020, given the circumstances, a weird one.

But 2021 is shaping up to be a good year for all of us. So until then…

Happy New Year 🎇🎆🎇, and see you on the other side! ✌✌ 2021

P.S. The regular NLP Cypher arrives Sunday.


Published via Towards AI
