The Limits of Deep Learning

Last Updated on August 1, 2020 by Editorial Team

Author(s): Frederik Bussler


Massive compute requirements are limiting Deep Learning's progress, calling for greater efficiency.

Photo by Luca Ambrosi on Unsplash

GPT-3, the latest state of the art in Deep Learning, achieved incredible results on a range of language tasks without task-specific fine-tuning. The main difference between this model and its predecessor was its size.

GPT-3 was trained on hundreds of billions of words (nearly the whole Internet), yielding a wildly compute-heavy, 175-billion-parameter model.

OpenAI's authors note that we can't scale models forever:

"A more fundamental limitation of the general approach described in this paper — scaling up any LM-like model, whether autoregressive or bidirectional — is that it may eventually run into (or could already be running into) the limits of the pretraining objective."

This is the law of diminishing returns in action.

Diminishing Returns

By author.

If you train a deep learning model from scratch on a small dataset (rather than starting from ResNet, ImageNet weights, or some other transfer learning base), you'll get relatively poor performance. Train with more data and performance improves. GPT-3 showed that training on an enormous dataset, with a supercomputer, achieves state-of-the-art results.
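To make the diminishing-returns point concrete, here is a minimal sketch (not from the original article) that measures test accuracy as the training set grows. The dataset, model, and hyperparameters are illustrative assumptions, chosen only because they run quickly with scikit-learn.

```python
# A minimal sketch of how diminishing returns show up in practice:
# accuracy vs. training-set size flattens out as data grows.
# Assumes scikit-learn is installed; dataset and model are illustrative choices.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import learning_curve
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

# Small multilayer perceptron trained from scratch (no transfer learning).
model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)

train_sizes, _, test_scores = learning_curve(
    model, X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),  # 10% .. 100% of the training data
    cv=3,
)

for n, scores in zip(train_sizes, test_scores):
    print(f"{n:5d} training examples -> mean accuracy {scores.mean():.3f}")
# Typically, each additional chunk of data buys a smaller accuracy gain than the last.
```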

Each successive GPT model improved on the last largely by scaling the training data.

Meme created by author using imgflip.

However, it's uncertain that scaling up again (say, 10X the data and 10X the compute) would bring anything more than modest gains in accuracy. The paper "The Computational Limits of Deep Learning" lays out these problems, arguing that Deep Learning as it stands is unsustainable:

"Progress along current lines is rapidly becoming economically, technically, and environmentally unsustainable."

This example perfectly illustrates diminishing returns:

"Even in the more-optimistic model, it is estimated to take an additional 10⁵× more computing to get to an error rate of 5% for ImageNet."

François Chollet, the author of the wildly popular Keras library, notes that we've been approaching DL's limits:

"For most problems where deep learning has enabled transformationally better solutions (vision, speech), we've entered diminishing returns territory in 2016–2017."

Deep Learning: Diminishing Returns? – Semiwiki

In fact, while GPT-3 is wildly bigger than GPT-2, it still has serious shortcomings, as per the paper's authors:

"Despite the strong quantitative and qualitative improvements of GPT-3, particularly compared to its direct predecessor GPT-2, it still has notable weaknesses," including "little better than chance" performance on adversarial NLI.

Natural Language Inference has proven to be a major challenge for Deep Learning, so much so that training on an incredibly large corpus couldn't solve it.

"Black Box" AI: Poor Explainability

Another limitation of Deep Learning is its poor explainability. With enormous models like GPT-3 (recall that it has 175 billion parameters), explainability is near-impossible. We can only guess at why the model makes a certain decision, with no real clarity.

For instance, if GPT-3 tells us that it prefers Minecraft over Fortnite, we could intuit that this is because the word "Minecraft" shows up more in its training data.

This problem is separate from Deep Learning's poor efficiency, and the best solution, if you're looking for explainability, is simply to use more explainable models.

For instance, some AutoML tools like Apteo gain insights from your data by selecting among models such as decision trees and random forests, which offer greater explainability than a deep neural network.

Screenshot of Apteo. Captured by author.
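To illustrate what "more explainable" means in practice, here is a minimal sketch (independent of Apteo, whose internals are not shown in this article) using a small decision tree: its entire decision logic can be printed and read directly. The dataset and depth limit are illustrative assumptions.

```python
# A minimal sketch of why shallow models are easier to explain:
# a decision tree's learned rules can be printed and read as if/else logic.
# Dataset and hyperparameters here are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
feature_names = load_iris().feature_names

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# The full decision logic, readable as nested if/else rules.
print(export_text(tree, feature_names=feature_names))

# Global feature importances, something a 175-billion-parameter model
# cannot offer in any comparably direct way.
for name, importance in zip(feature_names, tree.feature_importances_):
    print(f"{name}: {importance:.2f}")
```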

Ultimately, you need to weigh the relative importance of explainability in your use-case.

Achieving Greater Deep Learning Efficiency

The past few years have seen breakthrough after breakthrough in AI due to far greater compute and data, but we're exploiting those opportunities to their limits.

The conversation needs to shift towards algorithmic and hardware efficiency, which would also increase sustainability.

Quantum Computing

In the last decade, computational improvements for DL have come mostly from GPU and TPU implementations, along with FPGAs and ASICs. Quantum computing is perhaps the best alternative, as "it offers a potential for sustained exponential increases in computing power."

By author.

Current cutting-edge quantum computers, like IBM's Raleigh, have a Quantum Volume of around 32, though Honeywell claims to have recently created a 64 Quantum Volume computer. IBM hopes to double its Quantum Volume every year.
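As a purely illustrative bit of arithmetic (a projection of the stated goal, not a measurement), doubling every year from the roughly 32 Quantum Volume mentioned above compounds quickly:

```python
# Illustrative arithmetic only: if Quantum Volume really doubled every year
# (IBM's stated goal), starting from the ~32 mentioned above, it would grow as:
quantum_volume = 32
for year in range(2020, 2026):
    print(year, quantum_volume)
    quantum_volume *= 2  # doubling once per year
# 2020: 32, 2021: 64, 2022: 128, ... exponential growth, if the trend holds.
```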

Reducing Computational Complexity

GPT-3 is an incredibly complex model, with 175 billion parameters. One can reduce computational complexity by compressing the connections in a neural network, such as by "pruning" away weights, quantizing the network, or using low-rank compression.
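As a rough illustration of two of these ideas, the sketch below applies magnitude pruning and dynamic quantization to a toy network. It assumes PyTorch is available, and the tiny model stands in for a real one purely for demonstration.

```python
# A minimal sketch (assuming PyTorch is installed) of two compression ideas
# mentioned above: magnitude pruning and dynamic quantization.
# The toy model below is an illustrative assumption, not GPT-3.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Pruning: zero out the 30% of weights with the smallest L1 magnitude.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the zeros into the weight tensor

sparsity = (model[0].weight == 0).float().mean().item()
print(f"Sparsity of first layer: {sparsity:.0%}")

# Quantization: store Linear weights as 8-bit integers instead of 32-bit floats.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(quantized)
```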

Results from these methods so far leave much to be desired, but this is one potential area of exploration.

High-Performance Small Deep Learning

Finally, one can use optimization to find more efficient network architectures, as well as meta-learning and transfer learning. However, methods like meta-learning can negatively impact accuracy.
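As a loose stand-in for architecture optimization (far simpler than real neural architecture search or meta-learning), the sketch below randomly samples a few small MLP architectures and keeps the smallest one whose accuracy stays within 1% of the best found. The dataset and search space are illustrative assumptions.

```python
# A minimal sketch of searching for a smaller, "good enough" architecture.
# Dataset, search space, and tolerance are illustrative assumptions.
import random
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

random.seed(0)
# Candidate architectures: 1 or 2 hidden layers of equal width.
candidates = [(random.choice([16, 32, 64, 128]),) * random.choice([1, 2]) for _ in range(6)]

results = []
for hidden in candidates:
    clf = MLPClassifier(hidden_layer_sizes=hidden, max_iter=500, random_state=0)
    clf.fit(X_train, y_train)
    acc = clf.score(X_test, y_test)
    size = sum(w.size for w in clf.coefs_)  # rough parameter count
    results.append((hidden, acc, size))

best_acc = max(acc for _, acc, _ in results)
# Smallest architecture within 1% of the best accuracy found.
hidden, acc, size = min((r for r in results if r[1] >= best_acc - 0.01), key=lambda r: r[2])
print(f"Chosen architecture {hidden}: accuracy {acc:.3f}, ~{size} weights")
```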

Conclusion

Deep Learning has been a source of incredible AI breakthroughs in recent years, driven by ever-increasing data and compute, but Moore's Law can't go on forever. We're already witnessing diminishing returns from scaling models up. Potential solutions include greater algorithmic and hardware efficiency, particularly with regard to quantum computing.


The Limits of Deep Learning was originally published in Towards AI — Multidisciplinary Science Journal on Medium, where people are continuing the conversation by highlighting and responding to this story.

Published via Towards AI
