The Limits of Deep Learning
Author(s): Frederik Bussler
Massive compute requirements are limiting further gains, calling for greater efficiency.
GPT-3, the latest state-of-the-art in Deep Learning, achieved incredible results in a range of language tasks without additional training. The main difference between this model and its predecessor was in terms of size.
GPT-3 was trained on hundreds of billions of words, nearly the whole Internet, yielding a wildly compute-heavy, 175-billion-parameter model.
OpenAI's authors note that we can't scale models forever:
"A more fundamental limitation of the general approach described in this paper – scaling up any LM-like model, whether autoregressive or bidirectional – is that it may eventually run into (or could already be running into) the limits of the pretraining objective."
This is the law of diminishing returns in action.
Diminishing Returns
If you train a deep learning model from scratch on a small dataset (without starting from a transfer-learning base such as a ResNet pretrained on ImageNet), you'll get worse performance. If you train on more data, you'll get better performance. GPT-3 showed that training on an enormous dataset, with a supercomputer, achieves state-of-the-art results.
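To make the contrast concrete, here is a minimal Keras sketch of the transfer-learning route: reusing an ImageNet-pretrained ResNet50 as a frozen feature extractor instead of training from scratch on small data. The dataset (x_train, y_train) and the 10-class output head are placeholder assumptions for illustration.

```python
# Minimal transfer-learning sketch: freeze a pretrained backbone,
# train only a small task-specific head on a small labeled dataset.
import tensorflow as tf

base = tf.keras.applications.ResNet50(
    weights="imagenet",       # transfer-learning base trained on ImageNet
    include_top=False,
    pooling="avg",
    input_shape=(224, 224, 3),
)
base.trainable = False        # reuse the pretrained features as-is

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(10, activation="softmax"),  # assumed 10-class task
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(x_train, y_train, epochs=5)  # small labeled dataset assumed
```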
Each successive GPT model improved on the last largely by scaling the training data.
However, it's uncertain that scaling it up again (say, 10X the data and 10X the compute) would bring anything more than modest gains in accuracy. The paper "The Computational Limits of Deep Learning" lays out these problems, arguing that Deep Learning is unsustainable as-is:
"Progress along current lines is rapidly becoming economically, technically, and environmentally unsustainable."
This example perfectly illustrates diminishing returns:
"Even in the more-optimistic model, it is estimated to take an additional 10⁵× more computing to get to an error rate of 5% for ImageNet."
François Chollet, the author of the wildly popular Keras library, notes that we've been approaching DL's limits:
"For most problems where deep learning has enabled transformationally better solutions (vision, speech), we've entered diminishing returns territory in 2016–2017."
(Source: "Deep Learning: Diminishing Returns?", SemiWiki)
In fact, while GPT-3 is wildly bigger than GPT-2, it still has serious shortcomings, as per the paper's authors:
"Despite the strong quantitative and qualitative improvements of GPT-3, particularly compared to its direct predecessor GPT-2, it still has notable weaknesses," including "little better than chance" performance on adversarial NLI.
Natural Language Inference has proven to be a major challenge for Deep Learning, so much so that training on an incredibly large corpus couldn't solve it.
"Black Box" AI: Poor Explainability
Another limitation of Deep Learning is its poor explainability. With enormous models like GPT-3 (recall that it has 175 billion parameters), explainability is near-impossible. We can only guess at why the model makes a certain decision, with no real clarity.
For instance, if GPT-3 tells us that it prefers Minecraft over Fortnite, we could intuit that this is because the word "Minecraft" shows up more in its training data.
This problem is separate from Deep Learning's poor efficiency, and the best solution, if you're looking for explainability, is simply to use more explainable models.
For instance, some AutoML tools like Apteo gain insights from your data by selecting among models such as decision trees and random forests, which offer greater explainability than a deep neural network.
Ultimately, you need to weigh the relative importance of explainability in your use-case.
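As a concrete illustration of that trade-off, here is a minimal scikit-learn sketch: a small decision tree whose feature importances can be read off directly, something a 175-billion-parameter network cannot offer. The Iris dataset stands in for "your data" purely for illustration.

```python
# Fit a shallow decision tree and inspect which features drive its decisions.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
feature_names = load_iris().feature_names

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Each importance shows how much a feature contributed to the tree's splits,
# giving a direct, human-readable explanation of the model's behavior.
for name, importance in zip(feature_names, tree.feature_importances_):
    print(f"{name}: {importance:.2f}")
```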
Achieving Greater Deep Learning Efficiency
The past few years have seen breakthrough after breakthrough in AI due to far greater compute and data, but we're exploiting those opportunities to their limits.
The conversation needs to shift towards algorithmic and hardware efficiency, which would also increase sustainability.
Quantum Computing
In the last decade, computational improvements for DL have come mostly from GPU and TPU implementations, along with FPGAs and other ASICs. Quantum computing is perhaps the best alternative, as "it offers a potential for sustained exponential increases in computing power."
Current cutting-edge quantum computers, like IBM's Raleigh, have a Quantum Volume of around 32, though Honeywell claims to have recently created a 64 Quantum Volume computer. IBM hopes to double its Quantum Volume every year.
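As a rough, purely illustrative projection, assuming that doubling goal actually holds and starting from a Quantum Volume of 32 in 2020:

```python
# Illustrative projection only: assumes "double Quantum Volume every year"
# holds, starting from Raleigh's QV of 32 in 2020. Not a prediction.
start_year, start_qv = 2020, 32
for offset in range(6):
    print(f"{start_year + offset}: projected Quantum Volume = {start_qv * 2 ** offset}")
```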
Reducing Computational Complexity
GPT-3 is an incredibly complex model, with 175 billion parameters. One can reduce computational complexity by compressing connections in a neural network, such as by "pruning" away weights, quantizing the network, or using low-rank compression.
Results from these methods so far leave much to be desired, but this is one potential area of exploration.
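As a rough sketch of what two of these techniques look like in practice, here is a minimal PyTorch example of magnitude pruning and dynamic quantization on a toy network. The architecture and the 30% pruning ratio are illustrative assumptions, not values from any of the papers discussed.

```python
# Toy compression sketch: prune small weights, then quantize Linear layers.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Prune 30% of the smallest-magnitude weights in each Linear layer.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the pruning permanent

# Dynamic quantization: store Linear weights as int8 instead of float32.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

print(quantized)
```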
High-Performance Small Deep Learning
Finally, one can use optimization to search for more efficient network architectures, as well as apply meta-learning and transfer learning. However, methods like meta-learning can come at a cost in accuracy.
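As a sketch of the idea, here is a hand-rolled random search over a tiny architecture space, keeping the configuration with the best validation accuracy and reporting its parameter count. The search space, budget, and MNIST dataset are assumptions chosen for illustration; real systems use far more sophisticated neural architecture search and meta-learning methods.

```python
# Toy random architecture search: sample small networks, keep the best one.
import random
import tensorflow as tf

(x_train, y_train), (x_val, y_val) = tf.keras.datasets.mnist.load_data()
x_train, x_val = x_train / 255.0, x_val / 255.0

def build(units, layers):
    model = tf.keras.Sequential([tf.keras.layers.Flatten(input_shape=(28, 28))])
    for _ in range(layers):
        model.add(tf.keras.layers.Dense(units, activation="relu"))
    model.add(tf.keras.layers.Dense(10, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

best = None
for _ in range(5):  # tiny search budget, for illustration only
    units, layers = random.choice([16, 32, 64]), random.choice([1, 2])
    model = build(units, layers)
    model.fit(x_train, y_train, epochs=1, batch_size=128, verbose=0)
    _, acc = model.evaluate(x_val, y_val, verbose=0)
    if best is None or acc > best[0]:
        best = (acc, units, layers, model.count_params())

print(f"best: {best[1]} units x {best[2]} layers, "
      f"{best[3]} parameters, val accuracy {best[0]:.3f}")
```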
Conclusion
Deep Learning has been a source of incredible AI breakthroughs in recent years, driven by ever-increasing data and compute, but Moore's Law can't go on forever. We're already witnessing diminishing returns from scaling up models. Potential solutions include greater algorithmic and hardware efficiency, particularly with regard to quantum computing.