Understanding the Universal Approximation Theorem

Last Updated on November 10, 2020 by Editorial Team

Author(s): Shiva Sankeerth Reddy

Photo by Jeremy Thomas on Unsplash

Let us take some time to understand the importance of Neural Networks

Neural networks are one of the most beautiful programming paradigms ever invented. In the conventional approach to programming, we tell the computer what to do, breaking big problems up into many small, precisely defined tasks that the computer can easily perform. By contrast, in a neural network, we don't tell the computer how to solve our problem. Instead, it learns from observational data, figuring out its solution to the problem at hand.

Until recently we didn't know how to train neural networks to surpass more traditional approaches, except for a few specialized problems. What changed was the discovery of techniques for learning in so-called deep neural networks. These techniques are now known as deep learning. They've been developed further, and today deep neural networks and deep learning achieve outstanding performance on many important problems in computer vision, speech recognition, and natural language processing.

That being said, let's dive into the Universal Approximation Theorem.

Suppose someone has given you a wiggly function, say f(x), like the one below.

Image credits to Michael Nielsen

One of the striking features of using a neural network is that it can compute any function, no matter how complicated it is.

There is a guarantee that, for any function, there will be a neural network so that for every possible input x, the value f(x) (or some close approximation to it) is output from the network, e.g.:

Image credits to Michael Nielsen
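
To make this claim concrete, here's a minimal NumPy sketch (not the construction used in the rest of this article, just a quick empirical illustration with an arbitrary toy function): one hidden layer of steep sigmoid neurons placed at random locations, plus a linear output layer fitted by least squares, can already track a wiggly function fairly closely.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# An arbitrary "wiggly" target function on [0, 1] (illustrative choice).
f = lambda x: 0.2 + 0.4 * x**2 + 0.3 * x * np.sin(15 * x) + 0.05 * np.cos(50 * x)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)

# Hidden layer: 100 sigmoid neurons, each acting as a steep "step" at a random location.
centers = rng.uniform(0.0, 1.0, size=100)
hidden = sigmoid(60.0 * (x[:, None] - centers))        # shape (200, 100)

# Output layer: a bias plus a weighted sum of hidden activations,
# with the weights chosen by ordinary least squares.
design = np.column_stack([np.ones_like(x), hidden])
weights, *_ = np.linalg.lstsq(design, f(x), rcond=None)
approx = design @ weights

print("max |f(x) - network(x)| on the grid:", np.max(np.abs(f(x) - approx)))
```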

The above result holds even if the function has multiple inputs, f = f(x1, …, xm), and many outputs. For instance, here's a network computing a function with m = 3 inputs and n = 2 outputs:

Image credits to Michael Nielsen
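
Here's a minimal sketch of what such a network looks like in code; the hidden-layer size and the random weights below are arbitrary, chosen only to show the shapes involved.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
m, hidden_units, n = 3, 4, 2                 # 3 inputs, one hidden layer, 2 outputs

W1, b1 = rng.normal(size=(hidden_units, m)), rng.normal(size=hidden_units)
W2, b2 = rng.normal(size=(n, hidden_units)), rng.normal(size=n)

def network(x):
    """Map an input vector (x1, x2, x3) to two output activations."""
    h = sigmoid(W1 @ x + b1)                 # hidden-layer activations
    return sigmoid(W2 @ h + b2)              # output-layer activations

print(network(np.array([0.1, 0.5, 0.9])))    # two numbers, one per output neuron
```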

This tells us that neural networks have a kind of universality: whatever function we want to compute, there is already a neural network that can compute it.

The universality theorem is well known to people who use neural networks, but why it's true is not so widely understood.

Almost any process you can imagine can be thought of as function computation. Consider the problem of naming a piece of music based on a short sample of the piece. That can be thought of as computing a function. Or consider the problem of translating a Chinese text into English. Again, that can be thought of as computing a function.

Of course, just because we know a neural network exists that can (say) translate Chinese text into English, that doesn't mean we have good techniques for constructing or even recognizing such a network. This limitation applies also to traditional universality theorems for models such as Boolean circuits.

Universality with one input and one output

Let's start by understanding how to construct a neural network that approximates a function with just one input and one output:

Image credits to Michael Nielsen

This is the core of the problem of universality: once we've understood this special case, it's pretty easy to extend to functions with many inputs and many outputs.

To build insight into how to construct a network to compute f, let's start with a network containing just a single hidden layer, with two hidden neurons, and an output layer containing a single output neuron:

Image credits to Michael Nielsen

To get a feel for how components in the network work, let's focus on the top hidden neuron. Let's start with weight w = 0 and bias b = 0.

weight = 0 and bias = 0
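
In code, the top hidden neuron simply computes sigmoid(w*x + b). A quick sketch shows that with w = 0 and b = 0 the output is flat at 0.5, no matter what x is:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.linspace(-2.0, 2.0, 5)
print(sigmoid(0.0 * x + 0.0))   # [0.5 0.5 0.5 0.5 0.5] -- no dependence on x
```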

Let's increase the weight w while keeping the bias constant. The images below show the neuron's output for different values of the weight w.

weight = 1 and bias = 0
weight = 4 and bias = 0
weight = -4 and bias = 0
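
You can reproduce the effect numerically: larger |w| makes the transition around x = 0 steeper, and a negative w mirrors the curve. A small sketch (the sample points are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.linspace(-2.0, 2.0, 401)
for w in (1.0, 4.0, -4.0):                    # bias held at 0
    y = sigmoid(w * x)
    print(f"w = {w:+.0f}: y(-2) = {y[0]:.3f}, y(0) = {y[200]:.3f}, y(+2) = {y[-1]:.3f}")
```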

Now let's change the bias while keeping the weight constant.

weight = 3 and bias = 0
weight = 3 and bias = -1
weight = 3 and bias = 1
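
Numerically, the bias doesn't change the shape of the curve; it only slides it along the x-axis, with the midpoint (where the output crosses 0.5) sitting at x = -b/w. A small sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.linspace(-2.0, 2.0, 401)
w = 3.0
for b in (0.0, -1.0, 1.0):
    y = sigmoid(w * x + b)
    midpoint = x[np.argmin(np.abs(y - 0.5))]  # where the output crosses 0.5
    print(f"b = {b:+.0f}: crosses 0.5 near x = {midpoint:.2f} (expected -b/w = {-b/w:.2f})")
```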

This visual representation of the weight and bias in a simple neural network should help build intuition for the Universal Approximation Theorem.

Many Input Variables

Let's extend this approach to the case of many input variables. This may sound complicated, but all the ideas we need can be understood in the case of just two inputs. So let's address the two-input case.

Here, we have inputs x and y, with corresponding weights w1 and w2, and a bias b on the neuron. Let's set the weight w2 to 0, and then play around with the first weight, w1, and the bias, b, to see how they affect the output from the neuron:

w1 = 0, w2 = 0, and b = 0
w1 = 3, w2 = 0, and b = 0
w1 = -7, w2 = 0, and b = 0
w1 = 1, w2 = 0, and b = 1

As you can see, with w2 = 0 the input y makes no difference to the output from the neuron. It's as though x is the only input.
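
A quick numerical check (the particular numbers are arbitrary) that y drops out when w2 = 0:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, y, w1, w2, b):
    return sigmoid(w1 * x + w2 * y + b)

# With w2 = 0, the output is the same no matter what y is.
print(neuron(0.3, 0.0, w1=3.0, w2=0.0, b=0.0))
print(neuron(0.3, 0.9, w1=3.0, w2=0.0, b=0.0))   # identical to the line above
```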

Let's change the bias as well and play with the graph to understand what's happening.

As the input weight gets larger, the output approaches a step function. The difference is that now the step function is in three dimensions. As before, we can move the location of the step point around by modifying the bias. The actual location of the step point is s_x ≡ -b/w1.
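
Here's a small sketch of that: fix the step location s and set b = -w1*s (so that -b/w1 = s); as w1 grows, the output on either side of s snaps towards 0 and 1. The particular numbers are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

s = 0.4                                   # desired step location, s = -b/w1
for w1 in (5.0, 50.0, 500.0):
    b = -w1 * s
    left = sigmoid(w1 * (s - 0.02) + b)   # just left of the step
    right = sigmoid(w1 * (s + 0.02) + b)  # just right of the step
    print(f"w1 = {w1:5.0f}: output {left:.3f} -> {right:.3f} across x = {s}")
```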

We can use the step functions we've just constructed to compute a three-dimensional bump function. To do this, we use two neurons, each computing a step function in the x-direction. Then we combine those step functions with weights h and -h, respectively, where h is the desired height of the bump. It's all illustrated in the following diagram:
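
Here's a minimal sketch of that construction (the steepness, interval, and height are arbitrary choices): two near-step neurons at s1 and s2, combined with output weights +h and -h, give a weighted output that is roughly h between s1 and s2 and roughly 0 everywhere else.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def step(x, s, steepness=1000.0):
    # A near-step function: ~0 for x < s and ~1 for x > s.
    return sigmoid(steepness * (x - s))

def bump_x(x, s1, s2, h):
    # Two step neurons weighted +h and -h: ~h on [s1, s2], ~0 outside.
    return h * step(x, s1) - h * step(x, s2)

x = np.linspace(0.0, 1.0, 11)
print(np.round(bump_x(x, s1=0.25, s2=0.65, h=0.8), 3))
```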

We've figured out how to make a bump function in the x-direction. Of course, we can easily make a bump function in the y-direction, by using two step functions in the y-direction. Recall that we do this by making the weight large on the y input, and the weight 0 on the x input. Here's the result:

This looks nearly identical to the earlier network!

Let's consider what happens when we add up two bump functions, one in the x-direction, the other in the y-direction, both of height h:
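
Numerically, the sum looks like a tower of height roughly 2h where the two bumps overlap, ridges of height roughly h where only one bump is active, and roughly 0 everywhere else. A small sketch on a coarse grid (values arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def step(t, s, steepness=1000.0):
    return sigmoid(steepness * (t - s))       # near-step at t = s

def bump(t, s1, s2, h):
    return h * step(t, s1) - h * step(t, s2)  # ~h on [s1, s2], ~0 outside

h = 0.6
xx, yy = np.meshgrid(np.linspace(0.0, 1.0, 5), np.linspace(0.0, 1.0, 5))
tower = bump(xx, 0.3, 0.7, h) + bump(yy, 0.3, 0.7, h)
print(np.round(tower, 2))   # ~1.2 in the center, ~0.6 along the center row and column, ~0 elsewhere
```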

We can continue like this to approximate any kind of function. But we will end the discussion here; I will point you to some links at the bottom that let you play with these neural networks.
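
To carry the idea one step further in code, here's a rough sketch (with an arbitrary target function, chosen only for illustration): tile [0, 1] into small subintervals and place one bump per subinterval whose height is the target's value at the subinterval's midpoint. The more subintervals you use, the closer the resulting output hugs the target.

```python
import numpy as np

def sigmoid(z):
    z = np.clip(z, -60.0, 60.0)               # keep exp() well-behaved for very steep steps
    return 1.0 / (1.0 + np.exp(-z))

def step(x, s, steepness=1e5):
    return sigmoid(steepness * (x - s))       # near-step at x = s

f = lambda x: 0.3 + 0.3 * np.sin(6 * x)       # arbitrary target to approximate

N = 50                                        # number of bumps
edges = np.linspace(0.0, 1.0, N + 1)
x = np.linspace(0.001, 0.999, 500)            # evaluation grid, away from the outer edges

approx = sum(f((a + b) / 2) * (step(x, a) - step(x, b))
             for a, b in zip(edges[:-1], edges[1:]))

print("max |f(x) - approximation(x)|:", np.max(np.abs(f(x) - approx)))
```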

Conclusion

The explanation for universality we've discussed is certainly not a practical prescription for how to compute using neural networks!

Although the result isn't directly useful in constructing networks, it's important because it takes off the table the question of whether any particular function is computable using a neural network.

The answer to that question is always "yes". So the right question to ask is not whether any particular function is computable, but rather what's a good way to compute the function.

All parts of this article are adapted from the book "Neural Networks and Deep Learning" by Michael Nielsen.

References:

  1. A visual proof that neural nets can compute any function by Michael Nielsen.
  2. This article was written as part of an assignment for Jovian.ml's course "ZeroToGANs", offered in collaboration with freeCodeCamp.
  3. TensorFlow Playground for intuitively understanding neural networks.
  4. Machine Learning Playground


