Mastering Deep Learning: The Art of Approximating Non-Linearities with Piecewise Estimations Part-3

Last Updated on November 9, 2023 by Editorial Team

Author(s): Raagulbharatwaj K

Originally published on Towards AI.

Greetings, everyone! Welcome to the third installment of my Mastering Deep Learning series. This article concludes the topic begun in the first two parts, The Art of Approximating Non-Linearities with Piecewise Estimations, in which we delved deeply into the inner workings of deep neural networks. In this final segment, we will wrap up our exploration by understanding how neural networks handle multidimensional inputs and outputs, which is the more realistic scenario. Before we dive into the content, I received a suggestion to attempt a better, more comprehensible visualization of how the higher-dimensional input space is folded. Let’s quickly explore this idea and then proceed!

A simple neural network used to visualize latent representations

Let’s consider a simple neural network, as depicted above. Each neuron hₖ’ in the second hidden layer estimates a hyperplane of the following form:

hₖ’ = θ’ₖ₁h₁ + θ’ₖ₂h₂ + β₁

The equation above represents a two-dimensional hyperplane in the h₁ and h₂ space. Similarly, we can obtain the output y as follows:

y = ∑ᵢ₌₁¹⁰ Ωᵢhᵢ’ + β₂

As we’ve explored in earlier blog posts, we can unfold the equation above to understand the mapping from x to y. However, our current focus is not on the specific mapping; instead, we aim to visualize how each of the ten neurons contributes to the folding process.
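To make these two equations concrete, here is a minimal NumPy sketch of the corresponding forward pass. The randomly drawn parameter values are placeholders for illustration only, not the trained network’s weights.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Illustrative (randomly chosen) parameters for the network in the figure:
# scalar input x -> 2 units (h1, h2) -> 10 units (h'_k) -> scalar output y.
rng = np.random.default_rng(0)
theta0, beta0 = rng.normal(size=(2, 1)), rng.normal(size=2)    # x -> h
theta1, beta1 = rng.normal(size=(10, 2)), rng.normal(size=10)  # h -> h'
omega, beta2 = rng.normal(size=10), rng.normal()               # h' -> y

x = np.array([0.7])
h = relu(theta0 @ x + beta0)        # h1, h2
planes = theta1 @ h + beta1         # each row: theta'_k1*h1 + theta'_k2*h2 + bias, a plane over (h1, h2)
h_prime = relu(planes)              # ReLU folds each plane
y = omega @ h_prime + beta2         # y = sum_k Omega_k * h'_k + bias
print(h, h_prime, y)
```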

The neural network above is trained to estimate the following function: f(x) = sin(2x) + cos(x) + x. Each hₖ’ represents a two-dimensional surface that can be visualized. Below is a figure that illustrates the various surfaces estimated by each of the ten neurons in the second hidden layer.

Ten different neurons fold the surface defined over h₁, h₂, and hₖ’ in ten unique ways.

Each neuron has uniquely folded the surface, providing us with insight into the various ways this folding can manifest. The latent estimation of hₖ’ in terms of x is essentially a line, which is a component of the surface depicted above. You can think of the surface as the paper on which we draw this line, and the process is akin to making cuts on the folded paper, rather than directly cutting the line itself.
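If you would like to reproduce this kind of surface plot, here is a minimal PyTorch sketch. The layer sizes follow the figure above, but the training settings and the plotted range of h₁ and h₂ are my own assumptions; the code actually used for the figures is in the repository linked at the end.

```python
import numpy as np
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3-D projection)

# Network matching the figure: 1 input -> 2 hidden units -> 10 hidden units -> 1 output.
torch.manual_seed(0)
net = nn.Sequential(
    nn.Linear(1, 2), nn.ReLU(),
    nn.Linear(2, 10), nn.ReLU(),
    nn.Linear(10, 1),
)

# Fit f(x) = sin(2x) + cos(x) + x on a 1-D grid.
x = torch.linspace(-3, 3, 512).unsqueeze(1)
y = torch.sin(2 * x) + torch.cos(x) + x
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(3000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(x), y)
    loss.backward()
    opt.step()

# Each second-layer unit h'_k is a plane over (h1, h2), folded by ReLU.
# The plotted range of (h1, h2) is arbitrary; ReLU outputs are non-negative.
W, b = net[2].weight.detach().numpy(), net[2].bias.detach().numpy()
h1, h2 = np.meshgrid(np.linspace(0, 2, 50), np.linspace(0, 2, 50))
fig = plt.figure(figsize=(15, 6))
for k in range(10):
    folded = np.maximum(0.0, W[k, 0] * h1 + W[k, 1] * h2 + b[k])
    ax = fig.add_subplot(2, 5, k + 1, projection="3d")
    ax.plot_surface(h1, h2, folded)
    ax.set_title(f"h'_{k + 1}")
plt.tight_layout()
plt.show()
```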

Now, let’s explore how neural networks handle multiple inputs, considering a simple shallow neural network as shown in the figure below.

A simple shallow neural network with two inputs

Instead of having just one weight, each hidden unit now has two weights along with a bias. With this change, each neuron estimates a plane instead of a line, while the overall mechanics remain the same. Let’s now attempt to estimate the mapping f(x₁, x₂) = sin(2x₁)sin(2x₂) + cos(x₁)cos(x₂) + x₁ + x₂ using shallow neural networks with different capacities and visualize their estimations.
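A minimal sketch of this experiment might look like the following; the grid range, optimizer, and number of training steps are my own choices rather than the exact settings used for the figures below.

```python
import torch
import torch.nn as nn

def fit_shallow(width, steps=5000, seed=0):
    """Fit a one-hidden-layer ReLU network to f(x1, x2) on a 2-D grid."""
    torch.manual_seed(seed)
    g = torch.linspace(-3, 3, 60)
    x1, x2 = torch.meshgrid(g, g, indexing="ij")
    X = torch.stack([x1.flatten(), x2.flatten()], dim=1)
    y = (torch.sin(2 * X[:, 0]) * torch.sin(2 * X[:, 1])
         + torch.cos(X[:, 0]) * torch.cos(X[:, 1])
         + X[:, 0] + X[:, 1]).unsqueeze(1)

    net = nn.Sequential(nn.Linear(2, width), nn.ReLU(), nn.Linear(width, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(X), y)
        loss.backward()
        opt.step()
    return net, loss.item()

# Compare capacities: more hidden units -> more linear pieces -> better fit.
for width in (12, 32, 64):
    _, mse = fit_shallow(width)
    print(f"{width:3d} hidden units: final MSE = {mse:.4f}")
```

With more hidden units, the network has more linear pieces at its disposal, and this is reflected in the quality of the surfaces shown below.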

The estimation of f(x₁, x₂) using a shallow neural network with 12 hidden units.
The estimation of f(x₁, x₂) using a shallow neural network with 32 hidden units.
The estimation of f(x₁, x₂) using a shallow neural network with 64 hidden units.

As we increase the number of hidden units, our estimates of the surface we’re trying to model continue to improve. With each neuron estimating a plane, the ReLU function can cut these planes on any of their four sides, creating piece-wise linear surfaces. As mentioned earlier, the more pieces we have, the better we can approximate the target function. It’s important to note that the estimated surface is strictly piece-wise linear, even though it might give the impression of a non-linear surface. If we zoom in enough, we would clearly see these piece-wise linear regions.

Having understood how multi-dimensional inputs are handled, there’s no limitation preventing us from using deep neural networks for such estimations. Deep networks can create more of these linear regions compared to shallow networks with the same number of neurons. For instance, if we consider a 100-neuron shallow network and a deep network with two layers, each containing 50 neurons, the deep network has more parameters between the two hidden layers alone than the entire shallow network. Although both networks have the same number of neurons, we gain additional estimation power at the cost of more parameters.
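As a quick sanity check of that claim, here is a back-of-the-envelope parameter count in Python, assuming (as in the example above) two inputs and one output:

```python
def shallow_params(n_in, width, n_out):
    # input->hidden weights + hidden biases + hidden->output weights + output biases
    return n_in * width + width + width * n_out + n_out

def two_layer_params(n_in, w1, w2, n_out):
    # layer by layer: (in -> h1) + (h1 -> h2) + (h2 -> out), weights plus biases
    return (n_in * w1 + w1) + (w1 * w2 + w2) + (w2 * n_out + n_out)

print(shallow_params(2, 100, 1))       # 401: the entire 100-neuron shallow network
print(50 * 50 + 50)                    # 2550: just the connections between the two 50-neuron hidden layers
print(two_layer_params(2, 50, 50, 1))  # 2751: the full deep network
```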

The figure below shows the same estimation done by a deep network with 6 neurons in each hidden layer, and we can observe that the estimate is as good as the one obtained with a shallow neural network having 32 neurons.

The estimation of f(x₁, x₂) using a deep neural network with two hidden layers, each containing 6 hidden units.

Let’s continue our exploration of how neural networks estimate multidimensional outputs. To gain a better understanding, let’s consider two functions, sin(x) and cos(x), both functions of a real-valued variable x. Our objective is to simultaneously learn both of these functions from the input x. We will train a simple shallow neural network, as depicted below, to estimate y₁ and y₂. In previous examples, we estimated a function with multiple variables, but now we are estimating multiple functions.

A simple shallow neural network that simultaneously estimates two different functions, y₁ and y₂
The estimated functions f₁(x) and f₂(x) produced by the model with 3 hidden units.

We can observe that the estimation exhibits 4 linear regions, corresponding to 3 cuts, one made by each of the 3 hidden neurons. An interesting aspect here is that both of the estimated functions have cuts at the same points. This happens because the neurons in the final layer estimate linear combinations of the hidden-unit estimations, and those hidden-unit estimations have already been passed through ReLU, so the cuts are made before the final linear combination. Consequently, the final estimation consists of two different linear combinations of the same folded functions, resulting in cuts at precisely the same locations for both outputs.

The estimated values of y₁ and y₂ as produced by the model
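Here is a minimal sketch of this two-output experiment; the width of 3 hidden units matches the figure, while the input range and training settings are my own assumptions. It also checks that both outputs share their kink locations, as described above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.linspace(-3, 3, 400).unsqueeze(1)
y = torch.cat([torch.sin(x), torch.cos(x)], dim=1)   # targets y1 = sin(x), y2 = cos(x)

# One hidden layer with 3 ReLU units, two outputs estimated simultaneously.
net = nn.Sequential(nn.Linear(1, 3), nn.ReLU(), nn.Linear(3, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(4000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(x), y)
    loss.backward()
    opt.step()

# Both outputs are linear combinations of the same 3 folded units, so their
# piecewise-linear kinks occur at exactly the same x locations: x = -bias / weight.
kinks = (-net[0].bias / net[0].weight.squeeze()).detach()
print("shared kink locations:", kinks.sort().values)
```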

Real-life problems often involve multiple variables on both the input and output sides. The figure provided below represents a generic shallow neural network capable of taking multiple inputs and returning multiple outputs. Neural networks enable us to estimate multiple multivariate functions simultaneously, making them a remarkably powerful tool for estimating complex mathematical functions.

A generic shallow neural network that accepts multivariate inputs to simultaneously estimate multiple mappings.

This blog draws significant inspiration from the book “Understanding Deep Learning” by Simon J.D. Prince (udlbook.github.io/udlbook/). In the next installments, we will delve into loss functions and how they are derived from maximum likelihood estimation. The code I utilized to generate the plots can be found below. If you’ve found this blog insightful, I would greatly appreciate your support by giving it a like.

Understanding-Deep-Learning/Mastering_Deep_Learning_The_Art_of_Approximating_Non_Linearities_with_Pi…

