
How Does AI Work? Create a Neural Network from Scratch

Last Updated on September 19, 2024 by Editorial Team

Author(s): Sean Jude Lyons

Originally published on Towards AI.

By the end of this article, you’ll be able to build your own model and Machine Learning library to make predictions.

Let's begin by writing a simple function and discussing it:

def parabola_function(x):
    return 3*x**2 - 4*x + 5

A parabolic function is simply a function that, for each x-coordinate we feed in, returns a y-coordinate; plotting those pairs traces out a parabola. For example, take this set of x-coordinates:

import numpy as np

x_point_list = np.arange(-5, 5, 0.25) # creates a list of points from -5 to 5 separated by 0.25
print(x_point_list)
----
[-5.   -4.75 -4.5  -4.25 -4.   -3.75 -3.5  -3.25 -3.   -2.75 -2.5  -2.25
 -2.   -1.75 -1.5  -1.25 -1.   -0.75 -0.5  -0.25  0.    0.25  0.5   0.75
  1.    1.25  1.5   1.75  2.    2.25  2.5   2.75  3.    3.25  3.5   3.75
  4.    4.25  4.5   4.75]

For example, if we pass these points to parabola_function we get:

import matplotlib.pyplot as plt

x_point_list = np.arange(-5, 5, 0.25) # creates a list of points from -5 to 5 separated by 0.25
y_point_list = parabola_function(x_point_list)
print(y_point_list)
plt.plot(x_point_list, y_point_list) # plotting our points
----
[100. 91.6875 83.75 76.1875 69. 62.1875 55.75 49.6875
44. 38.6875 33.75 29.1875 25. 21.1875 17.75 14.6875
12. 9.6875 7.75 6.1875 5. 4.1875 3.75 3.6875
4. 4.6875 5.75 7.1875 9. 11.1875 13.75 16.6875
20. 23.6875 27.75 32.1875 37. 42.1875 47.75 53.6875]

Finally, we can plot these points to visualize the parabolic curve:

Image by author.

Now, this is where machine learning comes into play. Specifically, we are interested in finding the lowest point along the curve above, i.e., the global minimum. Visually, we can estimate that this point lies somewhere between 0 and 2 along the x-axis.

But how would we identify this point if we didn’t have the luxury of a graph?

To tackle this, we introduce a small "nudge"; let us call this h:

h = 0.1
x = 3.0
parabola_function(x + h)
----
y = 21.430000000000007

As we can see above, by adding this h to x we can begin to explore how the function behaves and how we can gradually move toward that minimum point. We can continue nudging and trying different combinations, as seen below:

h = 0.1
x = -4.4
parabola_function(x + h)
----
y = 77.67000000000002

Let's automate this process, given that our goal is to find the point where the slope is nearest to zero. We can do that by checking how y changes: the new y position after the nudge minus the old y position. Another way to represent this is shown below:

Image by author.

To automate the process, we calculate the difference between the new y-value and the old one after each nudge, then divide by h to approximate the slope at that point. Following this slope helps us move closer to the lowest point:

h = 0.0001
x = 2/3
(parabola_function(x + h) - parabola_function(x)) / h
---
0.0002999999981767587

By repeating this process, we gradually bring the x-coordinate closer to the minimum point, which in this case is around x = 2/3.
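To make this concrete, here is a small sketch (not from the article; the names are illustrative) that repeatedly nudges x against the estimated slope. For this curve the true derivative is 6x - 4, which is zero at x = 2/3:

h = 0.0001            # small nudge used to estimate the slope
learning_rate = 0.01  # how far we step against the slope
x = 3.0               # arbitrary starting point

for _ in range(1000):
    slope = (parabola_function(x + h) - parabola_function(x)) / h
    x = x - learning_rate * slope  # step downhill

print(x)  # approaches 2/3, the minimum of 3x^2 - 4x + 5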

Another way of representing the above is:

Image by author.

Finally, we find a value of x at which this estimated slope is nearest to zero.

This intuition is fundamental in training a neural network, where the algorithm iteratively adjusts itself to minimize the error and make better predictions.

Think of this result not just as a single number but as the outcome of a series of mathematical operations performed to reach the final state we arrived at above.

Keeping this in mind, an example of how we can arrive at parabola_function can be seen below:

x = ML_Framework(2.0)    # let's say x is 2.0
y = x.times(x).times(3)  # represents 3x^2
y = y.minus(x.times(4))  # represents 3x^2 - 4x
y = y.plus(5)            # represents 3x^2 - 4x + 5

The magic here lies in how these operations are automated. A neural network essentially carries out these calculations on a massive scale, tweaking and adjusting itself (the weights and biases) to minimize error and improve predictions.

But how does the neural network compute and update these values? The key is in how we represent and perform the mathematical operations. For instance, we can assume x as our original input, w as the weight we assign to x, and b as our bias (or nudge):

Image by author.
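In plain Python, that prediction is just the input scaled by the weight plus the bias; the numbers below are the initial values the model starts with later in the article:

x = 1.0   # input, e.g., a 1-bedroom house after normalisation
w = 0.09  # weight
b = -0.9  # bias
prediction = w * x + b  # roughly -0.81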

The next and more obvious question is: how do we update these values? To do this, we need to evaluate how close our predicted values (the outputs of our network) are to the actual values (the target outputs).

We do this by making a hypothetical prediction using our current values, then calculating the difference between this prediction and the actual target value. This difference is known as the "loss," and it gives us a sense of how far off our predictions are.
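For a single prediction, a simple squared-error loss is one common way to measure this (an illustrative helper, not necessarily the article's exact loss function):

def squared_error(prediction, target):
    return (prediction - target) ** 2

squared_error(-0.81, 1.0)  # roughly 3.2761, the loss we will see at epoch 0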

To improve and reduce the loss, we need to go backward through the network and evaluate how the loss changes with respect to each of our weights. This is where we can introduce a concept like Depth First Search (DFS) to further understand what we are going to do.

Source: Wikimedia Commons (link: /Depth-First-Search.gif?20090326120256)

DFS is a recursive exploration algorithm that can help us understand backpropagation in neural networks. Backpropagation computes our gradient values by traversing the network in reverse order, ensuring that each mathematical operation's contribution to the loss is accounted for.

We are essentially doing the same thing by organising our values in a β€œbottom-up” manner, similar to backpropagation. This approach is known as a topological sort of a directed acyclic graph (DAG).

It ensures that each value in the network is visited in a depth-first order, meaning we explore as deeply as possible along one path before backtracking.

The general idea is that we keep adjusting values until our prediction matches, or closely matches, the actual target. Once we have done this, we have effectively solved the prediction problem. Here is a general overview of what we will be doing:

Image by author.

The code for this will look very simple in comparison to the diagram above:

def backward(self):
    topo = []
    visited = set()

    def build_topo(v):
        # depth-first walk: visit all children before appending the node itself
        if v not in visited:
            visited.add(v)
            for child in v._prev:
                build_topo(child)
            topo.append(v)

    build_topo(self)
    self.value = 1.0  # the gradient of the output with respect to itself is 1
    for node in reversed(topo):
        node._backward()

This code is all we are going to need to implement backpropagation which will be the key part of the design. Now, let’s take a look at a practical application of this concept.

Predicting House Prices

Let’s suppose we have some data, i.e., a 1-bedroom house costs $100,000, a 2-bedroom house costs $200,000, and a 3-bedroom house costs $300,000. Now, we want to use this data to make predictions.

Image by author.

In general, machine learning means that our computer program or software makes predictions and provides outputs without being explicitly programmed to do so. In other words, we don't write code that directly calculates the outcome; instead, the program learns from the data we provide and then makes predictions based on that learning. To begin, let's first normalize and pre-process the data we have.

Bedrooms          Prices        Normalised Prices
1 Bedroom   ----> $100,000 ----> 1.0
2 Bedrooms  ----> $200,000 ----> 2.0
3 Bedrooms  ----> $300,000 ----> 3.0
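A minimal way to produce these normalised values (the variable names are illustrative):

prices = [100_000, 200_000, 300_000]
normalised_prices = [price / 100_000 for price in prices]  # [1.0, 2.0, 3.0]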

Now, let’s say we want to predict the price of a 5-bedroom house using a neural network. The goal is for the neural network to learn from the existing data (1-bedroom, 2-bedroom, and 3-bedroom prices) and use that learning to predict the price of a house with more bedrooms.

predicted_price = model(bedroom_5)
---
500,000

So, the first thing we need to do is create our Machine Learning Framework. Think of this as building a simplified version of a popular library like TensorFlow, PyTorch, or JAX. Let’s break down some key components we’ll need:

  • value: This is basically the gradient we discussed previously. It measures how much this particular value contributes to our loss.
  • backward: This stores our actual backward function, which we use to propagate errors backward through the network and update our weights.
  • data: These are the actual values we’re working with, like housing prices.
  • prev: This keeps track of the child nodes that were combined to produce a value, which is what lets us walk backwards through the graph.
class ML_Framework:
    def __init__(self, data, _children=()):
        self.data = data
        self.value = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

In this class, data holds the actual data points (like house prices), value starts at zero and represents the gradient that will be updated as the model learns, _backward is initially set to a placeholder function but will later be replaced with the actual backward step, and _prev holds the child nodes that were combined to produce this value.
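One thing the snippet above doesn't show is the arithmetic itself: the times, plus, and minus methods used later both compute a result and record how to send gradients back to their inputs. Here is a minimal sketch of how they could look inside ML_Framework, assuming micrograd-style _backward closures (an illustrative sketch rather than the article's actual implementation):

class ML_Framework:
    # ... __init__ as above, plus the backward method shown earlier ...

    def plus(self, other):
        other = other if isinstance(other, ML_Framework) else ML_Framework(other)
        out = ML_Framework(self.data + other.data, (self, other))

        def _backward():
            self.value += out.value   # d(out)/d(self) = 1
            other.value += out.value  # d(out)/d(other) = 1
        out._backward = _backward
        return out

    def minus(self, other):
        other = other if isinstance(other, ML_Framework) else ML_Framework(other)
        out = ML_Framework(self.data - other.data, (self, other))

        def _backward():
            self.value += out.value   # d(out)/d(self) = 1
            other.value -= out.value  # d(out)/d(other) = -1
        out._backward = _backward
        return out

    def times(self, other):
        other = other if isinstance(other, ML_Framework) else ML_Framework(other)
        out = ML_Framework(self.data * other.data, (self, other))

        def _backward():
            self.value += other.data * out.value  # d(out)/d(self) = other
            other.value += self.data * out.value  # d(out)/d(other) = self
        out._backward = _backward
        return out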

Once we have our framework in place, the next step is to define what our model will look like. In this case, we’re going to use a single-layer perceptron. I won’t dive into too much detail here, but the key point is that we’ll declare our bias and set some initial weights.

class SingleLayerNeuron:
    def __init__(self, num_of_inputs):
        self.weights = [ML_Framework(0.09) for _ in range(num_of_inputs)]
        self.bias = ML_Framework(-0.9)

    def weights_bias_parameters(self):
        return self.weights + [self.bias]

    def zero_value(self):
        for p in self.weights_bias_parameters():
            p.value = 0.0

    def __call__(self, x):
        cumulative_sum = self.bias
        for wi, xi in zip(self.weights, x):
            product = wi.times(xi)
            cumulative_sum = cumulative_sum.plus(product)
        return cumulative_sum
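The training loop below also relies on a few names the article doesn't define: x_input_values, y_output_values, num_of_model_inputs, and squared_error_loss. Here is one plausible setup, assuming the normalised data from earlier and a squared-error loss built from the framework's own operations; the exact data and loss behind the printed numbers aren't shown, so treat this as a sketch:

x_input_values = [[ML_Framework(1.0)], [ML_Framework(2.0)], [ML_Framework(3.0)]]  # bedrooms
y_output_values = [ML_Framework(1.0), ML_Framework(2.0), ML_Framework(3.0)]       # normalised prices
num_of_model_inputs = len(x_input_values)

def squared_error_loss(prediction, target):
    difference = prediction.minus(target)  # prediction - target
    return difference.times(difference)    # (prediction - target)^2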

We can now train our model:

model = SingleLayerNeuron(1)
print("Initial weights:", [w.data for w in model.weights])
print("Initial bias:", model.bias.data)
learning_rate = 0.05
epochs = 100

for epoch in range(epochs):
    total_loss = 0
    for i in range(num_of_model_inputs):
        x_model_input = x_input_values[i]
        y_desired_output = y_output_values[i]

        model_prediction = model(x_model_input)
        loss = squared_error_loss(model_prediction, y_desired_output)
        model.zero_value()
        loss.backward()

        total_loss = total_loss + loss.data

        for weights_bias_parameters in model.weights_bias_parameters():
            weights_bias_parameters.data = weights_bias_parameters.data - (learning_rate * weights_bias_parameters.value)

    mean_squared_error = total_loss / num_of_model_inputs
    if epoch % 1 == 0:
        print(f"Epoch {epoch}, Loss: {mean_squared_error}")
----------------
output:
Initial weights: [0.09]
Initial bias: -0.9
Epoch 0, Loss: 3.2761
Epoch 1, Loss: 2.096704
Epoch 2, Loss: 1.3418905599999997
Epoch 3, Loss: 0.8588099583999995
Epoch 4, Loss: 0.5496383733759997
Epoch 5, Loss: 0.35176855896063985
Epoch 6, Loss: 0.2251318777348095
Epoch 7, Loss: 0.14408440175027803
Epoch 8, Loss: 0.09221401712017797
Epoch 9, Loss: 0.05901697095691388
Epoch 10, Loss: 0.03777086141242487
Epoch 11, Loss: 0.02417335130395193
Epoch 12, Loss: 0.015470944834529213
Epoch 13, Loss: 0.009901404694098687
Epoch 14, Loss: 0.006336899004223174
Epoch 15, Loss: 0.004055615362702837
Epoch 16, Loss: 0.0025955938321298136
Epoch 17, Loss: 0.0016611800525630788
Epoch 18, Loss: 0.0010631552336403704
Epoch 19, Loss: 0.0006804193495298324
Epoch 20, Loss: 0.0004354683836990928
Epoch 21, Loss: 0.0002786997655674194
Epoch 22, Loss: 0.00017836784996314722
Epoch 23, Loss: 0.00011415542397641612
Epoch 24, Loss: 7.305947134490555e-05
Epoch 25, Loss: 4.675806166074016e-05
Epoch 26, Loss: 2.9925159462873704e-05
Epoch 27, Loss: 1.915210205623956e-05
Epoch 28, Loss: 1.2257345315993629e-05
Epoch 29, Loss: 7.844701002235673e-06
Epoch 30, Loss: 5.020608641430632e-06

This gradual reduction in loss demonstrates how our model’s predictions are becoming more accurate as it continues to adjust the weights and bias based on the input data. Finally, once the training is complete, we can use our trained model to predict the price of a 5-bedroom house:

bedroom_5 = [ML_Framework(5)]
predicted_price = model(bedroom_5)
predicted_price_denormalized = predicted_price.data * 100000
print(f"Predicted price for a 5-bedroom house: ${predicted_price_denormalized:.2f}")
-------
Predicted price for a 5-bedroom house: $498000.00

As you can see, the predicted price is quite close to what we might expect, though not perfect. With more advanced models and techniques, the accuracy and applicability of predictions can be significantly enhanced. But the principles remain the same, i.e., through an iterative training process, a neural network learns what to predict and improves over time with adjustments.

Break-down

Let's break down the process a bit for those interested in the technical aspects of what is going on. At Epoch 0, the weight is initialized at 0.09, and the bias is set to -0.9. In the first training cycle, the model makes a prediction. This prediction is calculated by multiplying the number of bedrooms (in this case, 1) by the weight (0.09) and then adding the bias (-0.9). The formula can be thought of as:

Image by author.

The desired output, the price the model should ideally predict, is 100,000, which corresponds to the actual price of a 1-bedroom house. With the initial values, however, the model predicts 1 × 0.09 + (-0.9) = -0.81, which denormalizes to -$81,000, far off from the target of 100,000.

To improve its predictions, the neural network adjusts its weights and bias based on the error. The process of determining how to make these adjustments is known as backpropagation.

During backpropagation, the model calculates how the loss (the error between the predicted and actual prices) would change if the weights and bias were adjusted slightly. This is where the gradient, or the "Value" we discussed earlier, comes into play.

The learning rate, set to 0.05 in this example, controls the size of the adjustment. The model uses this learning rate to ensure that it doesn't make overly large changes, which could destabilise the learning process.

The model then updates the weights and bias by subtracting the calculated gradient scaled by the learning rate, helping it make more accurate predictions in subsequent cycles. By repeating this process over multiple epochs, the neural network gradually improves its ability to predict house prices more accurately.
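As a rough worked example (assuming the squared-error loss sketched earlier, which is not spelled out in the article): with the error of -1.81 from the first prediction, the gradient with respect to the weight is 2 × (-1.81) × 1 = -3.62, so the weight moves from 0.09 to 0.09 - 0.05 × (-3.62) = 0.271, and the bias likewise moves from -0.9 to -0.719. The next prediction is therefore already much closer to the target.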

Acknowledgment & Further Reading

The code for the model discussed in this article can be found here: https://github.com/seanjudelyons/Single_Layer_Perceptron. This example uses a DFS algorithm as an alternative to diving deep into the chain rule and is loosely based on the concept of a single-layered perceptron.

For those interested in the ideas here, you might want to explore "Learning Representations by Back-Propagating Errors" by Rumelhart, Hinton, and Williams (1986). Also see "The Perceptron" (1957) by Frank Rosenblatt; it is an interesting read.

This article has been largely inspired by the educational efforts in machine learning by Andrej Karpathy, Andrew Ng, and Laurence Moroney; I suggest checking them out. Their contributions to the field have been invaluable in making complex concepts accessible to learners worldwide.

Below is a screenshot of when I first came across the cost function of a neural network. We have come a long way, making such complex concepts accessible to learners worldwide.

Image by author.


Published via Towards AI
