How Does AI Work? Create a Neural Network from Scratch
Last Updated on September 19, 2024 by Editorial Team
Author(s): Sean Jude Lyons
Originally published on Towards AI.
By the end of this article, you'll be able to build your own model and Machine Learning library to make predictions.
Let's begin by writing a simple function and discussing it:
def parabola_function(x):
    return 3*x**2 - 4*x + 5
A parabolic function is simply a function that, given a set of x-coordinates, returns y-coordinates that trace out a parabola. For example, take this set of x-coordinates:
import numpy as np

x_point_list = np.arange(-5, 5, 0.25) # creates a list of points from -5 to 5 separated by 0.25
print(x_point_list)
----
[-5.   -4.75 -4.5  -4.25 -4.   -3.75 -3.5  -3.25 -3.   -2.75 -2.5  -2.25
 -2.   -1.75 -1.5  -1.25 -1.   -0.75 -0.5  -0.25  0.    0.25  0.5   0.75
  1.    1.25  1.5   1.75  2.    2.25  2.5   2.75  3.    3.25  3.5   3.75
  4.    4.25  4.5   4.75]
If we pass these points to parabola_function, we get:
import matplotlib.pyplot as plt

x_point_list = np.arange(-5, 5, 0.25) # creates a list of points from -5 to 5 separated by 0.25
y_point_list = parabola_function(x_point_list)
print(y_point_list)
plt.plot(x_point_list, y_point_list) # plotting our points
----
[100. 91.6875 83.75 76.1875 69. 62.1875 55.75 49.6875
44. 38.6875 33.75 29.1875 25. 21.1875 17.75 14.6875
12. 9.6875 7.75 6.1875 5. 4.1875 3.75 3.6875
4. 4.6875 5.75 7.1875 9. 11.1875 13.75 16.6875
20. 23.6875 27.75 32.1875 37. 42.1875 47.75 53.6875]
We can then plot these points to visualize the parabolic curve.
Now, this is where machine learning comes into play. Specifically, when we're interested in finding the lowest point along the curve above, i.e., the global minimum. Visually, we can estimate that this point lies somewhere between 0 and 2 along the x-axis.
But how would we identify this point if we didn't have the luxury of a graph?
To tackle this, we introduce a small "nudge"; let us call this h:
h = 0.1
x = 3.0
parabola_function(x + h)
----
y = 21.430000000000007
As we can see above, by adding this h to x we can begin to explore how the function behaves and how we can gradually move toward that minimum point. We can continue nudging and trying different combinations, as seen below:
h = 0.1
x = -4.4
parabola_function(x + h)
----
y = 77.67000000000002
Let's automate this process. Our goal is to find the point where nudging x barely changes y, i.e., where the slope is nearest to zero. To measure that, we take the difference between the new y-value after the nudge and the old one, then divide by h: (parabola_function(x + h) - parabola_function(x)) / h. This gives us an estimate of the slope at x, which tells us how to move closer to the lowest point:
h = 0.0001
x = 2/3
(parabola_function(x + h) - parabola_function(x)) / h
---
0.0002999999981767587
By repeating this process, we gradually bring the x-coordinate closer to the minimum point, which in this case is around x = 2/3.
What we are computing here is the familiar definition of the derivative: the slope of the function at x, approximated by (f(x + h) - f(x)) / h for a small h.
Finally, we find a value of x at which this slope is nearest to zero, which is the point closest to the bottom of the parabola.
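To make the automation concrete, here is a minimal sketch of the whole idea: starting from an arbitrary x, we repeatedly estimate the slope with a small nudge h and step x against it. The starting point, the step size of 0.01, and the iteration count are arbitrary choices for illustration; analytically, the slope of 3x^2 - 4x + 5 is 6x - 4, which is zero at x = 2/3, and the loop settles near that value.

def parabola_function(x):
    return 3*x**2 - 4*x + 5

h = 0.0001        # size of the nudge used to estimate the slope
x = 3.0           # arbitrary starting point
for _ in range(1000):
    slope = (parabola_function(x + h) - parabola_function(x)) / h
    x = x - 0.01 * slope   # step against the slope, i.e. downhill

print(x)  # close to 2/3, the lowest point of the parabola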
This intuition is fundamental in training a neural network, where the algorithm iteratively adjusts itself to minimize the error and make better predictions.
Think of the value we arrived at not just as a single number, but as the result of a series of mathematical operations that were performed to reach that final state.
Keeping this in mind, here is an example of how we could build up the parabola_function from such operations:
x = ML_Framework(2.0)       # let's say x is 2.0
y = x.times(x).times(3)     # represents 3x^2
y = y.minus(x.times(4))     # represents 3x^2 - 4x
y = y.plus(5)               # represents 3x^2 - 4x + 5
The magic here lies in how these operations are automated. A neural network essentially carries out these calculations on a massive scale, tweaking and adjusting itself (the weights and biases) to minimize error and improve predictions.
But how does the neural network compute and update these values? The key is in how we represent and perform the mathematical operations. For instance, we can take x as our original input, w as the weight we assign to x, and b as our bias (or nudge), giving a prediction of the form w * x + b.
The next and more obvious question is: how do we update these values? To do this, we need to evaluate how close our predicted values (the outputs of our network) are to the actual values (the target outputs).
We do this by making a hypothetical prediction using our current values, then calculating the difference between this prediction and the actual target value. This difference is known as the "loss," and it gives us a sense of how far off our predictions are.
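A common way to measure this difference, and the one used later in this article, is the squared error: the square of the gap between prediction and target. A plain-Python illustration:

# Squared error: how far a prediction is from the target, squared so that
# over- and under-shooting are penalised equally.
def squared_error(prediction, target):
    return (prediction - target) ** 2

print(squared_error(0.8, 1.0))  # 0.04 -- small error
print(squared_error(0.2, 1.0))  # 0.64 -- larger error, larger loss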
To improve and address this loss, we need to go backward through the network and evaluate how the loss changes with respect to each of our weights. This is where we can introduce a concept like Depth First Search (DFS) to further understand what we are going to do.
DFS is a recursive exploration algorithm that can help us understand the process of backpropagation in neural networks. Backpropagation computes our gradients by traversing the network in reverse order, ensuring that every mathematical operation's contribution to the loss is accounted for.
We are essentially doing the same thing by organising our values in a βbottom-upβ manner, similar to backpropagation. This approach is known as a topological sort of a directed acyclic graph (DAG).
It ensures that each value in the network is visited in a depth-first order, meaning we explore as deeply as possible along one path before backtracking.
The general idea is that we keep adjusting the values until our prediction matches, or closely matches, the actual target. Once we have done this, we have effectively solved the prediction problem. Here is a general overview of what we will be doing:
The code for this will look very simple in comparison to the diagram above:
def backward(self):
    # topological sort of the computation graph (depth-first)
    topo = []
    visited = set()
    def build_topo(v):
        if v not in visited:
            visited.add(v)
            for child in v._prev:
                build_topo(child)
            topo.append(v)
    build_topo(self)

    # seed the gradient at the output, then propagate it backwards
    self.value = 1.0
    for node in reversed(topo):
        node._backward()
This code is all we are going to need to implement backpropagation, which will be the key part of the design. Now, let's take a look at a practical application of this concept.
Predicting House Prices
Let's suppose we have some data, i.e., a 1-bedroom house costs $100,000, a 2-bedroom house costs $200,000, and a 3-bedroom house costs $300,000. Now, we want to use this data to make predictions.
In general, machine learning means that our computer program or software makes predictions and provides outputs without being explicitly programmed to do so. In other words, we don't write code that directly calculates the outcome; instead, the program learns from the data we provide and then makes predictions based on that learning. To begin, let's first normalize and pre-process the data we have.
Bedrooms   -----> Prices   -----> Normalised Prices
1 Bedroom  -----> $100,000 -----> 1.0
2 Bedrooms -----> $200,000 -----> 2.0
3 Bedrooms -----> $300,000 -----> 3.0
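Here, normalising simply means dividing each price by $100,000 so the network works with small numbers; we multiply by 100,000 again later to read a prediction back in dollars. For example:

# Normalise prices by dividing by $100,000; denormalise by multiplying back
prices = [100_000, 200_000, 300_000]
normalised_prices = [price / 100_000 for price in prices]
print(normalised_prices)  # [1.0, 2.0, 3.0]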
Now, let's say we want to predict the price of a 5-bedroom house using a neural network. The goal is for the neural network to learn from the existing data (1-bedroom, 2-bedroom, and 3-bedroom prices) and use that learning to predict the price of a house with more bedrooms.
predicted_price = model(bedroom_5)
---
500,000
So, the first thing we need to do is create our Machine Learning Framework. Think of this as building a simplified version of a popular library like TensorFlow, PyTorch, or JAX. Let's break down some key components we'll need:
- value: This is the gradient we discussed previously: it measures how much this particular value contributes to our loss.
- backward: This stores our actual backward function, which we use to propagate errors backward through the network and update our weights.
- data: These are the actual values we're working with, like housing prices.
- prev: This keeps track of the child nodes, i.e., the values that were combined to produce this one.
class ML_Framework:
    def __init__(self, data, _children=()):
        self.data = data
        self.value = 0.0
        self._backward = lambda: None
        self._prev = set(_children)
In this class, data holds the actual data points (like house prices), value starts at zero and represents the gradient that will be updated as the model learns, _backward is initially set to a placeholder function but will be updated to perform the actual backward pass, and _prev keeps track of the child nodes that were combined to produce this value.
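The later code also calls times, minus, and plus methods on ML_Framework objects, and loss.backward() relies on each operation recording its own backward rule, but those methods are not shown in this article. Below is a minimal sketch of what they might look like; it is illustrative and may differ from the repository's exact implementation. The backward method shown earlier would sit in this same class.

# Sketch of framework operations that record their own backward rules.
# Each operation returns a new node that remembers its inputs (_children)
# and how to pass gradients back to them.
class ML_Framework:
    def __init__(self, data, _children=()):
        self.data = data
        self.value = 0.0                  # gradient of the final loss w.r.t. this node
        self._backward = lambda: None
        self._prev = set(_children)

    def plus(self, other):
        other = other if isinstance(other, ML_Framework) else ML_Framework(other)
        out = ML_Framework(self.data + other.data, (self, other))
        def _backward():
            self.value += out.value       # addition passes the gradient through unchanged
            other.value += out.value
        out._backward = _backward
        return out

    def minus(self, other):
        other = other if isinstance(other, ML_Framework) else ML_Framework(other)
        out = ML_Framework(self.data - other.data, (self, other))
        def _backward():
            self.value += out.value
            other.value -= out.value      # the subtracted term gets the negated gradient
        out._backward = _backward
        return out

    def times(self, other):
        other = other if isinstance(other, ML_Framework) else ML_Framework(other)
        out = ML_Framework(self.data * other.data, (self, other))
        def _backward():
            self.value += other.data * out.value   # chain rule: d(a*b)/da = b
            other.value += self.data * out.value   # chain rule: d(a*b)/db = a
        out._backward = _backward
        return out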
Once we have our framework in place, the next step is to define what our model will look like. In this case, we're going to use a single-layer perceptron. I won't dive into too much detail here, but the key point is that we'll declare our bias and set some initial weights.
class SingleLayerNeuron:
    def __init__(self, num_of_inputs):
        self.weights = [ML_Framework(0.09) for _ in range(num_of_inputs)]
        self.bias = ML_Framework(-0.9)

    def weights_bias_parameters(self):
        return self.weights + [self.bias]

    def zero_value(self):
        for p in self.weights_bias_parameters():
            p.value = 0.0

    def __call__(self, x):
        cumulative_sum = self.bias
        for wi, xi in zip(self.weights, x):
            product = wi.times(xi)
            cumulative_sum = cumulative_sum.plus(product)
        return cumulative_sum
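The training loop below also refers to x_input_values, y_output_values, num_of_model_inputs, and a squared_error_loss helper that are not defined in this article. A minimal setup consistent with the normalised table above could look like the following; the exact data behind the printed run may differ.

# Hypothetical setup for the names used by the training loop below.
# Inputs are normalised bedroom counts, targets are normalised prices.
x_input_values = [[ML_Framework(1.0)], [ML_Framework(2.0)], [ML_Framework(3.0)]]
y_output_values = [1.0, 2.0, 3.0]
num_of_model_inputs = len(x_input_values)

def squared_error_loss(prediction, target):
    # (prediction - target)^2, built from framework operations so that
    # loss.backward() can propagate gradients through it
    difference = prediction.minus(target)
    return difference.times(difference)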
We can now train our model:
model = SingleLayerNeuron(1)

print("Initial weights:", [w.data for w in model.weights])
print("Initial bias:", model.bias.data)

learning_rate = 0.05
epochs = 100

for epoch in range(epochs):
    total_loss = 0
    for i in range(num_of_model_inputs):
        x_model_input = x_input_values[i]
        y_desired_output = y_output_values[i]

        # forward pass and loss
        model_prediction = model(x_model_input)
        loss = squared_error_loss(model_prediction, y_desired_output)

        # backward pass
        model.zero_value()
        loss.backward()

        total_loss = total_loss + loss.data

        # update each weight and the bias against its gradient
        for weights_bias_parameters in model.weights_bias_parameters():
            weights_bias_parameters.data = weights_bias_parameters.data - (learning_rate * weights_bias_parameters.value)

    mean_squared_error = total_loss / num_of_model_inputs
    if epoch % 1 == 0:
        print(f"Epoch {epoch}, Loss: {mean_squared_error}")
----------------
output:
Initial weights: [0.09]
Initial bias: -0.9
Epoch 0, Loss: 3.2761
Epoch 1, Loss: 2.096704
Epoch 2, Loss: 1.3418905599999997
Epoch 3, Loss: 0.8588099583999995
Epoch 4, Loss: 0.5496383733759997
Epoch 5, Loss: 0.35176855896063985
Epoch 6, Loss: 0.2251318777348095
Epoch 7, Loss: 0.14408440175027803
Epoch 8, Loss: 0.09221401712017797
Epoch 9, Loss: 0.05901697095691388
Epoch 10, Loss: 0.03777086141242487
Epoch 11, Loss: 0.02417335130395193
Epoch 12, Loss: 0.015470944834529213
Epoch 13, Loss: 0.009901404694098687
Epoch 14, Loss: 0.006336899004223174
Epoch 15, Loss: 0.004055615362702837
Epoch 16, Loss: 0.0025955938321298136
Epoch 17, Loss: 0.0016611800525630788
Epoch 18, Loss: 0.0010631552336403704
Epoch 19, Loss: 0.0006804193495298324
Epoch 20, Loss: 0.0004354683836990928
Epoch 21, Loss: 0.0002786997655674194
Epoch 22, Loss: 0.00017836784996314722
Epoch 23, Loss: 0.00011415542397641612
Epoch 24, Loss: 7.305947134490555e-05
Epoch 25, Loss: 4.675806166074016e-05
Epoch 26, Loss: 2.9925159462873704e-05
Epoch 27, Loss: 1.915210205623956e-05
Epoch 28, Loss: 1.2257345315993629e-05
Epoch 29, Loss: 7.844701002235673e-06
Epoch 30, Loss: 5.020608641430632e-06
This gradual reduction in loss demonstrates how our model's predictions are becoming more accurate as it continues to adjust the weights and bias based on the input data. Finally, once the training is complete, we can use our trained model to predict the price of a 5-bedroom house:
bedroom_5 = [ML_Framework(5)]
predicted_price = model(bedroom_5)
predicted_price_denormalized = predicted_price.data * 100000
print(f"Predicted price for a 10-bedroom house: ${predicted_price_denormalized:.2f}")
-------
Predicted price for a 5-bedroom house: $498000.00
As you can see, the predicted price is quite close to what we might expect, though not perfect. With more advanced models and techniques, the accuracy and applicability of predictions can be significantly enhanced. But the principles remain the same, i.e., through an iterative training process, a neural network learns what to predict and improves over time with adjustments.
Break-down
Let's break down the process a bit for those interested in the technical aspects of what is going on. At Epoch 0, the weight is initialized at 0.09 and the bias is set to -0.9. In the first training cycle, the model makes a prediction. This prediction is calculated by multiplying the number of bedrooms (in this case, 1) by the weight (0.09) and then adding the bias (-0.9). The formula can be thought of as: prediction = (weight × number of bedrooms) + bias.
The desired output, the price the model should ideally predict, is 100,000, which corresponds to the actual price of a 1-bedroom house. However, let's say the model predicts 80,000 instead. This prediction is far off from the target of 100,000.
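In the actual normalised units the numbers are smaller. As a quick check, the very first prediction and its squared error work out as follows (assuming the normalised 1-bedroom example; the result, 3.2761, is the same value printed for Epoch 0 above):

# First forward pass at Epoch 0, in normalised units
weight, bias = 0.09, -0.9
bedrooms, target = 1.0, 1.0             # normalised 1-bedroom example
prediction = weight * bedrooms + bias   # -0.81
loss = (prediction - target) ** 2       # 3.2761
print(prediction, loss)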
To improve its predictions, the neural network adjusts its weights and bias based on the error. The process of determining how to make these adjustments is known as backpropagation.
During backpropagation, the model calculates how the loss (the error between the predicted and actual prices) would change if the weights and bias were adjusted slightly. This is where the gradient, or the "value" we discussed earlier, comes into play.
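Concretely, for a prediction of the form weight * bedrooms + bias with a squared error loss, the gradients that backpropagation computes can be sketched as follows (using the Epoch 0 numbers above as illustrative values):

# Gradients of the squared error loss with respect to w and b
x, target = 1.0, 1.0
w, b = 0.09, -0.9
prediction = w * x + b                            # -0.81
d_loss_d_prediction = 2 * (prediction - target)   # outermost step of the chain rule
d_loss_d_w = d_loss_d_prediction * x              # how the loss changes if w changes
d_loss_d_b = d_loss_d_prediction * 1.0            # how the loss changes if b changes
print(d_loss_d_w, d_loss_d_b)                     # -3.62 -3.62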
The learning rate, set to 0.05 in this example, controls the size of the adjustment. The model uses this learning rate to ensure that it doesn't make overly large changes, which could destabilise the learning process.
The model then updates the weights and bias by subtracting the calculated gradient, scaled by the learning rate, helping it make more accurate predictions in subsequent cycles. By repeating this process over multiple epochs, the neural network gradually improves its ability to predict house prices more accurately.
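The update itself is then just a scaled subtraction, the same rule the training loop applies to every weight and bias (continuing the illustrative numbers from the sketch above):

# One parameter update: subtract the gradient, scaled by the learning rate
learning_rate = 0.05
w, b = 0.09, -0.9                        # Epoch 0 values
d_loss_d_w, d_loss_d_b = -3.62, -3.62    # gradients from the sketch above
w = w - learning_rate * d_loss_d_w       # 0.271
b = b - learning_rate * d_loss_d_b       # -0.719
print(w, b)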
Acknowledgment & Further Reading
The code for the model discussed in this article can be found here: https://github.com/seanjudelyons/Single_Layer_Perceptron. This example uses a DFS algorithm as an alternative to diving deep into the chain rule and is loosely based on the concept of a single-layered perceptron.
For those interested in the ideas here, you might want to explore "Learning Representations by Back-Propagating Errors" by Rumelhart, Hinton, and Williams (1986). Also see "The Perceptron" (1957) by Frank Rosenblatt; it is an interesting read.
This article has been largely inspired by the educational efforts in machine learning by Andrej Karpathy, Andrew Ng, and Laurence Moroney; I suggest checking them out. Their contributions to the field have been invaluable in making complex concepts accessible to learners worldwide.
Below is a screenshot of when I first came across the cost function of a neural network. We have come a long way, making such complex concepts accessible to learners worldwide.
Published via Towards AI