
How Does AI Work? Create a Neural Network from Scratch

Last Updated on September 19, 2024 by Editorial Team

Author(s): Sean Jude Lyons

Originally published on Towards AI.

By the end of this article, you’ll be able to build your own model and Machine Learning library to make predictions.

Let's begin by writing a simple function and discussing it:

def parabola_function(x):
    return 3*x**2 - 4*x + 5

A parabolic function is simply a function that, for each x-coordinate we feed in, returns a y-coordinate; plotting those pairs traces out a parabola. For example, take this set of x-coordinates:

import numpy as np

x_point_list = np.arange(-5, 5, 0.25) # creates a list of points from -5 to 5 separated by 0.25
print(x_point_list)
----
[-5.   -4.75 -4.5  -4.25 -4.   -3.75 -3.5  -3.25 -3.   -2.75 -2.5  -2.25
 -2.   -1.75 -1.5  -1.25 -1.   -0.75 -0.5  -0.25  0.    0.25  0.5   0.75
  1.    1.25  1.5   1.75  2.    2.25  2.5   2.75  3.    3.25  3.5   3.75
  4.    4.25  4.5   4.75]

For example, if we pass these points to parabola_function we get:

import matplotlib.pyplot as plt

x_point_list = np.arange(-5, 5, 0.25) # creates a list of points from -5 to 5 separated by 0.25
y_point_list = parabola_function(x_point_list)
print(y_point_list)
plt.plot(x_point_list, y_point_list) # plotting our points
----
[100. 91.6875 83.75 76.1875 69. 62.1875 55.75 49.6875
44. 38.6875 33.75 29.1875 25. 21.1875 17.75 14.6875
12. 9.6875 7.75 6.1875 5. 4.1875 3.75 3.6875
4. 4.6875 5.75 7.1875 9. 11.1875 13.75 16.6875
20. 23.6875 27.75 32.1875 37. 42.1875 47.75 53.6875]

Finally, we can plot these points to visualize the parabolic curve:

Image by author.

Now, this is where machine learning comes into play. Specifically, we are interested in finding the lowest point along the curve above, i.e., the global minimum. Visually, we can estimate that this point lies somewhere between 0 and 2 along the x-axis.

But how would we identify this point if we didn’t have the luxury of a graph?

To tackle this, we introduce a small "nudge"; let us call this h:

h = 0.1
x = 3.0
parabola_function(x + h)
----
y = 21.430000000000007

As we can see above, by adding this h to x we can begin to explore how the function behaves and how we can gradually move toward that minimum point. We can continue nudging and trying different combinations, as seen below:

h = 0.1
x = -4.4
parabola_function(x + h)
----
y = 77.67000000000002

Let's automate this process, given that our goal is to find the point where the slope is nearest to zero. We can do that by checking how y changes: the new y position after the nudge minus the old y position. Another way to represent this is shown below:

Image by author.

To automate the process, we calculate the difference between the new y-value and the old one after each nudge, then divide by h to approximate the slope at that point. Following this slope helps us move closer to the lowest point:

h = 0.0001
x = 2/3
(parabola_function(x + h) - parabola_function(x)) / h
---
0.0002999999981767587

By repeating this process, we gradually bring the x-coordinate closer to the minimum point, which in this case is around x = 2/3.
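To make this concrete, here is a small sketch (not from the article; the names are illustrative) that repeatedly nudges x against the estimated slope. For this curve the true derivative is 6x - 4, which is zero at x = 2/3:

h = 0.0001            # small nudge used to estimate the slope
learning_rate = 0.01  # how far we step against the slope
x = 3.0               # arbitrary starting point

for _ in range(1000):
    slope = (parabola_function(x + h) - parabola_function(x)) / h
    x = x - learning_rate * slope  # step downhill

print(x)  # approaches 2/3, the minimum of 3x^2 - 4x + 5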

Another way of representing the above is:

Image by author.

Finally, we find a value of x at which this estimated slope is nearest to zero.

This intuition is fundamental in training a neural network, where the algorithm iteratively adjusts itself to minimize the error and make better predictions.

Think of this result not just as a single number but as the outcome of a series of mathematical operations performed to reach the final state we arrived at above.

Keeping this in mind, an example of how we can arrive at parabola_function can be seen below:

x = ML_Framework(2.0)    # let's say x is 2.0
y = x.times(x).times(3)  # represents 3x^2
y = y.minus(x.times(4))  # represents 3x^2 - 4x
y = y.plus(5)            # represents 3x^2 - 4x + 5

The magic here lies in how these operations are automated. A neural network essentially carries out these calculations on a massive scale, tweaking and adjusting itself (the weights and biases) to minimize error and improve predictions.

But how does the neural network compute and update these values? The key is in how we represent and perform the mathematical operations. For instance, we can assume x as our original input, w as the weight we assign to x, and b as our bias (or nudge):

Image by author.
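In plain Python, that prediction is just the input scaled by the weight plus the bias; the numbers below are the initial values the model starts with later in the article:

x = 1.0   # input, e.g., a 1-bedroom house after normalisation
w = 0.09  # weight
b = -0.9  # bias
prediction = w * x + b  # roughly -0.81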

The next and more obvious question is: how do we update these values? To do this, we need to evaluate how close our predicted values (the outputs of our network) are to the actual values (the target outputs).

We do this by making a hypothetical prediction using our current values, then calculating the difference between this prediction and the actual target value. This difference is known as the "loss," and it gives us a sense of how far off our predictions are.
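For a single prediction, a simple squared-error loss is one common way to measure this (an illustrative helper, not necessarily the article's exact loss function):

def squared_error(prediction, target):
    return (prediction - target) ** 2

squared_error(-0.81, 1.0)  # roughly 3.2761, the loss we will see at epoch 0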

To improve and reduce the loss, we need to go backward through the network and evaluate how the loss changes with respect to each of our weights. This is where we can introduce a concept like Depth First Search (DFS) to further understand what we are going to do.

Source: Wikimedia Commons (link: /Depth-First-Search.gif?20090326120256)

DFS is a recursive exploration algorithm that can help us understand backpropagation in neural networks. Backpropagation computes our gradient values by traversing the network in reverse order, ensuring that each mathematical operation's contribution to the loss is accounted for.

We are essentially doing the same thing by organising our values in a β€œbottom-up” manner, similar to backpropagation. This approach is known as a topological sort of a directed acyclic graph (DAG).

It ensures that each value in the network is visited in a depth-first order, meaning we explore as deeply as possible along one path before backtracking.

The general idea is that we keep adjusting values until our prediction matches, or closely matches, the actual target. Once we have done this, we have effectively solved the prediction problem. Here is a general overview of what we will be doing:

Image by author.

The code for this will look very simple in comparison to the diagram above:

def backward(self):
    topo = []
    visited = set()

    def build_topo(v):
        # depth-first walk: visit all children before appending the node itself
        if v not in visited:
            visited.add(v)
            for child in v._prev:
                build_topo(child)
            topo.append(v)

    build_topo(self)
    self.value = 1.0  # the gradient of the output with respect to itself is 1
    for node in reversed(topo):
        node._backward()

This code is all we are going to need to implement backpropagation which will be the key part of the design. Now, let’s take a look at a practical application of this concept.

Predicting House Prices

Let’s suppose we have some data, i.e., a 1-bedroom house costs $100,000, a 2-bedroom house costs $200,000, and a 3-bedroom house costs $300,000. Now, we want to use this data to make predictions.

Image by author.

In general, machine learning means that our computer program or software makes predictions and provides outputs without being explicitly programmed to do so. In other words, we don't write code that directly calculates the outcome; instead, the program learns from the data we provide and then makes predictions based on that learning. To begin, let's first normalize and pre-process the data we have.

Bedrooms          Prices        Normalised Prices
1 Bedroom   ----> $100,000 ----> 1.0
2 Bedrooms  ----> $200,000 ----> 2.0
3 Bedrooms  ----> $300,000 ----> 3.0
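A minimal way to produce these normalised values (the variable names are illustrative):

prices = [100_000, 200_000, 300_000]
normalised_prices = [price / 100_000 for price in prices]  # [1.0, 2.0, 3.0]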

Now, let’s say we want to predict the price of a 5-bedroom house using a neural network. The goal is for the neural network to learn from the existing data (1-bedroom, 2-bedroom, and 3-bedroom prices) and use that learning to predict the price of a house with more bedrooms.

predicted_price = model(bedroom_5)
---
500,000

So, the first thing we need to do is create our Machine Learning Framework. Think of this as building a simplified version of a popular library like TensorFlow, PyTorch, or JAX. Let’s break down some key components we’ll need:

  • value: This is basically the gradient we discussed previously. It measures how much this particular value contributes to our loss.
  • backward: This stores our actual backward function, which we use to propagate errors backward through the network and update our weights.
  • data: These are the actual values we’re working with, like housing prices.
  • prev: This keeps track of the child nodes that were combined to produce a value, which is what lets us walk backwards through the graph.
class ML_Framework:
    def __init__(self, data, _children=()):
        self.data = data
        self.value = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

In this class, data holds the actual data points (like house prices), value starts at zero and represents the gradient that will be updated as the model learns, _backward is initially set to a placeholder function but will later be replaced with the actual backward step, and _prev holds the child nodes that were combined to produce this value.
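One thing the snippet above doesn't show is the arithmetic itself: the times, plus, and minus methods used later both compute a result and record how to send gradients back to their inputs. Here is a minimal sketch of how they could look inside ML_Framework, assuming micrograd-style _backward closures (an illustrative sketch rather than the article's actual implementation):

class ML_Framework:
    # ... __init__ as above, plus the backward method shown earlier ...

    def plus(self, other):
        other = other if isinstance(other, ML_Framework) else ML_Framework(other)
        out = ML_Framework(self.data + other.data, (self, other))

        def _backward():
            self.value += out.value   # d(out)/d(self) = 1
            other.value += out.value  # d(out)/d(other) = 1
        out._backward = _backward
        return out

    def minus(self, other):
        other = other if isinstance(other, ML_Framework) else ML_Framework(other)
        out = ML_Framework(self.data - other.data, (self, other))

        def _backward():
            self.value += out.value   # d(out)/d(self) = 1
            other.value -= out.value  # d(out)/d(other) = -1
        out._backward = _backward
        return out

    def times(self, other):
        other = other if isinstance(other, ML_Framework) else ML_Framework(other)
        out = ML_Framework(self.data * other.data, (self, other))

        def _backward():
            self.value += other.data * out.value  # d(out)/d(self) = other
            other.value += self.data * out.value  # d(out)/d(other) = self
        out._backward = _backward
        return out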

Once we have our framework in place, the next step is to define what our model will look like. In this case, we’re going to use a single-layer perceptron. I won’t dive into too much detail here, but the key point is that we’ll declare our bias and set some initial weights.

class SingleLayerNeuron:
    def __init__(self, num_of_inputs):
        self.weights = [ML_Framework(0.09) for _ in range(num_of_inputs)]
        self.bias = ML_Framework(-0.9)

    def weights_bias_parameters(self):
        return self.weights + [self.bias]

    def zero_value(self):
        for p in self.weights_bias_parameters():
            p.value = 0.0

    def __call__(self, x):
        cumulative_sum = self.bias
        for wi, xi in zip(self.weights, x):
            product = wi.times(xi)
            cumulative_sum = cumulative_sum.plus(product)
        return cumulative_sum
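The training loop below also relies on a few names the article doesn't define: x_input_values, y_output_values, num_of_model_inputs, and squared_error_loss. Here is one plausible setup, assuming the normalised data from earlier and a squared-error loss built from the framework's own operations; the exact data and loss behind the printed numbers aren't shown, so treat this as a sketch:

x_input_values = [[ML_Framework(1.0)], [ML_Framework(2.0)], [ML_Framework(3.0)]]  # bedrooms
y_output_values = [ML_Framework(1.0), ML_Framework(2.0), ML_Framework(3.0)]       # normalised prices
num_of_model_inputs = len(x_input_values)

def squared_error_loss(prediction, target):
    difference = prediction.minus(target)  # prediction - target
    return difference.times(difference)    # (prediction - target)^2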

We can now train our model:

model = SingleLayerNeuron(1)
print("Initial weights:", [w.data for w in model.weights])
print("Initial bias:", model.bias.data)
learning_rate = 0.05
epochs = 100

for epoch in range(epochs):
    total_loss = 0
    for i in range(num_of_model_inputs):
        x_model_input = x_input_values[i]
        y_desired_output = y_output_values[i]

        model_prediction = model(x_model_input)
        loss = squared_error_loss(model_prediction, y_desired_output)
        model.zero_value()
        loss.backward()

        total_loss = total_loss + loss.data

        for weights_bias_parameters in model.weights_bias_parameters():
            weights_bias_parameters.data = weights_bias_parameters.data - (learning_rate * weights_bias_parameters.value)

    mean_squared_error = total_loss / num_of_model_inputs
    if epoch % 1 == 0:
        print(f"Epoch {epoch}, Loss: {mean_squared_error}")
----------------
output:
Initial weights: [0.09]
Initial bias: -0.9
Epoch 0, Loss: 3.2761
Epoch 1, Loss: 2.096704
Epoch 2, Loss: 1.3418905599999997
Epoch 3, Loss: 0.8588099583999995
Epoch 4, Loss: 0.5496383733759997
Epoch 5, Loss: 0.35176855896063985
Epoch 6, Loss: 0.2251318777348095
Epoch 7, Loss: 0.14408440175027803
Epoch 8, Loss: 0.09221401712017797
Epoch 9, Loss: 0.05901697095691388
Epoch 10, Loss: 0.03777086141242487
Epoch 11, Loss: 0.02417335130395193
Epoch 12, Loss: 0.015470944834529213
Epoch 13, Loss: 0.009901404694098687
Epoch 14, Loss: 0.006336899004223174
Epoch 15, Loss: 0.004055615362702837
Epoch 16, Loss: 0.0025955938321298136
Epoch 17, Loss: 0.0016611800525630788
Epoch 18, Loss: 0.0010631552336403704
Epoch 19, Loss: 0.0006804193495298324
Epoch 20, Loss: 0.0004354683836990928
Epoch 21, Loss: 0.0002786997655674194
Epoch 22, Loss: 0.00017836784996314722
Epoch 23, Loss: 0.00011415542397641612
Epoch 24, Loss: 7.305947134490555e-05
Epoch 25, Loss: 4.675806166074016e-05
Epoch 26, Loss: 2.9925159462873704e-05
Epoch 27, Loss: 1.915210205623956e-05
Epoch 28, Loss: 1.2257345315993629e-05
Epoch 29, Loss: 7.844701002235673e-06
Epoch 30, Loss: 5.020608641430632e-06

This gradual reduction in loss demonstrates how our model’s predictions are becoming more accurate as it continues to adjust the weights and bias based on the input data. Finally, once the training is complete, we can use our trained model to predict the price of a 5-bedroom house:

bedroom_5 = [ML_Framework(5)]
predicted_price = model(bedroom_5)
predicted_price_denormalized = predicted_price.data * 100000
print(f"Predicted price for a 5-bedroom house: ${predicted_price_denormalized:.2f}")
-------
Predicted price for a 5-bedroom house: $498000.00

As you can see, the predicted price is quite close to what we might expect, though not perfect. With more advanced models and techniques, the accuracy and applicability of predictions can be significantly enhanced. But the principles remain the same, i.e., through an iterative training process, a neural network learns what to predict and improves over time with adjustments.

Break-down

Let's break down the process a bit for those interested in the technical aspects of what is going on. At Epoch 0, the weight is initialized at 0.09, and the bias is set to -0.9. In the first training cycle, the model makes a prediction. This prediction is calculated by multiplying the number of bedrooms (in this case, 1) by the weight (0.09) and then adding the bias (-0.9). The formula can be thought of as:

Image by author.

The desired output, the price the model should ideally predict, is 100,000, which corresponds to the actual price of a 1-bedroom house. With the initial values, however, the model predicts 1 × 0.09 + (-0.9) = -0.81, which denormalizes to -$81,000, far off from the target of 100,000.

To improve its predictions, the neural network adjusts its weights and bias based on the error. The process of determining how to make these adjustments is known as backpropagation.

During backpropagation, the model calculates how the loss (the error between the predicted and actual prices) would change if the weights and bias were adjusted slightly. This is where the gradient, or the "Value" we discussed earlier, comes into play.

The learning rate, set to 0.05 in this example, controls the size of the adjustment. The model uses this learning rate to ensure that it doesn't make overly large changes, which could destabilise the learning process.

The model then updates the weights and bias by subtracting the calculated gradient scaled by the learning rate, helping it make more accurate predictions in subsequent cycles. By repeating this process over multiple epochs, the neural network gradually improves its ability to predict house prices more accurately.
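As a rough worked example (assuming the squared-error loss sketched earlier, which is not spelled out in the article): with the error of -1.81 from the first prediction, the gradient with respect to the weight is 2 × (-1.81) × 1 = -3.62, so the weight moves from 0.09 to 0.09 - 0.05 × (-3.62) = 0.271, and the bias likewise moves from -0.9 to -0.719. The next prediction is therefore already much closer to the target.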

Acknowledgment & Further Reading

The code for the model discussed in this article can be found here: https://github.com/seanjudelyons/Single_Layer_Perceptron. This example uses a DFS algorithm as an alternative to diving deep into the chain rule and is loosely based on the concept of a single-layered perceptron.

For those interested in the ideas here, you might want to explore "Learning Representations by Back-Propagating Errors" by Rumelhart, Hinton, and Williams (1986). Also see "The Perceptron" (1957) by Frank Rosenblatt; it is an interesting read.

This article has been largely inspired by the educational efforts in machine learning by Andrej Karpathy, Andrew Ng, and Laurence Moroney; I suggest checking them out. Their contributions to the field have been invaluable in making complex concepts accessible to learners worldwide.

Below is a screenshot of when I first came across the cost function of a neural network. We have come a long way, making such complex concepts accessible to learners worldwide.

Image by author.


Published via Towards AI
