# How Does AI Work? Create a Neural Network from Scratch

Last Updated on September 19, 2024 by Editorial Team

**Author(s): Sean Jude Lyons**

Originally published on Towards AI.

By the end of this article, you'll be able to build your own model and Machine Learning library to make predictions.

Let's begin by writing a simple function and discussing it:

```python
def parabola_function(x):
    return 3*x**2 - 4*x + 5
```

A parabolic function is simply a function that takes x-coordinates as input and returns y-coordinates; plotting those (x, y) pairs traces out a parabola. For example, take this set of x-coordinates:

```python
import numpy as np

x_point_list = np.arange(-5, 5, 0.25)  # points from -5 up to (but not including) 5, spaced 0.25 apart
print(x_point_list)
```

Output:

```
[-5.   -4.75 -4.5  -4.25 -4.   -3.75 -3.5  -3.25 -3.   -2.75 -2.5  -2.25
 -2.   -1.75 -1.5  -1.25 -1.   -0.75 -0.5  -0.25  0.    0.25  0.5   0.75
  1.    1.25  1.5   1.75  2.    2.25  2.5   2.75  3.    3.25  3.5   3.75
  4.    4.25  4.5   4.75]
```

Passing these points to `parabola_function` gives us the corresponding y-coordinates:

```python
import numpy as np
import matplotlib.pyplot as plt

x_point_list = np.arange(-5, 5, 0.25)  # points from -5 up to (but not including) 5, spaced 0.25 apart
y_point_list = parabola_function(x_point_list)  # NumPy applies the function to every point
print(y_point_list)

plt.plot(x_point_list, y_point_list)  # plotting our points
plt.show()
```

Output:

```
[100.      91.6875  83.75    76.1875  69.      62.1875  55.75    49.6875
  44.      38.6875  33.75    29.1875  25.      21.1875  17.75    14.6875
  12.       9.6875   7.75     6.1875   5.       4.1875   3.75     3.6875
   4.       4.6875   5.75     7.1875   9.      11.1875  13.75    16.6875
  20.      23.6875  27.75    32.1875  37.      42.1875  47.75    53.6875]
```

Finally, we can plot these points to visualize the parabolic curve:

Now, this is where machine learning comes into play: specifically, when we're interested in finding the lowest point along the curve above, i.e., the global minimum. Visually, we can estimate that this point lies somewhere between 0 and 2 along the x-axis.

But how would we identify this point if we didn't have the luxury of a graph?

To tackle this, we introduce a small "nudge", which we will call `h`:

```python
h = 0.1
x = 3.0
parabola_function(x + h)
```

Output:

```
21.430000000000007
```

As we can see above, by adding `h` to `x` we can begin to explore how the function behaves and how we can gradually move toward that minimum point. We can continue nudging and trying different combinations, as seen below:

```python
h = 0.1
x = -4.4
parabola_function(x + h)
```

Output:

```
77.67000000000002
```

Let's automate this process, given that our goal is to find the point where the slope of the curve is nearest to zero. We can do that by checking how far y moved after a nudge: `new y position after nudge - old y position`. Dividing this difference by `h` gives the slope of the curve at `x`, which tells us which direction to move to get closer to the lowest point:

$$\frac{f(x + h) - f(x)}{h}$$

```python
h = 0.0001
x = 2/3
f = parabola_function  # short alias for our parabola function

(f(x + h) - f(x)) / h
```

Output:

```
0.0002999999981767587
```

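Repeating this process gradually brings the x-coordinate closer to the minimum point, which in this case is `x = 2/3`. The slope there is essentially zero: the small leftover of about `0.0003` (that is, `3 * h`) comes from using a finite nudge rather than an infinitely small one.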

Another way of representing the above is:

Finally, we find a value of `x` (using a small enough `h`) at which the slope is nearest to zero:

This intuition is fundamental in training a neural network, where the algorithm iteratively adjusts itself to minimize the error and make better predictions.
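Repeating the nudge-and-measure step in a loop is, in essence, gradient descent. The sketch below is a minimal illustration, not the author's code: the starting point `x = 3.0`, the step size `learning_rate = 0.1` and the 100 iterations are arbitrary choices, and `numerical_slope` is a hypothetical helper that applies the same `(f(x + h) - f(x)) / h` calculation used above.

```python
def parabola_function(x):
    return 3 * x**2 - 4 * x + 5


def numerical_slope(f, x, h=0.0001):
    # Finite-difference estimate of the slope of f at x
    return (f(x + h) - f(x)) / h


x = 3.0              # arbitrary starting point (assumption)
learning_rate = 0.1  # how far to move against the slope on each step (assumption)
steps = 100

for _ in range(steps):
    # Step against the slope: downhill on the parabola
    x = x - learning_rate * numerical_slope(parabola_function, x)

print(x)  # settles near 2/3, the minimum of the parabola
```

Each step moves `x` against the slope, so it settles near `2/3`. Because the finite nudge adds a small `3 * h` bias to the slope estimate, it stops very slightly short of exactly `2/3`.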

Think of the output of such a function not just as a single number but as the result of a series of mathematical operations that were performed to reach that final value.

Keeping this in mind, here is an example of how we can build up `parabola_function` step by step:

```python
# Illustrative: ML_Framework is the class we build later in this article
x = ML_Framework(2.0)   # let's say x is 2.0
y = 3 * x.times(x)      # represents 3x^2
y = y.minus(4 * x)      # represents 3x^2 - 4x
y = y.plus(5)           # represents 3x^2 - 4x + 5
```

The magic here lies in how these operations are automated. A neural network essentially carries out these calculations on a massive scale, tweaking and adjusting itself (the weights and biases) to minimize error and improve predictions.
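As a minimal sketch of what recording the operations means, the hypothetical `Node` class below stands in for the article's `ML_Framework`. It borrows the `times`, `plus` and `minus` method names, but its internals (including accepting plain numbers as operands) are assumptions. Each operation returns a new node that remembers, in `_prev`, the nodes it was computed from.

```python
class Node:
    """A number that remembers which nodes it was computed from."""

    def __init__(self, data, _children=(), _op=""):
        self.data = data
        self._prev = set(_children)  # the operands that produced this node
        self._op = _op               # the operation that produced it (for display)

    @staticmethod
    def _wrap(other):
        # Let plain numbers be used as operands by wrapping them in a Node
        return other if isinstance(other, Node) else Node(other)

    def plus(self, other):
        other = Node._wrap(other)
        return Node(self.data + other.data, (self, other), "+")

    def times(self, other):
        other = Node._wrap(other)
        return Node(self.data * other.data, (self, other), "*")

    def minus(self, other):
        other = Node._wrap(other)
        return Node(self.data - other.data, (self, other), "-")


x = Node(2.0)
y = x.times(x).times(3)   # 3x^2
y = y.minus(x.times(4))   # 3x^2 - 4x
y = y.plus(5)             # 3x^2 - 4x + 5

print(y.data)                                   # 9.0
print(y._op, sorted(c.data for c in y._prev))   # + [4.0, 5]
```

The final node knows it came from adding `4.0` (the result of `3x^2 - 4x`) and `5`, and each of those nodes in turn remembers its own inputs. That chain of references is what backpropagation later walks in reverse.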

But how does the neural network compute and update these values? The key is in how we represent and perform the mathematical operations. For instance, we can treat `x` as our original input, `w` as the weight we assign to `x`, and `b` as our bias (or nudge), so that a single neuron computes:

$$y = w \cdot x + b$$

The next and more obvious question is: how do we update these values? To do this, we need to evaluate how close our predicted values (the outputs of our network) are to the actual values (the target outputs).

We do this by making a prediction using our current values, then measuring the difference between this prediction and the actual target value. This measure (later in this article, the squared difference) is known as the "loss," and it gives us a sense of how far off our predictions are.
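Using the initial weight (`0.09`) and bias (`-0.9`) and the 1-bedroom example from the house-price section below (1 bedroom, normalized target price `1.0`), the loss can be worked out by hand. `squared_error` is a hypothetical plain-number helper; the article's own `squared_error_loss` works on framework nodes so that gradients can flow back through it.

```python
def squared_error(prediction, target):
    # Squaring makes over- and under-predictions both count as positive error
    return (prediction - target) ** 2


weight, bias = 0.09, -0.9   # initial values used later in the article
bedrooms, target = 1, 1.0   # 1 bedroom -> normalized price 1.0 ($100,000)

prediction = weight * bedrooms + bias
loss = squared_error(prediction, target)

print(round(prediction, 4))  # -0.81
print(round(loss, 4))        # 3.2761
```

The result, `3.2761`, is the same value reported for `Epoch 0` in the training output further down.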

To reduce the loss, we need to go backward through the network and evaluate how the loss changes with respect to each of our weights. This is where a concept like Depth-First Search (DFS) helps us understand what we are going to do.

DFS is a recursive exploration algorithm that can help us understand the process of backpropagation in neural networks. Backpropagation computes the gradient of the loss with respect to every value by traversing the network in reverse order, so that each node's gradient is complete before it is passed on to the nodes it was computed from.

We use DFS to organise our values in a "bottom-up" order, in which every value appears after the values it was computed from; backpropagation then walks this order in reverse. Such an ordering is known as a topological sort of a directed acyclic graph (DAG).

It ensures that each value in the network is visited in a depth-first order, meaning we explore as deeply as possible along one path before backtracking.
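To see the DFS-based ordering on its own, here is a minimal sketch on a hand-built graph for one prediction and its loss. The node names and the `children` dictionary are illustrative assumptions; each entry lists the nodes a value was computed from, playing the role of `_prev`.

```python
# Each node maps to the nodes it was computed from (its inputs).
children = {
    "loss": ["prediction", "target"],
    "prediction": ["w*x", "b"],
    "w*x": ["w", "x"],
    "w": [],
    "x": [],
    "b": [],
    "target": [],
}


def topological_order(root):
    topo, visited = [], set()

    def build_topo(node):
        if node not in visited:
            visited.add(node)
            for child in children[node]:  # go as deep as possible first
                build_topo(child)
            topo.append(node)             # placed only after all of its inputs

    build_topo(root)
    return topo


order = topological_order("loss")
print(order)        # ['w', 'x', 'w*x', 'b', 'prediction', 'target', 'loss']
print(order[::-1])  # ['loss', 'target', 'prediction', 'b', 'w*x', 'x', 'w']
```

The reversed list is the order in which backpropagation visits the nodes: the loss first, the raw inputs last. Because the article stores `_prev` as a `set`, the order among sibling inputs can vary between runs, but every node still comes after all of its inputs, which is the only property backpropagation needs.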

The general idea is that we repeatedly adjust the values until our prediction matches, or closely matches, the actual target. Once we have done this, we can effectively solve the prediction problem. Here is a general overview of what we will be doing:

The code for this will look very simple in comparison to the diagram above:

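```python
def backward(self):
    # Build a topological ordering of every node that contributed to self,
    # using depth-first search: a node is appended only after all of its inputs.
    topo = []
    visited = set()

    def build_topo(v):
        if v not in visited:
            visited.add(v)
            for child in v._prev:
                build_topo(child)
            topo.append(v)

    build_topo(self)

    # The gradient of the output with respect to itself is 1.
    # (Gradients are stored in `value` in this framework.)
    self.value = 1.0
    # Walk from the output back to the inputs, applying each node's local backward step
    for node in reversed(topo):
        node._backward()
```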

Together with a `_backward` function supplied by each operation, this code is all we need to implement backpropagation, which is the key part of the design. Now, let's take a look at a practical application of this concept.
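The `backward` method only walks the graph; each operation must also provide its own `_backward` that applies the chain rule locally. The sketch below fills that in for a hypothetical minimal class, named `Node` here, with the article's `data`, `value`, `_backward` and `_prev` attributes and `times`/`plus`/`minus` methods. The gradient rules are standard calculus, but how the author's repository implements them is an assumption. Running it on the parabola from the start of the article gives the slope at `x = 2`.

```python
class Node:
    def __init__(self, data, _children=()):
        self.data = data
        self.value = 0.0               # gradient of the final output w.r.t. this node
        self._backward = lambda: None  # replaced by each operation below
        self._prev = set(_children)

    @staticmethod
    def _wrap(other):
        # Allow plain numbers as operands
        return other if isinstance(other, Node) else Node(other)

    def plus(self, other):
        other = Node._wrap(other)
        out = Node(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.value += out.value
            other.value += out.value
        out._backward = _backward
        return out

    def times(self, other):
        other = Node._wrap(other)
        out = Node(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a; += handles a node used twice
            self.value += other.data * out.value
            other.value += self.data * out.value
        out._backward = _backward
        return out

    def minus(self, other):
        # a - b is computed as a + (b * -1), reusing the rules above
        return self.plus(Node._wrap(other).times(-1))

    def backward(self):
        topo, visited = [], set()

        def build_topo(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build_topo(child)
                topo.append(v)

        build_topo(self)
        self.value = 1.0
        for node in reversed(topo):
            node._backward()


x = Node(2.0)
y = x.times(x).times(3).minus(x.times(4)).plus(5)  # 3x^2 - 4x + 5
y.backward()
print(y.data, x.value)  # 9.0 8.0
```

The gradient `8.0` matches the exact slope `6x - 4` at `x = 2`, which is what the nudge-based estimate earlier approximates.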

## Predicting House Prices

Let's suppose we have some data: a 1-bedroom house costs $100,000, a 2-bedroom house costs $200,000, and a 3-bedroom house costs $300,000. Now, we want to use this data to make predictions.

In general, machine learning means that our program makes predictions without being explicitly programmed with the rule that produces them. In other words, we don't write code that directly calculates the outcome; instead, the program learns from the data we provide and then makes predictions based on that learning. To begin, let's normalize the data by dividing each price by $100,000:

| Bedrooms | Price | Normalised price |
|---|---|---|
| 1 | $100,000 | 1.0 |
| 2 | $200,000 | 2.0 |
| 3 | $300,000 | 3.0 |
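The normalization in the table amounts to dividing each price by $100,000, and multiplying by the same factor turns a normalized value back into dollars (the prediction code later in the article does this with `* 100000`). A minimal sketch; the `scale` name is an assumption:

```python
prices = [100_000, 200_000, 300_000]  # 1-, 2- and 3-bedroom houses
scale = 100_000                       # divide by this to normalize

normalized = [p / scale for p in prices]
print(normalized)                     # [1.0, 2.0, 3.0]

# Converting normalized values back to dollars
denormalized = [n * scale for n in normalized]
print(denormalized)                   # [100000.0, 200000.0, 300000.0]
```

Keeping the numbers near 1 stops a single update step from producing huge changes, which is easier for the learning rate to handle than raw six-figure prices.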

Now, let's say we want to predict the price of a 5-bedroom house using a neural network. The goal is for the neural network to learn from the existing data (1-bedroom, 2-bedroom, and 3-bedroom prices) and use that learning to predict the price of a house with more bedrooms.

```python
predicted_price = model(bedroom_5)  # ideally, about $500,000
```

So, the first thing we need to do is create our Machine Learning Framework. Think of this as building a simplified version of a popular library like TensorFlow, PyTorch, or JAX. Let's break down some key components we'll need:

- **value**: This is the slope we discussed previously: the gradient of the loss with respect to this node, i.e., how much a small change in this node would change the loss.
- **backward**: This stores the node's backward function, which we use to propagate errors backward through the network and update our weights.
- **data**: These are the actual values we're working with, like housing prices.
- **prev**: This keeps track of the nodes that this node was computed from, so that we can walk the network backward. (Whether a node has already been visited is tracked separately, by the `visited` set in `backward`.)

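```python
class ML_Framework:
    def __init__(self, data, _children=()):
        self.data = data               # the actual number held by this node
        self.value = 0.0               # the gradient, filled in during backpropagation
        self._backward = lambda: None  # placeholder, replaced by each operation
        self._prev = set(_children)    # the nodes this node was computed from
```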

In this class, `data` holds the actual data points (like house prices), `value` starts at zero and holds the gradient that will be updated as the model learns, `_backward` is initially a placeholder function that each operation replaces with its actual backward step, and `_prev` keeps track of the nodes this node was computed from.

Once we have our framework in place, the next step is to define what our model will look like. In this case, we're going to use a single-layer perceptron. I won't dive into too much detail here, but the key point is that we'll declare our bias and set some initial weights.

```python
class SingleLayerNeuron:
    def __init__(self, num_of_inputs):
        # One weight per input, all starting at 0.09, plus a single bias of -0.9
        self.weights = [ML_Framework(0.09) for _ in range(num_of_inputs)]
        self.bias = ML_Framework(-0.9)

    def weights_bias_parameters(self):
        # Every trainable parameter of the neuron
        return self.weights + [self.bias]

    def zero_value(self):
        # Reset the gradients before each backward pass
        for p in self.weights_bias_parameters():
            p.value = 0.0

    def __call__(self, x):
        # Prediction = bias + sum(w_i * x_i)
        cumulative_sum = self.bias
        for wi, xi in zip(self.weights, x):
            product = wi.times(xi)
            cumulative_sum = cumulative_sum.plus(product)
        return cumulative_sum
```

We can now train our model:

```python
# squared_error_loss, the times/plus operations on ML_Framework, and the training
# data (x_input_values, y_output_values, num_of_model_inputs) are not shown in
# this article; see the linked repository.
model = SingleLayerNeuron(1)
print("Initial weights:", [w.data for w in model.weights])
print("Initial bias:", model.bias.data)

learning_rate = 0.05
epochs = 100

for epoch in range(epochs):
    total_loss = 0
    for i in range(num_of_model_inputs):
        x_model_input = x_input_values[i]
        y_desired_output = y_output_values[i]

        # Forward pass and loss
        model_prediction = model(x_model_input)
        loss = squared_error_loss(model_prediction, y_desired_output)

        # Backward pass: reset gradients, then backpropagate
        model.zero_value()
        loss.backward()
        total_loss = total_loss + loss.data

        # Gradient descent: move each parameter against its gradient
        for param in model.weights_bias_parameters():
            param.data = param.data - (learning_rate * param.value)

    mean_squared_error = total_loss / num_of_model_inputs
    print(f"Epoch {epoch}, Loss: {mean_squared_error}")
```

Output:

```
Initial weights: [0.09]
Initial bias: -0.9
Epoch 0, Loss: 3.2761
Epoch 1, Loss: 2.096704
Epoch 2, Loss: 1.3418905599999997
Epoch 3, Loss: 0.8588099583999995
Epoch 4, Loss: 0.5496383733759997
Epoch 5, Loss: 0.35176855896063985
Epoch 6, Loss: 0.2251318777348095
Epoch 7, Loss: 0.14408440175027803
Epoch 8, Loss: 0.09221401712017797
Epoch 9, Loss: 0.05901697095691388
Epoch 10, Loss: 0.03777086141242487
Epoch 11, Loss: 0.02417335130395193
Epoch 12, Loss: 0.015470944834529213
Epoch 13, Loss: 0.009901404694098687
Epoch 14, Loss: 0.006336899004223174
Epoch 15, Loss: 0.004055615362702837
Epoch 16, Loss: 0.0025955938321298136
Epoch 17, Loss: 0.0016611800525630788
Epoch 18, Loss: 0.0010631552336403704
Epoch 19, Loss: 0.0006804193495298324
Epoch 20, Loss: 0.0004354683836990928
Epoch 21, Loss: 0.0002786997655674194
Epoch 22, Loss: 0.00017836784996314722
Epoch 23, Loss: 0.00011415542397641612
Epoch 24, Loss: 7.305947134490555e-05
Epoch 25, Loss: 4.675806166074016e-05
Epoch 26, Loss: 2.9925159462873704e-05
Epoch 27, Loss: 1.915210205623956e-05
Epoch 28, Loss: 1.2257345315993629e-05
Epoch 29, Loss: 7.844701002235673e-06
Epoch 30, Loss: 5.020608641430632e-06
```

This gradual reduction in loss demonstrates how our model's predictions are becoming more accurate as it continues to adjust the weights and bias based on the input data. Finally, once the training is complete, we can use our trained model to predict the price of a 5-bedroom house:

```python
bedroom_5 = [ML_Framework(5)]
predicted_price = model(bedroom_5)
predicted_price_denormalized = predicted_price.data * 100000  # undo the normalization
print(f"Predicted price for a 5-bedroom house: ${predicted_price_denormalized:.2f}")
```

Output:

```
Predicted price for a 5-bedroom house: $498000.00
```

As you can see, the predicted price of $498,000 is within $2,000 of the $500,000 we might expect, though not perfect. With more advanced models and techniques, the accuracy and applicability of predictions can be significantly enhanced. But the principles remain the same: through an iterative training process, a neural network learns what to predict and improves over time with adjustments.
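The loss trace above can be reproduced without any autograd machinery, because for a single neuron with squared-error loss the gradients have a closed form: the derivative of `(w*x + b - y)**2` is `2*(w*x + b - y)*x` with respect to `w` and `2*(w*x + b - y)` with respect to `b`. This hand-derived gradient is a swapped-in technique, not the article's backpropagation code. The reported `Epoch 0` loss (`3.2761`) equals the squared error of the 1-bedroom example alone, so the sketch assumes training used only that example (input `1`, normalized target `1.0`).

```python
weight, bias = 0.09, -0.9     # initial values from the article
learning_rate = 0.05
epochs = 100

# Assumption: a single training example, 1 bedroom -> normalized price 1.0
training_data = [(1.0, 1.0)]

losses = []
for epoch in range(epochs):
    total_loss = 0.0
    for x, y in training_data:
        prediction = weight * x + bias
        error = prediction - y
        total_loss += error ** 2

        # Closed-form gradients of (prediction - y)^2
        grad_weight = 2 * error * x
        grad_bias = 2 * error

        # Gradient descent step
        weight -= learning_rate * grad_weight
        bias -= learning_rate * grad_bias
    losses.append(total_loss / len(training_data))

print(losses[0], losses[1])   # the first two epoch losses
predicted_price = (weight * 5 + bias) * 100_000
print(f"Predicted price for a 5-bedroom house: ${predicted_price:.2f}")
```

The first two losses come out at about `3.2761` and `2.0967`, and the prediction at roughly $498,000, in line with the output above. Because the weight and bias receive identical updates and only one example is fitted, training drives `weight + bias` toward `1.0`, which explains why the 5-bedroom prediction lands slightly below $500,000.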

**Breakdown**

Let's break down the process a bit for those interested in the technical aspects of what is going on. At `Epoch 0`, the `weight` is initialized at `0.09` and the `bias` is set to `-0.9`. In the first training cycle, the model makes a prediction by multiplying the number of bedrooms (in this case, `1`) by the `weight` (`0.09`) and then adding the `bias` (`-0.9`). The formula can be thought of as:

$$\text{prediction} = \text{bedrooms} \times \text{weight} + \text{bias} = 1 \times 0.09 + (-0.9) = -0.81$$

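The desired output (the price the model should ideally predict) is `100,000`, the actual price of a 1-bedroom house, or `1.0` after normalization. With its initial values, however, the model predicts `1 × 0.09 + (-0.9) = -0.81`, i.e. about `-81,000` once denormalized, which is far off from the target. The squared error, $(-0.81 - 1.0)^2 = 3.2761$, is exactly the loss reported for `Epoch 0`.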

To improve its predictions, the neural network adjusts its `weights` and `bias` based on the error. The process of determining how to make these adjustments is known as `backpropagation`.

During `backpropagation`, the model calculates how the `loss` (the error between the predicted and actual prices) would change if the `weights` and `bias` were adjusted slightly. This is where the `gradient`, stored in the `value` attribute we discussed earlier, comes into play.

The `learning rate`, set to `0.05` in this example, controls the size of the adjustment. The model uses this `learning rate` to ensure that it doesn't make overly large changes, which could destabilise the learning process.

The model then updates the `weights` and `bias` by subtracting the calculated `gradient` multiplied by the `learning rate`, helping it make more accurate predictions in subsequent cycles. By repeating this process over multiple epochs, the neural network gradually improves its ability to predict house prices accurately.
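For the first training cycle, the numbers can be followed by hand. This worked step assumes the squared-error loss $L = (\hat{y} - y)^2$ on the 1-bedroom example ($x = 1$, $y = 1.0$), with the initial values $w = 0.09$, $b = -0.9$ and the learning rate $0.05$ from the training code:

$$
\hat{y} = w x + b = 0.09 \times 1 + (-0.9) = -0.81, \qquad L = (-0.81 - 1.0)^2 = 3.2761
$$

$$
\frac{\partial L}{\partial w} = 2(\hat{y} - y)\,x = 2(-1.81)(1) = -3.62, \qquad \frac{\partial L}{\partial b} = 2(\hat{y} - y) = -3.62
$$

$$
w \leftarrow 0.09 - 0.05 \times (-3.62) = 0.271, \qquad b \leftarrow -0.9 - 0.05 \times (-3.62) = -0.719
$$

$$
\hat{y}_{\text{new}} = 0.271 \times 1 + (-0.719) = -0.448, \qquad L_{\text{new}} = (-0.448 - 1.0)^2 = 2.096704
$$

The new loss matches the `Epoch 1` value in the training output, which confirms that each update simply moves the weight and bias against their gradients, scaled by the learning rate.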

**Acknowledgment & Further Reading**

The code for the model discussed in this article can be found here: https://github.com/seanjudelyons/Single_Layer_Perceptron. This example uses a DFS algorithm as an alternative to diving deep into the chain rule and is loosely based on the concept of a single-layered perceptron.

For those interested in the ideas here, you might want to explore "Learning Representations by Back-Propagating Errors" by Rumelhart, Hinton, and Williams (1986). Also see "The Perceptron" (1957) by Frank Rosenblatt; it is an interesting read.

This article has been largely inspired by the educational efforts in machine learning by Andrej Karpathy, Andrew Ng, and Laurence Moroney; I suggest checking them out. Their contributions to the field have been invaluable in making complex concepts accessible to learners worldwide.

Below is a screenshot of when I first came across the cost function of a neural network. We have come a long way since then.


Published via Towards AI