Last Updated on February 4, 2023 by Editorial Team

Author(s): Menzliskander

Originally published on Towards AI.

The Normal Equation: The Calculus, the Algebra, and the Code

Photo by Antoine Dautry on Unsplash

Introduction:

The normal equation is a closed-form solution used to solve linear regression problems. It allows us to directly compute the optimal parameters of the line (hyperplane) that best fits our data.

In this article, we’ll derive the normal equation using both a calculus approach and a linear algebra approach, then implement it in Python. But first, let’s recap linear regression.

Linear Regression:

Let’s say we have data points x¹, x², x³, …, where each point has k features

Image by author

and each data point has a target value yᵢ.

The goal of linear regression is to find parameters θ₀, θ₁, θ₂, …, θₖ that relate each data point to its target value yᵢ:

Image by author
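Concretely, the relation is the standard linear-regression hypothesis: for a data point with features x₁, …, xₖ, the prediction is ŷ = θ₀ + θ₁x₁ + θ₂x₂ + … + θₖxₖ, where θ₀ is the bias (intercept) term.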

So we’re trying to solve this system of equations:

Image by author

Putting it all in matrix form, we get Xθ = y, where each row of X holds one data point (with a leading 1 for the bias term θ₀), θ stacks the parameters θ₀ through θₖ, and y stacks the target values:

Image by author

Now the problem is that, in most cases, this system is not solvable: we can’t fit a straight line exactly through the data.

Image by author

And this is where the normal equation steps in to find the best approximate solution. Practically, the normal equation finds the parameter vector θ that solves Xθ = ŷ, where ŷ is as close as possible to our original target values y.

And here is the normal equation:

θ = (XᵀX)⁻¹Xᵀy

Image by author

How did we get there? Well, there are two ways to explain it.

Calculus:

As we said earlier, we are trying to find the parameters θ so that our prediction ŷ = Xθ is as close as possible to our original y. So we want to minimize the distance between them, i.e., minimize ||y − ŷ||, and that’s the same as minimizing ||y − ŷ||² (see the graph below).

Image by author

Now all we have to do is solve this minimization problem. First, let’s expand it:

Note: Xθ and y are vectors, so (Xθ)ᵀy and yᵀXθ are the same scalar, which lets us swap the order of that product and combine the two cross terms.

Now, to find the minimum, we differentiate with respect to θ and set the result to 0:

Image by author
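For reference, the expansion and the derivative that the figures above illustrate are the standard least-squares steps:

||y − Xθ||² = (y − Xθ)ᵀ(y − Xθ) = yᵀy − 2θᵀXᵀy + θᵀXᵀXθ

Setting the derivative with respect to θ to zero:

−2Xᵀy + 2XᵀXθ = 0 ⟹ XᵀXθ = Xᵀy ⟹ θ = (XᵀX)⁻¹Xᵀy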

And that’s how we arrive at the normal equation. Now, there is another approach that will get us there.

Linear Algebra:

Again, our equation is Xθ = y. Knowing a bit of matrix multiplication, we know that multiplying a matrix by a vector gives a linear combination of the matrix’s columns weighted by the vector’s components, so we can write it as:

Image by author
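To make the column-combination view concrete, here is a tiny numpy check (an illustrative snippet, not from the original article; the numbers are arbitrary):

import numpy as np

X = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])
theta = np.array([0.5, -1.0])
# X.dot(theta) is the linear combination of X's columns weighted by theta's components
combo = theta[0] * X[:, 0] + theta[1] * X[:, 1]
print(np.allclose(X.dot(theta), combo))  # prints True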

So for this system to have a solution, y needs to be in the column space of X (denoted C(X)). And since that’s usually not the case, we have to settle for the next best thing, which is solving it for the closest approximation of y in C(X).

And that’s just the projection of y onto C(X)! (See the image below.)

Image by author

ŷ is the projection of y onto C(X), so we can write it as ŷ = Xθ.

e = y − ŷ, and since e is orthogonal to C(X), Xᵀ multiplied by e is equal to 0.

Now, putting all this together:

Image by author
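Written out, that chain of steps is:

Xᵀe = Xᵀ(y − Xθ) = 0 ⟹ Xᵀy − XᵀXθ = 0 ⟹ XᵀXθ = Xᵀy ⟹ θ = (XᵀX)⁻¹Xᵀy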

As we can see, we get the exact same result!

Code:

Now, implementing this in Python is fairly straightforward.

First, we’ll create some data:

import numpy as np
import matplotlib.pyplot as plt

# generate 100 random inputs between 0 and 3
X = 3 * np.random.rand(100, 1)
# generate the labels using the function y = 2X + 3 + Gaussian noise
Y = 2 * X + 3 + np.random.randn(100, 1)
# display the data
plt.scatter(X, Y)
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Random data (y = 2X + 3 + Gaussian noise)')
plt.show()
Image by author
# add a column of ones for the bias term
X1 = np.c_[np.ones((100, 1)), X]
# apply the normal equation
theta = np.linalg.inv(X1.T.dot(X1)).dot(X1.T).dot(Y)
# we find that theta is equal to array([[2.78609912], [2.03156946]])
# the actual function we used is y = 3 + 2x + Gaussian noise,
# so our approximation is pretty good
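As a side note, numpy also provides least-squares solvers that avoid forming an explicit inverse and are more numerically robust; a minimal sketch, reusing X1 and Y from above:

# solve the same least-squares problem without explicitly inverting X1.T.dot(X1)
theta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X1, Y, rcond=None)
# np.linalg.pinv(X1).dot(Y) gives the same solution via the Moore-Penrose pseudo-inverse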

Now all that’s left is to use our θ parameters to make predictions:

# make predictions with the fitted parameters
Y_predict = X1.dot(theta)
plt.plot(X, Y, "b.")
plt.plot(X, Y_predict, "r-", label="predictions")
plt.legend()
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Random data (y = 2X + 3 + Gaussian noise)')
plt.show()
Image by author

Conclusion:

As we saw, the normal equation is straightforward and easy to use to directly obtain the optimal parameters. However, it is not commonly used on large datasets because it involves computing the inverse of XᵀX, which is computationally expensive (on the order of O(n³) time). That’s why an iterative approach like gradient descent is usually preferred.
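For comparison, here is a minimal batch gradient-descent sketch on the same data; the learning rate and iteration count are arbitrary illustrative choices, not from the original article:

# iterative alternative to the normal equation, reusing X1 and Y from above
eta = 0.1            # learning rate (arbitrary)
n_iterations = 1000
m = len(X1)
theta_gd = np.random.randn(2, 1)
for _ in range(n_iterations):
    gradients = (2 / m) * X1.T.dot(X1.dot(theta_gd) - Y)
    theta_gd = theta_gd - eta * gradients
# theta_gd should end up close to the normal-equation solution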



Published via Towards AI
