Mastering Derivatives for Machine Learning
Last Updated on February 22, 2023 by Editorial Team
Author(s): Towards AI Editorial Team
Originally published on Towards AI.
Understanding the building blocks of machine learning
Photo by Michael Dziedzic on Unsplash
Author(s): Pratik Shukla
"Education is the movement from darkness to light" - Allan Bloom
Table of Contents:
- The Slope of a Straight Line
- The rise of derivatives
- Definition of derivative
- Using the definition of derivative
- The First Derivative Test
- Working Example of the First Derivative Test
- The Second Derivative Test
- The Inflection Point
- Working Example of the Second Derivative Test
- Checking for Inflection Point
- Working Example when f''(X)=0, and X is an Inflection Point
- Why do we need the First Derivative Test?
- Working Example when f''(X)=0, and X is not an Inflection Point
- Flowchart for the First Derivative Test
- Flowchart for the Second Derivative Test
- Flowchart for the Inflection Point Test
- Resources and References
If you're interested in machine learning, it's likely that you've come across the term "derivative" before. Derivatives are a fundamental concept in calculus, and they play a crucial role in many machine-learning algorithms. Put simply, a derivative measures the rate of change of a function at a particular point. This information can be used to optimize functions, find local minima and maxima, and more. In this blog, we'll dive into the world of derivatives and explore some of the key concepts that are relevant to machine learning. Specifically, we'll focus on the first derivative test, the second derivative test, and the inflection point test, all powerful tools for analyzing the behavior of a function at different points. By the end of this blog, you'll have a solid understanding of how these tests work and how they can be used to improve your machine-learning models.
The Slope of a Straight Line:
The slope measures the rate of change of a line: how much y changes for each unit change in x. The slope of a straight line can be positive, negative, zero, or undefined (for a vertical line).
The slope of a straight line is given by the ratio of the change in y to the change in x, also called the rise over the run. Note that the slope of a straight line is constant along the whole line. We can put it into mathematical form using the following formula.
Figure 1: Equation of the slope of a straight line
Figure 2: Slope of a straight line
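Since the figure itself is not reproduced here, the rise-over-run formula can be written out as follows. The symbols m, (x_1, y_1), and (x_2, y_2) are assumed names for the slope and for any two distinct points on the line:

```latex
m = \frac{\text{rise}}{\text{run}} = \frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1}, \qquad x_1 \neq x_2
```

When x_1 = x_2 the line is vertical and the ratio is undefined, which is the "undefined" case mentioned above.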
The Rise of Derivatives:
In the 17th century, Isaac Newton and Gottfried Leibniz extended the concept of slope to curves. Instead of a constant rate of change, a curve has a rate of change that varies from point to point. This is how the method of calculating derivatives was born. Now, let's see how this method works.
Step 1:
In the image below, let's say we want to find the slope at point P. The slope of the curve at P is the slope of the tangent line at P.
Figure 3: Finding the slope of the curve at point P
Step 2:
But as of now, we only know how to find the slope of a straight line. So let's pick another point Q on the curve and calculate the slope of the line between these two points.
Figure 4: Finding the slope of the straight line PQ
Step 3:
But here we can see that the line PQ does not give the slope exactly at point P. To get closer to it, let's pick another point on the curve that is closer to P.
Figure 5: Finding more straight lines on the curve
Step 4:
From the above image, we can see that as we move the second point closer and closer to P, the line through the two points approaches the tangent line at P. So we just need to shrink the distance between the two points until it is as close to 0 as possible. Let's see how calculus can help us with this.
Let's say the horizontal distance between the two points P and Q is h. Our goal is to let h get as close to 0 as possible. Here, we will use the concept of limits.
Figure 6: Finding the slope of the straight line PQ
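The four steps above can be tried numerically. The following is a minimal sketch, assuming an illustrative curve f(x) = x² and the point P at x = 1 (neither is taken from the original figures); `secant_slope` is a hypothetical helper name.

```python
def secant_slope(f, x, h):
    """Slope of the straight line through P = (x, f(x)) and Q = (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h


def f(x):
    # Illustrative curve (an assumption, not taken from the figures).
    return x ** 2


# Move Q closer to P by shrinking h and watch the slope of PQ settle down.
steps = (1.0, 0.1, 0.01, 0.001)
slopes = [secant_slope(f, 1.0, h) for h in steps]
for h, slope in zip(steps, slopes):
    print(f"h = {h:<6} slope of PQ = {slope:.4f}")
```

As h shrinks, the printed slopes approach 2, which is the slope of the tangent line to this curve at x = 1.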
Definition of Derivatives:
Derivatives are the essence of calculus. Basically, derivatives represent the instantaneous rate of change of a function with respect to one of its variables. Geometrically, a derivative is the slope of a tangent line of a curve at a point which signifies the rate of change at a particular point.
Mathematical Definition of Derivative:
Figure 7: Mathematical definition of derivative
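The figure is not reproduced here. Assuming it shows the standard limit definition, with h being the distance between the two points introduced earlier, it reads:

```latex
f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}
```

The fraction is the slope of the line PQ, and the limit is what turns it into the slope of the tangent line at P.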
Using the Definition of Derivative:
Now, we know that we can find the derivative of a differentiable function using the following formula.
Figure 8: Mathematical Definition of Derivatives
Let's take an example to understand how we can actually find the derivative using the above-given formula.
Figure 9: Calculating Derivatives Using the Mathematical Definition of Derivatives
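As one illustration of applying the limit definition, here is a short worked derivation. The function f(x) = x² is an assumed example and may differ from the one used in Figure 9:

```latex
\begin{aligned}
f'(x) &= \lim_{h \to 0} \frac{(x + h)^2 - x^2}{h} \\
      &= \lim_{h \to 0} \frac{2xh + h^2}{h} \\
      &= \lim_{h \to 0} (2x + h) \\
      &= 2x
\end{aligned}
```

The key move is cancelling h before taking the limit, which avoids dividing by zero.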
The First Derivative Test:
We use the first derivative test to check where a function is increasing or decreasing on its domain. We can also use this test to identify its local maxima and minima.
The first derivative of a function is the slope of the tangent line to the graph of the function at a given point. We can think of the first derivative as the slope of the function. When the slope is positive, the graph is increasing. When the slope is negative, the graph is decreasing. Points where the slope is 0 are called critical points; each one is a candidate for a local maximum or minimum, but it may turn out to be neither. The first derivative test involves testing the behavior of the function around these points to determine whether or not they are local maxima or minima.
The first derivative test is based on the fact that, for a function with a continuous derivative, the sign of the first derivative does not change between consecutive critical points. Thus, if we find the critical points of a function, we can test points within the intervals between critical points to determine whether the function is increasing or decreasing over those intervals. Then, by determining whether the function is increasing or decreasing before and after a critical point, we can identify whether the point is a minimum, maximum, or neither.
Let's say we have a function f(X) which is plotted in the image below.
Figure 10: A Graph Showing Maximum and Minimum Points
Now, we can say the following things about the function.
- Function f has a relative minimum at x=m if for all the points p near m, f(m)<f(p).
- Function f has a relative maximum at x=m if for all the points p near m, f(m)>f(p).
The First Derivative Test:
Let's say f is a differentiable function with f'(m) = 0. Based on this, we can draw the following conclusions.
- If f'(x) changes from positive to negative at x=m, then f has a local maximum at m.
- If f'(x) changes from negative to positive at x=m, then f has a local minimum at m.
- If f'(x) does not change sign at x=m, then f has neither a local maximum nor a local minimum at m.
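The sign-change rules above can be sketched in a few lines of Python. This is a minimal illustration, not a robust numerical routine: `first_derivative_test` and its `delta` parameter are hypothetical names, and the example functions are assumptions chosen for simplicity.

```python
def first_derivative_test(f_prime, m, delta=1e-3):
    """Classify a critical point m (where f'(m) = 0) from the sign of f' on each side.

    delta is an illustrative step size; it must be small enough that no other
    critical point lies within (m - delta, m + delta).
    """
    left = f_prime(m - delta)
    right = f_prime(m + delta)
    if left > 0 and right < 0:
        return "local maximum"
    if left < 0 and right > 0:
        return "local minimum"
    return "neither"


# Hypothetical examples, each with a critical point at 0.
print(first_derivative_test(lambda x: 2 * x, 0.0))       # f(x) = x**2  -> local minimum
print(first_derivative_test(lambda x: -2 * x, 0.0))      # f(x) = -x**2 -> local maximum
print(first_derivative_test(lambda x: 3 * x ** 2, 0.0))  # f(x) = x**3  -> neither
```

The derivative is passed in as a function so that the sketch stays independent of how f' was obtained (by hand, symbolically, or numerically).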
Working Example of the First Derivative Test
Example: Find the relative extrema for f(X) = 2X³ - 3X² - 12X.
Step 1:
Our function f(X) is given by...
Figure 11: Function f(X)
Step 2:
To find the relative extrema, we need to find the point(s) where the function's first derivative is 0.
Figure 12: Finding the Critical Points of the Function f(X)
Step 3:
Finding the points where the first derivative of the function (f'(X)) is 0.
Figure 13: Finding Points Where f'(X)=0
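Steps 2 and 3 can be written out explicitly. This derivation is a reconstruction using the power rule, not a copy of the original figures:

```latex
\begin{aligned}
f(X)  &= 2X^3 - 3X^2 - 12X \\
f'(X) &= 6X^2 - 6X - 12 = 6(X - 2)(X + 1) \\
f'(X) &= 0 \iff X = -1 \ \text{or}\ X = 2
\end{aligned}
```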
Step 4:
Now, we have two critical points where we may have relative maxima or minima. To find out which, we need to check the points around these critical points. To do that, let's find the intervals around these critical points.
Figure 14: Finding Intervals based on the Critical Points
Step 5:
Next, we will choose a point from each of the intervals and find the value of f'(X) for that chosen point. Here we will also note the sign of f'(X) for the chosen value of X in each interval. If the sign of f'(X) is positive (+), then the function is increasing in that interval. On the other hand, if the sign of f'(X) is negative (-), then the function is decreasing in that interval.
The following table shows the required calculations.
Figure 15: Calculating Whether f(X) is Increasing or Decreasing in the Intervals
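The table is not reproduced here, but the calculation can be sketched as follows. The derivative f'(X) = 6X² - 6X - 12 follows from the example function; the test points -2, 0, and 3 are illustrative choices and may differ from those in the original table.

```python
def f_prime(x):
    # Derivative of f(X) = 2X^3 - 3X^2 - 12X
    return 6 * x ** 2 - 6 * x - 12


# One test point per interval; the specific points are illustrative choices.
test_points = {"(-inf, -1)": -2, "(-1, 2)": 0, "(2, inf)": 3}

rows = []
for interval, x in test_points.items():
    value = f_prime(x)
    behaviour = "increasing" if value > 0 else "decreasing"
    rows.append((interval, x, value, behaviour))
    print(f"{interval:>10}  X = {x:>2}  f'(X) = {value:>3}  f is {behaviour}")
```

The signs come out as +, -, +, so f is increasing, then decreasing, then increasing again.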
Step 6:
Based on the above table, we can say that we have a local maximum at X = -1, as the sign of f'(X) changes from positive to negative around it. On the other hand, we have a local minimum at X = 2, as the sign of f'(X) changes from negative to positive around it. Note that f'(X) does not change sign anywhere other than at these critical points.
Figure 16: The Graph of the Function f(X) = 2X³ - 3X² - 12X
The Second Derivative Test
We use the second derivative test to determine whether a critical point of a function is a local minimum or a local maximum. We know that the first derivative is the rate of change of the function, given by the slope of the tangent line at a given point on the curve. In the same way, the second derivative is the rate of change of the first derivative, and its sign describes the concavity of f(X).
Before we dive into the second derivative test, let's first understand the meaning of an inflection point.
The Inflection Point:
An inflection point is defined as a point on a curve at which the sign of the concavity changes. For a differentiable function, an inflection point cannot be a local maximum or a local minimum. In the following image, we can see that for the function f(X) = X³, X = 0 is an inflection point. Also, we can see that in the interval (-∞, 0) the function is concave, and in the interval (0, ∞) the function is convex.
Figure 17: An Inflection Point
The Second Derivative Test:
Let's say f(X) is a function for which f'(X) and f''(X) are both defined. First, we find all the critical points by solving f'(X) = 0. Next, we evaluate the second derivative of the function at each critical point. The second derivative test is defined as follows:
- If f''(X) > 0, then f(X) has a local minimum at X.
- If f''(X) < 0, then f(X) has a local maximum at X.
- If f''(X) = 0, then the test is inconclusive. The point is an inflection point if the sign of f''(X) changes from negative to positive or from positive to negative around it. If the sign of f''(X) does not change, it is not an inflection point.
Steps Involved in the Second Derivative Test:
- We have the function f(X).
- Find the critical points of the function by finding the points where the first derivative f'(X)=0.
- Evaluate the second derivative f''(X) at each critical point.
- If f''(X)>0, then f(X) has a local minimum at X, and f(X) is convex near X.
- If f''(X)<0, then f(X) has a local maximum at X, and f(X) is concave near X.
- If f''(X)=0, then the second derivative test is inconclusive.
- Check for inflection points.
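The decision part of these steps can be sketched as a small function. This is a minimal illustration under stated assumptions: `second_derivative_test` and its `tol` parameter are hypothetical names, the caller is assumed to have already found a critical point, and the example functions are chosen only for demonstration.

```python
def second_derivative_test(f_second, x, tol=1e-12):
    """Apply the second derivative test at a critical point x (where f'(x) = 0).

    tol is an illustrative tolerance for treating f''(x) as zero.
    """
    value = f_second(x)
    if value > tol:
        return "local minimum"
    if value < -tol:
        return "local maximum"
    return "inconclusive"


# Hypothetical examples, each with a critical point at 0.
print(second_derivative_test(lambda x: 2.0, 0.0))      # f(x) = x**2  -> local minimum
print(second_derivative_test(lambda x: -2.0, 0.0))     # f(x) = -x**2 -> local maximum
print(second_derivative_test(lambda x: 6.0 * x, 0.0))  # f(x) = x**3  -> inconclusive
```

An "inconclusive" result is the signal to move on to the inflection point check.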
Working Example of the Second Derivative Test
Example: Find the relative extrema for f(X) = 5X³ - 3X⁵.
Step 1:
Our function f(X) is given by...
Figure 18: Function f(X)
Step 2:
To find the relative extrema, we need to find the critical points. Critical points are the points where the first derivative of the function f'(X) = 0.
Figure 19: Finding the Critical Points of the Function f(X)
Step 3:
Finding the points where the first derivative of the function (f'(X)) is 0.
Figure 20: Finding Points Where f'(X)=0
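Written out, Steps 2 and 3 look like this. The derivation is reconstructed with the power rule rather than copied from the original figures:

```latex
\begin{aligned}
f(X)  &= 5X^3 - 3X^5 \\
f'(X) &= 15X^2 - 15X^4 = 15X^2 (1 - X)(1 + X) \\
f'(X) &= 0 \iff X \in \{-1,\ 0,\ 1\}
\end{aligned}
```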
Step 4:
Now, we have three points where we might have relative maxima or minima. Next, we will find the second derivative f''(X) of the function f(X) at these critical points.
Figure 21: Finding the Second Derivative
So, according to the second derivative test, we have a local minimum at X=-1 and a local maximum at X=1.
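To check this conclusion, the second derivative can be evaluated at each critical point. The expression f''(X) = 30X - 60X³ is obtained by differentiating f(X) = 5X³ - 3X⁵ twice; it is a reconstruction, since the original figure is not reproduced here.

```python
def f_second(x):
    # Second derivative of f(X) = 5X^3 - 3X^5, i.e. f''(X) = 30X - 60X^3
    return 30 * x - 60 * x ** 3


# Evaluate f'' at the three critical points.
results = {x: f_second(x) for x in (-1, 0, 1)}
print(results)  # {-1: 30, 0: 0, 1: -30}
```

A positive value at X=-1 gives a local minimum, a negative value at X=1 gives a local maximum, and the zero at X=0 leaves the test inconclusive.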
Step 5:
Now, we know that at X=0, the test is inconclusive. So let's find out whether X=0 is an inflection point or not. To do that, we need to test the concavity of f(X) just before and just after X=0 by selecting a point from an appropriate interval on each side. Here we know that the critical points are -1, 0, and 1. So, we choose one value from (-1,0) and one from (0,1). Let's choose X=-0.5 and X=0.5 and see how it goes.
Figure 22: Finding an inflection point
Here, we can see that f''(X) changes sign around the point 0. So, we can say that 0 is an inflection point. Based on these results, the function is concave just to the left of 0 (f''(-0.5) < 0) and convex just to the right of 0 (f''(0.5) > 0). Note that f''(X) = 30X - 60X³ also vanishes at X = ±1/√2, so the concavity is not the same across the whole of (-1,0) or (0,1).
Figure 23: Monitoring the graph
Confusing Terms:
- Concave Up = Convex
- Concave Down = Concave
Checking for Inflection Point:
If f''(X) = 0, then there are two possibilities.
- X is an inflection point: end of the story
- X is not an inflection point: the story continues
Steps to check whether a critical point is an inflection point or not:
- We know that f''(X)=0.
- Find appropriate intervals around the critical point X.
- Find the sign of f''(X) in these intervals.
- If the sign of f''(X) changes from negative (-) to positive (+) or from positive (+) to negative (-), then X is an inflection point. If the sign of f''(X) does not change in these intervals, then X is not an inflection point.
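These steps amount to comparing the sign of f'' on either side of X. The sketch below is one way to express that; `is_inflection_point` and `delta` are hypothetical names, and the choice delta=0.5 simply mirrors the test points X=-0.5 and X=0.5 used in the worked example.

```python
def is_inflection_point(f_second, x, delta=0.5):
    """Return True if f'' has opposite signs at x - delta and x + delta.

    delta is an illustrative half-width; it should be small enough that f''
    has no other zero inside (x - delta, x + delta).
    """
    left = f_second(x - delta)
    right = f_second(x + delta)
    return left * right < 0


# f(X) = 5X^3 - 3X^5 has f''(X) = 30X - 60X^3, which changes sign at 0.
print(is_inflection_point(lambda x: 30 * x - 60 * x ** 3, 0.0))  # True
# f(X) = X^4 has f''(X) = 12X^2, which keeps the same sign on both sides of 0.
print(is_inflection_point(lambda x: 12 * x ** 2, 0.0))           # False
```

Multiplying the two values is a compact sign comparison: the product is negative only when the signs differ.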
Working Example when f''(X)=0, and X is an Inflection Point:
Step 1:
Our function f(X) is given by...
Figure 24: Function f(X)
Step 2:
To find the relative extrema, we need to find the critical points. Critical points are the points where the first derivative of the function f'(X) = 0.
Figure 25: Finding the Critical Points of the Function f(X)
Step 3:
Finding the points where the first derivative of the function (f'(X)) is 0.
Figure 26: Finding Points Where f'(X)=0
Step 4:
Here, we have only one point where we might have a local maximum or minimum. Let's find the second derivative f''(X) of the function f(X) at the critical point.
Figure 27: Finding the Second Derivative
Step 5:
Since the second derivative at the critical point X=0 is 0, the second derivative test is inconclusive there. So let's find out whether X=0 is an inflection point or not. To do that, we need to test the concavity of f(X) before and after the critical point by selecting a value from an appropriate interval on each side. Here we know that the critical point is 0. So, we need to choose one value from (-∞,0) and one from (0,∞). Let's choose X=-1 and X=1 and see how it goes.
Figure 28: Finding an inflection point
Here, we can see that f''(X) changes sign around the point 0. So, we can say that 0 is an inflection point. Based on these results, we can also say that the function is concave in the interval (-∞,0) and convex in the interval (0,∞).
Figure 29: Monitoring the changes in the function
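The function for this example appears only in the figures, which are not reproduced here. Purely as an assumption, the steps can be followed with f(X) = X³, the function used earlier to illustrate an inflection point; the original figures may use a different function.

```latex
\begin{aligned}
f(X) &= X^3, \qquad f'(X) = 3X^2, \qquad f''(X) = 6X \\
f'(X) &= 0 \iff X = 0, \qquad f''(0) = 0 \\
f''(-1) &= -6 < 0, \qquad f''(1) = 6 > 0
\end{aligned}
```

Under that assumption, f'' is negative on (-∞,0) and positive on (0,∞), so the sign changes at 0 and X=0 is an inflection point.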
Why Do We Need the First Derivative Test?
Cases When the Second Derivative Test Does Not Work:
- When f'(X) = 0 and f''(X) = 0.
- When f'(X) = 0 and f''(X) is not defined.
- When f'(X) is not defined.
In the above cases, we need to use the first derivative test to find out whether the critical point is a local minimum, a local maximum, or neither.
Working Example when f''(X)=0, and X is not an Inflection Point:
Step 1:
Our function f(X) is given by...
Figure 30: Function f(X)
Step 2:
To find the relative extrema, we need to find the critical points. Critical points are the points where the first derivative of the function f'(X) = 0.
Figure 31: Finding the Critical Points of the Function f(X)
Step 3:
Finding the points where the first derivative of the function (f'(X)) is 0.
Figure 32: Finding Points Where f'(X)=0
Step 4:
Here, we have only one point where we might have a local maximum or minimum. Let's find the second derivative f''(X) of the function f(X) at the critical point.
Figure 33: Finding the Second Derivative
Step 5:
Since the second derivative at the critical point X=0 is 0, the second derivative test is inconclusive there. So let's find out whether X=0 is an inflection point or not. To do that, we need to test the concavity of f(X) before and after the critical point by selecting a value from an appropriate interval on each side. Here we know that the critical point is 0. So, we need to choose one value from (-∞,0) and one from (0,∞). Let's choose X=-1 and X=1 and see how it goes.
Figure 34: Finding an inflection point
Here we can see that f''(X) does not change sign around the point 0. So, we can confidently say that 0 is not an inflection point.
Figure 35: Monitoring the graph on various intervals
Now what?
Now we know that 0 is not an inflection point for f(X) = X⁴. Since f''(X) keeps the same sign on both sides of 0, the point must be either a local minimum or a local maximum. Since the second derivative test failed to determine which, we will apply the first derivative test here.
Step 6:
Next, we will choose a point from each of the intervals (-∞,0) and (0,∞) and find the value of f'(X) at that point. We will also note the sign of f'(X) for the chosen value of X in each interval. If the sign of f'(X) is positive (+), then the function is increasing in that interval. On the other hand, if the sign of f'(X) is negative (-), then the function is decreasing in that interval.
The following table shows the required calculations.
Figure 36: Monitoring the graph on various intervals
From the above table, we can see that f'(X) changes from negative to positive around X=0. That means that X=0 is a local minimum of the function f(X)=X⁴.
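The whole X⁴ example can be checked in a few lines. The derivatives f'(X) = 4X³ and f''(X) = 12X² follow from the power rule; the test points X=-1 and X=1 are the ones chosen in the text.

```python
def f_prime(x):
    return 4 * x ** 3      # derivative of f(X) = X^4


def f_second(x):
    return 12 * x ** 2     # second derivative of f(X) = X^4


print(f_second(0))                # 0: the second derivative test is inconclusive
print(f_second(-1), f_second(1))  # 12 12: same sign, so 0 is not an inflection point
print(f_prime(-1), f_prime(1))    # -4 4: negative to positive, so a local minimum at 0
```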
Flowchart for the First Derivative Test:
Figure 37: The First Derivative Test
Flowchart for the Second Derivative Test:
Figure 38: The Second Derivative Test
Flowchart for the Inflection Point Test:
Figure 39: An Inflection Point Test
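The flowcharts are not reproduced here, but the decision order described in the article (second derivative test first, then the inflection point check, then the first derivative test as a fallback) can be sketched as one function. This is an illustrative combination, not a transcription of the flowcharts; `classify_critical_point`, `delta`, and the helper names are hypothetical.

```python
def classify_critical_point(f_prime, f_second, x, delta=1e-3):
    """Classify a critical point x (where f'(x) = 0) using the three tests in order."""
    curvature = f_second(x)
    if curvature > 0:
        return "local minimum"
    if curvature < 0:
        return "local maximum"
    # f''(x) == 0: the second derivative test is inconclusive.
    if f_second(x - delta) * f_second(x + delta) < 0:
        return "inflection point"
    # Not an inflection point: fall back to the first derivative test.
    left, right = f_prime(x - delta), f_prime(x + delta)
    if left > 0 and right < 0:
        return "local maximum"
    if left < 0 and right > 0:
        return "local minimum"
    return "undetermined"


# f(X) = 5X^3 - 3X^5 from the second derivative test example.
def g_prime(x):
    return 15 * x ** 2 - 15 * x ** 4


def g_second(x):
    return 30 * x - 60 * x ** 3


for point in (-1.0, 0.0, 1.0):
    print(point, classify_critical_point(g_prime, g_second, point))

# f(X) = X^4, where only the first derivative test settles the question.
print(0.0, classify_critical_point(lambda x: 4 * x ** 3, lambda x: 12 * x ** 2, 0.0))
```

The exact comparison with zero works here because the example derivatives evaluate to exactly 0 at the critical point; with numerically estimated derivatives a tolerance would be needed.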
Conclusion:
In conclusion, understanding derivatives is a crucial component of any machine learning practitioner's toolkit. The ability to analyze and optimize functions using the first derivative test, the second derivative test, and the inflection point test can greatly improve the performance of machine learning models. While these concepts may seem daunting at first, with practice and patience, they can become intuitive and even enjoyable to work with. With the rise of deep learning and other complex machine learning techniques, the importance of derivatives is only increasing. By mastering these concepts, you'll be better equipped to tackle challenging problems in the field and to push the boundaries of what is possible with machine learning.
Citation:
For attribution in academic contexts, please cite this work as:
Shukla, et al., "Mastering Derivatives for Machine Learning", Towards AI, 2023
BibTex Citation:
@article{pratik_2023,
title={Mastering Derivatives for Machine Learning},
url={https://pub.towardsai.net/mastering-derivatives-for-machine-learning-b09336bb074},
journal={Towards AI},
publisher={Towards AI Co.},
author={Pratik, Shukla},
editor={Binal, Dave},
year={2023},
month={Feb}
}
Resources and References:
Mastering Derivatives for Machine Learning was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.