
Mastering Derivatives for Machine Learning

Last Updated on February 22, 2023 by Editorial Team

Author(s): Towards AI Editorial Team

Originally published on Towards AI.

Understanding the building blocks of machine learning

Photo by Michael Dziedzic on Unsplash

Author(s): Pratik Shukla

"Education is the movement from darkness to light" - Allan Bloom

Table of Contents:

  1. The Slope of a Straight Line
  2. The Rise of Derivatives
  3. Definition of Derivative
  4. Using the Definition of Derivative
  5. The First Derivative Test
  6. Working Example of the First Derivative Test
  7. The Second Derivative Test
  8. The Inflection Point
  9. Working Example of the Second Derivative Test
  10. Checking for Inflection Point
  11. Working Example when f''(X)=0, and X is an Inflection Point
  12. Why do we need the First Derivative Test?
  13. Working Example when f''(X)=0, and X is not an Inflection Point
  14. Flowchart for the First Derivative Test
  15. Flowchart for the Second Derivative Test
  16. Flowchart for the Inflection Point Test
  17. Resources and References

If you're interested in machine learning, it's likely that you've come across the term "derivative" before. Derivatives are a fundamental concept in calculus, and they play a crucial role in many machine-learning algorithms. Put simply, a derivative measures the rate of change of a function at a particular point. This information can be used to optimize functions, find local minima and maxima, and more. In this blog, we'll dive into the world of derivatives and explore some of the key concepts that are relevant to machine learning. Specifically, we'll focus on the first derivative test, the second derivative test, and the inflection test, all powerful tools for analyzing the behavior of a function at different points. By the end of this blog, you'll have a solid understanding of how these tests work and how they can be used to improve your machine-learning models.

The Slope of a Straight Line:

A slope is something that helps us measure the rate of change of a line. The slope of a straight line can be positive, negative, zero, or undefined.

We all know that the slope of a straight line is given by the ratio of change in y to change in x. In other words, we can also say that the slope of a straight line is given by the rise over the run. Note that the slope of a straight line is always constant. We can put it into the mathematical form using the following formula.

Figure 1: Equation of the slope of a straight line

Figure 2: Slope of a straight line (positive, negative, zero, or undefined)
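Written out, the rise-over-run ratio described above takes the following standard form. The symbols are assumptions chosen for illustration: (x_1, y_1) and (x_2, y_2) are any two distinct points on the line, and m is the slope. The slope is undefined when x_1 = x_2 (a vertical line).

```latex
m = \frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1}
```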

The Rise of Derivatives:

In the 17th century, Isaac Newton and Gottfried Leibniz reasoned that the concept of slope can be applied to curves as well. Instead of a constant rate of change, a curve has a rate of change that varies from point to point. This is how the method of calculating derivatives was born. Now, let's see how this method works.

1. Step 1:

In the image below, let's say we want to find the slope at point P. To do so, we need to find the slope of the tangent line at point P.

Figure 3: Finding the slope of the curve at point P

2. Step 2:

But as of now, we only know how to find the slope of a straight line. So let's pick another point Q on the curve and calculate the slope of the line between these two points.

Figure 4: Finding the slope of the straight line PQ

3. Step 3:

Here we can see that the line PQ does not represent the slope exactly at point P. To get closer to the slope at P, let's pick another point on the curve that is closer to P.

Figure 5: Finding more straight lines on the curve

4. Step 4:

From the above image, we can say that as we move the second point closer and closer to P, we approach our goal of finding the tangent line at point P. So we need to shrink the distance between the two points until it is as close to 0 as possible. Let's see how calculus can help us with this.

Let's say the horizontal distance between the two points P and Q is h. Our goal is to let h approach 0. Here, we will use the concept of limits.

Figure 6: Finding the slope of the straight line PQ
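The four steps above can be tried numerically. The following is a minimal sketch, assuming a hypothetical example curve f(x) = x² and the point P at x = 1 (neither is taken from the article's figures): it computes the slope of the line PQ for smaller and smaller h.

```python
def secant_slope(func, x, h):
    """Slope of the straight line through (x, func(x)) and (x + h, func(x + h))."""
    return (func(x + h) - func(x)) / h


def curve(x):
    # Hypothetical example curve; any smooth function would do.
    return x ** 2


# Move Q closer to P = (1, curve(1)) by shrinking h.
steps = (1.0, 0.1, 0.01, 0.001)
slopes = [secant_slope(curve, 1.0, h) for h in steps]

for h, slope in zip(steps, slopes):
    print(f"h = {h:<6} slope of PQ = {slope:.4f}")
```

As h shrinks, the printed slopes settle toward 2, which is the slope of the tangent line to x² at x = 1.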

Definition of Derivatives:

Derivatives are the essence of calculus. Basically, derivatives represent the instantaneous rate of change of a function with respect to one of its variables. Geometrically, a derivative is the slope of a tangent line of a curve at a point which signifies the rate of change at a particular point.

Mathematical Definition of Derivative:

Figure 7: Mathematical definition of derivative
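In symbols, the limit of the slope of PQ as h approaches 0 is the standard definition of the derivative. Here h is the horizontal distance between P and Q introduced earlier, and f'(x) denotes the derivative of f at x.

```latex
f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}
```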

Using the Definition of Derivative:

Now, we know that we can find the derivative of any differentiable function using the following formula.

Figure 8: Mathematical Definition of Derivatives

Let's take an example to understand how we can actually find the derivative using the above formula.

Figure 9: Calculating Derivatives Using the Mathematical Definition of Derivatives
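As an illustration of applying the definition, here is a short worked derivation. The function f(x) = x² is an assumed example chosen for simplicity and is not necessarily the one shown in the article's figure.

```latex
\begin{aligned}
f'(x) &= \lim_{h \to 0} \frac{(x + h)^2 - x^2}{h} \\
      &= \lim_{h \to 0} \frac{2xh + h^2}{h} \\
      &= \lim_{h \to 0} (2x + h) \\
      &= 2x
\end{aligned}
```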

The First Derivative Test:

We use the first derivative test to check whether a function is increasing or decreasing over its domain. We can also use this test to identify its local maxima and minima.

The first derivative of a function is the slope of the tangent line to the graph of the function at a given point. We can think of the first derivative as the slope of the function. When the slope is positive, the graph is increasing. When the slope is negative, the graph is decreasing. Points where the slope is 0 are critical points; each one may be a local maximum, a local minimum, or neither. The first derivative test examines the behavior of the function around these points to determine which case applies.

The first derivative test is based on the fact that, for a function with a continuous first derivative, the sign of the first derivative does not change between critical points. Thus, if we find the critical points of a function, we can test points within the intervals between critical points to determine whether the function is increasing or decreasing over those intervals. Then, by determining whether the function is increasing or decreasing before and after a critical point, we can identify whether the point is a minimum, a maximum, or neither.

Let's say we have a function f(X) which is plotted in the image below.

Figure 10: A Graph Showing Maximum and Minimum Points

Now, we can say the following things about the function.

  1. Function f has a relative minimum at x=m if for all the points p near m, f(m)<f(p).
  2. Function f has a relative maximum at x=m if for all the points p near m, f(m)>f(p).

The First Derivative Test:

Let's say f is a differentiable function with f'(m) = 0. Based on this, we can draw the following conclusions.

  1. If f'(x) changes from positive to negative at x=m, then f has a local maximum at m.
  2. If f'(x) changes from negative to positive at x=m, then f has a local minimum at m.
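The two rules above can be sketched as a small helper. This is an illustrative sketch, not a library routine: the function name `first_derivative_test` and the step size `eps` are hypothetical, and the caller is assumed to supply the derivative f' as a Python callable.

```python
def first_derivative_test(df, m, eps=1e-3):
    """Classify a critical point m (where df(m) == 0) from the sign of df on either side.

    df  : callable returning f'(x)
    eps : hypothetical step size; it must be small enough that no other
          critical point lies within [m - eps, m + eps]
    """
    left, right = df(m - eps), df(m + eps)
    if left > 0 and right < 0:
        return "local maximum"
    if left < 0 and right > 0:
        return "local minimum"
    return "neither"


# f(x) = x^2 has f'(x) = 2x, which goes from negative to positive at 0.
print(first_derivative_test(lambda x: 2 * x, 0.0))       # local minimum
# f(x) = -x^2 has f'(x) = -2x, which goes from positive to negative at 0.
print(first_derivative_test(lambda x: -2 * x, 0.0))      # local maximum
# f(x) = x^3 has f'(x) = 3x^2, which is positive on both sides of 0.
print(first_derivative_test(lambda x: 3 * x ** 2, 0.0))  # neither
```

The third call shows the case the two rules do not cover: when f' keeps the same sign on both sides, the critical point is neither a maximum nor a minimum.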

Working Example of the First Derivative Test

Example: Find the relative extrema for f(X) = 2X³ - 3X² - 12X.

1. Step 1:

Our function f(X) is given by...

Figure 11: Function f(X)

2. Step 2:

To find the relative extrema, we need to find the point(s) where the function's first derivative is 0.

Figure 12: Finding the Critical Points of the Function f(X)

3. Step 3:

Finding the points where the first derivative of the function (f'(X)) is 0.

Figure 13: Finding Points Where f'(X)=0
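For reference, steps 1 to 3 can be written out as follows. This derivation is reconstructed from the stated function rather than copied from the figures, using the power rule and factoring.

```latex
\begin{aligned}
f(X)  &= 2X^3 - 3X^2 - 12X \\
f'(X) &= 6X^2 - 6X - 12 = 6(X - 2)(X + 1) \\
f'(X) &= 0 \iff X = -1 \ \text{or} \ X = 2
\end{aligned}
```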

4. Step 4:

Now, we have two points where we may have relative maxima or minima. To find out which, we need to check the points around these critical points. To do that, let's find the intervals around these critical points.

Figure 14: Finding Intervals based on the Critical Points

5. Step 5:

Next, we will choose a point from each of the intervals and find the value of f'(X) at that point, noting its sign. If the sign of f'(X) is positive (+), the function is increasing in that interval. If the sign of f'(X) is negative (-), the function is decreasing in that interval.

The following table shows the required calculations.

Figure 15: Calculating Whether f(X) is Increasing or Decreasing in the Intervals
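The sign check in step 5 can be reproduced with a few lines of code. This is a sketch under assumptions: the derivative f'(X) = 6X² - 6X - 12 is obtained from the stated function by the power rule, and the test points -2, 0, and 3 are an arbitrary choice of one point per interval that may differ from the ones in the article's table.

```python
def df(x):
    # First derivative of f(X) = 2X^3 - 3X^2 - 12X
    return 6 * x ** 2 - 6 * x - 12


# One test point per interval; the specific points are an arbitrary choice.
test_points = {"(-inf, -1)": -2, "(-1, 2)": 0, "(2, inf)": 3}

signs = {}
for interval, x in test_points.items():
    value = df(x)
    signs[interval] = "+" if value > 0 else "-"
    trend = "increasing" if value > 0 else "decreasing"
    print(f"{interval:>10}  X = {x:>2}  f'(X) = {value:>3}  {trend}")
```

The sign pattern is +, -, +, so the function rises, falls, and rises again across the three intervals.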

6. Step 6:

Based on the above table, we can say that we have a local maximum at X = -1, as the sign of f'(X) changes from positive to negative around it. On the other hand, we have a local minimum at X = 2, as the sign of f'(X) changes from negative to positive around it. Please note that f'(X) does not change its sign anywhere other than at these critical points.

Figure 16: The Graph of the Function f(X) = 2X³ - 3X² - 12X

The Second Derivative Test

We use the second derivative test to determine whether a critical point of a function is a local minimum or a local maximum. We know that the first derivative is the rate of change of the function, given by the slope of the tangent line at a given point on the curve. In the same way, the second derivative is the rate of change of the first derivative, and its sign describes the concavity of f(X).

Before we dive into the second derivative test, let's first understand the meaning of an inflection point.

The Inflection Point:

An inflection point is defined as a point on a curve at which the sign of the concavity changes. For the differentiable functions considered here, an inflection point cannot be a local maximum or a local minimum. In the following image, we can see that for the function f(X) = X³, X = 0 is an inflection point. Also, we can see that the function is concave in the interval (-∞, 0) and convex in the interval (0, ∞).

Figure 17: An Inflection Point

The Second Derivative Test:

Let's say f(X) is a function for which f'(X) and f''(X) are defined. First, we find all the critical points, where f'(X) = 0. Next, we evaluate the second derivative of the function at each critical point. The second derivative test is defined as follows:

  • If f''(X) > 0, then f(X) has a local minimum at X.
  • If f''(X) < 0, then f(X) has a local maximum at X.
  • If f''(X) = 0, then the test is inconclusive. The point is an inflection point if the sign of f''(X) changes from negative to positive or from positive to negative around it. If the sign of f''(X) does not change, it is not an inflection point.

Steps Involved in the Second Derivative Test:

  1. We have the function f(X).
  2. Find the critical points of the function by finding the points where the first derivative f'(X)=0.
  3. Find the second derivative of the function f(X) at all the critical points.
  4. If f''(X)>0, then f(X) has a local minimum at X, and f(X) is convex near X.
  5. If f''(X)<0, then f(X) has a local maximum at X, and f(X) is concave near X.
  6. If f''(X)=0, then the second derivative test is inconclusive.
  7. Check for inflection points.
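Steps 3 to 6 above can be sketched as a small helper. The name `second_derivative_test` and the tolerance `tol` are hypothetical, and the caller is assumed to pass the second derivative f'' as a Python callable along with a point already known to be critical.

```python
def second_derivative_test(d2f, x, tol=1e-12):
    """Classify a critical point x (where f'(x) == 0) from the sign of f''(x).

    d2f : callable returning f''(x)
    tol : hypothetical tolerance for treating f''(x) as zero
    """
    value = d2f(x)
    if value > tol:
        return "local minimum"
    if value < -tol:
        return "local maximum"
    return "inconclusive"


# f(x) = x^2 has f''(x) = 2 everywhere.
print(second_derivative_test(lambda x: 2.0, 0.0))    # local minimum
# f(x) = -x^2 has f''(x) = -2 everywhere.
print(second_derivative_test(lambda x: -2.0, 0.0))   # local maximum
# f(x) = x^3 has f''(x) = 6x, which is 0 at the critical point x = 0.
print(second_derivative_test(lambda x: 6 * x, 0.0))  # inconclusive
```

An "inconclusive" result is the cue to move on to step 7 and check for an inflection point.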

Working Example of the Second Derivative Test

Example: Find the relative extrema for f(X) = 5X³ - 3X⁵

1. Step 1:

Our function f(X) is given by...

Figure 18: Function f(X)

2. Step 2:

To find the relative extrema, we need to find the critical points. Critical points are the points where the first derivative of the function f'(X) = 0.

Figure 19: Finding the Critical Points of the Function f(X)

3. Step 3:

Finding the points where the first derivative of the function (f'(X)) is 0.

Figure 20: Finding Points Where f'(X)=0

4. Step 4:

Now, we have three points where we might have relative maxima or minima. Next, we will find the second derivative f''(X) of the function f(X) at these critical points.

Figure 21: Finding the Second derivative

So, according to the second derivative test, we have a local minimum at X=-1 and a local maximum at X=1.
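For reference, the calculations behind steps 2 to 4 can be written out as follows. They are reconstructed from the stated function using the power rule, not copied from the figures.

```latex
\begin{aligned}
f(X)    &= 5X^3 - 3X^5 \\
f'(X)   &= 15X^2 - 15X^4 = 15X^2 (1 - X)(1 + X) \quad\Rightarrow\quad X \in \{-1,\ 0,\ 1\} \\
f''(X)  &= 30X - 60X^3 \\
f''(-1) &= 30 > 0, \qquad f''(0) = 0, \qquad f''(1) = -30 < 0
\end{aligned}
```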

5. Step 5:

Now, we know that at X=0, the test is inconclusive. So let's find out whether X=0 is an inflection point. To do that, we need to test the concavity of f(X) just before and just after X=0 by selecting a point on each side. The critical points are -1, 0, and 1, so we choose one value from (-1,0) and one from (0,1). The test points must lie close enough to 0 that f''(X) has no other zero between them and 0; here f''(X) also vanishes at X = ±1/√2 ≈ ±0.707, so X=-0.5 and X=0.5 are suitable choices.

Figure 22: Finding an inflection point

Here, we can see that f''(X) changes sign around the point 0. So we can say that 0 is an inflection point. Based on the results, the function is concave immediately to the left of 0, on (-1/√2, 0), and convex immediately to the right of 0, on (0, 1/√2).

Figure 23: Monitoring the graph
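The sign check at the two test points can be written out as follows, again reconstructed from f(X) = 5X³ - 3X⁵ rather than taken from the figure. Factoring f''(X) also shows where else it vanishes.

```latex
\begin{aligned}
f''(X)    &= 30X - 60X^3 = 30X\,(1 - 2X^2) \\
f''(-0.5) &= -15 + 7.5 = -7.5 < 0 \\
f''(0.5)  &= 15 - 7.5 = 7.5 > 0
\end{aligned}
```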

Confusing Terms:

  • Concave Up = Convex
  • Concave Down = Concave

Checking for Inflection Point:

If f''(X) = 0, then there are two possibilities.

  1. X is an inflection point: end of the story.
  2. X is not an inflection point: the story continues.

Steps to check whether a critical point is an inflection point:

  1. We know that f''(X)=0.
  2. Find appropriate intervals around the critical point X.
  3. Find the sign of f''(X) in these intervals.
  4. If the sign of f''(X) changes from negative (-) to positive (+) or from positive (+) to negative (-), then X is an inflection point. If the sign of f''(X) does not change across these intervals, then X is not an inflection point.
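The steps above can be sketched as a small helper. The name `is_inflection_point` and the step size `eps` are hypothetical, and the caller is assumed to pass the second derivative f'' as a Python callable.

```python
def is_inflection_point(d2f, x, eps=1e-3):
    """Return True if f'' changes sign across x.

    d2f : callable returning f''(x)
    eps : hypothetical step size; it must be small enough that f'' has no
          other zero within [x - eps, x + eps]
    """
    left, right = d2f(x - eps), d2f(x + eps)
    # Opposite signs on the two sides give a negative product.
    return left * right < 0


# f(x) = x^3 has f''(x) = 6x, which changes sign at 0.
print(is_inflection_point(lambda x: 6 * x, 0.0))        # True
# f(x) = x^4 has f''(x) = 12x^2, which is positive on both sides of 0.
print(is_inflection_point(lambda x: 12 * x ** 2, 0.0))  # False
```

When the function returns False at a critical point, the first derivative test is still needed to decide between a minimum and a maximum.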

Working Example when f''(X)=0, and X is an Inflection Point:

1. Step 1:

Our function f(X) is given by...

Figure 24: Function f(X)

2. Step 2:

To find the relative extrema, we need to find the critical points. Critical points are the points where the first derivative of the function f'(X) = 0.

Figure 25: Finding the Critical Points of the Function f(X)

3. Step 3:

Finding the points where the first derivative of the function (f'(X)) is 0.

Figure 26: Finding Points Where f'(X)=0

4. Step 4:

Next, we have only one point where we might have a local maximum or minimum. Let's find the second derivative f''(X) of the function f(X) at the critical point.

Figure 27: Finding the Second derivative

5. Step 5:

Since the second derivative at the critical point is 0, the second derivative test is inconclusive at X=0. So let's find out whether X=0 is an inflection point. To do that, we need to test the concavity of f(X) before and after the critical point by selecting a value from each appropriate interval. The critical point is 0, so we need to choose a value from (-∞, 0) and one from (0, ∞). Let's choose X=-1 and X=1 and see how it goes.

Figure 28: Finding an inflection point

Here, we can see that f''(X) changes sign around the point 0. So we can say that 0 is an inflection point. Based on the results, the function is concave in the interval (-∞, 0) and convex in the interval (0, ∞).

Figure 29: Monitoring the changes in the function
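The function used in this example appears only in the figures. As an assumed stand-in, f(X) = X³, the inflection example given earlier, has exactly the properties described: a single critical point at 0, a second derivative of 0 there, and a sign change in f''(X) across it.

```latex
\begin{aligned}
f(X)    &= X^3 \\
f'(X)   &= 3X^2 = 0 \iff X = 0 \\
f''(X)  &= 6X, \qquad f''(0) = 0 \\
f''(-1) &= -6 < 0, \qquad f''(1) = 6 > 0
\end{aligned}
```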

Why Do We Need the First Derivative Test?

Cases When the Second Derivative Test Does Not Work:

  • When f'(X) = 0 and f''(X) = 0.
  • When f'(X) = 0 and f''(X) is not defined.
  • When f'(X) is not defined.

In the above cases, we need to use the first derivative test to find out whether the critical point is a local minimum or a local maximum.

Working Example when f''(X)=0, and X is not an Inflection Point:

1. Step 1:

Our function f(X) is given by...

Figure 30: Function f(X)

2. Step 2:

To find the relative extrema, we need to find the critical points. Critical points are the points where the first derivative of the function f'(X) = 0.

Figure 31: Finding the Critical Points of the Function f(X)

3. Step 3:

Finding the points where the first derivative of the function (f'(X)) is 0.

Figure 32: Finding Points Where f'(X)=0

4. Step 4:

Next, we have only one point where we might have a local maximum or minimum. Let's find the second derivative f''(X) of the function f(X) at the critical point.

Figure 33: Finding the Second derivative

5. Step 5:

Since the second derivative at the critical point is 0, the second derivative test is inconclusive at X=0. So let's find out whether X=0 is an inflection point. To do that, we need to test the concavity of f(X) before and after the critical point by selecting a value from each appropriate interval. The critical point is 0, so we need to choose a value from (-∞, 0) and one from (0, ∞). Let's choose X=-1 and X=1 and see how it goes.

Figure 34: Finding an inflection point

Here we can see that f''(X) does not change sign around the point 0. So we can confidently say that 0 is not an inflection point.

Figure 35: Monitoring the graph on various intervals

Now what?

Now we know that 0 is not an inflection point for f(X) = X⁴. Because f''(X) keeps the same sign on both sides of this critical point, the point must be either a local minimum or a local maximum. Since the second derivative test failed to determine which, we will apply the first derivative test here.

6. Step 6:

Next, we will choose a point from each of the intervals (-∞, 0) and (0, ∞) and find the value of f'(X) at that point, noting its sign. If the sign of f'(X) is positive (+), the function is increasing in that interval. If the sign of f'(X) is negative (-), the function is decreasing in that interval.

The following table shows the required calculations.

Figure 36: Monitoring the graph on various intervals

From the above table, we can say that f'(X) changes from negative to positive at X=0. That means that the point X=0 is a local minimum of the function f(X)=X⁴.
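For reference, the calculations for f(X) = X⁴ can be written out as follows. They are reconstructed from the stated function, and the test points X = -1 and X = 1 are reused for the first derivative as an assumption about the table's contents.

```latex
\begin{aligned}
f(X)    &= X^4 \\
f'(X)   &= 4X^3 = 0 \iff X = 0 \\
f''(X)  &= 12X^2, \qquad f''(0) = 0 \\
f''(-1) &= 12 > 0, \qquad f''(1) = 12 > 0 \quad \text{(no sign change, so no inflection point)} \\
f'(-1)  &= -4 < 0, \qquad f'(1) = 4 > 0 \quad \text{(negative to positive, so a local minimum)}
\end{aligned}
```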

The First Derivative Test:

Figure 37: The First Derivative Test

The Second Derivative Test:

Figure 38: The Second Derivative Test

An Inflection Point Test:

Figure 39: An Inflection Point Test

Conclusion:

In conclusion, understanding derivatives is a crucial component of any machine learning practitioner's toolkit. The ability to analyze and optimize functions using the first derivative test, the second derivative test, and the inflection test can greatly improve the performance of machine learning models. While these concepts may seem daunting at first, with practice and patience, they can become intuitive and even enjoyable to work with. With the rise of deep learning and other complex machine learning techniques, the importance of derivatives is only increasing. By mastering these concepts, you'll be better equipped to tackle challenging problems in the field and to push the boundaries of what is possible with machine learning.

Citation:

For attribution in academic contexts, please cite this work as:

Shukla, et al., "Mastering Derivatives for Machine Learning", Towards AI, 2023

BibTex Citation:

@article{pratik_2023, 
title={Mastering Derivatives for Machine Learning},
url={https://pub.towardsai.net/mastering-derivatives-for-machine-learning-b09336bb074},
journal={Towards AI},
publisher={Towards AI Co.},
author={Pratik, Shukla},
editor={Binal, Dave},
year={2023},
month={Feb}
}

Resources and References:

  1. Derivative - Wikipedia

