
Proving the Convexity of Log-Loss for Logistic Regression

Last Updated on February 25, 2023 by Editorial Team

Author(s): Towards AI Editorial Team

Originally published on Towards AI.

Unpacking the Log-Loss Error Function’s Impact on Logistic Regression

Photo by DeepMind on Unsplash

Author(s): Pratik Shukla

“Courage is like a muscle. We strengthen it by use.” — Ruth Gordon

Table of Contents:

  1. Proof of convexity of the log-loss function for logistic regression
  2. A visual look at BCE for logistic regression
  3. Resources and references

Introduction

In this tutorial, we will see why the log-loss function is well suited to logistic regression. Our goal is to prove that the log-loss function is convex for logistic regression. Once we have proved this, we can establish that it is a better choice of loss function than alternatives that do not give a convex optimization problem.

Logistic regression is a widely used statistical technique for modeling binary classification problems. In this method, the log-odds of the outcome variable are modeled as a linear combination of the predictor variables. To estimate the parameters of the model, the maximum likelihood method is used, which involves optimizing the log-likelihood function. The resulting loss, obtained by negating the sum of the per-observation log-likelihoods, is known as the log-loss function or binary cross-entropy loss. In this blog post, we will explore the convexity of the log-loss function and why it is an essential property for the optimization algorithms used in logistic regression. We will also provide a proof of the convexity of the log-loss function.
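For reference, written in the notation used later in the article (with n observations, labels y_i, and predicted probabilities ŷ_i), this loss takes the standard averaged form:

J(w) = -\frac{1}{n} \sum_{i=1}^{n} \Big[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \Big]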

Proof of convexity of the log-loss function for logistic regression:

Let’s mathematically prove that the log-loss function for logistic regression is convex.

We saw in the previous tutorial that a twice-differentiable function of a single variable is convex if its second derivative is ≥0 everywhere (and strictly convex where it is >0). So, here we’ll take the log-loss function and find its second derivative to check its sign. If the second derivative is never negative, then we can say that it is a convex function.
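As a quick illustration of this criterion (an aside, not part of the original derivation):

f(x) = x^2 \;\Rightarrow\; f''(x) = 2 > 0 \ \text{(convex)}, \qquad f(x) = \log x \;\Rightarrow\; f''(x) = -\frac{1}{x^2} < 0 \ \text{(concave for } x > 0\text{)}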

Here we are going to consider the case of a single trial to simplify the calculations.

Step — 1:

The following is a mathematical definition of the binary cross-entropy loss function (for a single trial).

Figure — 1: Binary Cross-Entropy loss for a single trial

Step — 2:

The following is the predicted value (ŷ) for logistic regression.

Figure — 2: The predicted probability for the given example

Step — 3:

In the following image, z represents the linear transformation.

Figure — 3: Linear transformation in forward propagation
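Since the equation images are not reproduced here, the following is a plain LaTeX restatement of these three ingredients, assuming (as the later steps suggest) a single input feature x with weight w and no separate bias term:

L(\hat{y}, y) = -\big[\, y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \,\big], \qquad \hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = w x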

Step — 4:

After that, we are modifying Step — 1 to reflect the values of Step — 3 and Step — 2.

Figure — 4: Binary Cross-Entropy loss for logistic regression for a single trial

Step — 5:

Next, we are simplifying the terms in Step — 4.

Figure — 5: Binary Cross-Entropy loss for logistic regression for a single trial

Step — 6:

Next, we are further simplifying the terms in Step — 5.

Figure — 6: Binary Cross-Entropy loss for logistic regression for a single trial

Step — 7:

The following is the quotient rule for logarithms.

Figure — 7: The quotient rule for logarithms

Step — 8:

Next, we are using the equation from Step — 7 to further simplify Step — 6.

Figure — 8: Binary Cross-Entropy loss for logistic regression for a single trial

Step — 9:

In Step — 8, the value of log(1) is going to be 0.

Figure — 9: The value of log(1)=0

Step — 10:

Next, we are rewriting Step — 8 with the remaining terms.

Figure — 10: Binary Cross-Entropy loss for logistic regression for a single trial

Step — 11:

The following is the power rule for logarithms.

Figure — 11: Power rule for logarithms

Step — 12:

Next, we will use the power rule of logarithms to simplify the equation in Step — 10.

Figure — 12: Applying the power rule

Step — 13:

Next, we are replacing the values in Step — 10 with the values in Step — 12.

Figure — 13: Using the power rule for logarithms

Step — 14:

Next, we are substituting the value of Step — 13 into Step — 10.

Figure — 14: Binary Cross-Entropy loss for logistic regression for a single trial

Step — 15:

Next, we are multiplying Step — 14 by (-1) on both sides.

Figure — 15: Binary Cross-Entropy loss for logistic regression for a single trial
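Carrying out Steps 4 through 15 with the definitions above, the single-trial loss one typically arrives at (consistent with the e^(-wx) terms used later in the article) is:

f(w) = (1 - y)\, w x + \log\!\left(1 + e^{-w x}\right)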

Finding the First Derivative:

Step — 16:

Next, we are going to find the first derivative of f(w).

Figure — 16: Finding the first derivative of f(w)

Step — 17:

Here we are distributing the partial differentiation sign to each term.

Figure — 17: Finding the first derivative of f(w)

Step — 18:

Here we are applying the derivative rules.

Figure — 18: Finding the first derivative of f(w)

Step — 19:

Here we are finding the partial derivative of the last term of Step — 18.

Figure — 19: Finding the first derivative of f(w)

Step — 20:

Here we are finding the partial derivative of the first term of Step — 18.

Figure — 20: Finding the first derivative of f(w)

Step — 21:

Here we are putting together the results of Step — 19 and Step — 20.

Figure — 21: Finding the first derivative of f(w)

Step — 22:

Next, we are rearranging the terms of the equation in Step — 21.

Figure — 22: Finding the first derivative of f(w)

Step — 23:

Next, we are rewriting the equation in Step — 22.

Figure — 23: Finding the first derivative of f(w)
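For readers following along without the figures, differentiating the simplified loss above with respect to w gives the standard result:

\frac{\partial f}{\partial w} = (1 - y)\, x - \frac{x\, e^{-w x}}{1 + e^{-w x}} = x\, (\hat{y} - y)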

Finding the Second Derivative:

Step — 24:

Next, we are going to find the second derivative of the function f(w).

Figure — 24: Finding the second derivative of f(w)

Step — 25:

Here we are distributing the partial derivative to each term.

Figure — 25: Finding the second derivative of f(w)

Step — 26:

Next, we are simplifying the equation in Step — 25 to remove redundant terms.

Figure — 26: Finding the second derivative of f(w)

Step — 27:

Here is the derivative rule for 1/f(x).

Figure — 27: The derivative rule for 1/f(x)

Step — 28:

Next, we are finding the relevant term to plug into Step — 27.

Figure — 28: Value of p(w) for the derivative of 1/p(w)

Step — 29:

Here we are finding the partial derivative term for Step — 27.

Figure — 29: Value of p’(w) for the derivative of 1/p(w)

Step — 30:

Here we are finding the squared term for Step — 27.

Figure — 30: Value of p(w)² for the derivative of 1/p(w)

Step — 31:

Here we are putting together all the terms of Step — 27.

Figure — 31: Calculating the value of the derivative of 1/p(w)

Step — 32:

Here we are simplifying the equation in Step — 31.

Figure — 32: Calculating the value of the derivative of 1/p(w)

Step — 33:

Next, we are putting together all the values in Step — 26.

Figure — 33: Finding the second derivative of f(w)

Step — 34:

Next, we are further simplifying the terms in Step — 33.

Figure — 34: Finding the second derivative of f(w)
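Again stated in LaTeX for readers without the figures: applying the reciprocal rule with p(w) = 1 + e^(-wx), the second derivative works out to the standard expression:

\frac{\partial^2 f}{\partial w^2} = \frac{x^2\, e^{-w x}}{\left(1 + e^{-w x}\right)^2} = x^2\, \hat{y}\, (1 - \hat{y})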

Alright! So, now we have the second derivative of the function f(w). Next, we need to find out whether it is ≥0 for every value of w (and any value of x) or not. If it is, then we can say that the binary cross-entropy loss is convex for logistic regression.

As we can see, the following terms from Step — 34 are always going to be ≥0 because the square of any number is always ≥0.

Figure — 35: The square of any term is always ≥0 for any value of x

Now, we need to determine whether or not the value of e^(-wx) is >0. To do that, let’s first find the range of the function e^(-wx) over the domain (-∞, +∞). To further simplify the calculations, we will consider the function e^-x instead of e^-wx. Please note that scaling the argument by a non-zero constant w does not change the range of the function over this domain (and for w=0 the term equals 1, which is still positive). Let’s first plot the graph of e^-x to understand its range.

Figure — 36: Graph of e^-x for the domain of [-10, 10]
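A short, optional Python sketch (not part of the original article) that reproduces this plot with matplotlib:

import numpy as np
import matplotlib.pyplot as plt

# Reproduce Figure 36: the graph of e^(-x) over [-10, 10].
x = np.linspace(-10, 10, 400)
y = np.exp(-x)

plt.plot(x, y)
plt.xlabel("x")
plt.ylabel("e^(-x)")
plt.title("Graph of e^(-x) for x in [-10, 10]")
plt.grid(True)
plt.show()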

From the above graph, we can derive the following conclusions:

1. As the value of x moves towards negative infinity (-∞), the value of e^-x moves towards infinity (+∞).

Figure — 37: The value of e^-x as x approaches -∞

2. As the value of x moves towards 0, the value of e^-x moves towards 1.

Figure — 38: The value of e^-x as x approaches 0

3. As the value of x moves towards positive infinity (+∞), the value of e^-x moves towards 0.

Figure — 40: The value of e^-x as x approaches +∞

So, we can say that the range of the function f(x)=e^-x is (0, +∞): the function gets arbitrarily close to 0 but never reaches it. Based on this, we can say that the term e^-wx is always going to be >0.

Alright! So, we have concluded that every term of the equation in Step — 34 is ≥0 (the squared terms are ≥0 and e^-wx is >0), which means the second derivative is never negative. Hence, we can say that the function f(w) is a convex function for logistic regression.
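As an optional numerical sanity check (not in the original article), we can evaluate the Step 34 expression for many random values of w and x and confirm it is never negative:

import numpy as np

def second_derivative(w, x):
    # f''(w) = x^2 * e^(-w*x) / (1 + e^(-w*x))^2, the expression from Step 34.
    e = np.exp(-w * x)
    return x**2 * e / (1.0 + e) ** 2

rng = np.random.default_rng(0)
w = rng.uniform(-5, 5, size=100_000)
x = rng.uniform(-5, 5, size=100_000)

vals = second_derivative(w, x)
print("minimum second derivative:", vals.min())       # expected: a value >= 0
print("all non-negative:", bool((vals >= 0).all()))   # expected: True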

Important Note:

If the second derivative of a function is 0 at some points, the test alone is inconclusive: the function may still be convex (for example, f(x)=x^4 has a zero second derivative at x=0 yet is convex), or it may be neither convex nor concave. In our case the second derivative is 0 only when x=0, where the loss does not depend on w at all, so convexity is unaffected. But, let’s not worry too much about it!

A Visual Look at BCE for Logistic Regression:

The binary cross-entropy function for logistic regression is given by…

Figure — 41: Binary Cross-Entropy Loss

Now, we know that this is a binary classification problem. So, there can be only two possible values for Yi (0 or 1).

Step — 1:

The value of the cost function when Yi=0.

Figure — 42: Binary Cross-Entropy Loss when Y=0

Step — 2:

The value of the cost function when Yi=1.

Figure — 43: Binary Cross-Entropy Loss when Y=1
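Stated in symbols (since the equation images are not reproduced here), the contribution of an observation with predicted probability ŷ_i is:

-\log(1 - \hat{y}_i) \ \text{when } y_i = 0, \qquad -\log(\hat{y}_i) \ \text{when } y_i = 1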

Now, let’s consider only one training example.

Step — 3:

Now, let’s say we have only one training example. It means that n=1. So, the value of the cost function when Y=0 is:

Figure — 44: Binary Cross-Entropy Loss for a single training example when Y=0

Step — 4:

Now, let’s say we have only one training example. It means that n=1. So, the value of the cost function when Y=1 is:

Figure — 45: Binary Cross-Entropy Loss for a single training example when Y=1

Step — 5:

Now, let’s plot the graph of the function in Step — 3.

Figure — 46: Graph of -log(1-X)

Step — 6:

Now, let’s plot the graph of the function in Step — 4.

Figure — 47: Graph of -log(X)

Step — 7:

Let’s put the graphs in Step — 5 and Step — 6 together.

Figure — 48: Graph of -log(1-X) and -log(X)

The above graphs follow the definition of a convex function (“A function of a single variable is called a convex function if no line segment joining two points on the graph lies below the graph at any point”). So, we can say that the function is convex.
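As an optional numerical check of this chord definition (again, not part of the original article), the sketch below samples pairs of points on -log(X) and verifies that chord midpoints never fall below the curve; replacing f with -log(1 - X) gives the same result:

import numpy as np

def f(x):
    # Single-example loss when Y=1: -log(x), defined for 0 < x < 1.
    return -np.log(x)

rng = np.random.default_rng(42)
a = rng.uniform(1e-6, 1 - 1e-6, size=50_000)
b = rng.uniform(1e-6, 1 - 1e-6, size=50_000)

# Midpoint form of the convexity definition: the chord joining (a, f(a)) and
# (b, f(b)) must not lie below the graph, so f((a+b)/2) <= (f(a)+f(b))/2.
lhs = f((a + b) / 2)
rhs = (f(a) + f(b)) / 2
print("chord midpoints never below the curve:", bool((lhs <= rhs + 1e-12).all()))  # expected: True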

Conclusion:

In conclusion, we have explored the concept of convexity and its importance in the optimization algorithms used in logistic regression. We have demonstrated that the log-loss function is convex, which implies that any local minimum of its optimization problem is also a global minimum. This property is crucial for ensuring the stability and convergence of the optimization algorithms used in logistic regression. By proving the convexity of the log-loss function, we have shown that the optimization problem in logistic regression is well-posed and can be solved efficiently using standard convex optimization methods. Moreover, our proof provides a deeper understanding of the mathematical foundations of logistic regression and lays the groundwork for further research and development in this field.

Buy Pratik a Coffee!

Citation:

For attribution in academic contexts, please cite this work as:

Shukla, et al., “Proving the Convexity of Log Loss for Logistic Regression”, Towards AI, 2023

BibTeX Citation:

@article{pratik_2023,
  title={Proving the Convexity of Log Loss for Logistic Regression},
  url={https://pub.towardsai.net/proving-the-convexity-of-log-loss-for-logistic-regression-49161798d0f3},
  journal={Towards AI},
  publisher={Towards AI Co.},
  author={Shukla, Pratik},
  editor={Dave, Binal},
  year={2023},
  month={Feb}
}




Published via Towards AI
