Understanding Logistic Regression: Theory, Intuition, and Applications

Last Updated on September 4, 2025 by Editorial Team

Author(s): Abinaya Subramaniam

Originally published on Towards AI.

Logistic Regression — Image by Author

You may already be familiar with linear regression, which models a straight-line relationship between input features and a continuous output. However, when the output variable is categorical, especially binary (like yes/no, pass/fail, spam/not spam), linear regression fails to capture the essence of the problem. This is where logistic regression steps in. Despite its name, logistic regression is not used for regression but rather for classification problems.

This blog provides a comprehensive journey through logistic regression: why we need it, how it works mathematically, the cost function it uses, optimization techniques, performance evaluation metrics, and even how it extends to multi-class classification.

Understanding Binary Classification

Let us begin by understanding what a binary classification problem looks like. Imagine a medical researcher studying the relationship between a patient’s cholesterol level and their risk of developing heart disease. If a patient has a very low cholesterol level, they are unlikely to have heart disease. At significantly elevated levels, the patient has a substantially higher risk.

Binary Classification — Image by Author

In this case, the input feature is the cholesterol level (measured in mg/dL), and the output is categorical with two labels: high risk (1) and low risk (0). The task of the machine learning model is to predict the probability of being high risk, given the cholesterol level. Since there are only two possible outcomes, this is a classic binary classification problem.

Why Not Use Linear Regression for Classification?

At first glance, one might wonder why linear regression cannot be applied to such classification problems. If we assign numeric labels to the outcomes, say 0 for low risk and 1 for high risk, then it seems plausible that a regression line could separate the two categories. Predictions greater than 0.5 could be mapped to “high risk,” and predictions less than or equal to 0.5 could be mapped to “low risk.”

However, applying linear regression to this type of problem has several fundamental flaws. The first issue arises when outliers are present in the data. For example, if one patient had an extremely high cholesterol level due to a rare genetic condition and was labeled as high risk, this single data point would drastically shift the regression line. As a result, the model might incorrectly predict that a patient with a moderately high cholesterol level is at low risk, even though clinical guidelines suggest otherwise.

The second and more serious issue is that linear regression outputs are unbounded and not restricted to the range of [0, 1]. The regression line could easily predict risk probabilities like -0.2 or 1.3, which are meaningless and invalid in the context of medical classification, since probabilities must always lie between 0 and 1.
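To make this concrete, here is a minimal pure-Python sketch: an ordinary least-squares line fitted to 0/1 labels. The x values are hypothetical, standardized cholesterol readings invented for illustration; the point is simply that the fitted line happily returns “probabilities” outside [0, 1].

```python
# A least-squares line fitted to 0/1 labels, to show out-of-range
# "probabilities". The x values are hypothetical, standardized
# cholesterol readings; the labels are 0 = low risk, 1 = high risk.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [0, 0, 0, 1, 1, 1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Closed-form simple linear regression: slope = cov(x, y) / var(x)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

print(intercept + slope * 4.0)   # greater than 1: not a valid probability
print(intercept + slope * -3.0)  # less than 0: not a valid probability
```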

Therefore, a new mechanism is required to “squash” the output values into the valid probability range. This is precisely what logistic regression achieves through the use of the sigmoid function.

Logistic Regression and the Sigmoid Function

The central idea of logistic regression is simple: instead of directly using the linear regression output, we pass it through a mathematical function called the sigmoid function (also known as the logistic function).

The sigmoid function is defined as:

σ(z) = 1 / (1 + e^(−z))

Sigmoid Graph — Image by Author

Here, z is the linear combination of input features:

z = θ₀ + θ₁x₁ + θ₂x₂ + … + θₙxₙ

The sigmoid function takes any real number as input and squashes it into a value strictly between 0 and 1. When z approaches negative infinity, the sigmoid output approaches 0. When z approaches positive infinity, the output approaches 1. At z=0, the function outputs 0.5.

This makes sigmoid ideal for classification because the output can be interpreted as the probability of belonging to class 1. If the probability is greater than 0.5, we classify the input as 1; otherwise, we classify it as 0.
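A minimal Python sketch of the sigmoid and the 0.5 thresholding rule described above (the `predict` helper is illustrative, not taken from any particular library):

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(z, threshold=0.5):
    """Map the sigmoid probability to a class label."""
    return 1 if sigmoid(z) > threshold else 0

print(sigmoid(0.0))    # exactly 0.5
print(sigmoid(10.0))   # very close to 1
print(sigmoid(-10.0))  # very close to 0
print(predict(2.0), predict(-2.0))
```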

The Cost Function of Logistic Regression

In linear regression, we used the mean squared error as the cost function. However, applying the same cost function in logistic regression does not work well. The squared error cost function becomes non-convex when combined with the sigmoid, resulting in multiple local minima. Gradient descent, which we use for optimization, would then struggle to converge to the true global minimum.

To address this, logistic regression uses a different cost function called log loss (or binary cross-entropy). For a single training example, the cost is defined as:

Cost(hθ(x), y) = −[y·log(hθ(x)) + (1 − y)·log(1 − hθ(x))]

Here, hθ(x) is the predicted probability, and y is the actual class label (0 or 1).

  • If y = 1, the cost reduces to −log(hθ(x)). The closer hθ(x) is to 1, the smaller the cost.
  • If y = 0, the cost reduces to −log(1 − hθ(x)). The closer hθ(x) is to 0, the smaller the cost.

For the entire dataset of m examples, the overall cost function is the average of these individual costs:

J(θ) = −(1/m) · Σᵢ [y⁽ⁱ⁾·log(hθ(x⁽ⁱ⁾)) + (1 − y⁽ⁱ⁾)·log(1 − hθ(x⁽ⁱ⁾))]

This cost function is convex, ensuring a smooth “bowl-shaped” curve that gradient descent can optimize effectively.
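The log loss is only a few lines of Python. The `eps` clipping below is a common practical safeguard (an implementation detail, not part of the formula) that keeps log() finite when a prediction is exactly 0 or 1:

```python
import math

def log_loss(y_true, y_pred, eps=1e-15):
    """Average binary cross-entropy over the dataset."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite at p = 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Confident and correct predictions give a small loss;
# confident but wrong predictions are punished heavily.
print(log_loss([1, 0], [0.9, 0.1]))  # small
print(log_loss([1, 0], [0.1, 0.9]))  # large
```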

Optimization with Gradient Descent

The parameters θj in logistic regression are learned by minimizing the cost function. Gradient descent is the optimization algorithm commonly used for this purpose.

In gradient descent, parameters are updated iteratively according to the rule:

θj := θj − α · ∂J(θ)/∂θj

where α is the learning rate, a small positive number that controls the step size. By repeatedly updating the parameters in the opposite direction of the gradient, the algorithm converges to the point where the cost function is minimized.
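Putting the pieces together, here is a from-scratch sketch of batch gradient descent for a one-feature logistic model. The data, function name, and hyperparameters (alpha, epochs) are invented for illustration; in practice one would reach for a library such as scikit-learn:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, alpha=0.1, epochs=5000):
    """Batch gradient descent for a one-feature model h(x) = sigmoid(t0 + t1*x).

    For log loss, the gradient w.r.t. each parameter is the average of
    (prediction - label) times that parameter's input (1 for the intercept).
    """
    t0 = t1 = 0.0
    m = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(t0 + t1 * x) - y
            g0 += err
            g1 += err * x
        # Step opposite the gradient, scaled by the learning rate alpha
        t0 -= alpha * g0 / m
        t1 -= alpha * g1 / m
    return t0, t1

# Toy separable data (hypothetical standardized feature values)
t0, t1 = train_logistic([-2.0, -1.5, -1.0, 1.0, 1.5, 2.0], [0, 0, 0, 1, 1, 1])
print(sigmoid(t0 + t1 * 2.0))   # close to 1
print(sigmoid(t0 + t1 * -2.0))  # close to 0
```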

Evaluating Logistic Regression: Performance Metrics

Once the model is trained, the next step is to evaluate how well it performs. Unlike regression tasks, where metrics such as R-squared are used, classification problems require different evaluation tools.

The foundation of classification metrics is the confusion matrix, which summarizes predictions into four categories: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). From this matrix, we derive performance metrics.

Confusion Matrix — Image by Author

Accuracy measures the proportion of correct predictions. While simple, accuracy can be misleading when the dataset is imbalanced. For instance, if 95% of emails are not spam, a model that always predicts “not spam” will achieve 95% accuracy yet fail at identifying actual spam.

To address such limitations, precision and recall are used. Precision answers the question: “Of all emails predicted as spam, how many were actually spam?” Recall answers: “Of all actual spam emails, how many did the model correctly detect?” In some applications, such as spam detection, precision is more important, while in medical diagnosis, recall often takes priority.

A single metric that balances precision and recall is the F1 score, which is the harmonic mean of the two. In more flexible situations, the Fβ score allows adjusting the emphasis on precision versus recall depending on the problem.
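These metrics fall straight out of the confusion-matrix counts. The function name and the toy spam run below are illustrative:

```python
def classification_metrics(y_true, y_pred):
    """Precision, recall, and F1 from raw predictions (positive class = 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy spam run: 3 actual spam emails, the model flags 2, one flag is wrong.
precision, recall, f1 = classification_metrics([1, 1, 1, 0, 0], [1, 0, 0, 1, 0])
print(precision, recall, f1)  # 0.5, 1/3, 0.4
```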

Extending Logistic Regression to Multi-Class Classification

So far, we have focused on binary classification. But what if we have more than two categories, such as classifying fruit into apples, bananas, and oranges? Logistic regression can be extended to multi-class problems using the One-Versus-Rest (OVR) approach.

In OVR, a separate binary classifier is trained for each class. Each classifier predicts the probability that a data point belongs to its assigned class versus all other classes combined. At prediction time, the input is passed through all classifiers, and the class with the highest probability is chosen as the final output.

This simple yet effective method allows logistic regression to handle multi-class classification problems efficiently.
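A toy sketch of OVR prediction, with three hand-set scoring functions standing in for trained binary classifiers (the parameters are invented for illustration, not learned from data):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hand-set toy scorers over a single feature x, one per fruit class.
# Each returns the probability "this class vs. all the rest".
classifiers = {
    "apple":  lambda x: sigmoid(-1.0 - 3.0 * x),      # likely when x is small
    "banana": lambda x: sigmoid(1.0 - 3.0 * abs(x)),  # likely when x is near 0
    "orange": lambda x: sigmoid(-1.0 + 3.0 * x),      # likely when x is large
}

def ovr_predict(x):
    """Run every binary classifier and return the highest-probability class."""
    return max(classifiers, key=lambda label: classifiers[label](x))

print(ovr_predict(-1.0))  # apple
print(ovr_predict(0.0))   # banana
print(ovr_predict(1.0))   # orange
```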

Applications of Logistic Regression

Logistic regression, though one of the oldest algorithms in machine learning, remains widely used due to its simplicity, interpretability, and efficiency. Some common applications include:

  • Predicting whether a patient has a disease (medical diagnosis)
  • Predicting Employee Attrition (Human Resources)
  • Determining whether a customer will default on a loan (credit scoring)
  • Fraud Detection for Insurance Claims (Insurance)
  • Identifying spam emails (natural language processing)
  • Analyzing customer churn in marketing

Its interpretability makes logistic regression particularly valuable in domains like healthcare and finance, where understanding why a model made a decision is as important as the decision itself.

Conclusion

Logistic regression is a cornerstone algorithm in machine learning, bridging the gap between regression and classification. By applying the sigmoid function to the linear regression model, it transforms unbounded outputs into probabilities between 0 and 1, making it perfectly suited for classification problems. The log loss cost function ensures convex optimization, and gradient descent provides an efficient way to learn parameters.

Performance evaluation in logistic regression goes beyond accuracy, with precision, recall, and F1 scores playing crucial roles depending on the use case. Finally, logistic regression is not limited to binary classification: with techniques like One-Versus-Rest, it scales gracefully to multi-class settings.

Simple, interpretable, and mathematically elegant, logistic regression remains one of the most powerful tools for solving classification problems, even in the era of deep learning.


Published via Towards AI



Note: Content contains the views of the contributing authors and not Towards AI.