Understanding Logistic Regression: Theory, Intuition, and Applications

Last Updated on September 4, 2025 by Editorial Team

Author(s): Abinaya Subramaniam

Originally published on Towards AI.

Logistic Regression — Image by Author

You may already be familiar with linear regression, which models a straight-line relationship between input features and a continuous output. However, when the output variable is categorical, especially binary (like yes/no, pass/fail, spam/not spam), linear regression fails to capture the essence of the problem. This is where logistic regression steps in. Despite its name, logistic regression is not used for regression but rather for classification problems.

This blog provides a comprehensive journey through logistic regression: why we need it, how it works mathematically, the cost function it uses, optimization techniques, performance evaluation metrics, and even how it extends to multi-class classification.

Understanding Binary Classification

Let us begin by understanding what a binary classification problem looks like. Imagine a medical researcher studying the relationship between a patient’s cholesterol level and their risk of developing heart disease. If a patient has a very low cholesterol level, they are unlikely to have heart disease. At significantly elevated levels, the patient has a substantially higher risk.

Binary Classification — Image by Author

In this case, the input feature is the cholesterol level (measured in mg/dL), and the output is categorical with two labels: high risk (1) and low risk (0). The task of the machine learning model is to predict the probability of being high risk, given the cholesterol level. Since there are only two possible outcomes, this is a classic binary classification problem.

Why Not Use Linear Regression for Classification?

At first glance, one might wonder why linear regression cannot be applied to such classification problems. If we assign numeric labels to the outcomes, say 0 for low risk and 1 for high risk, then it seems plausible that a regression line could separate the two categories. Predictions greater than 0.5 could be mapped to “high risk,” and predictions less than or equal to 0.5 could be mapped to “low risk.”

However, applying linear regression to this type of problem has several fundamental flaws. The first issue arises when outliers are present in the data. For example, if one patient had an extremely high cholesterol level due to a rare genetic condition and was labeled as high risk, this single data point would drastically shift the regression line. As a result, the model might incorrectly predict that a patient with a moderately high cholesterol level is at low risk, even though clinical guidelines suggest otherwise.

The second and more serious issue is that linear regression outputs are unbounded and not restricted to the range of [0, 1]. The regression line could easily predict risk probabilities like -0.2 or 1.3, which are meaningless and invalid in the context of medical classification, since probabilities must always lie between 0 and 1.
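To make this concrete, here is a minimal pure-Python sketch: an ordinary least-squares line fitted to 0/1 labels. The x values are hypothetical, standardized cholesterol readings invented for illustration; the point is simply that the fitted line happily returns “probabilities” outside [0, 1].

```python
# A least-squares line fitted to 0/1 labels, to show out-of-range
# "probabilities". The x values are hypothetical, standardized
# cholesterol readings; the labels are 0 = low risk, 1 = high risk.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [0, 0, 0, 1, 1, 1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Closed-form simple linear regression: slope = cov(x, y) / var(x)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

print(intercept + slope * 4.0)   # greater than 1: not a valid probability
print(intercept + slope * -3.0)  # less than 0: not a valid probability
```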

Therefore, a new mechanism is required to “squash” the output values into the valid probability range. This is precisely what logistic regression achieves through the use of the sigmoid function.

Logistic Regression and the Sigmoid Function

The central idea of logistic regression is simple: instead of directly using the linear regression output, we pass it through a mathematical function called the sigmoid function (also known as the logistic function).

The sigmoid function is defined as:

σ(z) = 1 / (1 + e^(−z))

Sigmoid Graph — Image by Author

Here, z is the linear combination of input features:

z = θ₀ + θ₁x₁ + θ₂x₂ + … + θₙxₙ

The sigmoid function takes any real number as input and squashes it into a value strictly between 0 and 1. When z approaches negative infinity, the sigmoid output approaches 0. When z approaches positive infinity, the output approaches 1. At z=0, the function outputs 0.5.

This makes sigmoid ideal for classification because the output can be interpreted as the probability of belonging to class 1. If the probability is greater than 0.5, we classify the input as 1; otherwise, we classify it as 0.
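A minimal Python sketch of the sigmoid and the 0.5 thresholding rule described above (the `predict` helper is illustrative, not taken from any particular library):

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(z, threshold=0.5):
    """Map the sigmoid probability to a class label."""
    return 1 if sigmoid(z) > threshold else 0

print(sigmoid(0.0))    # exactly 0.5
print(sigmoid(10.0))   # very close to 1
print(sigmoid(-10.0))  # very close to 0
print(predict(2.0), predict(-2.0))
```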

The Cost Function of Logistic Regression

In linear regression, we used the mean squared error as the cost function. However, applying the same cost function in logistic regression does not work well. The squared error cost function becomes non-convex when combined with the sigmoid, resulting in multiple local minima. Gradient descent, which we use for optimization, would then struggle to converge to the true global minimum.

To address this, logistic regression uses a different cost function called log loss (or binary cross-entropy). For a single training example, the cost is defined as:

Cost(hθ(x), y) = −[y·log(hθ(x)) + (1 − y)·log(1 − hθ(x))]

Here, hθ(x) is the predicted probability, and y is the actual class label (0 or 1).

  • If y = 1, the cost reduces to −log(hθ(x)). The closer hθ(x) is to 1, the smaller the cost.
  • If y = 0, the cost reduces to −log(1 − hθ(x)). The closer hθ(x) is to 0, the smaller the cost.

For the entire dataset of m examples, the overall cost function is the average of these individual costs:

J(θ) = −(1/m) · Σᵢ [y⁽ⁱ⁾·log(hθ(x⁽ⁱ⁾)) + (1 − y⁽ⁱ⁾)·log(1 − hθ(x⁽ⁱ⁾))]

This cost function is convex, ensuring a smooth “bowl-shaped” curve that gradient descent can optimize effectively.
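The log loss is only a few lines of Python. The `eps` clipping below is a common practical safeguard (an implementation detail, not part of the formula) that keeps log() finite when a prediction is exactly 0 or 1:

```python
import math

def log_loss(y_true, y_pred, eps=1e-15):
    """Average binary cross-entropy over the dataset."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite at p = 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Confident and correct predictions give a small loss;
# confident but wrong predictions are punished heavily.
print(log_loss([1, 0], [0.9, 0.1]))  # small
print(log_loss([1, 0], [0.1, 0.9]))  # large
```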

Optimization with Gradient Descent

The parameters θj in logistic regression are learned by minimizing the cost function. Gradient descent is the optimization algorithm commonly used for this purpose.

In gradient descent, parameters are updated iteratively according to the rule:

θj := θj − α · ∂J(θ)/∂θj

where α is the learning rate, a small positive number that controls the step size. By repeatedly updating the parameters in the opposite direction of the gradient, the algorithm converges to the point where the cost function is minimized.
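Putting the pieces together, here is a from-scratch sketch of batch gradient descent for a one-feature logistic model. The data, function name, and hyperparameters (alpha, epochs) are invented for illustration; in practice one would reach for a library such as scikit-learn:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, alpha=0.1, epochs=5000):
    """Batch gradient descent for a one-feature model h(x) = sigmoid(t0 + t1*x).

    For log loss, the gradient w.r.t. each parameter is the average of
    (prediction - label) times that parameter's input (1 for the intercept).
    """
    t0 = t1 = 0.0
    m = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(t0 + t1 * x) - y
            g0 += err
            g1 += err * x
        # Step opposite the gradient, scaled by the learning rate alpha
        t0 -= alpha * g0 / m
        t1 -= alpha * g1 / m
    return t0, t1

# Toy separable data (hypothetical standardized feature values)
t0, t1 = train_logistic([-2.0, -1.5, -1.0, 1.0, 1.5, 2.0], [0, 0, 0, 1, 1, 1])
print(sigmoid(t0 + t1 * 2.0))   # close to 1
print(sigmoid(t0 + t1 * -2.0))  # close to 0
```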

Evaluating Logistic Regression: Performance Metrics

Once the model is trained, the next step is to evaluate how well it performs. Unlike regression tasks, where metrics such as R-squared are used, classification problems require different evaluation tools.

The foundation of classification metrics is the confusion matrix, which summarizes predictions into four categories: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). From this matrix, we derive performance metrics.

Confusion Matrix — Image by Author

Accuracy measures the proportion of correct predictions. While simple, accuracy can be misleading when the dataset is imbalanced. For instance, if 95% of emails are not spam, a model that always predicts “not spam” will achieve 95% accuracy yet fail at identifying actual spam.

To address such limitations, precision and recall are used. Precision answers the question: “Of all emails predicted as spam, how many were actually spam?” Recall answers: “Of all actual spam emails, how many did the model correctly detect?” In some applications, such as spam detection, precision is more important, while in medical diagnosis, recall often takes priority.

A single metric that balances precision and recall is the F1 score, which is the harmonic mean of the two. In more flexible situations, the Fβ score allows adjusting the emphasis on precision versus recall depending on the problem.
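These metrics fall straight out of the confusion-matrix counts. The function name and the toy spam run below are illustrative:

```python
def classification_metrics(y_true, y_pred):
    """Precision, recall, and F1 from raw predictions (positive class = 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy spam run: 3 actual spam emails, the model flags 2, one flag is wrong.
precision, recall, f1 = classification_metrics([1, 1, 1, 0, 0], [1, 0, 0, 1, 0])
print(precision, recall, f1)  # 0.5, 1/3, 0.4
```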

Extending Logistic Regression to Multi-Class Classification

So far, we have focused on binary classification. But what if we have more than two categories, such as classifying fruit into apples, bananas, and oranges? Logistic regression can be extended to multi-class problems using the One-Versus-Rest (OVR) approach.

In OVR, a separate binary classifier is trained for each class. Each classifier predicts the probability that a data point belongs to its assigned class versus all other classes combined. At prediction time, the input is passed through all classifiers, and the class with the highest probability is chosen as the final output.

This simple yet effective method allows logistic regression to handle multi-class classification problems efficiently.
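A toy sketch of OVR prediction, with three hand-set scoring functions standing in for trained binary classifiers (the parameters are invented for illustration, not learned from data):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hand-set toy scorers over a single feature x, one per fruit class.
# Each returns the probability "this class vs. all the rest".
classifiers = {
    "apple":  lambda x: sigmoid(-1.0 - 3.0 * x),      # likely when x is small
    "banana": lambda x: sigmoid(1.0 - 3.0 * abs(x)),  # likely when x is near 0
    "orange": lambda x: sigmoid(-1.0 + 3.0 * x),      # likely when x is large
}

def ovr_predict(x):
    """Run every binary classifier and return the highest-probability class."""
    return max(classifiers, key=lambda label: classifiers[label](x))

print(ovr_predict(-1.0))  # apple
print(ovr_predict(0.0))   # banana
print(ovr_predict(1.0))   # orange
```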

Applications of Logistic Regression

Logistic regression, though one of the oldest algorithms in machine learning, remains widely used due to its simplicity, interpretability, and efficiency. Some common applications include:

  • Predicting whether a patient has a disease (medical diagnosis)
  • Predicting Employee Attrition (Human Resources)
  • Determining whether a customer will default on a loan (credit scoring)
  • Fraud Detection for Insurance Claims (Insurance)
  • Identifying spam emails (natural language processing)
  • Analyzing customer churn in marketing

Its interpretability makes logistic regression particularly valuable in domains like healthcare and finance, where understanding why a model made a decision is as important as the decision itself.

Conclusion

Logistic regression is a cornerstone algorithm in machine learning, bridging the gap between regression and classification. By applying the sigmoid function to the linear regression model, it transforms unbounded outputs into probabilities between 0 and 1, making it perfectly suited for classification problems. The log loss cost function ensures convex optimization, and gradient descent provides an efficient way to learn parameters.

Performance evaluation in logistic regression goes beyond accuracy, with precision, recall, and F1 scores playing crucial roles depending on the use case. Finally, logistic regression is not limited to binary classification: with techniques like One-Versus-Rest, it scales gracefully to multi-class settings.

Simple, interpretable, and mathematically elegant, logistic regression remains one of the most powerful tools for solving classification problems, even in the era of deep learning.


Published via Towards AI



Note: Content contains the views of the contributing authors and not Towards AI.