
Mastering Naive Bayes: Concepts, Math, and Python Code

Last Updated on December 9, 2025 by Editorial Team

Author(s): Jeet Mukherjee

Originally published on Towards AI.

Probability is impossible to ignore when learning Machine Learning. Naive Bayes is a Machine Learning algorithm that uses Bayes' theorem from probability theory as its foundation. It is primarily used to build classifiers, known as Naive Bayes classifiers. In this article, we will cover everything from the basics to the advanced.


There’ll be four sections in the article —

  1. Foundation — the probability theory essential for Naive Bayes.
  2. Theoretical Intuition — how the algorithm works.
  3. Mathematical Intuition — the math behind Naive Bayes.
  4. Code — an implementation of Naive Bayes in Python (using a problem statement from Deep-ML).

Foundation

Let’s start with conditional probability, the basic starting point for Naive Bayes. If you are new to probability, please refer to this first — https://medium.com/@themukherjee/a-beginners-guide-to-probability-a-quick-crash-course-b9426ea39c7b

Conditional Probability is the likelihood of an event happening given that another event has already occurred. This is denoted as P(A|B), often referred to as the Probability of A given B.

Formula of Conditional Probability —

P(A|B) = P(A∩B) / P(B)

The above formula gives the probability of an event A given that the event B has already occurred. Intuitively, conditioning on B shrinks the sample space to only those outcomes where B happened.
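As a quick sanity check, the formula can be verified on a single die roll (a hypothetical example, not from the article): let A be "the roll is even" and B be "the roll is greater than 3".

```python
outcomes = range(1, 7)
A = {n for n in outcomes if n % 2 == 0}   # even rolls: {2, 4, 6}
B = {n for n in outcomes if n > 3}        # rolls above 3: {4, 5, 6}

p_B = len(B) / 6                # P(B) = 1/2
p_A_and_B = len(A & B) / 6      # P(A ∩ B): {4, 6} → 1/3
p_A_given_B = p_A_and_B / p_B   # (1/3) / (1/2) = 2/3
print(p_A_given_B)
```

Knowing the roll is above 3 leaves only {4, 5, 6}, of which two are even, so conditioning on B indeed shrinks the sample space.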

Bayes’ Theorem is an extension of Conditional Probability. It is a mathematical formula that describes the probability of an event based on prior knowledge of conditions related to the event.

Formula of Bayes’ Theorem —

P(A|B) = P(B|A)P(A) / P(B)

A mathematical example for Bayes’ Theorem —

Q. There are three bags, each with 10 balls.

  • Bag 1 has 3 red balls
  • Bag 2 has 4 red balls
  • Bag 3 has 5 red balls

You pick one bag at random and then draw a ball.
The ball you drew is red.

What is the probability that this red ball came from Bag 1?

If you look closely, you will notice that in a sequence of two events, we already know the outcome of the second event and are trying to find the probability of the first.

  • Probability of picking each bag — 1/3 (as there are 3 bags)

  • Probability of getting a red ball out of each bag —

P(Getting a red ball out of bag 1) = 3/10, P(Getting a red ball out of bag 2) = 4/10, P(Getting a red ball out of bag 3) = 5/10

  • Total Probability of getting a Red ball — P(Red)

P(Red) = 1/3(3/10+4/10+5/10) = 12/30 = 2/5

  • Probability of getting a red ball from bag 1 —

P(Bag 1|Red) = P(Red|Bag 1) P(Bag 1) / P(Red) = [(3/10 * 1/3) / 2/5] = 1/4

The probability that this red ball came from Bag 1 is 25%.
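The worked example above can be replayed in a few lines of Python (a minimal sketch using only the numbers stated in the question, with exact fractions to avoid rounding):

```python
from fractions import Fraction

# Numbers from the question above
prior = Fraction(1, 3)  # P(Bag i) is the same for each of the 3 bags
red_given_bag = [Fraction(3, 10), Fraction(4, 10), Fraction(5, 10)]

# Total probability: P(Red) = Σᵢ P(Red | Bag i) P(Bag i)
p_red = sum(prior * p for p in red_given_bag)

# Bayes' theorem: P(Bag 1 | Red) = P(Red | Bag 1) P(Bag 1) / P(Red)
p_bag1_given_red = red_given_bag[0] * prior / p_red
print(p_bag1_given_red)  # 1/4
```

The code mirrors the two steps of the hand calculation: first the total probability of red, then Bayes' theorem to reverse the conditioning.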

Theoretical Intuition

In the previous section, we discussed a brief overview of Conditional Probability along with Bayes’ Theorem. Now, let’s learn how Naive Bayes works. Table 1 shows a sample dataset used to classify emails as spam or not spam. We’ll use this dataset to explain the Naive Bayes algorithm.

Table 1: Email Spam Dataset Table

The question the dataset poses is: given Contains “Free”, Contains “Win”, and Contains “Money”, can the email be labeled as Spam or Not Spam? To put it in probabilistic terms —

P(Label = Spam | Contains “Free” = Yes, Contains “Win” = Yes, Contains “Money” = Yes)

and we will also calculate P(Label = Not Spam | Contains “Free” = Yes, Contains “Win” = Yes, Contains “Money” = Yes)

So this is the probability of an email being Spam or Not Spam given Contains “Free” = Yes, Contains “Win” = Yes, and Contains “Money” = Yes. Looking closely, you will notice that this takes us back to Conditional Probability. We drop the Email ID column, as it does not help in classifying an email as Spam or Not Spam. The higher of the two probabilities, P(Spam) or P(Not Spam), decides how a new email is classified.

Let’s assume —

P(Spam) = P(A) and P(Contains “Free” = Yes, Contains “Win” = Yes, Contains “Money” = Yes) = P(B)

P(A|B) = P(B|A)P(A) / P(B) to calculate for Spam.

P(A′|B) = P(B|A′)P(A′) / P(B) to calculate for Not Spam.

Since the denominator is the same in both equations, we can drop it when comparing the two. The equations become proportionalities —

P(A|B) ∝ P(B|A)P(A) to calculate for Spam.

P(A′|B) ∝ P(B|A′)P(A′) to calculate for Not Spam.

Now let’s calculate probabilities one by one —

P(A) = P(Spam) = 3/6 [Refer to Table 1 for the data]

P(A′) = P(Not Spam) = 3/6

P(B|A) = P(Yes, Yes, Yes | Spam) = 0. If you refer to the table, you will notice there is no row where Contains “Free” = Yes, Contains “Win” = Yes, Contains “Money” = Yes, and Label = Spam. This makes sense: with several features, many exact combinations never appear in the training data, so the joint probability often comes out as 0. To avoid this, we assume the features are independent given the class (the “naive” assumption) and decompose the expression into —

P(Contains “Free” = Yes|Spam) P(Contains “Win” = Yes|Spam) P(Contains “Money” = Yes|Spam) P(Spam)

Now, let’s classify an email with Contains “Free” = Yes, Contains “Win” = Yes, and Contains “Money” = No.

And again let’s assume — P(Spam) = P(A) and P(Contains “Free” = Yes, Contains “Win” = Yes, Contains “Money” = No) = P(B)

P(Contains “Free” = Yes|Spam) = 2/3, P(Contains “Win” = Yes|Spam) = 2/3, P(Contains “Money” = No|Spam) = 1 − 1/3 = 2/3

P(Contains “Free” = Yes|Not Spam) = 1/3, P(Contains “Win” = Yes|Not Spam) = 0, P(Contains “Money” = No|Not Spam) = 1 − 1/3 = 2/3

Now, P(A|B) = 2/3 * 2/3 * 2/3 * 3/6 = 4/27 = 0.148

P(A`|B) = 1/3 * 0 * 2/3 * 3/6 = 0

As we can see, the probability of the new email being Spam is 0.148 and being Not Spam is 0. Since 0.148 > 0, the new email is classified as Spam.
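The classification above can be checked with a short script (a sketch using only the conditional probabilities stated in the text, kept as exact fractions):

```python
from fractions import Fraction as F

# Conditional probabilities taken from the text above
p_spam, p_not_spam = F(3, 6), F(3, 6)
likelihood_spam = F(2, 3) * F(2, 3) * F(2, 3)      # Free=Yes, Win=Yes, Money=No | Spam
likelihood_not_spam = F(1, 3) * F(0, 1) * F(2, 3)  # same features | Not Spam

score_spam = likelihood_spam * p_spam              # (8/27) * (1/2) = 4/27
score_not_spam = likelihood_not_spam * p_not_spam  # 0

label = "Spam" if score_spam > score_not_spam else "Not Spam"
print(float(score_spam), label)
```

Note how the single zero factor, P(Win = Yes|Not Spam) = 0, wipes out the entire Not Spam score; this is exactly the problem that Laplace smoothing addresses in the code section later.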

Mathematical Intuition

Now we’ll deep dive into the mathematical intuition of Naive Bayes.

X = {X₁, X₂, X₃, …, Xₙ}

C = {C₁, C₂, C₃, …, Cₖ}

Here, X represents the independent features from the dataset, and C represents the set of class values from the target column (also known as the dependent column), with Cₖ denoting the k-th class.

Now to calculate the probability of Cₖ given X—

P(Cₖ | X) = P(X | Cₖ) P(Cₖ) / P(X)

Since P(X) in the denominator is the same for every class, we can drop it when comparing classes. Also, from the definition of Conditional Probability, we have —

P(A|B) = P(A, B)/P(B), which further gives us → P(A|B)P(B) = P(A, B)

Now, with the denominator removed —

P(Cₖ | X) ∝ P(X | Cₖ) P(Cₖ) = P(X, Cₖ)

Which finally gives us —

P(Cₖ | X) ∝ P(X₁, X₂, X₃ … Xₙ, Cₖ)

Using the Chain Rule of Conditional Probability —

P(X₁, X₂, X₃ … Xₙ, Cₖ) = P( X₁ | X₂, X₃ … Xₙ, Cₖ) P(X₂ | X₃ … Xₙ, Cₖ) … P(Xₙ₋₁ | Xₙ, Cₖ) P(Xₙ | Cₖ)

Now, as we saw earlier, a term like P(X₁ | X₂, X₃ … Xₙ, Cₖ) can come out as 0, because the exact combination of feature values may never appear in the training data. So we make a naive assumption here: the features are conditionally independent given the class.

Applying the conditional independence assumption —

P(Cₖ | X) ∝ P(X₁|Cₖ)P(X₂|Cₖ)P(X₃|Cₖ) … P(Xₙ|Cₖ)P(Cₖ)

So, the final prediction rule is: pick the class Cₖ that maximizes P(Cₖ) P(X₁|Cₖ) P(X₂|Cₖ) … P(Xₙ|Cₖ).

If you remember, in the theoretical section we calculated P(Contains “Free” = Yes|Spam) P(Contains “Win” = Yes|Spam) P(Contains “Money” = Yes|Spam) P(Spam), and the class with the highest such score is used as the predicted value.

Code Implementation

We’ll use a problem statement from Deep-ML. It will help us implement Bernoulli Naive Bayes, in which the features take binary values. For example —

Contains “Win” or Contains “Free” can only take the values 0 or 1. This is something we have already seen in our Email Spam example.

Figure 1 represents the problem statement from Deep-ML.

Figure 1: The Problem Statement

The solution contains one forward function that performs the training and one predict function that makes predictions.

import numpy as np

class NaiveBayes:
    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing
        self.class_log_prior_ = None
        self.feature_log_prob_ = None
        self.feature_log_prob_neg_ = None
        self.classes_ = None

    def forward(self, X, y):
        """
        Train Bernoulli Naive Bayes.
        X: 2D binary matrix (n_samples, n_features)
        y: 1D array of class labels (0 or 1)
        """
        X = np.asarray(X)
        y = np.asarray(y)
        self.classes_ = np.unique(y)

        n_samples, n_features = X.shape
        smoothing = self.smoothing

        # Count occurrences of each class
        class_counts = np.array([(y == c).sum() for c in self.classes_])
        total_samples = len(y)

        # Compute log priors: log(P(Y=c))
        self.class_log_prior_ = np.log(class_counts / total_samples)

        # Compute feature probabilities for Bernoulli NB:
        # P(x_j=1 | y=c) for each class c and feature j
        feature_prob = np.zeros((len(self.classes_), n_features))

        for idx, c in enumerate(self.classes_):
            X_c = X[y == c]  # samples belonging to class c
            count_ones = X_c.sum(axis=0)

            # Laplace smoothing
            feature_prob[idx] = (count_ones + smoothing) / (X_c.shape[0] + 2 * smoothing)

        # Store log probabilities
        self.feature_log_prob_ = np.log(feature_prob)          # log P(x=1|c)
        self.feature_log_prob_neg_ = np.log(1 - feature_prob)  # log P(x=0|c)

    def predict(self, X):
        """
        Predict labels for test matrix X.
        Returns binary labels (0 or 1).
        """
        X = np.asarray(X)

        # Work in log space for numerical stability:
        # log(P(Y=c)) + Σ[ x_j * log(P(x_j=1|c)) + (1-x_j) * log(P(x_j=0|c)) ]
        log_probs = []
        for idx, c in enumerate(self.classes_):
            log_likelihood = (X * self.feature_log_prob_[idx] +
                              (1 - X) * self.feature_log_prob_neg_[idx])
            total = self.class_log_prior_[idx] + log_likelihood.sum(axis=1)
            log_probs.append(total)

        log_probs = np.vstack(log_probs).T  # shape: (n_samples, n_classes)

        # Pick the class with the highest posterior probability
        best = np.argmax(log_probs, axis=1)
        return self.classes_[best]

Even though the code looks dense, its flow is straightforward. The forward function estimates two kinds of probabilities: the prior of each class (how often each class appears in the training data) and, for each feature, the smoothed probability that the feature equals 1 within each class.

The predict function then uses the learned log-probabilities to classify new data points.
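To see these two steps in action, here is a minimal sketch that replays the same training and scoring computations on hypothetical binary data (the feature rows and labels below are made up for illustration; they are not the rows of Table 1):

```python
import numpy as np

# Hypothetical binary feature matrix [Free, Win, Money] and labels
# (1 = Spam, 0 = Not Spam), made up for illustration
X = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 0],
              [1, 0, 0],
              [0, 0, 1],
              [0, 1, 1]])
y = np.array([1, 1, 1, 0, 0, 0])

smoothing = 1.0
classes = np.unique(y)

# Log priors: log P(Y=c), as computed in forward()
log_prior = np.log(np.array([(y == c).mean() for c in classes]))

# Smoothed Bernoulli parameters P(x_j = 1 | c), as computed in forward()
p1 = np.array([(X[y == c].sum(axis=0) + smoothing) /
               ((y == c).sum() + 2 * smoothing) for c in classes])

# Score a new email (Free=Yes, Win=Yes, Money=No) in log space, as in predict()
x_new = np.array([1, 1, 0])
scores = log_prior + (x_new * np.log(p1) + (1 - x_new) * np.log(1 - p1)).sum(axis=1)

prediction = classes[np.argmax(scores)]
print(prediction)  # 1, i.e. Spam
```

Thanks to the smoothing, no conditional probability is exactly 0 or 1, so every log term is finite and the two class scores can always be compared.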

Conclusion

By understanding the theory, the mathematics, and finally implementing it from scratch, you now have a complete picture of how Naive Bayes works end-to-end. Whether you’re preparing for interviews, building ML projects, or exploring deeper models in the future, mastering Naive Bayes gives you a strong foundation in probabilistic thinking and machine learning fundamentals.


Published via Towards AI

