
Mastering Naive Bayes: Concepts, Math, and Python Code

Last Updated on December 9, 2025 by Editorial Team

Author(s): Jeet Mukherjee

Originally published on Towards AI.

Probability is impossible to ignore when learning Machine Learning. Naive Bayes is a Machine Learning algorithm that uses Bayes' theorem from probability theory as its foundation. It is primarily used to build classifiers, known as Naive Bayes classifiers. In this article, we will cover everything from the basics to the advanced.


There’ll be four sections in the article —

  1. Foundation — the probability theory essential for Naive Bayes.
  2. Theoretical Intuition — how the algorithm works.
  3. Mathematical Intuition — the math behind Naive Bayes.
  4. Code — an implementation of Naive Bayes in Python (using a problem statement from Deep-ML).

Foundation

Let’s start with conditional probability, the basic starting point for Naive Bayes. If you are new to probability, please refer to this first — https://medium.com/@themukherjee/a-beginners-guide-to-probability-a-quick-crash-course-b9426ea39c7b

Conditional Probability is the likelihood of an event happening given that another event has already occurred. This is denoted as P(A|B), often referred to as the Probability of A given B.

Formula of Conditional Probability —

P(A|B) = P(A∩B) / P(B)

The above formula gives the probability of an event A given that the event B has already occurred. Intuitively, conditioning on B shrinks the sample space to only those outcomes where B happened.
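As a quick sanity check, the formula can be verified on a single die roll (a hypothetical example, not from the article): let A be "the roll is even" and B be "the roll is greater than 3".

```python
outcomes = range(1, 7)
A = {n for n in outcomes if n % 2 == 0}   # even rolls: {2, 4, 6}
B = {n for n in outcomes if n > 3}        # rolls above 3: {4, 5, 6}

p_B = len(B) / 6                # P(B) = 1/2
p_A_and_B = len(A & B) / 6      # P(A ∩ B): {4, 6} → 1/3
p_A_given_B = p_A_and_B / p_B   # (1/3) / (1/2) = 2/3
print(p_A_given_B)
```

Knowing the roll is above 3 leaves only {4, 5, 6}, of which two are even, so conditioning on B indeed shrinks the sample space.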

Bayes’ Theorem is an extension of Conditional Probability. It is a mathematical formula that describes the probability of an event based on prior knowledge of conditions related to the event.

Formula of Bayes’ Theorem —

P(A|B) = P(B|A)P(A) / P(B)

A mathematical example for Bayes’ Theorem —

Q. There are three bags, each with 10 balls.

  • Bag 1 has 3 red balls
  • Bag 2 has 4 red balls
  • Bag 3 has 5 red balls

You pick one bag at random and then draw a ball.
The ball you drew is red.

What is the probability that this red ball came from Bag 1?

If you look closely, you will notice that in a sequence of two events, we already know the outcome of the second event and are trying to find the probability of the first.

  • Probability of picking each bag — 1/3 (as there are 3 bags)

  • Probability of getting a red ball out of each bag —

P(Getting a red ball out of bag 1) = 3/10, P(Getting a red ball out of bag 2) = 4/10, P(Getting a red ball out of bag 3) = 5/10

  • Total Probability of getting a Red ball — P(Red)

P(Red) = 1/3(3/10+4/10+5/10) = 12/30 = 2/5

  • Probability of getting a red ball from bag 1 —

P(Bag 1|Red) = P(Red|Bag 1) P(Bag 1) / P(Red) = [(3/10 * 1/3) / 2/5] = 1/4

The probability that this red ball came from Bag 1 is 25%.
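The worked example above can be replayed in a few lines of Python (a minimal sketch using only the numbers stated in the question, with exact fractions to avoid rounding):

```python
from fractions import Fraction

# Numbers from the question above
prior = Fraction(1, 3)  # P(Bag i) is the same for each of the 3 bags
red_given_bag = [Fraction(3, 10), Fraction(4, 10), Fraction(5, 10)]

# Total probability: P(Red) = Σᵢ P(Red | Bag i) P(Bag i)
p_red = sum(prior * p for p in red_given_bag)

# Bayes' theorem: P(Bag 1 | Red) = P(Red | Bag 1) P(Bag 1) / P(Red)
p_bag1_given_red = red_given_bag[0] * prior / p_red
print(p_bag1_given_red)  # 1/4
```

The code mirrors the two steps of the hand calculation: first the total probability of red, then Bayes' theorem to reverse the conditioning.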

Theoretical Intuition

In the previous section, we discussed a brief overview of Conditional Probability along with Bayes’ Theorem. Now, let’s learn how Naive Bayes works. Table 1 shows a sample dataset used to classify emails as spam or not spam. We’ll use this dataset to explain the Naive Bayes algorithm.

Table 1: Email Spam Dataset Table

The question the dataset poses is: given Contains “Free”, Contains “Win”, and Contains “Money”, can the email be labeled as Spam or Not Spam? To put it in probabilistic terms —

P(Label = Spam | Contains “Free” = Yes, Contains “Win” = Yes, Contains “Money” = Yes)

and we will also calculate P(Label = Not Spam | Contains “Free” = Yes, Contains “Win” = Yes, Contains “Money” = Yes)

So this is the probability of an email being Spam or Not Spam given Contains “Free” = Yes, Contains “Win” = Yes, and Contains “Money” = Yes. Looking closely, you will notice that this takes us back to Conditional Probability. We drop the Email ID column, as it does not help in classifying an email as Spam or Not Spam. The higher of the two probabilities, P(Spam) or P(Not Spam), decides how a new email is classified.

Let’s assume —

P(Spam) = P(A) and P(Contains “Free” = Yes, Contains “Win” = Yes, Contains “Money” = Yes) = P(B)

P(A|B) = P(B|A)P(A) / P(B) to calculate for Spam.

P(A′|B) = P(B|A′)P(A′) / P(B) to calculate for Not Spam.

Since the denominator is the same in both equations, we can drop it when comparing the two. The equations become proportionalities —

P(A|B) ∝ P(B|A)P(A) to calculate for Spam.

P(A′|B) ∝ P(B|A′)P(A′) to calculate for Not Spam.

Now let’s calculate probabilities one by one —

P(A) = P(Spam) = 3/6 [Refer to Table 1 for the data]

P(A′) = P(Not Spam) = 3/6

P(B|A) = P(Yes, Yes, Yes | Spam) = 0. If you refer to the table, you will notice there is no row where Contains “Free” = Yes, Contains “Win” = Yes, Contains “Money” = Yes, and Label = Spam. This makes sense: with several features, many exact combinations never appear in the training data, so the joint probability often comes out as 0. To avoid this, we assume the features are independent given the class (the “naive” assumption) and decompose the expression into —

P(Contains “Free” = Yes|Spam) P(Contains “Win” = Yes|Spam) P(Contains “Money” = Yes|Spam) P(Spam)

Now, let’s classify an email with Contains “Free” = Yes, Contains “Win” = Yes, and Contains “Money” = No.

And again let’s assume — P(Spam) = P(A) and P(Contains “Free” = Yes, Contains “Win” = Yes, Contains “Money” = No) = P(B)

P(Contains “Free” = Yes|Spam) = 2/3, P(Contains “Win” = Yes|Spam) = 2/3, P(Contains “Money” = No|Spam) = 1 − 1/3 = 2/3

P(Contains “Free” = Yes|Not Spam) = 1/3, P(Contains “Win” = Yes|Not Spam) = 0, P(Contains “Money” = No|Not Spam) = 1 − 1/3 = 2/3

Now, P(A|B) = 2/3 * 2/3 * 2/3 * 3/6 = 4/27 = 0.148

P(A`|B) = 1/3 * 0 * 2/3 * 3/6 = 0

As we can see, the probability of the new email being Spam is 0.148 and being Not Spam is 0. Since 0.148 > 0, the new email is classified as Spam.
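The classification above can be checked with a short script (a sketch using only the conditional probabilities stated in the text, kept as exact fractions):

```python
from fractions import Fraction as F

# Conditional probabilities taken from the text above
p_spam, p_not_spam = F(3, 6), F(3, 6)
likelihood_spam = F(2, 3) * F(2, 3) * F(2, 3)      # Free=Yes, Win=Yes, Money=No | Spam
likelihood_not_spam = F(1, 3) * F(0, 1) * F(2, 3)  # same features | Not Spam

score_spam = likelihood_spam * p_spam              # (8/27) * (1/2) = 4/27
score_not_spam = likelihood_not_spam * p_not_spam  # 0

label = "Spam" if score_spam > score_not_spam else "Not Spam"
print(float(score_spam), label)
```

Note how the single zero factor, P(Win = Yes|Not Spam) = 0, wipes out the entire Not Spam score; this is exactly the problem that Laplace smoothing addresses in the code section later.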

Mathematical Intuition

Now we’ll deep dive into the mathematical intuition of Naive Bayes.

X = {X₁, X₂, X₃, …, Xₙ}

C = {C₁, C₂, C₃, …, Cₖ}

Here, X represents the independent features from the dataset, and C represents the set of class values from the target column (also known as the dependent column), with Cₖ denoting the k-th class.

Now to calculate the probability of Cₖ given X—

P(Cₖ | X) = P(X | Cₖ) P(Cₖ) / P(X)

Since P(X) in the denominator is the same for every class, we can drop it when comparing classes. Also, from the definition of Conditional Probability, we have —

P(A|B) = P(A, B)/P(B), which further gives us → P(A|B)P(B) = P(A, B)

Now, with the denominator removed —

P(Cₖ | X) ∝ P(X | Cₖ) P(Cₖ) = P(X, Cₖ)

Which finally gives us —

P(Cₖ | X) ∝ P(X₁, X₂, X₃ … Xₙ, Cₖ)

Using the Chain Rule of Conditional Probability —

P(X₁, X₂, X₃ … Xₙ, Cₖ) = P( X₁ | X₂, X₃ … Xₙ, Cₖ) P(X₂ | X₃ … Xₙ, Cₖ) … P(Xₙ₋₁ | Xₙ, Cₖ) P(Xₙ | Cₖ)

Now, as we saw earlier, a term like P(X₁ | X₂, X₃ … Xₙ, Cₖ) can come out as 0, because the exact combination of feature values may never appear in the training data. So we make a naive assumption here: the features are conditionally independent given the class.

Applying the conditional independence assumption —

P(Cₖ | X) ∝ P(X₁|Cₖ)P(X₂|Cₖ)P(X₃|Cₖ) … P(Xₙ|Cₖ)P(Cₖ)

So, the final prediction rule is: pick the class Cₖ that maximizes P(Cₖ) P(X₁|Cₖ) P(X₂|Cₖ) … P(Xₙ|Cₖ).

If you remember, in the theoretical section we calculated P(Contains “Free” = Yes|Spam) P(Contains “Win” = Yes|Spam) P(Contains “Money” = Yes|Spam) P(Spam), and the class with the highest such score is used as the predicted value.

Code Implementation

We’ll use a problem statement from Deep-ML. It will help us implement Bernoulli Naive Bayes, in which the features take binary values. For example —

Contains “Win” or Contains “Free” can only take the values 0 or 1. This is something we have already seen in our Email Spam example.

Figure 1 represents the problem statement from Deep-ML.

Figure 1: The Problem Statement

The solution contains one forward function that performs the training and one predict function that makes predictions.

import numpy as np

class NaiveBayes:
    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing
        self.class_log_prior_ = None
        self.feature_log_prob_ = None
        self.feature_log_prob_neg_ = None
        self.classes_ = None

    def forward(self, X, y):
        """
        Train Bernoulli Naive Bayes.
        X: 2D binary matrix (n_samples, n_features)
        y: 1D array of class labels (0 or 1)
        """
        X = np.asarray(X)
        y = np.asarray(y)
        self.classes_ = np.unique(y)

        n_samples, n_features = X.shape
        smoothing = self.smoothing

        # Count occurrences of each class
        class_counts = np.array([(y == c).sum() for c in self.classes_])
        total_samples = len(y)

        # Compute log priors: log(P(Y=c))
        self.class_log_prior_ = np.log(class_counts / total_samples)

        # Compute feature probabilities for Bernoulli NB:
        # P(x_j=1 | y=c) for each class c and feature j
        feature_prob = np.zeros((len(self.classes_), n_features))

        for idx, c in enumerate(self.classes_):
            X_c = X[y == c]  # samples belonging to class c
            count_ones = X_c.sum(axis=0)

            # Laplace smoothing
            feature_prob[idx] = (count_ones + smoothing) / (X_c.shape[0] + 2 * smoothing)

        # Store log probabilities
        self.feature_log_prob_ = np.log(feature_prob)          # log P(x=1|c)
        self.feature_log_prob_neg_ = np.log(1 - feature_prob)  # log P(x=0|c)

    def predict(self, X):
        """
        Predict labels for test matrix X.
        Returns binary labels (0 or 1).
        """
        X = np.asarray(X)

        # Work in log space for numerical stability:
        # log(P(Y=c)) + Σ[ x_j * log(P(x_j=1|c)) + (1-x_j) * log(P(x_j=0|c)) ]
        log_probs = []
        for idx, c in enumerate(self.classes_):
            log_likelihood = (X * self.feature_log_prob_[idx] +
                              (1 - X) * self.feature_log_prob_neg_[idx])
            total = self.class_log_prior_[idx] + log_likelihood.sum(axis=1)
            log_probs.append(total)

        log_probs = np.vstack(log_probs).T  # shape: (n_samples, n_classes)

        # Pick the class with the highest posterior probability
        best = np.argmax(log_probs, axis=1)
        return self.classes_[best]

Even though the code looks dense, its flow is straightforward. The forward function estimates two kinds of probabilities: the prior of each class (how often each class appears in the training data) and, for each feature, the smoothed probability that the feature equals 1 within each class.

The predict function then uses the learned log-probabilities to classify new data points.
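To see these two steps in action, here is a minimal sketch that replays the same training and scoring computations on hypothetical binary data (the feature rows and labels below are made up for illustration; they are not the rows of Table 1):

```python
import numpy as np

# Hypothetical binary feature matrix [Free, Win, Money] and labels
# (1 = Spam, 0 = Not Spam), made up for illustration
X = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 0],
              [1, 0, 0],
              [0, 0, 1],
              [0, 1, 1]])
y = np.array([1, 1, 1, 0, 0, 0])

smoothing = 1.0
classes = np.unique(y)

# Log priors: log P(Y=c), as computed in forward()
log_prior = np.log(np.array([(y == c).mean() for c in classes]))

# Smoothed Bernoulli parameters P(x_j = 1 | c), as computed in forward()
p1 = np.array([(X[y == c].sum(axis=0) + smoothing) /
               ((y == c).sum() + 2 * smoothing) for c in classes])

# Score a new email (Free=Yes, Win=Yes, Money=No) in log space, as in predict()
x_new = np.array([1, 1, 0])
scores = log_prior + (x_new * np.log(p1) + (1 - x_new) * np.log(1 - p1)).sum(axis=1)

prediction = classes[np.argmax(scores)]
print(prediction)  # 1, i.e. Spam
```

Thanks to the smoothing, no conditional probability is exactly 0 or 1, so every log term is finite and the two class scores can always be compared.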

Conclusion

By understanding the theory, the mathematics, and finally implementing it from scratch, you now have a complete picture of how Naive Bayes works end-to-end. Whether you’re preparing for interviews, building ML projects, or exploring deeper models in the future, mastering Naive Bayes gives you a strong foundation in probabilistic thinking and machine learning fundamentals.


Published via Towards AI

