
Classification Models as Detectives: Solving Mysteries with LDA, QDA, and Naive Bayes
Last Updated on May 13, 2025 by Editorial Team
Author(s): Ashwin Biju Alikkal
Originally published on Towards AI.

✒️ Introduction
There’s something about ML models that reminds me of detective stories.
Instead of jumping straight to "Who did it?" like most classifiers, these generative models say, "Let's understand how each suspect behaves first." They analyze the world from each class's perspective and then — using Bayes' Theorem — decide where a new observation belongs.
This post is about my understanding of how models like Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), and Naive Bayes (NB) work — not just the math, but the mindset behind them. And I'll add some examples that made things click for me.
Here’s what we will cover:
🧩 Index:
- Bayes' Theorem
- Linear Discriminant Analysis (LDA)
- Quadratic Discriminant Analysis (QDA)
- Naive Bayes Classifier
- Which model to use and when?
1. Bayes' Theorem
Before we jump into the models, let’s rewind to one of the most beautiful pieces of logic in probability: Bayes’ Theorem.
At its core, Bayes’ Theorem allows us to reverse probabilities — to go from “likelihood of data given class” to “likelihood of class given data.”
The formula for Bayes' Theorem is given below:

$$P(Y = k \mid X = x) = \frac{P(X = x \mid Y = k)\,P(Y = k)}{P(X = x)}$$

which, in classification notation, we can write as:

$$p_k(x) = P(Y = k \mid X = x) = \frac{\pi_k f_k(x)}{\sum_{l=1}^{K} \pi_l f_l(x)}$$

Here, $\pi_k$ is the prior probability that a randomly chosen observation comes from class $k$, and $f_k(x)$ is the density of $X$ for an observation from class $k$.
Now, there is the concept of the Bayes classifier: a theoretical model that assigns a new observation $x$ to the class $k$ with the highest posterior probability $p_k(x)$.
In the following models (LDA, QDA and Naive Bayes), we will use different estimates of $f_k(x)$ from the formula above to approximate the Bayes classifier.
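To make the formula concrete, here is a minimal numeric sketch of the Bayes classifier. All the priors and likelihood values below are made-up numbers for illustration, not estimates from any real data:

```python
# Hypothetical numbers: three classes (our suspects) with priors pi_k
# and class-conditional densities f_k(x) evaluated at one observation x.
priors = {"Alice": 0.3, "Bob": 0.5, "Cara": 0.2}
likelihoods = {"Alice": 0.10, "Bob": 0.40, "Cara": 0.05}  # f_k(x)

# Posterior p_k(x) = pi_k * f_k(x) / sum_l pi_l * f_l(x)
evidence = sum(priors[k] * likelihoods[k] for k in priors)
posteriors = {k: priors[k] * likelihoods[k] / evidence for k in priors}

# The Bayes classifier picks the class with the highest posterior.
prediction = max(posteriors, key=posteriors.get)
print(prediction)  # Bob (posterior 0.20 / 0.24 ≈ 0.83)
```

Note that the evidence term in the denominator is the same for every class, so the winner is simply the class with the largest $\pi_k f_k(x)$ — a fact LDA and QDA exploit below.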
🕵️ Detective Setup: The Jewel Heist
A priceless ruby has been stolen from a mansion during a stormy night. The power was out. There are three suspects:
- Alice, the sophisticated housekeeper
- Bob, the moody gardener
- Cara, the mysterious art curator
Clues found at the scene: a muddy footprint, a half-eaten apple, and a red glove.
Let’s now see how the detective solves the mystery using different styles.
2. Linear Discriminant Analysis (LDA)
Linear Discriminant Analysis (LDA) is a classification algorithm that models how each class generates the data and then uses Bayes’ Theorem to assign new observations to the most probable class.
Now we know that in the Bayes classifier, we need to compute:

$$p_k(x) = \frac{\pi_k f_k(x)}{\sum_{l=1}^{K} \pi_l f_l(x)}$$
In LDA, we assume that each class $k$ has a multivariate normal (Gaussian) distribution:

$$f_k(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(x - \mu_k)^{\top}\Sigma^{-1}(x - \mu_k)\right)$$
Here, LDA learns a separate mean vector for each class but uses a single shared covariance matrix. This assumption simplifies the math beautifully.
Now, when we plug the Gaussian density function into Bayes' formula and do a little algebra (dropping constant terms), we arrive at the discriminant function.
(MATHEMATICAL PROOF COMING UP! BRACE YOURSELVES.)

Taking the logarithm of the numerator $\pi_k f_k(x)$:

$$\log \pi_k f_k(x) = \log \pi_k - \frac{p}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma| - \frac{1}{2}(x - \mu_k)^{\top}\Sigma^{-1}(x - \mu_k)$$

Expanding the quadratic form:

$$(x - \mu_k)^{\top}\Sigma^{-1}(x - \mu_k) = x^{\top}\Sigma^{-1}x - 2\,x^{\top}\Sigma^{-1}\mu_k + \mu_k^{\top}\Sigma^{-1}\mu_k$$

The terms $x^{\top}\Sigma^{-1}x$, $\frac{p}{2}\log(2\pi)$, and $\frac{1}{2}\log|\Sigma|$ are identical for every class (this is where the shared covariance matters), so they drop out when comparing classes. What remains is the discriminant function:

$$\delta_k(x) = x^{\top}\Sigma^{-1}\mu_k - \frac{1}{2}\mu_k^{\top}\Sigma^{-1}\mu_k + \log \pi_k$$

We assign $x$ to the class with the largest $\delta_k(x)$.
The word "Linear" in Linear Discriminant Analysis comes from the fact that the discriminant functions $\delta_k(x)$ are linear in $x$. So we can also say that the decision boundaries between the classes are linear.
🕵️ LDA Detective Style
The detective pulls out clean, organized notes.
“Let’s assume all suspects leave behind similar types of messes — same level of dirt and noise. But their average behavior differs.”
Using LDA, the detective assumes the suspects leave behind similarly messy clues (same covariance), but each has a unique pattern:
- Bob often leaves muddy boots,
- Alice drinks tea,
- Cara smells of paint thinner.
He plots past behavior and draws straight lines separating their known patterns.
The muddy footprint most closely matches Bob.
“He’s not the only one who leaves mud — but he does it most often and most consistently.”
LDA strength: Clean, linear separation based on average behavior.
3. Quadratic Discriminant Analysis (QDA)
If LDA is the calm and structured sibling, QDA is its wilder, more flexible counterpart.
Quadratic Discriminant Analysis (QDA) is also a classification algorithm — just like LDA — but it makes one key change: It drops the assumption that all classes share the same covariance matrix.
This single shift changes everything: it gives QDA the power to model non-linear decision boundaries.
QDA still assumes:
- the class-conditional density f_k(x) follows a multivariate normal distribution.
- But each class $k$ now has its own covariance matrix $\Sigma_k$:

$$f_k(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma_k|^{1/2}} \exp\!\left(-\frac{1}{2}(x - \mu_k)^{\top}\Sigma_k^{-1}(x - \mu_k)\right)$$
Now, just like above, we substitute $f_k(x)$ into Bayes' Theorem — but this time the $\log|\Sigma_k|$ and $x^{\top}\Sigma_k^{-1}x$ terms differ across classes and no longer cancel, so we get the discriminant function:

$$\delta_k(x) = -\frac{1}{2}(x - \mu_k)^{\top}\Sigma_k^{-1}(x - \mu_k) - \frac{1}{2}\log|\Sigma_k| + \log \pi_k$$
Because this function includes quadratic terms in x, the decision boundaries are curved.
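A sketch of QDA on synthetic data where the two classes have deliberately different spreads (a tidy class and an erratic one), which is exactly the situation QDA is built for. The means and covariances are invented for illustration:

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(1)

# Class 0 is tidy (small, spherical covariance);
# class 1 is erratic (large, correlated covariance).
X = np.vstack([
    rng.multivariate_normal([0, 0], [[0.5, 0.0], [0.0, 0.5]], size=150),
    rng.multivariate_normal([3, 0], [[2.0, 0.8], [0.8, 2.0]], size=150),
])
y = np.array([0] * 150 + [1] * 150)

# QDA estimates a separate covariance matrix Sigma_k per class,
# so the fitted decision boundary is quadratic (curved) in x.
qda = QuadraticDiscriminantAnalysis()
qda.fit(X, y)

print(qda.predict([[0.1, 0.1], [3.2, -0.4]]))
```

Swap in `LinearDiscriminantAnalysis` on the same data and you would get a straight boundary instead; the extra flexibility of QDA comes at the cost of estimating $p(p+1)/2$ covariance parameters per class.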
🕵️ QDA Detective Style
This time, the detective frowns. “Maybe each suspect creates their own brand of chaos.”
Bob’s messiness is all over the place — sometimes neat, sometimes a disaster. Alice is tidy. Cara’s clues tend to scatter.
Using QDA, the detective models each suspect’s unique pattern of behavior — with their own spread and irregularities. He realizes the muddy footprint + apple combo fits Bob’s erratic patterns — especially when it’s raining.
“Every suspect makes a different kind of mess. This one has Bob’s signature randomness.”
QDA strength: Flexible, curved boundaries that adapt to unique clue patterns.
4. Naive Bayes (NB)
If QDA is the rebel, Naive Bayes is the minimalist.
Naive Bayes says: “Why complicate things? Let’s assume all features are independent given the class, and just multiply the probabilities.”
Yes, it’s a huge assumption — and it’s almost never true. But surprisingly, it still works incredibly well in practice. Especially when you have lots of features and limited data.
Instead of assuming a multivariate Gaussian, it simplifies the class-conditional probability using this bold idea: All features are conditionally independent given the class.
This turns the joint likelihood into a simple product of individual predictor densities:

$$f_k(x) = \prod_{j=1}^{p} f_{kj}(x_j)$$

where $f_{kj}$ is the density of the $j$-th predictor for observations in class $k$.
No covariance matrices, no complicated interactions — just simple, one-feature-at-a-time density.
So unlike LDA and QDA, the predictors need not be normally distributed — each predictor can follow its own distribution (Gaussian for continuous features, multinomial or Bernoulli for counts and binary features).
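Here is a sketch using Gaussian Naive Bayes, where each feature is modelled with its own one-dimensional Gaussian per class and no covariance between features is estimated. The two "clue" features and their class means are invented for illustration:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(2)

# Hypothetical clue features per observation, e.g.
# column 0: footprint muddiness, column 1: glove redness.
# Each feature is drawn independently per class (NB's assumption holds here).
X = np.vstack([
    rng.normal([1.0, 5.0], [0.5, 0.5], size=(100, 2)),  # class 0
    rng.normal([4.0, 1.0], [0.5, 0.5], size=(100, 2)),  # class 1
])
y = np.array([0] * 100 + [1] * 100)

# GaussianNB fits one mean and one variance per (class, feature) pair —
# no covariance matrix at all.
nb = GaussianNB()
nb.fit(X, y)

print(nb.predict([[1.1, 4.8], [3.9, 1.2]]))
```

With only two parameters per class per feature, NB needs far less data than QDA; that is the source of its robustness when features are many and samples are few.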
🕵️ Naive Bayes Detective Style
The detective checks his watch. “No time to connect the dots. Let’s treat every clue separately.”
He breaks the scene into pieces:
- Muddy footprint → likely Bob
- Red glove → likely Cara
- Half-eaten apple → likely Alice
He checks records and multiplies the odds.
Even though these clues may be connected, Naive Bayes ignores those connections. It says: “The highest individual match is Bob.”
“Give me a footprint and a glove — I’ll give you a suspect.”
NB strength: Fast, scalable decisions — especially useful when you have lots of clues (features) and need a quick call.
5. Which model to use and when?
At this point, you’ve met our three detective-inspired classifiers — each with their quirks, strengths, and strategies.
Here’s a breakdown to help you decide which one to call upon in your own ML investigations:

| Model | Key assumption | Decision boundary | Works best when |
| --- | --- | --- | --- |
| LDA | Gaussian classes with a shared covariance matrix | Linear | Observations are relatively few and classes have similar spread |
| QDA | Gaussian classes, each with its own covariance matrix | Quadratic (curved) | There is enough data to estimate a covariance matrix per class |
| Naive Bayes | Features are independent given the class | Depends on the per-feature densities | Features are many and data is limited |
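In practice, the honest way to choose is to let the data decide: fit all three and compare cross-validated accuracy. A sketch on one synthetic dataset (any of your own feature matrices would slot in for `X` and `y`):

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(3)

# Illustrative two-class data; replace with your own X, y.
X = np.vstack([
    rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=200),
    rng.multivariate_normal([2, 2], [[1.0, -0.5], [-0.5, 1.0]], size=200),
])
y = np.array([0] * 200 + [1] * 200)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "NB": GaussianNB(),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold CV accuracy
    print(f"{name}: {scores.mean():.3f}")
```

On this particular dataset the class covariances differ, so QDA tends to edge ahead; on your data the ranking may well flip, which is exactly why you run the comparison.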
🕵️ Detective Recap
- LDA is the detective with a rulebook that makes clean decisions assuming suspects leave similar types of mess.
- QDA is the improviser that reads each suspect’s unique behavior and adapts with curved reasoning.
- Naive Bayes is the fast-and-furious type that takes each clue at face value, adds up the odds, and makes a quick call.
Every dataset has its own mystery. Your job isn’t just to solve it, but to choose the right detective to do so. Sometimes, a simple approach like Naive Bayes wins. Other times, the structured LDA or flexible QDA helps you draw sharper boundaries. Whatever path you take, remember: the story your data tells depends on who you ask to read it.
Source and inspirations:
- ISLR Book (https://www.statlearning.com/) [MUST READ]
- StatQuest videos (https://www.youtube.com/@statquest) [BAM!]