
Classification Models as Detectives: Solving Mysteries with LDA, QDA, and Naive Bayes
Last Updated on May 13, 2025 by Editorial Team
Author(s): Ashwin Biju Alikkal
Originally published on Towards AI.

✒️ Introduction
There's something about ML models that reminds me of detective stories.
Instead of jumping straight to "Who did it?" like most classifiers, these models say, "Let's understand how each suspect behaves first." They analyze the world from each class's perspective and then, using Bayes' Theorem, decide where a new observation belongs.
This post is about my understanding of how models like Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), and Naive Bayes (NB) work, not just the math, but the mindset behind them. I'll also add some examples that made things click for me.
Here's what we will cover:
🧩 Index:
- Bayes' Theorem
- Linear Discriminant Analysis (LDA)
- Quadratic Discriminant Analysis (QDA)
- Naive Bayes Classifier
- Which model to use and when?
1. Bayes' Theorem
Before we jump into the models, let's rewind to one of the most beautiful pieces of logic in probability: Bayes' Theorem.
At its core, Bayes' Theorem allows us to reverse probabilities: to go from "likelihood of data given class" to "likelihood of class given data."
The classic form of Bayes' Theorem is:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

which, in a classification setting with K classes, we can write as:

$$\Pr(Y = k \mid X = x) = \frac{\pi_k f_k(x)}{\sum_{l=1}^{K} \pi_l f_l(x)}$$

Here, π_k = Pr(Y = k) is the prior probability of class k, and f_k(x) is the density of X for an observation from class k. The left-hand side, often written p_k(x), is the posterior probability that the observation belongs to class k.
Now, there is the concept of the Bayes classifier: a theoretical model that assigns a new observation x to the class k with the highest posterior probability p_k(x).
In the models that follow (LDA, QDA, and Naive Bayes), we will plug different estimates of f_k(x) into the formula above to approximate the Bayes classifier.
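Before the detective work begins, here is a minimal sketch of that computation in Python. Everything in it is invented for illustration (the priors, the one-dimensional Gaussian class densities, and the observation x), not taken from any real dataset:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical setup: three classes, one feature.
priors = np.array([0.5, 0.3, 0.2])   # pi_k, assumed known
means = np.array([0.0, 2.0, 4.0])    # per-class density parameters
sds = np.array([1.0, 1.0, 1.0])

x = 1.5  # new observation

# f_k(x): likelihood of x under each class's density
likelihoods = norm.pdf(x, loc=means, scale=sds)

# Bayes' theorem: posterior is prior times likelihood,
# normalized by the sum over all classes (the denominator).
posteriors = priors * likelihoods
posteriors /= posteriors.sum()

print(posteriors)  # p_k(x) for k = 0, 1, 2
print("Bayes classifier picks class", posteriors.argmax())
```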
🕵️ Detective Setup: The Jewel Heist
A priceless ruby has been stolen from a mansion during a stormy night. The power was out. There are three suspects:
- Alice, the sophisticated housekeeper
- Bob, the moody gardener
- Cara, the mysterious art curator
Clues found at the scene: a muddy footprint, a half-eaten apple, and a red glove.
Let's now see how the detective solves the mystery using different styles.
2. Linear Discriminant Analysis (LDA)
Linear Discriminant Analysis (LDA) is a classification algorithm that models how each class generates the data and then uses Bayes' Theorem to assign new observations to the most probable class.
We know that the Bayes classifier needs us to compute:

$$p_k(x) = \frac{\pi_k f_k(x)}{\sum_{l=1}^{K} \pi_l f_l(x)}$$

In LDA, we assume that each class k has a multivariate normal (Gaussian) distribution (p is the number of predictors):

$$f_k(x) = \frac{1}{(2\pi)^{p/2} \lvert \Sigma \rvert^{1/2}} \exp\!\left(-\frac{1}{2}(x - \mu_k)^\top \Sigma^{-1} (x - \mu_k)\right)$$
Here, LDA learns a separate mean vector for each class but uses a single shared covariance matrix. This assumption simplifies the math beautifully.
Now, when we plug the Gaussian density function into Bayes' formula and do a little algebra (dropping constant terms), we arrive at the discriminant function.
(MATHEMATICAL PROOF COMING UP! BRACE YOURSELVES.)
Since the denominator of Bayes' formula is the same for every class, maximizing p_k(x) is the same as maximizing π_k f_k(x), or equivalently its logarithm:

$$\log\big(\pi_k f_k(x)\big) = \log \pi_k - \frac{1}{2}(x - \mu_k)^\top \Sigma^{-1} (x - \mu_k) + \text{const}$$

Expanding the quadratic form, the piece that is quadratic in x alone does not depend on k (because Σ is shared across classes), so it drops out as well. What remains is the discriminant function:

$$\delta_k(x) = x^\top \Sigma^{-1} \mu_k - \frac{1}{2}\mu_k^\top \Sigma^{-1} \mu_k + \log \pi_k$$

We assign x to the class with the largest δ_k(x).
The word "Linear" in Linear Discriminant Analysis comes from the fact that the discriminant functions δ_k(x) are linear in x. Consequently, the decision boundaries between the classes are linear as well.
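Here is a minimal LDA sketch with scikit-learn, on synthetic data chosen to match the LDA assumptions (two Gaussian classes, different means, shared covariance). All the numbers are illustrative; the manual δ_k computation at the end just mirrors the formula above:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two Gaussian classes: different means, same spread (shared covariance).
X = np.vstack([rng.normal([0, 0], 1.0, size=(100, 2)),
               rng.normal([2, 2], 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

lda = LinearDiscriminantAnalysis(store_covariance=True)
lda.fit(X, y)

x_new = np.array([1.0, 1.0])
print(lda.predict([x_new]), lda.predict_proba([x_new]))

# Recompute the discriminant delta_k(x) by hand from the fitted
# class means, shared covariance, and priors; it is linear in x.
Sigma_inv = np.linalg.inv(lda.covariance_)
for k, (mu, prior) in enumerate(zip(lda.means_, lda.priors_)):
    delta = x_new @ Sigma_inv @ mu - 0.5 * mu @ Sigma_inv @ mu + np.log(prior)
    print(f"delta_{k}(x) = {delta:.3f}")
```

The class with the larger δ_k should agree with what lda.predict returns.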
🕵️ LDA Detective Style
The detective pulls out clean, organized notes.
"Let's assume all suspects leave behind similar types of messes: same level of dirt and noise. But their average behavior differs."
Using LDA, the detective assumes the suspects leave behind similarly messy clues (same covariance), but each has a unique pattern:
- Bob often leaves muddy boots,
- Alice drinks tea,
- Cara smells of paint thinner.
He plots past behavior and draws straight lines separating their known patterns.
The muddy footprint most closely matches Bob.
"He's not the only one who leaves mud, but he does it most often and most consistently."
LDA strength: Clean, linear separation based on average behavior.
3. Quadratic Discriminant Analysis (QDA)
If LDA is the calm and structured sibling, QDA is its wilder, more flexible counterpart.
Quadratic Discriminant Analysis (QDA) is also a classification algorithm, just like LDA, but it makes one key change: it drops the assumption that all classes share the same covariance matrix.
This single shift changes everything: it gives QDA the power to model non-linear decision boundaries.
QDA still assumes:
- the class-conditional density f_k(x) follows a multivariate normal distribution,
- but each class k now has its own covariance matrix Σ_k:

$$f_k(x) = \frac{1}{(2\pi)^{p/2} \lvert \Sigma_k \rvert^{1/2}} \exp\!\left(-\frac{1}{2}(x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k)\right)$$
Just like before, we substitute this f_k(x) into Bayes' Theorem, take logs, and drop the constants to get the discriminant function:

$$\delta_k(x) = -\frac{1}{2}(x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) - \frac{1}{2}\log \lvert \Sigma_k \rvert + \log \pi_k$$
Because this function includes quadratic terms in x, the decision boundaries are curved.
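A minimal QDA sketch, again with invented synthetic data, this time giving each class its own spread so that the curved boundary actually matters:

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(0)
# Class 0 is tight, class 1 is widely scattered: different covariances.
X = np.vstack([rng.normal([0, 0], 0.5, size=(100, 2)),
               rng.normal([2, 2], 2.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

qda = QuadraticDiscriminantAnalysis(store_covariance=True)
qda.fit(X, y)

# One covariance matrix per class, unlike LDA's single shared one.
print(qda.covariance_[0])
print(qda.covariance_[1])
print(qda.predict_proba([[1.0, 1.0]]))
```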
🕵️ QDA Detective Style
This time, the detective frowns. "Maybe each suspect creates their own brand of chaos."
Bob's messiness is all over the place: sometimes neat, sometimes a disaster. Alice is tidy. Cara's clues tend to scatter.
Using QDA, the detective models each suspect's unique pattern of behavior, with their own spread and irregularities. He realizes the muddy footprint + apple combo fits Bob's erratic patterns, especially when it's raining.
"Every suspect makes a different kind of mess. This one has Bob's signature randomness."
QDA strength: Flexible, curved boundaries that adapt to unique clue patterns.
4. Naive Bayes (NB)
If QDA is the rebel, Naive Bayes is the minimalist.
Naive Bayes says: "Why complicate things? Let's assume all features are independent given the class, and just multiply the probabilities."
Yes, it's a huge assumption, and it's almost never true. But surprisingly, it still works incredibly well in practice, especially when you have lots of features and limited data.
Instead of assuming a multivariate Gaussian, it simplifies the class-conditional probability using this bold idea: All features are conditionally independent given the class.
This turns the joint likelihood into a simple product of individual predictor densities.
$$f_k(x) = \prod_{j=1}^{p} f_{kj}(x_j)$$

where f_{kj} is the density of the j-th predictor within class k.
No covariance matrices, no complicated interactions: just simple, one-feature-at-a-time densities.
So unlike LDA and QDA, the predictors do not all have to follow a normal distribution; each predictor can follow its own distribution (Gaussian for one feature, Bernoulli or categorical for another).
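Here is a minimal Naive Bayes sketch using scikit-learn's GaussianNB, which models each feature with a per-class Gaussian (variants like BernoulliNB or MultinomialNB swap in different per-feature distributions). The data is synthetic and purely illustrative:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # five "clues" per observation
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # label driven by two features

nb = GaussianNB()
nb.fit(X, y)

# GaussianNB fits one mean per (class, feature) pair, treats each
# feature's density independently, and multiplies them with the prior.
print(nb.theta_.shape)         # (2 classes, 5 features)
print(nb.predict_proba(X[:2]))
```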
🕵️ Naive Bayes Detective Style
The detective checks his watch. "No time to connect the dots. Let's treat every clue separately."
He breaks the scene into pieces:
- Muddy footprint → likely Bob
- Red glove → likely Cara
- Half-eaten apple → likely Alice
He checks records and multiplies the odds.
Even though these clues may be connected, Naive Bayes ignores those connections. It says: "The highest combined match is Bob."
"Give me a footprint and a glove, and I'll give you a suspect."
NB strength: Fast, scalable decisions, especially useful when you have lots of clues (features) and need a quick call.
5. Which model to use and when?
At this point, you've met our three detective-inspired classifiers, each with their quirks, strengths, and strategies.
Here's a breakdown to help you decide which one to call upon in your own ML investigations:

| Model | Key assumption | Decision boundary | Use it when |
| --- | --- | --- | --- |
| LDA | Gaussian classes with one shared covariance matrix | Linear | Classes have similar spread, or you have relatively few observations |
| QDA | Gaussian classes, each with its own covariance matrix Σ_k | Quadratic (curved) | Classes have clearly different spreads and you have enough data to estimate each Σ_k |
| Naive Bayes | Features are conditionally independent given the class | Depends on the chosen per-feature densities | You have many features, limited data, and need speed |
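When in doubt, you can also let the data choose the detective. Here is a small cross-validation comparison on synthetic data; on your own problem, swap in your X and y:

```python
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Classes with different spreads, so the three models should differ.
X = np.vstack([rng.normal([0, 0], 0.5, size=(150, 2)),
               rng.normal([2, 2], 1.5, size=(150, 2))])
y = np.array([0] * 150 + [1] * 150)

models = {"LDA": LinearDiscriminantAnalysis(),
          "QDA": QuadraticDiscriminantAnalysis(),
          "Naive Bayes": GaussianNB()}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```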
🕵️ Detective Recap
- LDA is the detective with a rulebook that makes clean decisions assuming suspects leave similar types of mess.
- QDA is the improviser that reads each suspect's unique behavior and adapts with curved reasoning.
- Naive Bayes is the fast-and-furious type that takes each clue at face value, adds up the odds, and makes a quick call.
Every dataset has its own mystery. Your job isn't just to solve it, but to choose the right detective to do so. Sometimes, a simple approach like Naive Bayes wins. Other times, the structured LDA or flexible QDA helps you draw sharper boundaries. Whatever path you take, remember: the story your data tells depends on who you ask to read it.
Source and inspirations:
- ISLR Book (https://www.statlearning.com/) [MUST READ]
- StatQuest videos (https://www.youtube.com/@statquest) [BAM!]
Published via Towards AI