
The Covariance and Correlation Clutter…

Last Updated on July 25, 2023 by Editorial Team

Author(s): Astha Puri

Originally published on Towards AI.

For the longest time, I remember being confused between these two devils — covariance and correlation. And the resemblance DID NOT help! 🙁

So here I am, writing my first post and making an attempt to simplify the massive world of data — and stats that come with it. I’ll try to keep my posts short and sweet. I hope they help the impatient newbies like me out there to stay motivated. So let's learn and crush the clutter!!

Okay, try to put yourself in the scenarios described below. How would you feel if:
1. you’re in a race but you’re not told how long it is. The deal is that every minute you’re told the distance you’ve covered.
2. your friend takes you to a concert and at the top of each hour, you’re told how much time has passed but not how long the concert is.
3. you’re writing an exam. You have to finish all the questions BUT you don’t know how long the exam is! (This one gives me the chills..)

Anyway, pretty scary, huh? Okay, let's get back to statistics. Maybe it wouldn’t sound so scary now!

So, covariance and correlation both tell us how two variables are related. Does an increase in one go along with an increase in the other?

1. Yes? Good. Then you say the two variables have a positive correlation and a positive covariance. For example, the more you drink water, the more you pee!
2. No? Okay then..maybe you have two variables that are completely independent of each other. For example — how many hours I sleep does not impact how much rain South Dakota gets.
3. Hang on! Is an increase in one variable decreasing the other? Voila! You got a negative correlation and a negative covariance. Umm let's see…say the more I eat, the thinner I get? Haha, I wish.

Anyway…so then why two different terms? Haven’t we got enough in this world to learn already?

Well, remember the three scary scenarios I gave above? What made them scary? I don’t mind giving an exam or participating in a race..just tell me how long each of them is, and I’ll be okay!

That is exactly the difference between covariance and correlation. Covariance values have no bounds, but correlation always sticks between -1 and 1.
So I could say A and B have a covariance of 20 or a covariance of 50. How do you judge that? There is no upper limit, and the number changes with the units the variables are measured in. So we know that A and B are moving together, but how strongly? That's where correlation comes in. I could tell you A and B have a correlation of 0.3 or 0.7 or any other value for that matter. But when I say this, you know 1 is the upper bound (and -1 the lower bound) of correlation, so it gives you a better picture of the strength of the relationship!
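To see the "no bound" point in action, here is a minimal sketch, assuming NumPy. The arrays `a` and `b`, the seed and the factor of 100 are made up for illustration. It measures the same data in two different units (think metres vs. centimetres) and compares covariance and correlation.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Hypothetical data: b moves together with a, plus some noise
a = rng.normal(size=1000)
b = a + rng.normal(size=1000)

def cov_and_corr(x, y):
    """Return the covariance and the correlation between x and y."""
    cov = np.cov(x, y)[0, 1]        # off-diagonal entry of the covariance matrix
    corr = np.corrcoef(x, y)[0, 1]  # off-diagonal entry of the correlation matrix
    return cov, corr

cov_small, corr_small = cov_and_corr(a, b)
cov_big, corr_big = cov_and_corr(a * 100, b * 100)  # same data, different units

print("original units:", cov_small, corr_small)
print("scaled units:  ", cov_big, corr_big)
```

The covariance gets multiplied by 100 × 100 just because the units changed, while the correlation stays the same and stays inside [-1, 1].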

Let's look at an example in Python:

First, we use a random number generator to generate arrays:
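The original snippet did not survive here, so what follows is only a sketch of what such a step could look like, assuming NumPy. The names `first` and `second`, the seed, the ranges and the array size are all my assumptions, not the post's actual values.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed so the run is repeatable

# First array: 10 random integers from 0 to 9 (upper bound is exclusive)
first = rng.integers(0, 10, size=10)

# Second array: each value is its partner in `first` plus a random bump of 1 to 4
second = first + rng.integers(1, 5, size=10)

print(first)
print(second)
```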

The way the arrays are generated, each value in the second array is built up from the corresponding value in the first array, so the two move together. Let's calculate the covariance matrix.
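A minimal sketch of this step, assuming NumPy's `np.cov` and the same made-up arrays as before (`first`, `second`, the seed and the ranges are assumptions, so the number it prints will not match the figure quoted in this post):

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed so the run is repeatable

# Hypothetical arrays: `second` is `first` plus a random bump
first = rng.integers(0, 10, size=10)
second = first + rng.integers(1, 5, size=10)

# np.cov treats each argument as one variable and returns a 2x2 matrix.
# By default it computes the sample covariance (divides by n - 1).
cov_matrix = np.cov(first, second)
print(cov_matrix)

# The covariance between the two arrays sits off the diagonal
cov_first_second = cov_matrix[0, 1]
print(cov_first_second)
```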

The diagonal of the matrix = covariance between each variable and itself (which is just that variable's variance). The off-diagonal values = covariance between the two variables.

So the covariance here is 3.15… There's not much use in 3.15 as an absolute number without context, right? All we know is that there is a positive relationship, but how strong?

Let's look at the correlation.
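Here is a sketch of how that could be done, assuming NumPy's `np.corrcoef` and made-up arrays (`first`, `second`, the seed and the ranges are my assumptions, not the post's data). It also checks by hand that correlation is just covariance divided by the two standard deviations.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed so the run is repeatable

# Hypothetical arrays: `second` is `first` plus a random bump
first = rng.integers(0, 10, size=10)
second = first + rng.integers(1, 5, size=10)

# Correlation matrix: 1s on the diagonal, the correlation off the diagonal
corr_matrix = np.corrcoef(first, second)
print(corr_matrix)

# Same number by hand: covariance rescaled by both standard deviations
cov = np.cov(first, second)[0, 1]
manual_corr = cov / (first.std(ddof=1) * second.std(ddof=1))
print(manual_corr)
```

Whatever value comes out, it has to land between -1 and 1, so you can read it like a score: the closer to 1, the more tightly the two arrays move together.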

Great…I hope you enjoyed the read 🙂 Au revoir, for now..


Published via Towards AI