Standard Deviation For Dummies
Last Updated on November 14, 2024 by Editorial Team
Author(s): Igor Novikov
Originally published on Towards AI.
I bet you've heard about standard deviation, but what does it mean?
Here is an explanation that even your dog can understand.
Standard deviation is closely related to variance. Variance is, unsurprisingly enough, a measure of how much the values in a dataset vary. It shows how different (or similar) the items in a group are. For example, on average, a man's height is 174 centimeters. But if you stop a random dude on the street, his height is likely to be different. So, for your neighborhood, if you stop a certain number of dudes (say, ten), you can calculate the variance of their heights. It can be calculated like this (I'll explain the formula a bit later):
D(X) = sum((xi - average)²) / n,
where:
- n: number of elements (ten),
- xi: height of dude number i,
- average: the average of all the heights
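As a rough sketch, the formula above can be written as a small Python function. The function name `variance` and the toy numbers are made up for illustration; they are not part of the original article.

```python
def variance(values):
    """Variance as defined above: the mean of squared differences from the average."""
    n = len(values)
    average = sum(values) / n
    return sum((x - average) ** 2 for x in values) / n


# Toy data (hypothetical): the average is 2.5, the squared differences sum to 5
print(variance([1, 2, 3, 4]))  # 1.25
```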
So, let's assume the heights of the ten dudes we met are as follows:
1: 172
2: 163
3: 154
4: 181
5: 190 (lucky dude)
6: 170
7: 174 (average Joe)
8: 168
9: 178
10: 160
Now let's calculate the variance. First, the average of all values is:
average = (172 + 163 + 154 + 181 + 190 + 170 + 174 + 168 + 178 + 160)/10 = 171 centimeters
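If you want to check the arithmetic, here is a minimal sketch in Python using the ten heights listed earlier. The variable names are my own choice.

```python
# Heights of the ten dudes, in centimeters
heights = [172, 163, 154, 181, 190, 170, 174, 168, 178, 160]

average = sum(heights) / len(heights)
print(average)  # 171.0
```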
As you can see, dudes in our neighborhood are somewhat shorter than the country's average... Probably smoked too much in their teenage years or something. Now let's look at how different the men in our neighborhood are, that is, how much their heights vary. For that, we need the squared difference between each height and the average:
- (172 - 171)² = 1² = 1
- (163 - 171)² = (-8)² = 64
- (154 - 171)² = (-17)² = 289
- (181 - 171)² = 10² = 100
- (190 - 171)² = 19² = 361
- (170 - 171)² = (-1)² = 1
- (174 - 171)² = 3² = 9
- (168 - 171)² = (-3)² = 9
- (178 - 171)² = 7² = 49
- (160 - 171)² = (-11)² = 121
The difference between a particular dude's height and the average is called a deviation, and variance is built from these deviations. But why do we square them?
Well, we do that because otherwise we can get a negative number (if the current dude is shorter than average), and the negative and positive differences would cancel each other out when we add them up. For understanding variability that makes no sense, so we square each difference to make sure it is never negative.
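To see what goes wrong without squaring, here is a small Python sketch (variable names are mine) that adds up the raw differences for the ten heights. They cancel out to exactly zero, which would wrongly suggest there is no variability at all.

```python
heights = [172, 163, 154, 181, 190, 170, 174, 168, 178, 160]
average = sum(heights) / len(heights)

# Raw differences: some positive (taller dudes), some negative (shorter dudes)
differences = [h - average for h in heights]
print(differences)
print(sum(differences))  # 0.0

# After squaring, no term is negative, so nothing can cancel out
squared = [d ** 2 for d in differences]
print(min(squared) >= 0)  # True
```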
Now let's calculate the sum:
1 + 64 + 289 + 100 + 361 + 1 + 9 + 9 + 49 + 121 = 1004
And the variance = 1004/10 = 100.4.
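The whole calculation so far can be reproduced with a few lines of Python. This is only a sketch of the steps above, with variable names chosen for illustration.

```python
heights = [172, 163, 154, 181, 190, 170, 174, 168, 178, 160]
n = len(heights)
average = sum(heights) / n

# Squared difference between each height and the average
squared = [(h - average) ** 2 for h in heights]
total = sum(squared)
variance = total / n

print(total)     # 1004.0
print(variance)  # 100.4
```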
But what does it mean here? And what units does it use? Well, since we squared the differences, it doesn't use the original units (centimeters) but squared centimeters. And it is difficult to interpret because of that... So here comes the standard deviation: we simply take the square root of the variance (to sort of reverse the squaring):
Standard Deviation (std) = square_root(100.4) ≈ 10 centimeters
Now it's centimeters! Much easier to understand. Roughly speaking, it means that the height of a typical man in our neighborhood differs from the average (in either direction) by about 10 centimeters. That is only a typical figure: for a given specific dude the difference can be smaller or bigger.
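In Python, the standard library's `statistics` module already has functions for this. The sketch below assumes we treat the ten dudes as the whole group, so it uses the "population" versions (`pvariance`, `pstdev`), which divide by n like the formula in this article.

```python
import math
import statistics

heights = [172, 163, 154, 181, 190, 170, 174, 168, 178, 160]

variance = statistics.pvariance(heights)  # divides by n
std = math.sqrt(variance)

print(round(std, 2))                         # 10.02
print(round(statistics.pstdev(heights), 2))  # 10.02
```

Note that `statistics.variance` and `statistics.stdev` divide by n - 1 instead of n, so they give slightly larger numbers for the same data.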
So, standard deviation is defined as a measure of the amount of variation of the values of a variable about its mean. Now you can understand what that actually means.
The important property of standard deviation is that in a normal distribution, about 95% of values will be within 2 standard deviations of the mean. In our example, if heights are normally distributed, that means about 95% of the men in our neighborhood will have a height in the range of 171 ± 20 centimeters (2*10), that is, between 151 and 191 centimeters. And 99.7% (almost everybody) will be within 3 standard deviations, between 141 and 201 centimeters. That means that dudes taller than about 2 meters are very rare (and they are called outliers, because they lie outside of three standard deviations).
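These percentages can be checked numerically. The sketch below assumes heights follow a normal distribution with the rounded values from our example (mean 171, standard deviation 10), and uses the standard identity that the share within k standard deviations equals erf(k / √2). The helper name `fraction_within` is made up for illustration.

```python
import math

mean, std = 171, 10  # rounded values from the example


def fraction_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    low, high = mean - k * std, mean + k * std
    print(f"{k} std: {low} to {high} cm, {fraction_within(k):.1%}")
# 1 std: 161 to 181 cm, 68.3%
# 2 std: 151 to 191 cm, 95.4%
# 3 std: 141 to 201 cm, 99.7%
```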
Normal distribution (also called Gaussian) is the most important one for practical purposes. It is bell-shaped (see below), and many natural and social phenomena in real life are approximately normally distributed. Why? Largely because of the central limit theorem: a quantity that adds up many small independent effects (like height) tends to follow a bell curve. I guess the universe likes symmetry or something.
So hopefully now you can understand standard deviation better. Have fun!