
Statistics for Machine Learning A-Z

Last Updated on January 6, 2023 by Editorial Team

Last Updated on June 12, 2022 by Editorial Team

Author(s): Gencay I.

Originally published on Towards AI.

Briefly Explained

Photo by Dan-Cristian Pădureț on Unsplash
Contents
· Introduction
· Terms
∘ Numerical Variable
∘ Categorical Variable
∘ Continuous Variable
∘ Discrete Variables
∘ Dependent Variable:
∘ Independent Variable
∘ Observational Studies
∘ Experimental Studies
∘ Simple Random Sample
∘ Stratified Sample
∘ Placebo Effect
∘ Generalizability
∘ Histogram
∘ Dot plot
∘ Boxplot
∘ IQR
∘ Q3
∘ Q1
∘ Left skewed
∘ Right skewed
∘ Symmetric
∘ Mean
∘ Median
∘ Average
∘ Variance
∘ Standard deviation
∘ Mode
∘ Null Hypothesis
∘ Alternative Hypothesis
∘ P-Value
∘ Law of Large Numbers
∘ Mutually Exclusive (Disjoint)
∘ Non-disjoint
∘ Probability Trees
∘ Normal Distribution
∘ Binomial Distribution
∘ Bernoulli Distribution
∘ PDF (Probability Density Function)
∘ Z Score
∘ Percentiles
∘ Sampling Variability
∘ Central Limit Theorem
∘ Confidence Interval
∘ Significance Level
∘ Power
∘ Accuracy
∘ Precision
∘ Statistical Inference
∘ Type 1 Error
∘ Type 2 Error
∘ T Distribution
∘ Degrees of Freedom
· Conclusion

Introduction

Programming, Statistics, Calculus.

These are 3 things that you should be familiar with if you would like to be involved in Machine Learning.

Image by Author

While there are many courses on the market, I love writing this kind of article to remind myself of these terms.

It helps me refresh my memory through repetition.

“Repetition is the mother of learning, the father of action, which makes it the architect of accomplishment.” Zig Ziglar

Whether you are at the beginning of your Data Science or Machine Learning career or already experienced, this article will help you build a mental map of Statistics.

Let’s go through these terms, from beginner to intermediate level.

Terms

Numerical Variable

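A variable whose values are numbers, such as a count or a measurement.

Categorical Variable

Contains categories instead of numbers, such as human body shapes: skinny, fat, or muscular.

Continuous Variable

Can take any value within a given range, including fractions, such as a height of 1.72 m or a weight of 63.5 kg.

Discrete Variables

Can take only a specific, countable set of values, such as 1, 5, 8, 11, 35. The number of children in a family is an example.

Dependent Variable:

The outcome being measured. Its value is expected to change when the independent variable changes.

Independent Variable

The variable that is changed or controlled to see its effect on the dependent variable. Its value does not depend on the other variables in the study.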

Observational Studies

Researchers only observe and record; they do not assign any treatment. For example, when people are asked how they lost weight, no method such as diet or sport is prescribed, and each person reports whatever they actually did.

Experimental Studies

Researchers assign the treatment themselves. Each participant is placed in a group, for example diet or sport, and the outcomes of the groups are compared.

Simple Random Sample

Every member of the population has the same chance of being selected.

Stratified Sample

Split the population into groups of similar members (strata), then randomly sample from each stratum.
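Here is a minimal sketch of the two sampling schemes using only Python's standard library. The population, the group labels "A" and "B", and the 10% sampling rate are made up for illustration.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 80 people in group "A" and 20 in group "B".
population = [("A", i) for i in range(80)] + [("B", i) for i in range(20)]

# Simple random sample: every member has the same chance of being picked.
simple_sample = random.sample(population, 10)

# Stratified sample: split the population into strata, then sample each one.
strata = {}
for group, member_id in population:
    strata.setdefault(group, []).append((group, member_id))

stratified_sample = []
for group, members in strata.items():
    k = len(members) // 10  # proportional allocation: 10% of each stratum
    stratified_sample.extend(random.sample(members, k))

print(len(simple_sample), len(stratified_sample))
```

The simple random sample may contain any mix of the two groups, while the stratified sample always contains 8 members of "A" and 2 of "B".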

Placebo Effect

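Participants who receive a fake treatment (a placebo) may still report an improvement simply because they believe they are being treated.

Generalizability

Can the conclusions drawn from our sample be extended to the whole population?

Image by Author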

Histogram

It provides a useful view of data density.

https://statisticsbyjim.com/basics/histograms/

Dot plot

Useful when your sample size is small and you want to view individual data points.

https://r-coder.com/dot-plot-r/

Boxplot

It is good for seeing summary statistics such as the IQR and the median.

https://byjus.com/maths/box-plot/

IQR

Interquartile range: the range of the middle 50% of the data, Q3 - Q1.

Q3

75th percentile.

Q1

25th percentile.
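A small sketch of Q1, Q3, and the IQR with Python's `statistics.quantiles`. The data set is made up, and the "inclusive" method is only one of several quartile conventions; other conventions can give slightly different cut points.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up sample

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 3.0 5.0 7.0 4.0
```

The middle cut point, Q2, is the median.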

https://www.sigmamagic.com/blogs/how-to-interpret-skewness-and-kurtosis/

Left skewed

The tail is on the left side and most of the density is on the right; mean < median.

Right skewed

The tail is on the right side and most of the density is on the left; mean > median.

Symmetric

The mean and median are close together.
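To see how the tail pulls the mean away from the median, here is a short sketch with three made-up data sets. The numbers are illustrative only.

```python
import statistics

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]         # long tail on the right
left_skewed = [-20, -4, -3, -3, -3, -2, -2, -1]  # mirror image, tail on the left
symmetric = [1, 2, 3, 4, 5]

summary = {}
for name, values in [
    ("right", right_skewed),
    ("left", left_skewed),
    ("symmetric", symmetric),
]:
    summary[name] = (statistics.mean(values), statistics.median(values))
    print(name, "mean:", summary[name][0], "median:", summary[name][1])
```

The single extreme value moves the mean toward the tail, while the median barely changes.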

Mean

Arithmetic average.

Median

The number in the middle of the sorted data. With an even number of values, take the average of the two middle ones.

1, 5, 7; median: 5

Average

Sum the values and divide by how many there are:

(a + b) / 2, (a + b + c) / 3

Variance

The average squared deviation from the mean.

n: number of observations in the sample
Variance = ((x1 - mean)**2 + (x2 - mean)**2 + … + (xn - mean)**2) / (n - 1)

Standard deviation

The square root of the variance.

Standard deviation = Variance ** (1/2)
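The sample variance and standard deviation formulas can be written out directly in Python. This is a sketch on a made-up sample; the standard library's `statistics.variance` and `statistics.stdev` use the same (n - 1) divisor and should agree with it.

```python
import math
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample
n = len(sample)
mean = sum(sample) / n

# Sample variance: sum of squared deviations from the mean, divided by (n - 1).
variance = sum((x - mean) ** 2 for x in sample) / (n - 1)

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(variance, std_dev)
print(statistics.variance(sample), statistics.stdev(sample))
```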

Mode

The most frequent value.

1, 4, 4, 7; mode: 4
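For reference, the mean, median, and mode are all available in Python's `statistics` module. The lists below reuse the small examples from this article.

```python
import statistics

median_example = [1, 5, 7]
mode_example = [1, 4, 4, 7]

mean_value = statistics.mean(mode_example)        # (1 + 4 + 4 + 7) / 4
median_value = statistics.median(median_example)  # middle of the sorted values
mode_value = statistics.mode(mode_example)        # most frequent value

print(mean_value, median_value, mode_value)
```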

Null Hypothesis

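Nothing is going on; there is no effect or difference.

Alternative Hypothesis

Something is going on; there is an effect or a difference.

P-Value

The probability of observing data at least as extreme as yours, assuming the null hypothesis is true. It is not the probability that the null hypothesis is true.

If the p-value < 0.05 (the usual significance level), you reject the null hypothesis in favor of the alternative.

If the p-value > 0.05, you fail to reject the null hypothesis. This does not prove that the null hypothesis is true.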
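As a rough illustration, here is an exact one-sided p-value for a coin-flip experiment. The observation (16 heads in 20 flips) and the helper name `binom_pmf` are made up for this sketch; the null hypothesis is that the coin is fair.

```python
from math import comb

n_flips = 20
observed_heads = 16  # hypothetical observation

def binom_pmf(k, n, p):
    """Probability of exactly k heads in n flips with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# One-sided p-value: probability of a result at least as extreme as the
# observed one, assuming the null hypothesis (p = 0.5) is true.
p_value = sum(binom_pmf(k, n_flips, 0.5) for k in range(observed_heads, n_flips + 1))

alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(p_value, 4), decision)
```

A fair coin would produce 16 or more heads in well under 5% of such experiments, so this result leads to rejecting the null hypothesis at the 0.05 level.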

Law of Large Numbers

As the sample size increases, the sample mean gets closer to the population mean.
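A quick simulation sketch of this idea, assuming a fair six-sided die whose population mean is 3.5. The sample sizes and the seed are arbitrary choices.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

population_mean = 3.5  # mean of a fair six-sided die

def sample_mean(n):
    """Average of n simulated die rolls."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

for n in (10, 1_000, 100_000):
    print(n, sample_mean(n))
```

The printed averages should drift toward 3.5 as n grows, although any single run can wobble.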

Mutually Exclusive ( Disjoint)

Events that cannot happen at the same time.

Image by Author

Non-disjoint

Events that can happen at the same time.

Image by Author
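The difference matters when adding probabilities. The sketch below uses one roll of a fair die and three made-up events: for disjoint events the probabilities simply add, while for non-disjoint events the overlap has to be subtracted once.

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # one roll of a fair die

def prob(event):
    """Probability of an event given equally likely outcomes."""
    return Fraction(len(event & outcomes), len(outcomes))

even = {2, 4, 6}
odd = {1, 3, 5}
greater_than_3 = {4, 5, 6}

# Disjoint: no shared outcomes, so P(A or B) = P(A) + P(B).
p_even_or_odd = prob(even) + prob(odd)

# Non-disjoint: P(A or B) = P(A) + P(B) - P(A and B).
p_even_or_gt3 = prob(even) + prob(greater_than_3) - prob(even & greater_than_3)

print(p_even_or_odd, p_even_or_gt3)  # 1 2/3
```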

Probability Trees

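A diagram in which each branch is a possible outcome labeled with its probability. Multiply along the branches to get the probability of a sequence of events.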

https://www.mathsisfun.com/data/probability-tree-diagrams.html
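A probability tree can be sketched as nested dictionaries. The example is hypothetical: a bag with 3 red and 2 blue marbles, and two draws without replacement. Probabilities are multiplied along a path and added across paths.

```python
from fractions import Fraction

# First-level branches: color of the first marble.
first = {"R": Fraction(3, 5), "B": Fraction(2, 5)}

# Second-level branches: color of the second marble, given the first.
second_given_first = {
    "R": {"R": Fraction(2, 4), "B": Fraction(2, 4)},
    "B": {"R": Fraction(3, 4), "B": Fraction(1, 4)},
}

# Multiply along each branch to get the probability of that path.
paths = {
    a + b: first[a] * second_given_first[a][b]
    for a in first
    for b in second_given_first[a]
}

# Add across paths to combine them.
p_one_of_each = paths["RB"] + paths["BR"]

print(paths["RR"], p_one_of_each)  # 3/10 3/5
```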

Normal Distribution

A continuous probability distribution with a bell shape that is symmetric around the mean.

https://www.statology.org/the-normal-distribution/
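Python's `statistics.NormalDist` makes it easy to check the symmetry and the familiar shares within 1, 2, and 3 standard deviations. The mean of 170 and standard deviation of 10 (say, heights in cm) are assumptions for this sketch.

```python
from statistics import NormalDist

heights = NormalDist(mu=170, sigma=10)  # hypothetical heights in cm

# Symmetry: points equally far below and above the mean have the same density.
density_below = heights.pdf(160)
density_above = heights.pdf(180)

def share_within(k):
    """Share of the distribution within k standard deviations of the mean."""
    return heights.cdf(170 + k * 10) - heights.cdf(170 - k * 10)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

The three shares come out at roughly 68%, 95%, and 99.7%.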

Binomial Distribution

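The distribution of the number of successes in n independent trials, where each trial has only 2 possible outcomes (success or failure, like heads or tails) and the same probability of success.

Bernoulli Distribution

A discrete distribution for a single trial with 2 possible outcomes, 0 or 1. It is the binomial distribution with n = 1.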

https://www.statology.org/bernoulli-vs-binomial/
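The relationship between the two can be sketched in a few lines. The function names `binomial_pmf` and `bernoulli_pmf` are chosen for this example and are not from any library.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def bernoulli_pmf(k, p):
    """A Bernoulli trial is a binomial with n = 1; k is 0 or 1."""
    return binomial_pmf(k, 1, p)

# Probability of exactly 2 heads in 3 fair coin flips.
two_heads_in_three = binomial_pmf(2, 3, 0.5)

# Probability of success in a single trial with p = 0.3.
single_success = bernoulli_pmf(1, 0.3)

print(two_heads_in_three, single_success)
```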

PDF (Probability Density Function)

For a continuous random variable, the area under this function over a range gives the probability that the variable falls within that range.
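To make the "area under the curve" idea concrete, this sketch integrates the standard normal density numerically and compares the result with the cumulative distribution function. The range (-1, 1) and the step count are arbitrary choices.

```python
from statistics import NormalDist

dist = NormalDist(mu=0, sigma=1)
low, high = -1.0, 1.0

# Area under the density between low and high (trapezoidal rule).
steps = 10_000
width = (high - low) / steps
area = sum(
    (dist.pdf(low + i * width) + dist.pdf(low + (i + 1) * width)) / 2 * width
    for i in range(steps)
)

# The same probability from the cumulative distribution function.
exact = dist.cdf(high) - dist.cdf(low)

print(round(area, 6), round(exact, 6))
```

Note that the density at a single point is not a probability; only areas over a range are.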

Z Score

The number of standard deviations an observation lies from the mean:

(observation - mean) / SD

The Z score of the mean is zero.
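A short sketch of the Z score formula on made-up exam scores, using the sample standard deviation from the `statistics` module.

```python
import statistics

scores = [60, 70, 80, 90, 100]  # hypothetical exam scores
mean = statistics.mean(scores)
sd = statistics.stdev(scores)

def z_score(x):
    """Number of standard deviations between x and the mean."""
    return (x - mean) / sd

print(z_score(mean))  # 0.0
print(z_score(100))
```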

Percentiles

The p-th percentile is the value below which p percent of the observations fall. Sometimes an image describes this better than words.

https://online.stat.psu.edu/stat800/book/export/html/741
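One common convention defines the percentile rank of a value as the percentage of observations that fall below it. The sketch below follows that convention; the helper `percentile_rank` and the scores are made up, and other conventions handle ties and interpolation differently.

```python
def percentile_rank(data, value):
    """Percentage of observations in data that fall below value."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # hypothetical scores
rank_of_90 = percentile_rank(scores, 90)

print(rank_of_90)  # 70.0
```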

Sampling Variability

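It is usually impossible to gather data from the whole population, so we work with samples. A statistic such as the mean varies from one sample to the next; this variation is called sampling variability.

Central Limit Theorem

Describes the shape, center, and spread of the sampling distribution of the mean when certain conditions are met (independent observations and either a nearly normal population or a large enough sample): the shape is nearly normal, the center is the population mean, and the spread is the standard error (SE).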

https://www.investopedia.com/terms/c/central_limit_theorem.asp
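The simulation below is a rough sketch of the center and spread described above. It assumes a skewed exponential population with mean 1 and standard deviation 1, a sample size of 50, and 5,000 repeated samples; all of these are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

def sample_mean(n):
    """Mean of n draws from an exponential population (mean 1, SD 1)."""
    return statistics.fmean([random.expovariate(1.0) for _ in range(n)])

n = 50
means = [sample_mean(n) for _ in range(5_000)]

center = statistics.fmean(means)  # should be near the population mean, 1
spread = statistics.stdev(means)  # should be near SE = sigma / sqrt(n)
standard_error = 1 / n ** 0.5

print(round(center, 3), round(spread, 3), round(standard_error, 3))
```

Even though the population is far from normal, the sample means cluster around 1 with a spread close to the standard error.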

Confidence Interval

A reasonable range of values for the population parameter is called a confidence interval.

https://nulib.github.io/moderndive_book/10-CIs.html

90% confident: Estimate ± 1.65 ∗ SE(Estimate)

95% confident: Estimate ± 1.96 ∗ SE(Estimate) = (x̄ − 1.96σ/√n, x̄ + 1.96σ/√n)

99% confident: Estimate ± 2.58 ∗ SE(Estimate)
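The multipliers 1.65, 1.96, and 2.58 come from the standard normal distribution. Here is a minimal sketch of an interval for a mean, assuming the population standard deviation is known; the function name `z_confidence_interval` and the numbers (mean 100, σ = 15, n = 36) are hypothetical.

```python
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Interval for a mean, assuming the population sigma is known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * sigma / n ** 0.5                   # z * SE
    return sample_mean - margin, sample_mean + margin

low, high = z_confidence_interval(100, 15, 36)
print(round(low, 2), round(high, 2))  # 95.1 104.9
```

Raising the confidence level widens the interval, and increasing n narrows it.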

Significance Level

The probability of rejecting H0 when it is actually true, usually written as α (for example 0.05).

Power

The probability of rejecting the null hypothesis when it is actually false.

Accuracy

How close is your estimate to the true value?

Precision

How close repeated measurements are to each other, regardless of whether they are close to the true value.
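The two ideas can be separated with a small sketch. The two instruments, their readings, and the helper names are made up: one instrument is centered on the true value but scattered, the other is tightly clustered but biased.

```python
import statistics

true_value = 10.0

# Hypothetical repeated measurements from two instruments.
instrument_a = [10.1, 9.9, 10.0, 10.2, 9.8]    # centered on the true value
instrument_b = [12.0, 12.1, 11.9, 12.0, 12.0]  # tightly clustered but biased

def accuracy_error(measurements):
    """Distance between the average measurement and the true value."""
    return abs(statistics.fmean(measurements) - true_value)

def precision_spread(measurements):
    """Spread of the measurements around their own average."""
    return statistics.stdev(measurements)

for name, m in [("A", instrument_a), ("B", instrument_b)]:
    print(name, round(accuracy_error(m), 3), round(precision_spread(m), 3))
```

Instrument A is more accurate, instrument B is more precise.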

Statistical Inference

Drawing conclusions about a population from sample data by using statistics.

Type 1 Error

Rejecting the null hypothesis when it is actually true (a false positive).

Type 2 Error

Failing to reject the null hypothesis when it is actually false (a false negative).
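Both error rates can be estimated by simulation. This sketch assumes a one-sided z-test of H0: mean = 0 with known σ = 1, n = 30, α = 0.05, and a true mean of 0.5 under the alternative; every one of those numbers is an assumption made for illustration.

```python
import random
from statistics import NormalDist, fmean

random.seed(7)  # fixed seed so the sketch is reproducible
alpha, n, sigma = 0.05, 30, 1.0
z_crit = NormalDist().inv_cdf(1 - alpha)  # rejection threshold for z

def rejects_h0(true_mean):
    """Draw one sample and report whether the z-test rejects H0: mean = 0."""
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    z = fmean(sample) / (sigma / n ** 0.5)
    return z > z_crit

trials = 4_000
# H0 is true: every rejection is a Type 1 error.
type1_rate = sum(rejects_h0(0.0) for _ in range(trials)) / trials
# H0 is false (true mean 0.5): every non-rejection is a Type 2 error.
type2_rate = sum(not rejects_h0(0.5) for _ in range(trials)) / trials

print(type1_rate, type2_rate, 1 - type2_rate)
```

The Type 1 rate should land near the significance level α, and 1 minus the Type 2 rate is the power of the test.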

T Distribution

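A bell-shaped distribution with thicker tails than the normal distribution. It is used for inference about means when the population standard deviation is unknown.

Degrees of Freedom

Determines the thickness of the tails of the t-distribution.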

As the degrees of freedom increase, the shape of the t-distribution approaches the normal distribution.
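The standard library has no t-distribution, so this sketch writes out the density formula directly (the helper `t_pdf` is defined here, not imported) and compares the tail density at x = 3 with the standard normal for a few arbitrary degrees of freedom.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

normal_pdf = NormalDist().pdf

for df in (1, 5, 30, 1000):
    # The tail density at x = 3 shrinks toward the normal value as df grows.
    print(df, t_pdf(3.0, df), normal_pdf(3.0))
```

With few degrees of freedom the tails are much heavier than the normal curve; by df = 1000 the two densities are nearly identical.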

Conclusion

There are many more terms, which I may cover in another article in the near future.

That depends both on the statistics of this article and on my path.

If you want a follow-up to this article, do not forget to give it a “thumbs up” and follow me.

Thanks again.

“Machine learning is the last invention that humanity will ever need to make.” Nick Bostrom


Statistics for Machine Learning A-Z was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.


Published via Towards AI
