
Module 1 Part -01 Building Block of Data Analytics

Author(s): Sudeep

Originally published on Towards AI.

Image created by the author using Canva

If you are wondering what Module 1 is and how it fits into the series, please refer to this post: What is Data Analytics

So it all starts with Statistics

At a high level, statistics is a collection of methods that help us analyze, summarize, and interpret data. To dive into statistics, we first need to understand what data is and its various types.

Data

Data is a collection of facts, numbers, words, or observations that can be used to learn about something. Data can be represented in many different ways and can be used for a variety of purposes.

Data can be divided into three main types:

(Diagram made with Whimsical)

Now let's learn about the types of statistics

There are broadly two types of statistics:

1) Descriptive Statistics
Formal definition: Descriptive statistics are methods used to summarize and describe the main features of a dataset.
In short, it offers many methods that help summarize data, such as the mean, median, and mode.

2) Inductive / Inferential Statistics
Formal definition: Inferential statistics involves drawing conclusions or making inferences about a population based on data collected from a sample of that population.
In short, inferential statistics uses a sample to reason about the wider population, which helps us explore the 'why' and 'how' behind the data patterns we observe.
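To make the idea of inferring from a sample concrete, here is a minimal sketch using only Python's standard library. The population of heights, the sample size of 100, and the normal-approximation interval (the 1.96 multiplier) are all assumptions chosen for illustration, not something prescribed by this post.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights (in cm) of 10,000 people.
population = [random.gauss(170, 10) for _ in range(10_000)]

# In practice we usually only get to observe a sample of the population.
sample = random.sample(population, 100)

# Point estimate of the population mean, plus its standard error.
sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

# Rough 95% interval for the population mean (normal approximation).
low = sample_mean - 1.96 * standard_error
high = sample_mean + 1.96 * standard_error

print(f"estimate from sample: {sample_mean:.1f} cm ({low:.1f} to {high:.1f})")
print(f"actual population mean: {statistics.mean(population):.1f} cm")
```

The sample mean is the inference about the population, and the interval is a rough way of saying how far off it might be.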

Terms

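Population: The complete set of data is called the population.
Then what is a part of the population called? 🤔 Well, it's called a sample. The individual rows in a sample are also known as observations or tuples, and the table of their feature values is known as the feature matrix.
Bonus: Attributes (the columns of a dataset) are also known as features.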
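Here is a small sketch of how these terms map onto a dataset in code. The student records and the column names (`age`, `height_cm`, `grade`) are made up for illustration.

```python
import random

random.seed(0)  # fixed seed so the same sample is drawn each run

# Hypothetical population: every student in a very small school.
population = [
    {"age": 18, "height_cm": 165, "grade": "A"},
    {"age": 19, "height_cm": 172, "grade": "B"},
    {"age": 18, "height_cm": 158, "grade": "A"},
    {"age": 20, "height_cm": 180, "grade": "C"},
    {"age": 19, "height_cm": 169, "grade": "B"},
    {"age": 21, "height_cm": 175, "grade": "A"},
]

# Features (attributes) are the characteristics recorded for each student.
feature_names = list(population[0].keys())

# A sample is a subset of the population; each row in it is one observation.
sample = random.sample(population, 3)

# The feature matrix has one row per observation and one column per feature.
feature_matrix = [[row[name] for name in feature_names] for row in sample]

print(feature_names)
for observation in feature_matrix:
    print(observation)
```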

Variables

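Variables are mainly of two types: numerical and categorical. Categorical variables are further divided into nominal (unordered) and ordinal (ordered) variables.

(Diagram made with Whimsical)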

Examples of nominal variables (categories with no inherent order) are as follows:
Colors of cars in a parking lot (Red, Blue, Black, White).
Types of payment methods used in a store (Cash, Credit Card, Debit Card, Mobile Payment).

Examples of ordinal variables (categories with a natural order) are as follows:
Education levels (High School, Bachelor's, Master's, PhD)
Customer satisfaction ratings (Very Dissatisfied, Dissatisfied, Neutral, Satisfied, Very Satisfied)
T-shirt sizes (XS, S, M, L, XL)
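The practical difference between the two shows up in what you can do with the values. The sketch below is one way to illustrate it, with made-up lists of car colors and T-shirt orders: nominal values can only be counted and compared for equality, while ordinal values can also be sorted once their order is spelled out.

```python
from collections import Counter

# Nominal: categories have no order, so counting is the natural summary.
car_colors = ["Red", "Blue", "Black", "Red", "White", "Red"]
color_counts = Counter(car_colors)
most_common_color = color_counts.most_common(1)[0][0]

# Ordinal: categories have an order, expressed here as an explicit ranking.
SIZE_ORDER = ["XS", "S", "M", "L", "XL"]
size_rank = {size: rank for rank, size in enumerate(SIZE_ORDER)}

orders = ["L", "XS", "M", "XL", "S", "M"]
sorted_orders = sorted(orders, key=size_rank.__getitem__)
largest_size = max(orders, key=size_rank.__getitem__)

print(most_common_color)  # Red
print(sorted_orders)      # ['XS', 'S', 'M', 'M', 'L', 'XL']
print(largest_size)       # XL
```

Sorting the colors alphabetically would run without error, but the result would carry no meaning, which is exactly what makes them nominal.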

Let's focus more on descriptive statistics…

(Diagram made with Whimsical)

What is a measure of central tendency? 🤔
It's nothing complicated: it is a single value that describes the center of a dataset, and the common ones are the mean, median, and mode.

1. Mean

  • Definition: The average value.
  • Formula: Mean = (Sum of all values) / (Number of values).
  • Example: The average age of students in a class.

2. Median

  • Definition: The middle value when data is sorted.
  • Tip: For even-sized datasets, take the average of the two middle values.
  • Example: The median salary in a company can give you a better idea of employee earnings when there are outliers.

3. Mode

  • Definition: The most frequent value in a dataset.
  • Example: The most popular product sold in an online store.
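The three measures above can be sketched with Python's built-in `statistics` module. The ages and salaries below are made-up numbers; the salary list includes one deliberate outlier to show why the median can be more representative than the mean.

```python
import statistics

# Made-up ages of students in a class.
ages = [18, 19, 19, 20, 21, 23]

mean_age = statistics.mean(ages)      # (sum of all values) / (number of values)
median_age = statistics.median(ages)  # even count: average of the two middle values
mode_age = statistics.mode(ages)      # most frequent value

print(mean_age)    # 20
print(median_age)  # 19.5
print(mode_age)    # 19

# Made-up salaries with one outlier (the 500,000 entry).
salaries = [30_000, 32_000, 35_000, 38_000, 500_000]

mean_salary = statistics.mean(salaries)      # pulled upward by the outlier
median_salary = statistics.median(salaries)  # barely affected by the outlier

print(mean_salary)    # 127000
print(median_salary)  # 35000
```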

But what is a measure of dispersion?
A measure of dispersion is a statistical value that indicates how spread out a set of data is around a central value. It can help you determine whether the data is stretched out or squeezed together.
Some examples are:
Range: It is defined as the difference between the largest and the smallest value in the distribution.

Mean Deviation: It is the arithmetic mean of the absolute differences between the values and their mean. (Without the absolute value, the positive and negative differences would cancel out to zero.)

Standard Deviation: It is the square root of the arithmetic average of the squared deviations measured from the mean.

Variance: It is defined as the average of the squared deviations from the mean of the given data set.

Quartile Deviation: It is defined as half of the difference between the third quartile and the first quartile in a given data set.

Interquartile Range: The difference between the upper (Q3) and lower (Q1) quartiles is called the Interquartile Range. Its formula is given as Q3 - Q1.
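As a rough sketch, all six measures can be computed with Python's standard library. The dataset is made up, and two choices here are assumptions: variance and standard deviation divide by the number of values (`pvariance` and `pstdev`), matching the "average" wording above, whereas `statistics.variance` would divide by n - 1; and the quartiles use the `inclusive` method, one of several conventions that can give slightly different values.

```python
import statistics

# Made-up dataset used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(data)

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Mean deviation: average absolute distance from the mean.
mean_deviation = sum(abs(x - mean) for x in data) / len(data)

# Variance and standard deviation, dividing by the number of values.
variance = statistics.pvariance(data)
std_deviation = statistics.pstdev(data)

# Quartiles (the "inclusive" convention is an assumption of this sketch).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
interquartile_range = q3 - q1
quartile_deviation = interquartile_range / 2

print(data_range)           # 7
print(mean_deviation)       # 1.5
print(variance)             # 4
print(std_deviation)        # 2.0
print(interquartile_range)  # 1.5
print(quartile_deviation)   # 0.75
```

Note that the standard deviation is just the square root of the variance, which puts it back in the same units as the data.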

In summary, what we discussed in this post are the fundamental building blocks of data analytics, specifically:

The foundation of statistics and its two main branches:

  • Descriptive Statistics: Methods for summarizing data.
  • Inferential Statistics: Drawing conclusions about populations based on sample data.

Basic terminology in data analytics:

  • Population: The complete dataset.
  • Samples: Subsets of the population.
  • Features (also called attributes): Characteristics we measure.

The classification of variables:

  • Numerical variables.
  • Categorical variables, which include:

  • Nominal data (e.g., colors, payment methods).
  • Ordinal data (e.g., education levels, satisfaction ratings).

Important statistical measures:

1. Measures of Central Tendency:

  • Mean: The average.
  • Median: The middle value.
  • Mode: The most frequent value.

2. Measures of Dispersion:

  • Range: Difference between the largest and smallest values.
  • Mean Deviation: The mean of the absolute differences from the mean.
  • Standard Deviation: The square root of the average squared differences from the mean.
  • Variance: The average of the squared differences from the mean.
  • Quartile Deviation: Half the difference between the third and first quartiles.
  • Interquartile Range: The difference between the upper (Q3) and lower (Q1) quartiles.

That's it for this post! Stay tuned for Part 2, coming in the next 3–5 days. 😉
