Module 1, Part 01: Building Blocks of Data Analytics

Author(s): Sudeep

Originally published on Towards AI.

If you are wondering what this Module 1 is about, please refer to this post: What is Data Analytics

So it all starts with Statistics

At a high level, statistics is a collection of methods that help us analyze, summarize, and interpret data. To dive into statistics, we first need to understand what data is and its various types.

Data

Data is a collection of facts, numbers, words, or observations that can be used to learn about something. Data can be represented in many different ways and can be used for a variety of purposes.

Data can be divided into three main types.

Now let's learn about the types of statistics

There are broadly two types of statistics:

1) Descriptive Statistics
Formal definition: Descriptive statistics are methods used to summarize and describe the main features of a dataset.
In short, it offers many methods for getting a summary of the data, such as the mean, median, and mode.

2) Inductive/Inferential Statistics
Formal definition: Inferential statistics involves drawing conclusions or making inferences about a population based on data collected from a sample of that population.
In short, inferential statistics is about generalizing from the patterns we observe in a sample to the wider population, and judging how much confidence those conclusions deserve.

Terms

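Population: The complete set of data we want to study is called the population.
Then what is a part of the population called? 🤔 It is called a sample. The individual rows in a sample are known as observations (or tuples), and the table of observations and their attributes is often called the feature matrix.
Bonus: Attributes are also known as features.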
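
To make the population/sample distinction concrete, here is a minimal Python sketch. The ages are randomly generated, made-up data, and the names `population` and `sample` are chosen only for illustration.

```python
import random
import statistics

# Hypothetical population: ages of all 1,000 members of a club (made-up data).
random.seed(42)  # fixed seed so the run is repeatable
population = [random.randint(18, 70) for _ in range(1000)]

# A sample is a subset of the population, drawn here without replacement.
sample = random.sample(population, k=50)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Population size: {len(population)}, mean age: {population_mean:.2f}")
print(f"Sample size: {len(sample)}, mean age: {sample_mean:.2f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean. Reasoning about that gap is what inferential statistics is for.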

Variables

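Variables are mainly of two types: numerical and categorical. Categorical variables are further divided into nominal (no inherent order) and ordinal (ordered).

Examples of nominal variables: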
Colors of cars in a parking lot (Red, Blue, Black, White).
Types of payment methods used in a store (Cash, Credit Card, Debit Card, Mobile Payment).

Examples of ordinal variables:
Education levels (High School, Bachelor’s, Master’s, PhD)
Customer satisfaction ratings (Very Dissatisfied, Dissatisfied, Neutral, Satisfied, Very Satisfied)
T-shirt sizes (XS, S, M, L, XL)
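
The practical difference between the two shows up in what you can do with the values. The sketch below uses made-up observations and one possible encoding (a rank dictionary named `size_rank`, which is an assumption for illustration): nominal values can only be counted and compared for equality, while ordinal values can also be sorted.

```python
from collections import Counter

# Nominal: categories with no inherent order; counting is meaningful, sorting is not.
car_colors = ["Red", "Blue", "Black", "Red", "White", "Red"]  # made-up observations
color_counts = Counter(car_colors)
print(color_counts.most_common(1))  # [('Red', 3)]

# Ordinal: categories with a meaningful order, encoded here as ranks.
size_order = ["XS", "S", "M", "L", "XL"]
size_rank = {size: rank for rank, size in enumerate(size_order)}

shirts = ["M", "XS", "XL", "S", "M"]  # made-up observations
sorted_shirts = sorted(shirts, key=size_rank.__getitem__)
largest = max(shirts, key=size_rank.__getitem__)
print(sorted_shirts)  # ['XS', 'S', 'M', 'M', 'XL']
print(largest)        # XL
```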

Let's focus more on descriptive statistics…

What is a measure of central tendency? 🤔
It is nothing complicated: a single value that describes the center of a dataset. The common ones are the mean, median, and mode.

1. Mean

Definition: The average value.
Formula: Mean = (Sum of all values) / (Number of values).
Example: The average age of students in a class.

2. Median

Definition: The middle value when data is sorted.
Tip: For even-sized datasets, take the average of the two middle values.
Example: The median salary in a company gives a better idea of typical employee earnings than the mean when there are outliers.
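
To see why the median holds up better than the mean when outliers are present, here is a small sketch using Python's standard `statistics` module. The salary figures are made up for illustration.

```python
import statistics

# Made-up salaries; the last value is an outlier (e.g. an executive).
salaries = [30_000, 32_000, 35_000, 38_000, 40_000, 500_000]

mean_salary = statistics.mean(salaries)      # sum of values / number of values
median_salary = statistics.median(salaries)  # even count: average of the two middle values

print(f"Mean:   {mean_salary:,.0f}")    # Mean:   112,500
print(f"Median: {median_salary:,.0f}")  # Median: 36,500
```

The single outlier pulls the mean far above what most employees earn, while the median stays near the typical salary.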

3. Mode

Definition: The most frequent value in a dataset.
Example: The most popular product sold in an online store.
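
The mode also works for non-numeric data, as this sketch with made-up sales records shows. `statistics.multimode` (available in Python 3.8 and later) is used for the case where several values tie for most frequent.

```python
import statistics

# Made-up list of products sold, one entry per sale.
products_sold = ["mug", "pen", "mug", "notebook", "pen", "mug"]
most_popular = statistics.mode(products_sold)
print(most_popular)  # mug

# When several values tie, multimode returns all of them.
ratings = [1, 1, 2, 2, 3]
tied_modes = statistics.multimode(ratings)
print(tied_modes)  # [1, 2]
```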

But what is a measure of dispersion?
A measure of dispersion is a statistical value that indicates how spread out a set of data is around a central value. It can help you determine whether the data is stretched out or squeezed together.
Some common measures are:
Range: It is defined as the difference between the largest and the smallest value in the distribution.

Mean Deviation: It is the arithmetic mean of the absolute differences between the values and their mean. (Without the absolute value, the differences would always cancel out to zero.)

Standard Deviation: It is the square root of the arithmetic average of the squared deviations measured from the mean.

Variance: It is defined as the average of the squared deviations from the mean of the given data set.

Quartile Deviation: It is defined as half of the difference between the third quartile and the first quartile in a given data set.

Interquartile Range: The difference between the upper (Q3) and lower (Q1) quartiles is called the interquartile range. Its formula is Q3 - Q1.
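
The measures above can be sketched in a few lines of Python with the standard `statistics` module. The data values are made up. Two assumptions are worth flagging: the variance and standard deviation use the population form (dividing by n, matching the "average" wording above), and the quartiles use the `inclusive` method of `statistics.quantiles`; other quartile conventions can give slightly different values.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values

mean = statistics.mean(data)

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Mean deviation: average of the absolute differences from the mean.
mean_deviation = sum(abs(x - mean) for x in data) / len(data)

# Population variance and standard deviation (divide by n).
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)

# Quartiles: n=4 returns the three cut points Q1, Q2 (the median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
quartile_deviation = iqr / 2

print(f"Range: {data_range}")                        # Range: 7
print(f"Mean deviation: {mean_deviation}")           # Mean deviation: 1.5
print(f"Variance: {variance}")                       # Variance: 4
print(f"Standard deviation: {std_dev}")              # Standard deviation: 2.0
print(f"Q1: {q1}, Q3: {q3}")                         # Q1: 4.0, Q3: 5.5
print(f"Interquartile range: {iqr}")                 # Interquartile range: 1.5
print(f"Quartile deviation: {quartile_deviation}")   # Quartile deviation: 0.75
```

For data that is itself a sample, `statistics.variance` and `statistics.stdev` (which divide by n - 1) are the usual choice instead.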

In summary, what we discussed in this post are the fundamental building blocks of data analytics, specifically:

The foundation of statistics and its two main branches:

Descriptive Statistics: Methods for summarizing data.
Inferential Statistics: Drawing conclusions about populations based on sample data.

Basic terminology in data analytics:

Population: The complete dataset.
Samples: Subsets of the population.
Features (also called attributes): Characteristics we measure.

The classification of variables:

Numerical variables.
Categorical variables, which include:

Nominal data (e.g., colors, payment methods).
Ordinal data (e.g., education levels, satisfaction ratings).

Important statistical measures:

1. Measures of Central Tendency:

Mean: The average.
Median: The middle value.
Mode: The most frequent value.

2. Measures of Dispersion:

Range: Difference between the largest and smallest values.
Mean Deviation: The mean of the absolute differences from the mean.
Standard Deviation: The square root of the average squared differences from the mean.
Variance: The average of the squared differences from the mean.
Quartile Deviation: Half the difference between the third and first quartiles.
Interquartile Range: The difference between the upper (Q3) and lower (Q1) quartiles.

That’s it for this post! Stay tuned for Part 2, coming soon in the next 3–5 days. 😉

Published via Towards AI
