Data Diagnostics: Transforming & Reducing Data for Smarter Insights
Last Updated on February 20, 2025 by Editorial Team
Author(s): Saif Ali Kheraj
Originally published on Towards AI.
Ever looked at a dataset and wondered, "Where do I even start?" The answer lies in understanding its distribution. Before jumping into fancy models, getting a grip on how your data is spread out helps in spotting trends, detecting outliers, and avoiding misleading conclusions.
Imagine you're analyzing delivery times for an online food service. If most orders arrive within 30 minutes but a few take over an hour, that is a skewed distribution, something you wouldn't notice just by looking at averages. This is why examining the shape of your data is crucial.
First, check central tendency, which tells you where most of your data sits. The main ones are:
Mean: The average value.
Median: The middle value.
If the mean is much higher than the median, your data is likely right-skewed. In the delivery example, if most orders arrive in 30 minutes but a few take 90 minutes, those late orders pull the average up even though most deliveries are on time. The median barely moves, which is why comparing the two is a quick skew check.
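To make the mean-versus-median comparison concrete, here is a minimal sketch in Python. The delivery times are made-up illustrative values, not data from the article, and the variable names are hypothetical.

```python
import statistics

# Hypothetical delivery times in minutes (illustrative only):
# most orders land near 30, two late orders take 90.
delivery_times = [28, 30, 29, 31, 30, 32, 27, 30, 90, 90]

mean_time = statistics.mean(delivery_times)      # pulled up by the late orders
median_time = statistics.median(delivery_times)  # stays near the typical order

print(f"mean:   {mean_time:.1f} min")
print(f"median: {median_time:.1f} min")

# A mean well above the median hints at a right-skewed distribution.
if mean_time > median_time:
    print("Mean exceeds median: possible right skew")
```

With these values the mean comes out at 41.7 minutes while the median stays at 30, so the average alone would overstate how long a typical delivery takes.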
Next is dispersion: how spread out your data is. Key things to check:
Range: The difference between the highest and lowest values. For example, if delivery times range from…
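The range definition above can be sketched in a few lines of Python. The delivery times are again hypothetical values chosen for illustration, not figures from the article.

```python
# Hypothetical delivery times in minutes (illustrative only).
delivery_times = [28, 30, 29, 31, 30, 32, 27, 30, 90, 90]

# Range = highest value minus lowest value.
time_range = max(delivery_times) - min(delivery_times)
print(f"range: {time_range} min")

# The same calculation without the two 90-minute orders, to show how
# strongly the range depends on the extreme values.
typical_times = [t for t in delivery_times if t < 60]
typical_range = max(typical_times) - min(typical_times)
print(f"range without late orders: {typical_range} min")
```

Because the range uses only the two most extreme values, a couple of late orders are enough to stretch it from 5 minutes to 63 minutes in this made-up sample.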