Scaling vs. Normalizing Data
Last Updated on January 6, 2023 by Editorial Team
Author(s): Lawrence Alaso Krukrubo
Understanding When to Apply One or the Other…
Intro:
When it comes to data exploration and model building, there are multiple ways to perform certain tasks, and often it all boils down to the goals and the experience or flair of the Data Scientist.
For example, you may want to normalize data via the L1 (Manhattan distance) norm or the L2 (Euclidean distance) norm, or even a combination of both.
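As a quick illustration, here's a minimal sketch in NumPy (the feature vector is made up for the example):

import numpy as np

x = np.array([3.0, -4.0, 12.0])  # hypothetical feature vector

l1_normalized = x / np.abs(x).sum()          # L1: divide by the Manhattan length (19.0)
l2_normalized = x / np.sqrt((x ** 2).sum())  # L2: divide by the Euclidean length (13.0)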
It's common practice to interchange certain terms in Data Science. Methods are frequently interchanged with functions and vice versa. These have similar meanings, but by behaviour, functions take in one or more parameters, while methods are usually called upon objects…
print("hello")  # function
df.head()       # method
We see the same interplay in the words mean and average…
The words "mean" and "average" are often used interchangeably. The substitution of one word for the other is common practice. The technical term is "arithmetic mean," and "average" is technically a center location. However, in practice among non-statisticians, "average" is commonly accepted for "arithmetic mean." (openstax.org)
Overview:
Scaling and normalization are often used interchangeably. And to make matters more interesting, scaling and normalization are very similar!
Similarities:
In both scaling and normalization, you're transforming the values of numeric variables so that the transformed data points have specific helpful properties. These properties can be exploited to create better features and models.
Differences:
In scaling, we're changing the range of the distribution of the data, while in normalizing, we're changing the shape of the distribution of the data.
Range is the difference between the smallest and largest element in a distribution.
Scaling and normalization are so similar that they're often applied interchangeably, but as we've seen from the definitions, they have different effects on the data. As Data Professionals, we need to understand these differences and, more importantly, know when to apply one rather than the other.
Why Do We ScaleΒ Data?
Remember that in scaling, we're transforming the data so that it fits within a specific scale, like 0-100 or 0-1 (usually 0-1). You want to scale data especially when you're using methods based on measures of how far apart data points are.
For example, while using support vector machines (SVM) or distance-based algorithms like k-nearest neighbors (KNN)…
With these algorithms, a change of "1" in any numeric feature is given the same importance. Let's take an example from Kaggle.
Imagine you're looking at the prices of some products in both Yen and US Dollars. One US Dollar is worth about 100 Yen, but if you don't scale your prices, algorithms like SVM or KNN will consider a difference in price of 1 Yen as important as a difference of 1 US Dollar!
This clearly doesn't fit our intuitions of the world. So generally, we may need to scale data for machine learning problems so that all variables have a similar range, avoiding such issues.
By scaling your variables, you can help compare different variables on equal footing… (Kaggle)
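To make this concrete, here's a minimal sketch with made-up prices (two products, each priced in both Yen and US Dollars):

import numpy as np

# (price_in_yen, price_in_dollars) for two hypothetical products
a = np.array([10000.0, 95.0])
b = np.array([10100.0, 96.0])

# unscaled: the 100-Yen gap dominates the 1-Dollar gap
print(np.linalg.norm(a - b))  # ~100.005

# after rescaling Yen at ~100 Yen per Dollar, both gaps carry equal weight
a_scaled = a / np.array([100.0, 1.0])
b_scaled = b / np.array([100.0, 1.0])
print(np.linalg.norm(a_scaled - b_scaled))  # ~1.414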
Some Common Types of Scaling:
1. Simple Feature Scaling:
This method simply divides each value by the maximum value for that feature… The resultant values are in the range between zero (0) and one (1).
Simple-feature scaling is the de facto scaling method used on image data: we scale images by dividing each pixel value by 255 (the maximum pixel intensity).
Let's define a simple-feature scaling function…
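Here's a minimal sketch (assuming NumPy, with the integers 1 to 10 as an example distribution):

import numpy as np

def simple_feature_scaling(series):
    # divide each value by the maximum value of the feature
    return series / series.max()

dist = np.arange(1, 11)            # a distribution with range [1, 10]
scaled = simple_feature_scaling(dist)
print(scaled.min(), scaled.max())  # 0.1 1.0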
We can see the above distribution with range [1, 10] was scaled via simple-feature scaling to the range [0.1, 1], quite easily.
2. Min-Max Scaling:
This is more popular than simple-feature scaling. This scaler takes each value, subtracts the minimum, and then divides by the range (max - min).
The resultant values range between zero (0) and one (1).
Let's define a min-max function…
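Another minimal sketch, under the same assumptions (NumPy, the 1-10 example distribution):

import numpy as np

def min_max_scaling(series):
    # subtract the minimum, then divide by the range (max - min)
    return (series - series.min()) / (series.max() - series.min())

dist = np.arange(1, 11)            # the same distribution with range [1, 10]
scaled = min_max_scaling(dist)
print(scaled.min(), scaled.max())  # 0.0 1.0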
Just like before, min-max scaling takes a distribution with range [1, 10] and scales it to the range [0.0, 1.0].
Apply Scaling to a Distribution:
Let's grab a data set and apply scaling to a numerical feature. We'll use the Credit-One Bank credit loan customers dataset.
This time, we'll use the minmax_scaling function from mlxtend.preprocessing. Let's see the head of the data set.
OK, for the sake of practice, let's scale the 'Age' column of the data.
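Here's a minimal sketch, assuming the data set has been loaded into a pandas DataFrame named df (a hypothetical name) with an 'Age' column:

from mlxtend.preprocessing import minmax_scaling

# scale the 'Age' column to the range [0, 1]
scaled_age = minmax_scaling(df[['Age']], columns=['Age'])

print(df['Age'].min(), df['Age'].max())                  # 19 75
print(scaled_age['Age'].min(), scaled_age['Age'].max())  # 0.0 1.0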
After scaling the data, we can see that the original dataset has a minimum age of 19 and a maximum of 75, while the scaled dataset has a minimum of 0.0 and a maximum of 1.0.
The only thing that changes when we scale the data is the range of the distribution… the shape and other properties remain the same.
Why Do We Normalize Data?
Normalization is a more radical transformation. The point of normalization is to change your observations so that they can be described as a normal distribution… (Kaggle)
Normal distribution: AKA the "bell curve," this is a specific statistical distribution where roughly equal observations fall above and below the mean, the mean and the median are about the same, and there are more observations closer to the mean. The normal distribution is also known as the Gaussian distribution.
In general, you'll normalize your data if you're going to be using a machine learning or statistics technique that assumes your data is normally distributed. Some examples of these include linear discriminant analysis (LDA) and Gaussian naive Bayes.
(Pro tip: any method with "Gaussian" in the name probably assumes normality.)
Note that normalization is also referred to as standardization in some statistical journals. Standardization aims to normalize the distribution by considering how far away each observation is from the mean in terms of the standard deviation. An example is the Z-score.
Some Common Types of Normalization:
1. Z-Score or Standard Score:
For each value in the distribution, we subtract the average or mean and then divide by the standard deviation. This gives a range from about minus 3 to 3 (it could be more or less).
We can easily code it up. Let's define a Z-score function…
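A minimal sketch (assuming NumPy; the sample distribution is made up for the example):

import numpy as np

def z_score(series):
    # subtract the mean, then divide by the standard deviation
    return (series - series.mean()) / series.std()

dist = np.random.normal(50, 10, 1000)           # hypothetical distribution
standardized = z_score(dist)
print(standardized.mean(), standardized.std())  # ~0.0 ~1.0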
2. Box-Cox Normalization:
A Box-Cox transformation is a transformation of a non-normal dependent variable into a normal shape. The Box-Cox transformation is named after statisticians George Box and Sir David Roxbee Cox, who collaborated on a 1964 paper and developed the technique… (link)
How it Works…
At the heart of Box-Cox normalization is an exponent lambda (λ), which varies from -5 to 5. All values of λ are considered, and the optimal value for your data is selected; the "optimal value" is the one that results in the best approximation of a normal distribution curve.
For those of us with some ML skills, this process is akin to tuning the learning-rate alpha (α) in order to produce a finer fit to the data.
Box-Cox by default works for only positive values, but there's a variant that can approximate negative values too. See this link. For more, see this article.
Apply Normalization to a Distribution:
Let's continue with the Credit-One Bank credit loan customers dataset. This time, we'll apply a Box-Cox transformation to the same 'Age' column. We'll use the boxcox() function from scipy.stats.
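A minimal sketch, again assuming the data is in a DataFrame named df (hypothetical) with a strictly positive 'Age' column:

from scipy import stats

# boxcox() returns a tuple: (the transformed data, the fitted lambda)
normalized_age, fitted_lambda = stats.boxcox(df['Age'])

# for lambda != 0 it computes (x**lambda - 1) / lambda; for lambda == 0, log(x)
print(df['Age'].min(), df['Age'].max())            # 19 75
print(normalized_age.min(), normalized_age.max())  # ~1.300 ~1.4301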
So we use the stats.boxcox() function, which returns a tuple with the normalized series as the first element. The original dataset minimum is 19 and the maximum is 75, while the normalized dataset minimum is 1.300 and the maximum is 1.4301.
Notice that in addition to changing the range of the Age distribution, the normalization method radically transforms the shape of the distribution to a roughly bell-shaped curve.
Key Takeaways
- Always look at the data; pay attention to the distribution and shape of the data. Use a histplot, a distplot, or even a line graph.
- In general, you'll normalize your data if you're going to be using a machine learning or statistics technique that assumes your data is normally distributed. Some examples include linear discriminant analysis (LDA) and Gaussian naive Bayes. In fact, any method with "Gaussian" or "normal" in the name probably assumes normality. This also includes dimensionality reduction techniques like PCA.
- You want to scale data when you're using methods based on measures of how far apart data points are, like support vector machines (SVM) or k-nearest neighbors (KNN). Or if you simply want your variables to dwell in a uniform range, so that one does not dominate the other.
- By scaling your variables, you can help compare different variables on equal footing.
- If you're unsure whether to scale or normalize a variable, a simple hack is to look at the shape of its distribution…
For example, looking at the histplot of the numerical variables above, variables that seem almost symmetrical or roughly bell-curved, even if the bell is right-skewed or left-skewed, may all be normalized. These are variables like 'Age', 'Credit_Amount', and 'Duration_in_Months'.
While the other variables that seem pretty non-uniform, rather distinct, unimodal, and asymmetrical, like 'Count', 'Default_on_Payment', and 'Inst_Rate_Income', may be scaled.
Visualization is key in EDA… If you observe via distplot or histplot that some distributions are symmetrical or roughly normal, you may normalize such features unless you have good reason not to, while features that have unimodal or asymmetrical shapes may generally be scaled via min-max or simple-feature scaling.
Cheers…
Credit: IBM Data Analysis with Python Course and Data Cleaning Course on Kaggle
About Me:
I'm a Data Specialist at Tech Layer, passionate about fair and explainable AI and Data Science. I hold both the Data Science Professional and Advanced Data Science Professional certifications from IBM, as well as the Udacity AI Nanodegree. I have completed several projects using ML and DL libraries, and I love to code up my own functions as much as possible, even when existing libraries abound. Finally, I never stop learning, exploring, getting certified, and sharing my experiences via insightful articles…
Feel free to find me on: