Scaling vs. Normalizing Data

Last Updated on January 6, 2023 by Editorial Team Last Updated on January 10, 2021 by Editorial Team Author(s): Lawrence Alaso Krukrubo Data Science Understanding When to Apply One or the Other… free image from pixabay Intro: When it comes to data exploration and model building, there are multiple ways to perform certain tasks and often, it all boils down to the goals and the experience or flair of the Data Scientist. For Example, you may want to normalize data via the L1 (Manhattan Distance) or L2 (Euclidean Distance) or even a combination of both. It’s common practice to interchange certain terms in Data Science. Methods are frequently interchanged with functions and vice-versa. These have similar meanings but by behaviour, functions take in one or more parameters, while methods are usually called upon objects… print(‘hello’) # function df.head() # method We see the same interplay in the words mean and average… The words “mean” and “average” are often used interchangeably. The substitution of one word for the other is common practice. The technical term is “arithmetic mean,” and “average” is technically a center location. However, in practice among non-statisticians, “average” is commonly accepted for “arithmetic mean.” (openstax.org) Overview: Scaling and normalization are often used interchangeably. And to make matters more interesting, scaling and normalization are very similar! Similarities: free image from pixabay In both scaling and normalization, you’re transforming the values of numeric variables so that the transformed data points have specific helpful properties. These properties can be exploited to create better features and models. Differences: free image from pixabay In Scaling, we’re changing the range of the distribution of the data… While in normalizing, we’re changing the shape of the distribution of the data. Range is the difference between the smallest and largest element in a distribution. Scaling and normalization are so similar that they’re often applied interchangeably, but as we’ve seen from the definitions, they have different effects on the data. As Data Professionals, we need to understand these differences and more importantly, know when to apply one rather than the other. Why Do We Scale Data? Remember that in scaling, we’re transforming the data so that it fits within a specific scale, like 0-100 or 0-1. Usually 0-1. You want to scale data especially when you’re using methods based on measures of how far apart data points are. For example, while using support vector machines (SVM) or clustering algorithms like k-nearest neighbors (KNN)… With these algorithms, a change of “1” in any numeric feature is given the same importance. Let’s take an example from Kaggle. Imagine you’re looking at the prices of some products in both Yen and US Dollars. One US Dollar is worth about 100 Yen, but if you don’t scale your prices, algorithms like SVM or KNN will consider a difference in price of 1 Yen as important as a difference of 1 US Dollar! This clearly doesn’t fit our intuitions of the world. So generally, we may need to scale data for machine learning problems so that all variables have quite similar distribution range to avoid such issues. By scaling your variables, you can help compare different variables on equal footing…(Kaggle) Some Common Types of Scaling: 1. Simple Feature Scaling: This method simply divides each value by the maximum value for that feature…The resultant values are in the range between zero(0) and one(1) Simple-feature scaling is the defacto scaling method used on image-data. When we scale images by dividing each image by 255 (maximum image pixel intensity) Let’s define a simple-feature scaling function … We can see the above distribution with range[1,10] was scaled via simple-feature scaling to the range[0.1, 1], quite easily. 2. Min-Max Scaling: This is more popular than simple-feature scaling. This scaler takes each value and subtracts the minimum and then divides by the range(max-min).The resultant values range between zero(0) and one(1). Let’s define a min-max function… Just like before, min-max scaling takes a distribution with range[1,10] and scales it to the range[0.0, 1]. Apply Scaling to a Distribution: Let’s grab a data set and apply Scaling to a numerical feature. We’d use the Credit-One Bank credit loan customers dataset. This time, we’ll use the minmax_scaling function from mlxtend.preprocessing. Let’s see the head of the data set. Ok, let’s for the sake of practice, scale the ‘Age’ column of the data After scaling the data, we can see from the image below that the original dataset has a minimum age of 19 and a maximum of 75. And, the scaled dataset has a minimum of [0.] and maximum of [1.] The only thing that changes, when we scale the data is the range of the distribution… The shape and other properties remain the same. Why Do We Normalize Data? Normalization is a more radical transformation. The point of normalization is to change your observations so that they can be described as a normal distribution… (Kaggle) Normal distribution: AKA, “bell curve”, is a specific statistical distribution where roughly equal observations fall above and below the mean, the mean and the median are about the same, and there are more observations closer to the mean. The normal distribution is also known as the Gaussian distribution. image from stack In general, you’ll normalize your data if you’re going to be using a machine learning or statistics technique that assumes your data is normally distributed. Some examples of these include linear discriminant analysis (LDA) and Gaussian naive Bayes. (Pro tip: any method with “Gaussian” in the name probably assumes normality.) Note that Normalization is also referred to as Standardization in some Statistical journals. Standardization aims to normalize the distribution by considering how far away each observation is from the mean in terms of the standard deviation. An example is the Z-Score. Some Common Types of Normalization: 1. Z-Score or Standard Score: For each value in the distribution, we subtract the average or mean…And then divide by the Standard deviation. This gives a range from about minus 3 to 3, could be more, or less. We can easily code it up, let’s define a Z-score method… 2. Box-Cox Normalization: image credit| link A Box-Cox transformation is a transformation of a non-normal dependent variable into a normal shape. The Box-Cox transformation … Continue reading Scaling vs. Normalizing Data