Unleashing the Power of Feature Scaling in Machine Learning
Last Updated on July 25, 2023 by Editorial Team
Author(s): Roli Trivedi
Originally published on Towards AI.
Scaling up Success: Power of Normalization and Standardization
Feature scaling is the process of standardizing or normalizing the input features of a dataset so that the values of different features are transformed to a common scale. It is one of the last steps of feature engineering, performed before sending your data to your model.
Why do we need Feature Scaling?
When different features vary over very different ranges, it becomes difficult to combine them meaningfully in a model. This is especially problematic when the model is based on distances.
For example, let's say you are working on a housing price prediction task. The number of bedrooms might range from 1 to 5, the total area of the house from, say, 500 to 5,000 square feet, and the age of the house from 1 to 50 years.
Scaling features like the number of bedrooms, house area, and age to a common range (e.g., 0 to 1) allows fair comparisons between features and often improves the performance of the machine learning model.
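To make the distance argument concrete, here is a minimal sketch (with made-up housing numbers, not taken from the example above) comparing the Euclidean distance between two houses before and after min-max scaling. Without scaling, the square-footage feature dominates the distance almost entirely.
#Code
import numpy as np
from sklearn.preprocessing import MinMaxScaler
# Hypothetical houses: [bedrooms, area in sq ft, age in years]
houses = np.array([
    [2,  800, 10],
    [3, 3500,  8],
    [5, 3600, 40],
])
# Distance between house 0 and house 1 on the raw features:
# the area column (800 vs 3500) dominates everything else.
raw_dist = np.linalg.norm(houses[0] - houses[1])
# After scaling each feature to [0, 1], all three features
# contribute on a comparable scale.
scaled = MinMaxScaler().fit_transform(houses)
scaled_dist = np.linalg.norm(scaled[0] - scaled[1])
print(f"raw distance:    {raw_dist:.2f}")     # ~2700, driven by area
print(f"scaled distance: {scaled_dist:.2f}")  # ~1.0, all features contribute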
There are two ways in which feature scaling can be done:
1. Normalization
2. Standardization (also called z-score normalization)
Note: We perform feature scaling after the train-test split to avoid data leakage. Data leakage occurs when information from outside the training dataset is used to build the model. Since the test set plays the role of fresh, unseen data, it is not supposed to be accessible at the training stage, so the scaler should be fit on the training set only.
#TrainTestSplit Code
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=70)
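One common leakage-safe pattern, shown here as a minimal sketch (StandardScaler is just one possible choice of scaler), is to fit the scaler on X_train only and reuse it on X_test:
#Code
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # statistics learned from the training data only
X_test_scaled = scaler.transform(X_test)        # the same statistics are reused on the test set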
Normalization
It is applied as part of data preparation for feature scaling. The goal of normalization is to change the values of numeric columns in the dataset to a common scale without distorting differences in the ranges of values or losing information. Several techniques are available for performing normalization; let us discuss them in detail.
1. MaxAbsolute Scaler:
Here, each feature is divided by its maximum absolute value. This maps all values into the range −1 to +1 (or 0 to +1 when the feature has no negative values). It is still sensitive to outliers, however.
Mathematical Formula:
Scaled value = X / |Xmax|
#Code
from sklearn.preprocessing import MaxAbsScaler
scaler = MaxAbsScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
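As a quick illustrative sketch (with a made-up array, not from the article), dividing by the maximum absolute value maps negatives into [−1, 0) and positives into (0, 1]:
#Code
import numpy as np
from sklearn.preprocessing import MaxAbsScaler
data = np.array([[-4.0], [2.0], [8.0]])            # hypothetical single feature
print(MaxAbsScaler().fit_transform(data).ravel())  # [-0.5  0.25 1.  ]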
2. MinMax Scaler:
The values produced by this method range between 0 and 1. We use this technique when we know the minimum and maximum values of a feature, for example in image processing, where pixel intensities are bounded.
Mathematical Formula:
Scaled value = (X − Xmin) / (Xmax − Xmin)
(x′ = (x − min(x)) / (max(x) − min(x)))
#Code
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
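A minimal sketch with made-up pixel-like values (not from the article): the minimum maps to 0 and the maximum to 1.
#Code
import numpy as np
from sklearn.preprocessing import MinMaxScaler
pixels = np.array([[0.0], [51.0], [255.0]])          # hypothetical pixel intensities
print(MinMaxScaler().fit_transform(pixels).ravel())  # [0.  0.2 1. ]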
3. Mean Normalization:
This is the min-max formula with the minimum in the numerator replaced by the mean, so the scaled values are centered around 0 (roughly between −1 and 1). It centers the distribution without changing its shape.
Mathematical Formula:
Scaled value = (X − Xmean) / (Xmax − Xmin)
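Scikit-learn does not provide a built-in mean-normalization scaler, so here is a minimal sketch that applies the formula directly with pandas (assuming X_train is a DataFrame of numeric columns):
#Code
import pandas as pd
def mean_normalize(df: pd.DataFrame) -> pd.DataFrame:
    # (X - mean) / (max - min), computed column by column
    return (df - df.mean()) / (df.max() - df.min())
X_train_scaled = mean_normalize(X_train)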
4. Robust Scaler:
It is robust to outliers, so when our data contains outliers, we can use RobustScaler for scaling. This does not mean that it removes the outliers; removing outliers requires separate techniques.
Mathematical Formula:
Scaled value = (X − Xmedian) / IQR
IQR = 75th percentile − 25th percentile
#Code
from sklearn.preprocessing import RobustScaler
scaler = RobustScaler()
X_train = scaler.fit_transform(X_train)
Standardization
In standardization, each feature is transformed with the formula below so that all features end up on the same scale. This method converts each value of a feature into a z-score, so the transformed values have a mean of 0 and a standard deviation of 1.
Mathematical Formula:
Scaled value = (X − Xmean) / standard deviation
(x′ = (x − μ) / σ)
#Code
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
Therefore, we can conclude that standardization mainly involves two things:
1. Mean Centering
2. Scaling by a factor of Standard Deviation
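As a quick check, here is a minimal sketch (with a made-up array, not from the article) verifying that the standardized values have mean 0 and standard deviation 1:
#Code
import numpy as np
from sklearn.preprocessing import StandardScaler
data = np.array([[10.0], [20.0], [30.0], [40.0]])
scaled = StandardScaler().fit_transform(data)
print(scaled.mean(), scaled.std())  # ~0.0 and 1.0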
When to use Standardization
Note: Tree-based models such as Decision Tree, Random Forest, Gradient Boosting, and XGBoost generally do not require scaling, because they split on one feature at a time by comparing its values to a threshold, so the scale of the other features does not matter. Scaling is mainly needed where the algorithm involves distances, in order to bring all the features onto the same scale.
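For distance-based models, a convenient and leakage-safe way to apply scaling is to combine the scaler and the model in a pipeline. This is a minimal sketch (the KNN classifier and a classification target y are illustrative assumptions, not from the article):
#Code
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)         # the scaler is fit on the training data inside the pipeline
print(model.score(X_test, y_test))  # scaling is applied automatically before prediction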
Normalization vs. Standardization
First, we should check whether feature scaling is required at all. Then, based on the points below, we can decide which scaling technique is suitable for our data.
Note:
- Standardization is also negatively affected by extreme values.
- When the data contains outliers, StandardScaler can often be misleading (the same holds for MinMaxScaler). In such cases, it is better to use a scaler that is robust to outliers, namely RobustScaler (see the short comparison after this note).
- Being robust does not mean being immune or invulnerable, and the objective of scaling is not to eliminate outliers and extreme values entirely. Dealing with outliers requires separate methodologies and techniques, as explicitly stated in the scikit-learn docs.
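As a minimal sketch (with a made-up array containing one extreme value, not from the article), the outlier inflates the standard deviation used by StandardScaler and squashes the remaining points, while RobustScaler, which uses the median and IQR, keeps them on a sensible scale:
#Code
import numpy as np
from sklearn.preprocessing import StandardScaler, RobustScaler
data = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])  # 100 is an outlier
print(StandardScaler().fit_transform(data).ravel())     # normal points squeezed near -0.5
print(RobustScaler().fit_transform(data).ravel())       # normal points spread between -1 and 0.5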
Thanks for reading! If you enjoyed this piece and would like to read more of my work, please consider following me on Medium. I look forward to sharing more with you in the future.