Join thousands of AI enthusiasts and experts at the Learn AI Community.

Latest

# The Art and Science of Regularization in Machine Learning: A Comprehensive Guide

Last Updated on January 6, 2023 by Editorial Team

#### Author(s): Data Science meets Cyber Security

Originally published on Towards AI the World’s Leading AI and Technology News and Media Company. If you are building an AI-related product or service, we invite you to consider becoming an AI sponsor. At Towards AI, we help scale AI and technology startups. Let us help you unleash your technology to the masses.

#### INTRODUCTION:

Are you tired of your machine learning models performing poorly on new data? Are you sick of seeing your modelโs validation accuracy skyrocket only to crash and burn on the test set? If so, itโs time to learn about regularisation!

Regularisation is a technique that is used to prevent overfitting in machine learningย models.

#### LETโS GET OUR BASICS CLEAR FIRSTย :

Weโll go through some interesting analogies and practical implementation of concepts to get things crystal clear in ourย heads.

#### OVERFITTINGย :

Overfitting occurs when a model is too complex and fits the training data too well, but is not able to generalise well to new, unseen data. This can lead to poor performance on the test set or in real-world applications.

LETโS UNDERSTAND IT WITH MORE BETTERย ANALOGY:

Imagine a child who has learned to recognize dogs by seeing pictures of only golden retrievers. When shown a picture of a different dog breed, such as a poodle, the child is unable to recognize it as a dog because they have only learned to recognize a specific type of dog rather than having a general understanding of what features define a dog. This is similar to how a model can overfit the training data and be unable to generalize well to new, unseenย data.

LETโS LOOK THROUGH IT PRACTICALLY:

`import numpy as npimport matplotlib.pyplot as plt # Generate datax = np.linspace(-5, 5, 100)y = x**3 + np.random.normal(0, 10, 100) # Create figure with larger sizefig, ax = plt.subplots(figsize=(10, 6)) # Plot dataax.plot(x, y, 'o')plt.show()`

This generates a set of 100 x-values and corresponding y-values, with the y-values being a cubic function of the x-values with some added noise. The data is plotted asย follows:

Next, we can fit a polynomial regression model to theย data:

`from sklearn.linear_model import LinearRegressionfrom sklearn.preprocessing import PolynomialFeatures # Create polynomial featurespoly = PolynomialFeatures(degree=10)x_poly = poly.fit_transform(x.reshape(-1, 1)) # Fit model to datamodel = LinearRegression()model.fit(x_poly, y) # Predict on original x valuesy_pred = model.predict(x_poly) # Create figure and plot datafig, ax = plt.subplots(figsize=(10, 6))ax.plot(x, y, 'o', label='True data')ax.plot(x, y_pred, 'o', label='Predicted data')ax.legend() # Display figureplt.show()`

This fits a polynomial regression model with a degree of 10 to the data, which is a very high degree and results in a very complex model. The model has plotted along with the original data asย follows:

#### CONCLUSION:

We can see that the model is able to fit the training data very well but is not able to generalize well to new, unseen data. This is an example of overfitting, as the model has learned the specific patterns in the training data but has not developed a general understanding of the underlying relationships in theย data.

To prevent overfitting, we can use techniques such as regularisation or early stopping to reduce the complexity of the model and improve its generalization performance.

#### UNDERFITTING:

Under-fitting, on the other hand, occurs when a model is too simple and is unable to capture the underlying patterns in the data. This can also lead to poor performance, but for a different reason.

LETโS UNDERSTAND IT WITH MORE BETTERย ANALOGY:

One way to understand underfitting is through the analogy of a student preparing for a test. Imagine that a student is studying for a history test and is given a textbook to read. However, the student only reads the first few pages of the textbook and does not fully grasp the material. On the day of the test, the student is unable to answer even the most basic questions because they have not learned enough about theย subject.

This is similar to how a model can underfit the training data. The model is too simple and is unable to capture the underlying patterns in the data, leading to poor performance on the training set and poor generalization to new, unseenย data.

On the other hand, if the student had taken the time to read and fully understand the entire textbook, they would have been better equipped to perform well on the test. Similarly, a model that has not underfitted to the training data and has a good generalization performance will be able to make accurate predictions on the training data and new, unseenย data.

Here is a simple code example that demonstrates underfitting using a linear regression model:

`import numpy as npimport matplotlib.pyplot as pltfrom sklearn.linear_model import LinearRegression # Generate synthetic datax = np.linspace(-5, 5, 50)y = x**2 + np.random.normal(0, 10, 50) # Fit linear modelmodel = LinearRegression()model.fit(x.reshape(-1, 1), y) # Predict on training datay_pred = model.predict(x.reshape(-1, 1)) # Create figure with specified sizeplt.figure(figsize=(10, 6)) # Plot data and predictionsplt.plot(x, y, 'o', label='True data')plt.plot(x, y_pred, 'o', label='Predicted data')plt.legend() # Save figure to fileplt.savefig('figure.png', bbox_inches='tight') # Show plotplt.show()`

#### CONCLUSIONย :

In this example, we generate synthetic data that follows a quadratic trend and fit a linear model to the data. We can see that the linear model is unable to capture the underlying quadratic trend and is under-fitting to the data, resulting in poor performance on the trainingย set.

โFeeling lost when it comes to the connection between bias, variance, under-fitting, and overfitting? Donโt worry, Iโve got the answers youย need!โ

#### BIAS, VARIANCE, AND ITS CONNECTION TO OVERFITTING AND UNDERFITTING:

BIAS and VARIANCE are two important concepts that describe the error of a machine learning model. Bias refers to the error due to incorrect assumptions in the model, while variance refers to the error due to the complexity of theย model.

In general, we want to find a model that has low bias and low variance, as this will result in good generalization performance on the test set or in real-world applications. However, it is often not possible to simultaneously minimize both bias and variance, and we must find a balance between theย two.

UNDERFITTING occurs when a model is too simple and is unable to capture the underlying patterns in the data, leading to high bias and low variance.

OVERFITTING occurs when a model is too complex and fits the training data too well but is not able to generalize well to new, unseen data, leading to low bias and high variance.

LETโS UNDERSTAND THROUGH ANย EXAMPLE:

Consider a model that is trying to predict the price of a house based on its size and location. If the model is too simple and only uses the size of the house as a feature, it may have high bias and low variance. This model may make consistent but inaccurate predictions because it is unable to capture the influence of the location on the price. On the other hand, if the model is too complex and uses many features that are not relevant to the price of the house, it may have low bias and high variance. This model may make accurate predictions on the training data but may not generalize well to new, unseenย data.

Therefore, it is important to find a model that strikes a balance between bias and variance and has good generalization performance. Regularisation is one way to control the bias-variance trade-off and find thisย balance.

#### BIAS:

The bias of a model is the difference between the expected prediction of the model and the true value. Mathematically, it can be writtenย as:

BIAS = E[F(X)]โโโF(X)

where f(x) is the true value, and E[f(x)] is the expected prediction of theย model.

#### VARIANCE:

The variance of a model is the variability of the modelโs predictions for a given input. Mathematically, it can be writtenย as:

VARIANCE = E[(F(X)โโโE[F(X)])ยฒ]

where f(x) is the prediction of the model for a given input x and E[f(x)] is the expected prediction of theย model.

Code example that demonstrates how bias and variance are related to overfitting and underfitting in a linear regression model:

`import numpy as npimport matplotlib.pyplot as pltfrom sklearn.linear_model import LinearRegressionfrom sklearn.model_selection import train_test_split # Generate synthetic datax = np.linspace(-5, 5, 50)y = x**2 + np.random.normal(0, 10, 50) # Split data into training and test setsx_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2) # Fit linear model to training datamodel = LinearRegression()model.fit(x_train.reshape(-1, 1), y_train) # Predict on training and test setsy_train_pred = model.predict(x_train.reshape(-1, 1))y_test_pred = model.predict(x_test.reshape(-1, 1)) # Create figure with specified sizeplt.figure(figsize=(10, 6)) # Plot data and predictionsplt.plot(x, y, 'o', label='True data')plt.plot(x_train, y_train_pred, 'o', label='Predictions on training set')plt.plot(x_test, y_test_pred, 'o', label='Predictions on test set')plt.legend()plt.show()`

#### CONCLUSION:

We generate synthetic data that follows a quadratic trend, and split the data into a training set and a test set. We then fit a linear model to the training set and make predictions on both the training set and the testย set.

We can see that the model has a low bias on the training set, as it is able to fit the data well. However, the model has high variance on the test set, as it is not able to generalize well to new, unseen data. This demonstrates how bias and variance are related to overfitting and underfitting in a linear regression model.

#### ENTER REGULARISATION!

Regularisation is a method that adds a penalty term to the objective function that the model is trying to optimize. This penalty term helps to reduce the complexity of the model by encouraging the model to use only the most important features and reducing the influence of less important ones.

There are several different types of regularisation techniques, but the three most commonly used are Ridge, Lasso, and Elasticย Net.

#### RIDGE REGULARISATION:

Ridge regularisation, also known as L2 regularisation, adds a penalty term to the objective function that is proportional to the square of the model weights. This results in a model with more uniformly distributed weights, which can help to prevent overfitting.

#### CODEโโโIMPLEMENTATION:

`import numpy as npimport matplotlib.pyplot as pltfrom sklearn.linear_model import Ridge, Lasso, ElasticNetfrom sklearn.preprocessing import StandardScaler# Generate synthetic datanp.random.seed(42)m = 20X = np.linspace(0, 1, m).reshape(-1, 1)y = 3 * X + np.random.randn(m, 1) # Scale the datascaler = StandardScaler()X_scaled = scaler.fit_transform(X) # Create the modelridge = Ridge(alpha=1.0) # Fit the modelridge.fit(X_scaled, y) # Predict using the modely_pred_ridge = ridge.predict(X_scaled) # Create figure with specified sizeplt.figure(figsize=(10, 6)) # Plot the predictionsplt.plot(X, y_pred_ridge, label='Ridge')plt.plot(X, y, 'o', label='Original data')plt.legend()plt.show()`

#### LASSO REGULARISATION:

Lasso regularisation, also known as L1 regularisation, adds a penalty term to the objective function that is proportional to the absolute value of the model weights. This results in a model with fewer non-zero coefficients, which can be useful for feature selection.

#### CODEโโโIMPLEMENTATION:

`# Generate synthetic datanp.random.seed(42)m = 20X = np.linspace(0, 1, m).reshape(-1, 1)y = 3 * X + np.random.randn(m, 1) # Scale the datascaler = StandardScaler()X_scaled = scaler.fit_transform(X) # Create the modellasso = Lasso(alpha=1.0) # Fit the modellasso.fit(X_scaled, y) # Predict using the modely_pred_lasso = lasso.predict(X_scaled) # Create figure with specified sizeplt.figure(figsize=(10, 6)) # Plot the predictionsplt.plot(X, y_pred_lasso, label='Lasso')plt.plot(X, y, 'o', label='Original data')plt.legend()plt.show()`

#### ELASTIC NET REGULARISATION:

It is a combination of Ridge and Lasso regularisation, and it allows for a balance between theย two.

So how do we use these regularisation techniques in practice? Letโs take a look at an example using simulated data.

#### CODEโโโIMPLEMENTATION:

`# Generate synthetic datanp.random.seed(42)m = 20X = np.linspace(0, 1, m).reshape(-1, 1)y = 3 * X + np.random.randn(m, 1) # Scale the datascaler = StandardScaler()X_scaled = scaler.fit_transform(X) # Create the modelelastic_net = ElasticNet(alpha=1.0, l1_ratio=0.5) # Fit the modelelastic_net.fit(X_scaled, y) # Predict using the modely_pred_elastic_net = elastic_net.predict(X_scaled)# Create figure with specified sizeplt.figure(figsize=(10, 6)) # Plot the predictionsplt.plot(X, y_pred_elastic_net, label='Elastic Net')plt.plot(X, y, 'o', label='Original data')plt.legend()plt.show()`

#### SUMMING UP THE BLOG BY THINGS TO KEEP INย MIND:

1. Regularisation is a technique used to improve the generalization of a model by preventing it from overfitting the trainingย data.
2. Ridge, lasso, and elastic net are all types of regularisation that can be applied to linear regression models.
3. Ridge regularisation adds a penalty term that is the sum of the squares of the model coefficients, while lasso regularisation adds a penalty term that is the sum of the absolute values of the model coefficients. Elastic net regularisation is a combination of theย two.
4. The strength of the regularisation is controlled by a hyper-parameter, which should be chosen carefully using cross-validation.
5. Ridge, lasso, and elastic net regularisation can all be used to select relevant features in a dataset by driving some of the model coefficients to exactlyย zero.
6. It is often useful to try multiple regularisation methods and compare their performance to choose the most appropriate one for a givenย problem.
7. Regularisation is not always necessary, and in some cases, a simple linear model may perform just as well as a regularised model.

#### MYTH:

Ridge, lasso, and elastic net regularisation can be used interchangeably for anyย problem.

#### FACT:

Each type of regularisation has unique properties and may be more or less suitable for a given problem. Ridge is best for datasets with many features, the lasso is good for feature selection, and the elastic net is a combination of the two that may work well for datasets with many features and some correlated features. It is best to try multiple methods and compare their performance.

STAY TUNED FOR THE NEXT BLOG WHERE WEโLL TRY TO BUILDย THE

โ REGULARISATION MODEL FOR INTRUSION DETECTION โย ๐คฏ

FOLLOW US FOR THE SAME FUN TO LEARN DATA SCIENCE BLOGS AND ARTICLES: ๐

GITHUB: https://github.com/Vidhi1290

The Art and Science of Regularization in Machine Learning: A Comprehensive Guide was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Join thousands of data leaders on the AI newsletter. Itโs free, we donโt spam, and we never share your email address. Keep up to date with the latest work in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming aย sponsor.

Published via Towards AI