Learning Rate Schedulers

Last Updated on July 25, 2023 by Editorial Team

Author(s): Toluwani Aremu

Originally published on Towards AI.

Photo by Lucian Alexe on Unsplash

In my previous Medium article, I talked about the crucial role the learning rate plays in training Machine Learning and Deep Learning models, and I listed the learning rate scheduler as one way to optimize the learning rate for the best performance. Today, I will delve deeper into the concept of learning rate schedulers and explain how they work. But first (as usual), I’ll begin with a relatable story to explore the topic.

Photo by Bonnie Kittle on Unsplash

Kim is a dedicated, hard-working teacher who has always wanted a better balance between her professional and personal life but has struggled to find enough time for all of her responsibilities, despite her best efforts. This left her feeling stressed and burned out. In addition to her teaching duties, she must also grade students’ homework, review her syllabus and lesson notes, and attend to other important tasks.

Backed by her determination to take full control of her schedule, Kim decided to create a daily to-do list in which she prioritized her most important tasks and allocated time slots for each of them. At work, she implemented a strict schedule based on her existing teaching timetable. She also dedicated specific times to reviewing homework, preparing lessons, and attending to other out-of-class responsibilities.

At home, Kim continued to manage her time wisely by scheduling time for exercise, cooking, and quality time with friends. She also made sure to carve out time for herself, such as reading or taking relaxing baths, to recharge and maintain her energy levels. By staying true to her schedule, she experienced significant improvements in her performance and overall well-being: she accomplished more, felt less stressed, spent more quality time with friends, engaged in fulfilling activities, and made time for self-care.

Kim’s story highlights the power of scheduling and the importance of making the most of each day. By taking control of her schedule, she was able to live a happier and more productive life.

NOW, WHAT IS A LEARNING RATE SCHEDULER?

Photo by Towfiqu barbhuiya on Unsplash

A learning rate scheduler is a method used in deep learning to adjust the learning rate of a model over time in order to achieve the best possible performance. The learning rate is one of the most important hyperparameters in deep learning, as it determines how quickly a model updates its parameters based on the gradients computed during training. As I stated in my last Medium article, if the learning rate is set too high, the model may overshoot optimal values and fail to converge. If the learning rate is set too low, the model may converge too slowly or get stuck in a suboptimal local minimum.
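To make this concrete, here is a minimal sketch of a single vanilla gradient descent step on a toy parameter; the tensors and values are made up purely for illustration, and the point is simply that the learning rate scales how far each gradient step moves the weights.

import torch

# Toy parameter and a made-up gradient, for illustration only
param = torch.tensor([1.0, -2.0])
grad = torch.tensor([0.5, -0.1])

lr = 0.1  # the learning rate controls the step size

# One vanilla gradient descent update: move against the gradient,
# scaled by the learning rate
param = param - lr * grad
print(param)  # tensor([0.9500, -1.9900])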

Learning rate schedulers help to address these issues by gradually reducing the learning rate over time. There are several popular learning rate scheduler algorithms, including:

Step decay: This scheduler adjusts the learning rate after a fixed number of steps, reducing the learning rate by a specified factor. This is useful for situations where the learning rate needs to decrease over time to allow the model to converge.

class StepLR:
    def __init__(self, optimizer, step_size, gamma):
        self.optimizer = optimizer
        self.step_size = step_size
        self.gamma = gamma
        self.last_step = 0

    def step(self, current_step):
        if current_step - self.last_step >= self.step_size:
            for param_group in self.optimizer.param_groups:
                param_group['lr'] *= self.gamma
            self.last_step = current_step

# Use the scheduler
optimizer = ...  # Define your optimizer
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(num_epoch):
    ...  # Train your model
    scheduler.step(epoch)

"""
The StepLR class takes an optimizer, a step size, and a decay factor (gamma)
as input and updates the learning rate of the optimizer every step_size
epochs by multiplying it with gamma. The current step number is passed
to the step method, and the learning rate is updated only if the difference
between the current step and the last step is greater than or equal to
step_size.
""

Multi-Step decay: This scheduler adjusts the learning rate at multiple predefined milestones during training, reducing it by a specified factor at each one. This is useful for scenarios where the learning rate needs to decrease in stages, such as during the later stages of training when the model has already learned some important features.

class MultiStepLR:
    def __init__(self, optimizer, milestones, gamma):
        self.optimizer = optimizer
        self.milestones = milestones
        self.gamma = gamma
        self.last_milestone = 0

    def step(self, current_step):
        if current_step in self.milestones:
            for param_group in self.optimizer.param_groups:
                param_group['lr'] *= self.gamma
            self.last_milestone = current_step

# Use the scheduler
optimizer = ...  # Define your optimizer
scheduler = MultiStepLR(optimizer, milestones=[30, 60, 90], gamma=0.1)

for epoch in range(num_epoch):
    ...  # Train your model
    scheduler.step(epoch)

"""
The MultiStepLR class takes an optimizer, a list of milestones, and a decay
factor (gamma) as input and updates the learning rate of the optimizer at
the milestones specified in the milestones list by multiplying it with
gamma. The current step number is passed to the step method, and the
learning rate is updated only if the current step is equal to one of the
milestones.
"""

Exponential decay: This scheduler multiplies the learning rate by a specified decay factor after each epoch, so the learning rate decreases exponentially over time. This is useful for models that require a smoothly and gradually decreasing learning rate.

class ExponentialLR:
    def __init__(self, optimizer, gamma, last_epoch=-1):
        self.optimizer = optimizer
        self.gamma = gamma
        self.last_epoch = last_epoch
        # Remember each parameter group's initial learning rate
        self.base_lrs = [group['lr'] for group in optimizer.param_groups]

    def step(self, epoch):
        self.last_epoch = epoch
        for base_lr, param_group in zip(self.base_lrs, self.optimizer.param_groups):
            param_group['lr'] = base_lr * self.gamma ** (epoch + 1)

# Use the scheduler
optimizer = ...  # Define your optimizer
scheduler = ExponentialLR(optimizer, gamma=0.95)

# Train the model
for epoch in range(num_epochs):
    ...
    scheduler.step(epoch)

"""
In each epoch, the step method updates the learning rate of the optimizer by
multiplying it with the decay rate raised to the power of the epoch number.
"""

Cosine Annealing: This scheduler adjusts the learning rate according to a cosine annealing schedule: the learning rate starts at its initial value and follows a cosine curve down to a minimum value over a fixed number of steps. This is useful for models that require a gradually decreasing learning rate but with a more gradual decline in the latter stages of training.

import math

class CosineAnnealingLR:
    def __init__(self, optimizer, T_max, eta_min=0):
        self.optimizer = optimizer
        self.T_max = T_max
        self.eta_min = eta_min
        self.current_step = 0
        # Remember each parameter group's initial learning rate
        self.base_lrs = [group['lr'] for group in optimizer.param_groups]

    def step(self):
        self.current_step += 1
        for base_lr, param_group in zip(self.base_lrs, self.optimizer.param_groups):
            lr = self.eta_min + (base_lr - self.eta_min) * (1 + math.cos(math.pi * self.current_step / self.T_max)) / 2
            param_group['lr'] = lr

# Use the scheduler
optimizer = ...  # Define your optimizer
scheduler = CosineAnnealingLR(optimizer, T_max=100, eta_min=0.00001)

# Train your model
for epoch in range(num_epochs):
    ...
    scheduler.step()

"""
the CosineAnnealingLR class takes an optimizer, T_max, and eta_min as
input. T_max represents the maximum number of steps over which the learning
rate will decrease from its initial value to eta_min. eta_min represents
the minimum value of the learning rate. The step method calculates the
learning rate using the formula:

eta_min + (1 - eta_min) * (1 + cos(pi * current_step / T_max)) / 2,

where current_step is incremented each time the step method is called. The
calculated learning rate is then applied to all the parameter groups in the
optimizer.
"""
Photo by freestocks on Unsplash

The above examples are easy to implement from scratch, and you can just as easily come up with your own scheduling algorithm. However, the ‘top dawgs’, i.e., PyTorch and TensorFlow, already have most (if not all) of these schedulers implemented in their libraries.
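For reference, here is a minimal sketch of how the built-in PyTorch counterparts from torch.optim.lr_scheduler are typically used, assuming a dummy SGD optimizer purely for illustration; in a real training loop you would keep only the scheduler you need and call its step() method once per epoch, after the optimizer has updated the weights.

import torch
from torch.optim import lr_scheduler

# Dummy parameter and optimizer, for illustration only
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.1)

# Built-in counterparts of the schedulers implemented above; pick one
scheduler = lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
# scheduler = lr_scheduler.MultiStepLR(optimizer, milestones=[30, 60, 90], gamma=0.1)
# scheduler = lr_scheduler.ExponentialLR(optimizer, gamma=0.95)
# scheduler = lr_scheduler.CosineAnnealingLR(optimizer, T_max=100, eta_min=0.00001)

num_epochs = 100
for epoch in range(num_epochs):
    # ... forward pass, loss.backward(), optimizer.step() ...
    scheduler.step()  # built-in schedulers track the epoch count internally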

In summary, learning rate schedulers are used to improve the convergence and stability of deep learning models. By carefully selecting and tuning the learning rate schedule, it is possible to achieve better performance and faster convergence, especially for large and complex models.

If you enjoyed reading this article, please give it a like and follow. For questions, please use the comment section. If you want to chat, reach out to me on LinkedIn or Twitter.


Published via Towards AI
