Learning Rate Schedulers

Last Updated on July 25, 2023 by Editorial Team

Author(s): Toluwani Aremu

Originally published on Towards AI.

Photo by Lucian Alexe on Unsplash

In my previous Medium article, I talked about the crucial role the learning rate plays in training Machine Learning and Deep Learning models, and I listed the learning rate scheduler as one way to optimize the learning rate for the best performance. Today, I will delve deeper into the concept of learning rate schedulers and explain how they work. But first (as usual), I’ll begin with a relatable story to explore the topic.

Photo by Bonnie Kittle on Unsplash

Kim is a dedicated, hard-working teacher who has always wanted a better balance between her professional and personal life but has struggled to find enough time for all of her responsibilities, despite her best efforts. This left her feeling stressed and burned out. In addition to her teaching duties, she must also grade students’ homework, review her syllabus and lesson notes, and attend to other important tasks.

Backed by her determination to take full control of her schedule, Kim decided to create a daily to-do list in which she prioritized her most important tasks and allocated time slots for each of them. At work, she implemented a strict schedule based on her existing teaching timetable. She also dedicated specific times to reviewing homework, preparing lessons, and attending to other out-of-class responsibilities.

At home, Kim continued to manage her time wisely by scheduling time for exercise, cooking, and quality time with friends. She also made sure to carve out time for herself, such as reading or taking relaxing baths, to recharge and maintain her energy levels. By staying true to her schedule, she experienced significant improvements in her performance and overall well-being: she accomplished more, felt less stressed, spent more quality time with friends, engaged in fulfilling activities, and made time for self-care.

Kim’s story highlights the power of scheduling and the importance of making the most of each day. By taking control of her schedule, she was able to live a happier and more productive life.

NOW, WHAT IS A LEARNING RATE SCHEDULER?

Photo by Towfiqu barbhuiya on Unsplash

A learning rate scheduler is a method used in deep learning to adjust the learning rate of a model over time in order to achieve the best possible performance. The learning rate is one of the most important hyperparameters in deep learning, as it determines how quickly a model updates its parameters based on the gradients computed during training. As I stated in my last Medium article, if the learning rate is set too high, the model may overshoot optimal values and fail to converge. If the learning rate is set too low, the model may converge too slowly or get stuck in a suboptimal local minimum.
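To make this concrete, here is a minimal sketch of a single vanilla gradient descent step on a toy parameter; the tensors and values are made up purely for illustration, and the point is simply that the learning rate scales how far each gradient step moves the weights.

import torch

# Toy parameter and a made-up gradient, for illustration only
param = torch.tensor([1.0, -2.0])
grad = torch.tensor([0.5, -0.1])

lr = 0.1  # the learning rate controls the step size

# One vanilla gradient descent update: move against the gradient,
# scaled by the learning rate
param = param - lr * grad
print(param)  # tensor([0.9500, -1.9900])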

Learning rate schedulers help to address these issues by gradually reducing the learning rate over time. There are several popular learning rate scheduler algorithms, including:

Step decay: This scheduler adjusts the learning rate after a fixed number of steps, reducing the learning rate by a specified factor. This is useful for situations where the learning rate needs to decrease over time to allow the model to converge.

class StepLR:
    def __init__(self, optimizer, step_size, gamma):
        self.optimizer = optimizer
        self.step_size = step_size
        self.gamma = gamma
        self.last_step = 0

    def step(self, current_step):
        if current_step - self.last_step >= self.step_size:
            for param_group in self.optimizer.param_groups:
                param_group['lr'] *= self.gamma
            self.last_step = current_step

# Use the scheduler
optimizer = ...  # Define your optimizer
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(num_epoch):
    ...  # Train your model
    scheduler.step(epoch)

"""
The StepLR class takes an optimizer, a step size, and a decay factor (gamma)
as input and updates the learning rate of the optimizer every step_size
epochs by multiplying it with gamma. The current step number is passed
to the step method, and the learning rate is updated only if the difference
between the current step and the last step is greater than or equal to
step_size.
""

Multi-Step decay: This scheduler adjusts the learning rate at multiple predefined milestones during training, reducing it by a specified factor at each one. This is useful for scenarios where the learning rate needs to decrease in stages, such as during the later stages of training when the model has already learned some important features.

class MultiStepLR:
    def __init__(self, optimizer, milestones, gamma):
        self.optimizer = optimizer
        self.milestones = milestones
        self.gamma = gamma
        self.last_milestone = 0

    def step(self, current_step):
        if current_step in self.milestones:
            for param_group in self.optimizer.param_groups:
                param_group['lr'] *= self.gamma
            self.last_milestone = current_step

# Use the scheduler
optimizer = ...  # Define your optimizer
scheduler = MultiStepLR(optimizer, milestones=[30, 60, 90], gamma=0.1)

for epoch in range(num_epoch):
    ...  # Train your model
    scheduler.step(epoch)

"""
The MultiStepLR class takes an optimizer, a list of milestones, and a decay
factor (gamma) as input and updates the learning rate of the optimizer at
the milestones specified in the milestones list by multiplying it with
gamma. The current step number is passed to the step method, and the
learning rate is updated only if the current step is equal to one of the
milestones.
"""

Exponential decay: This scheduler multiplies the learning rate by a specified decay factor after each epoch, so the learning rate decreases exponentially over time. This is useful for models that require a smoothly and gradually decreasing learning rate.

class ExponentialLR:
    def __init__(self, optimizer, gamma, last_epoch=-1):
        self.optimizer = optimizer
        self.gamma = gamma
        self.last_epoch = last_epoch
        # Remember each parameter group's initial learning rate
        self.base_lrs = [group['lr'] for group in optimizer.param_groups]

    def step(self, epoch):
        self.last_epoch = epoch
        for base_lr, param_group in zip(self.base_lrs, self.optimizer.param_groups):
            param_group['lr'] = base_lr * self.gamma ** (epoch + 1)

# Use the scheduler
optimizer = ...  # Define your optimizer
scheduler = ExponentialLR(optimizer, gamma=0.95)

# Train the model
for epoch in range(num_epochs):
    ...
    scheduler.step(epoch)

"""
In each epoch, the step method updates the learning rate of the optimizer by
multiplying it with the decay rate raised to the power of the epoch number.
"""

Cosine Annealing: This scheduler adjusts the learning rate according to a cosine annealing schedule: the learning rate starts at its initial value and follows a cosine curve down to a minimum value over a fixed number of steps. This is useful for models that require a gradually decreasing learning rate but with a more gradual decline in the latter stages of training.

import math

class CosineAnnealingLR:
    def __init__(self, optimizer, T_max, eta_min=0):
        self.optimizer = optimizer
        self.T_max = T_max
        self.eta_min = eta_min
        self.current_step = 0
        # Remember each parameter group's initial learning rate
        self.base_lrs = [group['lr'] for group in optimizer.param_groups]

    def step(self):
        self.current_step += 1
        for base_lr, param_group in zip(self.base_lrs, self.optimizer.param_groups):
            lr = self.eta_min + (base_lr - self.eta_min) * (1 + math.cos(math.pi * self.current_step / self.T_max)) / 2
            param_group['lr'] = lr

# Use the scheduler
optimizer = ...  # Define your optimizer
scheduler = CosineAnnealingLR(optimizer, T_max=100, eta_min=0.00001)

# Train your model
for epoch in range(num_epochs):
    ...
    scheduler.step()

"""
the CosineAnnealingLR class takes an optimizer, T_max, and eta_min as
input. T_max represents the maximum number of steps over which the learning
rate will decrease from its initial value to eta_min. eta_min represents
the minimum value of the learning rate. The step method calculates the
learning rate using the formula:

eta_min + (1 - eta_min) * (1 + cos(pi * current_step / T_max)) / 2,

where current_step is incremented each time the step method is called. The
calculated learning rate is then applied to all the parameter groups in the
optimizer.
"""
Photo by freestocks on Unsplash

The above examples are easy to implement from scratch, and you can just as easily come up with your own scheduling algorithm. However, the ‘top dawgs’, i.e., PyTorch and TensorFlow, already have most (if not all) of these schedulers implemented in their libraries.
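For reference, here is a minimal sketch of how the built-in PyTorch counterparts from torch.optim.lr_scheduler are typically used, assuming a dummy SGD optimizer purely for illustration; in a real training loop you would keep only the scheduler you need and call its step() method once per epoch, after the optimizer has updated the weights.

import torch
from torch.optim import lr_scheduler

# Dummy parameter and optimizer, for illustration only
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.1)

# Built-in counterparts of the schedulers implemented above; pick one
scheduler = lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
# scheduler = lr_scheduler.MultiStepLR(optimizer, milestones=[30, 60, 90], gamma=0.1)
# scheduler = lr_scheduler.ExponentialLR(optimizer, gamma=0.95)
# scheduler = lr_scheduler.CosineAnnealingLR(optimizer, T_max=100, eta_min=0.00001)

num_epochs = 100
for epoch in range(num_epochs):
    # ... forward pass, loss.backward(), optimizer.step() ...
    scheduler.step()  # built-in schedulers track the epoch count internally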

In summary, learning rate schedulers are used to improve the convergence and stability of deep learning models. By carefully selecting and tuning the learning rate schedule, it is possible to achieve better performance and faster convergence, especially for large and complex models.

If you enjoyed reading this article, please give it a like and follow. For questions, please use the comment section. If you want to chat, reach out to me on LinkedIn or Twitter.


Published via Towards AI
