Time Series Forecasting using PyCaret
Last Updated on January 29, 2024 by Editorial Team
Author(s): Rakesh M K
Originally published on Towards AI.
PyCaret.
PyCaret is a low-code Python library for machine learning that automates workflows and supports end-to-end experiments. Because it automates the major steps of evaluating and comparing models, it is easier to examine the performance of different models and select the best one. On this page, let's see how to approach a time series forecasting problem on the familiar airline data with PyCaret.
Install and import PyCaret.
pip install pycaret
import pycaret
Prepare data.
Data available within the PyCaret library can be imported with pycaret.datasets.get_data(). Otherwise, load it from disk with pandas as usual.
'''load Airline data'''
import pandas as pd
data = pd.read_excel('Airlines_data.xlsx')
data.set_index('Month',inplace=True)
# from pycaret.datasets import get_data; data = get_data('airline') # to load from the PyCaret library
data.head()
Setup Environment.
The setup() function initializes the environment for training and creates the transformation pipeline. It is important to call setup() before executing any other function. For a time series experiment, the following parameters should be passed to the setup function.
fh: forecasting horizon
fold: number of folds for cross validation
session_id: random seed for experiment
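To make fh and fold more concrete, here is a minimal pure-Python sketch of expanding-window cross-validation, the general idea behind scoring a forecaster on several successive horizons. The function name expanding_window_splits and the series length of 144 are assumptions for illustration; this is not PyCaret's internal splitter.

```python
def expanding_window_splits(n_obs, fh, fold):
    """Yield (train_indices, validation_indices) pairs.

    Each fold validates on the next `fh` points, and the training
    window grows by `fh` points from one fold to the next.
    """
    first_cutoff = n_obs - fh * fold
    if first_cutoff <= 0:
        raise ValueError("series too short for the requested fh and fold")
    for k in range(fold):
        cutoff = first_cutoff + k * fh
        yield list(range(cutoff)), list(range(cutoff, cutoff + fh))


# Assumed series length of 144 points, with the same fh and fold as in setup().
splits = list(expanding_window_splits(n_obs=144, fh=3, fold=5))
for train_idx, val_idx in splits:
    print(f"train: 0..{train_idx[-1]}  validate: {val_idx[0]}..{val_idx[-1]}")
```

With these values, every fold is validated on 3 unseen points, and the last fold ends exactly at the end of the series.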
'''Import Time Series libraries'''
from pycaret.time_series import *
'''Initialize the training environment'''
s = setup(data, fh = 3, fold = 5, session_id = 101)
As the summary table printed by setup() shows, AutoML analyzes the time series data and provides a detailed summary, especially about seasonality. Have a look at the entries Seasonality Present and Significant Seasonal Period(s) in that table.
Choose API and Compare Models.
PyCaret has two APIs: Functional and OOP. The Functional API follows a procedural approach and offers a straightforward interface. Since the settings are applied globally across functions, this API is convenient for quick experiments. The OOP API follows an object-oriented approach, providing a customizable and modular workflow, and is hence more suitable for running multiple experiments independently. In this experiment, I am using the Functional API (the code for the OOP API is commented out).
The compare_models() function trains the candidate models, displays a table of their performance metrics, and returns the best one. It can be observed from the table that Exponential Smoothing outperforms all other models.
'''Compare models using Functional API'''
best = compare_models()
# '''OOP API'''
# best = s.compare_models()
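The idea behind such a comparison table can be sketched in plain Python: fit several candidates on the same training window, score each on the same held-out horizon, and sort by the metric. The three baseline forecasters, the toy series, and all names below are assumptions made up for illustration; they are not the models or data PyCaret uses.

```python
def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def naive(train, fh):
    """Repeat the last observed value."""
    return [train[-1]] * fh


def seasonal_naive(train, fh, period=4):
    """Repeat the last observed seasonal cycle."""
    return [train[-period + (i % period)] for i in range(fh)]


def historical_mean(train, fh):
    """Repeat the mean of the training data."""
    mean = sum(train) / len(train)
    return [mean] * fh


# Toy series: a period-4 seasonal pattern plus a linear trend.
pattern = [10, 20, 30, 20]
series = [pattern[t % 4] + t for t in range(24)]
fh = 4
train, test = series[:-fh], series[-fh:]

candidates = {
    "naive": naive,
    "seasonal_naive": seasonal_naive,
    "historical_mean": historical_mean,
}
# Score every candidate on the same hold-out window; lower MAE ranks first.
ranking = sorted(
    ((name, mae(test, model(train, fh))) for name, model in candidates.items()),
    key=lambda row: row[1],
)
for name, score in ranking:
    print(f"{name:<16} MAE = {score:.2f}")
```

On this toy series the seasonal baseline ranks first because it is the only candidate that reproduces the repeating pattern.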
Plot the Forecast and Diagnose Residuals.
Now let's plot the forecast using the plot_model() function. In the data_kwargs dictionary, we can also give keywords for the plot, such as title, legend, opacity, height, width, etc., as per requirement.
'''Plot forecasting using functional API'''
plot_model(best, plot = 'forecast', data_kwargs = {'fh' : 24})
# OOP API
# s.plot_model(best, plot = 'forecast', data_kwargs = {'fh' : 24})
The model's prediction on the training data (in-sample) can be plotted as below.
'''Insample plot using functional API'''
plot_model(best, plot = 'insample')
# OOP API
# s.plot_model(best, plot = 'insample')
To diagnose the residuals, we can use the same plot_model() function with the keyword plot = 'diagnostics' as below.
'''Diagnose best model using functional API'''
plot_model(best, plot = 'diagnostics')
# OOP API
# s.plot_model(best, plot = 'diagnostics')
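One thing residual diagnostics typically check is whether the residuals are still autocorrelated, which would suggest the model has missed some structure. The sketch below computes residuals and their sample autocorrelation in plain Python; the actual and fitted values are made-up numbers and the helper name autocorrelation is an assumption, not a PyCaret function.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at the given positive lag."""
    n = len(values)
    mean = sum(values) / n
    denominator = sum((v - mean) ** 2 for v in values)
    numerator = sum(
        (values[t] - mean) * (values[t - lag] - mean) for t in range(lag, n)
    )
    return numerator / denominator


# Made-up observed values and in-sample predictions.
actual = [101, 99, 103, 101, 105, 103]
fitted = [100, 100, 102, 102, 104, 104]
residuals = [a - f for a, f in zip(actual, fitted)]

acf_lag1 = autocorrelation(residuals, lag=1)
acf_lag2 = autocorrelation(residuals, lag=2)
print(f"lag 1: {acf_lag1:.3f}, lag 2: {acf_lag2:.3f}")
```

Here the residuals alternate in sign, so the lag-1 autocorrelation is strongly negative. Residuals from a well-specified model would be expected to show values close to zero at every lag.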
Forecast using Best Model.
Since we have Exponential Smoothing as our best model, let's forecast and plot for a horizon of 24.
'''Forecast using functional API'''
forecast = predict_model(best, fh = 24)
# OOP API
# forecast = s.predict_model(best, fh = 24)
'''Plot forecast'''
import matplotlib.pyplot as plt
plt.plot(forecast.index.strftime('%Y-%m'), forecast['y_pred'].astype(float),label = 'forecast')
plt.legend(loc=[1,1])
plt.xticks(rotation=45);
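For intuition about what exponential smoothing does, here is a minimal sketch of its simplest variant, simple exponential smoothing, in plain Python. This is an illustration under stated assumptions only: the function name, the alpha value, and the toy series are made up, and the model selected in the experiment above may also handle trend and seasonality, which this sketch does not.

```python
def simple_exponential_smoothing(series, alpha, fh):
    """Forecast `fh` steps ahead with simple exponential smoothing.

    The level is a weighted average of the newest observation and the
    previous level; the forecast is flat at the final level.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * fh


toy_series = [10, 12, 14]
toy_forecast = simple_exponential_smoothing(toy_series, alpha=0.5, fh=2)
print(toy_forecast)  # [12.5, 12.5]
```

A larger alpha makes the level react faster to recent observations, while a smaller alpha smooths more heavily.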
Save and Load the Best Model.
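The model can be saved using the save_model() function, which writes the transformation pipeline and the trained model to a pickle file. In the code below, finalize_model() first refits the best model on the entire dataset before it is saved.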
'''Save model using functional API'''
finalModel = finalize_model(best)
save_model(finalModel, 'bestModel')
# OOP API
# s.save_model(finalModel, 'bestModel')
A saved model can be loaded using the load_model() function as follows.
'''Load saved model using functional API'''
loaded_model = load_model('bestModel')
# OOP API
# loaded_model = s.load_model('bestModel')
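The save and load round trip can be illustrated with Python's standard pickle module. This is only a sketch of the general idea of persisting a fitted model to disk; the model_state dictionary and its values are hypothetical, and PyCaret's own save_model() and load_model() should be used for real PyCaret pipelines.

```python
import os
import pickle
import tempfile

# Hypothetical fitted state of a very simple model.
model_state = {"model": "simple_exponential_smoothing", "alpha": 0.5, "level": 12.5}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "bestModel.pkl")

    # Save: serialize the fitted state to a file.
    with open(path, "wb") as f:
        pickle.dump(model_state, f)

    # Load: restore the state, for example in a later session.
    with open(path, "rb") as f:
        restored_state = pickle.load(f)

# The restored state can produce forecasts without refitting.
horizon = 3
forecast_from_restored = [restored_state["level"]] * horizon
print(forecast_from_restored)
```

Only load pickle files from sources you trust, since unpickling can execute arbitrary code.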
Summary.
We have observed on this page that PyCaret simplifies tasks such as data preprocessing, model selection, hyperparameter tuning, and model evaluation, streamlining the workflow for users. The AutoML capability significantly reduces the coding effort required for developing robust machine learning models, making it an efficient and user-friendly tool for both beginners and experienced data scientists. PyCaret can also be used for other machine learning tasks such as regression, classification, clustering, and anomaly detection.