Name: Towards AI Legal Name: Towards AI, Inc. Description: Towards AI is the world's leading artificial intelligence (AI) and technology publication. Read by thought-leaders and decision-makers around the world. Phone Number: +1-650-246-9381 Email: [email protected]
228 Park Avenue South New York, NY 10003 United States
Website: Publisher: https://towardsai.net/#publisher Diversity Policy: https://towardsai.net/about Ethics Policy: https://towardsai.net/about Masthead: https://towardsai.net/about
Name: Towards AI Legal Name: Towards AI, Inc. Description: Towards AI is the world's leading artificial intelligence (AI) and technology publication. Founders: Roberto Iriondo, , Job Title: Co-founder and Advisor Works for: Towards AI, Inc. Follow Roberto: X, LinkedIn, GitHub, Google Scholar, Towards AI Profile, Medium, ML@CMU, FreeCodeCamp, Crunchbase, Bloomberg, Roberto Iriondo, Generative AI Lab, Generative AI Lab Denis Piffaretti, Job Title: Co-founder Works for: Towards AI, Inc. Louie Peters, Job Title: Co-founder Works for: Towards AI, Inc. Louis-François Bouchard, Job Title: Co-founder Works for: Towards AI, Inc. Cover:
Towards AI Cover
Logo:
Towards AI Logo
Areas Served: Worldwide Alternate Name: Towards AI, Inc. Alternate Name: Towards AI Co. Alternate Name: towards ai Alternate Name: towardsai Alternate Name: towards.ai Alternate Name: tai Alternate Name: toward ai Alternate Name: toward.ai Alternate Name: Towards AI, Inc. Alternate Name: towardsai.net Alternate Name: pub.towardsai.net
5 stars – based on 497 reviews

Frequently Used, Contextual References

TODO: Remember to copy unique IDs whenever it needs used. i.e., URL: 304b2e42315e

Resources

Take our 85+ lesson From Beginner to Advanced LLM Developer Certification: From choosing a project to deploying a working product this is the most comprehensive and practical LLM course out there!

Publication

Electricity production forecasting using ARIMA model in Python
Data Visualization

Electricity production forecasting using ARIMA model in Python

Last Updated on February 19, 2021 by Editorial Team

Author(s): Jayashree domala

Data Visualization

Electricity Production Forecasting Using Arima Model inΒ Python

A guide to the step-by-step implementation of ARIMA models usingΒ Python.

Photo by Matthew Henry (Unsplash)

ARIMA which is the short form for β€˜Auto-Regressive Integrated Moving Average’ is used on time series data and it gives insights on the past values like lags and forecast errors that can be used for forecasting futureΒ values.

To know more about ARIMA models, check out the articleΒ below:

A brief introduction to ARIMA models for time series forecasting

How to implement ARIMA models to help forecasting/predicting?

The steps involved are asΒ follows:

  1. Analyzing the time series data by plotting or visualizing it.
  2. Converting the time series data in a stationary form.
  3. Plotting the ACF and PACFΒ plots.
  4. Constructing the ARIMAΒ model.
  5. Making predictions using the modelΒ created.

Implementation usingΒ Python

β†’ ImportΒ packages

The basic packages like NumPy and pandas for dealing with data are imported. For visualization, matplotlib is used. And to implement the ARIMA model, the statsmodel is imported.

>>> import numpy as np
>>> import pandas as pd
>>> import statsmodels.api as sm
>>> import matplotlib.pyplot as plt
>>> %matplotlib inline

β†’ Analyzing data

The data which is used is seasonal data of electricity production.

>>> df = pd.read_csv('Electric_Production.csv')
>>> df.head()

The column names are changed for ease ofΒ use.

>>> df.columns = ['Date','Production']
>>> df.head()

Now to work with time-series data, the β€˜date’ column is converted into DateTimeΒ index.

>>> df['Date'] = pd.to_datetime(df['Date'])
>>> df.head()
>>> df.set_index('Date',inplace=True)
>>> df.head()

β†’ Visualization ofΒ data

>>> df.plot()

As seen from the plot, this is seasonal data as there is some seasonality to it and an upward trendΒ too.

Now comparing the 12-month simple moving average along with the series to ascertain the trend. The standard deviation is also plotted to see if there is any variance orΒ no.

>>> df['Production'].rolling(12).mean().plot(label='12 SMA',figsize=(16,8))
>>> df['Production'].rolling(12).std().plot(label='12 STD')
>>> df['Production'].plot()
>>> plt.legend()

As seen from the above plot, the standard deviation is not varying much so there is not much variance.

β†’ Decomposition of the time series data to its trend, seasonality and residual components.

statsmodels is used for the decomposition.

>>> from statsmodels.tsa.seasonal import seasonal_decompose
>>> decomp = seasonal_decompose(df['Production'])
>>> fig = decomp.plot()
>>> fig.set_size_inches(14,7)

The trend, seasonal and residual errors can be seen individually here.

β†’ Converting the data into the stationary form

The data is first tested using the Dickey-Fuller test to check if the data is in stationary form or no and then change itsΒ form.

The Dickey-Fuller has the null hypothesis that a unit root is present and it is a non-stationary time series. The alternative hypothesis is that there is no unit root and the series is stationary.

We will test with the help of the parameter β€˜p’. If p is small (p<=0.05), we reject the null hypothesis, otherwise not reject the null hypothesis.

From the statsmodels package, the augmented dickey-fuller test function is imported. It returns a tuple that consists of the values: adf, pvalue, usedlag, nobs, critical values, icbest and resstore.

>>> from statsmodels.tsa.stattools import adfuller

Then this function is called on the production column of theΒ dataset.

>>> fuller_test = adfuller(df['Production'])
>>> fuller_test
(-2.256990350047239,
0.18621469116586975,
15,
381,
{'1%': -3.4476305904172904,
'5%': -2.869155980820355,
'10%': -2.570827146203181},
1840.8474501627156)

Now the p-value is printed and using the p-value, it is ascertained if data is stationary orΒ not.

>>> def test_p_value(data):
fuller_test = adfuller(data)
print('P-value: ',fuller_test[1])
if fuller_test[1] <= 0.05:
print('Reject null hypothesis, data is stationary')
else:
print('Do not reject null hypothesis, data is not stationary')
>>> test_p_value(df['Production'])
P-value:  0.18621469116586975
Do not reject null hypothesis, data is not stationary

Since the data is not stationary, differencing is carried out. The difference is the change of the time series from one period to the next. The first difference, second difference and seasonal difference are calculated and for each, the p-value isΒ checked.

df['First_diff'] = df['Production'] - df['Production'].shift(1)
df['First_diff'].plot()
>>> test_p_value(df['First_diff'].dropna())
P-value:  4.0777865655398996e-10
Reject null hypothesis, data is stationary

In the first difference, we got the data in the stationary form. In case, a second difference was needed then the following would have beenΒ done.

>>> df['Second_diff'] = df['First_diff'] - df['First_diff'].shift(1)
>>> df['Second_diff'].plot()
>>> test_p_value(df['Second_diff'].dropna())
P-value:  4.1836937480000375e-17
Reject null hypothesis, data is stationary

A seasonal difference can also be taken as follows. The shifting will happen by an entire season that isΒ β€˜12’.

>>> df['Seasonal_diff'] = df['Production'] - df['Production'].shift(12)
>>> df['Seasonal_diff'].plot()
>>> test_p_value(df['Seasonal_diff'].dropna())
P-value:  8.812644938089026e-07
Reject null hypothesis, data is stationary

β†’ Plotting the ACF and PACFΒ plot

From the statsmodels package, the ACF and PACF plot functions are imported.

>>> from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
>>> first_diff = plot_acf(df['First_diff'].dropna())
>>> sec_diff = plot_pacf(df['Second_diff'].dropna())

Now the final ACF and PACF plots will be plotted which will be usedΒ further.

>>> p1 = plot_acf(df['Seasonal_diff'].dropna())
>>> p2 = plot_pacf(df['Seasonal_diff'].dropna())

β†’ Constructing the ARIMAΒ model

For non-seasonal data, the ARIMA model can be imported from statsmodels module.

>>> from statsmodels.tsa.arima_model import ARIMA

For seasonal data, the seasonal ARIMA model can be imported from the statsmodels module. The data used here is seasonal data so the seasonal ARIMA model is imported. The arguments passed are production column, order and seasonalΒ order.

order: The (p,d,q) order of the model for the number of AR parameters, differences, and MA parameters.

seasonal order: The (P,D,Q,s) order of the seasonal component of the model for the AR parameters, differences, MA parameters, and periodicity.

>>> model = sm.tsa.statespace.SARIMAX(df['Production'],order=(0,1,0),seasonal_order=(1,1,1,12))

Once the model is created, the next thing to do is to fit theΒ model.

>>> results = model.fit()
>>> results.summary()

To know about the residuals values or error, the β€˜resid’ method can be called on theΒ results.

>>> results.resid
Date
1985-01-01 72.505200
1985-02-01 -1.833200
1985-03-01 -8.221800
1985-04-01 -4.978800
1985-05-01 -2.156300
...
2017-09-01 0.529985
2017-10-01 4.057874
2017-11-01 0.690663
2017-12-01 2.477697
2018-01-01 6.953533
Length: 397, dtype: float64

The plot of the residual points can beΒ created.

>>> results.resid.plot()

The distribution of the errors can be seen by plotting the KDE. And as seen from the plot below, the errors are distributed around 0 which isΒ good.

>>> results.resid.plot(kind='kde')

β†’ Predicting or forecasting

By predicting the values, the model’s performance can be ascertained. First, we can look into how it predicts the data present and then move onto predicting futureΒ data.

>>> df['prediction'] = results.predict()
>>> df[['Production','prediction']].plot(figsize=(12,8))

As seen from the above plot, the model does a good job is predicting the present data. Now to predict for the future, we can add more months to the dataset with null values and predict for it. This can be done using pandas. The last index is obtained which is the last date and a month offset is added to it which starts from 1 and goes up toΒ 24.

>>> from pandas.tseries.offsets import DateOffset
>>> extra_dates = [df.index[-1] + DateOffset(months=m) for m in range (1,24)]
>>> extra_dates
[Timestamp('2018-02-01 00:00:00'),
Timestamp('2018-03-01 00:00:00'),
Timestamp('2018-04-01 00:00:00'),
Timestamp('2018-05-01 00:00:00'),
Timestamp('2018-06-01 00:00:00'),
Timestamp('2018-07-01 00:00:00'),
Timestamp('2018-08-01 00:00:00'),
Timestamp('2018-09-01 00:00:00'),
Timestamp('2018-10-01 00:00:00'),
Timestamp('2018-11-01 00:00:00'),
Timestamp('2018-12-01 00:00:00'),
Timestamp('2019-01-01 00:00:00'),
Timestamp('2019-02-01 00:00:00'),
Timestamp('2019-03-01 00:00:00'),
Timestamp('2019-04-01 00:00:00'),
Timestamp('2019-05-01 00:00:00'),
Timestamp('2019-06-01 00:00:00'),
Timestamp('2019-07-01 00:00:00'),
Timestamp('2019-08-01 00:00:00'),
Timestamp('2019-09-01 00:00:00'),
Timestamp('2019-10-01 00:00:00'),
Timestamp('2019-11-01 00:00:00'),
Timestamp('2019-12-01 00:00:00')]

Now another dataframe is created with these extra future date values as an index and the rest of the column values asΒ null.

>>> forecast_df = pd.DataFrame(index=extra_dates,columns=df.columns)
>>> forecast_df.head()

Now the original dataframe and this forecast dataframe are concatenated into a single one for forecasting.

>>> final_df = pd.concat([df,forecast_df])

Now we can predict the values for the end data points by mentioning the start and end arguments while calling the β€˜predict’ function.

>>> final_df['prediction'] = results.predict(start=396, end=430)
>>> final_df[['Production','prediction']].plot(figsize=(12,8))

The prediction line can be seen clearly for the futureΒ values.

Conclusion

ARIMA model was successfully used to predict the future values for the electricity production which is a seasonal dataset with non-stationary behavior. Using the proper steps, the data was converted to the stationary form and the prediction model wasΒ built.

Refer to the notebook and datasetΒ here.

Reach out to me:Β LinkedIn

Check out my other work:Β GitHub


Electricity production forecasting using ARIMA model in Python was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Published via Towards AI

Feedback ↓