Brent Crude Oil Price Forecasting Using GRU
Author(s): Rakesh M K
Originally published on Towards AI.
Table of Contents
- Introduction
- Trend and Seasonality
- About Data
- Boxplot and Anomalies
- GRU (Gated Recurrent Unit)
- Data Preparation
- GRU Model and Training
- Prediction of test data
- Prediction on real data
- Conclusion
Introduction
Brent crude is a light, sweet crude oil with low sulfur content, produced from the North Sea, and serves as a global benchmark for pricing and trading crude oil. Forecasting crude oil prices is challenging because prices are driven by many factors beyond historical values, mainly geopolitical events. The absence of a constant trend and seasonality makes prediction harder still. This article explains how to forecast the Brent crude price using a deep neural network built with Gated Recurrent Units (GRU).
Trend and Seasonality
A trend indicates a systematic increase or decrease of the values in a series (an uptrend or a downtrend). Seasonality refers to repeating patterns or fluctuations that occur at regular intervals, often tied to specific seasons, weeks, months, years, or other recurring periods. The plot below shows a series with an uptrend and yearly seasonality.
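For illustration, here is a minimal sketch (not from the original article) that generates and plots a synthetic series with a linear uptrend and a yearly seasonal cycle:

'''illustrative only: a synthetic series with an uptrend and yearly seasonality'''
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
t = np.arange(5 * 365)                           # five years of daily points
trend = 0.05 * t                                 # linear uptrend
seasonality = 10 * np.sin(2 * np.pi * t / 365)   # yearly cycle
noise = rng.normal(0, 2, len(t))
series = 100 + trend + seasonality + noise

plt.figure(figsize=(10, 4))
plt.plot(t, series)
plt.title('Synthetic series with uptrend and yearly seasonality')
plt.xlabel('day')
plt.ylabel('value')
plt.grid()
plt.show()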
About Data
Brent crude oil prices from 1987-05-20 to 2023-11-20 are used for this work. A portion of the data frame and the time series plot of the historical data are shown below.
Looking at the time series, it is clear that it has no constant trend or seasonality. For that reason, I do not decompose and analyze the series here; instead, we will see how well a deep neural model can handle it directly. A sketch of loading and plotting the data follows.
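A minimal loading sketch; the file name brent_oil.csv and the 'date' column are assumptions (the 'price' column appears in the article's own code later):

'''load and plot the data; file and date-column names are assumptions'''
import pandas as pd
import matplotlib.pyplot as plt

data1 = pd.read_csv('brent_oil.csv', parse_dates=['date'], index_col='date')
print(data1.head())

data1['price'].plot(figsize=(12, 4), title='Brent crude oil price (1987-2023)')
plt.ylabel('USD per barrel')
plt.grid()
plt.show()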
Boxplot and Anomalies
Anomalies are deviations from the normal or expected behavior of a series. A yearly boxplot of Brent oil prices since 1987 is plotted below.
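One possible way to produce such a boxplot, assuming data1 has a DatetimeIndex and a 'price' column (a sketch, not the original article's code):

'''yearly boxplot of prices, grouped by calendar year'''
import matplotlib.pyplot as plt

yearly = data1.copy()
yearly['year'] = yearly.index.year

fig, ax = plt.subplots(figsize=(16, 5))
yearly.boxplot(column='price', by='year', ax=ax, rot=90)
ax.set_title('Yearly boxplot of Brent crude oil prices')
ax.set_xlabel('year')
ax.set_ylabel('USD per barrel')
plt.suptitle('')
plt.show()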
From the boxplot above, we can observe abnormal volatility mainly in 1990, 2008, 2014, 2020, and 2022. The main reasons for these anomalies are listed below.
1990: Gulf War and a cold winter.
2008: Financial crisis (the Great Recession).
2014: Reduced geopolitical concerns and constant OPEC oil supply.
2020: COVID-19 pandemic and travel restrictions.
2022: Russia-Ukraine war.
Events like these produce anomalies that make the series highly volatile and hard to predict, since most models assume the data is homoscedastic (mean and variance remain the same over time).
Before moving to model building, let's see what a GRU is.
GRU (Gated Recurrent Unit)
A GRU (Gated Recurrent Unit) is a type of recurrent neural network well suited to capturing long-term dependencies and sequential patterns in data, which it does with the aid of a reset gate and an update gate. The GRU overcomes some limitations of traditional RNNs, such as the vanishing gradient problem.
The vanishing gradient problem occurs when the gradients of the loss function with respect to the weights become extremely small during backpropagation. If these gradients are very small (close to zero), they may "vanish," causing the network to have difficulty learning long-range dependencies and capturing information from earlier time steps. GRU and LSTM architectures handle this problem effectively.
The reset gate in a GRU decides how much of the past information should be forgotten or reset when forming the candidate state, while the update gate determines how much of the previous hidden state is retained versus replaced by the candidate.
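For reference, here are the standard GRU equations as commonly stated in the literature (not from the original article; conventions for the role of $z_t$ vary slightly across references). Here $x_t$ is the input, $h_t$ the hidden state, $\sigma$ the sigmoid function, and $\odot$ element-wise multiplication:

$$
\begin{aligned}
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) && \text{(reset gate)}\\
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) && \text{(update gate)}\\
\tilde{h}_t &= \tanh\!\big(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\big) && \text{(candidate state)}\\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{(new hidden state)}
\end{aligned}
$$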
In this particular forecasting case, the past 10 values (window size) are used to predict the next 1 value (horizon). Let's prepare our data.
Data Preparation (window size = 10, horizon = 1)
1. Normalize data.
'''normalize data with min-max scaling'''
import numpy as np
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler().fit(np.array(data1['price']).reshape(-1, 1))
data = scaler.transform(np.array(data1['price']).reshape(-1, 1))  # normalized data

'''convert normalized Brent oil prices to a 1-D numpy array'''
prices = data.flatten()
prices[-10:]  # last 10 normalized prices
array([0.55120504, 0.53659622, 0.5386726 , 0.55291064, 0.55609937,
0.55691509, 0.54356693, 0.50893585, 0.53481646, 0.54987023])
2. Create windowed data.
'''function to split a windowed array into inputs and labels'''
def get_labelled_window(x, horizon):
    return x[:, :-horizon], x[:, -horizon:]

'''function to turn a 1-D array into overlapping windows of size windowSize + horizon'''
def makeWindows(x, windowSize, horizon):
    window_step = np.expand_dims(np.arange(windowSize + horizon), axis=0)
    window_indexes = window_step + np.expand_dims(
        np.arange(len(x) - (windowSize + horizon - 1)), axis=0).T
    windowed_array = x[window_indexes]
    windows, labels = get_labelled_window(windowed_array, horizon=horizon)
    return windows, labels
'''create windowed data: windowSize=10, horizon=1'''
full_windowsGRU, full_labelsGRU = makeWindows(prices, windowSize=10, horizon=1)
len(full_windowsGRU), len(full_labelsGRU)  # check length of the data arrays
(9514, 9514)
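As a quick sanity check (an optional step, not in the original article), the first window and label should line up with the start of the series:

'''optional sanity check: the first window is the first 10 normalized prices
and its label is the 11th'''
assert np.allclose(full_windowsGRU[0], prices[:10])
assert np.allclose(full_labelsGRU[0], prices[10])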
3. Split the data.
Split the full data such that training data is 90% and test data is 10%.
'''function to split data into train and test sets (no shuffling,
since order matters in a time series)'''
def make_train_test_splits(windows, labels, test_split=0.1):
    split_size = int(len(windows) * (1 - test_split))
    train_windows = windows[:split_size]
    train_labels = labels[:split_size]
    test_windows = windows[split_size:]
    test_labels = labels[split_size:]
    return train_windows, test_windows, train_labels, test_labels
'''split the data'''
train_windowsGRU, test_windowsGRU, train_labelsGRU, test_labelsGRU = make_train_test_splits(
    full_windowsGRU, full_labelsGRU, test_split=0.1)

train_windowsGRU.shape, test_windowsGRU.shape
((8562, 10), (952, 10))
GRU Model and Training
Two GRU layers of 10 units each are used, with L2 regularization of 0.01 applied to the first layer.
import tensorflow as tf
from tensorflow.keras import layers

tf.random.set_seed(103)

WINDOW = 10
HORIZON = 1

inputs = layers.Input(shape=(WINDOW,))                          # input: a window of 10 prices
x = layers.Lambda(lambda x: tf.expand_dims(x, axis=1))(inputs)  # expand to (batch, 1, WINDOW) so the GRU gets a 3-D input
x = layers.GRU(10, return_sequences=True,
               kernel_regularizer=tf.keras.regularizers.l2(0.01))(x)
x = layers.GRU(10)(x)
outputs = layers.Dense(HORIZON, activation='relu')(x)           # output: next price
model_GRU = tf.keras.Model(inputs=inputs, outputs=outputs, name="model_GRU")

model_GRU.compile(loss="mae",
                  optimizer=tf.keras.optimizers.Adam(0.001),  # clipvalue=0.2
                  metrics=['mae', 'mse'])
Train the model (epochs =100, batch size=128):
# fit the model
history = model_GRU.fit(x=train_windowsGRU,
                        y=train_labelsGRU,
                        epochs=100,
                        batch_size=128,
                        verbose=1,
                        validation_data=(test_windowsGRU, test_labelsGRU))
'''Training progression (last epochs shown)'''
...
Epoch 98/100
67/67 [==============================] - 0s 5ms/step - loss: 0.0070 - mae: 0.0060 - mse: 8.2355e-05 - val_loss: 0.0124 - val_mae: 0.0114 - val_mse: 2.8682e-04
Epoch 99/100
67/67 [==============================] - 0s 5ms/step - loss: 0.0069 - mae: 0.0059 - mse: 7.9939e-05 - val_loss: 0.0152 - val_mae: 0.0142 - val_mse: 3.5247e-04
Epoch 100/100
67/67 [==============================] - 0s 5ms/step - loss: 0.0072 - mae: 0.0062 - mse: 8.5466e-05 - val_loss: 0.0137 - val_mae: 0.0127 - val_mse: 3.0594e-04
The model is trained for 100 epochs. Loss curves (MSE and MAE) are shown below.
'''extract final losses from training history'''
import matplotlib.pyplot as plt

MAEtrain = history.history['loss'][-1]
MAEval = history.history['val_loss'][-1]
MSEtrain = history.history['mse'][-1]
MSEval = history.history['val_mse'][-1]

'''plot loss curves'''
fig, ax = plt.subplots(1, 2, figsize=(14, 4.5))
ax[0].plot(history.history['loss'], label=f'MAEtrain = {MAEtrain:.4f}')
ax[0].plot(history.history['val_loss'], label=f'MAEval = {MAEval:.4f}')
ax[0].set_title('MAE Loss')
ax[0].grid()
ax[0].legend()
ax[1].plot(history.history['mse'], label=f'MSEtrain = {MSEtrain:.6f}')
ax[1].plot(history.history['val_mse'], label=f'MSEval = {MSEval:.6f}')
ax[1].set_title('MSE Loss')
ax[1].grid()
ax[1].legend()
plt.show()
Evaluation on test data:
model_GRU.evaluate(test_windowsGRU,test_labelsGRU)
30/30 [==============================] - 0s 3ms/step - loss: 0.0137 - mae: 0.0127 - mse: 3.0594e-04
[0.013682662509381771, 0.012702614068984985, 0.00030594179406762123]
Prediction of test data
Prediction on the test data is done to check whether the predictions track the test labels. An inverse transformation is then applied to the predictions, since min-max scaling was applied during data preparation.
'''predict on test data'''
predGRU = model_GRU.predict(test_windowsGRU)
predGRU_= scaler.inverse_transform(predGRU)
predGRU_
array([[87.05385 ],
[86.76129 ],
[84.02901 ],
[81.67896 ],
[81.11496 ],
[82.19226 ],
[82.971756],
[83.32066 ],
[82.223625],
[78.73881 ]], dtype=float32)
After prediction and inverse transformation, the test data is plotted against the predictions as below.
'''plot the prediction'''
plt.subplots(figsize=(10, 5))
plt.plot(predGRU_, label='GRU prediction', c='b', lw=3)
plt.plot(scaler.inverse_transform(test_labelsGRU), label='test labels', c='orange')
plt.title('Brent oil prediction on test data by GRU model')
plt.grid()
plt.legend()

'''print metrics'''
from sklearn.metrics import mean_absolute_error as MAE, mean_squared_error as MSE

print(f'MAE GRU test: {MAE(predGRU_, scaler.inverse_transform(test_labelsGRU))}')
print(f'MSE GRU test: {MSE(predGRU_, scaler.inverse_transform(test_labelsGRU))}')
MAE GRU test: 1.7129471663066318
MSE GRU test: 5.563401746617568
The predictions and the metrics look impressive across the 952 test windows (each a one-step-ahead forecast) for a highly volatile series like the crude price. Now let us take some real data (the past 10 values of the crude oil price) and predict with it.
Prediction on real data
Taking the past 10 Brent oil prices up to 30-11-2023 (Brent Oil Futures Historical Prices - Investing.com India) to predict the next day's price.
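The prediction below uses a reloaded model, loaded_modelGRU. A minimal save-and-load sketch (the file name model_GRU.keras is an assumption, not from the original article):

'''save and reload the trained model; the file name is an assumption'''
model_GRU.save('model_GRU.keras')
loaded_modelGRU = tf.keras.models.load_model('model_GRU.keras')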
'''scale the latest 10 prices and predict the next day'''
x = scaler.transform([[80.61], [82.32], [82.45], [81.96], [81.42],
                      [80.58], [79.98], [81.68], [83.10], [82.83]])
p = loaded_modelGRU.predict(x.reshape(1, -1))  # shape (1, 10) to match the model input
predTomorrow = scaler.inverse_transform(p)
predTomorrow
array([[82.11944]], dtype=float32)  # actual value = 78.88
We could not get an accurate prediction on the most recent data, which may be due to the high volatility of the crude price, itself driven by many external factors. I also infer that sequences similar to the one we tried to predict were present in the training data, which may partly explain the strong test results.
Conclusion
It is always challenging to forecast an anomalous series. A good result on test data does not guarantee good future predictions for the same heteroscedastic series (one whose mean and variance change over time). A better approach is to forecast the series with one model, forecast its volatility with another, and combine the two predictions (a hybrid model, e.g., ARIMA + GARCH), as sketched below.
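A minimal sketch of the hybrid idea, assuming the statsmodels and arch packages; the ARIMA(1,1,1) and GARCH(1,1) orders are illustrative assumptions, not the original article's choices:

'''hybrid ARIMA + GARCH sketch (illustrative only): ARIMA models the mean,
GARCH models the volatility of the ARIMA residuals'''
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from arch import arch_model

log_prices = np.log(data1['price'].to_numpy())  # raw (unscaled) prices

# 1. ARIMA forecast of the log price level
arima = ARIMA(log_prices, order=(1, 1, 1)).fit()
mean_forecast = arima.forecast(steps=1)[0]

# 2. GARCH(1, 1) on the ARIMA residuals to forecast next-day volatility
garch = arch_model(arima.resid, vol='Garch', p=1, q=1, rescale=True).fit(disp='off')
vol_forecast = garch.forecast(horizon=1).variance.iloc[-1, 0] ** 0.5

print('next-day log-price forecast:', mean_forecast)
print('next-day volatility forecast:', vol_forecast)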
References
To read my article about Brent crude forecasting using a hybrid model, see the page below.
ARIMA + GARCH: A Hybrid Model to Forecast Highly Volatile Data (ai.plainenglish.io)