

Brent Crude Oil Price Forecasting using GRU.

Last Updated on January 10, 2024 by Editorial Team

Author(s): Rakesh M K

Originally published on Towards AI.

Credits: Bing Image Creator

Table of Contents

  1. Introduction
  2. Seasonality and Trend
  3. About Data
  4. Boxplot and Anomalies
  5. GRU (Gated Recurrent Unit)
  6. Data Preparation
  7. GRU Model and Training
  8. Prediction of test data
  9. Prediction on real data
  10. Conclusion


Introduction

Brent oil is a light, sweet crude with low sulfur content, produced from the North Sea, and is a global benchmark for pricing and trading crude oil. Forecasting crude oil prices is challenging because they are affected by many factors other than historical prices, mainly geopolitical concerns. The absence of a constant trend and seasonality makes prediction harder still. This page explains forecasting of the Brent crude price using a deep neural network configured with Gated Recurrent Units (GRU).

Trend and Seasonality.

Trend indicates a systematic increase or decrease of the values in the series (uptrend and downtrend). Seasonality in a time series refers to repeating patterns or fluctuations that occur at regular intervals, often associated with specific seasons, weeks, months, years or other recurring time periods. The plot below shows a series with an uptrend and yearly seasonality.

A time series with uptrend and yearly seasonality.
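
To make the two terms concrete, here is a minimal sketch that builds a synthetic daily series from a linear uptrend plus a yearly sine wave. The slope, amplitude, and noise level are arbitrary illustrative values, not taken from the Brent data.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed so the sketch is repeatable
days = np.arange(5 * 365)                # five years of daily time steps

trend = 0.01 * days                      # uptrend: value grows steadily with time
seasonality = 2.0 * np.sin(2 * np.pi * days / 365)  # pattern repeating every 365 days
noise = rng.normal(0.0, 0.3, size=days.size)        # random fluctuation

series = trend + seasonality + noise

# Averaging over each full year cancels the seasonal wave and exposes the trend.
yearly_means = series.reshape(5, 365).mean(axis=1)
print(yearly_means)
```

Each yearly mean is higher than the one before it, which is the uptrend; the sine term is what makes every year look like a shifted copy of the previous one.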

About Data

Brent crude oil prices from 1987-05-20 to 2023-11-20 are used for the work. A portion of the data frame and the time series plot of historical data are shown below.

Brent oil dataframe.
Historical price of Brent oil from 1987-05-20 to 2023-11-20.

Looking at the time series, it is clear that it doesn’t have a constant trend or seasonality. So, I am not going to decompose and analyze the time series; instead, we are going to see how well a deep neural model works on it.

Boxplot and Anomalies

Anomalies are deviations from the normal or expected nature of the series. A yearly boxplot of Brent oil since 1987 is plotted as below.

Boxplot of Brent oil (yearly)

From the boxplot above we can observe abnormal volatility mainly in years 1990, 2008, 2014, 2020 and 2022. The main reasons for anomalies are listed below.

1990: Gulf War and cold winter.

2008: Financial crisis (the Great Recession).

2014: Reduced geopolitical concerns and constant OPEC oil supply.

2020: COVID-19 pandemic and travel restrictions.

2022: Russia-Ukraine war.

Events like these produce anomalies that make the series highly volatile. That makes it hard to predict, since many models assume that the data is homoscedastic (variance remains the same over time).
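
One informal way to see whether the constant-variance assumption holds is to compare the spread of the series in different stretches of time. The sketch below does this on a synthetic series with a calm half and a turbulent half; the series and the `window_std` helper are hypothetical illustrations, not the Brent data.

```python
import numpy as np

def window_std(values, window):
    """Standard deviation of consecutive, non-overlapping blocks of `window` points."""
    n_blocks = len(values) // window
    blocks = np.asarray(values[: n_blocks * window]).reshape(n_blocks, window)
    return blocks.std(axis=1)

rng = np.random.default_rng(1)
calm = rng.normal(0.0, 0.5, size=500)        # low-volatility regime
turbulent = rng.normal(0.0, 3.0, size=500)   # high-volatility regime
changes = np.concatenate([calm, turbulent])  # day-to-day changes of a synthetic series

spread = window_std(changes, window=250)
print(spread)  # four values: the last two are clearly larger than the first two
```

If the spread were roughly the same in every block, the constant-variance assumption would be plausible; a jump like the one here is the signature of a heteroscedastic series.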

Before moving to model building, let’s see what GRU is.

GRU (Gated Recurrent Unit)

GRU (Gated Recurrent Unit) is a type of recurrent neural network that is well suited for capturing long-term dependencies and sequential patterns in data, which it does with the aid of a reset gate and an update gate. GRU mitigates some of the limitations of traditional RNNs, such as the vanishing gradient problem.

Vanishing gradient problem occurs when the gradients of the loss function with respect to the weights become extremely small during the backpropagation process. If these gradients are very small (close to zero), they may “vanish,” causing the network to have difficulty learning long-range dependencies and capturing information from earlier time steps. GRU and LSTM effectively handle this problem.
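
A rough numerical illustration of why this happens: in backpropagation through time, the gradient reaching an early time step is a product of one factor per step. The sketch below uses a scalar RNN with a hypothetical recurrent weight `w = 0.5` and a tanh activation, so every factor has magnitude below 1 and the product shrinks geometrically.

```python
import numpy as np

w = 0.5          # hypothetical recurrent weight of a one-unit RNN
h = 0.0          # initial hidden state
grad = 1.0       # d h_t / d h_0, accumulated step by step
grads = []

for t in range(50):
    h = np.tanh(w * h + 1.0)          # h_t = tanh(w * h_{t-1} + x_t), with input x_t = 1
    grad *= w * (1.0 - h ** 2)        # chain rule: d h_t / d h_{t-1} = w * (1 - h_t^2)
    grads.append(grad)

print(grads[0], grads[9], grads[49])  # the gradient decays towards zero
```

After 50 steps the influence of the first hidden state on the last one is numerically negligible, which is why a plain RNN struggles to learn from values far back in the window.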

A single GRU cell. (Taken from: Bing Images)

The reset gate in a GRU decides how much of the past information should be forgotten when forming the new candidate state, and the update gate determines how much of the previous hidden state should be retained versus replaced by that candidate.
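
The role of the two gates is easier to follow in code. Below is a minimal NumPy sketch of a single GRU time step, using the blending convention `h = z * h_prev + (1 - z) * h_cand` (some references swap the roles of `z` and `1 - z`). The weights are random placeholders, the three input values are illustrative, and feeding one price per step is an assumption of this sketch rather than the exact layout of the model built later.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h_prev, p):
    """One GRU time step. `p` holds input weights W_*, recurrent weights U_*, biases b_*."""
    z = sigmoid(p["W_z"] @ x + p["U_z"] @ h_prev + p["b_z"])   # update gate
    r = sigmoid(p["W_r"] @ x + p["U_r"] @ h_prev + p["b_r"])   # reset gate
    # candidate state: the reset gate scales how much of h_prev feeds into it
    h_cand = np.tanh(p["W_h"] @ x + p["U_h"] @ (r * h_prev) + p["b_h"])
    # the update gate blends the old state with the candidate
    return z * h_prev + (1.0 - z) * h_cand

rng = np.random.default_rng(0)
n_in, n_units = 1, 10                      # one price per step, 10 hidden units
params = {}
for g in ("z", "r", "h"):
    params[f"W_{g}"] = rng.normal(0, 0.1, (n_units, n_in))
    params[f"U_{g}"] = rng.normal(0, 0.1, (n_units, n_units))
    params[f"b_{g}"] = np.zeros(n_units)

h = np.zeros(n_units)
for price in [0.55, 0.54, 0.56]:           # a few illustrative normalized prices, fed in order
    h = gru_step(np.array([price]), h, params)
print(h.shape)
```

Because the new state is a gated mix of the old state and the candidate, information can be carried across many steps when `z` stays close to 1, which is how the GRU eases the vanishing gradient problem.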

In this particular forecasting case, the past 10 values (window size) are used to predict the next 1 value (horizon). Let’s prepare our data.

Data Preparation (window size=10, horizon=1).

  1. Normalize data.

'''normalize data'''
import numpy as np
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler().fit(np.array(data1).reshape(-1, 1))
data = scaler.transform(np.array(data1).reshape(-1, 1)) #normalized data

'''convert normalized brent oil price to numpy array'''
prices = data.flatten() # 1-D array of normalized prices
prices[-10:] # last 10 normalized prices

array([0.55120504, 0.53659622, 0.5386726 , 0.55291064, 0.55609937,
0.55691509, 0.54356693, 0.50893585, 0.53481646, 0.54987023])
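
For reference, min-max scaling maps each value to (x - min) / (max - min), and the inverse multiplies by the range and adds the minimum back. A minimal NumPy sketch with made-up prices (not the Brent data) shows the round trip that `MinMaxScaler` performs with its default settings:

```python
import numpy as np

raw = np.array([20.0, 35.0, 50.0, 80.0])      # made-up prices for illustration
lo, hi = raw.min(), raw.max()

scaled = (raw - lo) / (hi - lo)               # forward transform into [0, 1]
restored = scaled * (hi - lo) + lo            # inverse transform back to prices

print(scaled)     # values: 0, 0.25, 0.5, 1
print(restored)   # values: 20, 35, 50, 80
```

The same minimum and maximum must be reused for the inverse step, which is why the fitted `scaler` object is kept around for later predictions.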

2. Create windowed data.

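'''function to make labelled window'''
def get_labelled_window(x, horizon):
    # everything except the last `horizon` values is the input, the rest is the label
    return x[:, :-horizon], x[:, -horizon:]

'''function to make windows'''
def makeWindows(x, windowSize, horizon):
    # indexes of one window plus its horizon: [[0, 1, ..., windowSize+horizon-1]]
    window_step = np.expand_dims(np.arange(windowSize + horizon), axis=0)
    # shift that row of indexes by every valid start position (one row per window)
    window_indexes = window_step + np.expand_dims(np.arange(len(x) - (windowSize + horizon - 1)), axis=0).T
    windowed_array = x[window_indexes]  # gather the values into a 2-D array of windows
    windows, labels = get_labelled_window(windowed_array, horizon=horizon)

    return windows, labels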

'''create windowed data. windowsize=10, horizon=1'''
full_windowsGRU , full_labelsGRU = makeWindows(prices, windowSize=10, horizon=1)
len(full_windowsGRU) , len(full_labelsGRU) # check length of the data array

(9514, 9514)
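
To see what the windowing produces, it helps to run the same idea on a tiny array. The sketch below uses NumPy's `sliding_window_view` (available from NumPy 1.20) as an alternative way to build the windows; `make_windows` is a hypothetical helper for illustration on toy data, not the code used for the results in this article.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(x, window_size, horizon):
    """Split a 1-D array into overlapping windows and the values that follow them."""
    full = sliding_window_view(x, window_size + horizon)   # shape: (n_windows, window_size + horizon)
    return full[:, :window_size], full[:, window_size:]

toy = np.arange(6)                       # [0 1 2 3 4 5]
windows, labels = make_windows(toy, window_size=3, horizon=1)
print(windows)   # rows: [0 1 2], [1 2 3], [2 3 4]
print(labels)    # rows: [3], [4], [5]
```

Each row of `windows` is paired with the value that comes right after it, so an array of length n yields n - (window size + horizon - 1) training examples.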

3. Split the data.

Split the full data such that training data is 90% and test data is 10%.

'''function to split data'''
def make_train_test_splits(windows, labels, test_split=0.1):
    split_size = int(len(windows) * (1 - test_split)) # index where the test set starts
    train_windows = windows[:split_size]
    train_labels = labels[:split_size]
    test_windows = windows[split_size:]
    test_labels = labels[split_size:]

    return train_windows, test_windows, train_labels, test_labels

'''split the data'''
train_windowsGRU, test_windowsGRU, train_labelsGRU, test_labelsGRU = make_train_test_splits(full_windowsGRU, full_labelsGRU, test_split=0.1)
train_windowsGRU.shape, test_windowsGRU.shape # check shapes

((8562, 10), (952, 10))

GRU Model and Training

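Two GRU layers of 10 units each, with L2 regularization of 0.01, are used for the model configuration.

import tensorflow as tf
from tensorflow.keras import layers

WINDOW = 10   # number of past prices fed to the model
HORIZON = 1   # number of future prices to predict

inputs = layers.Input(shape=(WINDOW,)) #input
# add a time axis: (batch, WINDOW) -> (batch, 1, WINDOW)
x = layers.Lambda(lambda x: tf.expand_dims(x, axis=1))(inputs)
x = layers.GRU(10, return_sequences=True, kernel_regularizer=tf.keras.regularizers.l2(0.01))(x)
# second GRU layer, reconstructed from the description above; it returns only the last state
x = layers.GRU(10, kernel_regularizer=tf.keras.regularizers.l2(0.01))(x)
outputs = layers.Dense(HORIZON, activation='relu')(x) #output

model_GRU = tf.keras.Model(inputs=inputs, outputs=outputs, name="model_GRU")

# loss and metrics inferred from the training log (loss, mae, mse)
model_GRU.compile(loss='mae',
                  optimizer=tf.keras.optimizers.Adam(0.001), #clipvalue=0.2
                  metrics=['mae', 'mse'])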

Train the model (epochs=100, batch size=128):

#fit the model
history = model_GRU.fit(train_windowsGRU, train_labelsGRU,
                        epochs=100,
                        batch_size=128,
                        validation_data=(test_windowsGRU, test_labelsGRU))
'''Training progression...'''
mae: 0.0118 - val_mse: 3.0805e-04
Epoch 98/100
67/67 [==============================] - 0s 5ms/step - loss: 0.0070 - mae: 0.0060 - mse: 8.2355e-05 - val_loss: 0.0124 - val_mae: 0.0114 - val_mse: 2.8682e-04
Epoch 99/100
67/67 [==============================] - 0s 5ms/step - loss: 0.0069 - mae: 0.0059 - mse: 7.9939e-05 - val_loss: 0.0152 - val_mae: 0.0142 - val_mse: 3.5247e-04
Epoch 100/100
67/67 [==============================] - 0s 5ms/step - loss: 0.0072 - mae: 0.0062 - mse: 8.5466e-05 - val_loss: 0.0137 - val_mae: 0.0127 - val_mse: 3.0594e-04

The model is trained for 100 epochs. Loss curves (MSE and MAE) are shown below.

'''extract losses from training history'''
MAEtrain = history.history['loss'][-1]
MAEval = history.history['val_loss'][-1]
MSEtrain = history.history['mse'][-1]
MSEval = history.history['val_mse'][-1]

'''plot loss curves'''
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 2, figsize=(14, 4.5))
ax[0].plot(history.history['loss'], label=f'MAEtrain = {MAEtrain}')
ax[0].plot(history.history['val_loss'], label=f'MAEval = {MAEval}')
ax[0].set_title('MAE Loss')
ax[0].legend() # show the labels defined above
ax[1].plot(history.history['mse'], label=f'MSEtrain = {MSEtrain}')
ax[1].plot(history.history['val_mse'], label=f'MSEval = {MSEval}')
ax[1].set_title('MSE Loss')
ax[1].legend()
plt.show()

GRU Model loss curves (MAE and MSE).

Evaluation on test data:

model_GRU.evaluate(test_windowsGRU, test_labelsGRU)
30/30 [==============================] - 0s 3ms/step - loss: 0.0137 - mae: 0.0127 - mse: 3.0594e-04
[0.013682662509381771, 0.012702614068984985, 0.00030594179406762123]

Prediction of test data

Prediction on test data is done to check whether the predictions follow the test data. The inverse transformation is then applied to the predicted data, since normalization (min-max scaling) was done during data preparation.

'''predict on test data'''
predGRU = model_GRU.predict(test_windowsGRU)
predGRU_= scaler.inverse_transform(predGRU)

array([[87.05385 ],
[86.76129 ],
[84.02901 ],
[81.67896 ],
[81.11496 ],
[82.19226 ],
[83.32066 ],
[78.73881 ]], dtype=float32)

After prediction and inverse transformation, the test labels and the predictions are plotted as below.

'''plot the prediction'''
plt.plot(predGRU_, label='GRU prediction', c='b', lw=3)
plt.plot(scaler.inverse_transform(test_labelsGRU), label='test labels', c='orange')
plt.title('Brent oil prediction on test data by GRU model')
plt.legend()
plt.show()

GRU Model prediction.

'''print metrics'''
# MAE and MSE are assumed here to be scikit-learn's metric functions
from sklearn.metrics import mean_absolute_error as MAE, mean_squared_error as MSE
print(f' MAE GRU test: {MAE(predGRU_, scaler.inverse_transform(test_labelsGRU))}')
print(f' MSE GRU test: {MSE(predGRU_, scaler.inverse_transform(test_labelsGRU))}')

MAE GRU test: 1.7129471663066318
MSE GRU test: 5.563401746617568
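
The test metrics in price units and the normalized metrics from the evaluation step are linked by the scaling range: because min-max scaling is an affine map, MAE is multiplied by (max - min) and MSE by (max - min) squared when predictions are inverse-transformed. The sketch below checks this on made-up numbers and then applies it to the values reported above. The `to_price` helper and the 10 to 110 range are hypothetical, and the implied price range is an inference from the reported metrics, not a figure stated by the data source.

```python
import numpy as np

# 1) Check the identity on made-up data.
rng = np.random.default_rng(0)
lo, hi = 10.0, 110.0                                  # hypothetical min and max price
y_true = rng.uniform(0, 1, 200)                       # normalized labels
y_pred = y_true + rng.normal(0, 0.02, 200)            # normalized predictions

def to_price(v):
    """Inverse min-max transform with the hypothetical range above."""
    return v * (hi - lo) + lo

mae_scaled = np.mean(np.abs(y_pred - y_true))
mae_price = np.mean(np.abs(to_price(y_pred) - to_price(y_true)))
mse_scaled = np.mean((y_pred - y_true) ** 2)
mse_price = np.mean((to_price(y_pred) - to_price(y_true)) ** 2)
print(mae_price / mae_scaled, mse_price / mse_scaled)  # range and range squared

# 2) Price range implied by the metrics reported in this article.
range_from_mae = 1.7129471663066318 / 0.012702614068984985
range_from_mse = np.sqrt(5.563401746617568 / 0.00030594179406762123)
print(range_from_mae, range_from_mse)   # both close to 134.85
```

Both ratios agree, which suggests the normalized and price-unit metrics describe the same predictions and that the scaler was fitted on a price range of roughly 134.85.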

The predictions and the metrics look impressive across the 952 one-step-ahead forecasts (the length of the test data) for a highly volatile time series like the crude price. Now let us take some real data (the past 10 values of crude oil) and predict using it.

Prediction on real data

The 10 most recent Brent oil prices up to 30-11-2023 (Brent Oil Futures Historical Prices — India) are used to predict the price for the next day.

Brent oil price from 17-11-2023 to 1-12-2023 (credits:

x = scaler.transform([[80.61],[82.32],[82.45],[81.96],[81.42],[80.58],[79.98],[81.68],[83.10],[82.83]])
p = loaded_modelGRU.predict(x.reshape(1, -1)) # one sample of 10 prices, shape (1, 10)
predTomorrow = scaler.inverse_transform(p)

array([[82.11944]], dtype=float32) # original value = 78.88

We could not get an accurate prediction with the most recent data, which may be due to the high volatility of the crude price, which is affected by many factors. Also, I infer that patterns similar to the test series we predicted earlier were present in the training data.


Conclusion

It is always challenging to forecast an anomalous series. A good result on test data doesn’t mean that the model will work well when predicting the same heteroscedastic (variance changes over time) series in the future. A better approach is to forecast the series with a suitable model, forecast its volatility with another model, and then combine both predictions; this is often called a hybrid model (e.g., ARIMA + GARCH).


  1. tf.keras.layers.GRU | TensorFlow v2.14.0
  2. Brent Crude Oil Price — India

To read my page about Brent crude forecasting using a hybrid model, visit the page below.

ARIMA + GARCH: A Hybrid Model to Forecast Highly Volatile Data.



Published via Towards AI
