Deep Learning for Time Series Forecasting
Last Updated on January 18, 2025 by Editorial Team
Author(s): Sarvesh Khetan
Originally published on Towards AI.
Table of Contents:
- 1. Feed Forward Neural Network
- 2. 1D Convolution Neural Network
- 3. Hidden Markov Models (HMM)
- 4. Conditional Random Fields (CRF)
- 5. Recursive Neural Network (RvNN)
- 6.1. [1990s] Recurrent Neural Network (RNN) – Unidirectional
- 6.2. Bidirectional RNN Model
- 7.1. [1997] Long Short Term Memory (LSTM) RNN – Unidirectional
- 7.2. Bidirectional LSTM RNN
- 8.1. [2014] Gated RNN Model – Unidirectional
- 8.2. Bidirectional Gated RNN Model
- 9.1. Transformer Encoder Model – Bidirectional
- 9.2. Transformer Decoder Model – Unidirectional
Feed Forward Neural Network
If you think about it carefully, you will see that predicting the next value from the last N observations can be framed as a multivariate regression problem, and hence we can use an FFNN to solve it (a minimal sketch follows the list of issues below).
Issues with FFNNs:
- FFNNs do not take the complete history of the sequence into account when making a prediction; they only look at a fixed window of sequential information.
- FFNNs fail for variable-size inputs. What do we mean by variable-size inputs? An FFNN must be given exactly N input features to make a prediction, so if we have fewer or more than N input features, the FFNN cannot produce a prediction at all.
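To make the fixed-window idea concrete, here is a minimal sketch (not taken from the original post; the window size and layer widths are illustrative) of an FFNN that forecasts the next value of a univariate series from the last N observations:

import torch
import torch.nn as nn

N = 8  # window size: number of past time steps fed to the network

# the N lagged values act as N input features of a regression problem
ffnn_forecaster = nn.Sequential(
    nn.Linear(N, 32),
    nn.ReLU(),
    nn.Linear(32, 1),  # single regression output: the next value
)

window = torch.randn(1, N)            # one window of the last N observations
next_value = ffnn_forecaster(window)  # shape (1, 1)

Note that the input layer is hard-wired to N features, which is exactly why the two issues above (no full history, no variable-length inputs) arise.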
Recurrent Neural Network (RNN) – Unidirectional
To solve these issues with FFNNs, researchers developed RNNs; you can read more about RNNs here:
Recurrent Neural Networks (RNNs) for Sequence Classification
khetansarvesh.medium.com
Note: time series forecasting is a regression task, but in the above blog all the equations are shown assuming a classification task; you can adapt the equations to a regression task.
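Since the linked post works through classification, here is a minimal sketch (illustrative sizes, not taken from that post) of how an RNN can instead be used for forecasting: the recurrent layer consumes the whole history step by step, and a linear regression head maps the final hidden state to the next value. Because the recurrence runs over whatever sequence length it is given, variable-length inputs are no longer a problem.

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=1, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1)  # regression head for the forecast

x = torch.randn(1, 50, 1)    # (batch, sequence length, features); any length works
out, h_n = rnn(x)            # out: (1, 50, 16), h_n: (1, 1, 16)
forecast = head(out[:, -1])  # prediction from the final time step, shape (1, 1)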
Bidirectional Recurrent Neural Network (BiRNN)
Single Layer Architecture
Hence we can see that a BiRNN consumes twice as much memory for weights and biases as an RNN.
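A quick way to verify this claim (a minimal sketch with illustrative sizes): PyTorch's bidirectional flag adds a second, independent set of weights for the backward direction, so the parameter count exactly doubles.

import torch.nn as nn

uni = nn.RNN(input_size=10, hidden_size=10)
bi = nn.RNN(input_size=10, hidden_size=10, bidirectional=True)

n_uni = sum(p.numel() for p in uni.parameters())
n_bi = sum(p.numel() for p in bi.parameters())
print(n_uni, n_bi)  # n_bi == 2 * n_uni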
Stacked BiRNN
Long Short Term Memory RNN (LSTM RNN)
To solve the issues with RNNs, researchers developed LSTMs; you can read more about LSTMs here:
LSTM for Sequence Classification
khetansarvesh.medium.com
Note: time series forecasting is a regression task, but in the above blog all the equations are shown assuming a classification task; you can adapt the equations to a regression task.
Below, I have implemented a 4-hidden-layer stacked LSTM RNN architecture to solve the univariate time series forecasting problem of Google stock price prediction.
Time-Series-Modelling/univariate_time_series/LSTM-Stock-Price-Prediction.ipynb at main ·…
Performed literature survey on various architectures like FFNN, RNN, LSTM RNN, Gated RNN, and Transformers (SOTA Model)…
github.com
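The notebook linked above has the full implementation; as a rough orientation, a 4-layer stacked LSTM regressor along those lines could look like the following minimal sketch (the hidden size, window length, and class name are illustrative, not taken from the notebook):

import torch
import torch.nn as nn

class StackedLSTMForecaster(nn.Module):
    def __init__(self, hidden_size=64):
        super().__init__()
        # 4 LSTM layers stacked on top of each other
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size,
                            num_layers=4, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)  # regression head

    def forward(self, x):             # x: (batch, window, 1) of past prices
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # next-step price prediction

model = StackedLSTMForecaster()
pred = model(torch.randn(8, 60, 1))  # e.g. 8 windows of the last 60 days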
Bidirectional LSTM (BiLSTM) RNN
Single Layer Architecture:
Same as what we saw for the RNN here; just replace the recurrent unit with an LSTM unit.
Stacked Architecture:
Same as what we saw for the RNN here; just replace the recurrent unit with an LSTM unit.
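In PyTorch terms, this again comes down to a single flag (a minimal sketch with illustrative sizes):

import torch.nn as nn

# 3 stacked bidirectional LSTM layers; each layer reads the sequence in both directions
bilstm_stack = nn.LSTM(input_size=10, hidden_size=10,
                       num_layers=3, bidirectional=True)
# the output feature dimension becomes 2 * hidden_size = 20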
Gated RNN Model – Unidirectional
Single Layer Architecture
The concept here remains exactly the same as what we have seen for RNNs and LSTMs; we just change the RNN / LSTM cell to a GRU cell.
Below is the internal working of a GRU cell
import torch.nn as nn

# Create a single GRU cell
gru_cell = nn.GRUCell(input_size=10, hidden_size=10)
Stacked Architecture
# 3 GRU layers stacked on top of each other
gru_stack = nn.GRU(input_size=10, hidden_size=10, num_layers=3)
Did GRU Solve the LSTM Issue?
- GRUs are lighter than LSTMs: an LSTM recurrent unit has 3 gates, whereas a GRU reduces this to 2 gates, making it computationally lighter and hence faster to train (the parameter-count sketch below makes this concrete).
- GRUs were proposed in 2014; they reduce the computation and yet work about as well as LSTMs in most cases. Always remember that there is no guarantee that GRUs will work better than LSTMs; the main benefit of a GRU over an LSTM is that training time decreases significantly.
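A quick way to see the difference in size (a minimal sketch with illustrative dimensions): with the same input and hidden sizes, a GRU layer ends up with roughly three quarters of the LSTM layer's parameters.

import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=10)
gru = nn.GRU(input_size=10, hidden_size=10)

print(sum(p.numel() for p in lstm.parameters()))  # 880
print(sum(p.numel() for p in gru.parameters()))   # 660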
Issues with GRU:
Using Gated RNNs we reduced the training time and also handled the vanishing gradient problem, but in today's world of big data we want to train our models on multiple GPUs in parallel to reduce the training time even further. With RNNs / LSTM RNNs / Gated RNNs, such parallel training is simply impossible, because all of these models are sequential by nature. FFNNs can be trained in parallel, but these recurrent models cannot.
Hence researchers wanted to make use of this superpower of FFNNs, and they came up with a new model in which an FFNN, rather than a recurrent network like an RNN / LSTM RNN / Gated RNN or its variants, is used to model sequences.
Bidirectional Gated RNN Model
Single Layer Architecture:
Same as what we saw for the RNN here; just replace the recurrent unit with a GRU unit.
Stacked Architecture:
Same as what we saw for the RNN here; just replace the recurrent unit with a GRU unit.
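As with the BiLSTM, PyTorch exposes this through a single flag (a minimal sketch with illustrative sizes):

import torch.nn as nn

# 3 stacked bidirectional GRU layers
bigru_stack = nn.GRU(input_size=10, hidden_size=10,
                     num_layers=3, bidirectional=True)
# the output feature dimension becomes 2 * hidden_size = 20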
Transformer Encoder Model – Bidirectional
Single Layer Transformer Model
Implement Transformers (Bidirectional) from Scratch in Pytorch for Sequence Classification
khetansarvesh.medium.com
Stacked Transformer Model
Researchers have noticed that 12 to 24 hidden layers work really well in most cases!
import copy
import torch.nn as nn

def clones(module, N):
    # make N independent (deep-copied) copies of the given layer
    return nn.ModuleList([copy.deepcopy(module) for _ in range(N)])

class TransformerEncoder(nn.Module):
    def __init__(self, encoder_layer, N=24):
        super(TransformerEncoder, self).__init__()
        self.layers = clones(encoder_layer, N)

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

# in the single transformer section we saw how to create 'transformer_encoder_layer'
# now we stack that transformer encoder layer 24 times
stacked_transformer_encoder = TransformerEncoder(transformer_encoder_layer, 24)

'''
Instead of using our own implementation we can use the PyTorch implementation:
stacked_transformer_encoder = nn.TransformerEncoder(
    transformer_encoder_layer,
    num_layers=24
)
'''
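For a quick end-to-end check, PyTorch's built-in encoder layer can stand in for the from-scratch one (a minimal sketch; d_model, nhead and the sequence length are illustrative, not values from the post):

import torch
import torch.nn as nn

transformer_encoder_layer = nn.TransformerEncoderLayer(d_model=768, nhead=12)
stacked_transformer_encoder = nn.TransformerEncoder(transformer_encoder_layer,
                                                    num_layers=12)

x = torch.randn(50, 1, 768)               # (sequence length, batch, d_model)
encoded = stacked_transformer_encoder(x)  # same shape, contextualised representations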
Efficient Transformers
khetansarvesh.medium.com
Transformer Decoder Model – Unidirectional
Single Layer Architecture
We saw the bidirectional transformer architecture here; now, to make it unidirectional, we just replace the self attention layer in that architecture with a masked self attention layer.
In self attention, we look at both forward and backward sequential information, i.e. if we are at x4, it will look at x1, x2, x3, x5, x6, …, xm to calculate the matured representation of x4.
But in masked self attention we look only at the backward information, to make it unidirectional, i.e. if we are at x4, it will look only at x1, x2, x3, i.e. the vectors to its left.
To convert self attention into masked self attention, we just have to make a minor change in step 2 of the vector implementation of self attention that we saw here. The change goes as follows …
Hence the matrix implementation equivalent of this would look something like this:
Now, we saw multi-headed self attention in the case of the Transformer Encoder Model; similarly, here too we can have multi-headed masked self attention.
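In matrix form, that minor change is usually realised as a causal (look-ahead) mask applied to the attention scores before the softmax; here is a minimal sketch of the idea (illustrative sizes, not the post's own implementation):

import torch

m = 6                       # sequence length
scores = torch.randn(m, m)  # raw attention scores, i.e. Q @ K^T / sqrt(d)

# boolean mask that is True strictly above the diagonal (the "future" positions)
causal_mask = torch.triu(torch.ones(m, m, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(causal_mask, float('-inf'))

weights = torch.softmax(scores, dim=-1)  # row i now attends only to positions <= i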
Stacked Architecture
import torch.nn as nn

# Create a single Transformer decoder layer
transformer_decoder = nn.TransformerDecoderLayer(d_model=768, nhead=12)

# Stack 6 Transformer decoder layers on top of each other
transformer_decoder_stack = nn.TransformerDecoder(decoder_layer=transformer_decoder,
                                                  num_layers=6)
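A quick usage sketch (illustrative sizes and dummy tensors, not code from the post): nn.TransformerDecoder expects the target sequence, an encoder "memory", and the causal mask discussed above.

import torch

seq_len, batch, d_model = 20, 1, 768
tgt = torch.randn(seq_len, batch, d_model)     # the sequence being decoded
memory = torch.randn(seq_len, batch, d_model)  # encoder output (dummy here)

# additive causal mask: 0 on and below the diagonal, -inf strictly above it
tgt_mask = torch.triu(torch.full((seq_len, seq_len), float('-inf')), diagonal=1)

out = transformer_decoder_stack(tgt, memory, tgt_mask=tgt_mask)  # (seq_len, batch, d_model)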
Published via Towards AI