
Deep Learning for Time Series Forecasting

Last Updated on January 18, 2025 by Editorial Team

Author(s): Sarvesh Khetan

Originally published on Towards AI.

Table of Contents:

Feed Forward Neural Network
Recurrent Neural Network (RNN) – Unidirectional
Bidirectional Recurrent Neural Network (BiRNN)
Long Short Term Memory RNN (LSTM RNN)
Bidirectional LSTM (BiLSTM) RNN
Gated RNN Model – Unidirectional
Bidirectional Gated RNN Model
Transformer Encoder Model – Bidirectional
Efficient Transformers
Transformer Decoder Model – Unidirectional

Feed Forward Neural Network

If you think about it carefully, you will conclude that this can be framed as a multivariate regression problem, and hence we can use a feed-forward neural network (FFNN) to solve it.

FFNN (you can add hidden layers in this if you want to)
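As a rough illustration of this framing (a sketch of my own, not code from the original post; the window size N and layer sizes are assumptions), here is how such a windowed FFNN forecaster could look in PyTorch:

import torch
import torch.nn as nn

# Sketch: treat forecasting as regression on a fixed window of the last N values
N = 12  # window size, chosen arbitrarily for illustration

ffnn = nn.Sequential(
    nn.Linear(N, 64),   # input layer: the last N observations of the series
    nn.ReLU(),
    nn.Linear(64, 64),  # optional hidden layer
    nn.ReLU(),
    nn.Linear(64, 1),   # output: the next value of the series
)

windows = torch.randn(8, N)      # a batch of 8 windows (dummy data)
next_values = ffnn(windows)      # shape: (8, 1)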

Issues with FFNNs:

1. FFNNs do not take the complete history of the sequence into account when making a prediction; they only use a fixed window of past values.

2. FFNNs fail for variable-size inputs. What are variable-size inputs? An FFNN expects exactly N input features to make a prediction, so if a sequence provides fewer or more than N values, the FFNN cannot produce a prediction.

Recurrent Neural Network (RNN) – Unidirectional

To solve these issues with FFNNs, researchers developed RNNs. You can read more about RNNs here:

Recurrent Neural Networks (RNNs) for Sequence Classification – khetansarvesh.medium.com

Note: Time series forecasting is a regression task, but in the above blog all the equations assume a classification task; you can adapt the equations for regression.
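For completeness, here is a minimal sketch of my own (hidden size and shapes are assumptions) of how the same RNN idea can be wired for regression in PyTorch: the recurrent layer consumes the sequence step by step, and a linear head maps the last hidden state to a single forecast instead of class scores.

import torch
import torch.nn as nn

class RNNForecaster(nn.Module):
    def __init__(self, hidden_size=32):
        super().__init__()
        self.rnn = nn.RNN(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)   # regression head instead of a softmax classifier

    def forward(self, x):                 # x: (batch, seq_len, 1) - a univariate series
        _, h_n = self.rnn(x)              # h_n: (1, batch, hidden_size), the final hidden state
        return self.head(h_n[-1])         # (batch, 1): the forecast for the next step

y_hat = RNNForecaster()(torch.randn(8, 20, 1))   # works for any sequence length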

Bidirectional Recurrent Neural Network (BiRNN)

Single Layer Architecture

Zt represents the matured representation of the input Xt

Hence we can see that a BiRNN consumes twice as much memory for weights and biases as an RNN.
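A quick way to sanity-check that claim (a sketch using PyTorch's nn.RNN with arbitrary sizes) is to compare the parameter counts of a unidirectional and a bidirectional RNN:

import torch.nn as nn

uni = nn.RNN(input_size=10, hidden_size=10)
bi  = nn.RNN(input_size=10, hidden_size=10, bidirectional=True)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(uni), count(bi))   # 220 vs 440: the bidirectional version has exactly twice as many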

Shorthand Notation of above architecture
Shorter Shorthand Notation of above architecture

Stacked BiRNN

Stacked BiRNN Architecture (instead of concatenation, you could also combine the two states with element-wise + or *)
Shorthand Notation of Stacked BiRNN
Shorter Shorthand Notation of Stacked BiRNN
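In PyTorch, the stacked BiRNN shown above can be sketched roughly as follows (sizes are assumptions); note that the output feature dimension doubles because the forward and backward states are concatenated:

import torch
import torch.nn as nn

stacked_birnn = nn.RNN(input_size=10, hidden_size=10,
                       num_layers=3, bidirectional=True, batch_first=True)

x = torch.randn(8, 20, 10)       # (batch, seq_len, features)
z, h_n = stacked_birnn(x)
print(z.shape)                   # (8, 20, 20): hidden_size * 2 because the two directions are concatenated
print(h_n.shape)                 # (6, 8, 10): num_layers * 2 directions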

Long Short Term Memory RNN (LSTM RNN)

To solve the issues with RNNs, researchers developed LSTMs. You can read more about LSTMs here:

LSTM for Sequence Classification – khetansarvesh.medium.com

Note: Time series forecasting is a regression task, but in the above blog all the equations assume a classification task; you can adapt the equations for regression.

Below, I have implemented a 4-hidden-layer stacked LSTM architecture to solve the univariate time series forecasting problem of Google stock price prediction.

Time-Series-Modelling/univariate_time_series/LSTM-Stock-Price-Prediction.ipynb (github.com)

Performed literature survey on various architectures like FFNN, RNN, LSTM RNN, Gated RNN, and Transformers (SOTA Model)…
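As a rough sketch of the same idea (not the actual notebook code; hidden size and window length are assumptions), a 4-layer stacked LSTM forecaster in PyTorch could look like this:

import torch
import torch.nn as nn

class StackedLSTMForecaster(nn.Module):
    def __init__(self, hidden_size=64, num_layers=4):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size,
                            num_layers=num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):                  # x: (batch, window_len, 1) - past prices
        out, _ = self.lstm(x)
        return self.head(out[:, -1])       # predict the next price from the last time step

y_hat = StackedLSTMForecaster()(torch.randn(8, 60, 1))   # (8, 1)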

Bidirectional LSTM (BiLSTM) RNN

Single Layer Architecture:

Same as what we saw for the RNN here; just replace the recurrent unit with an LSTM unit.

Stacked Architecture:

Same as what we saw for the RNN here; just replace the recurrent unit with an LSTM unit.
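In PyTorch this again comes down to one flag (a sketch with assumed sizes): bidirectional=True turns the LSTM into a BiLSTM, and num_layers stacks it.

import torch
import torch.nn as nn

bilstm = nn.LSTM(input_size=10, hidden_size=10,
                 num_layers=2, bidirectional=True, batch_first=True)

x = torch.randn(8, 20, 10)            # (batch, seq_len, features)
z, (h_n, c_n) = bilstm(x)
print(z.shape)                        # (8, 20, 20): forward and backward states concatenated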

Gated RNN Model – Unidirectional

Single Layer Architecture

The concept here remains exactly the same as what we have seen for RNNs and LSTMs; we just change the RNN / LSTM cell to a GRU cell.

Below is how to create a single GRU cell in PyTorch:

# Create a single GRU cell
gru_cell = nn.GRUCell(input_size=10, hidden_size=10)
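As a usage sketch (dummy shapes assumed): a GRUCell processes one time step at a time, so unlike nn.GRU we loop over the sequence ourselves and carry the hidden state forward.

import torch

x = torch.randn(20, 8, 10)       # (seq_len, batch, input_size) - dummy sequence
h = torch.zeros(8, 10)           # initial hidden state (batch, hidden_size)
for t in range(x.size(0)):
    h = gru_cell(x[t], h)        # gru_cell from the snippet above; h carries information forward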

Stacked Architecture

Stacked Gated RNN
gru_stack = nn.GRU(input_size=10, hidden_size=10, num_layers=3) 
# 3 single GRU cells stacked on top of each other
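A quick usage sketch for the stacked GRU above (dummy shapes assumed): PyTorch runs all three layers internally and returns the top layer's outputs plus the final hidden state of every layer.

import torch

x = torch.randn(20, 8, 10)       # (seq_len, batch, input_size) - dummy sequence
out, h_n = gru_stack(x)          # gru_stack from the snippet above
print(out.shape)                 # (20, 8, 10): outputs of the top (3rd) layer at every time step
print(h_n.shape)                 # (3, 8, 10): final hidden state of each of the 3 layers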

Did GRUs Solve the LSTM Issues?

  • GRUs are lighter than LSTMs: an LSTM recurrent unit has 3 gates, while a GRU reduces these to 2 gates, making it computationally lighter and hence faster to train.
  • GRUs were proposed in 2014; they reduce the computation and yet work roughly as well as LSTMs in most cases. Always remember that there is no guarantee a GRU will work better than an LSTM; the main benefit of a GRU over an LSTM is significantly reduced training time.

Issues with GRUs:

Although Gated RNNs reduce training time and handle the vanishing gradient problem, in today's world of big data we want to train on multiple GPUs in parallel to cut training time even further. With RNNs / LSTM RNNs / Gated RNNs, such parallel training is simply impossible because these models are sequential by nature. FFNNs, on the other hand, can be trained in parallel.

Hence researchers wanted to exploit this superpower of FFNNs, and they came up with a new model in which an FFNN, rather than a recurrent network like an RNN / LSTM RNN / Gated RNN or its variants, is used to model sequences.

Bidirectional Gated RNN Model

Single Layer Architecture:

Same as what we saw for the RNN here; just replace the recurrent unit with a GRU unit.

Stacked Architecture:

Same as what we saw for the RNN here; just replace the recurrent unit with a GRU unit.
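The corresponding PyTorch sketch (assumed sizes) is the same recipe once more, with the recurrent unit swapped for a GRU; num_layers > 1 gives the stacked variant.

import torch
import torch.nn as nn

bigru = nn.GRU(input_size=10, hidden_size=10,
               num_layers=2, bidirectional=True, batch_first=True)

x = torch.randn(8, 20, 10)            # (batch, seq_len, features)
z, h_n = bigru(x)
print(z.shape)                        # (8, 20, 20): forward and backward states concatenated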

Transformer Encoder Model – Bidirectional

Single Layer Transformer Model

Implement Transformers (Bidirectional) from Scratch in Pytorch for Sequence Classification – khetansarvesh.medium.com

Stacked Transformer Model

Yi denotes the matured representation of Xi

Researchers have observed that 12 to 24 hidden layers work really well in most cases!

import copy
import torch.nn as nn

def clones(module, N):
    # helper: produce N identical (deep-copied) copies of a layer
    return nn.ModuleList([copy.deepcopy(module) for _ in range(N)])

class TransformerEncoder(nn.Module):
    def __init__(self, encoder_layer, N=24):
        super(TransformerEncoder, self).__init__()
        self.layers = clones(encoder_layer, N)

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

# in the single transformer section we saw how to create 'transformer_encoder_layer'
# now we stack that transformer encoder layer 24 times
stacked_transformer_encoder = TransformerEncoder(
    transformer_encoder_layer,
    24
)

'''
Instead of using our own implementation we can use the PyTorch implementation:

stacked_transformer_encoder = nn.TransformerEncoder(
    transformer_encoder_layer,
    num_layers=12
)
'''
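A short usage sketch of the stack above (my assumption here is that transformer_encoder_layer is built with PyTorch's nn.TransformerEncoderLayer; d_model=768 and 12 heads are chosen for illustration):

import torch
import torch.nn as nn

# assumed layer definition for the sketch
transformer_encoder_layer = nn.TransformerEncoderLayer(d_model=768, nhead=12)

stacked_transformer_encoder = TransformerEncoder(transformer_encoder_layer, 24)

x = torch.randn(20, 8, 768)            # (seq_len, batch, d_model) - dummy inputs
y = stacked_transformer_encoder(x)     # (20, 8, 768): matured representations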

Efficient Transformers

Efficient Transformers – khetansarvesh.medium.com

Transformer Decoder Model – Unidirectional

Single Layer Architecture

We saw the bidirectional transformer architecture here; now, to make it unidirectional, we just replace the self-attention layer in that architecture with a masked self-attention layer.

In self-attention, we look at both forward and backward sequential information, i.e. if we are at x4, it will look at x1, x2, x3, x5, x6, …, xm to calculate the matured representation of x4.

But in masked self-attention, we look only at the backward information, to make it unidirectional, i.e. if we are at x4, it will look only at x1, x2, and x3, i.e. the vectors to its left.

To convert self-attention into masked self-attention, we just have to make some minor changes in step 2 of the vector implementation of self-attention that we saw here. The change goes as follows …

Hence, the equivalent matrix implementation would look something like this:

We saw multi-headed self-attention in the Transformer Encoder Model; similarly, here we can also have multi-headed masked self-attention.
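As a small sketch of the masking idea (my illustration, not the post's exact step-2 change): scores for positions to the right of the current token are set to -inf before the softmax, so each position can attend only to itself and the tokens before it.

import torch

seq_len = 6
scores = torch.randn(seq_len, seq_len)   # raw attention scores (query x key)

# upper-triangular mask marks the "future" positions
causal_mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()
masked_scores = scores.masked_fill(causal_mask, float('-inf'))

attn_weights = torch.softmax(masked_scores, dim=-1)  # each row sums to 1; future positions get weight 0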

Stacked Architecture

# Create a single Transformer decoder layer
transformer_decoder = nn.TransformerDecoderLayer(d_model=768, nhead=12)

# Stack Transformer decoder layers
transformer_decoder_stack = nn.TransformerDecoder(
    decoder_layer=transformer_decoder,  # from above
    num_layers=6                        # 6 Transformer decoder layers stacked on top of each other
)
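A brief usage sketch (shapes and the mask are my assumptions): nn.TransformerDecoder expects the target sequence plus the encoder's output ("memory"), and tgt_mask applies the same causal masking idea shown earlier.

import torch

tgt = torch.randn(20, 8, 768)      # (tgt_seq_len, batch, d_model) - dummy target embeddings
memory = torch.randn(20, 8, 768)   # dummy encoder outputs

# causal mask: -inf above the diagonal so each position attends only to earlier ones
tgt_mask = torch.triu(torch.full((20, 20), float('-inf')), diagonal=1)

out = transformer_decoder_stack(tgt, memory, tgt_mask=tgt_mask)  # (20, 8, 768)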


Published via Towards AI
