
Stateless vs Stateful LSTMs

Last Updated on July 16, 2022 by Editorial Team

Author(s): Harshit Sharma


In machine learning, it is generally assumed that the training samples are Independent and Identically Distributed (IID). For sequence data, this isn't always true: if the values have temporal dependence among them, as in time series data, the IID assumption fails.

Sequence modeling algorithms hence come in two flavors, Stateless and Stateful, depending on the architecture used during training. The discussion below uses the LSTM as an example, but the idea applies to other recurrent variants as well, such as the vanilla RNN and GRU.

The Stateless architecture is used when the IID assumption holds. When creating batches for training, this means there is no relationship across batches, and each batch is independent of the others.

The typical training process in a stateless LSTM architecture is shown below:

Figure: Typical training process of a stateless LSTM (Image by Author)

The two architectures differ in how the model's states (the cell and hidden states associated with each batch) are initialized as training progresses from one batch to the next. This is not to be confused with the parameters/weights, which are carried and updated throughout the entire training process in either case (that is, after all, the whole point of training).
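To make this distinction concrete, here is a minimal NumPy sketch (not from the original article) contrasting the two behaviors: the weights persist across all batches in both cases, while the states are either re-zeroed per batch (stateless) or carried over (stateful). The cell update is a deliberately simplified stand-in for a real LSTM cell, and all names and shapes are illustrative assumptions.

```python
import numpy as np

hidden = 8
W = np.random.randn(hidden, hidden) * 0.1   # weights: persist (and get updated) for the whole run

def run_batch(batch, h, c, W):
    """Toy recurrent update over the timesteps of one batch (stand-in for a real LSTM cell)."""
    for x_t in batch:                        # iterate over timesteps
        h = np.tanh(x_t + h @ W)             # simplified hidden-state update
        c = c + h                            # simplified cell-state update
    return h, c

batches = [np.random.randn(4, hidden) for _ in range(3)]   # 3 batches, 4 timesteps each

# Stateless: states are re-initialized to zeros for every batch
for batch in batches:
    h, c = np.zeros(hidden), np.zeros(hidden)
    h, c = run_batch(batch, h, c, W)

# Stateful: the states from the previous batch seed the next one (reset only at epoch boundaries)
h, c = np.zeros(hidden), np.zeros(hidden)
for batch in batches:
    h, c = run_batch(batch, h, c, W)
```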

In the stateless diagram above, the initial states of the LSTM are reset to zeros every time a new batch is taken up and processed, so the already learned internal activations (states) are not utilized. This forces the model to forget what it learned from previous batches.
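In Keras this is simply the default behavior: an LSTM layer with stateful=False (the default) resets its states to zeros for every batch. A minimal sketch, with purely illustrative layer sizes and random placeholder data:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

timesteps, features = 4, 1
model = keras.Sequential([
    layers.Input(shape=(timesteps, features)),
    layers.LSTM(32),              # stateful defaults to False: states are zeroed for every batch
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

X = np.random.randn(256, timesteps, features)   # placeholder data
y = np.random.randn(256, 1)
model.fit(X, y, batch_size=32, epochs=2)         # batches may be shuffled; no state carries across them
```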

Sequence data such as time series contains non-IID samples, so it is not a good idea to assume the resulting batches are independent when they actually are not. It is therefore intuitive to propagate the learned states across subsequent batches so that the model captures the temporal dependence not only within each sample sequence but also across batches.
(Note that for text data, where a sentence represents a sequence, the corpus is generally assumed to be made up of independent sentences with no connection between them, so it is safe to use the stateless architecture. Whenever this assumption does not hold, the stateful architecture is preferred.)
Below is what a Stateful LSTM architecture looks like:

Figure: Stateful LSTM architecture (Image by Author)

Here, the cell and hidden states of the LSTM for each batch are initialized with the learned states from the previous batch, allowing the model to learn dependencies across batches. The states are, however, reset at the start of each epoch. A more fine-grained visualization of this propagation across batches is shown below:

Figure: State propagation across batches in a stateful LSTM (Image by Author)

Here, the state of the sample at index i, X[i], will be used in the computation of sample X[i + bs] in the next batch, where bs is the batch size. More precisely, the last state of the sample at index i within a batch becomes the initial state of the sample at index i in the following batch. In the diagram, each sample sequence is 4 timesteps long, and the values of the LSTM states at timestep t=4 are used for initialization in the next batch.
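A hedged Keras sketch of the stateful setup (layer sizes and shapes are illustrative assumptions, not from the article): with stateful=True, Keras requires a fixed batch size, and the final states of sample i in one batch initialize sample i in the next, matching the X[i] to X[i + bs] correspondence above. The data must therefore be ordered so that consecutive batches really do continue each other.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

bs, timesteps, features = 8, 4, 1                 # illustrative sizes
model = keras.Sequential([
    # A fixed batch size is mandatory for stateful layers, so the states of the
    # bs samples in one batch can be mapped one-to-one onto the next batch.
    layers.Input(shape=(timesteps, features), batch_size=bs),
    layers.LSTM(32, stateful=True),               # final states of batch k initialize batch k+1
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
```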

Observations:
1. As the batch size increases, a Stateless LSTM tends to behave like a Stateful LSTM.
2. For the Stateful architecture, batches are not shuffled internally (whereas shuffling is the default for stateless training); a training-loop sketch illustrating this follows below.
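The second observation maps directly onto the training loop. A minimal sketch, continuing the stateful model defined in the previous snippet (it reuses that snippet's model, bs, timesteps, and features): shuffling is disabled so the batch order is preserved, and the carried states are reset explicitly at each epoch boundary.

```python
# Continues the stateful model sketched above (same `model`, `bs`, `timesteps`, `features`).
import numpy as np

X = np.random.randn(32 * bs, timesteps, features)   # placeholder data, already ordered in time
y = np.random.randn(32 * bs, 1)                      # sample count must be a multiple of bs

for epoch in range(3):
    # shuffle=False keeps the batch order intact so the carried states stay meaningful
    model.fit(X, y, batch_size=bs, epochs=1, shuffle=False)
    # reset the carried states at the epoch boundary, as described above
    for layer in model.layers:
        if hasattr(layer, "reset_states"):
            layer.reset_states()
```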


