SOFTS: Efficient Multivariate Time Series Forecasting with Series-Core Fusion

Last Updated on June 4, 2024 by Editorial Team

Author(s): Reza Yazdanfar

Originally published on Towards AI.

Yes, I know the subtitle seems too catchy 😅

I'm skipping the "yay, time series is important but challenging!" part, which means I assume the reader already knows the delicacy of time series forecasting and wants to absorb the core!

What does this paper focus on?

The contribution

There are two main contributions in this paper: SOFTS and STAD.

SOFTS

SOFTS: Series-cOre Fused Time Series forecaster

SOFTS is a forecaster for multivariate time series that uses the STAD module to balance channel independence against channel correlation. By centralizing channel interactions into a global core representation, it achieves superior performance with only linear complexity.

STAD

STAD: the STar Aggregate Dispatch module

STAD is the foundation of SOFTS, which is otherwise a simple MLP-based model. STAD is a centralized structure that captures the dependencies between channels in a multivariate time series. The results show that this method is both effective and scalable.

Figure 1: Overview of our SOFTS method. The multivariate time series is first embedded along the temporal dimension to get the series representation for each channel. Then the channel correlation is captured by multiple layers of simple and efficient STAD modules. The STAD module utilizes a centralized structure which first aggregates the series representation to obtain a global core representation, and then dispatches and fuses the global information with each series. [source]

Reversible Instance Normalization

The researchers adopt this technique, adapted from the iTransformer paper, and treat the normalization as a hyperparameter.

Reversible instance normalization normalizes each input series with its own statistics and restores them on the output; in iTransformer it is paired with applying attention and feed-forward networks on the inverted (channel) dimension, enabling the model to capture multivariate correlations and learn nonlinear representations effectively.
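
Here is a minimal sketch of what reversible instance normalization does in practice, assuming a PyTorch-style interface (the class name, shapes, and the affine option are illustrative choices, not the paper's code): each input window is normalized with its own per-channel statistics before the model, and the same statistics are re-applied to the forecast afterwards.

```python
import torch
import torch.nn as nn

class RevIN(nn.Module):
    """Sketch of reversible instance normalization (illustrative, not the official code)."""
    def __init__(self, num_channels: int, eps: float = 1e-5, affine: bool = True):
        super().__init__()
        self.eps = eps
        self.affine = affine
        if affine:
            self.gamma = nn.Parameter(torch.ones(num_channels))
            self.beta = nn.Parameter(torch.zeros(num_channels))

    def normalize(self, x):                      # x: (batch, lookback L, channels C)
        self.mean = x.mean(dim=1, keepdim=True)  # per-instance, per-channel statistics
        self.std = x.std(dim=1, keepdim=True) + self.eps
        x = (x - self.mean) / self.std
        return x * self.gamma + self.beta if self.affine else x

    def denormalize(self, y):                    # y: (batch, horizon H, channels C)
        if self.affine:
            y = (y - self.beta) / (self.gamma + self.eps)
        return y * self.std + self.mean          # restore the original scale and level
```

Treating normalization as a hyperparameter can then amount to toggling this wrapper (and its affine option) on or off per dataset.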

Series Embedding

Series embedding is less complicated than patch embedding; we can say it is like setting the patch length to the length of the whole series. The researchers use a linear projection to embed the series of each channel into the representation S_0:
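
As a rough illustration of that projection (the tensor shapes and names such as d_model are assumptions, not taken from the paper's code), a single shared linear layer maps each channel's full length-L window to one embedding vector:

```python
import torch
import torch.nn as nn

# Illustrative shapes: lookback length L, channels C, embedding size d_model.
L, C, d_model = 96, 7, 128
x = torch.randn(4, L, C)                   # (batch, length, channels)

series_embedding = nn.Linear(L, d_model)   # one shared projection for every channel
S0 = series_embedding(x.transpose(1, 2))   # (batch, C, d_model): one vector per channel
```

In other words, the whole window acts as a single "patch" per channel, so no patch length needs to be tuned.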

STar Aggregate Dispatch (STAD) module

We refine the series embedding with several stacked STAD modules:

Linear Predictor

After N layers of STAD, a linear predictor produces the forecast: if S_N is the output representation of layer N, the prediction is as follows:
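
A sketch of that forecasting head, reusing the same illustrative shapes as above (not the official implementation):

```python
import torch
import torch.nn as nn

d_model, H, C = 128, 96, 7                 # embedding size, horizon, channels (illustrative)
S_N = torch.randn(4, C, d_model)           # representation after the last STAD layer

predictor = nn.Linear(d_model, H)          # single linear map from d_model to the horizon
y_hat = predictor(S_N).transpose(1, 2)     # (batch, H, C): forecast for every channel
```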

Let's get into more detail on STAD.

Star Aggregate Dispatch Module

The STar Aggregate Dispatch (STAD) module is a centralized mechanism designed to capture dependencies between channels in multivariate time series forecasting. Unlike traditional distributed structures such as attention mechanisms, which directly compare characteristics of each channel pair and result in quadratic complexity, STAD reduces this complexity to linear by employing a centralized strategy. It aggregates information from all series into a global core representation and then dispatches this core information back to individual series representations, enabling efficient channel interactions with improved robustness against abnormal channels.

This centralized structure is inspired by the star-shaped systems in software engineering, where a central server aggregates and exchanges information instead of direct peer-to-peer communication between clients. This design allows STAD to maintain the benefits of channel independence while capturing necessary correlations to improve predictive accuracy. By aggregating channel statistics into a single core representation, STAD mitigates the risk of relying on potentially untrustworthy correlations in non-stationary time series.

Figure 2: The comparison of the STAD module and several common modules, like attention, GNN and mixer. These modules employ a distributed structure to perform the interaction, which relies on the quality of each channel. On the contrary, our STAD module utilizes a centralized structure that first aggregates the information from all the series to obtain a comprehensive core representation. Then the core information is dispatched to each channel. This kind of interaction pattern reduces not only the complexity of interaction but also the reliance on the channel quality. [source]

Empirical results show that the STAD module not only achieves superior performance over existing state-of-the-art methods but does so with significantly lower computational demands. This makes it scalable for datasets with a large number of channels or long lookback windows, a challenge for many other transformer-based models. Additionally, the STAD module’s universality allows it to be used as a replacement for attention mechanisms in various transformer-based models, further validating its efficiency and effectiveness.

The input to STAD is the series representation of each channel, which is first processed by an MLP and then pooled (stochastic pooling in this case):

Now that we have the core representation O, we fuse it with the representations of all the series:

Repeat_Concat concatenates the core representation O to each series representation to get F_i. F_i is then passed through another MLP, and the output is added to the previous hidden representation to produce the next one.

  • Note that there’s also a residual connection from the input to the output.
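
Putting the aggregate → pool → repeat-concat → fuse → residual steps together, here is a minimal sketch of one STAD layer. Mean pooling stands in for the paper's stochastic pooling, and the MLP widths (d_core and the two-layer MLPs) are assumptions rather than the official implementation:

```python
import torch
import torch.nn as nn

class STAD(nn.Module):
    """Sketch of a STar Aggregate Dispatch layer (illustrative, not the official code)."""
    def __init__(self, d_model: int, d_core: int):
        super().__init__()
        self.aggregate = nn.Sequential(          # per-channel MLP before pooling
            nn.Linear(d_model, d_core), nn.GELU(), nn.Linear(d_core, d_core))
        self.fuse = nn.Sequential(               # MLP applied after Repeat_Concat
            nn.Linear(d_model + d_core, d_model), nn.GELU(), nn.Linear(d_model, d_model))

    def forward(self, S):                        # S: (batch, C, d_model)
        Z = self.aggregate(S)                    # (batch, C, d_core)
        O = Z.mean(dim=1, keepdim=True)          # global core (mean pool stands in for stochastic pooling)
        O = O.expand(-1, S.size(1), -1)          # "Repeat": one copy of the core per channel
        Fi = torch.cat([S, O], dim=-1)           # "Concat": fuse the core with each series
        return S + self.fuse(Fi)                 # residual connection from input to output
```

Stacking N such layers refines the series embedding S_0 into S_N, which the linear predictor shown earlier then maps to the forecast.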

Results

Though the method looks simple, it reduces the complexity significantly (from quadratic to linear), which is great 😅😉

Complexity comparison between popular time series forecasters with respect to window length L, number of channels C, and forecasting horizon H. Our method achieves only linear complexity.

The researchers experimented with a wide range of datasets and compared SOFTS to most of its predecessors, as you can see:

Multivariate forecasting results with horizon H ∈ {12,24,48,96} for PEMS and H ∈ {96,192,336,720} for others and fixed lookback length L = 96. Results are averaged from all prediction horizons. [source]

They ran other experiments as well; to keep this article from getting too long, I recommend reading the original research paper for the details.

Disclaimer: I used Nouswise to write this article; it's like a search engine that lets you find information across your own documents. It's not generally available yet, but you can contact me directly for access, either on X (formerly Twitter) or on our Discord server.

Published via Towards AI
