SOFTS: Efficient Multivariate Time Series Forecasting with Series-Core Fusion
Last Updated on June 4, 2024 by Editorial Team
Author(s): Reza Yazdanfar
Originally published on Towards AI.
Yes, I know the subtitle seems too catchy 😅
I'm skipping the "yay, time series is important but challenging! and …" part, which means I assume the reader already knows the delicacy of time series forecasting and wants to absorb the core!
What does this paper focus on?
The contribution
There are two main contributions in this paper: SOFTS and STAD.
SOFTS
SOFTS: the Series-cOre Fused Time Series forecaster
SOFTS is designed for multivariate time series forecasting and uses the STAD module to balance channel independence and channel correlation. By centralizing channel interactions into a global core representation, it achieves superior performance with linear complexity.
STAD
STAD: the STar Aggregate-Dispatch module
STAD is the foundation of SOFTS, which itself is a simple MLP-based model. STAD is a centralized structure that captures the dependencies between channels in a multivariate time series. The results show that this method is both effective and scalable.
Reversible Instance Normalization
The researchers adopted this technique, also used in the iTransformer paper, and treat whether to apply normalization as a hyperparameter.
Reversible Instance Normalization normalizes each channel's series to zero mean and unit variance at the input and restores the original statistics on the model's output, which helps the model cope with distribution shift in non-stationary series.
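As a minimal sketch of how reversible instance normalization behaves (shapes, the `eps` value, and function names here are illustrative assumptions, not the paper's exact code): each channel is normalized at the input and de-normalized at the output using the saved statistics.

```python
import numpy as np

def revin_normalize(x, eps=1e-5):
    """Normalize each channel's series to zero mean / unit variance,
    keeping the statistics so the forecast can be de-normalized later."""
    # x: (C, L) -- C channels, lookback length L
    mu = x.mean(axis=1, keepdims=True)
    sigma = x.std(axis=1, keepdims=True) + eps
    return (x - mu) / sigma, (mu, sigma)

def revin_denormalize(y, stats):
    """Restore the original per-channel scale on the model output."""
    mu, sigma = stats
    return y * sigma + mu

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(2, 96))   # toy 2-channel input
x_norm, stats = revin_normalize(x)
x_back = revin_denormalize(x_norm, stats)
print(np.allclose(x_back, x))   # True -- the transform is reversible
```

In a forecaster, the de-normalization would be applied to the predicted horizon rather than the input itself; the round trip above just demonstrates reversibility.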
Series Embedding
Series embedding is less complicated than patch embedding; you can think of it as setting the patch length to the length of the whole series. The researchers use a linear projection to embed each channel's series into S_0:
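A sketch of this idea (the dimensions and weight names are illustrative assumptions): the whole length-L series of each channel is projected by one shared linear map into a d-dimensional embedding.

```python
import numpy as np

rng = np.random.default_rng(0)
C, L, d = 7, 96, 8                       # channels, lookback length, hidden dim (illustrative)
x = rng.normal(size=(C, L))              # one length-L series per channel
W_embed = rng.normal(size=(L, d)) * 0.1  # shared linear projection, like a single whole-series "patch"
b_embed = np.zeros(d)
S0 = x @ W_embed + b_embed               # (C, d): one embedding per channel
print(S0.shape)                          # (7, 8)
```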
STar Aggregate Dispatch (STAD) module
We refine the series embedding through several STAD modules:
Linear Predictor
After N layers of STAD, a linear predictor produces the forecast: if S_N is the output representation of layer N, the prediction is as follows:
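A hedged sketch of that linear head, assuming the hidden representation has dimension d and the forecast horizon is H (both values illustrative): each channel's final representation is mapped directly to its H future steps.

```python
import numpy as np

rng = np.random.default_rng(0)
C, d, H = 7, 8, 24                       # channels, hidden dim, forecast horizon
S_N = rng.normal(size=(C, d))            # stand-in for the last STAD layer's output
W_pred = rng.normal(size=(d, H)) * 0.1   # placeholder weights for the linear predictor
y_hat = S_N @ W_pred                     # (C, H): H future steps per channel
print(y_hat.shape)                       # (7, 24)
```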
Let's get into more details of STAD
Star Aggregate Dispatch Module
The STar Aggregate Dispatch (STAD) module is a centralized mechanism designed to capture dependencies between channels in multivariate time series forecasting. Unlike traditional distributed structures such as attention mechanisms, which directly compare characteristics of each channel pair and result in quadratic complexity, STAD reduces this complexity to linear by employing a centralized strategy. It aggregates information from all series into a global core representation and then dispatches this core information back to individual series representations, enabling efficient channel interactions with improved robustness against abnormal channels.
This centralized structure is inspired by the star-shaped systems in software engineering, where a central server aggregates and exchanges information instead of direct peer-to-peer communication between clients. This design allows STAD to maintain the benefits of channel independence while capturing necessary correlations to improve predictive accuracy. By aggregating channel statistics into a single core representation, STAD mitigates the risk of relying on potentially untrustworthy correlations in non-stationary time series.
Empirical results show that the STAD module not only achieves superior performance over existing state-of-the-art methods but does so with significantly lower computational demands. This makes it scalable for datasets with a large number of channels or long lookback windows, a challenge for many other transformer-based models. Additionally, the STAD moduleβs universality allows it to be used as a replacement for attention mechanisms in various transformer-based models, further validating its efficiency and effectiveness.
The input to STAD is the series representation of each channel; it is processed through an MLP and then pooled across channels (here, stochastic pooling):
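A minimal sketch of this aggregate step (the one-layer ReLU MLP, the per-feature sampling, and all weights are my assumptions for illustration, not the paper's exact code): activations are pooled across the channel axis, with each feature sampled from a channel with probability proportional to its softmax score.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_pool(z, rng):
    """Pool across the channel axis: for each feature, sample one channel
    with probability given by a softmax over channels."""
    # z: (C, d) -- one d-dim representation per channel
    p = np.exp(z - z.max(axis=0))            # softmax over channels, per feature
    p = p / p.sum(axis=0)
    C, d = z.shape
    idx = np.array([rng.choice(C, p=p[:, j]) for j in range(d)])
    return z[idx, np.arange(d)]              # (d,) global core representation

C, d = 4, 8
S = rng.normal(size=(C, d))                  # series representations
W1 = rng.normal(size=(d, d)); b1 = np.zeros(d)
Z = np.maximum(S @ W1 + b1, 0.0)             # one-layer MLP with ReLU (an assumption)
O = stochastic_pool(Z, rng)                  # core representation O, shape (d,)
print(O.shape)                               # (8,)
```

At inference time, stochastic pooling is typically replaced by a probability-weighted average so the output is deterministic.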
Now that we have the core representation (O), we fuse it with all the series representations:
Repeat_Concat concatenates the core representation O to each series representation to get F_i. F_i is then passed through another MLP, and the output is added to the previous hidden representation to compute the next one.
- Note that there's also a residual connection from the input to the output.
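The dispatch step above can be sketched as follows (the core O is a placeholder here, and the weights and one-layer MLP are illustrative assumptions): the core is tiled and concatenated onto every channel, fed through a second MLP, and added back via the residual connection.

```python
import numpy as np

rng = np.random.default_rng(0)
C, d = 4, 8
S = rng.normal(size=(C, d))              # series representations from the previous layer
O = rng.normal(size=(d,))                # core from the aggregate step (placeholder values)

# Repeat_Concat: append the core O to every channel's representation.
F = np.concatenate([S, np.tile(O, (C, 1))], axis=1)   # (C, 2d)

# Second MLP maps the fused vector back to d dims; weights are placeholders.
W2 = rng.normal(size=(2 * d, d)) * 0.1; b2 = np.zeros(d)
S_next = S + np.maximum(F @ W2 + b2, 0.0)             # residual connection to the input
print(S_next.shape)                                   # (4, 8)
```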
Results
Though the method looks simple, it reduces the complexity significantly (from quadratic to linear), which is great 😅😉
The researchers experimented on a wide range of datasets and compared against most predecessors, as you can see:
They ran other experiments as well; to keep this article from getting too long, I recommend reading the original research paper.
Disclaimer: I used Nouswise to write this article; it's like a search engine that finds information across your documents. It's not generally available, but you can contact me directly for access, either on X (formerly Twitter) or our Discord Server.
Published via Towards AI