


SOFTS: Efficient Multivariate Time Series Forecasting with Series-Core Fusion

Last Updated on June 4, 2024 by Editorial Team

Author(s): Reza Yazdanfar

Originally published on Towards AI.

Yes, I know the subtitle seems too catchy 😅

I’m skipping the part where I say “yay, time series is important but challenging!” and so on, which means I assume the reader already knows the delicacy of time series forecasting and wants to absorb the core!

What does this paper focus on?

The contributions

There are two main contributions in this paper: SOFTS and STAD.

SOFTS

SOFTS: Series-cOre Fused Time Series forecaster

SOFTS is designed for multivariate time series forecasting and uses the STAD module to balance channel independence and channel correlation. Centralizing channel interactions into a global core representation lets it achieve superior performance with linear complexity.

STAD

STAD: the STar Aggregate Dispatch module

STAD is the foundation of SOFTS, which is otherwise a simple MLP-based model. STAD is a centralized structure that captures the dependencies between channels in multivariate time series. The results show that this method is both effective and scalable.

Figure 1: Overview of our SOFTS method. The multivariate time series is first embedded along the temporal dimension to get the series representation for each channel. Then the channel correlation is captured by multiple layers of simple and efficient STAD modules. The STAD module utilizes a centralized structure which first aggregates the series representation to obtain a global core representation, and then dispatches and fuses the global information with each series. [source]

Reversible Instance Normalization

The researchers adopted this method from the iTransformer paper and treat whether to apply the normalization as a hyperparameter.

Reversible instance normalization normalizes each input series (and reverses the normalization on the output) to mitigate distribution shift; in iTransformer it is paired with attention and feed-forward networks applied on the inverted (channel) dimension, enabling the model to capture multivariate correlations and learn nonlinear representations effectively.
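To make this concrete, here is a minimal PyTorch sketch of reversible instance normalization: statistics are computed per instance and per channel on the input and restored on the forecast. The class and its details are my own simplification for illustration, not code from the SOFTS or iTransformer repositories.

```python
import torch
import torch.nn as nn

class RevIN(nn.Module):
    """Minimal reversible instance normalization (illustrative sketch)."""
    def __init__(self, num_channels: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        # Learnable per-channel affine parameters
        self.gamma = nn.Parameter(torch.ones(num_channels))
        self.beta = nn.Parameter(torch.zeros(num_channels))

    def normalize(self, x):
        # x: (batch, length, channels); statistics are per instance and per channel
        self.mean = x.mean(dim=1, keepdim=True)
        self.std = torch.sqrt(x.var(dim=1, keepdim=True, unbiased=False) + self.eps)
        return (x - self.mean) / self.std * self.gamma + self.beta

    def denormalize(self, y):
        # y: (batch, horizon, channels); undo the affine map and restore the statistics
        return (y - self.beta) / self.gamma * self.std + self.mean
```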

Series Embedding

Series embedding is less complicated than patch embedding; you can think of it as setting the patch length to the length of the whole series. The researchers used a linear projection to embed the series of each channel into an initial representation S_0:
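Assuming C channels, lookback length L, and hidden dimension d, the embedding can be written as follows. This is my reconstruction of the missing formula from the description above, so the notation may differ slightly from the paper:

```latex
S_0 = \mathrm{Linear}(X) = X W_e + b_e,
\qquad X \in \mathbb{R}^{C \times L},\; W_e \in \mathbb{R}^{L \times d},\; S_0 \in \mathbb{R}^{C \times d}
```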

STar Aggregate Dispatch (STAD) module

We then refine the series embedding with several stacked STAD modules:
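In symbols (again a reconstruction, with N denoting the number of STAD layers):

```latex
S_l = \mathrm{STAD}(S_{l-1}), \qquad l = 1, \dots, N
```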

Linear Predictor

After N layers of STAD, a linear predictor produces the forecast. If S_N is the output representation of layer N, the prediction is as follows:
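A reconstruction of the prediction step, with H the forecasting horizon:

```latex
\hat{Y} = \mathrm{Linear}(S_N), \qquad \hat{Y} \in \mathbb{R}^{C \times H}
```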

Let’s get into the details of STAD.

Star Aggregate Dispatch Module

The STar Aggregate Dispatch (STAD) module is a centralized mechanism designed to capture dependencies between channels in multivariate time series forecasting. Unlike traditional distributed structures such as attention mechanisms, which directly compare characteristics of each channel pair and result in quadratic complexity, STAD reduces this complexity to linear by employing a centralized strategy. It aggregates information from all series into a global core representation and then dispatches this core information back to individual series representations, enabling efficient channel interactions with improved robustness against abnormal channels.

This centralized structure is inspired by the star-shaped systems in software engineering, where a central server aggregates and exchanges information instead of direct peer-to-peer communication between clients. This design allows STAD to maintain the benefits of channel independence while capturing necessary correlations to improve predictive accuracy. By aggregating channel statistics into a single core representation, STAD mitigates the risk of relying on potentially untrustworthy correlations in non-stationary time series.

Figure 2: The comparison of the STAD module and several common modules, like attention, GNN and mixer. These modules employ a distributed structure to perform the interaction, which relies on the quality of each channel. On the contrary, our STAD module utilizes a centralized structure that first aggregates the information from all the series to obtain a comprehensive core representation. Then the core information is dispatched to each channel. This kind of interaction pattern reduces not only the complexity of interaction but also the reliance on the channel quality. [source]

Empirical results show that the STAD module not only achieves superior performance over existing state-of-the-art methods but does so with significantly lower computational demands. This makes it scalable for datasets with a large number of channels or long lookback windows, a challenge for many other transformer-based models. Additionally, the STAD module’s universality allows it to be used as a replacement for attention mechanisms in various transformer-based models, further validating its efficiency and effectiveness.

The input to STAD is the series representation of each channel, which is processed through an MLP and then pooled (here, stochastic pooling) to obtain the core:
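A reconstruction of the aggregation step, where S is the stack of all channel representations and O is the core:

```latex
O = \mathrm{Stoch\_Pool}\big(\mathrm{MLP}_1(S)\big)
```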

Now that we have calculated the core representation O, we fuse it with each series representation:
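A reconstruction of the fuse-and-dispatch step for channel i (the residual form follows the description below):

```latex
F_i = \mathrm{Repeat\_Concat}(S_i, O), \qquad S_i' = \mathrm{MLP}_2(F_i) + S_i
```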

Repeat_Concat concatenates the core representation O to each series representation to get F_i. This F_i is then passed to another MLP, and the output is added to the previous series representation to produce the next one.

  • Note that there’s also a residual connection from the input to the output.
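Putting the pieces together, here is a minimal PyTorch sketch of one STAD layer as described above. It is an illustrative reconstruction under my own naming and sizing assumptions (d_model, d_core, the MLP shapes), not the authors' implementation; in particular, stochastic pooling is approximated by its inference-time weighted average.

```python
import torch
import torch.nn as nn

class STAD(nn.Module):
    """Illustrative sketch of a STar Aggregate Dispatch layer (not the official code)."""
    def __init__(self, d_model: int, d_core: int):
        super().__init__()
        # MLP_1 projects each channel's representation before aggregation
        self.mlp1 = nn.Sequential(
            nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_core)
        )
        # MLP_2 fuses the dispatched core with each channel's representation
        self.mlp2 = nn.Sequential(
            nn.Linear(d_model + d_core, d_model), nn.GELU(), nn.Linear(d_model, d_model)
        )

    def forward(self, s):
        # s: (batch, channels, d_model) -- one representation per channel
        z = self.mlp1(s)                               # (batch, channels, d_core)
        # Aggregate across channels. Stochastic pooling would sample a channel per
        # core dimension during training; here we use its expected value, a
        # softmax-weighted average over channels, as is typical at inference.
        weights = torch.softmax(z, dim=1)
        core = (weights * z).sum(dim=1, keepdim=True)  # (batch, 1, d_core)
        core = core.expand(-1, s.size(1), -1)          # dispatch the core to every channel
        fused = torch.cat([s, core], dim=-1)           # Repeat_Concat(S_i, O)
        return self.mlp2(fused) + s                    # fuse, plus the residual connection
```

Because every channel interacts only through the shared core, the cost of a layer grows linearly with the number of channels, rather than quadratically as with pairwise attention across channels; that is where the linear complexity discussed above comes from.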

Results

Though the method looks simple, it reduces the complexity significantly (from quadratic to linear), which is great 😅😉

Complexity comparison between popular time series forecasters with respect to window length L, number of channels C, and forecasting horizon H. Our method achieves only linear complexity.

The researchers experimented with a wide range of datasets and compared SOFTS with most of its predecessors, as you can see:

Multivariate forecasting results with horizon H ∈ {12,24,48,96} for PEMS and H ∈ {96,192,336,720} for others and fixed lookback length L = 96. Results are averaged from all prediction horizons. [source]

They ran other experiments as well, but to keep this article from getting too long, I recommend reading the original research paper.

Disclaimer: I used Nouswise to write this article; it’s like a search engine that lets you find information across your own documents. It’s not generally available yet, but you can contact me directly for access, either on X (formerly Twitter) or on our Discord server.


Published via Towards AI
