LLM-Mixer: Multiscale Mixing in LLMs for Time Series Forecasting
Author(s): Reza Yazdanfar
Originally published on Towards AI.
Time Series
Time series data is crucial in many fields. In finance, for example, accurate time series forecasting supports stock market prediction and risk management, enabling better investment decisions!
Time series forecasting, in simple terms, is using past data to forecast future data; something we humans, intelligent creatures that we are, do intuitively all the time (don't believe me? Go talk to any financial analyst or trader…).
Time series forecasting is challenging because of the inherent complexity of real-world data, which often exhibits nonlinear, non-stationary, and multivariate characteristics. Moreover, time series data spans multiple time scales, with short-term fluctuations and long-term trends that traditional models struggle to capture simultaneously.
LLMs in time series
OpenAI showed the power of LLMs by introducing GPT-3.5 and later GPT-4, and everything changed from then on. Researchers recognized the power of scaling, and since then research on LLMs, and AI in general, has multiplied.
One line of work has been using the power of LLMs in other domains of AI like time series forecasting. LLMs are particularly attractive for time series forecasting due to their abilities in few-shot or zero-shot transfer learning, multimodal knowledge integration, and complex reasoning.
Time series data often exhibits continuous and irregular patterns, unlike the discrete tokens that LLMs are designed to process. Not to mention that time series usually span multiple time scales, ranging from short-term fluctuations to long-term trends.
This level of complexity is quite alluring, in my humble opinion, and, well, consequently challenging! 🔥😄
Problem in short
The challenge of accurately forecasting time series data, which often involves complex multiscale temporal patterns.
Solution in short: LLM-Mixer
LLM-Mixer adapts LLMs for time series forecasting by decomposing the data into multiple temporal resolutions. This approach allows the LLM to better understand and model the complex patterns within the time series data, capturing both short-term and long-term dependencies effectively. Through multiscale time-series decomposition combined with LLMs, LLM-Mixer achieves competitive performance and improves forecasting accuracy across various datasets and forecasting horizons.
LLM-Mixer in detail: Architecture
1) Data Downsampling and Embedding:
We start by downsampling the time series data into multiple temporal resolutions to capture both short-term fluctuations and long-term trends.
These multiscale series are then enriched with three types of embeddings: token, temporal, and positional embeddings.
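To make step 1 concrete, here is a minimal sketch of multiscale downsampling, assuming average pooling along the time axis (the paper's exact downsampling operator and number of scales may differ):

```python
import torch

def multiscale_downsample(x, num_scales=3, factor=2):
    """Downsample a batch of series (B, T, C) into progressively coarser views.

    Returns [x_0, x_1, ...], where x_0 is the original resolution and each
    subsequent view is `factor` times coarser along the time axis.
    """
    views = [x]
    for _ in range(num_scales - 1):
        # Average-pool over time: (B, T, C) -> (B, C, T) -> pool -> back.
        pooled = torch.nn.functional.avg_pool1d(
            views[-1].transpose(1, 2), kernel_size=factor
        ).transpose(1, 2)
        views.append(pooled)
    return views

x = torch.randn(32, 96, 7)          # 32 series, 96 steps, 7 variables
scales = multiscale_downsample(x)
print([v.shape[1] for v in scales])  # [96, 48, 24]
```

The fine-resolution view keeps short-term fluctuations, while the coarser views expose long-term trends to the later stages.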
2) Token, Temporal, and Positional Embeddings:
Token embeddings are computed with 1D convolutions; temporal embeddings encode calendar information such as day, week, and month; and positional embeddings encode the sequence positions. Together, these embeddings transform the multiscale time series into deep feature representations.
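A minimal PyTorch sketch of how the three embeddings could be combined; the class and parameter names here are illustrative, not the paper's:

```python
import torch
import torch.nn as nn

class MultiEmbedding(nn.Module):
    """Token + temporal + positional embeddings (illustrative sketch)."""
    def __init__(self, n_vars, d_model, max_len=512, n_time_feats=4):
        super().__init__()
        # Token embedding: 1D convolution mapping the variables at each
        # time step into d_model channels.
        self.token = nn.Conv1d(n_vars, d_model, kernel_size=3, padding=1)
        # Temporal embedding: linear map over calendar features
        # (e.g., day-of-week, day-of-month, month, hour).
        self.temporal = nn.Linear(n_time_feats, d_model)
        # Positional embedding: learned vector per sequence position.
        self.positional = nn.Embedding(max_len, d_model)

    def forward(self, x, time_feats):
        # x: (B, T, n_vars), time_feats: (B, T, n_time_feats)
        T = x.shape[1]
        tok = self.token(x.transpose(1, 2)).transpose(1, 2)       # (B, T, d_model)
        tmp = self.temporal(time_feats)                           # (B, T, d_model)
        pos = self.positional(torch.arange(T, device=x.device))   # (T, d_model)
        return tok + tmp + pos  # positional term broadcasts over the batch
```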
3) Past-Decomposable-Mixing (PDM) Module:
The multiscale representations are processed by the PDM module, which mixes past information across different scales. The PDM module breaks down complex time series data into separate seasonal and trend components, allowing for targeted processing of each component.
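One common way to realize the seasonal/trend split inside such a module is a moving-average decomposition (Autoformer-style). A hedged sketch of that piece, not necessarily the paper's exact PDM internals:

```python
import torch.nn as nn

class SeriesDecomposition(nn.Module):
    """Split a series into trend and seasonal parts via a moving average.

    A common decomposition choice; the paper's PDM then mixes these
    components across the different temporal scales.
    """
    def __init__(self, kernel_size=25):
        super().__init__()
        # stride=1 with symmetric padding keeps the sequence length.
        self.pool = nn.AvgPool1d(kernel_size, stride=1, padding=kernel_size // 2)

    def forward(self, x):  # x: (B, T, d_model)
        trend = self.pool(x.transpose(1, 2)).transpose(1, 2)  # smooth long-term part
        seasonal = x - trend                                  # residual short-term part
        return seasonal, trend
```

Decomposing first means the mixing step can treat slowly varying trends and fast seasonal patterns with operations suited to each, rather than forcing one pathway to model both.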
4) Pre-trained Large Language Model (LLM) Processing:
The processed multiscale data, along with a textual prompt that provides task-specific information, is input into a frozen pre-trained LLM. The frozen LLM utilizes its semantic knowledge and the multiscale information to generate the forecast.
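A minimal sketch of the frozen-backbone pattern, using Hugging Face's GPT-2 purely as a stand-in (the backbone choice and the prompt construction in LLM-Mixer are simplified here):

```python
import torch
from transformers import GPT2Model

llm = GPT2Model.from_pretrained("gpt2")
for p in llm.parameters():
    p.requires_grad = False  # the backbone stays frozen; only the embeddings
                             # and the decoder head are trained

prompt_emb = torch.randn(32, 16, 768)  # placeholder for embedded task prompt
series_emb = torch.randn(32, 96, 768)  # multiscale series embeddings (d_model=768)

# Prepend the prompt and run the frozen LLM on the continuous embeddings.
inputs = torch.cat([prompt_emb, series_emb], dim=1)
hidden = llm(inputs_embeds=inputs).last_hidden_state  # (32, 112, 768)
```

Because the LLM's weights never change, training cost stays low: gradients only flow through the lightweight components around the backbone.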
5) Forecast Generation:
Finally, a trainable decoder, a simple linear transformation, is applied to the last hidden layer of the LLM to predict the next set of future time steps, completing the LLM-Mixer pipeline.
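One plausible reading of that decoder, assuming the hidden states are flattened over time and linearly projected to the forecast horizon (shapes and names are illustrative):

```python
import torch
import torch.nn as nn

class ForecastHead(nn.Module):
    """Trainable linear decoder over the LLM's last hidden states (sketch)."""
    def __init__(self, d_model=768, seq_len=96, pred_len=96, n_vars=7):
        super().__init__()
        # Flatten hidden states over time, project to horizon * variables.
        self.proj = nn.Linear(seq_len * d_model, pred_len * n_vars)
        self.pred_len, self.n_vars = pred_len, n_vars

    def forward(self, hidden):  # hidden: (B, seq_len, d_model)
        B = hidden.shape[0]
        out = self.proj(hidden.reshape(B, -1))
        return out.reshape(B, self.pred_len, self.n_vars)

# LLM hidden states after dropping the prompt positions, as in the sketch above.
hidden = torch.randn(32, 96, 768)
print(ForecastHead()(hidden).shape)  # torch.Size([32, 96, 7])
```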
Datasets
- For long-term forecasting, the datasets include the ETT datasets (ETTh1, ETTh2, ETTm1, ETTm2), as well as the Weather, Electricity, and Traffic datasets.
- For short-term forecasting tasks, the framework uses the PeMS dataset, which consists of four public traffic network datasets (PEMS03, PEMS04, PEMS07, PEMS08), with time series data collected at various frequencies.
This part is not that important for the aim of this article, since it's standard for every ML paper to evaluate on the same benchmarks over and over again.
Results
In short, LLM-Mixer is great, lol; if you don't think so, read this section 😄
For long-term multivariate forecasting, LLM-Mixer shows competitive performance, particularly excelling on the ETTh1, ETTh2, and Electricity datasets. It consistently achieves low Mean Squared Error (MSE) and Mean Absolute Error (MAE) values across multiple forecasting horizons (96, 192, 336, and 720 time steps), outperforming models like TIME-LLM, TimeMixer, and PatchTST.
For short-term multivariate forecasting, LLM-Mixer again exhibits strong performance, delivering low MSE and MAE values consistently across the PEMS datasets. It achieves competitive accuracy on datasets like PEMS03, PEMS04, and PEMS07, outperforming other models including TIME-LLM, TimeMixer, and PatchTST. On the PEMS08 dataset, LLM-Mixer delivers superior results compared to iTransformer and DLinear, emphasizing its effectiveness in capturing essential temporal dynamics for short-horizon forecasting tasks.
Finally, for univariate long-term forecasting, LLM-Mixer achieves the lowest MSE and MAE values across datasets on the ETT benchmark, consistently outperforming methods such as Linear, NLinear, and FEDformer.
Thanks for reading this article; feel free to follow me on X or LinkedIn. I used Nouswise.com to write this article; you can find the original paper of this article and millions of other papers on it. (Caution: you didn't read an AI-generated text 🔥😂)
Published via Towards AI