LLM-Mixer: Multiscale Mixing in LLMs for Time Series Forecasting
Author(s): Reza Yazdanfar
Originally published on Towards AI.
Time Series
Time series data is crucial in many fields. In finance, for example, accurate time series forecasting supports stock market prediction and risk management, enabling better investment decisions!
Time series forecasting, in simple terms, is using past data to forecast future data; something we humans, intelligent creatures that we are, do intuitively all the time (don't believe me? Go talk to any financial analyst or trader…).
Time series forecasting is challenging because of the inherent complexity of real-world data, which often exhibits nonlinear, non-stationary, and multivariate characteristics. Moreover, time series data spans multiple time scales, with short-term fluctuations and long-term trends that traditional models struggle to capture simultaneously.
LLMs in time series
OpenAI showed the power of LLMs by introducing GPT-3.5 and later GPT-4, and everything changed from then on. Researchers recognized the power of scaling, and since then research on LLMs, and AI in general, has multiplied.
One line of work has been using the power of LLMs in other domains of AI like time series forecasting. LLMs are particularly attractive for time series forecasting due to their abilities in few-shot or zero-shot transfer learning, multimodal knowledge integration, and complex reasoning.
Time series data often exhibits continuous and irregular patterns, unlike the discrete tokens that LLMs are designed to process. Not to mention that time series usually span multiple time scales, ranging from short-term fluctuations to long-term trends.
This level of complexity is quite alluring, in my humble opinion, and, well, consequently challenging! 🔥😄
Problem in short
The challenge of accurately forecasting time series data, which often involves complex multiscale temporal patterns.
Solution in short: LLM-Mixer
LLM-Mixer adapts LLMs for time series forecasting by decomposing the data into multiple temporal resolutions. This approach allows the LLM to better understand and model the complex patterns within the time series data, capturing both short-term and long-term dependencies effectively. Through multiscale time-series decomposition combined with LLMs, LLM-Mixer achieves competitive performance and improves forecasting accuracy across various datasets and forecasting horizons.
LLM-Mixer in detail: Architecture
1) Data Downsampling and Embedding:
We start by downsampling the time series data into multiple temporal resolutions to capture both short-term fluctuations and long-term trends.
These multiscale series are then enriched with three types of embeddings: token, temporal, and positional embeddings.
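To make step 1 concrete, here is a minimal sketch of multiscale downsampling, assuming average pooling along the time axis (the paper's exact downsampling operator and number of scales may differ):

```python
import torch

def multiscale_downsample(x, num_scales=3, factor=2):
    """Downsample a batch of series (B, T, C) into progressively coarser views.

    Returns [x_0, x_1, ...], where x_0 is the original resolution and each
    subsequent view is `factor` times coarser along the time axis.
    """
    views = [x]
    for _ in range(num_scales - 1):
        # Average-pool over time: (B, T, C) -> (B, C, T) -> pool -> back.
        pooled = torch.nn.functional.avg_pool1d(
            views[-1].transpose(1, 2), kernel_size=factor
        ).transpose(1, 2)
        views.append(pooled)
    return views

x = torch.randn(32, 96, 7)          # 32 series, 96 steps, 7 variables
scales = multiscale_downsample(x)
print([v.shape[1] for v in scales])  # [96, 48, 24]
```

The fine-resolution view keeps short-term fluctuations, while the coarser views expose long-term trends to the later stages.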
2) Token, Temporal, and Positional Embeddings:
Token embeddings are computed with 1D convolutions; temporal embeddings encode calendar information such as day, week, and month; and positional embeddings encode the sequence positions. Together, these embeddings transform the multiscale time series into deep feature representations.
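A minimal PyTorch sketch of how the three embeddings could be combined; the class and parameter names here are illustrative, not the paper's:

```python
import torch
import torch.nn as nn

class MultiEmbedding(nn.Module):
    """Token + temporal + positional embeddings (illustrative sketch)."""
    def __init__(self, n_vars, d_model, max_len=512, n_time_feats=4):
        super().__init__()
        # Token embedding: 1D convolution mapping the variables at each
        # time step into d_model channels.
        self.token = nn.Conv1d(n_vars, d_model, kernel_size=3, padding=1)
        # Temporal embedding: linear map over calendar features
        # (e.g., day-of-week, day-of-month, month, hour).
        self.temporal = nn.Linear(n_time_feats, d_model)
        # Positional embedding: learned vector per sequence position.
        self.positional = nn.Embedding(max_len, d_model)

    def forward(self, x, time_feats):
        # x: (B, T, n_vars), time_feats: (B, T, n_time_feats)
        T = x.shape[1]
        tok = self.token(x.transpose(1, 2)).transpose(1, 2)       # (B, T, d_model)
        tmp = self.temporal(time_feats)                           # (B, T, d_model)
        pos = self.positional(torch.arange(T, device=x.device))   # (T, d_model)
        return tok + tmp + pos  # positional term broadcasts over the batch
```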
3) Past-Decomposable-Mixing (PDM) Module:
The multiscale representations are processed by the PDM module, which mixes past information across different scales. The PDM module breaks down complex time series data into separate seasonal and trend components, allowing for targeted processing of each component.
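One common way to realize the seasonal/trend split inside such a module is a moving-average decomposition (Autoformer-style). A hedged sketch of that piece, not necessarily the paper's exact PDM internals:

```python
import torch.nn as nn

class SeriesDecomposition(nn.Module):
    """Split a series into trend and seasonal parts via a moving average.

    A common decomposition choice; the paper's PDM then mixes these
    components across the different temporal scales.
    """
    def __init__(self, kernel_size=25):
        super().__init__()
        # stride=1 with symmetric padding keeps the sequence length.
        self.pool = nn.AvgPool1d(kernel_size, stride=1, padding=kernel_size // 2)

    def forward(self, x):  # x: (B, T, d_model)
        trend = self.pool(x.transpose(1, 2)).transpose(1, 2)  # smooth long-term part
        seasonal = x - trend                                  # residual short-term part
        return seasonal, trend
```

Decomposing first means the mixing step can treat slowly varying trends and fast seasonal patterns with operations suited to each, rather than forcing one pathway to model both.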
4) Pre-trained Large Language Model (LLM) Processing:
The processed multiscale data, along with a textual prompt that provides task-specific information, is input into a frozen pre-trained LLM. The frozen LLM utilizes its semantic knowledge and the multiscale information to generate the forecast.
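A minimal sketch of the frozen-backbone pattern, using Hugging Face's GPT-2 purely as a stand-in (the backbone choice and the prompt construction in LLM-Mixer are simplified here):

```python
import torch
from transformers import GPT2Model

llm = GPT2Model.from_pretrained("gpt2")
for p in llm.parameters():
    p.requires_grad = False  # the backbone stays frozen; only the embeddings
                             # and the decoder head are trained

prompt_emb = torch.randn(32, 16, 768)  # placeholder for embedded task prompt
series_emb = torch.randn(32, 96, 768)  # multiscale series embeddings (d_model=768)

# Prepend the prompt and run the frozen LLM on the continuous embeddings.
inputs = torch.cat([prompt_emb, series_emb], dim=1)
hidden = llm(inputs_embeds=inputs).last_hidden_state  # (32, 112, 768)
```

Because the LLM's weights never change, training cost stays low: gradients only flow through the lightweight components around the backbone.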
5) Forecast Generation:
Finally, a trainable decoder, a simple linear transformation, is applied to the last hidden layer of the LLM to predict the next set of future time steps, completing the LLM-Mixer pipeline.
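One plausible reading of that decoder, assuming the hidden states are flattened over time and linearly projected to the forecast horizon (shapes and names are illustrative):

```python
import torch
import torch.nn as nn

class ForecastHead(nn.Module):
    """Trainable linear decoder over the LLM's last hidden states (sketch)."""
    def __init__(self, d_model=768, seq_len=96, pred_len=96, n_vars=7):
        super().__init__()
        # Flatten hidden states over time, project to horizon * variables.
        self.proj = nn.Linear(seq_len * d_model, pred_len * n_vars)
        self.pred_len, self.n_vars = pred_len, n_vars

    def forward(self, hidden):  # hidden: (B, seq_len, d_model)
        B = hidden.shape[0]
        out = self.proj(hidden.reshape(B, -1))
        return out.reshape(B, self.pred_len, self.n_vars)

# LLM hidden states after dropping the prompt positions, as in the sketch above.
hidden = torch.randn(32, 96, 768)
print(ForecastHead()(hidden).shape)  # torch.Size([32, 96, 7])
```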
Datasets
- For long-term forecasting, the datasets include the ETT datasets (ETTh1, ETTh2, ETTm1, ETTm2), as well as the Weather, Electricity, and Traffic datasets.
- For short-term forecasting tasks, the framework uses the PeMS dataset, which consists of four public traffic network datasets (PEMS03, PEMS04, PEMS07, PEMS08), with time series data collected at various frequencies.
This part is not that important for the aim of this article, since it's standard for every ML paper to evaluate on the same benchmarks over and over again.
Results
In short, LLM-Mixer is great, lol; if you don't think so, read this section 😄
For long-term multivariate forecasting, LLM-Mixer shows competitive performance, particularly excelling on the ETTh1, ETTh2, and Electricity datasets. It consistently achieves low Mean Squared Error (MSE) and Mean Absolute Error (MAE) values across multiple forecasting horizons (96, 192, 336, and 720 time steps), outperforming models like TIME-LLM, TimeMixer, and PatchTST.
For short-term multivariate forecasting, LLM-Mixer again exhibits strong performance, delivering low MSE and MAE values consistently across the PEMS datasets. It achieves competitive accuracy on datasets like PEMS03, PEMS04, and PEMS07, outperforming other models including TIME-LLM, TimeMixer, and PatchTST. On the PEMS08 dataset, LLM-Mixer delivers superior results compared to iTransformer and DLinear, emphasizing its effectiveness in capturing essential temporal dynamics for short-horizon forecasting tasks.
Finally, for univariate long-term forecasting, LLM-Mixer achieves the lowest MSE and MAE values across datasets on the ETT benchmark, consistently outperforming methods such as Linear, NLinear, and FEDformer.
Thanks for reading this article; feel free to follow me on X or LinkedIn. I used Nouswise.com to write this article; you can find the original paper of this article and millions of other papers on it. (Caution: you didn't read an AI-generated text 🔥😂)
Published via Towards AI