Are Language Models Actually Useful for Time Series Forecasting?
Author(s): Reza Yazdanfar
Originally published on Towards AI.
Time Series
Time series is one of the most challenging areas of machine learning, which has made many researchers reluctant to work on it. However, solving time series tasks like anomaly detection, time series forecasting, etc. is vital in a wide variety of industries and could save enormous amounts of money.
What happened in Language processing?
The scaling laws, popularized by OpenAI, showed that models generalize better with more raw data, and the result was ChatGPT. Revolutionary! Since then, LLMs have captured everyone's attention, from politicians to researchers.
What is going on now?
Since then, researchers have been trying to use LLMs for time series! It makes sense to an extent: both language data and time series are sequential, so researchers reasoned that if LLMs can generalize well on language data, they could plausibly do the same on time series.
There is a bunch of cool work being done on this; you can read more here, here, here, here, and here.
The question is: "Are LLMs really useful for time series tasks?"
I'd argue some works show a promising future for time series, such as time series reasoning and social understanding (agents), which used LLMs to achieve what they intended.
Time series reasoning:
Using Large Language Models (LLMs) for time series reasoning integrates three key forms of analytical tasks: Etiological Reasoning, Question Answering, and Context-Aided Forecasting.
- Etiological Reasoning involves hypothesizing potential causes behind observed time series patterns, enabling models to identify scenarios that most likely generated given time series data.
- Question Answering enables models to interpret and respond to factual queries about time series, such as identifying trends or making counterfactual inferences about changes in the data.
- Context-Aided Forecasting allows models to leverage additional textual information to enhance their predictions about future data points, integrating relevant context to improve forecast accuracy.
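To make the three tasks concrete, here is a minimal sketch of how a time series might be serialized into text prompts for each one. The `series_to_text` and `build_prompt` helpers and the exact prompt wording are illustrative assumptions, not the prompts used in any of the cited papers.

```python
def series_to_text(values, sep=", "):
    # Serialize a numeric series into plain text so an LLM can read it.
    return sep.join(f"{v:.2f}" for v in values)

# Hypothetical prompt templates, one per reasoning task.
PROMPTS = {
    "etiological": "Here is a time series: {series}. "
                   "Which scenario most likely generated it?",
    "qa": "Here is a time series: {series}. {question}",
    "context_forecast": "Context: {context}\nHistory: {series}\n"
                        "Predict the next {horizon} values.",
}

def build_prompt(task, values, **kwargs):
    # Fill the chosen template with the serialized series and extras.
    return PROMPTS[task].format(series=series_to_text(values), **kwargs)
```

The resulting string would then be sent to whatever LLM is under evaluation; the benchmark design, not the prompt plumbing, is the hard part.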
However, current LLMs demonstrate limited proficiency in these tasks, performing marginally above random on etiological and question-answering tasks and showing modest improvements in context-aided forecasting.
Social understanding:
Using Large Language Models (LLMs) for time series analysis can significantly enhance social understanding by enabling agents to systematically analyze and predict societal trends and behaviors. LLM-based agents utilize real-world time series data from various domains such as finance, economics, polls, and search trends to approximate the hidden world state of society. This approximation aids in the formulation and validation of hypotheses about societal behaviors by correlating time-series data with other information sources like news and social media.
By including these diverse data streams, LLMs can provide deep insights into multi-faceted and dynamic societal issues, facilitating complex and hybrid reasoning that holds both logical and numerical analyses.
Moreover, the hyper portfolio task within SocioDojo allows these agents to make investment decisions based on their understanding of societal dynamics, which serves as a proxy to measure their social comprehension and decision-making capabilities.
This method ensures that the agents are not merely performing historical data fitting but are actively engaging with and adapting to the continuous flow of real-world data, making their analyses and predictions relevant and applicable in real-life scenarios.
Pretty mind-blowing, ain't it?
However, these new models do not use the natural reasoning abilities of pretrained LMs when it comes to time series.
Do LLMs really help time series tasks?
A new study just showed that if we replace the LLM with simple attention layers, performance does not change dramatically. If the LLM is removed entirely, performance actually gets better, and this also speeds up both training and inference by up to three orders of magnitude.
The researchers chose three ablations: deleting the LLM component or replacing it. The three modifications are as follows:
- W/O LLM (Figure 1 (b)). We remove the language model entirely, instead passing the input tokens directly to the reference methodβs final layer.
- LLM2Attn (Figure 1 (c)). We replace the language model with a single randomly-initialized multi-head attention layer.
- LLM2Trsf (Figure 1 (d)). We replace the language model with a single randomly-initialized transformer block.
The leftmost one (Figure 1 (a)) is the model with the LLM, which serves as the baseline here.
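The three ablations above can be sketched as swapping out the middle component of a forecasting pipeline. This is a toy NumPy illustration under my own simplifying assumptions (a mean-pooling "head", a single randomly initialized attention layer standing in for LLM2Attn), not the paper's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def single_head_attention(x, d):
    # Randomly initialized single-head self-attention (the LLM2Attn ablation).
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = softmax(q @ k.T / np.sqrt(d))
    return scores @ v

def forecast(x, body):
    # `body` is the swappable middle component; the mean is a stand-in head.
    h = body(x)              # (tokens, d)
    return h.mean(axis=0)    # toy final forecasting layer

tokens = rng.standard_normal((8, 16))  # 8 input tokens, embedding dim 16

y_wo_llm   = forecast(tokens, body=lambda h: h)                             # "w/o LLM"
y_llm2attn = forecast(tokens, body=lambda h: single_head_attention(h, 16))  # "LLM2Attn"
```

The point of the design is that only the middle block changes between variants, so any performance gap can be attributed to the LLM itself.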
Datasets
The datasets are mainly the benchmark datasets in all other time series research: ETT, Illness, Weather, Traffic, Electricity, Exchange Rate, Covid Deaths, Taxi (30 min), NN5 (Daily) and FRED-MD.
Results
As you can see, the ablations are superior to Time-LLM in all cases, to LLaTA in 22 out of 26 cases, and to OneFitsAll in 19 out of 26 cases. The metrics used here are MAE and MSE, mean absolute error and mean squared error, respectively.
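Both metrics are straightforward to compute; here is a minimal pure-Python sketch:

```python
def mae(y_true, y_pred):
    # Mean absolute error: average of |actual - predicted|.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    # Mean squared error: average of (actual - predicted)^2.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

Lower is better for both; MSE penalizes large errors more heavily than MAE because of the squaring.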
It can be concluded that LLMs don't improve the performance on time series forecasting tasks in a meaningful way.
Now let's take a look at the computation:
Time-LLM, OneFitsAll, and LLaTA take, on average, 28.2, 2.3, and 1.2 times longer than the modified models. In other words, the computational cost of using LLMs for time series isn't worth the trade-off.
Now, let's look at whether pretraining on language datasets results in better time series forecasting.
The researchers tested four different combinations: Pretrain + Finetune, Random Initialization + Finetune, Pretrain + No Finetuning, and Random Initialization + No Finetuning.
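The four configurations are simply the cross product of two choices, which can be enumerated like this (variable names are my own, not the paper's):

```python
from itertools import product

init_options = ["pretrained", "random_init"]     # weight initialization
train_options = ["finetune", "no_finetune"]      # downstream training

# All four experimental configurations from the study.
configs = [f"{init} + {train}" for init, train in product(init_options, train_options)]
```

Comparing forecasting accuracy across these four cells isolates what the language pretraining itself contributes.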
As you can see, language knowledge offers very limited improvement for forecasting. Notably, "Pretrain + No Finetuning" performed the best 5 times, while the baseline "Random Initialization + No Finetuning" never did, suggesting that language knowledge does not help during the finetuning process.
In this experiment, three types of shuffling are used: shuffling the entire sequence randomly ("sf-all"), shuffling only the first half of the sequence ("sf-half"), and swapping the first and second halves of the sequence ("ex-half").
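The three shuffling schemes are easy to state in code; a minimal sketch (the function names mirror the paper's labels, the implementation details are my own):

```python
import random

def sf_all(seq, seed=0):
    # Shuffle the entire input sequence.
    out = list(seq)
    random.Random(seed).shuffle(out)
    return out

def sf_half(seq, seed=0):
    # Shuffle only the first half; keep the second half intact.
    half = len(seq) // 2
    head = list(seq[:half])
    random.Random(seed).shuffle(head)
    return head + list(seq[half:])

def ex_half(seq):
    # Swap the first and second halves of the sequence.
    half = len(seq) // 2
    return list(seq[half:]) + list(seq[:half])
```

If a model truly exploits temporal order, its accuracy should degrade under these perturbations; comparing that degradation between the LLM models and their ablations is the point of the experiment.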
As the results show, LLM-based models are no more vulnerable to input shuffling than their ablations.
Conclusion
This research showed that it's better to stick with the traditional methods time series forecasting is used to, instead of trying to force Large Language Models into time series tasks.
That doesn't mean we should do nothing; there are new directions that could be interesting to pursue at the intersection of Time Series and Large Language Models.
I'm working on a new search engine, nouswise; I'd love for you to check it out and let me know your thoughts. You can contact me through any social media, the X platform or LinkedIn.