Google Just Published a Decoder-Only Model for Time-Series Forecasting With Zero-Shot Learning!!

Last Updated on February 20, 2024 by Editorial Team

Author(s): Saleha Shaikh

Originally published on Towards AI.

Time series forecasting drives most of the business today, and at the moment, it has been predicted using conventional ML/NN models, but Google just published a TimesFM model, which can make time series predictions with zero-shot learning!!

Time series prediction holds importance for businesses as it enables them to make decisions by utilizing past information and future projections. By analyzing data, companies can enhance their planning, optimize resource allocation, and mitigate issues.

In the finance sector, it helps in making investment decisions and avoiding challenges. Supply chain management benefits from predictions in determining inventory levels to minimize costs. In terms of marketing and sales, it assists in identifying the timing for promotions to attract customers and maximize profits. Ultimately time series prediction empowers businesses to make choices. Maintain competitiveness in an ever-evolving world.

Time Series Prediction

Time series prediction is very important for businesses as it enables them to make decisions by leveraging historical data and future projections. By analyzing trends, companies can enhance their planning, optimize resource allocation, and mitigate challenges.

In the finance sector, it empowers investors to make choices and steer clear of pitfalls. In supply chain management it helps in maintaining inventory levels to minimize costs. In marketing and sales, it assists in determining the timing for promotions to attract customers and boost revenue. Overall, time series prediction serves as a tool for businesses to enhance decision-making capabilities and remain competitive in an evolving world.

Conventional way of time series prediction

At present, for time series prediction, there are different approaches that are used in industry and academic research. The method chosen usually depends on the characteristics of the data and the specific prediction task at hand. In today's market, there are cutting-edge models that are widely used. Let’s revise some state-of-the-art models:

1. ARIMA (AutoRegressive Integrated Moving Average)

ARIMA is a classical statistical model. It combines autoregressive (AR) and moving average (MA). It yields good performance for time series data with a strong seasonality or trend.

2. LSTM (Long Short-Term Memory) Networks

LSTM is a type of recurrent neural network (RNN). LSTM is a deep learning model designed to capture long-term dependencies in sequential data.

3. Prophet

FBProphet was developed by Facebook. Prophet is a forecasting tool designed for time series data that exhibits patterns on different time scales. It handles missing data and outliers well.

4. XGBoost (Extreme Gradient Boosting)

XGBoost is an ensemble learning algorithm. It used the predictions from multiple weak models and combined the results to generate the final prediction. It is very good in finding complex relationships in the time series data.

It depends on which models to use based on the type of data. However, it is worth noticing that while using these models, a considerable amount of time is spent on finalizing these models. To solve this problem and to make use of recent advancements in LLMs (generative AI), Google has launched a new model, “TimesFM,” which is a decoder-only foundation model that can perform time series prediction with zero-shot learning!!!

Google TimesFM: A decoder-only foundation model for time-series forecasting

TimesFM is designed as a pre-trained forecasting model with just 200 million parameters, leveraging a large time-series corpus of 100 billion real-world time-points.

Unlike large language models (LLMs) that are typically trained in a decoder-only fashion for natural language processing, TimesFM adapts the decoder-only architecture to time-series forecasting. The model employs stacked transformer layers treating contiguous time-points as tokens. A unique feature is the use of a multilayer perceptron block with residual connections to convert time-series patches into tokens. During training, the model learns to forecast subsequent time-points, with the output patch length potentially exceeding the input patch length. This architecture allows for efficient forecasting with fewer generation steps during inference.

To enhance the model’s performance, a combination of synthetic and real-world data is used for pretraining. Synthetic data aids in grasping fundamental temporal patterns, while a curated corpus of 100 billion time-points, including data from Google Trends and Wikipedia Pageviews, provides real-world flavor, helping the model generalize across diverse domains.

Model architecture

Transformers has shown promising results in adapting to a different context length. Model breaks down the time-series into several patches during training.A patch of a time series is a natural analog for a token in language models and has been shown to improve performance. Moreover, this improves inference speed as the number of tokens being fed into the transformer is reduced by a factor of the patch length.

TimesFM is a decoder-only model, meaning that for a given sequence of input patches, the model predicts the next patch as a function of all past patches. This model has a preprocessing unit in the form of input layers. It converts time series into input tokens. First, break the input into contiguous non-overlapping patches. Then each patch is processed by a Residual Block into a vector of a model dimension size. A binary padding mask is supplied along with input layer. The Residual Block is essentially a Multi-layer Perceptron (MLP) block with one hidden layer with a skip connection.

This model has a stack of transformer layers which are placed on top of each other. Each of these layers has the standard multi-head self-attention (SA) followed by a feedforward network (FFN). The model uses causal attention. That is, each output token can only attend to input tokens that come before it in the sequence (including the corresponding input token). This can be described by the equation:

o_j = StackedTransformer((t_1, m˙ 1), · · · ,(tj , m˙ j )), ∀j ∈ [N]

TimesFM model architecture: Image taken from Google AI’s research blog. Source: https://blog.research.google/2024/02/a-decoder-only-foundation-model-for.html

Performance

To measure the prediction of model, MSE (mean squared error) metric is used. It quantifies the closeness between actual and predicted values.

TimesFM is evaluated through zero-shot testing on various time-series benchmarks, showcasing superior performance compared to statistical methods like ARIMA and DL models like DeepAR and PatchTST. The model’s effectiveness is demonstrated on the Monash Forecasting Archive and popular benchmarks for long-horizon forecasting, outperforming both supervised and zero-shot approaches.

Conclusion

The research presents a promising decoder-only foundation model for time-series forecasting, exhibiting impressive zero-shot performance across different domains and granularities.

The proposed TimesFM architecture, despite its smaller size, proves to be a powerful tool for accurate and efficient forecasting in diverse applications. The work represents a collaboration across Google Research and Google Cloud, involving several individuals contributing to the development and evaluation of TimesFM.

In my thoughts, this is a promising approach to minimize development time and increase model performance for time series predictions.

Frequently Used, Contextual References

Resources

Publication

Google Just Published a Decoder-Only Model for Time-Series Forecasting With Zero-Shot Learning!!

Author(s): Saleha Shaikh

Time Series Prediction

Conventional way of time series prediction

1. ARIMA (AutoRegressive Integrated Moving Average)

2. LSTM (Long Short-Term Memory) Networks

3. Prophet

4. XGBoost (Extreme Gradient Boosting)

Google TimesFM: A decoder-only foundation model for time-series forecasting

Model architecture

Performance

Conclusion

Related reads and important information

Feedback ↓ Cancel reply

Popular posts

Best Laptops for Deep Learning, Machine Learning (ML), and Data Science for 2023

Best Workstations for Deep Learning, Data Science, and Machine Learning (ML) for 2022

Descriptive Statistics for Data-driven Decision Making with Python

Best Machine Learning (ML) Books - Free and Paid - Editorial Recommendations for 2022

Best Data Science Books - Free and Paid - Editorial Recommendations for 2022

Updates

Recent Posts

I Used ChatGPT to Count My Calories

Resource-Efficient Fine-Tuning of DeepSeek-R1

TAI #138: OpenAI’s o3-Mini and Deep Research: A New Era of Reasoning Powered Agents?

Text Preprocessing for NLP: A Step-by-Step Guide to Clean Raw Text Data

DeepSeek AI — The Future is Here

The World’s Leading AI and Technology Publication.

Company

CONTACT US

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Frequently Used, Contextual References

Resources

Publication

Google Just Published a Decoder-Only Model for Time-Series Forecasting With Zero-Shot Learning!!

Author(s): Saleha Shaikh

Time Series Prediction

Conventional way of time series prediction

1. ARIMA (AutoRegressive Integrated Moving Average)

2. LSTM (Long Short-Term Memory) Networks

3. Prophet

4. XGBoost (Extreme Gradient Boosting)

Google TimesFM: A decoder-only foundation model for time-series forecasting

Model architecture

Performance

Conclusion

Related reads and important information

Related posts

Feedback ↓ Cancel reply

Popular posts

Updates

Recent Posts

The World’s Leading AI and Technology Publication.

Company

CONTACT US

GDPR CCPA Statement