
arxiv: 2503.24007 · v4 · pith:R2Q3QVVTnew · submitted 2025-03-31 · 💻 cs.LG · cs.AI

CITRAS: Covariate-Informed Transformer for Time Series Forecasting

classification 💻 cs.LG cs.AI
keywords: covariates, dependencies, forecasting, known, variables, attention, citras, cross-variate

In time series forecasting, covariates represent external factors that influence target variables. Some covariates are observable only in the past (observed covariates, such as recorded weather data), while others are known in advance (known covariates, such as calendar events or discount schedules). Although covariates have the potential to enhance forecasting performance, most deep learning-based forecasting models struggle to address the length discrepancy between variables caused by the future portion of known covariates and fail to leverage them flexibly. Moreover, capturing dependencies between target variables and covariates is non-trivial, as models must accurately reflect the local impact of covariates while simultaneously modeling global cross-variate dependencies. To address these challenges, we propose CITRAS, a decoder-only Transformer that flexibly integrates multiple target variables, observed covariates, and known covariates. While preserving strong autoregressive modeling capabilities, CITRAS introduces two novel mechanisms within patch-wise cross-variate attention: Key-Value (KV) Shift and Attention Score Smoothing. KV Shift seamlessly incorporates the future portion of known covariates into the forecasting process by aligning them with target variables based on their concurrent dependencies. Attention Score Smoothing refines locally accurate patch-wise cross-variate dependencies into global variate-level dependencies by smoothing the historical attention scores. Experimentally, CITRAS demonstrates strong performance across a wide range of real-world datasets in both covariate-informed and multivariate settings, showcasing its versatile ability to leverage cross-variate and cross-time dependencies for improved forecasting accuracy.
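The two mechanisms named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract gives no equations, so the one-patch shift, the exponential moving average used for smoothing, the shared key/value tensor, and all names (`shift_known_kv`, `smooth_scores`, `cross_variate_attention`, `alpha`) are assumptions chosen only to make the idea concrete.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def shift_known_kv(kv, is_known):
    """Assumed form of a key/value shift.

    kv:       (V, P + 1, D) patch embeddings per variate. Known covariates
              carry one extra valid future patch; for the other variates
              the last patch is padding and is never used.
    is_known: (V,) boolean mask marking known covariates.

    Returns (V, P, D). Known covariates contribute patches 1..P, so they
    line up with the step each target patch is predicting; all other
    variates contribute patches 0..P-1.
    """
    return np.where(is_known[:, None, None], kv[:, 1:], kv[:, :-1])


def smooth_scores(scores, alpha=0.9):
    """Assumed smoothing: causal exponential moving average over patches.

    scores: (P, V, V) raw cross-variate attention scores per patch.
    Each patch blends its own scores with the running history, so
    patch-local dependencies drift toward a variate-level pattern.
    """
    out = np.empty_like(scores)
    out[0] = scores[0]
    for p in range(1, scores.shape[0]):
        out[p] = alpha * out[p - 1] + (1.0 - alpha) * scores[p]
    return out


def cross_variate_attention(q, kv, is_known, alpha=0.9):
    """Patch-wise attention across variates with both assumed mechanisms.

    q:  (V, P, D) queries, one per variate and patch.
    kv: (V, P + 1, D) embeddings used as both keys and values
        (learned projections are omitted for brevity).
    """
    k = v = shift_known_kv(kv, is_known)
    d = q.shape[-1]
    # scores[p, i, j]: how much variate i attends to variate j at patch p.
    scores = np.einsum("vpd,upd->pvu", q, k) / np.sqrt(d)
    weights = softmax(smooth_scores(scores, alpha), axis=-1)
    out = np.einsum("pvu,upd->vpd", weights, v)
    return out, weights
```

With `alpha=0` the smoothing step is a no-op and the sketch reduces to plain patch-wise cross-variate attention; larger values weight the attention history more heavily.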



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. TiRex-2: Generalizing TiRex to Multivariate Data and Streaming

    cs.LG 2026-07 unverdicted novelty 6.0

    TiRex-2 is a recurrent xLSTM time series foundation model for multivariate forecasting with future covariates and constant-cost streaming; it reports state-of-the-art zero-shot results on GIFT-Eval and fev-bench.

  2. CITRAS-FM: Tiny Time Series Foundation Model for Covariate-Informed Zero-Shot Forecasting

    cs.LG 2026-06 unverdicted novelty 4.0

    CITRAS-FM is a 7M-parameter decoder-only Transformer time series foundation model (TSFM) with Shifted Attention and CovSynth synthetic covariate pretraining; it claims state-of-the-art zero-shot accuracy among sub-10M-parameter models on fev-bench, with sub-0.1s CPU inference.