A Time Series is Worth 64 Words: Long-term Forecasting with Transformers
25 Pith papers cite this work.
abstract
We propose an efficient design of Transformer-based models for multivariate time series forecasting and self-supervised representation learning. It is based on two key components: (i) segmentation of time series into subseries-level patches that serve as input tokens to the Transformer; (ii) channel-independence, where each channel contains a single univariate time series and all channels share the same embedding and Transformer weights. The patching design has a three-fold benefit: local semantic information is retained in the embedding; computation and memory usage of the attention maps are quadratically reduced for the same look-back window; and the model can attend to a longer history. Our channel-independent patch time series Transformer (PatchTST) significantly improves long-term forecasting accuracy compared with SOTA Transformer-based models. We also apply our model to self-supervised pre-training tasks and attain excellent fine-tuning performance, which outperforms supervised training on large datasets. Transferring the masked pre-trained representation from one dataset to others also produces SOTA forecasting accuracy. Code is available at: https://github.com/yuqinie98/PatchTST.
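A minimal sketch of the two components the abstract describes, assuming a PyTorch-style implementation; the module name `PatchEmbed`, the hyperparameter values, and the toy shapes are illustrative assumptions, not the authors' code (see the linked repository for that):

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Illustrative sketch of PatchTST-style patching + channel-independence."""

    def __init__(self, patch_len: int = 16, stride: int = 8, d_model: int = 128):
        super().__init__()
        self.patch_len, self.stride = patch_len, stride
        # One linear embedding shared by every channel of every series.
        self.proj = nn.Linear(patch_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_channels, look_back)
        b, m, l = x.shape
        # Channel-independence: fold channels into the batch dimension so each
        # channel is processed as its own univariate series with shared weights.
        x = x.reshape(b * m, l)
        # Segment into overlapping subseries-level patches:
        # (batch * channels, n_patches, patch_len)
        patches = x.unfold(-1, self.patch_len, self.stride)
        # Each patch becomes one Transformer input token.
        return self.proj(patches)  # (batch * channels, n_patches, d_model)

x = torch.randn(32, 7, 336)   # e.g. 7 channels, look-back window of 336 steps
tokens = PatchEmbed()(x)
print(tokens.shape)           # torch.Size([224, 41, 128])
```

With these (assumed) settings the Transformer attends over 41 tokens instead of 336 raw time steps, so the quadratic attention cost shrinks by roughly (336/41)^2 ≈ 67x, which is the quadratic reduction the abstract refers to; the official implementation differs in details such as boundary padding before unfolding.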
hub tools: citation-role summary, citation-polarity summary
roles: method (1)
polarities: use (1)
citing papers explorer
-
Empowering VLMs for Few-Shot Multimodal Time Series Classification via Tailored Agentic Reasoning
MarsTSC is a VLM-based agentic reasoning framework with a self-evolving knowledge bank and Generator-Reflector-Modifier roles; it outperforms baselines on few-shot multimodal time series classification across 12 benchmarks.
-
FactoryBench: Evaluating Industrial Machine Understanding
FactoryBench reveals that frontier LLMs score under 50% on structured causal questions and under 18% on decision-making over industrial robotic telemetry.
-
Hedging Memory Horizons for Non-Stationary Prediction via Online Aggregation
MELO aggregates base predictors and their multi-scale EWLS adaptations using MLpol, achieving oracle inequalities against the best fixed and time-varying predictors in non-stationary settings.
-
Does Synthetic Data Help? Empirical Evidence from Deep Learning Time Series Forecasters
Synthetic data augmentation helps channel-mixing time series models but degrades channel-independent ones, with reliable gains only from seasonal-trend generators and gradual schedules in low-resource settings.
-
Discrete Prototypical Memories for Federated Time Series Foundation Models
FeDPM learns and aligns local discrete prototypical memories across domains to create a unified discrete latent space for LLM-based time series foundation models in a federated setting.
-
Self-Supervised Foundation Model for Calcium-imaging Population Dynamics
CalM uses a discrete tokenizer and a dual-axis autoregressive transformer, pretrained self-supervised on calcium traces, to outperform specialized baselines on population-dynamics forecasting and to transfer to behavior decoding with superior performance.
-
What If We Let Forecasting Forget? A Sparse Bottleneck for Cross-Variable Dependencies
MS-FLOW uses a capacity-limited sparse routing mechanism to model only critical inter-variable dependencies in time series data, achieving state-of-the-art accuracy on 12 benchmarks with fewer but more reliable connections.
-
Exploring the Potential of Probabilistic Transformer for Time Series Modeling: A Report on the ST-PT Framework
ST-PT turns transformers into explicit factor graphs for time series, enabling structural injection of symbolic priors, per-sample conditional generation, and principled latent autoregressive forecasting via MFVI iterations.
-
CAARL: In-Context Learning for Interpretable Co-Evolving Time Series Forecasting
CAARL decomposes co-evolving time series into autoregressive segments, builds a temporal dependency graph, serializes it into a narrative, and uses LLMs for interpretable forecasting via chain-of-thought reasoning.
-
M3R: Localized Rainfall Nowcasting with Meteorology-Informed MultiModal Attention
M3R improves localized rainfall nowcasting by using weather station time series as queries in multimodal attention to selectively extract precipitation patterns from radar imagery.
-
A General Framework for Generative Self-supervised Learning in Non-invasive Estimation of Physiological Parameters Using Photoplethysmography
TS2TC combines cross-temporal fusion generative anchor pretraining with dual-process transfer to achieve 2.49% lower RMSE than prior methods on PPG parameter estimation using only 10% labeled data.
-
A Benchmark of Classical and Deep Learning Models for Agricultural Commodity Price Forecasting on A Novel Bangladeshi Market Price Dataset
The AgriPriceBD dataset of 1,779 daily prices is released; naive persistence outperforms deep models such as Informer and Time2Vec-Transformer on heterogeneous Bangladeshi commodity series, with statistical validation.
-
Titans: Learning to Memorize at Test Time
Titans combine attention for current context with a learnable neural memory for long-term history, achieving better performance and scaling to over 2M-token contexts on language, reasoning, genomics, and time-series tasks.
-
Beyond Similarity: Temporal Operator Attention for Time Series Analysis
Temporal Operator Attention augments softmax attention with learnable sequence-space operators for signed temporal mixing and uses stochastic regularization to enable practical training, yielding consistent gains on time series benchmarks.
-
Mela: Test-Time Memory Consolidation based on Transformation Hypothesis
Mela is a Transformer variant with a dual-frequency Hierarchical Memory Module and MemStack that performs test-time memory consolidation, outperforming baselines on long contexts.
-
Risk-Aware Safe Throughput Forecasting for Starlink Networks
BG-CFQS provides risk-aware quantile-based forecasting for Starlink throughput that meets overestimation budgets and reduces positive errors compared to other feasible methods.
-
TSNN: A Non-parametric and Interpretable Framework for Traffic Time Series Forecasting
TSNN matches time series entries to a training-derived memory bank to forecast traffic without any trainable parameters and achieves competitive accuracy on four real-world datasets.
-
Learning Fingerprints for Medical Time Series with Redundancy-Constrained Information Maximization
A self-supervised method learns a fixed set of disentangled fingerprint tokens from medical time series by combining reconstruction loss with a total coding rate diversity penalty, framed as a disentangled rate-distortion problem.
-
MedMamba: Recasting Mamba for Medical Time Series Classification
MedMamba introduces a principle-guided bidirectional multi-scale Mamba model that outperforms prior methods on EEG, ECG, and activity classification benchmarks while delivering 4.6x inference speedup.
-
Foundation Models Defining A New Era In Sensor-based Human Activity Recognition: A Survey And Outlook
The survey organizes foundation models for sensor-based HAR into a lifecycle taxonomy and identifies three trajectories: HAR-specific models from scratch, adaptation of general time-series models, and integration with large language models.
-
A Foundation Model for Instruction-Conditioned In-Context Time Series Tasks
iAmTime is a hierarchical transformer-based time series foundation model that uses semantic tokens and instruction-conditioned prompts to infer tasks from demonstrations, achieving improved zero-shot performance on forecasting benchmarks.
-
Forecasting Green Skill Demand in the Automotive Industry: Evidence from Online Job Postings
A dataset of 204k skill mentions from Mexican automotive job postings yields 274 green skills; their demand is best forecast by transformer models such as FEDformer, with current demand concentrated in operational sustainability and the fastest growth in renewables, recycling, and hydrogen.
-
Interpretable Physics-Informed Load Forecasting for U.S. Grid Resilience: SHAP-Guided Ensemble Validation in Hybrid Deep Learning Under Extreme Weather
A hybrid deep learning model with physics regularization and SHAP analysis achieves 1.18% MAPE on ERCOT load data and up to 40.5% better performance on extreme events than its individual branches.
-
Empirical Assessment of Time-Series Foundation Models For Power System Forecasting Applications
The paper benchmarks foundation models like TimesFM and Chronos against baselines on eight forecasting capabilities for power system time series.
-
The CTLNet for Shanghai Composite Index Prediction
The CTLNet hybrid model outperforms the listed baselines on the Shanghai Composite Index prediction task.