A Time Series is Worth 64 Words: Long-term Forecasting with Transformers
25 Pith papers cite this work.
abstract
We propose an efficient design of Transformer-based models for multivariate time series forecasting and self-supervised representation learning. It is based on two key components: (i) segmentation of time series into subseries-level patches that serve as input tokens to the Transformer; (ii) channel-independence, where each channel contains a single univariate time series and all channels share the same embedding and Transformer weights. The patching design has a three-fold benefit: local semantic information is retained in the embedding; computation and memory usage of the attention maps are quadratically reduced for the same look-back window; and the model can attend to a longer history. Our channel-independent patch time series Transformer (PatchTST) significantly improves long-term forecasting accuracy compared with SOTA Transformer-based models. We also apply our model to self-supervised pre-training tasks and attain excellent fine-tuning performance, which outperforms supervised training on large datasets. Transferring the masked pre-trained representation from one dataset to others also produces SOTA forecasting accuracy. Code is available at: https://github.com/yuqinie98/PatchTST.
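A minimal sketch of the two components the abstract describes, assuming a PyTorch-style implementation; the module name `PatchEmbed`, the hyperparameter values, and the toy shapes are illustrative assumptions, not the authors' code (see the linked repository for that):

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Illustrative sketch of PatchTST-style patching + channel-independence."""

    def __init__(self, patch_len: int = 16, stride: int = 8, d_model: int = 128):
        super().__init__()
        self.patch_len, self.stride = patch_len, stride
        # One linear embedding shared by every channel of every series.
        self.proj = nn.Linear(patch_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_channels, look_back)
        b, m, l = x.shape
        # Channel-independence: fold channels into the batch dimension so each
        # channel is processed as its own univariate series with shared weights.
        x = x.reshape(b * m, l)
        # Segment into overlapping subseries-level patches:
        # (batch * channels, n_patches, patch_len)
        patches = x.unfold(-1, self.patch_len, self.stride)
        # Each patch becomes one Transformer input token.
        return self.proj(patches)  # (batch * channels, n_patches, d_model)

x = torch.randn(32, 7, 336)   # e.g. 7 channels, look-back window of 336 steps
tokens = PatchEmbed()(x)
print(tokens.shape)           # torch.Size([224, 41, 128])
```

With these (assumed) settings the Transformer attends over 41 tokens instead of 336 raw time steps, so the quadratic attention cost shrinks by roughly (336/41)^2 ≈ 67x, which is the quadratic reduction the abstract refers to; the official implementation differs in details such as boundary padding before unfolding.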
hub tools: citation-role summary, citation-polarity summary
roles: method (1)
polarities: use (1)
citing papers explorer
-
Empowering VLMs for Few-Shot Multimodal Time Series Classification via Tailored Agentic Reasoning
MarsTSC is a VLM-based agentic reasoning framework with a self-evolving knowledge bank and Generator-Reflector-Modifier roles; it outperforms baselines on few-shot multimodal time series classification across 12 benchmarks.
-
FactoryBench: Evaluating Industrial Machine Understanding
FactoryBench reveals that frontier LLMs score under 50% on structured causal questions and under 18% on decision-making over industrial robotic telemetry.
-
Hedging Memory Horizons for Non-Stationary Prediction via Online Aggregation
MELO aggregates base predictors and their multi-scale EWLS adaptations using MLpol, achieving oracle inequalities against the best fixed and time-varying predictors in non-stationary settings.
-
Does Synthetic Data Help? Empirical Evidence from Deep Learning Time Series Forecasters
Synthetic data augmentation helps channel-mixing time series models but degrades channel-independent ones, with reliable gains only from seasonal-trend generators and gradual schedules in low-resource settings.
-
Discrete Prototypical Memories for Federated Time Series Foundation Models
FeDPM learns and aligns local discrete prototypical memories across domains to create a unified discrete latent space for LLM-based time series foundation models in a federated setting.
-
Self-Supervised Foundation Model for Calcium-imaging Population Dynamics
CalM uses a discrete tokenizer and a dual-axis autoregressive transformer, pretrained self-supervised on calcium traces, to outperform specialized baselines on population-dynamics forecasting and to transfer to behavior decoding with superior performance.
-
What If We Let Forecasting Forget? A Sparse Bottleneck for Cross-Variable Dependencies
MS-FLOW uses a capacity-limited sparse routing mechanism to model only critical inter-variable dependencies in time series data, achieving state-of-the-art accuracy on 12 benchmarks with fewer but more reliable connections.
-
Exploring the Potential of Probabilistic Transformer for Time Series Modeling: A Report on the ST-PT Framework
ST-PT turns transformers into explicit factor graphs for time series, enabling structural injection of symbolic priors, per-sample conditional generation, and principled latent autoregressive forecasting via MFVI iterations.
-
CAARL: In-Context Learning for Interpretable Co-Evolving Time Series Forecasting
CAARL decomposes co-evolving time series into autoregressive segments, builds a temporal dependency graph, serializes it into a narrative, and uses LLMs for interpretable forecasting via chain-of-thought reasoning.
-
M3R: Localized Rainfall Nowcasting with Meteorology-Informed MultiModal Attention
M3R improves localized rainfall nowcasting by using weather station time series as queries in multimodal attention to selectively extract precipitation patterns from radar imagery.
-
A General Framework for Generative Self-supervised Learning in Non-invasive Estimation of Physiological Parameters Using Photoplethysmography
TS2TC combines cross-temporal fusion generative anchor pretraining with dual-process transfer to achieve 2.49% lower RMSE than prior methods on PPG parameter estimation using only 10% labeled data.
-
A Benchmark of Classical and Deep Learning Models for Agricultural Commodity Price Forecasting on A Novel Bangladeshi Market Price Dataset
The AgriPriceBD dataset of 1,779 daily prices is released; naive persistence outperforms deep models such as Informer and Time2Vec-Transformer on heterogeneous Bangladeshi commodity series, with statistical validation.
-
Titans: Learning to Memorize at Test Time
Titans combine attention for current context with a learnable neural memory for long-term history, achieving better performance and scaling to over 2M-token contexts on language, reasoning, genomics, and time-series tasks.
-
Beyond Similarity: Temporal Operator Attention for Time Series Analysis
Temporal Operator Attention augments softmax attention with learnable sequence-space operators for signed temporal mixing and uses stochastic regularization to enable practical training, yielding consistent gains on time series benchmarks.
-
Mela: Test-Time Memory Consolidation based on Transformation Hypothesis
Mela is a Transformer variant with a dual-frequency Hierarchical Memory Module and MemStack that performs test-time memory consolidation, outperforming baselines on long contexts.
-
Risk-Aware Safe Throughput Forecasting for Starlink Networks
BG-CFQS provides risk-aware quantile-based forecasting for Starlink throughput that meets overestimation budgets and reduces positive errors compared to other feasible methods.
-
TSNN: A Non-parametric and Interpretable Framework for Traffic Time Series Forecasting
TSNN matches time series entries to a training-derived memory bank to forecast traffic without any trainable parameters and achieves competitive accuracy on four real-world datasets.
-
Learning Fingerprints for Medical Time Series with Redundancy-Constrained Information Maximization
A self-supervised method learns a fixed set of disentangled fingerprint tokens from medical time series by combining reconstruction loss with a total coding rate diversity penalty, framed as a disentangled rate-distortion problem.
-
MedMamba: Recasting Mamba for Medical Time Series Classification
MedMamba introduces a principle-guided bidirectional multi-scale Mamba model that outperforms prior methods on EEG, ECG, and activity classification benchmarks while delivering 4.6x inference speedup.
-
Foundation Models Defining A New Era In Sensor-based Human Activity Recognition: A Survey And Outlook
The survey organizes foundation models for sensor-based HAR into a lifecycle taxonomy and identifies three trajectories: HAR-specific models from scratch, adaptation of general time-series models, and integration with large language models.
-
A Foundation Model for Instruction-Conditioned In-Context Time Series Tasks
iAmTime is a hierarchical transformer-based time series foundation model that uses semantic tokens and instruction-conditioned prompts to infer tasks from demonstrations, achieving improved zero-shot performance on forecasting benchmarks.
-
Forecasting Green Skill Demand in the Automotive Industry: Evidence from Online Job Postings
A dataset of 204k skill mentions from Mexican automotive job postings yields 274 green skills; their demand is best forecast by transformer models such as FEDformer, with current demand concentrated in operational sustainability and the fastest growth in renewables, recycling, and hydrogen.
-
Interpretable Physics-Informed Load Forecasting for U.S. Grid Resilience: SHAP-Guided Ensemble Validation in Hybrid Deep Learning Under Extreme Weather
A hybrid deep learning model with physics regularization and SHAP analysis achieves 1.18% MAPE on ERCOT load data and up to 40.5% better performance on extreme events than its individual branches.
-
Empirical Assessment of Time-Series Foundation Models For Power System Forecasting Applications
The paper benchmarks foundation models like TimesFM and Chronos against baselines on eight forecasting capabilities for power system time series.
-
The CTLNet for Shanghai Composite Index Prediction
The CTLNet hybrid model outperforms the listed baselines on the Shanghai Composite Index prediction task.