The UEA multivariate time series classification archive
Baseline reference. 50% of citing Pith papers use this work as a benchmark or comparison.
abstract
In 2002, the UCR time series classification archive was first released with sixteen datasets. It gradually expanded, until 2015 when it increased in size from 45 datasets to 85 datasets. In October 2018 more datasets were added, bringing the total to 128. The new archive contains a wide range of problems, including variable length series, but it still only contains univariate time series classification problems. One of the motivations for introducing the archive was to encourage researchers to perform a more rigorous evaluation of newly proposed time series classification (TSC) algorithms. It has worked: most recent research into TSC uses all 85 datasets to evaluate algorithmic advances. Research into multivariate time series classification, where more than one series are associated with each class label, is in a position where univariate TSC research was a decade ago. Algorithms are evaluated using very few datasets and claims of improvement are not based on statistical comparisons. We aim to address this problem by forming the first iteration of the MTSC archive, to be hosted at the website www.timeseriesclassification.com. Like the univariate archive, this formulation was a collaborative effort between researchers at the University of East Anglia (UEA) and the University of California, Riverside (UCR). The 2018 vintage consists of 30 datasets with a wide range of cases, dimensions and series lengths. For this first iteration of the archive we format all data to be of equal length, include no series with missing data and provide train/test splits.
citing papers explorer
- Beyond IID: How General Are Tabular Foundation Models, Really?
Tabular foundation models excel on tiny- to medium-sized IID data but are outperformed by traditional tree-based and deep learning models on non-IID, large, and high-dimensional datasets, based on evaluations across 11 models and 142 datasets in the new BeyondArena benchmark.
- A Causal DAG Prior for Synthetic Time-Series Classification Datasets
A causal DAG prior synthesizes labeled multivariate TSC datasets with temporal cross-modal structure, yielding statistically significant gains when finetuning TabPFN v2.5 on 75 UCR/UEA datasets over unmodified and tabular-only baselines.
- TimeSage-MT: A Multi-Turn Benchmark for Evaluating Agentic Time Series Reasoning
TimeSage-MT introduces a multi-turn benchmark for agentic time series reasoning and shows frontier LLMs drop sharply on decision-oriented tasks due to memory and uncertainty failures.
- Prototype-Guided Classification Sub-Task Decoupling Framework: Enhancing Generalization and Interpretability for Multivariate Time Series
PDFTime reformulates multivariate time series classification as a multi-stage prototype-based decision process, claiming SOTA results on UCR and UEA benchmarks.
- INSHAPE: Instance-Level Shapelets for Interpretable Time-Series Classification
INSHAPE discovers instance-specific non-overlapping shapelets, models their temporal dependencies, and aggregates them bottom-up into population-level prototypes for improved accuracy and interpretability in time-series classification.
- Looped SSMs: Depth-Recurrence and Input Reshaping for Time Series Classification
Looped SSMs with shared parameters across depth match or exceed standard SSMs with more parameters on time series classification, with additional gains from input reshaping techniques.
- Deep Time Series Models: A Comprehensive Survey and Benchmark
This survey and benchmark of deep time series models using the released TSLib library finds that models with specific structures perform well only on distinct analysis tasks.
- HIVE-COTE 2.0: a new meta ensemble for time series classification
HIVE-COTE 2.0 improves accuracy on time series classification benchmarks by replacing prior ensemble members with TDE and DrCIF and adding an Arsenal of ROCKET classifiers.
- Generative Diffusion Prior Distillation for Long-Context Knowledge Transfer
GDPD treats partial student features as degraded observations and uses a learned diffusion prior over teacher features to sample restorative long-context targets for improved partial time-series classification.
- TIDES: Implicit Time-Awareness in Selective State Space Models
TIDES reconciles selective SSM expressivity with continuous-time physical discretization by moving input dependence onto the state matrix, enabling native irregular time series handling and achieving SOTA on UEA and Physiome-ODE benchmarks.
- RamanBench: A Large-Scale Benchmark for Machine Learning on Raman Spectroscopy
RamanBench unifies 74 datasets into the first large-scale reproducible benchmark for ML on Raman spectra, finding tabular foundation models outperform baselines but no method generalizes across datasets.
- Soft-MSM: Differentiable Context-Aware Elastic Alignment for Time Series
Soft-MSM is a smooth, gradient-enabled version of the context-aware MSM distance for time series alignment that outperforms Soft-DTW alternatives in clustering and nearest-centroid classification.
- Scalable Memristive-Friendly Reservoir Computing for Time Series Classification
MARS parallel reservoirs achieve up to 21x training speedups and outperform LRU, S5, and Mamba on long sequence benchmarks while remaining gradient-free and compact.
- Learning by Shifting: Temporal View Construction for Time Series Contrastive Learning
ShiFT uses deterministic temporal shifts to enforce shift invariance in contrastive learning, achieving state-of-the-art time series classification on six benchmarks plus UCR/UEA archives while cutting training time.
- RocketPFN: Accurate Time Series Classification via In-Context Learning
RocketPFN matches the accuracy of the strongest time series classifier HC2 on 92 UCR datasets using a training-free pipeline of Rocket features and TabPFN.
- Flash PD-SSM: Memory-Optimized Structured Sparse State-Space Models
Flash PD-SSM achieves FSA-level expressivity by discretely selecting one matrix from a trainable set of structured sparse transition matrices at each time step while preserving the runtime and memory efficiency of standard structured SSMs.
- Empowering VLMs for Few-Shot Multimodal Time Series Classification via Tailored Agentic Reasoning
MarsTSC is a VLM agentic system with generator, reflector, and modifier roles that iteratively refines a knowledge bank to improve few-shot multimodal time series classification and produce human-readable explanations.
- AegisTS: A Hierarchical Agent System with Reinforcement Learning for Multivariate Time Series Data Cleaning
AegisTS is a hierarchical agent system with RL that jointly optimizes cleaning order and method selection for MTS quality issues using a dual-stage reward based on cleaning and downstream performance.
- DeMa: Dual-Path Delay-Aware Mamba for Efficient Multivariate Time Series Analysis
DeMa is a dual-path delay-aware Mamba architecture that decomposes MTS into intra-series temporal and inter-series variate paths to achieve SOTA performance with linear complexity on forecasting, imputation, anomaly detection, and classification.
- DyWPE: Signal-Aware Dynamic Wavelet Positional Encoding for Time Series Transformers
DyWPE generates positional embeddings for time series transformers from the input signal via Discrete Wavelet Transform and outperforms standard positional encodings on ten datasets, especially longer sequences and biomedical signals.
- Rotary Masked Autoencoders are Versatile Learners
RoMAE applies rotary positional embeddings to masked autoencoders to enable representation learning and interpolation on continuous positional data across irregular time-series, images, and audio without modality-specific modifications.
- Multi-Stage Prototype Learning for Interpretable Time Series Classification
A multi-stage prototype learning model for multivariate time series classification that matches deep learning accuracy while supplying explicit hierarchical prototype explanations of single- and cross-variable patterns.
- An explicit operator explains end-to-end computation in the modern neural networks used for sequence and language modeling
S4D state space models correspond exactly to wave propagation and nonlinear wave interactions in a one-dimensional ring oscillator network, with a closed-form operator describing the complete input-output map.
- ROMAN: A Multiscale Routing Operator for Convolutional Time Series Models
ROMAN converts time series into a shorter multiscale channel representation that lets standard CNN classifiers access scale and coarse-position information explicitly.
- Efficient Time Series Clustering from Multiscale Reservoir Dynamics with Granular-Ball Anchoring Graph Optimization
MSRGC-Net combines multiscale reservoir computing, granular-ball anchoring graphs, and consensus optimization to deliver efficient time-series clustering that outperforms prior methods on standard benchmarks.
- AnchorMoE: Interpretable Time Series Classification via Anchor-Routed MoE
AnchorMoE applies an anchor-routed MoE with geometric orthogonality and an uncertainty gate to deliver ante-hoc interpretable MTSC via additive decomposition over input segments.
- MambaSL: Exploring Single-Layer Mamba for Time Series Classification
A single-layer Mamba variant with targeted redesigns sets new state-of-the-art average performance on all 30 UEA time series classification datasets under a unified reproducible protocol.
- Adversarial Robustness of Time-Series Classification for Crystal Collimator Alignment
A CNN for LHC beam-loss time-series classification gains up to 18.6% higher robust accuracy via a differentiable preprocessing wrapper and adversarial fine-tuning, with extension to sequence-level temporal robustness.
- Quantum Dynamic Time Warping for Multivariate Time Series Classification
A hybrid qDTW architecture with a Unified Pre-Embedding Adjoint Ansatz and identified spatial-temporal tradeoff outperforms classical DTW baselines on multivariate series up to 8 dimensions.
- Modality vs. Morphology: A Framework for Time Series Classification for Biological Signals
A review synthesizes evidence from EEG, EMG, ECG, PPG and ocular signals to argue that waveform morphology, rather than modality or model class, primarily determines TSC performance and interpretability.
- A Unified Framework for Modeling Heterogeneous Financial Data via Dual-Granularity Prompting
FinLangNet applies dual-granularity prompting in a sequential model to heterogeneous financial data, reporting 6.3 pp KS improvement and 9.9% bad debt reduction in real-world deployment.
- Univariate Channel Fusion for Multivariate Time Series Classification
UCF reduces multivariate time series to univariate form via channel fusion, enabling efficient classification that often matches or exceeds specialized multivariate methods especially when channels are correlated.
- Universal Time-Series Representation Learning: A Survey
A survey that proposes a taxonomy for universal time-series representation learning and reviews existing deep learning studies along with experimental setups.
- MSTN: A Lightweight and Fast Model for General TimeSeries Analysis
- Dataset-Driven Channel Masks in Transformers for Multivariate Time Series