arxiv: 2509.24789 · v4 · submitted 2025-09-29 · 💻 cs.LG · stat.ML

Fidel-TS: A High-Fidelity Multimodal Benchmark for Time Series Forecasting

Zhijian Xu, Wanxu Cai, Xilin Dai, Zhaorong Deng, Qiang Xu

keywords evaluation · forecasting · multimodal · benchmark · benchmarks · data · designs · existing

The evaluation of time series forecasting models is hindered by a lack of high-quality benchmarks, leading to overestimated assessments of progress. Existing datasets suffer from issues ranging from small scale, low frequency, and pre-training data contamination in unimodal designs to the temporal and description leakage prevalent in early multimodal designs. To address this, we formalize the core principles of high-fidelity benchmarking, focusing on data sourcing integrity, leak-free design, and structural clarity. We introduce Fidel-TS, a new large-scale benchmark built from these principles. Our experiments reveal the limitations of prior benchmarks and the potential discrepancies in model evaluation, providing new insights into multiple existing unimodal and multimodal forecasting models and LLMs across various evaluation tasks.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

TS-Arena -- A Live Forecast Pre-Registration Platform
cs.LG 2025-12 conditional novelty 7.0

TS-Arena is a live pre-registration platform that evaluates time series forecasts on future data streams to eliminate information leakage.