pith. machine review for the scientific record.

arxiv: 2509.24789 · v4 · submitted 2025-09-29 · 💻 cs.LG · stat.ML

Fidel-TS: A High-Fidelity Multimodal Benchmark for Time Series Forecasting

classification 💻 cs.LG stat.ML
keywords evaluation · forecasting · multimodal · benchmark · benchmarks · data · designs · existing

The evaluation of time series forecasting models is hindered by a lack of high-quality benchmarks, leading to overestimated assessments of progress. Existing datasets suffer from issues ranging from small-scale, low-frequency, pre-training data contamination in unimodal designs to the temporal and description leakage prevalent in early multimodal designs. To address this, we formalize the core principles of high-fidelity benchmarking, focusing on data sourcing integrity, leak-free design, and structural clarity. We introduce Fidel-TS, a new large-scale benchmark built from these principles. Our experiments reveal the limitations of prior benchmarks and the potential discrepancies in model evaluation, providing new insights into multiple existing unimodal and multimodal forecasting models and LLMs across various evaluation tasks.

This paper has not been read by Pith yet.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. TS-Arena -- A Live Forecast Pre-Registration Platform

    cs.LG 2025-12 conditional novelty 7.0

    TS-Arena is a live pre-registration platform that evaluates time series forecasts on future data streams to eliminate information leakage.