Fidel-TS: A High-Fidelity Multimodal Benchmark for Time Series Forecasting
The evaluation of time series forecasting models is hindered by a lack of high-quality benchmarks, leading to overestimated assessments of progress. Existing datasets suffer from issues ranging from small scale, low sampling frequency, and pre-training data contamination in unimodal designs to the temporal and description leakage prevalent in early multimodal designs. To address this, we formalize the core principles of high-fidelity benchmarking, focusing on data sourcing integrity, leak-free design, and structural clarity. We introduce Fidel-TS, a new large-scale benchmark built on these principles. Our experiments reveal the limitations of prior benchmarks and the discrepancies in model evaluation they can cause, and they provide new insights into existing unimodal and multimodal forecasting models and LLMs across a range of evaluation tasks.
Forward citations
Cited by 1 Pith paper
TS-Arena -- A Live Forecast Pre-Registration Platform
TS-Arena is a live pre-registration platform that evaluates time series forecasts on future data streams to eliminate information leakage.