Time Series Foundation Models as Strong Baselines in Transportation Forecasting: A Large-Scale Benchmark Analysis
1 Pith paper cites this work. Polarity classification is still indexing.
abstract
Accurate forecasting of transportation dynamics is essential for urban mobility and infrastructure planning. Although recent work has achieved strong performance with deep learning models, these methods typically require dataset-specific training, architecture design and hyper-parameter tuning. This paper evaluates whether general-purpose time-series foundation models can serve as forecasters for transportation tasks by benchmarking the zero-shot performance of the state-of-the-art model, Chronos-2, across ten real-world datasets covering highway traffic volume and flow, urban traffic speed, bike-sharing demand, and electric vehicle charging station data. Under a consistent evaluation protocol, we find that, even without any task-specific fine-tuning, Chronos-2 delivers state-of-the-art or competitive accuracy across most datasets, frequently outperforming classical statistical baselines and specialized deep learning architectures, particularly at longer horizons. Beyond point forecasting, we evaluate its native probabilistic outputs using prediction-interval coverage and sharpness, demonstrating that Chronos-2 also provides useful uncertainty quantification without dataset-specific training. In general, this study supports the adoption of time-series foundation models as a key baseline for transportation forecasting research.
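The abstract names prediction-interval coverage and sharpness but does not give their exact definitions. The sketch below assumes the common ones: coverage as the fraction of observations that fall inside the interval, and sharpness as the mean interval width. The function names and toy values are illustrative and are not taken from the paper.

```python
from typing import Sequence


def interval_coverage(
    y_true: Sequence[float], lower: Sequence[float], upper: Sequence[float]
) -> float:
    """Fraction of observations inside [lower, upper] (assumed definition)."""
    hits = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper))
    return hits / len(y_true)


def interval_sharpness(lower: Sequence[float], upper: Sequence[float]) -> float:
    """Mean interval width; a smaller value means a sharper forecast."""
    widths = [hi - lo for lo, hi in zip(lower, upper)]
    return sum(widths) / len(widths)


# Toy values for illustration only, not results from the paper.
y_true = [10.0, 12.0, 15.0, 11.0]
lower = [8.0, 11.0, 16.0, 9.0]
upper = [12.0, 14.0, 18.0, 13.0]

coverage = interval_coverage(y_true, lower, upper)
sharpness = interval_sharpness(lower, upper)
print(f"coverage={coverage:.2f}, mean width={sharpness:.2f}")
```

The two numbers are read together: an interval at a nominal 80% level, for example, is well calibrated when its empirical coverage is close to 0.80, and among calibrated forecasts the one with the smaller mean width is preferable.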
fields: cs.LG (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
TSFMAudit: Data Contamination Auditing in Forecasting Time Series Foundation Models
TSFMAudit detects pretraining contamination in time series foundation models via probe adaptation dynamics (faster loss drop, smaller backbone shift), tested on 6 models and 187 datasets against 10 LLM-derived baselines.