Beyond Accuracy: Are Time Series Foundation Models Well-Calibrated?

Coen Adler; Felix Draxler; Padhraic Smyth; Samar Abdi; Yuxin Chang

arxiv: 2510.16060 · v2 · pith:Q6LPG6H6new · submitted 2025-10-17 · cs.LG · cs.AI · stat.ME · stat.ML

Coen Adler, Yuxin Chang, Felix Draxler, Samar Abdi, Padhraic Smyth

classification cs.LG · cs.AI · stat.ME · stat.ML

keywords models · foundation · series · calibration · time · applications · over- · properties

The recent development of foundation models for time series data has generated considerable interest in using such models across a variety of applications. Although foundation models achieve state-of-the-art predictive performance, their calibration properties remain relatively underexplored, despite the fact that calibration can be critical for many practical applications. In this paper, we investigate the calibration-related properties of five recent time series foundation models and two competitive baselines. We perform a series of systematic evaluations assessing model calibration (i.e., over- or under-confidence), effects of varying prediction heads, and calibration under long-term autoregressive forecasting. We find that time series foundation models are consistently better calibrated than baseline models and tend not to be either systematically over- or under-confident, in contrast to the overconfidence often seen in other deep learning models.
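To make "over- or under-confidence" concrete, here is a minimal sketch of one common calibration check: comparing the empirical coverage of a prediction interval with its nominal level. It is an illustration only and is not claimed to be the evaluation protocol or metric used in the paper. The function names `empirical_coverage` and `confidence_diagnosis`, the tolerance `tol`, and the synthetic Gaussian data are all assumptions introduced for this example.

```python
import numpy as np


def empirical_coverage(y_true, lower, upper):
    """Fraction of observations that fall inside [lower, upper]."""
    y_true, lower, upper = (np.asarray(a, dtype=float) for a in (y_true, lower, upper))
    return float(np.mean((y_true >= lower) & (y_true <= upper)))


def confidence_diagnosis(coverage, nominal, tol=0.02):
    """Label a forecaster by comparing observed and nominal coverage.

    Coverage below nominal means intervals are too narrow (overconfident);
    coverage above nominal means intervals are too wide (underconfident).
    The tolerance `tol` is an arbitrary choice for this sketch.
    """
    if coverage < nominal - tol:
        return "overconfident"
    if coverage > nominal + tol:
        return "underconfident"
    return "roughly calibrated"


if __name__ == "__main__":
    # Synthetic targets drawn from a standard normal (hypothetical data).
    rng = np.random.default_rng(0)
    y = rng.normal(0.0, 1.0, size=10_000)

    # 0.9 quantile of the standard normal, giving a central 80% interval.
    z80 = 1.2815515655446004

    # Three hypothetical forecasters that differ only in predicted scale.
    for name, sigma in [("too narrow", 0.5), ("matched", 1.0), ("too wide", 2.0)]:
        cov = empirical_coverage(y, -z80 * sigma, z80 * sigma)
        print(f"{name:>10}: coverage={cov:.3f} -> {confidence_diagnosis(cov, 0.80)}")
```

In this sketch, a forecaster whose predicted scale is too small covers far fewer than 80% of the targets, while one whose scale is too large covers far more. A well-calibrated forecaster sits close to the nominal level at every interval width, not only at 80%.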

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Do Time Series Foundation Model Benchmarks Hide Regime-Dependent Failures? Evidence from Traffic Speed Forecasting
cs.LG 2026-06 unverdicted novelty 7.0

Regime-stratified evaluation of three TSFMs on traffic benchmarks reveals sharp drops in accuracy and coverage during transitions that aggregate metrics conceal, with BMA proposed to combine model forecasts and histor...