Adapt Only When It Pays: Budgeted Decision-Loss Priority for Delayed Online Time-Series Adaptation

Xibai Wang

arxiv: 2606.25068 · v1 · pith:JGXJAVIEnew · submitted 2026-06-23 · 💻 cs.LG · cs.AI

Pith reviewed 2026-06-26 00:13 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords online time-series forecasting · delayed feedback · budgeted adaptation · decision loss · update scheduling · capacity planning · residual adapter

The pith

A decision-loss priority gate lets delayed online forecasters update only when observed decision loss exceeds a calibrated quantile, lowering held-out decision loss versus always-update, fixed-period, and drift-triggered baselines under matched compute.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines when an online time-series forecaster should spend its limited compute budget on an update, given that labels arrive only after a horizon-dependent delay. It presents ADOWIP, a residual-adapter system that keeps sealed delay queues and exact budget accounts, then triggers an update after feedback only if downstream decision loss (optionally penalized by prediction MSE) exceeds an empirical quantile threshold chosen on a frozen calibration split and budget remains. Proofs establish hard-budget feasibility and projected-OGD regret for the accepted-update subproblem, alongside stability and conditional finite-sample statements about gate selection. On ETT capacity-planning tasks the selected gate produces lower held-out decision loss than always-update, fixed-period, and drift-triggered baselines at identical compute cost; the same protocol yields 20/0 held-out wins on a UCI Bike capacity proxy. Secondary suites are mixed, and eight non-passing contrasts are explicitly excluded from primary claims.
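To make the moving parts concrete, here is a minimal sketch of how a sealed delay queue, a hard update budget, and a threshold gate could fit together. It is not the paper's implementation: the class name `GatedScheduler`, its fields, and the additive form `decision_loss + mse_weight * mse` for the optional MSE penalty are all assumptions made for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class GatedScheduler:
    """Illustrative scheduler; every name here is hypothetical, not the paper's API."""
    threshold: float            # frozen quantile threshold from the calibration split
    budget: int                 # hard cap on the number of updates
    mse_weight: float = 0.0     # assumed additive weight on the optional MSE penalty
    spent: int = 0
    sealed: list = field(default_factory=list)      # min-heap of (reveal_time, forecast_id)
    telemetry: list = field(default_factory=list)   # (time, forecast_id, priority, accepted)

    def submit(self, t: int, horizon: int, forecast_id: int) -> None:
        # The label for a forecast issued at t stays sealed until t + horizon.
        heapq.heappush(self.sealed, (t + horizon, forecast_id))

    def release(self, t: int) -> list:
        # Return ids whose feedback has been revealed by time t, earliest first.
        ready = []
        while self.sealed and self.sealed[0][0] <= t:
            ready.append(heapq.heappop(self.sealed)[1])
        return ready

    def decide(self, t: int, forecast_id: int, decision_loss: float, mse: float = 0.0) -> bool:
        # Accept only if the priority clears the threshold AND budget remains.
        priority = decision_loss + self.mse_weight * mse
        accepted = priority > self.threshold and self.spent < self.budget
        if accepted:
            self.spent += 1
        self.telemetry.append((t, forecast_id, priority, accepted))
        return accepted


sched = GatedScheduler(threshold=1.0, budget=1)
sched.submit(t=0, horizon=2, forecast_id=0)
sched.submit(t=1, horizon=1, forecast_id=1)
early = sched.release(1)          # nothing has been revealed yet
ready = sched.release(2)          # both labels arrive at t = 2
first = sched.decide(2, ready[0], decision_loss=1.5)
second = sched.decide(2, ready[1], decision_loss=3.0)   # over threshold, but budget is spent
```

Because the budget check sits inside the same conjunction as the threshold test, the counter `spent` can never exceed `budget`, and the telemetry list records rejected candidates as well as accepted ones, which is the kind of audit trail the paper describes.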

Core claim

An observed decision-loss priority gate, calibrated on a frozen split, selects post-feedback updates that reduce held-out decision loss against always, fixed-period, and drift-triggered exact-update baselines under matched compute on public ETT capacity-planning tasks, while the framework guarantees hard-budget feasibility and supplies projected-OGD regret bounds for the convex linear subproblem together with stability and conditional finite-sample gate-selection statements.

What carries the argument

The observed decision-loss priority gate that triggers an exact update after feedback revelation only when downstream loss exceeds a calibrated empirical quantile and remaining budget allows.

If this is right

Hard-budget feasibility holds for any sequence of accepted updates.
Projected-OGD regret bounds apply to the convex linear accepted-update subproblem.
Stability and conditional finite-sample statements govern gate selection.
On ETT capacity tasks the gate lowers held-out decision loss relative to the three listed baselines.
The same protocol yields 20/0 held-out wins on the UCI Bike capacity proxy.
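The regret bullet above refers to projected online gradient descent on a linear adapter. A minimal sketch of one such step follows, assuming a squared-error loss and an L2-ball constraint set; both are assumptions for illustration (the paper's gate is driven by decision loss, and its constraint set is not stated here), and the function names are hypothetical.

```python
import math


def project_l2(w, radius):
    """Euclidean projection of w onto the ball of the given radius."""
    norm = math.sqrt(sum(x * x for x in w))
    if norm <= radius:
        return list(w)
    scale = radius / norm
    return [x * scale for x in w]


def ogd_step(w, x, y, base_pred, lr, radius):
    """One projected-OGD step for a linear residual adapter.

    Prediction is base_pred + <w, x>; the loss is 0.5 * (prediction - y)^2,
    which is convex in w. The squared loss is an assumption of this sketch.
    """
    err = base_pred + sum(wi * xi for wi, xi in zip(w, x)) - y
    grad = [err * xi for xi in x]                      # gradient of the loss in w
    stepped = [wi - lr * gi for wi, gi in zip(w, grad)]
    return project_l2(stepped, radius)


w_free = ogd_step([0.0, 0.0], x=[1.0, 0.0], y=1.0, base_pred=0.0, lr=0.5, radius=10.0)
w_clipped = ogd_step([0.0, 0.0], x=[1.0, 0.0], y=1.0, base_pred=0.0, lr=0.5, radius=0.25)
```

In a gated system this step would run only for feedback events the gate accepts, so the regret statement concerns the accepted rounds rather than every time step.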

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the quantile threshold generalizes across new capacity-planning series without recalibration, the scheduler could be deployed with a single frozen calibration run.
The negative outcomes on probe-based and finance experiments indicate that decision-prioritized adaptation may be confined to tasks where downstream loss is both observable and directly actionable after delay.
Auditable update telemetry could be used to verify that total compute never exceeds the declared budget in production systems.

Load-bearing premise

The empirical quantile threshold chosen on the frozen calibration split continues to select useful updates on held-out data without later adjustment or exclusion of non-improving contrasts.
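One inexpensive way to probe this premise is to compare how often the frozen threshold fires on held-out data with how often it fired on the calibration split. The sketch below assumes a simple order-statistic definition of the empirical quantile; the paper's exact estimator is not given here, and the loss values are made up for illustration.

```python
import math


def empirical_quantile(values, q):
    """Smallest observed value v such that at least a fraction q of values are <= v."""
    if not values or not 0.0 < q <= 1.0:
        raise ValueError("need a non-empty sample and 0 < q <= 1")
    ordered = sorted(values)
    k = math.ceil(q * len(ordered)) - 1
    return ordered[max(k, 0)]


def acceptance_rate(losses, threshold):
    """Fraction of feedback events whose loss strictly exceeds the threshold."""
    return sum(loss > threshold for loss in losses) / len(losses)


# Made-up decision losses; not data from the paper.
calibration_losses = [0.2, 0.9, 0.4, 1.6, 0.3, 0.7, 1.1, 0.5]
heldout_losses = [0.1, 1.2, 0.8, 0.95, 0.3, 2.1, 0.6, 0.4]

threshold = empirical_quantile(calibration_losses, 0.75)   # frozen after this line
calib_rate = acceptance_rate(calibration_losses, threshold)
heldout_rate = acceptance_rate(heldout_losses, threshold)
```

A large gap between `calib_rate` and `heldout_rate` would suggest the calibration split is not representative, which is exactly the condition under which the gate could exhaust its budget early or rarely fire.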

What would settle it

A fresh time-series dataset on which the gate chosen by the frozen-split protocol produces higher held-out decision loss than the always-update baseline when both run under identical compute budget.
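The test described above reduces to a paired comparison under a compute check. A minimal sketch, assuming that "matched compute" means equal update counts per paired run (an assumption; the paper may account for compute differently) and using invented numbers:

```python
def win_loss_record(gate_runs, baseline_runs):
    """Count held-out wins and losses for the gate over paired runs.

    Each run is a hypothetical (held_out_decision_loss, updates_spent) pair.
    """
    if len(gate_runs) != len(baseline_runs):
        raise ValueError("runs must be paired one-to-one")
    wins = losses = 0
    for (g_loss, g_updates), (b_loss, b_updates) in zip(gate_runs, baseline_runs):
        # Matched compute is taken here to mean equal update counts (an assumption).
        if g_updates != b_updates:
            raise ValueError("compute is not matched for this pair")
        if g_loss < b_loss:
            wins += 1
        elif g_loss > b_loss:
            losses += 1
    return wins, losses


# Made-up numbers for illustration only; not results from the paper.
gate = [(0.8, 10), (1.1, 10), (0.5, 10)]
always_update = [(1.0, 10), (1.0, 10), (0.5, 10)]
record = win_loss_record(gate, always_update)
```

Ties are counted as neither a win nor a loss, so a record such as 20/0 in this format means twenty strict improvements and no strict regressions.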

Original abstract

Online time-series forecasters receive labels only after horizon-dependent delays, while every adaptation step spends limited compute. We study when an online learner should update, not how to adapt at every opportunity, and introduce ADOWIP: a residual-adapter framework with sealed delay queues, exact budget accounting, and auditable update telemetry. Its main scheduler is an observed decision-loss priority gate that updates only after feedback is revealed, when downstream loss, optionally penalized by prediction MSE, exceeds a calibrated empirical quantile and budget remains. We prove hard-budget feasibility, projected-OGD regret for a convex linear accepted-update subproblem, and stability plus conditional finite-sample gate-selection statements. On public ETT capacity-planning tasks, a frozen calibration/evaluation split selects a gate that lowers held-out decision loss against always, fixed-period, and drift-triggered exact-update baselines under matched compute. Secondary threshold/load-index ETT suites are mixed: 33 of 41 selected contrasts clear the stricter cross-artifact Holm family, and the 8 nonpassing rows are explicitly excluded from primary claims. The same protocol improves an external UCI Bike capacity proxy with 20/0 held-out wins, and a fixed gate passes three full-year Capital Bikeshare station-rebalancing contrasts. Probe-based and finance experiments remain negative, delimiting the current scope of decision-prioritized adaptation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper introduces a budgeted update gate for delayed time-series that comes with regret bounds and hard feasibility, but the main empirical wins rest on a data-dependent quantile plus explicit exclusion of 8 non-passing contrasts.

The core contribution is a scheduler that decides whether to adapt an online forecaster only after labels arrive and only when a decision-loss priority exceeds a calibrated quantile, while respecting a hard compute budget. It adds sealed delay queues and exact accounting so the budget is never violated. That combination of priority gating with verifiable budget and regret analysis for the accepted updates is new enough to be worth a look in the online convex optimization corner of time-series work.

The proofs for hard-budget feasibility and projected-OGD regret on the linear subproblem look clean on the page. The framework also ships auditable telemetry, which is a practical plus for anyone who has to justify update decisions.

The empirical side is thinner. The headline result on ETT uses a frozen calibration split to pick the quantile, then reports held-out improvement only against baselines under matched compute. Secondary suites show 33 of 41 contrasts clearing Holm correction, but the 8 non-passing ones are dropped from primary claims. That exclusion step makes the advantage conditional on the gate working on the calibration data rather than demonstrated across the full protocol. The circularity between the fitted quantile and the reported contrast is real and limits how strongly the experiments support the scheduler.

Probe and finance experiments are negative, which the paper states plainly; that honesty is useful for scoping.

This is for readers already working on budgeted online adaptation or delayed-label time-series. It is not a broad advance. A serious referee should see it because the theory is grounded and the limitations are called out, even if the empirical case needs tightening around the selection and exclusion steps.

Referee Report

2 major / 2 minor

Summary. The paper introduces ADOWIP, a residual-adapter framework for deciding when (not how) to perform budgeted updates in online time-series forecasting under label delays. The core scheduler is a decision-loss priority gate that triggers an update only when downstream loss (optionally penalized by MSE) exceeds a frozen empirical quantile threshold and budget remains. The manuscript proves hard-budget feasibility and projected-OGD regret for the accepted-update subproblem, and gives stability and conditional finite-sample gate-selection statements. Empirically, a calibration/evaluation split on public ETT capacity-planning tasks yields held-out decision-loss reductions versus always-update, fixed-period, and drift-triggered baselines under matched compute. In the secondary suites, 33 of 41 selected contrasts clear the stricter cross-artifact Holm family, and the 8 non-passing rows are explicitly excluded from primary claims. Additional positive results are reported on a UCI Bike proxy and on Capital Bikeshare data, while probe-based and finance tasks are negative.

Significance. If the empirical gate selection generalizes without post-hoc filtering, the work supplies a practical, auditable mechanism for compute-constrained online adaptation that directly optimizes decision loss rather than prediction error. The theoretical guarantees on budget feasibility and regret provide a solid foundation that is rare in this area; the explicit reporting of negative results further strengthens the delimited scope.

major comments (2)

[Abstract / empirical evaluation] The central claim that a frozen calibration split selects a gate lowering held-out decision loss rests on an empirical quantile threshold fitted to the same data distribution used for evaluation. This creates a data-dependent threshold whose value is chosen to maximize the reported contrast, so the held-out improvement is measured against a quantity that has already been tuned on the evaluation distribution.
[Abstract] The statement that '33 of 41 selected contrasts clear the stricter cross-artifact Holm family, and the 8 nonpassing rows are explicitly excluded from primary claims' makes the reported advantage conditional on success. The primary empirical result therefore depends on the quantile generalizing without adjustment or filtering; if the gate only improves when the calibration split happens to produce a favorable threshold, the advantage does not establish reliable decision-loss prioritization across the paper's own protocol.
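For readers weighing the second comment, it may help to recall what clearing a Holm family involves. The sketch below is the standard Holm-Bonferroni step-down procedure; how the paper defines its "cross-artifact" family and which significance level it uses are not stated here, so `alpha=0.05` and the p-values are placeholders.

```python
def holm_reject(p_values, alpha=0.05):
    """Standard Holm-Bonferroni step-down; returns one reject flag per hypothesis."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices by ascending p-value
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value is compared against alpha / (m - k + 1).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break   # once one test fails, all larger p-values fail too
    return reject


# Placeholder p-values for illustration; not taken from the paper.
flags = holm_reject([0.01, 0.04, 0.03, 0.005], alpha=0.05)
```

The early `break` is why a single borderline contrast can pull every larger p-value below the line with it, so the count of passing rows depends on how the family is assembled.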

minor comments (2)

The manuscript should clarify whether the quantile threshold is recomputed on each new calibration window or remains fixed after the initial split, and should report sensitivity of the held-out gains to small perturbations of that quantile.
Notation for the 'sealed delay queues' and 'exact budget accounting' should be introduced with explicit equations rather than descriptive prose only.
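As an indication of what such equations might look like, the following is one possible formalization. The symbols are hypothetical and are not taken from the manuscript; the additive MSE penalty with weight λ and the constant per-update cost c are assumptions of this sketch.

```latex
% Hypothetical notation; none of these symbols are taken from the paper.
% h_s: horizon of the forecast issued at time s; its label is revealed at s + h_s.
\[
  Q_t = \{\, s \le t : s + h_s > t \,\}
  \qquad\text{(forecasts whose labels are still sealed at time } t\text{)}
\]
% Index revealed feedback events by k = 1, 2, \dots, K in processing order.
% p_k: priority, \hat{\tau}: frozen threshold, c > 0: cost per update, B \ge 0: total budget.
\[
  p_k = \ell^{\mathrm{dec}}_k + \lambda\,\mathrm{MSE}_k, \qquad
  a_k = \mathbf{1}\!\left[p_k > \hat{\tau}\right]\,\mathbf{1}\!\left[b_{k-1} \ge c\right], \qquad
  b_k = b_{k-1} - c\,a_k, \quad b_0 = B .
\]
% Feasibility by induction: a_k = 1 requires b_{k-1} \ge c, so b_k \ge 0 for every k, hence
\[
  c \sum_{k=1}^{K} a_k \;=\; B - b_K \;\le\; B .
\]
```

Under this reading, hard-budget feasibility follows from the indicator on the remaining balance alone and does not depend on how the threshold was chosen.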

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the empirical protocol and abstract claims. We respond point-by-point below, maintaining the manuscript's commitment to transparent reporting of both positive and negative results.

Referee: [Abstract / empirical evaluation] The central claim that a frozen calibration split selects a gate lowering held-out decision loss rests on an empirical quantile threshold fitted to the same data distribution used for evaluation. This creates a data-dependent threshold whose value is chosen to maximize the reported contrast, so the held-out improvement is measured against a quantity that has already been tuned on the evaluation distribution.

Authors: The empirical quantile is computed exclusively on the disjoint calibration split; the evaluation split is never accessed during threshold selection or gate choice. While the underlying distribution is shared (standard for any train/test protocol), no optimization or selection occurs on evaluation data. The protocol fixes the split in advance and applies the calibration-derived gate to held-out data without further adjustment. We disagree that the threshold is tuned on the evaluation distribution and therefore see no need to alter the central claim, though we can add a sentence in Section 3 clarifying the disjoint splits if the editor requests.

Revision: no

Referee: [Abstract] The statement that '33 of 41 selected contrasts clear the stricter cross-artifact Holm family, and the 8 nonpassing rows are explicitly excluded from primary claims' makes the reported advantage conditional on success. The primary empirical result therefore depends on the quantile generalizing without adjustment or filtering; if the gate only improves when the calibration split happens to produce a favorable threshold, the advantage does not establish reliable decision-loss prioritization across the paper's own protocol.

Authors: The manuscript already presents the 33/41 result with explicit exclusion of the eight non-passing rows and reports negative outcomes on probe-based and finance tasks to delimit scope. The exclusions follow the pre-specified Holm procedure applied to the selected contrasts; they are not post-hoc filtering on evaluation performance. The primary claims are therefore conditioned on the calibration protocol producing a usable gate, which is consistent with the paper's emphasis on mixed secondary-suite results. We view this as honest delimitation rather than a flaw requiring revision, though we can strengthen the abstract wording to reiterate that the advantage holds only when the frozen calibration gate succeeds.

Revision: partial

Circularity Check

0 steps flagged

No significant circularity; theoretical claims and empirical protocol remain independent of inputs by construction.

The paper states separate proofs for hard-budget feasibility, projected-OGD regret on the accepted-update subproblem, and conditional finite-sample gate-selection statements. The empirical protocol uses a frozen calibration split only to select the quantile threshold for the gate, then reports held-out decision-loss reduction on a distinct evaluation split; this does not reduce the held-out result to the calibration inputs by construction, nor rename a fitted quantity as a prediction. No self-citation chains, uniqueness theorems, or ansatzes are invoked as load-bearing for the central claims. Explicit exclusion of the 8 non-passing contrasts is transparently noted rather than concealed, and does not create a definitional or statistical loop under the enumerated patterns.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The framework rests on standard online convex optimization assumptions plus the empirical calibration of the quantile threshold; no new physical entities are introduced.

free parameters (1)

empirical quantile threshold
Calibrated on the frozen split to decide when downstream loss triggers an update; its value directly determines which updates are accepted.

axioms (2)

Standard math: Projected online gradient descent yields sublinear regret on the convex, linear accepted-update subproblem.
Invoked to bound the cost of the updates that the gate accepts.
Domain assumption: The calibration split is representative of the held-out distribution for the purpose of quantile selection.
Required for the gate to generalize without re-estimating the threshold.
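For reference, the first axiom corresponds to the textbook regret bound for projected online gradient descent with a fixed step size. The statement below is that standard result, not the paper's theorem; the symbols D, G, n, and the constraint set K are assumptions of this sketch, with n counting only the updates the gate accepts.

```latex
% Textbook statement, not the paper's theorem. Assumptions: K convex with diameter D,
% each f_k convex with \|\nabla f_k(w)\| \le G on K, and n accepted updates.
\[
  w_{k+1} = \Pi_K\!\bigl(w_k - \eta\,\nabla f_k(w_k)\bigr)
\]
\[
  \sum_{k=1}^{n} f_k(w_k) - \min_{w \in K} \sum_{k=1}^{n} f_k(w)
  \;\le\; \frac{D^2}{2\eta} + \frac{\eta\,G^2 n}{2}
  \;=\; D\,G\,\sqrt{n}
  \quad\text{for } \eta = \frac{D}{G\sqrt{n}} .
\]
```

The bound grows as the square root of n, which is what "sublinear regret" means here; it says nothing about rounds the gate skipped.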

pith-pipeline@v0.9.1-grok · 5768 in / 1541 out tokens · 19862 ms · 2026-06-26T00:13:24.871631+00:00 · methodology
