UC-Search: Risk-Aware Test-Time Search for Delayed Constrained Time-Series Control

Xibai Wang

arxiv: 2606.25274 · v1 · pith:4WSYXL2Jnew · submitted 2026-06-24 · 💻 cs.LG · cs.AI

Pith reviewed 2026-06-25 21:39 UTC · model grok-4.3

keywords delayed control · risk-aware search · time-series forecasting · feasibility constraints · test-time search · uncertainty quantification · inventory control · non-myopic planning

The pith

UC-Pareto, using risk-aware test-time search, outperforms validation-selected CEM, MPPI, and risk-aware random search by normalized margins of roughly 2.3 to 3.2 (+2.3328 to +3.1675) in delayed constrained control.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that wrapping any time-series backbone with bounded search over paths validated by a feasibility automaton and scored by uncertainty-derived risk produces superior first actions for delayed decisions under hard constraints. This matters because many deployed systems must select an action now whose consequences for future feasibility cannot be assessed by one-step prediction or greedy choice. A myopic-collapse/separation theorem identifies the precise conditions under which search collapses to risk-greedy selection versus when delayed feasible-set coupling creates additional value for deeper lookahead. Evidence is supplied by consistent positive normalized performance on a predeclared 9-family 33-series public suite, a compute-matched audit, and separate inventory and M4 audits.

Core claim

UC-Search is a model-agnostic wrapper that lets a backbone emit forecasts or scores, rolls candidate paths forward with a feasibility automaton, and returns the first action of a risk-adjusted feasible trajectory using epistemic, aleatoric, and propagated uncertainty as path-risk terms. UC-Beam and a UCT-style UC-MCTS are the concrete instantiations. The myopic-collapse/separation theorem states when the procedure reduces to one-step risk-greedy behavior and when delayed feasible-set coupling produces non-myopic value. Primary results report UC-Pareto positive versus validation-selected CEM (+3.1675), MPPI (+2.3328), and risk-aware random (+2.5038) at the normalized threshold, with the edge

What carries the argument

The feasibility automaton that rolls candidate paths forward, combined with uncertainty estimates used as path-risk terms inside bounded search (UC-Beam and UC-MCTS), governed by the myopic-collapse/separation theorem.
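The wrapper pattern described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `uc_beam_first_action`, the `step`/`reward`/`risk` callables, the additive score `reward - risk_weight * risk`, and the toy inventory numbers are all assumptions made here for the example. The automaton is modeled as a `step` function that returns the next state, or `None` when the action violates a hard constraint.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence


@dataclass(frozen=True)
class Beam:
    actions: tuple   # actions chosen so far
    state: Any       # automaton state reached by those actions
    score: float     # cumulative risk-adjusted score (higher is better)


def uc_beam_first_action(
    init_state: Any,
    actions: Sequence[Any],
    step: Callable[[Any, Any, int], Optional[Any]],   # hypothetical automaton
    reward: Callable[[Any, Any, int], float],         # hypothetical backbone score
    risk: Callable[[Any, Any, int], float],           # hypothetical uncertainty term
    horizon: int,
    width: int = 4,
    risk_weight: float = 1.0,
) -> Optional[Any]:
    """Return the first action of the best feasible path found, or None."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    beams = [Beam((), init_state, 0.0)]
    for t in range(horizon):
        expanded = []
        for beam in beams:
            for a in actions:
                nxt = step(beam.state, a, t)
                if nxt is None:          # hard constraint violated: prune
                    continue
                gain = reward(beam.state, a, t) - risk_weight * risk(beam.state, a, t)
                expanded.append(Beam(beam.actions + (a,), nxt, beam.score + gain))
        if not expanded:                 # no surviving path can be extended
            return None
        expanded.sort(key=lambda b: b.score, reverse=True)
        beams = expanded[:width]         # bounded search: keep the top `width`
    return beams[0].actions[0]


# Toy periodic-review setting (all numbers hypothetical): demand is 1 per
# step, stock must stay within [0, CAPACITY], larger orders cost more and
# are treated as riskier.
CAPACITY = 3


def toy_step(stock, order, t):
    nxt = stock + order - 1
    return nxt if 0 <= nxt <= CAPACITY else None


def toy_reward(stock, order, t):
    return -0.1 * order ** 2


def toy_risk(stock, order, t):
    return 0.05 * order


first = uc_beam_first_action(0, [0, 1, 2], toy_step, toy_reward, toy_risk, horizon=3)
print(first)
```

Only the first action is returned, matching the receding-horizon use described in the text; the remaining path is discarded and the search is rerun at the next decision point.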

If this is right

When delayed feasible-set coupling is present, bounded search recovers first actions whose value exceeds that of one-step risk-greedy selection.
The performance advantage over CEM, MPPI, and risk-aware random holds after compute budgets are equalized.
The wrapper transfers across backbones and to periodic-review lost-sales inventory tasks where it beats the strongest base-stock policy.
The theorem supplies a diagnostic for deciding when deeper search is required rather than defaulting to myopic planning.
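A toy case makes the first and last implications concrete. The sketch below is an illustration built for this note, not the theorem or its conditions: the budget constraint, the action names, and the scores are hypothetical. When the first action changes which later actions remain feasible, exhaustive two-step lookahead and one-step greedy selection disagree; when the feasible set does not depend on earlier actions, they coincide.

```python
from itertools import product

COST = {"big": 2, "small": 1, "wait": 0}     # hypothetical budget use
SCORE = {"big": 3.0, "small": 2.0, "wait": 0.0}  # hypothetical risk-adjusted score
ACTIONS = ["big", "small", "wait"]


def coupled_step(budget, action):
    """Feasible set shrinks with spent budget (delayed coupling)."""
    cost = COST[action]
    return budget - cost if cost <= budget else None


def uncoupled_step(budget, action):
    """Every action stays feasible regardless of history (no coupling)."""
    return budget


def greedy_first(step, state):
    """One-step risk-greedy choice among currently feasible actions."""
    feasible = [a for a in ACTIONS if step(state, a) is not None]
    return max(feasible, key=lambda a: SCORE[a])


def lookahead_first(step, state, depth):
    """First action of the best fully feasible path of length `depth`."""
    best_action, best_total = None, float("-inf")
    for path in product(ACTIONS, repeat=depth):
        s, total, ok = state, 0.0, True
        for a in path:
            nxt = step(s, a)
            if nxt is None:
                ok = False
                break
            total += SCORE[a]
            s = nxt
        if ok and total > best_total:
            best_action, best_total = path[0], total
    return best_action


print(greedy_first(coupled_step, 2), lookahead_first(coupled_step, 2, 2))
print(greedy_first(uncoupled_step, 2), lookahead_first(uncoupled_step, 2, 2))
```

In the coupled case greedy spends the whole budget on "big" (total 3 over two steps), while lookahead picks "small" twice (total 4). In the uncoupled case both choose "big".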

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The separation theorem could be turned into an online test that decides search depth from observed coupling strength without manual tuning.
Because uncertainty quality is the load-bearing input, calibration improvements to the backbone may produce larger gains than further search refinements.
The automaton-plus-search pattern suggests a route to hybrid controllers that keep hard constraints explicit while using learned models only for risk scoring.

Load-bearing premise

The backbone's uncertainty estimates must accurately quantify the risk of full paths and the feasibility automaton must identify every constraint violation without false negatives or omissions.

What would settle it

Replace the backbone uncertainty estimates with random noise or remove selected constraint checks from the automaton on the same 33-series suite and test whether the reported positive margins versus CEM, MPPI, and risk-aware random disappear.
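The two ablations proposed here are cheap to set up in a harness. The sketch below shows one hypothetical way to do it; `noise_risk`, `make_automaton`, and the example checks are names invented for this note and do not come from the paper. The first helper swaps the uncertainty term for seeded uniform noise, and the second builds an automaton from a list of checks so that individual checks can be dropped.

```python
import random
from typing import Callable, Sequence


def noise_risk(scale: float, seed: int = 0) -> Callable[..., float]:
    """Return a risk function that ignores its inputs and emits uniform noise."""
    rng = random.Random(seed)   # seeded so ablation runs are repeatable

    def risk(*_args) -> float:
        return rng.uniform(0.0, scale)

    return risk


def make_automaton(checks: Sequence[Callable[[float], bool]]) -> Callable[[float], bool]:
    """Build a feasibility test that passes only if every check passes."""

    def feasible(state: float) -> bool:
        return all(check(state) for check in checks)

    return feasible


# Hypothetical constraint checks for a stock level.
def non_negative(stock: float) -> bool:
    return stock >= 0


def under_capacity(stock: float) -> bool:
    return stock <= 3


full_automaton = make_automaton([non_negative, under_capacity])
weakened_automaton = make_automaton([non_negative])   # capacity check removed

ablated = noise_risk(scale=1.0, seed=1)
print(full_automaton(5.0), weakened_automaton(5.0), ablated("state", "action", 0))
```

Rerunning the same suite with `ablated` in place of the backbone's risk term, or with `weakened_automaton` in place of the full one, gives the comparison the paragraph above calls for.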

Figures

Figures reproduced from arXiv: 2606.25274 by Xibai Wang.

Original abstract

Time-series models are usually scored as forecasters, yet deployed systems often require delayed decisions under uncertainty and hard feasibility constraints. UC-Search is a model-agnostic test-time wrapper: a backbone emits forecasts or action scores, a feasibility automaton rolls candidate paths forward, and bounded search returns the first action of a risk-adjusted feasible trajectory. We instantiate UC-Beam and a UCT-style UC-MCTS diagnostic, using epistemic, aleatoric, and propagated uncertainty mainly as path-risk terms. A myopic-collapse/separation theorem states when search reduces to one-step risk-greedy and when delayed feasible-set coupling can create non-myopic value. Primary evidence comes from a predeclared public $9$-family, $33$-series delayed-control suite with six held-out starts per series: UC-Pareto is positive versus validation-selected CEM, MPPI, and risk-aware random at the normalized threshold ($+3.1675/+2.3328/+2.5038$), and remains positive in a compute-matched audit ($+2.8466/+2.7418/+2.7429$). ETT/LTSF delayed-inventory validation supports the same compute-frontier claim. A 48-series raw M4 standard periodic-review lost-sales inventory audit is positive versus the strongest classic base-stock control ($+13556.7547$), CEM ($+64900.2207$), and risk-random ($+52881.6042$), while MPPI remains family-mixed. FI-2010, official-forecast adapters, SB3/FQI controls, direction/capacity/intervention checks, and synthetic mechanism tests are reported as boundary or mechanism evidence rather than broad dominance claims.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

UC-Search is a straightforward wrapper that layers feasibility search and path-risk selection onto any forecaster, with a theorem on when it collapses to greedy, but the gains depend on untested uncertainty quality.

The paper's main contribution is UC-Search, a model-agnostic test-time wrapper that takes forecasts or action scores, runs them through a feasibility automaton to generate candidate paths, and uses bounded search (UC-Beam or UC-MCTS) to pick the first action of a risk-adjusted feasible trajectory. It adds a myopic-collapse/separation theorem that spells out conditions under which the search reduces to one-step risk-greedy versus when delayed feasible-set coupling creates non-myopic value. The empirical side reports normalized gains on a predeclared 9-family 33-series delayed-control suite and a 48-series M4 inventory audit, with compute-matched comparisons to CEM, MPPI, and risk-aware random, plus checks against base-stock control.

The setup is practical and the public-data framing plus compute audits are useful for seeing whether the wrapper actually moves the needle on constrained tasks. The theorem gives a clean way to think about when lookahead is worth the cost.

The soft spot is the lack of any verification that the backbone's epistemic, aleatoric, and propagated uncertainty estimates actually rank path risk correctly, or that the automaton catches every constraint violation without false negatives. The abstract treats these as given inputs for the risk terms, but there are no ablations on miscalibration, no false-negative rates, and no counterexamples showing what happens to the reported +3.1675 normalized gain if either piece is off. If those assumptions slip, the method reduces to unprincipled selection and the theorem's separation conditions cannot be checked in practice.

This is for people who already have a forecaster and need to add hard constraints and delay handling in operations or control settings. A reader who wants a reusable post-processing layer and some public-suite evidence will get something concrete from it.

The work deserves peer review because the core wrapper idea and the theorem are clearly stated, the experiments use external public data with matched compute, and the practical gap it targets is real, even though the uncertainty and automaton assumptions need direct testing.

Referee Report

3 major / 0 minor

Summary. The manuscript introduces UC-Search, a model-agnostic test-time wrapper for delayed constrained time-series control. A backbone provides forecasts or action scores; a feasibility automaton rolls paths forward; and bounded search (instantiated as UC-Beam and a UCT-style UC-MCTS) returns the first action of a risk-adjusted feasible trajectory, using epistemic/aleatoric/propagated uncertainty as path-risk terms. A myopic-collapse/separation theorem is stated. Empirical support consists of positive normalized gains for UC-Pareto versus validation-selected CEM, MPPI, and risk-aware random on a predeclared 9-family/33-series public suite (+3.1675/+2.3328/+2.5038), a compute-matched audit, ETT/LTSF inventory validation, and a 48-series M4 lost-sales audit.

Significance. If the uncertainty estimates reliably rank path risk and the automaton detects all violations, the approach supplies a practical inference-time method for risk-aware feasible control without retraining. Credit is due for the use of predeclared public datasets, compute-matched audits, and the attempt to separate myopic from non-myopic regimes via the theorem. The multi-suite evaluation strengthens the empirical case relative to single-benchmark papers.

major comments (3)

[Abstract] Abstract (paragraph describing the wrapper and theorem): the central empirical claims and the separation theorem rest on the assumption that the backbone's epistemic, aleatoric, and propagated uncertainty estimates accurately quantify path risk and that the feasibility automaton identifies every constraint violation with zero false negatives; no sensitivity analysis, calibration checks, or false-negative evaluation is supplied, so the reported normalized gains cannot be isolated from potential miscalibration or missed violations.
[Abstract] Abstract (theorem statement): the myopic-collapse/separation theorem is presented as stating when search reduces to one-step risk-greedy behavior and when delayed feasible-set coupling creates non-myopic value, yet the manuscript supplies neither the full derivation nor the precise mathematical conditions under which the collapse occurs, rendering the theorem's interpretive role unverifiable.
[Abstract] Abstract (empirical claims): the normalized gains (+3.1675 etc.) and the 9-family/33-series suite results are reported without the exact metric definitions, normalization procedure, or data-processing pipeline details; this absence prevents independent verification of the cross-baseline comparisons and the compute-matched audit.
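The calibration and false-negative checks requested in the first comment can be quite small. The sketch below is one possible form under stated assumptions: predictive distributions are treated as Gaussian with a backbone-supplied mean and standard deviation, and ground-truth violation labels are available for held-out paths. The function names and the sample numbers are hypothetical.

```python
from statistics import NormalDist
from typing import Sequence


def empirical_coverage(
    means: Sequence[float],
    stds: Sequence[float],
    observed: Sequence[float],
    level: float = 0.9,
) -> float:
    """Fraction of observations inside the central `level` Gaussian interval.

    A well-calibrated backbone should return a value close to `level`.
    """
    if not observed:
        raise ValueError("need at least one observation")
    z = NormalDist().inv_cdf(0.5 + level / 2)   # two-sided quantile
    hits = sum(abs(y - m) <= z * s for m, s, y in zip(means, stds, observed))
    return hits / len(observed)


def false_negative_rate(accepted: Sequence[bool], violates: Sequence[bool]) -> float:
    """Share of truly violating paths that the automaton accepted anyway."""
    accepted_among_violating = [a for a, v in zip(accepted, violates) if v]
    if not accepted_among_violating:
        return 0.0                               # no violating paths to miss
    return sum(accepted_among_violating) / len(accepted_among_violating)


# Hypothetical held-out data.
coverage = empirical_coverage(
    means=[0.0, 0.0, 0.0, 0.0],
    stds=[1.0, 1.0, 1.0, 1.0],
    observed=[0.1, -0.5, 1.0, 3.0],
    level=0.9,
)
fnr = false_negative_rate(
    accepted=[True, False, True, False],
    violates=[True, True, False, False],
)
print(coverage, fnr)
```

Coverage well below the nominal level would indicate overconfident uncertainty estimates, and any nonzero false-negative rate would mean the automaton lets some violations through.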

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the multi-suite evaluation, predeclared public datasets, and compute-matched audits. We address each major comment below with proposed revisions to enhance verifiability while preserving the manuscript's core contributions.

Referee: [Abstract] Abstract (paragraph describing the wrapper and theorem): the central empirical claims and the separation theorem rest on the assumption that the backbone's epistemic, aleatoric, and propagated uncertainty estimates accurately quantify path risk and that the feasibility automaton identifies every constraint violation with zero false negatives; no sensitivity analysis, calibration checks, or false-negative evaluation is supplied, so the reported normalized gains cannot be isolated from potential miscalibration or missed violations.

Authors: We agree that the reliability of uncertainty estimates and automaton completeness is foundational. The manuscript reports mechanism tests (synthetic constraint violations, FI-2010 direction/capacity checks) but lacks dedicated calibration or false-negative analysis. We will add a new experiments subsection with uncertainty calibration plots (e.g., reliability diagrams for epistemic/aleatoric terms) and automaton false-negative rates on held-out violation cases. This will allow isolation of gains from potential miscalibration.

revision: yes

Referee: [Abstract] Abstract (theorem statement): the myopic-collapse/separation theorem is presented as stating when search reduces to one-step risk-greedy behavior and when delayed feasible-set coupling creates non-myopic value, yet the manuscript supplies neither the full derivation nor the precise mathematical conditions under which the collapse occurs, rendering the theorem's interpretive role unverifiable.

Authors: The theorem statement in Section 3 identifies the conditions (myopic collapse when feasible-set coupling is absent), with the full derivation in Appendix B. To address verifiability, we will expand the main-text theorem box to include the precise mathematical conditions (e.g., the coupling term vanishing) and a one-paragraph proof sketch, making the separation between myopic and non-myopic regimes directly interpretable without appendix consultation.

revision: yes

Referee: [Abstract] Abstract (empirical claims): the normalized gains (+3.1675 etc.) and the 9-family/33-series suite results are reported without the exact metric definitions, normalization procedure, or data-processing pipeline details; this absence prevents independent verification of the cross-baseline comparisons and the compute-matched audit.

Authors: We agree that explicit metric and pipeline details are required for verification. The normalized gains use a risk-adjusted cumulative cost metric (defined in Section 4.1 as delayed lost-sales plus constraint penalties), normalized by per-series variance; the 9-family/33-series suite construction and compute-matched audit procedure are described in Section 5. We will add a dedicated evaluation-metrics paragraph in Section 4 detailing the exact formulas, normalization steps, data splits, and audit methodology to enable full reproduction.

revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

The paper's claims rest on empirical evaluations using predeclared public datasets (9-family 33-series suite, ETT/LTSF, M4 inventory, FI-2010) against standard external baselines (CEM, MPPI, risk-aware random, base-stock). The myopic-collapse/separation theorem is presented as a derived statement from the search formulation and feasibility automaton rather than a re-expression of fitted quantities. No load-bearing self-citations, self-definitional reductions, or fitted-input-called-prediction patterns appear in the abstract or described structure; the central results remain independently falsifiable on the held-out starts and compute-matched audits.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 2 invented entities

The method rests on domain assumptions about uncertainty quantification and constraint modeling plus new search procedures; no explicit free parameters are named in the abstract.

free parameters (1)

risk-term scaling coefficients
Epistemic, aleatoric, and propagated uncertainty are used as path-risk terms; scaling or weighting choices are required but not detailed in the abstract.

axioms (2)

domain assumption: The backbone model supplies forecasts or action scores whose uncertainty estimates meaningfully reflect path risk.
Invoked when uncertainty is used as path-risk terms in the search.

domain assumption: The feasibility automaton correctly and completely encodes all hard constraints.
Required for the automaton to roll candidate paths forward without missing violations.

invented entities (2)

UC-Beam · no independent evidence
purpose: Bounded beam search over feasible trajectories using uncertainty risk
New instantiation of the UC-Search framework.

UC-MCTS · no independent evidence
purpose: UCT-style diagnostic search for risk-adjusted feasible paths
New diagnostic method introduced alongside the main wrapper.

pith-pipeline@v0.9.1-grok · 5838 in / 1612 out tokens · 30333 ms · 2026-06-25T21:39:00.409401+00:00 · methodology

[1] [2]

Chen, J.; Xu, L.; Chen, W.; and Schneider, J. 2026. Bayes Adaptive Monte Carlo Tree Search for Offline Model-Based Reinforcement Learning. In International Conference on Learning Representations

2026

[2] [3]

Chua, K.; Calandra, R.; McAllister, R.; and Levine, S. 2018. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models. In Advances in Neural Information Processing Systems

2018

[3] [4]

Q.; Stenger, P.; Schneider, L.; Pajarinen, J.; D'Eramo, C.; and Maillard, O.-A

Dam, T. Q.; Stenger, P.; Schneider, L.; Pajarinen, J.; D'Eramo, C.; and Maillard, O.-A. 2025. Monte-Carlo Tree Search with Uncertainty Propagation via Optimal Transport. In Proceedings of the 42nd International Conference on Machine Learning, volume 267 of Proceedings of Machine Learning Research, 12377--12401. PMLR

2025

[4] [5]

Das, A.; Kong, W.; Sen, R.; and Zhou, Y. 2024. A Decoder-Only Foundation Model for Time-Series Forecasting. In International Conference on Machine Learning

2024

[5] [7]

Goswami, M.; Szafer, K.; Choudhry, A.; Cai, Y.; Li, S.; and Dubrawski, A. 2024. MOMENT : A Family of Open Time-Series Foundation Models. In International Conference on Machine Learning

2024

[6] [8]

Janner, M.; Fu, J.; Zhang, M.; and Levine, S. 2019. When to Trust Your Model: Model-Based Policy Optimization. In Advances in Neural Information Processing Systems

2019

[7] [9]

Kendall, A.; and Gal, Y. 2017. What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? In Advances in Neural Information Processing Systems

2017

[8] [10]

Kocsis, L.; and Szepesvari, C. 2006. Bandit Based Monte-Carlo Planning. In Machine Learning: ECML 2006, 282--293

2006

[9] [11]

Kohankhaki, F.; Aghakasiri, K.; Zhang, H.; Wei, T.-H.; Gao, C.; and M \"u ller, M. 2024. Monte Carlo Tree Search in the Presence of Transition Uncertainty. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, 20151--20158

2024

[10] [12]

Lakshminarayanan, B.; Pritzel, A.; and Blundell, C. 2017. Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles. In Advances in Neural Information Processing Systems

2017

[11] [13]

O.; Loeff, N.; and Pfister, T

Lim, B.; Arik, S. O.; Loeff, N.; and Pfister, T. 2021. Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting. International Journal of Forecasting, 37(4): 1748--1764

2021

[12] [14]

Liu, Y.; Hu, T.; Zhang, H.; Wu, H.; Wang, S.; Ma, L.; and Long, M. 2024. iTransformer: Inverted Transformers Are Effective for Time Series Forecasting. In International Conference on Learning Representations.

[13] [15]

Mandi, J.; Kotary, J.; Berden, S.; Mulamba, M.; Bucarey, V.; Guns, T.; and Fioretto, F. 2024. Decision-Focused Learning: Foundations, State of the Art, Benchmark and Future Opportunities. Journal of Artificial Intelligence Research, 81: 1623–1701.

[14] [16]

Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A. A.; Veness, J.; Bellemare, M. G.; Graves, A.; Riedmiller, M.; Fidjeland, A. K.; Ostrovski, G.; et al. 2015. Human-level Control through Deep Reinforcement Learning. Nature, 518: 529–533.

[15] [17]

Nie, Y.; Nguyen, N. H.; Sinthong, P.; and Kalagnanam, J. 2023. A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. In International Conference on Learning Representations.

[16] [18]

Oren, Y.; Vadocz, V.; Spaan, M. T. J.; and Böhmer, W. 2025. Epistemic Monte Carlo Tree Search. In International Conference on Learning Representations.

[17] [20]

Salinas, D.; Flunkert, V.; and Gasthaus, J. 2017. DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks. arXiv preprint arXiv:1704.04110.

[18] [21]

Schrittwieser, J.; Antonoglou, I.; Hubert, T.; Simonyan, K.; Sifre, L.; Schmitt, S.; Guez, A.; Lockhart, E.; Hassabis, D.; Graepel, T.; Lillicrap, T.; and Silver, D. 2020. Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model. Nature, 588: 604–609.

[19] [23]

Silver, D.; Huang, A.; Maddison, C. J.; Guez, A.; Sifre, L.; van den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. 2016. Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature, 529: 484–489.

[20] [24]

Wang, D.; Shelhamer, E.; Liu, S.; Olshausen, B.; and Darrell, T. 2021. Tent: Fully Test-time Adaptation by Entropy Minimization. In International Conference on Learning Representations.

[21] [25]

Woo, G.; Liu, C.; Kumar, A.; Xiong, C.; Savarese, S.; and Sahoo, D. 2024. Unified Training of Universal Time Series Forecasting Transformers. In International Conference on Machine Learning.

[22] [26]

Wu, H.; Hu, T.; Liu, Y.; Zhou, H.; Wang, J.; and Long, M. 2023. TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. In International Conference on Learning Representations.

[23] [27]

Yao, S.; Yu, D.; Zhao, J.; Shafran, I.; Griffiths, T. L.; Cao, Y.; and Narasimhan, K. 2023a. Tree of Thoughts: Deliberate Problem Solving with Large Language Models. In Advances in Neural Information Processing Systems.

[24] [28]

Yao, S.; Zhao, J.; Yu, D.; Du, N.; Shafran, I.; Narasimhan, K.; and Cao, Y. 2023b. ReAct: Synergizing Reasoning and Acting in Language Models. In International Conference on Learning Representations.

[25] [30]

Zeng, A.; Chen, M.; Zhang, L.; and Xu, Q. 2023. Are Transformers Effective for Time Series Forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence.

[26] [31]

Zhang, Z.; Zohren, S.; and Roberts, S. 2018. BDLOB: Bayesian Deep Convolutional Neural Networks for Limit Order Books. In NeurIPS Workshop on Bayesian Deep Learning.

[27] [32]

Zhang, Z.; Zohren, S.; and Roberts, S. 2019. DeepLOB: Deep Convolutional Neural Networks for Limit Order Books. IEEE Transactions on Signal Processing, 67(11): 3001–3012.

[28] [33]

Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; and Zhang, W. 2021. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, 11106–11115.

[36] [41]

Chronos: Learning the Language of Time Series. arXiv preprint arXiv:2403.07815.

[38] [43]

Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting. arXiv preprint arXiv:2310.08278.

[42] [47]

Adaptive Conformal Predictions for Time Series. arXiv preprint arXiv:2202.07282.

[45] [50]

Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347.

[46] [51]

Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models. In Advances in Neural Information Processing Systems.

[53] [58]

Bayes Adaptive Monte Carlo Tree Search for Offline Model-Based Reinforcement Learning. In International Conference on Learning Representations.

[56] [61]

Smart "Predict, then Optimize". arXiv preprint arXiv:1710.08005.

[62] [67]

Benchmark Dataset for Mid-price Forecasting of Limit Order Book Data with Machine Learning Methods. Journal of Forecasting, 2018.