A model-based bootstrap achieves distributional consistency for transition estimators in controlled Markov chains with unknown policies and yields asymptotically valid confidence intervals for offline policy evaluation and optimal policy recovery.
We define a stationary behavior policy $\bar{\pi}_b(a \mid s) := p^{(a)}_s / p_s$, where $p_s := \sum_{a' \in \mathcal{A}} p^{(a')}_s$ is the sum of the ergodic occupation measure by actions
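The definition above amounts to normalizing each state's row of the occupation measure. A minimal sketch, assuming the occupation measure is available as a nested list `occupation[s][a]` holding $p^{(a)}_s$; the function name and the numbers are hypothetical and not taken from the paper:

```python
def behavior_policy(occupation):
    """Return pi_b[s][a] = p_s^(a) / p_s, where p_s sums p_s^(a') over actions a'."""
    policy = []
    for row in occupation:
        p_s = sum(row)  # p_s: total occupation mass of state s
        if p_s <= 0:
            raise ValueError("every state needs positive occupation mass")
        policy.append([p / p_s for p in row])
    return policy


# Hypothetical occupation measure over 2 states and 2 actions (entries sum to 1).
p = [[0.1, 0.3],
     [0.2, 0.4]]
pi_b = behavior_policy(p)
```

Each row of `pi_b` is a probability distribution over actions, so the policy is well defined only for states with positive occupation mass.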
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: stat.ML (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Model-based Bootstrap of Controlled Markov Chains
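The general idea of a model-based bootstrap for transition estimators can be sketched as follows: fit the transition kernel and the behavior policy from the observed trajectory, simulate new trajectories from the fitted model, and re-estimate on each one. This is an illustrative sketch only. The count-based estimators, the uniform fallback for unvisited pairs, the percentile interval, and all function names are assumptions of this example, not the paper's algorithm.

```python
import random


def fit_model(traj, n_states, n_actions):
    """Count-based estimates of P(s'|s,a) and pi_b(a|s) from (s, a, s') triples."""
    counts = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    for s, a, s_next in traj:
        counts[s][a][s_next] += 1
    P, pi = [], []
    for s in range(n_states):
        visits = [sum(counts[s][a]) for a in range(n_actions)]
        total = sum(visits)
        # Unvisited states or pairs fall back to uniform (an assumption of this sketch).
        pi.append([v / total if total else 1 / n_actions for v in visits])
        P.append([[c / visits[a] if visits[a] else 1 / n_states
                   for c in counts[s][a]] for a in range(n_actions)])
    return P, pi


def simulate(P, pi, s0, length, rng):
    """Draw one trajectory of (s, a, s') triples from the model (P, pi)."""
    traj, s = [], s0
    for _ in range(length):
        a = rng.choices(range(len(pi[s])), weights=pi[s])[0]
        s_next = rng.choices(range(len(P[s][a])), weights=P[s][a])[0]
        traj.append((s, a, s_next))
        s = s_next
    return traj


def bootstrap_ci(traj, n_states, n_actions, s, a, s_next,
                 n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for one transition probability."""
    rng = random.Random(seed)
    P_hat, pi_hat = fit_model(traj, n_states, n_actions)
    reps = []
    for _ in range(n_boot):
        boot = simulate(P_hat, pi_hat, traj[0][0], len(traj), rng)
        P_star, _ = fit_model(boot, n_states, n_actions)
        reps.append(P_star[s][a][s_next])
    reps.sort()
    lo = reps[int((alpha / 2) * (n_boot - 1))]
    hi = reps[int((1 - alpha / 2) * (n_boot - 1))]
    return P_hat[s][a][s_next], lo, hi


# Hypothetical 2-state, 2-action chain used only to generate example data.
true_P = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.3, 0.7]]]
true_pi = [[0.5, 0.5], [0.4, 0.6]]
data = simulate(true_P, true_pi, 0, 2000, random.Random(1))
est, lo, hi = bootstrap_ci(data, 2, 2, s=0, a=0, s_next=0)
```

Resampling from the fitted model, rather than resampling observed transitions directly, preserves the dependence structure of the chain because the behavior policy is re-simulated together with the transitions.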