We define a stationary behavior policy $\bar{\pi}_b(a \mid s) := p_s^{(a)} / p_s$, where $p_s := \sum_{a' \in A} p_s^{(a')}$ is the sum of the ergodic occupation measure by actions
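
As a rough illustration of the definition above, the following sketch normalizes a state-action occupation measure into a stationary policy. The function name `behavior_policy`, the nested-list layout `occupation[s][a]`, and the numeric values are all hypothetical choices made for this example; they do not come from the paper.

```python
def behavior_policy(occupation):
    """Normalize an occupation measure into a stationary policy.

    occupation[s][a] is assumed to hold the occupation mass of the
    state-action pair (s, a). The result satisfies
    policy[s][a] = occupation[s][a] / sum over a' of occupation[s][a'].
    """
    policy = []
    for row in occupation:
        p_s = sum(row)  # total occupation mass of state s, summed over actions
        if p_s <= 0:
            # The ratio is undefined for a state with no occupation mass.
            raise ValueError("state has zero occupation mass; policy undefined")
        policy.append([p_sa / p_s for p_sa in row])
    return policy


# Hypothetical occupation measure over 2 states and 2 actions (entries sum to 1).
occupation = [[0.1, 0.3], [0.45, 0.15]]
policy = behavior_policy(occupation)
```

Each row of `policy` is a probability distribution over actions for one state. A state with zero occupation mass has no well-defined ratio, so the sketch raises an error instead of guessing a default.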

B Auxiliary Lemmas for Theorem 1 B · 2026

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Model-based Bootstrap of Controlled Markov Chains

stat.ML · 2026-05-12 · unverdicted · novelty 6.0

A model-based bootstrap achieves distributional consistency for transition estimators in controlled Markov chains with unknown policies and yields asymptotically valid confidence intervals for offline policy evaluation and optimal policy recovery.

Model-based Bootstrap of Controlled Markov Chains stat.ML · 2026-05-12 · unverdicted · none · ref 8