Worst-case aware policy optimization for robust reinforcement learning. arXiv preprint arXiv:2002.08033,
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: stat.ML (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Counterfactually Safe Reinforcement Learning
Formalizes counterfactual individual harm in RL and introduces a two-stage policy learning method with finite-sample guarantees on sub-optimality gap and harm rate control.