Robust dynamic programming
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- Optimistic Policy Learning under Pessimistic Adversaries with Regret and Violation Guarantees
  RHC-UCRL is the first algorithm for safety-constrained RL under explicit adversarial dynamics, providing sub-linear regret and constraint violation guarantees by maintaining optimism over both agent and adversary policies.
- Regret-Optimal Control for Finite-State Systems
  A nested dynamic program using the Regret-Bellman operator computes regret-optimal policies that interpolate between MDP and robust controllers for finite-state systems.