Safe-Support Q-Learning trains Q-functions and policies in reinforcement learning without ever visiting unsafe states by constraining the behavior policy to a safe set and using KL-regularized Bellman targets in a two-stage framework.
[3]Boyd, S., and V andenberghe, L.Convex optimization
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Safe-Support Q-Learning: Learning without Unsafe Exploration
Safe-Support Q-Learning trains Q-functions and policies in reinforcement learning without ever visiting unsafe states by constraining the behavior policy to a safe set and using KL-regularized Bellman targets in a two-stage framework.