ReGuard discovers network scenarios where RL controllers perform 43-64% worse than achievable and reduces those gaps by 79-85% with lightweight rule-based protection that preserves normal performance.
Stabilizing off-policy q-learning via bootstrapping error reduction.Advances in neural information processing systems, 32, 2019
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
Target-Aligned Bellman Backup (TABB) improves cross-domain offline RL by selecting source transitions according to their contribution to accurate target-domain Bellman target estimation.
Injecting RTG into states outside the autoregressive sequence yields shorter, more efficient Decision Transformers that outperform the original on offline RL tasks.
citing papers explorer
-
Worst-Case Discovery and Runtime Protection for RL-Based Network Controllers
ReGuard discovers network scenarios where RL controllers perform 43-64% worse than achievable and reduces those gaps by 79-85% with lightweight rule-based protection that preserves normal performance.
-
Target-Aligned Bellman Backup for Cross-domain Offline Reinforcement Learning
Target-Aligned Bellman Backup (TABB) improves cross-domain offline RL by selecting source transitions according to their contribution to accurate target-domain Bellman target estimation.
-
Beyond Autoregressive RTG: Conditioning via Injection Outside Sequential Modeling in Decision Transformer
Injecting RTG into states outside the autoregressive sequence yields shorter, more efficient Decision Transformers that outperform the original on offline RL tasks.