3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Representative citing papers:
Framework using parameterized Signal Temporal Logic specifications to shape rewards for PPO-based RL, yielding tighter velocity tracking and more stable training than hand-crafted rewards on Barkour quadruped in MuJoCo simulation.
Iterative refinement of unknown MDP parameters allows repeated satisfaction of PAC conditions, yielding asymptotic optimality for reachability specifications in RL.
citing papers explorer
- Decoupled Behavioral Cloning for Scalable Inductive Generalization in RL from Specifications
  DIBS decouples task policy learning via RL from evolution function learning via behavioral cloning to achieve more stable training and better generalization than prior RL and meta-RL methods for inductive generalization from specifications.
- Learning Gait-Aware Quadruped Locomotion with Temporal Logic Specifications
  Framework using parameterized Signal Temporal Logic specifications to shape rewards for PPO-based RL, yielding tighter velocity tracking and more stable training than hand-crafted rewards on Barkour quadruped in MuJoCo simulation.
- Reinforcement Learning for Reachability: Guaranteeing Asymptotic Optimality
  Iterative refinement of unknown MDP parameters allows repeated satisfaction of PAC conditions, yielding asymptotic optimality for reachability specifications in RL.