Title resolution pending

1909 · arXiv 1909.11591

3 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Decoupled Behavioral Cloning for Scalable Inductive Generalization in RL from Specifications

cs.AI · 2026-05-30 · unverdicted · novelty 6.0

DIBS decouples task policy learning via RL from evolution function learning via behavioral cloning to achieve more stable training and better generalization than prior RL and meta-RL methods for inductive generalization from specifications.

Learning Gait-Aware Quadruped Locomotion with Temporal Logic Specifications

cs.RO · 2026-07-01 · unverdicted · novelty 5.0

A framework that uses parameterized Signal Temporal Logic specifications to shape rewards for PPO-based RL, yielding tighter velocity tracking and more stable training than hand-crafted rewards on the Barkour quadruped in MuJoCo simulation.

Reinforcement Learning for Reachability: Guaranteeing Asymptotic Optimality

cs.LG · 2026-05-23 · unverdicted · novelty 4.0

Iterative refinement of unknown MDP parameters allows repeated satisfaction of PAC conditions, yielding asymptotic optimality for reachability specifications in RL.

citing papers explorer

Showing 3 of 3 citing papers.

Decoupled Behavioral Cloning for Scalable Inductive Generalization in RL from Specifications
cs.AI · 2026-05-30 · unverdicted · none · ref 40
Learning Gait-Aware Quadruped Locomotion with Temporal Logic Specifications
cs.RO · 2026-07-01 · unverdicted · none · ref 23
Reinforcement Learning for Reachability: Guaranteeing Asymptotic Optimality
cs.LG · 2026-05-23 · unverdicted · none · ref 7
