arXiv preprint arXiv:1611.03852
3 Pith papers cite this work.
representative citing papers
- Trust Region Inverse Reinforcement Learning: Explicit Dual Ascent using Local Policy Updates
  TRIRL enables explicit dual-ascent IRL via trust-region local policy updates that guarantee monotonic improvement without requiring a full RL solve per iteration, achieving 2.4x the aggregate IQM (interquartile mean) of prior imitation methods and recovering generalizable rewards. A minimal sketch of the dual-ascent pattern follows this list.
- Stage Light is Sequence$^2$: Multi-Light Control via Imitation Learning
  SeqLight maps music to multi-light HSV control: SkipBART predicts a global color, and hybrid imitation learning in a goal-conditioned MDP then decomposes that color across the individual lights. A hypothetical interface stub follows this list.
- Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review
  Maximum entropy reinforcement learning is equivalent to exact probabilistic inference under deterministic dynamics and to variational inference under stochastic dynamics. The worked correspondence follows this list.
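
A minimal sketch of the dual-ascent loop the TRIRL summary describes, assuming a small tabular MDP with a linear reward r = phi @ theta. Everything here (the soft-greedy target, the KL cap eps_kl, the step sizes) is an illustrative assumption rather than the authors' implementation; the point is only the pattern: the reward parameters take a dual-ascent step on the feature-expectation gap, while the policy takes a local, KL-limited (trust-region) step instead of being re-solved to optimality each iteration.

```python
# Dual-ascent IRL with trust-region local policy updates: illustrative
# sketch only, on a random tabular MDP (not the TRIRL authors' code).
import numpy as np

rng = np.random.default_rng(0)
S, A, F = 5, 2, 4                           # states, actions, reward features
P = rng.dirichlet(np.ones(S), size=(S, A))  # transition probs P[s, a, s']
phi = rng.normal(size=(S, F))               # state features; reward = phi @ theta
gamma, eps_kl, lr = 0.95, 0.05, 0.5         # discount, KL trust region, dual step

def occupancy(pi, horizon=200):
    """Discounted state-occupancy vector under policy pi (uniform start)."""
    d = np.full(S, 1.0 / S)
    kernel = np.einsum('sa,sap->sp', pi, P)  # state-to-state kernel under pi
    out = np.zeros(S)
    for t in range(horizon):
        out += gamma ** t * d
        d = d @ kernel
    return (1 - gamma) * out

def policy_step(pi, r):
    """Local trust-region-style update: mix toward a soft-greedy target,
    shrinking the step until the worst per-state KL is within eps_kl."""
    q = r[:, None] + gamma * np.einsum('sap,p->sa', P, r)  # one-step lookahead
    target = np.exp(q - q.max(axis=1, keepdims=True))
    target /= target.sum(axis=1, keepdims=True)
    for alpha in (1.0, 0.5, 0.25, 0.1, 0.05):
        new = (1 - alpha) * pi + alpha * target
        if np.max(np.sum(new * np.log(new / pi), axis=1)) <= eps_kl:
            return new
    return pi  # reject the step if no size fits inside the trust region

# Synthetic "expert": run the same local updates under a hidden true reward.
theta_true = rng.normal(size=F)
pi_expert = np.full((S, A), 1.0 / A)
for _ in range(50):
    pi_expert = policy_step(pi_expert, phi @ theta_true)
mu_expert = occupancy(pi_expert) @ phi       # expert feature expectations

theta, pi = np.zeros(F), np.full((S, A), 1.0 / A)
for _ in range(200):
    pi = policy_step(pi, phi @ theta)                 # local update, no full RL solve
    theta += lr * (mu_expert - occupancy(pi) @ phi)   # dual ascent on reward params
print("feature-expectation gap:", np.abs(mu_expert - occupancy(pi) @ phi).max())
```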
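
The SeqLight summary describes a two-stage pipeline. Since neither SkipBART nor the hybrid imitation learner is reproduced here, the stub below only illustrates the decomposition interface: a goal-conditioned step that takes the globally predicted HSV color and assigns per-light values. Every name and the hand-coded update rule are hypothetical placeholders for the learned policy.

```python
# Hypothetical sketch of the goal-conditioned decomposition step; the
# learned SeqLight policy would replace the hand-coded rule below.
from dataclasses import dataclass
import numpy as np

@dataclass
class LightState:
    hsv: np.ndarray  # current (h, s, v) of one fixture, each in [0, 1]

def decompose(goal_hsv: np.ndarray, lights: list[LightState],
              step: float = 0.3) -> list[np.ndarray]:
    """Goal-conditioned action: move each light part-way toward the global
    color goal, so the ensemble drifts toward the predicted color."""
    return [light.hsv + step * (goal_hsv - light.hsv) for light in lights]

# Usage: one beat with a globally predicted warm color and three lights.
goal = np.array([0.08, 0.9, 0.8])
rig = [LightState(np.random.default_rng(i).random(3)) for i in range(3)]
print(decompose(goal, rig))
```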
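
The equivalence stated in the review's summary can be made concrete with the standard control-as-inference construction (the one developed in Levine's tutorial); the notation below, with optimality variables $\mathcal{O}_t$ and a variational trajectory distribution $q$, is that construction rather than anything specific to this page.

```latex
% Attach a binary optimality variable to each timestep with likelihood
%   p(O_t = 1 | s_t, a_t) = exp( r(s_t, a_t) ),
% and pick a variational trajectory distribution q that keeps the true
% dynamics but replaces the action prior with a policy pi:
\begin{align}
\log p(\mathcal{O}_{1:T} = 1)
  \;\ge\; \mathbb{E}_{\tau \sim q}\!\left[ \sum_{t=1}^{T}
      r(s_t, a_t) \;+\; \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right].
\end{align}
% The right-hand side is the maximum entropy RL objective.  Under
% deterministic dynamics the bound is tight, so maximizing it performs
% exact inference; under stochastic dynamics it is the variational
% (ELBO) form the summary refers to.
```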