Reward models in deep reinforcement learning: A survey. arXiv preprint arXiv:2506.15421
3 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (3)
representative citing papers
MORL with augmented states for non-linear utilities requires ongoing reward signal access post-deployment.
citing papers explorer
- Occupancy Reward Shaping: Improving Credit Assignment for Offline Goal-Conditioned Reinforcement Learning
Occupancy Reward Shaping extracts goal-reaching rewards from world-model occupancy measures using optimal transport, improving offline goal-conditioned RL performance 2.2x on 13 tasks without changing the optimal policy.
- Multi-objective Reinforcement Learning With Augmented States Requires Rewards After Deployment
MORL with augmented states for non-linear utilities requires ongoing reward signal access post-deployment.
- D-VLA: A High-Concurrency Distributed Asynchronous Reinforcement Learning Framework for Vision-Language-Action Models