Occupancy Reward Shaping extracts goal-reaching rewards from world-model occupancy measures using optimal transport, improving offline goal-conditioned RL performance 2.2x on 13 tasks without changing the optimal policy.
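So we can lower-bound the entire difference:
$$V^{\pi^*}_W(s_1, g) - V^{\pi^*}_W(s_2, g) \ge \gamma^{k-1} \cdot 0 + \sum_{t=0}^{k-2} \gamma^t (1-\gamma)\,\Delta_\Phi = (1-\gamma)\,\Delta_\Phi \sum_{t=0}^{k-2} \gamma^t = (1-\gamma)\,\Delta_\Phi \cdot \frac{1-\gamma^{k-1}}{1-\gamma} = (1-\gamma^{k-1})\,\Delta_\Phi$$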
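The last two equalities in the bound above rely on the finite geometric series, which collapses the k−1 discounted per-step terms into a single closed form. A minimal numerical sketch of that step follows; the function names and the values of `gamma`, `k`, and `delta_phi` are illustrative assumptions, not taken from the paper.

```python
import math


def summed_bound(gamma: float, k: int, delta_phi: float) -> float:
    """Add the per-step terms gamma^t * (1 - gamma) * delta_phi for t = 0..k-2."""
    return sum(gamma**t * (1.0 - gamma) * delta_phi for t in range(k - 1))


def closed_form_bound(gamma: float, k: int, delta_phi: float) -> float:
    """Closed form after applying the geometric series: (1 - gamma^(k-1)) * delta_phi."""
    return (1.0 - gamma ** (k - 1)) * delta_phi


# Hypothetical values chosen only to exercise the identity.
gamma, k, delta_phi = 0.99, 10, 0.5
assert math.isclose(
    summed_bound(gamma, k, delta_phi),
    closed_form_bound(gamma, k, delta_phi),
)
```

For k = 1 the sum is empty and both expressions are zero. As k grows, the closed form approaches `delta_phi` from below whenever 0 < gamma < 1.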
Cited by 1 Pith paper (field: cs.LG; year: 2026; verdict: CONDITIONAL):
Occupancy Reward Shaping: Improving Credit Assignment for Offline Goal-Conditioned Reinforcement Learning