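So we can lower bound the entire difference:

$$
V^{\pi^*}_{W}(s_1, g) - V^{\pi^*}_{W}(s_2, g) \;\geq\; \gamma^{k-1} \cdot 0 + \sum_{t=0}^{k-2} \gamma^{t} (1-\gamma)\,\Delta\Phi = (1-\gamma)\,\Delta\Phi \sum_{t=0}^{k-2} \gamma^{t} = (1-\gamma)\,\Delta\Phi \, \frac{1-\gamma^{k-1}}{1-\gamma} = \left(1-\gamma^{k-1}\right)\Delta\Phi
$$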
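The last steps of this bound rest on the finite geometric series, which collapses the discounted sum over t = 0, ..., k−2 to the closed form (1 − γ^(k−1))∆Φ. As a sanity check, the sketch below compares the term-by-term sum with the closed form numerically. The function names and the sample values of γ, k, and ∆Φ are illustrative assumptions, not taken from the source.

```python
import math


def discounted_gap_sum(gamma: float, k: int, delta_phi: float) -> float:
    """Term-by-term sum of gamma^t * (1 - gamma) * delta_phi for t = 0..k-2."""
    return sum(gamma**t * (1.0 - gamma) * delta_phi for t in range(k - 1))


def discounted_gap_closed_form(gamma: float, k: int, delta_phi: float) -> float:
    """Closed form (1 - gamma^(k-1)) * delta_phi from the geometric series."""
    return (1.0 - gamma ** (k - 1)) * delta_phi


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the identity.
    for gamma in (0.5, 0.9, 0.99):
        for k in (1, 2, 5, 50):
            lhs = discounted_gap_sum(gamma, k, delta_phi=2.0)
            rhs = discounted_gap_closed_form(gamma, k, delta_phi=2.0)
            print(f"gamma={gamma}, k={k}: sum={lhs:.6f}, closed form={rhs:.6f}")
```

For k = 1 the sum is empty and both expressions are zero. As k grows, the bound approaches ∆Φ.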

We know $M^{\pi_D} \geq 0$

1 Pith paper cites this work.


representative citing papers

Occupancy Reward Shaping: Improving Credit Assignment for Offline Goal-Conditioned Reinforcement Learning

cs.LG · 2026-04-22 · conditional · novelty 6.0

Occupancy Reward Shaping extracts goal-reaching rewards from world-model occupancy measures using optimal transport, improving offline goal-conditioned RL performance 2.2x on 13 tasks without changing the optimal policy.

