Mollified Value Learning

2026 · cs.LG · arXiv 2602.23280

1 Pith paper cites this work.

abstract

Offline goal-conditioned reinforcement learning (GCRL) learns goal-reaching behaviors from static datasets, but accurate value estimation remains challenging under limited state-action coverage. Existing physics-informed approaches address this by imposing pointwise, distance-like geometric constraints derived from Hamilton-Jacobi-Bellman (HJB) optimality principles, often through first-order partial differential equations such as the Eikonal equation. However, enforcing local consistency through explicit differential structure can become unstable in complex, high-dimensional environments. Our key insight is to instead reinterpret distance-like constraints as an expectation over a local spatial measure. By aggregating constraints over this measure rather than evaluating them pointwise, the objective acts as a spatial mollifier, inducing distance-like value geometry without requiring expensive differential operators. We refer to this as Mollified Value Learning (MVL). Experiments across navigation and high-dimensional robotic manipulation tasks show that MVL learns structured value representations and improves goal-reaching performance when used with implicit value representation learning methods. Open-source code is available at https://github.com/HrishikeshVish/MVL.
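
To make the contrast between a pointwise differential constraint and a constraint aggregated over a local measure more concrete, the sketch below may help. It is not the MVL objective from the paper, which the abstract does not specify. It assumes, purely for illustration, a Gaussian local measure around each state and a finite-difference, 1-Lipschitz-style condition |V(s+d, g) - V(s, g)| <= ||d|| averaged over sampled offsets d, so no gradient of V is needed. The names mollified_lipschitz_penalty, neg_distance, sigma, and n_samples are all hypothetical.

```python
import numpy as np


def mollified_lipschitz_penalty(value_fn, states, goals, sigma=0.1,
                                n_samples=16, seed=0):
    """Average a distance-like constraint over a local Gaussian measure.

    Illustrative assumption (not the paper's loss): penalize violations of
    |V(s + d, g) - V(s, g)| <= ||d|| for offsets d ~ N(0, sigma^2 I).
    Only evaluations of value_fn are used, no differential operator.
    """
    rng = np.random.default_rng(seed)
    # Offsets drawn from the local measure: (n_samples, batch, dim).
    deltas = rng.normal(scale=sigma, size=(n_samples,) + states.shape)
    base = value_fn(states, goals)                          # (batch,)
    perturbed = value_fn(states[None] + deltas, goals[None])  # (n_samples, batch)
    change = np.abs(perturbed - base[None])                 # value change per offset
    step = np.linalg.norm(deltas, axis=-1)                  # offset length
    violation = np.maximum(0.0, change - step)              # hinge on the constraint
    # Monte Carlo estimate of the expectation over the local measure.
    return float(np.mean(violation ** 2))


def neg_distance(states, goals):
    """Toy value function: negative Euclidean distance to the goal."""
    return -np.linalg.norm(states - goals, axis=-1)
```

Under these assumptions, a value function that is an exact negative Euclidean distance incurs essentially no penalty by the triangle inequality, while a rescaled copy of it is penalized. In a learning setting, a term of this kind would be added to whatever value objective is being trained, with sigma controlling how wide the local averaging is.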

representative citing papers

Physics-informed Goal-Conditioned Reinforcement Learning under Hybrid Contact Dynamics

cs.RO · 2026-05-28 · unverdicted · novelty 5.0

Analysis reveals Pi-GCRL degradation in contact-rich tasks due to hybrid dynamics; contact-aware and hierarchical formulations are proposed to extend it to manipulation.