pith. sign in

arxiv: 2605.25740 · v1 · pith:MPKDLZVJnew · submitted 2026-05-25 · 💻 cs.LG

Latent Representation Alignment for Offline Goal-Conditioned Reinforcement Learning

classification 💻 cs.LG
keywords valuegoal-conditionedlavllearningofflinedatasetsgcrlperformance
0
0 comments X
read the original abstract

Offline goal-conditioned reinforcement learning (GCRL) provides a practical framework for obtaining goal-reaching policies from fixed datasets. However, learning a reliable goal-conditioned value function in long-horizon tasks remains challenging. In this paper, we identify erroneous generalization in goal-conditioned value functions as a fundamental bottleneck, and demonstrate that appropriate inductive bias in the value function is crucial for addressing the bottleneck. Building on these findings, we propose Latent-Aligned Value Learning (LAVL), an offline GCRL algorithm that integrates latent-representation-based value generalization with hierarchical planning in a unified framework. Extensive experiments on OGBench demonstrate that LAVL consistently outperforms existing offline GCRL methods, achieving the highest performance on 20 out of 22 datasets. Notably, LAVL exhibits strong performance in long-horizon tasks and trajectory stitching datasets, where prior methods suffer significant performance degradation. Our code is available at https://github.com/oh-lab/LAVL.git.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.