Test-Time Graph Search for Goal-Conditioned Reinforcement Learning

Claas Voelcker; Evgenii Opryshko; Igor Gilitschenski; Junwei Quan; Yilun Du

arxiv: 2510.07257 · v2 · pith:ZLVRPNYBnew · submitted 2025-10-08 · 💻 cs.LG

Evgenii Opryshko, Junwei Quan, Claas Voelcker, Yilun Du, Igor Gilitschenski

classification 💻 cs.LG

keywords tasks, goal-conditioned, graph, long-horizon, planning, search, ttgs, value

Offline goal-conditioned reinforcement learning (GCRL) often struggles with long-horizon tasks, where errors in value estimation accumulate and produce unreliable policies. It is typically assumed that effective long-term planning is infeasible without specialized training. In contrast, our work demonstrates that existing GCRL policies can complete long-horizon tasks when combined with a lightweight, training-free planning wrapper. We find that standard goal-conditioned value functions encode locally consistent geometric structure sufficient for planning. Our approach, Test-Time Graph Search (TTGS), constructs a graph over the offline dataset and employs an adaptive subgoal selection strategy. To address unreliable value estimates during shortest-path search, we propose a novel mechanism that softly penalizes long-distance transitions. Our method incurs negligible computational overhead and requires no additional supervision or parameter updates. On the OGBench benchmark, TTGS significantly boosts success rates across multiple base learners and tasks, with primary gains on challenging long-horizon locomotion tasks where some success rates are improved from near-zero to over 90%, often matching or outperforming methods that require complex auxiliary training. Code and videos can be found at https://ktolnos.github.io/ttgs.
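
The abstract's idea of shortest-path search with a soft penalty on long-distance transitions can be illustrated with a minimal sketch. This is not the authors' implementation: the quadratic penalty, the `threshold` and `weight` parameters, the function names, and the use of Euclidean distance as a stand-in for a value-derived distance are all assumptions made for illustration.

```python
import heapq
import math


def soft_penalized_cost(d, threshold=1.0, weight=10.0):
    """Hypothetical soft penalty: an edge longer than `threshold` pays an
    extra quadratic cost instead of being removed from the graph."""
    excess = max(0.0, d - threshold)
    return d + weight * excess ** 2


def shortest_path(nodes, dist_fn, start, goal, threshold=1.0, weight=10.0):
    """Dijkstra over a fully connected graph whose nodes are dataset states.

    `dist_fn` is a placeholder for a distance estimate; in a GCRL setting it
    would be derived from a learned goal-conditioned value function.
    """
    n = len(nodes)
    best = [math.inf] * n      # best known cost from `start` to each node
    parent = [None] * n        # predecessor on the best known path
    best[start] = 0.0
    heap = [(0.0, start)]
    while heap:
        cost, u = heapq.heappop(heap)
        if cost > best[u]:
            continue           # stale heap entry
        if u == goal:
            break
        for v in range(n):
            if v == u:
                continue
            edge = soft_penalized_cost(dist_fn(nodes[u], nodes[v]), threshold, weight)
            new_cost = cost + edge
            if new_cost < best[v]:
                best[v] = new_cost
                parent[v] = u
                heapq.heappush(heap, (new_cost, v))
    # Walk predecessors back from the goal to recover the node sequence.
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


# Toy example: four states on a line, Euclidean distance as the stand-in.
nodes = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
path = shortest_path(nodes, math.dist, start=0, goal=3)
print(path)
```

In this toy setup the direct jump from the first to the last state is still allowed, but its penalized cost is far higher than the chain of short hops, so the search returns intermediate nodes that could serve as subgoals for a base policy.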

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Refining Compositional Diffusion for Reliable Long-Horizon Planning
cs.RO 2026-05 unverdicted novelty 6.0

RCD steers compositional diffusion sampling toward high-density coherent plans by combining reconstruction-error guidance with overlap consistency, outperforming prior methods on locomotion, manipulation, and pixel-ba...

SAGAS: Semantic-Aware Graph-Assisted Stitching for Offline Temporal Logic Planning
cs.RO 2025-11 unverdicted novelty 6.0

SAGAS stitches learned reachability graphs from fragmented offline data with symbolic Buchi search to produce cost-aware plans for unseen LTL tasks executed by a frozen controller.