Pith · machine review for the scientific record

arxiv: 2506.10137 · v3 · submitted 2025-06-11 · 💻 cs.LG · cs.AI

Recognition: unknown

Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning

classification 💻 cs.LG cs.AI
keywords generalization · representations · combinatorial · representation · tasks · cloning · consistency · gcbc
Original abstract

While goal-conditioned behavior cloning (GCBC) methods can perform well on in-distribution training tasks, they do not necessarily generalize zero-shot to tasks that require conditioning on novel state-goal pairs, i.e. combinatorial generalization. In part, this limitation can be attributed to a lack of temporal consistency in the state representation learned by BC; if temporally correlated states are properly encoded to similar latent representations, then the out-of-distribution gap for novel state-goal pairs would be reduced. We formalize this notion by demonstrating how encouraging long-range temporal consistency via successor representations (SR) can facilitate generalization. We then propose a simple yet effective representation learning objective, $\text{BYOL-}\gamma$ for GCBC, which theoretically approximates the successor representation in the finite MDP case through self-predictive representations, and achieves competitive empirical performance across a suite of challenging tasks requiring combinatorial generalization.
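The objective described in the abstract can be sketched as a BYOL-style self-predictive loss with a discounted horizon: an online encoder's latent for the current state is regressed onto a target encoder's latent of a future state, where the future offset is sampled geometrically so long-range pairs receive weight $\gamma^k$, mirroring the successor-representation discounting. The sketch below is illustrative only and uses linear "encoders"; the function names (`encode`, `byol_gamma_loss`) and the exact sampling/normalization choices are assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(W, s):
    # Linear stand-in for a neural encoder: latent = W @ s.
    return W @ s

def byol_gamma_loss(W_online, W_target, W_pred, states, gamma=0.9):
    """Sketch of a BYOL-gamma-style self-predictive loss.

    For each state s_t, sample an offset k ~ Geometric(1 - gamma),
    so horizon k is reached with probability proportional to gamma^(k-1),
    then regress the predicted online latent of s_t onto the target
    latent of s_{t+k} with a cosine-similarity (BYOL) loss. In the full
    method the target branch is an EMA copy with stopped gradients.
    """
    T = len(states)
    losses = []
    for t in range(T - 1):
        # Geometric horizon sampling, clipped to stay inside the trajectory.
        k = min(int(rng.geometric(1.0 - gamma)), T - 1 - t)
        z_pred = W_pred @ encode(W_online, states[t])   # online branch + predictor
        z_targ = encode(W_target, states[t + k])        # target branch (no gradient)
        z_pred = z_pred / (np.linalg.norm(z_pred) + 1e-8)
        z_targ = z_targ / (np.linalg.norm(z_targ) + 1e-8)
        losses.append(2.0 - 2.0 * float(z_pred @ z_targ))  # in [0, 4]
    return float(np.mean(losses))
```

Minimizing this loss pushes temporally correlated states toward nearby latents at geometrically discounted horizons, which is the temporal-consistency property the abstract argues reduces the out-of-distribution gap for novel state-goal pairs.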

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Multi-scale Predictive Representations for Goal-conditioned Reinforcement Learning

    cs.LG 2026-05 unverdicted novelty 6.0

    Ms.PR applies multi-scale predictive supervision to enforce goal-directed alignment in latent spaces for offline GCRL, yielding improved representation quality and performance on vision and state-based tasks.

  2. Improving Zero-Shot Offline RL via Behavioral Task Sampling

    cs.AI 2026-04 unverdicted novelty 6.0

    Extracting task vectors from the offline dataset for policy training improves zero-shot offline RL performance by an average of 20% over random sampling baselines.