Learning to reach goals via iterated supervised learning
6 Pith papers cite this work.
6 representative citing papers
-
Multi-scale Predictive Representations for Goal-conditioned Reinforcement Learning
Ms.PR applies multi-scale predictive supervision to enforce goal-directed alignment in latent spaces for offline GCRL, yielding improved representation quality and performance on vision and state-based tasks.
-
Predictive but Not Plannable: RC-aux for Latent World Models
RC-aux corrects spatiotemporal mismatch in reconstruction-free latent world models by adding multi-horizon prediction and reachability supervision, improving planning performance on goal-conditioned pixel-control tasks.
-
Refining Compositional Diffusion for Reliable Long-Horizon Planning
RCD steers compositional diffusion sampling toward high-density coherent plans by combining reconstruction-error guidance with overlap consistency, outperforming prior methods on locomotion, manipulation, and pixel-based long-horizon tasks.
-
GCImOpt: Learning efficient goal-conditioned policies by imitating optimal trajectories
GCImOpt trains compact goal-conditioned neural policies by imitating efficiently generated optimal trajectories, achieving high success rates and near-optimal performance on cart-pole, quadcopter, and robot arm tasks while running thousands of times faster than optimization solvers.
-
From Answers to Arguments: Toward Trustworthy Clinical Diagnostic Reasoning with Toulmin-Guided Curriculum Goal-Conditioned Learning
CGCL progressively trains LLMs to generate Toulmin-structured clinical diagnostic arguments across three curriculum stages, achieving accuracy and reasoning quality comparable to RL methods with improved stability and efficiency.
-
Efficient Hierarchical Implicit Flow Q-learning for Offline Goal-conditioned Reinforcement Learning
Proposes mean flow policies and a LeJEPA loss to overcome Gaussian-policy limitations and weak subgoal generation in hierarchical offline GCRL, reporting strong results on OGBench state- and pixel-based tasks.