Mixed citations

Learning latent action world models in the wild

2026 · arXiv 2601.05230

Mixed citation behavior. Most common role is background (60%).

12 Pith papers citing it

Background 60% of classified citations

citation-role summary

background 5

citation-polarity summary

background 3 · unclear 2

representative citing papers

Latent Actions from Factorized Transition Effects under Agent Ambiguity

cs.AI · 2026-06-29 · unverdicted · novelty 7.0

OTF decomposes transitions into reusable primitives to form action-like latents in OTF-LAM and OTF-LAM-Dino, enabling zero-shot transfer and competitive policy learning under visual ambiguity.

Render, Don't Decode: Weight-Space World Models with Latent Structural Disentanglement

cs.CV · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

NOVA represents world states as INR weights for decoder-free rendering, compactness, and unsupervised disentanglement of background, foreground, and motion in video world models.

Latent State Design for World Models under Sufficiency Constraints

cs.AI · 2026-05-03 · unverdicted · novelty 7.0

World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.

DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos

cs.RO · 2026-02-06 · unverdicted · novelty 7.0

DreamDojo is a foundation world model pretrained on the largest human video dataset to date that uses continuous latent actions to transfer interaction knowledge and achieves controllable physics simulation after robot post-training.

CLAW: Learning Continuous Latent Action World Models via Adversarial Latent Regularization

cs.RO · 2026-06-02 · unverdicted · novelty 6.0

CLAW is an end-to-end self-supervised method that learns semantically meaningful continuous latent actions and predictive world models from action-free videos to support imitation learning and goal-directed planning.

DiLA: Disentangled Latent Action World Models

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

DiLA uses content-structure disentanglement driven by predictive bottlenecks to create semantically structured latent actions for high-fidelity video world models.

SCAR: Self-Supervised Continuous Action Representation Learning

cs.RO · 2026-05-13 · unverdicted · novelty 6.0

SCAR proposes a joint inverse-forward dynamics framework to learn transferable continuous action representations across embodiments from visual data using regularization and adversarial invariance.

Why Latent Actions Fail, and How to Prevent It

cs.CV · 2026-05-13 · unverdicted · novelty 6.0

Extending linear LAMs to model exogenous state shows that standard reconstruction encodes future exogenous information in latent actions, whereas endogenous-focused spaces and auxiliary objectives such as action supervision enforce consistency across noise.

GazeVLA: Learning Human Intention for Robotic Manipulation

cs.RO · 2026-04-24 · unverdicted · novelty 6.0

GazeVLA pretrains on large human egocentric datasets to capture gaze-based intention, then finetunes on limited robot data with chain-of-thought reasoning to achieve better robotic manipulation performance than baselines.

Human Cognition in Machines: A Unified Perspective of World Models

cs.RO · 2026-04-17 · unverdicted · novelty 6.0

The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and proposes Epistemic World Models as a new category for scientific discovery agents.

Emergent Semantic Representations in World Models through Physical Interaction without Linguistic Supervision

cs.LG · 2026-05-22 · unverdicted · novelty 5.0

A VAE world model trained on embodied exploration develops latent representations aligned with physical geometry, with metrics improving together and collapsing together under high KL regularization.

PhyWorld: Physics-Faithful World Model for Video Generation

cs.CV · 2026-05-19 · unverdicted · novelty 5.0

PhyWorld improves temporal consistency and physical plausibility in video world models via flow matching fine-tuning followed by DPO on physics preference pairs, with reported gains on VBench and a custom physical-faithfulness benchmark.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Emergent Semantic Representations in World Models through Physical Interaction without Linguistic Supervision

cs.LG · 2026-05-22 · unverdicted · none · ref 9

A VAE world model trained on embodied exploration develops latent representations aligned with physical geometry, with metrics improving together and collapsing together under high KL regularization.
