Deep hierarchical planning from pixels
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles: background 2
polarities: background 2
representative citing papers
LLMs form an inner monologue from closed-loop language feedback to improve high-level instruction completion in simulated and real robotic rearrangement and kitchen manipulation tasks.
Disentangled World Models transfer semantic knowledge from distracting videos to RL world models via offline pretraining and latent distillation to improve sample efficiency under visual variations.
citing papers explorer
- Zero-shot World Models Are Developmentally Efficient Learners
  A zero-shot visual world model trained on one child's experience achieves broad competence on physical understanding benchmarks while matching developmental behavioral patterns.
- Inner Monologue: Embodied Reasoning through Planning with Language Models
  LLMs form an inner monologue from closed-loop language feedback to improve high-level instruction completion in simulated and real robotic rearrangement and kitchen manipulation tasks.
- Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning
  Disentangled World Models transfer semantic knowledge from distracting videos to RL world models via offline pretraining and latent distillation to improve sample efficiency under visual variations.
- Learning to Theorize the World from Observation