3D-Anchored Lookahead Planning for Persistent Robotic Scene Memory via World-Model-Based MCTS

· 2026 · cs.RO · arXiv 2604.11302

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

open full Pith review browse 2 citing papers arXiv PDF

abstract

We present 3D-Anchored Lookahead Planning (3D-ALP), a System 2 reasoning engine for robotic manipulation that combines Monte Carlo Tree Search (MCTS) with a 3D-consistent world model as the rollout oracle. Unlike reactive policies that evaluate actions from the current camera frame only, 3D-ALP maintains a persistent camera-to-world (c2w) anchor that survives occlusion, enabling accurate replanning to object positions that are no longer directly observable. On a 5-step sequential reach task requiring spatial memory (Experiment E3), 3D-ALP achieves 0.650 0.109 success rate on memory-required steps versus 0.006 0.008 for a greedy reactive baseline ({\Delta}=+0.645), while step 5 success reaches 0.822 against 0.000 for greedy. An ablation study (30 episodes, 3 seeds) isolates tree search spatial memory as the primary driver (+0.533, 82% of gain) with additional benefit from deeper lookahead (+0.111, 17%). We also identify and resolve four structural failure modes in applying UCT-MCTS (Upper Confidence Bounds applied to Trees [10]) to continuous robotic manipulation.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

VLA-Pro: Cross-Task Procedural Memory Transfer for Vision-Language-Action Models

cs.RO · 2026-05-28 · unverdicted · novelty 5.0

VLA-Pro improves cross-task generalization in vision-language-action models by storing task-specific LoRA adapters as procedural memories and retrieving/fusing them at inference.

World Action Models: A Survey

cs.RO · 2026-06-18 · unverdicted · novelty 3.0

A survey that clarifies boundaries and organizes World Action Models by generation requirements and predictive substrates, identifying a trend toward generating less of the future.

citing papers explorer

Showing 2 of 2 citing papers after filters.

VLA-Pro: Cross-Task Procedural Memory Transfer for Vision-Language-Action Models cs.RO · 2026-05-28 · unverdicted · none · ref 36 · internal anchor
VLA-Pro improves cross-task generalization in vision-language-action models by storing task-specific LoRA adapters as procedural memories and retrieving/fusing them at inference.
World Action Models: A Survey cs.RO · 2026-06-18 · unverdicted · none · ref 142 · internal anchor
A survey that clarifies boundaries and organizes World Action Models by generation requirements and predictive substrates, identifying a trend toward generating less of the future.

3D-Anchored Lookahead Planning for Persistent Robotic Scene Memory via World-Model-Based MCTS

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer