DIRECT is a multimodal-context router that allocates test-time compute across chain-of-thought depth, model size, and memory history for VLM embodied planners, improving the success-cost Pareto frontier and matching stronger models at up to 65% lower latency on benchmarks and a physical Franka arm.
Fast ECoT: Efficient embodied chain-of-thought via thoughts reuse
6 Pith papers cite this work.
representative citing papers
VISUALTHINK-VLA uses visual evidence tokens and selective routing to reach top success rates on VLA benchmarks while cutting reasoning latency from multi-second to sub-second levels.
OBEYED-VLA improves VLA robustness in cluttered real-world manipulation by disentangling perception into VLM-based object-centric grounding and geometry-aware stages, then fine-tuning the policy only on single-object demonstrations.
VLA benchmark success rates cannot distinguish semantic generalization from physical reasoning due to an identifiability gap in current evaluation protocols.
ResDreamer proposes a residual-reconstruction hierarchical world model for purely self-supervised visual foresight that claims SOTA sample and parameter efficiency in open-world RL.
REIS reduces inference redundancy in embodied robotic planning via lightweight gating and routing while preserving task performance on ALFRED and real robots.
citing papers explorer
- Self-supervised Hierarchical Visual Reasoning with World Model
ResDreamer proposes a residual-reconstruction hierarchical world model for purely self-supervised visual foresight that claims SOTA sample and parameter efficiency in open-world RL.