{"total":11,"items":[{"citing_arxiv_id":"2606.18375","ref_index":60,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"PAIWorld: A 3D-Consistent World Foundation Model for Robotic Manipulation","primary_cat":"cs.RO","submitted_at":"2026-06-16T18:23:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"PAIWorld adds explicit geometric cross-view mechanisms and 3D distillation to DiT world models to achieve multi-view 3D consistency in robotic manipulation benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.16533","ref_index":165,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Kairos: A Native World Model Stack for Physical AI","primary_cat":"cs.AI","submitted_at":"2026-06-15T10:37:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Kairos is a native world model stack using cross-embodiment pretraining, hybrid linear temporal attention with theoretical error bounds, and deployment-aware co-design, reporting top performance on embodied benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.13674","ref_index":25,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"RepWAM: World Action Modeling with Representation Visual-Action Tokenizers","primary_cat":"cs.CV","submitted_at":"2026-06-11T17:59:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RepWAM introduces representation visual-action tokenizers to pretrain world action models that jointly model future visual states and latent actions under instructions for improved robot 
manipulation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.12217","ref_index":7,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Making Foresight Actionable: Repurposing Representation Alignment in World Action Models","primary_cat":"cs.CV","submitted_at":"2026-06-10T15:31:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"AGRA is an Action-Grounded Representation Alignment objective that aligns intermediate video diffusion features with semantic representations to make world action model hidden states more useful for low-level robot control, improving localization, affordance, and robustness.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.10363","ref_index":18,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"HiMem-WAM: Hierarchical Memory-Gated World Action Models for Robotic Manipulation","primary_cat":"cs.RO","submitted_at":"2026-06-09T03:22:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"HiMem-WAM integrates hierarchical latent actions and boundary-aware memory gates into world action models to enhance robustness and performance on memory-dependent long-horizon robotic tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.09457","ref_index":32,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"$\\omega$-EVA: Envision, Verify, and Act with Latent Interactive World Models","primary_cat":"cs.RO","submitted_at":"2026-06-08T13:12:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ω-EVA is a three-stage latent world model framework that trains 
action-conditioned dynamics, a language-conditioned flow policy, and a tri-branch refiner to improve embodied action generation in simulation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.04233","ref_index":54,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"What Are We Actually Benchmarking in Robot Manipulation?","primary_cat":"cs.RO","submitted_at":"2026-06-02T21:33:28+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LIBERO and CALVIN fail multiple proposed diagnostics for shortcut solvability, statistical significance, overfitting, and data dependence, while a tiny 0.09B probe reaches near-SOTA on LIBERO.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.01955","ref_index":69,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"WALL-WM: Carving World Action Modeling at the Event Joints","primary_cat":"cs.RO","submitted_at":"2026-06-01T09:14:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"WALL-WM introduces event-grounded Vision-Language-Action pretraining that uses semantic events as the atomic unit to address granularity mismatch in world action models and reports state-of-the-art generalization.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.28132","ref_index":5,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Which Pretraining Paradigm Better Serves Spatial Intelligence? 
An Empirical Comparison of Vision-Language and Video Generation Models","primary_cat":"cs.CV","submitted_at":"2026-05-27T08:20:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"VLMs excel at semantic and grouping tasks while VGMs are stronger on dense geometry and camera motion, with naive fusion yielding balanced representations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.27947","ref_index":30,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"SANTS: A State-Adaptive Scheduler for World Action Models","primary_cat":"cs.RO","submitted_at":"2026-05-27T04:40:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SANTS adaptively chooses denoising depth in video-based robot action diffusion policies using a state-dependent stopping hazard and noise ratio, trained via downstream action reward to reduce latency.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12090","ref_index":111,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"World Action Models: The Next Frontier in Embodied AI","primary_cat":"cs.RO","submitted_at":"2026-05-12T13:10:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"PAD [21], VideoVLA [94], UWM [20], DreamZero [17], CosmosPolicy [16], FLARE [95], UVA [96] FRAPPE [97], CoVAR [98], LDA1B [99], WAV [100], DUST [101], LingBotVA [18], AIM 
[102] DexWorldModel [103], FastWAM [104], MotuBrain [105] AdaWorldPolicy [106], DiT4DiT [107], Motus [19], Act2Goal [108], PhysGen [22], GigaWorld-Policy [109], UD-VLA [110], X-WAM [111] Training data Robot-centric Teleoperation QT-Opt [112], MIME [113], RoboNet [114], RoboTurk-Real [115], BridgeData [116], MT-Opt [117] BC-Z [118], RT-1 [119], Language-Table [120], BridgeData v2 [121], Jaco Play [122] Cable Routing Dataset [123], RH20T [124], OXE [125], DROID [126], RH20T-P [127], RoboMIND [128] ARIO [129], RoboData [130], DexCap [131], FuSe [132], AgiBot World [133], REASSEMBLE [134]"}],"limit":50,"offset":0}