SafeManip is a new benchmark that applies LTLf monitors to assess temporal safety properties across eight categories in robotic manipulation, demonstrating that task success frequently fails to ensure safe execution in vision-language-action policies.
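The core idea of an LTLf safety monitor can be illustrated with a minimal sketch. This is a hypothetical example, not SafeManip's actual API: it monitors the finite-trace property G(¬collision) ("no collision at any step"), where the `collision` flag and the trace dictionaries are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class GloballyNot:
    """Monitor for the LTLf safety property G(!prop):
    permanently violated as soon as `prop` holds at any step."""
    prop: str
    violated: bool = False

    def step(self, state: dict) -> bool:
        """Feed one step of a finite trace; return True while the property holds."""
        if state.get(self.prop, False):
            self.violated = True
        return not self.violated

monitor = GloballyNot("collision")
trace = [{"collision": False}, {"collision": False}, {"collision": True}]
verdicts = [monitor.step(s) for s in trace]
print(verdicts)  # the violation at step 3 flips the verdict to False for good
```

This illustrates why task success and safety can diverge: a policy can finish the task (trace ends) while the monitor has already recorded a permanent violation along the way.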
Robocasa365: A large-scale simulation framework for training and benchmarking generalist robots
4 Pith papers cite this work.

4 representative citing papers (2026):
MemCompiler introduces state-conditioned memory compilation that dynamically selects and compiles relevant memory into text and latent guidance, yielding up to 129% gains over no-memory baselines and 60% lower latency across multiple embodied benchmarks.
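The "state-conditioned selection" part of this idea can be sketched in a few lines. This is an assumed toy version, not MemCompiler's actual algorithm: memories are scored by cosine similarity between a made-up current-state embedding and stored memory embeddings, and the top-k are compiled into a text prompt.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def compile_memory(state_vec, memories, k=2):
    """Select the k memories most similar to the current state and
    join their text into a single prompt fragment."""
    ranked = sorted(memories, key=lambda m: cosine(state_vec, m["vec"]), reverse=True)
    return "\n".join(m["text"] for m in ranked[:k])

# Invented example memories with 2-D embeddings for illustration only.
memories = [
    {"text": "drawer opens by pulling the handle", "vec": [1.0, 0.0]},
    {"text": "stove knob turns clockwise", "vec": [0.0, 1.0]},
    {"text": "drawer handle is on the left", "vec": [0.9, 0.1]},
]
prompt = compile_memory([1.0, 0.2], memories, k=2)
print(prompt)
```

The latency claim in the summary is plausible under this design: compiling only the state-relevant subset keeps the conditioning text short, so the downstream model processes fewer tokens than with a full memory dump.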
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond proposes a levels × laws taxonomy for world models in AI agents, defining L1–L3 capabilities across physical, digital, social, and scientific regimes while reviewing over 400 works to outline a roadmap for advanced agentic modeling.
RLDX-1 outperforms frontier VLAs such as π0.5 and GR00T N1.6 on dexterous manipulation benchmarks, reaching 86.8% success on ALLEX humanoid tasks versus around 40% for the baselines.