A vision-language framework generates text-based rigid-body scene configurations from videos using motion reasoning and optical flow, reporting 0.30 IoU on CLEVRER (7x over baselines) and transfer to 235 real videos.
gradSim: Differentiable simulation for system identification and visuomotor control
4 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (4)
verdicts: UNVERDICTED (4)
roles: background (1)
polarities: background (1)
representative citing papers
DRIS improves zero-shot sim-to-real transfer for reactive catching by maintaining and acting on sets of randomized dynamics instances instead of single instances per episode.
Embodied AI requires query-conditioned world models that select the simplest physical abstraction sufficient to answer intervention queries.
The authors develop a differentiable simulator enforcing Markovian dynamics on a position-velocity manifold and using a mass-aligned preconditioner with a soft Fischer-Burmeister operator to produce stable gradients for frictional contact in large-deformation scenarios.
citing papers explorer
- Physically Viable World Models: A Case for Query-Conditioned Embodied AI
Embodied AI requires query-conditioned world models that select the simplest physical abstraction sufficient to answer intervention queries.