A vision-language framework generates text-based rigid-body scene configurations from videos using motion reasoning and optical flow, reporting 0.30 IoU on CLEVRER (7x over baselines) and transfer to 235 real videos.
gradSim: Differentiable simulation for system identification and visuomotor control
4 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (4)
verdicts: UNVERDICTED (4)
roles: background (1)
polarities: background (1)
representative citing papers
DRIS improves zero-shot sim-to-real transfer for reactive catching by maintaining and acting on sets of randomized dynamics instances instead of single instances per episode.
Embodied AI requires query-conditioned world models that select the simplest physical abstraction sufficient to answer intervention queries.
The authors develop a differentiable simulator enforcing Markovian dynamics on a position-velocity manifold and using a mass-aligned preconditioner with a soft Fischer-Burmeister operator to produce stable gradients for frictional contact in large-deformation scenarios.
citing papers explorer
Zero-Shot Sim-to-Real Robot Learning: A Dexterous Manipulation Study on Reactive Catching