LIME formulates language-conditioned camera motion as predicting SE(3) target poses from RGB and intent text, using mined multi-intent supervision from egocentric video and a flow-matching pose head.
2 Pith papers cite this work. Polarity classification is still indexing.
Pith papers citing it: 2
Fields: cs.RO (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Instant-Fold enables execution of multiple deformable object manipulation modes from a single demonstration via a flow-matching transformer policy that transfers zero-shot from simulation to real robots.
Citing papers explorer
- LIME: Learning Intent-aware Camera Motion from Egocentric Video
  LIME formulates language-conditioned camera motion as predicting SE(3) target poses from RGB and intent text, using mined multi-intent supervision from egocentric video and a flow-matching pose head.
- Instant-Fold: In-Context Imitation Learning for Deformable Object Manipulation
  Instant-Fold enables execution of multiple deformable object manipulation modes from a single demonstration via a flow-matching transformer policy that transfers zero-shot from simulation to real robots.