Inverse dynamics prediction is added as an auxiliary task to reduce state aliasing in VLA models by directly supervising the vision encoder on action-relevant visual distinctions using only standard observation-action pairs.
LM4LV: A frozen large language model for low-level vision tasks.arXiv preprint arXiv:2405.15734, 2024
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Mitigating State Aliasing in Vision-Language-Action Models via Inverse Dynamics Learning
Inverse dynamics prediction is added as an auxiliary task to reduce state aliasing in VLA models by directly supervising the vision encoder on action-relevant visual distinctions using only standard observation-action pairs.