SkillNav decomposes VLN into skill-specific agents trained on synthetic data and routed by a VLM, achieving competitive benchmark results and SOTA generalization on GSA-R2R.
If the current image shows the agent at or beyond the target of a sub-instruction, that step can be considered completed.
1 Pith paper cites this work. Polarity classification is still indexing.
Breaking Down and Building Up: Mixture of Skill-Based Vision-and-Language Navigation Agents