pith. machine review for the scientific record.

arXiv: 2601.03136 · v2 · submitted 2026-01-06 · cs.CL · cs.AI · cs.RO


Limited Linguistic Diversity in Embodied AI Datasets

classification: cs.CL, cs.AI, cs.RO
keywords: datasets, language, dataset, linguistic, instruction, limited, used, variety

Language plays a critical role in Vision-Language-Action (VLA) models, yet the linguistic characteristics of the datasets used to train and evaluate these systems remain poorly documented. In this work, we present a systematic dataset audit of several widely used VLA corpora, aiming to characterize what kinds of instructions these datasets actually contain and how much linguistic variety they provide. We quantify instruction language along complementary dimensions, including lexical variety, duplication and overlap, semantic similarity, and syntactic complexity. Our analysis shows that many datasets rely on highly repetitive, template-like commands with limited structural variation, yielding a narrow distribution of instruction forms. We position these findings as descriptive documentation of the language signal available in current VLA training and evaluation data, intended to support more detailed dataset reporting, more principled dataset selection, and targeted curation or augmentation strategies that broaden language coverage.
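
As a rough illustration of two of the dimensions named in the abstract (lexical variety and duplication), the sketch below computes a type-token ratio and an exact-duplicate rate over a list of instruction strings. It is not the authors' method: the function name `instruction_diversity_stats`, the lowercase and whitespace normalization, and the sample instructions are all assumptions made for this example, and the abstract does not specify which metrics or tokenization the paper uses.

```python
from collections import Counter


def instruction_diversity_stats(instructions):
    """Return simple diversity statistics for a list of instruction strings.

    Hypothetical helper for illustration only; the normalization
    (lowercasing, whitespace collapsing) is an assumption.
    """
    # Normalize so that trivially different copies count as duplicates.
    normalized = [" ".join(text.lower().split()) for text in instructions]
    # Whitespace tokenization: crude, but enough for a sketch.
    tokens = [tok for text in normalized for tok in text.split()]
    counts = Counter(normalized)

    total = len(normalized)
    return {
        "num_instructions": total,
        "unique_instructions": len(counts),
        # Share of instructions that repeat an earlier one exactly.
        "duplicate_rate": (1 - len(counts) / total) if total else 0.0,
        # Distinct word forms divided by total words (lexical variety).
        "type_token_ratio": (len(set(tokens)) / len(tokens)) if tokens else 0.0,
    }


# Made-up sample instructions, not drawn from any dataset in the paper.
sample = [
    "pick up the red block",
    "Pick up the  red block",
    "pick up the blue block",
    "open the drawer",
]
stats = instruction_diversity_stats(sample)
print(stats)
```

A low type-token ratio together with a high duplicate rate is the kind of pattern one would expect from template-like commands. Semantic similarity and syntactic complexity would need embeddings and a parser respectively, which this sketch leaves out.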

This paper has not been read by Pith yet.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Steerable Vision-Language-Action Policies for Embodied Reasoning and Hierarchical Control

    cs.RO · 2026-02 · unverdicted · novelty 6.0

    Steerable VLAs trained on rich synthetic commands at subtask, motion, and pixel levels enable VLMs to steer robot behavior more effectively, outperforming prior hierarchical baselines on real-world manipulation and ge...