PreferenceASR is a preference-aware ASR test set built from seven corpora that shows model rankings change when user output-style instructions are considered.
Preference-ASR: A Preference-Aware Test Set for Benchmarking ASR in the Era of Speech LLMs
1 Pith paper cite this work. Polarity classification is still indexing.
abstract
Popular ASR test sets adopt inconsistent conventions for numbers, disfluencies, entities, and casing, while standard normalizers erase the format distinctions users care about. Current benchmarks therefore cannot measure whether a model follows user preferences for output style. We introduce PreferenceASR, a test set evaluating ASR systems on their ability to follow natural-language preference instructions across four categories: normalization, entities, disfluencies, and case. Built from seven open-source corpora via a two-stage LLM-assisted pipeline with human verification, it is evaluated with a preference-aware normalizer that selectively skips steps matching the active instruction. Benchmarking four models shows rankings shift across preference types, exposing quality differences traditional evaluation obscures. We publicly release the dataset.
fields
cs.CL 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Preference-ASR: A Preference-Aware Test Set for Benchmarking ASR in the Era of Speech LLMs
PreferenceASR is a preference-aware ASR test set built from seven corpora that shows model rankings change when user output-style instructions are considered.