Fine-tuning Whisper on Swiss German speech with subtitle supervision yields an honest baseline of 25.6% WER (13.8% cWER) and shows that prior SOTA claims of 17% WER stem from benchmark contamination, which makes 13.88% WER reachable with no dialect training.
Subtitle-Aligned Fine-Tuning of Whisper for Swiss German ASR: Benchmark Contamination, Convention Mismatch, and an Honest Baseline at 25.6% WER (13.8% cWER)