Canary-1B-v2 & Parakeet-TDT-0.6B-v3: Efficient and High-Performance Models for Multilingual ASR and AST
5 Pith papers cite this work.
citing papers explorer
- AppTek Call-Center Dialogues: A Multi-Accent Long-Form Benchmark for English ASR
A new multi-accent long-form call-center dialogue dataset for English ASR evaluation shows substantial performance variation across accents and segmentation methods.
- Contextual Earnings-22: A Speech Recognition Benchmark with Custom Vocabulary in the Wild
Contextual Earnings-22 is a new benchmark dataset showing that scaled keyword prompting and boosting both deliver significantly better accuracy on custom vocabularies than standard academic tests.
- ASPIRin: Action Space Projection for Interactivity-Optimized Reinforcement Learning in Full-Duplex Speech Language Models
ASPIRin decouples speaking timing from token content via binary action space projection and applies GRPO with rule-based rewards to optimize interactivity in SLMs without semantic collapse or repetition.
- Assessing the Impact of Noise and Speech Enhancement on the Intelligibility of Speech Codecs
Classical codecs prove more robust to noise than neural codecs, speech enhancement significantly helps noise-affected codecs, and listening effort plus ASR-based metrics add useful nuance beyond basic intelligibility scores.
- BUT System Description for CHiME-9 MCoRec Challenge
BUT's CHiME-9 MCoRec system conditions Parakeet-v2 ASR on AV-HuBERT visuals for 33.7% WER and uses Qwen3.5 LLM for hierarchical clustering to reach 0.97 F1, beating the baseline by 16.2% WER and 0.15 F1 on the development set.