SALM: Speech-augmented language model with in-context learning for speech recognition and translation
3 Pith papers cite this work.
Citing papers by year: 2026 (3). Verdicts: 3 unverdicted.
Citing papers
- Phonemes vs. Projectors: An Investigation of Speech-Language Interfaces for LLM-based ASR
  Phoneme-based interfaces match or surpass projector-based ones for LLM-based ASR, especially in low-resource languages, and a BPE-phoneme hybrid yields further improvements.
- Safety-Oriented Evaluation of Language Understanding Systems for Air Traffic Control
  A consequence-aware evaluation framework applied to LLMs in air traffic control (ATC) finds a peak Risk Score of only 0.69 despite high macro-F1, with errors concentrated in high-impact entities.
- Afrispeech Semantics: Evaluating Audio Semantic Reasoning in Spoken Language Models Across Domains and Accents
  Audio language models are benchmarked on five semantic and paralinguistic reasoning tasks, revealing limitations in handling spoken audio evidence, accent variation, and domain shifts.