Speech LLMs are contextual reasoning transcribers
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Verdicts: unverdicted (2)
citing papers explorer
- LLM can Read Spectrogram: Encoder-free Speech-Language Modeling
  Mel-LLM shows an LLM can achieve competitive ASR by directly ingesting pre-processed Mel spectrogram patches through a linear projection layer.
- LaSR: Context-Aware Speech Recognition via Latent Reasoning
  LaSR improves context-aware terminology recognition in speech LLMs by aligning latent CoT supervision on acoustic regions and introducing latent reasoning periods, shown on a new academic corpus to outperform standard fine-tuning without added latency.