Decoder-only architecture for speech recognition with CTC prompts and text-only training

· 2023 · arXiv 2309.08876

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

eess.AS · 2026-06-10 · unverdicted · novelty 5.0

Empirical sweep finds 4.17 Hz frame rate plus intermediate-layer alignment optimal for speech QA under frozen text LLM backbone.

Which Speech Representation Better Matches Text-Native Reasoning? A Study of Speech-Text Alignment on Frame Rate and Representation
eess.AS · 2026-06-10 · unverdicted · none · ref 53