A Finetuned SpeechLLM for Joint Multi-Granular L2 Assessment and Natural-Language Rationales

Aditya Kamlesh Parikh; Catia Cucchiarini; Cristian Tejedor-Garcia; Helmer Strik

arxiv: 2606.09470 · v1 · pith:7VF7EEWEnew · submitted 2026-06-08 · 💻 cs.CL · cs.AI

classification 💻 cs.CL cs.AI

keywords labels · assessment · accuracy · faithfulness · level · model · multi-granular · natural-language

Automated L2 speech assessment can assign proficiency labels, but it often lacks interpretability. We propose a rubric-guided SpeechLLM for multi-aspect, multi-granular assessment, trained with a hybrid objective that combines supervised fine-tuning and Bounded Direct Preference Optimization. In a single response, the model predicts ordinal labels at the sentence level (accuracy, fluency, prosody), predicts accuracy at the word and phoneme levels, and generates a natural-language rationale. On SpeechOcean762, our approach matches or outperforms single-granularity models while remaining competitive with prior approaches. We analyze rationale reliability along two axes, self-consistency with the model's own predictions and alignment with ground-truth labels, using sentiment consistency (plausibility) and mention-based agreement (faithfulness). Rationales are plausible at the sentence level, but faithfulness degrades at the word/phoneme level, where references are sparse and weakly aligned with token-level labels.
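
The hybrid objective mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the Bounded DPO formulation, so the preference term below is the standard (unbounded) DPO loss, swapped in purely for illustration. The function names, the mixing weight `lam`, and the temperature `beta` are hypothetical and are not taken from the paper.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(pol_chosen: float, pol_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    Inputs are sequence log-probabilities of the chosen and rejected
    responses under the policy and under a frozen reference model.
    NOTE: this is plain DPO, not the bounded variant used in the paper.
    """
    margin = (pol_chosen - ref_chosen) - (pol_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)


def hybrid_loss(sft_nll: float,
                pol_chosen: float, pol_rejected: float,
                ref_chosen: float, ref_rejected: float,
                beta: float = 0.1, lam: float = 0.5) -> float:
    """Weighted sum of an SFT negative log-likelihood and a preference loss.

    `lam` is an assumed mixing weight; the paper's actual weighting
    scheme is not stated in the abstract.
    """
    pref = dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta)
    return sft_nll + lam * pref
```

In this sketch, a policy that raises the log-probability of the preferred response relative to the reference lowers the preference term, while the SFT term keeps the output anchored to the annotated labels and rationale.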
