arxiv: 2606.09470 · v1 · pith:7VF7EEWEnew · submitted 2026-06-08 · 💻 cs.CL · cs.AI

A Finetuned SpeechLLM for Joint Multi-Granular L2 Assessment and Natural-Language Rationales

classification 💻 cs.CL cs.AI
keywords labels, assessment, accuracy, faithfulness, level, model, multi-granular, natural-language

Automated L2 speech assessment can assign proficiency labels, but often lacks interpretability. We propose a rubric-guided SpeechLLM for multi-aspect, multi-granular assessment, trained with a hybrid objective combining supervised fine-tuning and Bounded Direct Preference Optimization. In a single response, the model jointly predicts ordinal labels at the sentence level (accuracy, fluency, prosody) and at the word/phoneme level (accuracy), and generates a natural-language rationale. On SpeechOcean762, our approach matches or outperforms single-granularity models while remaining competitive with prior approaches. We analyze rationale reliability along two axes, self-consistency with model predictions and alignment with ground-truth labels, using sentiment consistency (plausibility) and mention-based agreement (faithfulness). Rationales are plausible at the sentence level, but faithfulness degrades at the word/phoneme level: references are sparse and weakly aligned with token-level labels.
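
The hybrid objective mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so this example makes several assumptions: it uses the standard DPO loss rather than the Bounded variant the paper names, it combines the two terms as a weighted sum, and the names `hybrid_loss`, `pref_weight`, and `beta` are hypothetical. Inputs are plain sequence log-probabilities, so no model is needed to run it.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sft_loss(token_logprobs: list[float]) -> float:
    # Supervised fine-tuning term: mean negative log-likelihood
    # of the reference response tokens.
    return -sum(token_logprobs) / len(token_logprobs)


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    # Standard (unbounded) DPO loss on one preference pair.
    # Assumption: the paper's Bounded DPO differs from this form.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)


def hybrid_loss(token_logprobs: list[float],
                policy_chosen: float, policy_rejected: float,
                ref_chosen: float, ref_rejected: float,
                beta: float = 0.1, pref_weight: float = 1.0) -> float:
    # Assumed combination: SFT term plus a weighted preference term.
    pref = dpo_loss(policy_chosen, policy_rejected,
                    ref_chosen, ref_rejected, beta)
    return sft_loss(token_logprobs) + pref_weight * pref
```

When the policy and the reference model assign the same log-probabilities to both responses, the preference term equals log 2, and it shrinks as the policy favors the chosen response more strongly than the reference does.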
