SleepVLM: Explainable and Rule-Grounded Sleep Staging via a Vision-Language Model

Gang Pan; Guifeng Deng; Haiteng Jiang; Jiquan Wang; Junyi Xie; Mengfan Niu; Pan Wang; Sha Zhao; Shuying Rao; Tao Li

arxiv: 2603.26738 · v3 · pith:RLRV3557new · submitted 2026-03-22 · cs.CV · cs.AI · cs.CL

Guifeng Deng, Pan Wang, Mengfan Niu, Jiquan Wang, Shuying Rao, Junyi Xie, Xi'ang Chen, Sha Zhao, Gang Pan, Wanjun Guo, Tao Li, Haiteng Jiang

keywords: sleep, sleepvlm, model, rule-grounded, staging, accuracy, achieved, automated

While automated sleep staging has achieved expert-level accuracy, its clinical adoption is hindered by a lack of auditable reasoning. We introduce SleepVLM, a rule-grounded vision-language model (VLM) that stages sleep from multi-channel polysomnography (PSG) waveform images and generates clinician-readable rationales based on American Academy of Sleep Medicine (AASM) scoring criteria. Using waveform-perceptual pre-training and rule-grounded supervised fine-tuning, SleepVLM achieved a Cohen's kappa of 0.767 on a held-out test set (MASS-SS1) and 0.743 on an external cohort (ZUAMHCS), matching state-of-the-art performance. Independent expert evaluation by two trained sleep technologists further validated the model's reasoning quality, with mean scores of 3.75-3.96 out of 5 across factual accuracy, evidence comprehensiveness, and logical coherence on both datasets. By coupling competitive performance with transparent, rule-based explanations, SleepVLM may improve the trustworthiness and auditability of automated sleep staging in clinical workflows. To facilitate further research in interpretable sleep medicine, we release MASS-EX, a novel expert-annotated dataset.
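The agreement metric quoted in the abstract, Cohen's kappa, corrects raw accuracy for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the chance agreement implied by each rater's label frequencies. The following is a minimal sketch of that standard formula, not the authors' evaluation code; the function name `cohens_kappa` and the eight-epoch label sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohens_kappa(reference, predicted):
    """Cohen's kappa between two equal-length sequences of stage labels."""
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(reference)

    # Observed agreement p_o: fraction of epochs with identical labels.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n

    # Chance agreement p_e: for each label, the product of the two
    # marginal frequencies, summed over all labels.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[label] * pred_counts[label]
                   for label in ref_counts) / (n * n)

    if expected == 1.0:
        # Both sequences use a single identical label; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical example: eight 30-second epochs scored with AASM stage names.
reference_stages = ["W", "N1", "N2", "N2", "N3", "REM", "N2", "W"]
predicted_stages = ["W", "N2", "N2", "N2", "N3", "REM", "N1", "W"]
example_kappa = cohens_kappa(reference_stages, predicted_stages)
```

In this toy case six of eight epochs agree (p_o = 0.75) and the chance agreement is p_e = 0.25, so kappa is 2/3, which is lower than the raw accuracy because part of the agreement would occur by chance.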
