pith. sign in

arxiv: 2602.23605 · v2 · pith:C2IB7ODKnew · submitted 2026-02-27 · 💻 cs.AI

SleepLM: Natural-Language Intelligence for Human Sleep

classification 💻 cs.AI
keywords sleepsleeplmalignmentgenerationcaptioncross-modaldatahuman
0
0 comments X
read the original abstract

We present SleepLM, a family of sleep-language foundation models that enable human sleep alignment, interpretation, and interaction with natural language. Despite the critical role of sleep, learning-based sleep analysis systems operate in closed label spaces (e.g., predefined stages or events) and fail to describe, query, or generalize to novel sleep phenomena. SleepLM bridges natural language and multimodal polysomnography, enabling language-grounded representations of sleep physiology. To support this alignment, we introduce a multilevel sleep caption generation pipeline that enables the curation of the first large-scale sleep-text dataset, comprising over 100K hours of data from more than 10,000 individuals. Furthermore, we present a unified pretraining objective that combines contrastive alignment, caption generation, and signal reconstruction to better capture physiological fidelity and cross-modal interactions. Extensive experiments on real-world sleep understanding tasks verify that SleepLM outperforms state-of-the-art in zero-shot and few-shot learning, cross-modal retrieval, and sleep captioning. Importantly, SleepLM also exhibits intriguing capabilities including language-guided event localization, targeted insight generation, and zero-shot generalization to unseen tasks. All code and data will be open-sourced.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. GlucoFM: A Dual-Stream Foundation Model for Continuous Glucose Monitoring

    cs.LG 2026-05 unverdicted novelty 7.0

    GlucoFM decomposes CGM traces into dual state-event streams, pretrains on 109k hours of unlabeled data, and reports superior subject-disjoint performance on seven clinical tasks across four cohorts.

  2. TimeSRL: Generalizable Time-Series Behavioral Modeling via Semantic RL-Tuned LLMs -- A Case Study in Mental Health

    cs.LG 2026-05 unverdicted novelty 6.0

    TimeSRL uses semantic abstractions from time-series data optimized via reinforcement learning to achieve better cross-dataset generalization than standard ML or LLM baselines in mental health prediction.