Unsupervised domain transfer: Overcoming signal degradation in sleep monitoring by increasing scoring realism
Pith reviewed 2026-05-10 14:05 UTC · model grok-4.3
The pith
A discriminator trained on realistic sleep stage sequences can guide unsupervised feature alignment to recover scoring accuracy lost to signal degradation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The approach attaches a discriminator network, trained to distinguish real from generated hypnograms, to a pretrained U-Sleep model and uses it to align target-domain features during fine-tuning. On synthetic degradations such as noise, filtering, or amplitude changes, this recovers part of the lost scoring performance (Cohen's kappa gains of 0.03 to 0.29), though not all of it.
What carries the argument
Discriminator-guided fine-tuning, in which a network scoring hypnogram realism is used to adapt the feature extractor of the sleep model so that degraded inputs produce realistic stage sequences.
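The idea of letting output realism steer adaptation, with no target labels, can be illustrated with a deliberately tiny toy. This is not the paper's method: the learned discriminator is replaced by a hand-written score that compares stage proportions against source-domain hypnograms, gradient-based fine-tuning of the feature extractor is replaced by a grid search over one input gain, and the three-stage scorer, the feature levels, and all function names are assumptions made up for this sketch.

```python
from collections import Counter

STAGES = (0, 1, 2)                      # three toy "sleep stages"
LEVELS = {0: 1.0, 1: 2.0, 2: 3.0}       # clean feature level per stage (toy assumption)


def score_epochs(features, gain=1.0):
    """Frozen toy scorer: threshold the gain-corrected feature into a stage."""
    stages = []
    for x in features:
        v = gain * x
        stages.append(0 if v < 1.5 else 1 if v < 2.5 else 2)
    return stages


def stage_proportions(hypnogram):
    counts = Counter(hypnogram)
    return [counts[s] / len(hypnogram) for s in STAGES]


def realism(hypnogram, reference_props):
    """Hand-written stand-in for a learned discriminator (higher = more realistic)."""
    props = stage_proportions(hypnogram)
    return -sum(abs(p - r) for p, r in zip(props, reference_props))


def simulate_features(hypnogram, amplitude=1.0):
    """Deterministic toy signal: stage level plus a small repeating wobble."""
    return [amplitude * (LEVELS[s] + 0.1 * ((i % 3) - 1))
            for i, s in enumerate(hypnogram)]


def accuracy(pred, true):
    return sum(p == t for p, t in zip(pred, true)) / len(true)


# Source-domain hypnograms define what "realistic" means; no target labels are used.
source_hypnogram = [0] * 40 + [1] * 100 + [2] * 60
reference_props = stage_proportions(source_hypnogram)

# Target night: same stage mix in a different order, amplitude attenuated to 60 %.
target_truth = [1] * 50 + [2] * 60 + [0] * 40 + [1] * 50   # for evaluation only
target_features = simulate_features(target_truth, amplitude=0.6)

# "Adaptation": choose the input gain whose output hypnogram looks most realistic.
candidate_gains = [k / 10 for k in range(5, 31)]
best_gain = max(
    candidate_gains,
    key=lambda g: realism(score_epochs(target_features, g), reference_props),
)

acc_before = accuracy(score_epochs(target_features), target_truth)
acc_after = accuracy(score_epochs(target_features, best_gain), target_truth)
print(f"gain={best_gain}, accuracy before={acc_before:.2f}, after={acc_after:.2f}")
```

In this toy the attenuated input pushes every epoch toward lower stages, so the unadapted hypnogram has an implausible stage mix, and the gain that restores a plausible mix also restores the labels. Whether the same holds for real degradations is exactly the load-bearing premise examined further down.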
If this is right
- Cohen's kappa rises by 0.03 to 0.29 for the tested signal distortions.
- Scoring performance never falls below the unadapted baseline in any transfer.
- Adapted models approach but do not equal the accuracy of models trained with full supervision on the target domain.
- On the one real domain shift tested, a mismatch between two sleep studies, the benefit was statistically insignificant.
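All of these statements are expressed in Cohen's kappa, which corrects the observed agreement p_o between two scorings for the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch of the computation follows; the function name and the stage labels are illustrative, not taken from the paper's code.

```python
from collections import Counter


def cohens_kappa(reference, predicted):
    """Cohen's kappa between two equal-length label sequences."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(reference)
    # Observed agreement: fraction of epochs with identical labels.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    # Chance agreement: product of the two marginal label frequencies, summed.
    ref_counts, pred_counts = Counter(reference), Counter(predicted)
    expected = sum(ref_counts[lab] * pred_counts[lab] for lab in ref_counts) / (n * n)
    if expected == 1.0:
        # Kappa is undefined when both sequences are the same constant label;
        # returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


manual = ["W", "W", "N2", "N2"]
automatic = ["W", "W", "N2", "W"]
print(cohens_kappa(manual, automatic))
```

For the four-epoch example, three of four epochs agree (p_o = 0.75) and chance agreement is 0.5, giving a kappa of 0.5.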
Where Pith is reading between the lines
- Output realism constraints may serve as a general regularizer for other physiological time-series tasks where valid label sequences are easy to characterize.
- The method suggests that sequence-level structure in sleep staging can be leveraged to reduce sensitivity to sensor artifacts even when direct input matching is unavailable.
- Combining this discriminator signal with other unsupervised techniques such as cycle-consistent translation could be tested on the same degradation suite.
- One could measure whether the learned discriminator indirectly flags particular artifact types by examining its activations on known failure cases.
Load-bearing premise
That enforcing realism on the output hypnogram sequence is sufficient to correct for feature misalignment caused by input signal degradation without introducing new errors.
What would settle it
Running the adapted model on a fresh set of recordings with a previously untested degradation and finding that its Cohen's kappa is no higher than the unadapted model's, or finding that the benefit stays insignificant on further real inter-study mismatches.
Original abstract
Objective: Investigate whether hypnogram 'realism' can be used to guide an unsupervised method for handling arbitrary types of signal degradation in mobile sleep monitoring.
Approach: Combining a pretrained, state-of-the-art 'u-sleep' model with a 'discriminator' network, we align features from a target domain with a feature space learned during pretraining. To test the approach, we distort the source domain with realistic signal degradations, to see how well the method can adapt to different types of degradation. We compare the performance of the resulting model with best-case models designed in a supervised manner for each type of transfer.
Main Results: Depending on the type of distortion, we find that the unsupervised approach can increase Cohen's kappa by as little as 0.03 and up to 0.29, and that for all transfers, the method does not decrease performance. However, the approach never quite reaches the estimated theoretical optimal performance, and when tested on a real-life domain mismatch between two sleep studies, the benefit was insignificant.
Significance: 'Discriminator-guided fine tuning' is an interesting approach to handling signal degradation for 'in the wild' sleep monitoring, with some promise. In particular, what it says about sleep data in general is interesting. However, more development will be necessary before using it 'in production'.
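The abstract does not spell out how the source domain is distorted. As a rough illustration of the three kinds of degradation named in the summary above (noise, filtering, amplitude changes), the sketch below applies each to a list of samples. The function names, the Gaussian noise model, and the moving-average filter are assumptions for illustration and are not the paper's actual distortion pipeline.

```python
import random


def add_noise(signal, sigma, seed=0):
    """Additive Gaussian noise with standard deviation `sigma` (seeded for repeatability)."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def scale_amplitude(signal, factor):
    """Uniform amplitude change by a constant factor."""
    return [factor * x for x in signal]


def moving_average(signal, width):
    """Crude low-pass filter: centred moving average, truncated at the edges."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


clean = [0.0, 3.0, 0.0, -3.0, 0.0]
print(scale_amplitude(clean, 0.5))
print(moving_average(clean, 3))
print(add_noise(clean, sigma=0.1))
```

Each function returns a new list of the same length, so degraded and clean recordings stay aligned epoch by epoch and can be scored against the same reference hypnogram.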
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes an unsupervised domain adaptation technique for sleep staging that combines a pretrained U-Sleep model with a discriminator network trained on hypnogram realism to align features from degraded target domains. Synthetic signal degradations are used to test adaptation, yielding Cohen's kappa gains of 0.03–0.29 with no performance decrease across transfers, though the method falls short of estimated theoretical optima. A real-life domain shift between two sleep studies produces an insignificant benefit.
Significance. If the approach can be strengthened to deliver reliable gains on authentic domain mismatches, it would be a meaningful contribution to unsupervised adaptation for mobile biosignal monitoring, particularly by leveraging scoring realism as a supervisory signal. The mixed outcomes (positive on synthetic, null on real) and the explicit acknowledgment that further development is required temper the immediate impact, but the core idea remains worth pursuing with additional validation.
major comments (2)
- Abstract / Main Results: The central claim that the method handles arbitrary signal degradations without performance loss is load-bearing on the real-life transfer result, yet this transfer yields only an insignificant benefit. This discrepancy indicates that the synthetic distortion regime may not capture the structure of genuine mismatches, requiring explicit analysis of why the discriminator-guided alignment fails to improve performance here.
- Approach / Main Results: No statistical details (sample sizes, confidence intervals, p-values, or exact training procedures) are supplied for either the synthetic or real-life experiments. This absence prevents assessment of whether the reported kappa gains (0.03–0.29) are robust or whether the real-life null result reflects low statistical power rather than a fundamental limitation.
minor comments (2)
- Abstract: The phrase 'estimated theoretical optimal performance' is used without a clear definition or derivation; a brief explanation of how this optimum is computed would improve clarity.
- Significance: The statement that 'more development will be necessary before using it in production' is appropriate but could be expanded with concrete next steps (e.g., larger real-world cohorts or alternative discriminators).
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive review. The comments highlight important aspects of our claims and experimental reporting. We address each major comment below and have revised the manuscript accordingly to improve clarity and rigor.
- Referee: Abstract / Main Results: The central claim that the method handles arbitrary signal degradations without performance loss is load-bearing on the real-life transfer result, yet this transfer yields only an insignificant benefit. This discrepancy indicates that the synthetic distortion regime may not capture the structure of genuine mismatches, requiring explicit analysis of why the discriminator-guided alignment fails to improve performance here.
  Authors: We agree that the real-life result tempers the strength of broader claims about arbitrary degradations. The abstract and main results already note that the real-life benefit was insignificant, while synthetic transfers showed no performance decrease. The synthetic regime was designed to isolate specific, realistic signal degradations (e.g., noise, amplitude changes) to test the core mechanism. We have added a new discussion subsection analyzing why the method yielded limited gains on the real domain shift, including potential differences in mismatch structure (e.g., inter-study variations in recording hardware and protocols versus controlled synthetic distortions) and the discriminator's sensitivity to hypnogram realism under those conditions. We have also revised the abstract to more precisely frame the 'no performance loss' finding as holding across the tested synthetic transfers, with the real-life case presented as an initial validation step requiring further work.
  Revision: partial
- Referee: Approach / Main Results: No statistical details (sample sizes, confidence intervals, p-values, or exact training procedures) are supplied for either the synthetic or real-life experiments. This absence prevents assessment of whether the reported kappa gains (0.03–0.29) are robust or whether the real-life null result reflects low statistical power rather than a fundamental limitation.
  Authors: We acknowledge this omission and have revised the manuscript to include the requested details. The updated Methods and Results sections now report: dataset sample sizes (number of nights/recordings per source and target domain), bootstrap-derived 95% confidence intervals for all Cohen's kappa values, p-values from paired statistical tests comparing adapted vs. baseline models, and full training procedures (including optimizer settings, batch sizes, early stopping criteria, and cross-validation scheme). These additions allow direct evaluation of result robustness and statistical power for the real-life experiment.
  Revision: yes
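One common way to obtain the kind of interval the rebuttal mentions is a percentile bootstrap over recordings. The sketch below resamples the per-recording difference in kappa (adapted minus baseline) and reports a 95% interval for its mean. It is a generic illustration, not the authors' procedure: the function name, the resample count, and above all the kappa values are placeholders invented for this example.

```python
import random


def bootstrap_mean_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample recordings with replacement and collect the resampled means.
    means = sorted(
        sum(rng.choice(values) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Placeholder per-recording kappa values, NOT results from the paper.
kappa_baseline = [0.52, 0.61, 0.48, 0.70, 0.55]
kappa_adapted = [0.57, 0.71, 0.50, 0.78, 0.59]

# Pairing by recording removes between-recording variability from the comparison.
differences = [a - b for a, b in zip(kappa_adapted, kappa_baseline)]
ci_low, ci_high = bootstrap_mean_ci(differences)
print(f"mean kappa gain, 95% CI: [{ci_low:.3f}, {ci_high:.3f}]")
```

An interval that excludes zero would support a real gain. An interval that is wide and straddles zero, as might happen with few recordings, would point to low statistical power, which is the distinction the referee asks the authors to make for the real-life transfer.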
Circularity Check
No significant circularity; empirical tests against baselines remain independent
The paper describes an empirical unsupervised adaptation method that combines a fixed pretrained U-Sleep model with a discriminator network trained on hypnogram realism to align target-domain features. Performance is measured via Cohen's kappa on both synthetic signal degradations and a real inter-study mismatch, with explicit comparisons to supervised best-case models for each transfer. No derivation step reduces by construction to a fitted parameter or self-citation; the reported gains (0.03–0.29) and null real-world result are presented as direct experimental outcomes rather than tautological predictions. The method therefore contains independent content relative to its inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Hypnogram realism serves as a valid unsupervised proxy for aligning features across domains with signal degradation.