SPOTR: Spatio-temporal Pooling One-Token Reconstruction for Universal Physiological Signal Self-supervised Learning
Pith reviewed 2026-06-26 11:57 UTC · model grok-4.3
The pith
A single-token global bottleneck in self-supervised pretraining produces stronger representations for EEG, ECG, and PPG signals under linear probing.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SPOTR claims that conditioning reconstruction on a single global token obtained after spatio-temporal pooling yields representations that generalize across heterogeneous physiological datasets. Pretrained on 20 datasets covering EEG, iEEG, ECG, and PPG, these representations raise average linear-probing AUC by 18.49 percent on EEG, 21.71 percent on iEEG, 17.86 percent on ECG, and 4.64 percent on PPG relative to the strongest baseline. The same model also runs with roughly 78 percent lower latency and 52 percent lower peak GPU memory than a representative general-purpose time-series foundation model.
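The abstract does not state whether the AUC gains are absolute percentage points or relative improvements, and the two readings give noticeably different numbers. A minimal sketch of the difference follows; the baseline AUC of 0.70 is a hypothetical value chosen for illustration and does not come from the paper.

```python
# Hypothetical baseline AUC; the paper's actual baseline values are not given here.
BASELINE_AUC = 0.70
REPORTED_GAIN_PCT = 18.49  # the EEG figure quoted in the abstract


def absolute_reading(baseline, gain_pct):
    """Treat the gain as percentage points added to the baseline AUC."""
    return baseline + gain_pct / 100.0


def relative_reading(baseline, gain_pct):
    """Treat the gain as a relative improvement over the baseline AUC."""
    return baseline * (1.0 + gain_pct / 100.0)


print(f"absolute reading: {absolute_reading(BASELINE_AUC, REPORTED_GAIN_PCT):.4f}")
print(f"relative reading: {relative_reading(BASELINE_AUC, REPORTED_GAIN_PCT):.4f}")
```

Under these assumptions the two readings differ by more than five AUC points, which is why the reporting convention matters when comparing against other work.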
What carries the argument
The single-token global bottleneck, which compresses the entire input waveform into one representation before any reconstruction occurs, together with the spatio-temporal compaction module that reduces token count and computation.
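The two mechanisms named above can be illustrated with a toy sketch. It assumes plain average pooling for both the compaction step and the bottleneck; the paper's actual modules are learned and are not described on this page, so the function names (`patchify`, `compact`, `global_token`) and the pooling choices are hypothetical stand-ins.

```python
from statistics import fmean


def patchify(signal, patch_len):
    """Split each channel into non-overlapping patches (one token per patch)."""
    return [
        [ch[i:i + patch_len] for i in range(0, len(ch) - patch_len + 1, patch_len)]
        for ch in signal
    ]


def compact(patches, time_stride):
    """Toy spatio-temporal compaction (assumed, not the paper's module):
    average across channels, then across `time_stride` consecutive patches."""
    n_ch, n_patches = len(patches), len(patches[0])
    patch_len = len(patches[0][0])
    # Spatial pooling: mean over channels at each patch position.
    spatial = [
        [fmean(patches[c][p][k] for c in range(n_ch)) for k in range(patch_len)]
        for p in range(n_patches)
    ]
    # Temporal pooling: mean over groups of consecutive patches.
    tokens = []
    for start in range(0, n_patches - time_stride + 1, time_stride):
        group = spatial[start:start + time_stride]
        tokens.append([fmean(g[k] for g in group) for k in range(patch_len)])
    return tokens


def global_token(tokens):
    """Collapse all tokens into one vector: the single-token bottleneck."""
    dim = len(tokens[0])
    return [fmean(t[k] for t in tokens) for k in range(dim)]


# 4 channels x 32 samples of synthetic data.
signal = [[float(c + t) for t in range(32)] for c in range(4)]
patches = patchify(signal, patch_len=4)
flattened = sum(len(ch) for ch in patches)   # tokens a flattened encoder would see
tokens = compact(patches, time_stride=2)     # tokens after compaction
z = global_token(tokens)                     # the only input a decoder would get
print(flattened, len(tokens), len(z))
```

The point of the sketch is the interface: a decoder downstream of `global_token` receives one vector regardless of how many channels or samples the input had.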
If this is right
- Linear probing on EEG, iEEG, ECG, and PPG datasets shows consistent AUC gains without modality-specific retraining.
- SPOTR runs with roughly 78 percent lower average latency and 52 percent lower peak GPU memory than a representative general-purpose time-series foundation model, a saving the abstract links to the compaction module.
- A single pretrained model serves all four signal types rather than requiring separate per-modality training.
- The framework supports lightweight adaptation suitable for clinical scenarios with scarce labeled data.
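To make the efficiency figures above easier to compare with speed-up factors quoted elsewhere, a "percent lower" value can be converted into a baseline-to-new ratio. The helper below is a small illustrative sketch; the function name is hypothetical and only the 78 and 52 percent inputs come from the abstract.

```python
def ratio_from_reduction(reduction_pct):
    """Convert 'X percent lower' into the ratio baseline / new."""
    if not 0.0 <= reduction_pct < 100.0:
        raise ValueError("reduction must lie in [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


print(f"latency ratio: {ratio_from_reduction(78.0):.2f}x")
print(f"memory ratio:  {ratio_from_reduction(52.0):.2f}x")
```

Read this way, 78 percent lower latency corresponds to the baseline taking about four and a half times as long, and 52 percent lower memory to the baseline using about twice as much.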
Where Pith is reading between the lines
- The same bottleneck principle could be tested on other sequential biomedical recordings such as EMG or fMRI time courses.
- Lower memory and latency may enable on-device inference for wearable physiological monitors.
- If global compression is the key mechanism, similar one-token designs might reduce redundancy issues in non-physiological time-series tasks.
Load-bearing premise
The single-token bottleneck actually blocks shortcut learning from temporal and cross-channel redundancy while still retaining the clinically meaningful signal features needed for downstream tasks.
What would settle it
Linear-probing AUC on a new physiological dataset held out from the 20-dataset pretraining collection fails to exceed the strongest baseline by margins comparable to those reported.
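A held-out check of this kind could be run along the following lines. The sketch assumes embeddings have already been extracted from a frozen encoder, trains only a logistic-regression head, and scores it with a rank-based AUC. All function names are hypothetical and the one-dimensional features are synthetic placeholders, not data from the paper.

```python
import math


def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_linear_probe(feats, labels, lr=0.5, epochs=200):
    """Logistic regression on frozen features; only w and b are trained."""
    dim, n = len(feats[0]), len(feats)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(feats, labels):
            logit = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-logit))
            for k in range(dim):
                grad_w[k] += (p - y) * x[k]
            grad_b += p - y
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_scores(feats, w, b):
    """Linear scores for new embeddings; the encoder itself is never updated."""
    return [sum(wi * xi for wi, xi in zip(w, x)) + b for x in feats]


# Synthetic stand-ins for frozen-encoder embeddings of a held-out dataset.
train_x, train_y = [[0.0], [0.2], [0.8], [1.0]], [0, 0, 1, 1]
test_x, test_y = [[0.1], [0.9]], [0, 1]
w, b = fit_linear_probe(train_x, train_y)
print(roc_auc(test_y, probe_scores(test_x, w, b)))
```

Repeating this for the proposed model and for the strongest baseline on the same held-out split would give the comparison the paragraph above asks for.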
Original abstract
Physiological signals such as EEG, ECG, and PPG are widely used in clinical monitoring. Recent self-supervised learning (SSL) methods offer an attractive way to leverage unlabeled recordings, yet they still fall short in practice. In particular, current SSL methods struggle across heterogeneous datasets, often distorting clinically meaningful structures or learning shortcuts from temporal and cross-channel redundancy. Consequently, existing SSL methods often deliver limited performance under linear probing, a lightweight adaptation setting that better matches real-world medical scenarios. Moreover, most Transformer-based SSL models encode a flattened spatiotemporal token sequence, incurring high computation and memory cost, and are typically developed within a single modality. To address these limitations, we present SPOTR (Spatio-temporal Pooling One-Token Reconstruction), a compress-reconstruct pretraining framework that introduces a single-token global bottleneck for physiological signals. SPOTR compresses each waveform into a single-token representation and reconstructs the signal conditioned only on this representation. Meanwhile, SPOTR introduces an efficient spatio-temporal compaction module to reduce computation and memory cost. Pretrained on 20 datasets spanning EEG, iEEG, ECG, and PPG, SPOTR consistently outperforms the strongest baseline under linear probing, improving average AUC by 18.49%, 21.71%, 17.86%, and 4.64%, respectively. Compared with a representative general-purpose time-series foundation model, SPOTR achieves around 78% lower latency and 52% lower peak GPU memory on average. The code can be found at https://github.com/5GYYYYY/SPOTR.
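The compress-reconstruct idea in the abstract can be written compactly. The formulation below is a sketch under stated assumptions: the symbols $f_\theta$ (encoder plus pooling), $g_\phi$ (decoder), $d$ (token width), $C$ (channels), and $T$ (samples) are introduced here for illustration, and the mean-squared-error loss is assumed because the abstract does not specify the reconstruction objective.

```latex
z = f_\theta(x) \in \mathbb{R}^{d}, \qquad
\hat{x} = g_\phi(z) \in \mathbb{R}^{C \times T}, \qquad
\mathcal{L}(\theta, \phi) = \frac{1}{CT} \sum_{c=1}^{C} \sum_{t=1}^{T}
  \left( x_{c,t} - \hat{x}_{c,t} \right)^{2}
```

The constraint that matters is that $g_\phi$ sees only $z$: no unmasked patches or neighbouring tokens are available to copy from, which is the property the abstract relies on to discourage redundancy shortcuts.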
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces SPOTR, a self-supervised pretraining framework for physiological signals (EEG, iEEG, ECG, PPG) that employs a single-token global bottleneck combined with spatio-temporal compaction to compress each waveform into one token and reconstruct the original signal from it alone. The method is pretrained on 20 heterogeneous datasets and evaluated under linear probing, claiming consistent outperformance of the strongest baseline with average AUC gains of 18.49% (EEG), 21.71% (iEEG), 17.86% (ECG), and 4.64% (PPG), plus substantial efficiency improvements (78% lower latency, 52% lower peak GPU memory) versus a general-purpose time-series foundation model. The abstract positions the single-token bottleneck as a remedy for shortcut learning from temporal and cross-channel redundancy.
Significance. If the performance and efficiency claims are reproducible, SPOTR would represent a practical advance for universal physiological-signal SSL by offering a lightweight adaptation pathway that aligns with clinical constraints. The multi-modality pretraining scope and public code release are positive attributes that could facilitate follow-up work. However, the absence of baseline implementation details, statistical tests, and diagnostics for the claimed anti-shortcut mechanism limits the immediate impact; the result would need to be shown robust to these factors to shift practice.
major comments (3)
- [Abstract, §4] Abstract and §4 (Experiments): the headline AUC improvements (4.64–21.71%) are reported without any description of baseline implementations, data splits, exclusion criteria, or statistical testing. This omission makes the central performance claim impossible to evaluate and leaves open the possibility of post-hoc selection or protocol differences.
- [§3] §3 (Method): the premise that the single-token global bottleneck plus spatio-temporal compaction blocks redundancy shortcuts while preserving clinical structure is stated but unsupported by any diagnostic (channel ablation, time-shift invariance test, or information-bottleneck analysis). Without such evidence the linear-probing gains cannot be attributed to the architectural innovation rather than dataset artifacts.
- [§4] §4 (Experiments): no comparison is shown against the same set of baselines under identical splits and preprocessing for all 20 datasets; the reported modality-wise averages therefore cannot be taken as a controlled demonstration of universality.
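Two of the diagnostics named in the second major comment, the time-shift invariance test and channel ablation, could be sketched as below. The encoder is passed in as a callable; `toy_encoder` is a hypothetical stand-in (per-channel RMS) used only so the example runs, and says nothing about how SPOTR itself behaves.

```python
import math


def cosine(u, v):
    """Cosine similarity; assumes neither embedding is the zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def circular_shift(signal, k):
    """Shift every channel right by k samples, wrapping around."""
    shifted = []
    for ch in signal:
        cut = len(ch) - (k % len(ch))
        shifted.append(ch[cut:] + ch[:cut])
    return shifted


def ablate_channel(signal, idx):
    """Return a copy of the signal with one channel zeroed out."""
    return [[0.0] * len(ch) if i == idx else list(ch) for i, ch in enumerate(signal)]


def shift_invariance(encode, signal, k):
    """Similarity between embeddings of the original and the shifted signal."""
    return cosine(encode(signal), encode(circular_shift(signal, k)))


def channel_sensitivity(encode, signal):
    """Embedding change (1 - cosine) when each channel is removed in turn."""
    base = encode(signal)
    return [1.0 - cosine(base, encode(ablate_channel(signal, i)))
            for i in range(len(signal))]


def toy_encoder(signal):
    """Hypothetical stand-in encoder: per-channel RMS amplitude."""
    return [math.sqrt(sum(x * x for x in ch) / len(ch)) for ch in signal]


signal = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, -1.0]]
print(shift_invariance(toy_encoder, signal, k=1))
print(channel_sensitivity(toy_encoder, signal))
```

With a real pretrained encoder in place of `toy_encoder`, a sensitivity near zero for most channels would suggest the representation leans on cross-channel redundancy, which is the shortcut the referee asks the authors to rule out.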
minor comments (2)
- [Abstract] The abstract states that prior SSL methods “learn shortcuts from temporal and cross-channel redundancy” but does not cite the specific prior works or quantify the redundancy in the 20 datasets used here.
- [Abstract] Notation for the spatio-temporal compaction module and the reconstruction loss is introduced without an accompanying equation or diagram in the provided abstract; readers must wait until the methods section for formal definitions.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and indicate the revisions planned for the next version of the manuscript.
- Referee: [Abstract, §4] Abstract and §4 (Experiments): the headline AUC improvements (4.64–21.71%) are reported without any description of baseline implementations, data splits, exclusion criteria, or statistical testing. This omission makes the central performance claim impossible to evaluate and leaves open the possibility of post-hoc selection or protocol differences.
  Authors: We agree that greater detail is required for reproducibility. In the revised manuscript we will expand §4 with explicit descriptions of baseline implementations (including code references and any adaptations), per-dataset splits, exclusion criteria, and statistical testing (standard deviations across runs plus paired significance tests on the AUC differences).
  Revision: yes
- Referee: [§3] §3 (Method): the premise that the single-token global bottleneck plus spatio-temporal compaction blocks redundancy shortcuts while preserving clinical structure is stated but unsupported by any diagnostic (channel ablation, time-shift invariance test, or information-bottleneck analysis). Without such evidence the linear-probing gains cannot be attributed to the architectural innovation rather than dataset artifacts.
  Authors: The cross-dataset linear-probing gains serve as the primary empirical support for the design choice. To strengthen attribution we will add channel-ablation and time-shift invariance experiments to §4; a full information-bottleneck analysis lies outside the current scope.
  Revision: partial
- Referee: [§4] §4 (Experiments): no comparison is shown against the same set of baselines under identical splits and preprocessing for all 20 datasets; the reported modality-wise averages therefore cannot be taken as a controlled demonstration of universality.
  Authors: Dataset heterogeneity (sampling rates, channel counts) precludes fully identical preprocessing across all 20 datasets. We will revise §4 to tabulate the shared preprocessing steps, confirm that identical baseline codebases were used, and report per-dataset results alongside the aggregated ones so readers can judge the degree of control.
  Revision: yes
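The paired significance testing promised in the first response could take several forms; the rebuttal does not name one. A minimal sketch of one option, an exact two-sided sign-flip permutation test on per-dataset AUC differences, is shown below. The function name and the AUC values are hypothetical, and exhaustive enumeration is only practical for a small number of paired results.

```python
from itertools import product
from statistics import fmean


def paired_sign_flip_p(model_auc, baseline_auc):
    """Exact two-sided sign-flip permutation test on paired differences.

    Under the null hypothesis each difference is equally likely to have
    either sign, so every sign assignment is enumerated and compared with
    the observed mean difference.
    """
    diffs = [m - b for m, b in zip(model_auc, baseline_auc)]
    observed = abs(fmean(diffs))
    hits = total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(fmean(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return hits / total


# Hypothetical per-dataset AUCs; not taken from the paper.
model = [0.81, 0.77, 0.90, 0.68, 0.85]
baseline = [0.74, 0.70, 0.86, 0.66, 0.79]
print(paired_sign_flip_p(model, baseline))
```

With only five paired datasets the smallest achievable two-sided p-value is 2/32, so a per-modality test on a handful of datasets cannot reach conventional significance thresholds however large the gains are; pooling across modalities or across random seeds would be needed.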
Circularity Check
No circularity: empirical performance claims rest on independent pretraining and linear probing evaluation.
The paper presents SPOTR as an engineering framework that compresses signals to a single-token bottleneck and reconstructs from it, with reported AUC gains measured by linear probing after pretraining on 20 datasets. No equations, fitted parameters, or self-citations are shown that would make the AUC improvements or latency reductions equivalent to the input data or prior results by construction. The method description and evaluation protocol remain self-contained against external benchmarks, with no load-bearing step reducing to a self-definition, fitted-input prediction, or author-imported uniqueness theorem.