Uncertainty-Driven Anomaly Detection for Psychotic Relapse Using Smartwatches: Forecasting and Multi-Task Learning Fusion
Pith reviewed 2026-05-14 19:16 UTC · model grok-4.3
The pith
Late fusion of a cardiac forecasting model and a multi-task sleep-motion-cardiac model, both running on smartwatch data, detects psychotic relapse with an 8% relative improvement over the competition-winning baseline.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors report that a late-fusion strategy achieves an 8% relative improvement over the competition-winning baseline on the 2nd e-Prevention Grand Challenge dataset. The strategy combines anomaly scores from two pipelines: a cardiac forecasting pipeline and a multi-task learning pipeline for fused sleep-motion-cardiac signals. Each pipeline uses Transformer encoders and estimates uncertainty via MLP ensembles. The authors read the result as evidence that diverse digital phenotypes are essential for high-fidelity psychotic relapse detection.
What carries the argument
Late-fusion of uncertainty-based anomaly scores from Transformer-based cardiac forecasting and multi-task sleep-motion-cardiac embedding models.
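The fusion step can be illustrated with a minimal sketch. The summary does not state how the two anomaly scores are combined, so the z-score standardisation, the weighted average, and the `weight` parameter below are assumptions made for illustration, not the authors' method. The input values are made up.

```python
from statistics import mean, pstdev


def zscore(scores):
    """Standardise a list of daily scores to zero mean and unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # A constant series carries no ranking information.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def late_fuse(cardiac_scores, multitask_scores, weight=0.5):
    """Weighted average of two standardised anomaly-score series.

    `weight` is a hypothetical mixing parameter, not a value from the paper.
    """
    if len(cardiac_scores) != len(multitask_scores):
        raise ValueError("both pipelines must score the same days")
    zc = zscore(cardiac_scores)
    zm = zscore(multitask_scores)
    return [weight * c + (1.0 - weight) * m for c, m in zip(zc, zm)]


# Made-up daily scores on different scales; the last day is unusual in both.
cardiac = [0.10, 0.12, 0.11, 0.40]
multitask = [1.0, 1.1, 0.9, 3.0]
fused = late_fuse(cardiac, multitask)
```

Standardising first keeps one pipeline from dominating the fused score simply because its raw scores live on a larger scale.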
If this is right
- Cardiac forecasting deviations serve as reliable indicators of abnormality when combined with other signals.
- Multi-task formulation captures time-aware embeddings that complement pure forecasting.
- Ensemble uncertainty estimation improves robustness against real-world wearable noise.
- The fused anomaly score enables better daily relapse prediction than single-modality approaches.
- Integration of cardiac, motion, and sleep data is required for optimal performance in real settings.
Where Pith is reading between the lines
- Similar fusion techniques could extend to monitoring other psychiatric conditions using passive wearable sensing.
- Deploying the model on-device might support proactive interventions before full relapse occurs.
- Larger longitudinal studies could test whether the anomaly scores predict relapse onset by days or weeks.
- The approach might generalize to anomaly detection in other physiological time-series from consumer devices.
Load-bearing premise
Deviations flagged as anomalies by the uncertainty scores directly correspond to clinical psychotic relapse events and not to other behavioral changes or sensor artifacts.
What would settle it
Observing a high rate of anomaly flags on days without confirmed clinical relapse or failure to flag on confirmed relapse days in a new validation cohort would falsify the claim that the scores indicate psychotic relapse.
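The two failure modes named above can be measured directly once a threshold is fixed. The sketch below is a hedged illustration: the function name `flag_rates`, the threshold, and the data are all hypothetical, and the paper may evaluate with threshold-free metrics instead.

```python
def flag_rates(scores, relapse_labels, threshold):
    """Return (false_alarm_rate, miss_rate) for a fixed score threshold.

    false_alarm_rate: share of non-relapse days that were flagged.
    miss_rate: share of confirmed relapse days that were not flagged.
    """
    flags = [s >= threshold for s in scores]
    normal = [f for f, y in zip(flags, relapse_labels) if y == 0]
    relapse = [f for f, y in zip(flags, relapse_labels) if y == 1]
    false_alarm_rate = sum(normal) / len(normal) if normal else float("nan")
    miss_rate = (
        sum(not f for f in relapse) / len(relapse) if relapse else float("nan")
    )
    return false_alarm_rate, miss_rate


# Made-up validation days: 1 marks a clinically confirmed relapse day.
scores = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7]
labels = [0, 1, 0, 1, 0, 0]
far, miss = flag_rates(scores, labels, threshold=0.5)
```

In this toy cohort one of the four non-relapse days is flagged and neither relapse day is missed.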
Original abstract
Digital phenotyping enables continuous passive monitoring of behavior and physiology, offering a promising paradigm for early detection of psychotic relapse. In this work, we develop and systematically study two smartwatch-based frameworks for daily relapse detection. The first forecasts cardiac dynamics and flags deviations between predicted and observed features as indicators of abnormality. The second adopts a multi-task formulation that fuses sleep with motion and cardiac-derived signals, learning time-aware embeddings and predicting measurement timing. Both pipelines use Transformer encoders and output a daily anomaly score, derived from predictive uncertainty estimated via an ensemble of multilayer perceptrons to improve robustness to real-world wearable variability. While each framework independently demonstrates strong predictive power, we show that they capture complementary physiological signatures. Consequently, we propose a late-fusion strategy that synergistically combines the anomaly signals from both architectures into a unified decision score. We benchmark our methodology on the 2nd e-Prevention Grand Challenge dataset, where our fused model achieves an 8% relative improvement over the competition-winning baseline. Our results, supported by extensive ablation studies, suggest that the integration of diverse digital phenotypes (cardiac, motion, and sleep) is essential for the high-fidelity detection of psychotic relapse in real-world settings.
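One common way to turn an ensemble into an uncertainty score is to measure how much its members disagree. The sketch below assumes that reading: the abstract does not specify the estimator, so the variance-across-members formula, the function name `ensemble_uncertainty`, and the toy predictions are illustrative assumptions only.

```python
from statistics import mean, pvariance


def ensemble_uncertainty(member_predictions):
    """Mean across features of the variance across ensemble members.

    member_predictions[k][j] is member k's prediction for feature j.
    """
    n_features = len(member_predictions[0])
    per_feature = [
        pvariance([member[j] for member in member_predictions])
        for j in range(n_features)
    ]
    return mean(per_feature)


# Three hypothetical ensemble members predicting two features for one day.
agree = [[0.50, 1.00], [0.51, 0.99], [0.49, 1.01]]
disagree = [[0.20, 1.50], [0.80, 0.60], [0.50, 1.00]]
calm_day = ensemble_uncertainty(agree)
odd_day = ensemble_uncertainty(disagree)
```

A day on which the members disagree strongly receives the higher score, which is the behaviour an uncertainty-driven anomaly score relies on.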
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops two smartwatch-based frameworks for daily psychotic relapse detection. The first forecasts cardiac dynamics and uses deviations from predictions as anomaly indicators. The second employs multi-task learning to fuse sleep, motion, and cardiac signals via Transformer encoders and time-aware embeddings. Both derive daily anomaly scores from ensemble-MLP predictive uncertainty. A late-fusion strategy combines the scores, yielding an 8% relative improvement over the competition-winning baseline on the 2nd e-Prevention Grand Challenge dataset, with supporting ablation studies.
Significance. If the empirical result holds after proper statistical validation, the work shows that fusing complementary digital phenotypes (cardiac, motion, sleep) via uncertainty-driven anomaly detection can improve relapse forecasting in real-world wearable settings, with potential value for continuous passive monitoring in psychiatry.
major comments (2)
- [Abstract] Abstract and Results: The central claim of an 8% relative improvement is reported without error bars, dataset size, cross-validation details, or statistical significance tests. This directly affects interpretability of the benchmark gain over the baseline.
- [Evaluation] Evaluation section: No analysis demonstrates that high anomaly scores are enriched for clinically labeled relapse days after controlling for confounders such as activity levels, medication effects, or sensor artifacts. This is load-bearing for the claim that the scores detect relapse-specific events rather than general wearable variability.
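The error bars requested in the first comment could be obtained with a paired bootstrap over subjects, sketched below. This is one possible approach rather than the authors' procedure. The per-subject scores are invented for illustration and the evaluation metric is left unspecified because the summary does not name it.

```python
import random


def relative_improvement(model, baseline):
    """Relative gain of the mean model score over the mean baseline score."""
    mean_model = sum(model) / len(model)
    mean_base = sum(baseline) / len(baseline)
    return (mean_model - mean_base) / mean_base


def bootstrap_interval(model, baseline, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling subjects with replacement.

    Scores stay paired: each resampled subject contributes both values.
    """
    rng = random.Random(seed)
    n = len(model)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(
            relative_improvement(
                [model[i] for i in idx], [baseline[i] for i in idx]
            )
        )
    stats.sort()
    low = stats[int((alpha / 2) * n_resamples)]
    high = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


# Made-up per-subject scores, for illustration only.
baseline = [0.60, 0.65, 0.58, 0.70, 0.62]
model = [0.66, 0.69, 0.63, 0.74, 0.68]
point = relative_improvement(model, baseline)
low, high = bootstrap_interval(model, baseline)
```

Reporting the interval alongside the point estimate shows whether a gain of this size is distinguishable from subject-to-subject variation.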
minor comments (2)
- [Abstract] Abstract: The phrase 'high-fidelity detection' is used without reference to the specific metrics (e.g., AUC, precision-recall) that support it.
- [Methods] The description of the late-fusion strategy would benefit from an explicit equation or diagram showing how the two anomaly scores are combined into the final decision score.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which highlights important aspects for improving the interpretability and robustness of our claims. We address each major comment below and will revise the manuscript accordingly.
- Referee: [Abstract] Abstract and Results: The central claim of an 8% relative improvement is reported without error bars, dataset size, cross-validation details, or statistical significance tests. This directly affects interpretability of the benchmark gain over the baseline.
Authors: We agree that these details enhance interpretability. In the revised manuscript, we will update the abstract to report the dataset size (number of participants and observation days from the 2nd e-Prevention Grand Challenge), specify the cross-validation strategy (subject-independent folds to prevent leakage), include error bars or standard deviations from the ensemble-MLP uncertainty estimates, and add statistical significance testing (e.g., paired Wilcoxon signed-rank test) for the reported 8% relative improvement. These elements are already computed and described in the full evaluation but will be explicitly summarized in the abstract and results.
Revision: yes
- Referee: [Evaluation] Evaluation section: No analysis demonstrates that high anomaly scores are enriched for clinically labeled relapse days after controlling for confounders such as activity levels, medication effects, or sensor artifacts. This is load-bearing for the claim that the scores detect relapse-specific events rather than general wearable variability.
Authors: This is a valid concern for validating specificity. We will add a new subsection in the revised evaluation that performs controlled comparisons: we will stratify or regress anomaly scores against activity levels (using the motion features already in the multi-task model) and sensor artifact indicators (e.g., signal quality flags from the smartwatch data) to test enrichment on clinically labeled relapse days. For medication effects, the dataset does not include granular logs, so we will explicitly discuss this as a limitation while demonstrating controls on the available variables; ablations already show the fused model outperforms single-modality baselines, supporting that gains are not solely due to general variability.
Revision: partial
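The stratified comparison proposed in the response could take a form like the sketch below. It is a hedged illustration of stratification in general: the stratum labels, the record layout, and the function name `stratified_score_gap` are hypothetical, and the data are made up.

```python
from collections import defaultdict


def stratified_score_gap(records):
    """Mean relapse-minus-normal score gap within each activity stratum.

    Each record is (activity_stratum, is_relapse, anomaly_score).
    Strata lacking either class are skipped.
    """
    buckets = defaultdict(lambda: {0: [], 1: []})
    for stratum, is_relapse, score in records:
        buckets[stratum][is_relapse].append(score)
    gaps = {}
    for stratum, groups in buckets.items():
        if groups[0] and groups[1]:
            gaps[stratum] = (
                sum(groups[1]) / len(groups[1])
                - sum(groups[0]) / len(groups[0])
            )
    return gaps


# Made-up days, binned by a hypothetical activity level.
records = [
    ("low", 0, 0.25), ("low", 0, 0.25), ("low", 1, 0.75),
    ("high", 0, 0.5), ("high", 1, 1.0), ("high", 1, 1.0),
    ("medium", 0, 0.5),
]
gaps = stratified_score_gap(records)
```

If the gap stays positive inside every stratum, higher scores on relapse days cannot be explained by activity level alone. A gap that vanishes within strata would point to confounding.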
Circularity Check
No significant circularity detected
The paper describes two empirical ML pipelines (cardiac forecasting with Transformer encoders plus ensemble-MLP uncertainty, and multi-task sleep/motion fusion) whose outputs are anomaly scores evaluated on the external e-Prevention Grand Challenge dataset. The reported 8% relative improvement is a measured performance number on held-out benchmark data, not a quantity obtained by algebraic reduction, parameter renaming, or self-citation of a uniqueness theorem. No equations are presented that define the target relapse signal in terms of the model's own fitted outputs, and the architectures are standard supervised components whose training objectives do not presuppose the final fusion gain.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The ground-truth relapse labels in the 2nd e-Prevention Grand Challenge dataset accurately reflect clinical psychotic relapse events.