arxiv: 2605.13816 · v1 · pith:QVFEHGXInew · submitted 2026-05-13 · 💻 cs.LG

Uncertainty-Driven Anomaly Detection for Psychotic Relapse Using Smartwatches: Forecasting and Multi-Task Learning Fusion

Nikolaos Tsalkitzis, Panagiotis P. Filntisis, Petros Maragos, Niki Efthymiou

Pith reviewed 2026-05-14 19:16 UTC · model grok-4.3

classification 💻 cs.LG

keywords psychotic relapse detection, smartwatch monitoring, anomaly detection, multi-task learning, transformer encoders, digital phenotyping, uncertainty estimation, wearable sensors

The pith

Late fusion of cardiac forecasting and multi-task sleep-motion models on smartwatch data detects psychotic relapse with an 8% relative improvement over the competition-winning baseline.
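
For orientation, the abstract describes the 8% figure as a relative improvement, which is read here in the usual sense: the gain divided by the baseline score. The symbols m_fused and m_base are placeholders introduced for this note; this page does not name the challenge metric or give either raw value.

```latex
\[
\text{relative improvement}
  = \frac{m_{\text{fused}} - m_{\text{base}}}{m_{\text{base}}}
  \approx 0.08
\]
```

Under this reading the absolute gain is 0.08 times m_base, so it is smaller than 0.08 whenever the baseline metric is below 1.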

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents two frameworks for detecting psychotic relapse from smartwatch data: one that forecasts cardiac dynamics and flags prediction deviations as anomalies, and another that uses multi-task learning to integrate sleep, motion, and cardiac signals via Transformer encoders while predicting measurement timing. Both generate daily anomaly scores based on predictive uncertainty from ensembles of multilayer perceptrons. When these signals are combined through late fusion, the resulting model shows stronger performance than either alone. The work demonstrates that integrating diverse digital phenotypes from wearables is crucial for accurate relapse detection in everyday conditions, as validated on a public challenge dataset.
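
The summary does not spell out how predictive uncertainty becomes a daily score. The following is a minimal sketch, assuming the uncertainty is the variance across ensemble members' predictions and that window-level values are averaged over a day. The function names, the aggregation rule, and the toy numbers are all hypothetical and not taken from the paper.

```python
from statistics import mean, pvariance

def window_uncertainty(member_predictions):
    """Disagreement across ensemble members for one window.

    member_predictions: list of per-member prediction vectors (equal length).
    Returns the population variance across members, averaged over features.
    """
    n_features = len(member_predictions[0])
    per_feature = [
        pvariance([pred[j] for pred in member_predictions])
        for j in range(n_features)
    ]
    return mean(per_feature)

def daily_anomaly_score(windows):
    """Average window-level uncertainty over one day (assumed aggregation)."""
    return mean(window_uncertainty(w) for w in windows)

# Toy data: 3 ensemble members, 2 features, 2 windows per day (hypothetical).
calm_day = [
    [[0.50, 0.10], [0.52, 0.11], [0.49, 0.09]],
    [[0.40, 0.20], [0.41, 0.19], [0.39, 0.21]],
]
odd_day = [
    [[0.50, 0.10], [0.90, 0.40], [0.10, -0.20]],
    [[0.40, 0.20], [0.80, 0.60], [0.00, -0.10]],
]
```

In this toy setup the day on which the members disagree more receives the higher score, which is the behavior an uncertainty-driven detector relies on.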

Core claim

The authors establish that a late-fusion strategy combining anomaly scores from a cardiac forecasting pipeline and a multi-task learning pipeline for fused sleep-motion-cardiac signals, each using Transformer encoders and uncertainty estimation via MLP ensembles, achieves an 8% relative improvement over the competition-winning baseline on the 2nd e-Prevention Grand Challenge dataset, indicating that diverse digital phenotypes are essential for high-fidelity psychotic relapse detection.

What carries the argument

Late-fusion of uncertainty-based anomaly scores from Transformer-based cardiac forecasting and multi-task sleep-motion-cardiac embedding models.
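
The fusion rule itself is not given on this page. One plausible form, shown only as a sketch, standardizes each pipeline's daily scores per patient and takes a weighted average. The z-score normalization, the mixing weight `alpha`, and the toy values are assumptions made for illustration.

```python
from statistics import mean, pstdev

def zscore(scores):
    """Standardize one patient's daily scores; constant series map to zeros."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]

def late_fusion(forecast_scores, multitask_scores, alpha=0.5):
    """Weighted average of the two standardized daily score series.

    alpha is a hypothetical mixing weight (1.0 = forecasting pipeline only).
    """
    if len(forecast_scores) != len(multitask_scores):
        raise ValueError("both pipelines must score the same days")
    zf, zm = zscore(forecast_scores), zscore(multitask_scores)
    return [alpha * f + (1 - alpha) * m for f, m in zip(zf, zm)]

# Toy daily scores for one patient over four days (hypothetical values).
forecast = [0.10, 0.12, 0.11, 0.40]
multitask = [2.0, 2.1, 5.0, 5.2]
fused = late_fusion(forecast, multitask)
```

Standardizing first keeps one pipeline from dominating simply because its raw scores live on a larger scale; a rank-based normalization would be an equally reasonable choice.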

If this is right

Cardiac forecasting deviations serve as reliable indicators of abnormality when combined with other signals.
Multi-task formulation captures time-aware embeddings that complement pure forecasting.
Ensemble uncertainty estimation improves robustness against real-world wearable noise.
The fused anomaly score enables better daily relapse prediction than single-modality approaches.
Integration of cardiac, motion, and sleep data is required for optimal performance in real settings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar fusion techniques could extend to monitoring other psychiatric conditions using passive wearable sensing.
Deploying the model on-device might support proactive interventions before full relapse occurs.
Larger longitudinal studies could test whether the anomaly scores predict relapse onset by days or weeks.
The approach might generalize to anomaly detection in other physiological time-series from consumer devices.

Load-bearing premise

Deviations flagged as anomalies by the uncertainty scores directly correspond to clinical psychotic relapse events and not to other behavioral changes or sensor artifacts.

What would settle it

Observing a high rate of anomaly flags on days without confirmed clinical relapse or failure to flag on confirmed relapse days in a new validation cohort would falsify the claim that the scores indicate psychotic relapse.
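
The check described above can be made concrete as two rates at a fixed score threshold. This is a minimal sketch; the threshold, the function name `flag_rates`, and the six-day toy cohort are hypothetical and do not come from the dataset.

```python
def flag_rates(scores, relapse_labels, threshold):
    """Return (false_alarm_rate, miss_rate) for a daily score threshold.

    scores: daily anomaly scores; relapse_labels: 1 for a clinically
    confirmed relapse day, 0 otherwise; threshold: hypothetical cut-off.
    """
    flags = [s >= threshold for s in scores]
    stable = [f for f, y in zip(flags, relapse_labels) if y == 0]
    relapse = [f for f, y in zip(flags, relapse_labels) if y == 1]
    # Share of non-relapse days that were flagged anyway.
    false_alarm_rate = sum(stable) / len(stable) if stable else 0.0
    # Share of confirmed relapse days that were not flagged.
    miss_rate = sum(not f for f in relapse) / len(relapse) if relapse else 0.0
    return false_alarm_rate, miss_rate

# Toy cohort of six days (hypothetical values, not from the dataset).
scores = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7]
labels = [0, 1, 0, 0, 0, 1]
far, miss = flag_rates(scores, labels, threshold=0.5)
```

A new cohort in which either rate stays high across reasonable thresholds would count against the load-bearing premise.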

Figures

Figures reproduced from arXiv: 2605.13816 by Niki Efthymiou, Nikolaos Tsalkitzis, Panagiotis P. Filntisis, Petros Maragos.

**Figure 1.** The proposed transformer-based relapse and anomaly detection framework where windowed wearable features are …

**Figure 2.** Patient-wise comparison of forecasting and multi-task …

Original abstract

Digital phenotyping enables continuous passive monitoring of behavior and physiology, offering a promising paradigm for early detection of psychotic relapse. In this work, we develop and systematically study two smartwatch-based frameworks for daily relapse detection. The first forecasts cardiac dynamics and flags deviations between predicted and observed features as indicators of abnormality. The second adopts a multi-task formulation that fuses sleep with motion and cardiac-derived signals, learning time-aware embeddings and predicting measurement timing. Both pipelines use Transformer encoders and output a daily anomaly score, derived from predictive uncertainty estimated via an ensemble of multilayer perceptrons to improve robustness to real-world wearable variability. While each framework independently demonstrates strong predictive power, we show that they capture complementary physiological signatures. Consequently, we propose a late-fusion strategy that synergistically combines the anomaly signals from both architectures into a unified decision score. We benchmark our methodology on the 2nd e-Prevention Grand Challenge dataset, where our fused model achieves an 8% relative improvement over the competition-winning baseline. Our results, supported by extensive ablation studies, suggest that the integration of diverse digital phenotypes (cardiac, motion, and sleep) is essential for the high-fidelity detection of psychotic relapse in real-world settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper fuses cardiac forecasting anomalies with multi-task sleep-motion embeddings via uncertainty ensembles for an 8% gain on e-Prevention relapse detection, but the result rests on unverified mapping from general anomalies to clinical events.

The main thing to know is that this work runs two parallel pipelines on smartwatch data—one forecasting cardiac dynamics to flag prediction errors as anomalies, the other using multi-task learning to embed sleep, motion, and cardiac signals while predicting measurement timing—then late-fuses the uncertainty-derived scores from both to detect psychotic relapse. They report an 8% relative lift over the prior competition winner on the 2nd e-Prevention dataset, with ablations suggesting the fusion and the three-signal combination each help.

Referee Report

2 major / 2 minor

Summary. The paper develops two smartwatch-based frameworks for daily psychotic relapse detection. The first forecasts cardiac dynamics and uses deviations from predictions as anomaly indicators. The second employs multi-task learning to fuse sleep, motion, and cardiac signals via Transformer encoders and time-aware embeddings. Both derive daily anomaly scores from ensemble-MLP predictive uncertainty. A late-fusion strategy combines the scores, yielding an 8% relative improvement over the competition-winning baseline on the 2nd e-Prevention Grand Challenge dataset, with supporting ablation studies.

Significance. If the empirical result holds after proper statistical validation, the work shows that fusing complementary digital phenotypes (cardiac, motion, sleep) via uncertainty-driven anomaly detection can improve relapse forecasting in real-world wearable settings, with potential value for continuous passive monitoring in psychiatry.

major comments (2)

[Abstract] Abstract and Results: The central claim of an 8% relative improvement is reported without error bars, dataset size, cross-validation details, or statistical significance tests. This directly affects interpretability of the benchmark gain over the baseline.
[Evaluation] Evaluation section: No analysis demonstrates that high anomaly scores are enriched for clinically labeled relapse days after controlling for confounders such as activity levels, medication effects, or sensor artifacts. This is load-bearing for the claim that the scores detect relapse-specific events rather than general wearable variability.

minor comments (2)

[Abstract] Abstract: The phrase 'high-fidelity detection' is used without reference to the specific metrics (e.g., AUC, precision-recall) that support it.
[Methods] The description of the late-fusion strategy would benefit from an explicit equation or diagram showing how the two anomaly scores are combined into the final decision score.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights important aspects for improving the interpretability and robustness of our claims. We address each major comment below and will revise the manuscript accordingly.

Referee: [Abstract] Abstract and Results: The central claim of an 8% relative improvement is reported without error bars, dataset size, cross-validation details, or statistical significance tests. This directly affects interpretability of the benchmark gain over the baseline.

Authors: We agree that these details enhance interpretability. In the revised manuscript, we will update the abstract to report the dataset size (number of participants and observation days from the 2nd e-Prevention Grand Challenge), specify the cross-validation strategy (subject-independent folds to prevent leakage), include error bars or standard deviations from the ensemble-MLP uncertainty estimates, and add statistical significance testing (e.g., paired Wilcoxon signed-rank test) for the reported 8% relative improvement. These elements are already computed and described in the full evaluation but will be explicitly summarized in the abstract and results.

revision: yes

Referee: [Evaluation] Evaluation section: No analysis demonstrates that high anomaly scores are enriched for clinically labeled relapse days after controlling for confounders such as activity levels, medication effects, or sensor artifacts. This is load-bearing for the claim that the scores detect relapse-specific events rather than general wearable variability.

Authors: This is a valid concern for validating specificity. We will add a new subsection in the revised evaluation that performs controlled comparisons: we will stratify or regress anomaly scores against activity levels (using the motion features already in the multi-task model) and sensor artifact indicators (e.g., signal quality flags from the smartwatch data) to test enrichment on clinically labeled relapse days. For medication effects, the dataset does not include granular logs, so we will explicitly discuss this as a limitation while demonstrating controls on the available variables; ablations already show the fused model outperforms single-modality baselines, supporting that gains are not solely due to general variability.

revision: partial
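
The "subject-independent folds" mentioned in the first response can be illustrated with a short sketch. It assumes each sample (for example, one recorded day) carries a subject identifier; the function name, the greedy balancing rule, and the toy patients are hypothetical and say nothing about how the authors actually split the data.

```python
from collections import defaultdict

def subject_independent_folds(subject_ids, n_folds):
    """Assign each sample to a fold so no subject spans two folds.

    subject_ids: one subject identifier per sample (e.g. per recorded day).
    Returns a list of (train_indices, test_indices) pairs.
    """
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)
    # Greedy balancing: place the largest subjects first into the emptiest fold.
    folds = [[] for _ in range(n_folds)]
    for sid in sorted(by_subject, key=lambda s: -len(by_subject[s])):
        min(folds, key=len).extend(by_subject[sid])
    splits = []
    for k in range(n_folds):
        test = sorted(folds[k])
        train = sorted(i for j, f in enumerate(folds) if j != k for i in f)
        splits.append((train, test))
    return splits

# Toy example: 3 hypothetical patients with 3, 2, and 1 recorded days.
subjects = ["P1", "P1", "P1", "P2", "P2", "P3"]
splits = subject_independent_folds(subjects, n_folds=2)
```

Because every day from a given patient lands in the same fold, a model cannot be tested on a person it has already seen during training, which is the leakage the response refers to.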

Circularity Check

0 steps flagged

No significant circularity detected

The paper describes two empirical ML pipelines (cardiac forecasting with Transformer encoders plus ensemble-MLP uncertainty, and multi-task sleep/motion fusion) whose outputs are anomaly scores evaluated on the external e-Prevention Grand Challenge dataset. The reported 8% relative improvement is a measured performance number on held-out benchmark data, not a quantity obtained by algebraic reduction, parameter renaming, or self-citation of a uniqueness theorem. No equations are presented that define the target relapse signal in terms of the model's own fitted outputs, and the architectures are standard supervised components whose training objectives do not presuppose the final fusion gain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that the e-Prevention dataset provides accurate daily relapse labels and that wearable signals contain detectable precursors to relapse. No new physical entities or ad-hoc constants are introduced.

axioms (1)

Domain assumption: The ground-truth relapse labels in the 2nd e-Prevention Grand Challenge dataset accurately reflect clinical psychotic relapse events. All performance numbers are computed against these labels.
