Recognition: no theorem link
From Data Lifting to Continuous Risk Estimation: A Process-Aware Pipeline for Predictive Monitoring of Clinical Pathways
Pith reviewed 2026-05-13 07:19 UTC · model grok-4.3
The pith
A process-aware pipeline using data lifting and prefix representations supports continuous risk estimation for clinical pathways, with accuracy increasing as patient data accumulates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central discovery is that a pipeline from data lifting through temporal reconstruction and prefix construction enables predictive models to perform continuous risk estimation on clinical pathways. Using ICU admission in COVID-19 as the target, the models demonstrate increasing accuracy with pathway progression, highlighting that predictive signals strengthen over time in evolving trajectories.
What carries the argument
The prefix-based representation derived from lifted event logs, which encodes the sequence of events up to the current point in a patient's clinical pathway for use in predictive models.
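The prefix construction the paper relies on can be sketched minimally. This is an illustrative reconstruction, not the paper's implementation: the column names (`case_id`, `activity`, `timestamp`) and the tuple encoding are assumptions for the sketch.

```python
# Minimal sketch of prefix construction from an event log.
# Column names (case_id, activity, timestamp) are illustrative,
# not the paper's actual schema.
import pandas as pd

def build_prefixes(log: pd.DataFrame) -> pd.DataFrame:
    """For each case, emit one row per prefix length k, encoding the
    activities observed up to and including the k-th event."""
    log = log.sort_values(["case_id", "timestamp"])
    rows = []
    for case_id, events in log.groupby("case_id"):
        acts = events["activity"].tolist()
        for k in range(1, len(acts) + 1):
            rows.append({"case_id": case_id,
                         "prefix_len": k,
                         "prefix": tuple(acts[:k])})
    return pd.DataFrame(rows)

log = pd.DataFrame({
    "case_id":  [1, 1, 1, 2, 2],
    "activity": ["admit", "lab", "icu", "admit", "discharge"],
    "timestamp": pd.to_datetime(
        ["2020-03-01", "2020-03-02", "2020-03-03",
         "2020-03-01", "2020-03-04"]),
})
prefixes = build_prefixes(log)
# One row per observed prefix: 3 for case 1, 2 for case 2
```

Each row is one partially observed trajectory, which is what lets a single patient contribute many training and evaluation instances (4,479 cases yielding 46,804 prefixes).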
Load-bearing premise
The case-level split and data lifting process produce unbiased prefix representations that accurately reflect real evolving trajectories without leakage or systematic missing data in the COVID-19 dataset.
What would settle it
A failure to observe increasing AUC as prefixes lengthen in additional patient cohorts, or evidence that the reported gains stem from leakage across the case-level split, would falsify the claim.
Original abstract
This paper presents a reproducible and process-aware pipeline for predictive monitoring of clinical pathways. The approach integrates data lifting, temporal reconstruction, event log construction, prefix-based representations, and predictive modeling to support continuous reasoning on partially observed patient trajectories, overcoming the limitations of traditional retrospective process mining. The framework is evaluated on COVID-19 clinical pathways using ICU admission as the prediction target, considering 4,479 patient cases and 46,804 prefixes. Predictive models are trained and evaluated using a case-level split, with 896 patients in the test set. Logistic Regression achieves the best performance (AUC 0.906, F1-score 0.835). A detailed prefix-based analysis shows that predictive performance improves progressively as new clinical events become available, with AUC increasing from 0.642 at early stages to 0.942 at later stages of the pathway. The results highlight two key findings: predictive signals emerge progressively along clinical pathways, and process-aware representations enable effective early risk estimation from evolving patient trajectories. Overall, the findings suggest that predictive monitoring in healthcare is best conceived as a continuous, dynamically aware process, in which risk estimates are progressively refined as the patient journey evolves.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a reproducible and process-aware pipeline for predictive monitoring of clinical pathways, integrating data lifting, temporal reconstruction, event log construction, prefix-based representations, and predictive modeling. Evaluated on a COVID-19 dataset with 4,479 patient cases, 46,804 prefixes, and ICU admission as the target, using case-level splitting with 896 patients in the test set, it reports that Logistic Regression achieves the best performance (AUC 0.906, F1-score 0.835). The central empirical finding is that predictive performance improves progressively as new events become available, with AUC increasing from 0.642 at early stages to 0.942 at later stages of the pathway.
Significance. If the no-leakage assumption in prefix construction holds, the work provides concrete evidence that process-aware representations support continuous risk estimation, with performance scaling as trajectories evolve. Strengths include the large dataset size, explicit case-level splitting to prevent cross-patient leakage, and the progressive prefix analysis that directly tests the dynamic nature of the claims. This offers a practical, reproducible framework with potential impact on real-time clinical decision support systems.
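The case-level split the report leans on can be illustrated with a short sketch. This is not the paper's code; it assumes scikit-learn's `GroupShuffleSplit` and synthetic data, and the 20% test fraction is inferred from the reported counts (896 / 4,479 ≈ 20%).

```python
# Sketch of a case-level split: all prefixes of a patient land on the
# same side, so no patient contributes to both train and test sets.
# Data is synthetic; the 20% test fraction matches 896 / 4479.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
case_ids = np.repeat(np.arange(100), 5)      # 100 cases, 5 prefixes each
X = rng.normal(size=(len(case_ids), 3))
y = rng.integers(0, 2, size=len(case_ids))

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=case_ids))

# No case appears on both sides of the split
assert set(case_ids[train_idx]).isdisjoint(case_ids[test_idx])
```

Splitting by case rather than by prefix is what blocks the most obvious leakage channel: earlier prefixes of a test patient never appear in training.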
Major comments (1)
- [Pipeline Description (Data Lifting and Prefix Construction)] The description of data lifting, temporal reconstruction, and prefix-based representations does not explicitly state that all features, aggregates, and embeddings are recomputed exclusively from the strict prefix available at each time point (i.e., without using future events or case-level global statistics). This verification is load-bearing for the headline result that AUC rises from 0.642 early to 0.942 late, because any non-local computation would introduce leakage and invalidate the progressive-performance claim.
Minor comments (2)
- [Abstract] The abstract introduces 'data lifting' without a one-sentence definition or pointer to its role in ensuring prefix locality, which reduces immediate accessibility for readers outside process mining.
- [Evaluation] The evaluation section would benefit from a brief statement on the exact train/test split ratio and any stratification criteria beyond case-level independence.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback, which helps strengthen the clarity of our pipeline description. We address the major comment point by point below.
Point-by-point responses
Referee: The description of data lifting, temporal reconstruction, and prefix-based representations does not explicitly state that all features, aggregates, and embeddings are recomputed exclusively from the strict prefix available at each time point (i.e., without using future events or case-level global statistics). This verification is load-bearing for the headline result that AUC rises from 0.642 early to 0.942 late, because any non-local computation would introduce leakage and invalidate the progressive-performance claim.
Authors: We agree that an explicit statement on this point is essential for validating the no-leakage assumption underlying the progressive AUC results. In the implemented pipeline, all features, aggregates, and embeddings are strictly recomputed from the events present in each prefix only, with no access to future events or case-level global statistics; this is enforced by constructing independent prefix logs during temporal reconstruction and by limiting all computations (e.g., frequency counts, duration aggregates, and embeddings) to the prefix snapshot at each step. The case-level split further ensures no cross-patient information leakage. To address the referee's concern, we will revise the manuscript by adding a dedicated paragraph (and accompanying pseudocode) in the Methods section that explicitly describes this prefix-only recomputation process and its enforcement. This clarification will directly reinforce the validity of the reported performance progression without changing any empirical results.
Revision: yes
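The prefix-local recomputation the authors describe can be sketched as follows. The feature names and schema are hypothetical, chosen only to show the enforcement point: every aggregate is derived from the first k events, never from the full case.

```python
# Sketch of prefix-local feature computation: every aggregate is
# derived only from events at or before the cut-off, never from the
# full case. Feature names are illustrative, not the paper's set.
import pandas as pd

def prefix_features(events: pd.DataFrame, k: int) -> dict:
    """Features for the k-event prefix of a single case."""
    snap = events.sort_values("timestamp").head(k)   # strict prefix only
    elapsed = snap["timestamp"].iloc[-1] - snap["timestamp"].iloc[0]
    return {
        "n_events": len(snap),
        "n_distinct_activities": snap["activity"].nunique(),
        "elapsed_hours": elapsed.total_seconds() / 3600.0,
    }

case = pd.DataFrame({
    "activity": ["admit", "lab", "lab", "icu"],
    "timestamp": pd.to_datetime(
        ["2020-03-01 08:00", "2020-03-01 20:00",
         "2020-03-02 08:00", "2020-03-03 08:00"]),
})
f2 = prefix_features(case, 2)   # uses only the first two events
```

Because `head(k)` is applied before any aggregation, no statistic can depend on events after the cut-off, which is exactly the property the referee asks the manuscript to state explicitly.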
Circularity Check
No circularity: standard held-out evaluation on independent cases
Full rationale
The paper reports empirical results from a standard ML pipeline (data lifting to prefixes, case-level split into 896 test patients, Logistic Regression training) with AUC/F1 computed on held-out prefixes. No equations, derivations, or self-citations reduce any reported prediction or performance number to a quantity fitted on the same data used for evaluation. The progressive AUC claim (0.642 early to 0.942 late) is obtained by direct evaluation on temporally ordered prefixes from the test set, not by construction or renaming of inputs. The derivation chain is self-contained against external benchmarks with no load-bearing self-referential steps.
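The progressive evaluation described here can be reproduced in shape with a short sketch: AUC computed separately per prefix length on held-out data. The data below is synthetic (scores whose informativeness grows with prefix length); it mirrors the reported 0.642-to-0.942 trajectory only in spirit, not in value.

```python
# Sketch of per-prefix-length evaluation on synthetic held-out data.
# Scores are constructed so that informativeness grows with prefix
# length; the resulting AUC values are illustrative only.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
prefix_len = np.repeat([1, 2, 3], 200)
y_true = rng.integers(0, 2, size=600)
signal = (prefix_len / 3.0) * (y_true - 0.5)       # grows with length
y_score = signal + rng.normal(scale=0.5, size=600)

auc_by_len = {k: roc_auc_score(y_true[prefix_len == k],
                               y_score[prefix_len == k])
              for k in np.unique(prefix_len)}
# AUC increases with prefix length on this synthetic data
```

Stratifying the evaluation by prefix length, rather than pooling all prefixes into one AUC, is what makes the "signals emerge progressively" claim testable at all.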