arxiv: 2604.22216 · v1 · submitted 2026-04-24 · 📊 stat.ME

Recognition: unknown

Optimal Stopping in Sequential Clinical Prediction

Hui-Mean Foo, Yuan-chin Ivan Chang


Pith reviewed 2026-05-08 11:05 UTC · model grok-4.3

classification 📊 stat.ME

keywords optimal stopping · sequential prediction · clinical decision-making · martingale · risk trajectory · staged information · retrospective analysis


The pith

In sequential clinical prediction, the stage with the best model is not always the optimal time to stop testing and decide.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that most clinical prediction models are reported as if all information were available at once, whereas in practice information arrives sequentially and clinicians must choose when to stop testing. By modeling this as an optimal stopping problem, the authors show across four retrospective datasets that stopping early or midway is sometimes preferable to waiting for the most accurate model. This matters because it challenges the assumption that more data always leads to better decisions, and it could help reduce unnecessary tests and delays in patient care.

Core claim

Sequential clinical prediction is formulated as an optimal-stopping problem under staged information. The central mechanism is the patient-specific conditional risk trajectory, where forward martingales track coherent risk updates and reverse martingales model information loss from richer to simpler predictors. Across four retrospective datasets, the preferred stopping stage varied by setting, showing that the best-performing model is not always the best stage for clinical decision-making.

What carries the argument

Patient-specific conditional risk trajectory using forward and reverse martingale structures to represent risk updating and information loss across stages.
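
One way to write the forward structure down is sketched below. The notation is assumed for illustration and is not quoted from the paper: Y is a binary outcome, F_k is the information available after stage k, and p_k is the patient's conditional risk at that stage. Under these assumptions the forward-martingale property is a one-line consequence of the tower property of conditional expectation.

```latex
% Assumed notation (not the paper's): Y binary outcome, F_k information after stage k.
\[
  p_k = \mathbb{P}(Y = 1 \mid \mathcal{F}_k), \qquad
  \mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \dots \subseteq \mathcal{F}_K .
\]
% Tower property: averaging the next-stage risk over what is not yet known
% returns the current-stage risk.
\[
  \mathbb{E}[\, p_{k+1} \mid \mathcal{F}_k \,]
  = \mathbb{E}\bigl[\, \mathbb{E}[\mathbf{1}\{Y = 1\} \mid \mathcal{F}_{k+1}] \mid \mathcal{F}_k \,\bigr]
  = \mathbb{E}[\mathbf{1}\{Y = 1\} \mid \mathcal{F}_k]
  = p_k .
\]
```

Read this way, a further test can move an individual patient's risk up or down, but it cannot shift the risk in a direction that is predictable from the information already in hand.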

If this is right

Preferred stopping stages differ by clinical setting, sometimes favoring early action.
Fuller information does not always justify added delay or invasiveness.
Optimal decisions require balancing model accuracy with timing costs.
Martingale structures allow coherent modeling of risk changes over stages.
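
The trade-off between accuracy and timing cost can be illustrated with a stage-level calculation of total expected cost (decision loss plus cumulative test cost, the quantity named in the figure captions). The sketch below is a minimal illustration, not the authors' code: the function name, the cost parameters `c_fn` and `c_fp`, and the toy risks are all assumptions made for the example.

```python
import numpy as np

def expected_cost_by_stage(risk, y, test_costs, c_fn=1.0, c_fp=1.0):
    """Total expected cost if every patient stops at a given stage.

    risk       : (n_patients, n_stages) predicted risk at each stage (hypothetical input)
    y          : (n_patients,) observed binary outcomes
    test_costs : (n_stages,) incremental cost of reaching each stage (hypothetical units)
    c_fn, c_fp : assumed costs of a false negative and a false positive
    """
    risk = np.asarray(risk, dtype=float)
    y = np.asarray(y).astype(bool)
    # Acting is cheaper than not acting when c_fn * p > c_fp * (1 - p).
    threshold = c_fp / (c_fp + c_fn)
    act = risk > threshold
    false_neg = (~act) & y[:, None]
    false_pos = act & (~y)[:, None]
    decision_loss = c_fn * false_neg.mean(axis=0) + c_fp * false_pos.mean(axis=0)
    # Reaching stage k requires paying for every test up to and including k.
    return decision_loss + np.cumsum(test_costs)

# Toy data: the stage-1 model classifies everyone correctly, stage 0 makes one error.
risk = [[0.6, 0.9], [0.4, 0.1], [0.7, 0.8], [0.6, 0.2]]
y = [1, 0, 1, 0]

cheap = expected_cost_by_stage(risk, y, test_costs=[0.0, 0.1])
costly = expected_cost_by_stage(risk, y, test_costs=[0.0, 0.4])
print(int(np.argmin(cheap)), int(np.argmin(costly)))
```

With a cheap second test the later stage is preferred; with a costly one the earlier stage wins even though the later model is more accurate, which is the pattern the paper reports across settings.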

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The framework could be extended to other sequential decision domains, such as legal evidence gathering or investment timing.
Prospective studies in live clinical environments would be needed to validate the retrospective findings.
Integrating test costs explicitly into the stopping rule could make the framework more actionable.
The approach highlights limitations in standard retrospective model evaluation for real-time use.

Load-bearing premise

Retrospective cohorts accurately capture the real-time sequential arrival of information and patient-specific conditional risk trajectories can be reliably modeled with forward and reverse martingale structures.

What would settle it

A prospective comparison of outcomes when clinicians follow the paper's optimal-stopping recommendations against outcomes under standard full-information practice would settle it: the claim is falsified if no improvement is seen.

Figures

Figures reproduced from arXiv: 2604.22216 by Hui-Mean Foo, Yuan-chin Ivan Chang.

**Figure 1.** Breast Cancer Wisconsin (Study 1): stagewise predictive performance over 1,000 …

**Figure 2.** Study 1: total expected cost (decision loss + cumulative test cost) by stage. The …

**Figure 3.** Cleveland Heart Disease (Study 2): stagewise predictive performance over 1,000 …

**Figure 4.** Cleveland Heart Disease (Study 2): total expected cost (decision loss + cumulative …

**Figure 5.** Pima Diabetes (Study 3): stagewise predictive performance over 1,000 repeated …

**Figure 6.** Pima Diabetes (Study 3): total expected cost (decision loss + cumulative test cost) …

**Figure 7.** eICU Demo (Study 4): stagewise predictive performance. AUC improves mono …

**Figure 8.** eICU Demo (Study 4): total expected cost (decision loss + cumulative test cost) …

Original abstract

Most clinical prediction studies are developed from retrospective cohorts and reported as if all patient information were observed at once. In practice, clinicians face a more consequential question: when is there already enough information to stop testing and act? A later stage can produce a better-looking model and still fail to justify the added delay, burden, or invasiveness of further workup. We formulate sequential clinical prediction as an optimal-stopping problem under staged information, and illustrate the framework across four retrospective clinical datasets. The preferred stopping stage differed substantially by setting: sometimes fuller information justified waiting, whereas in other cases early or intermediate action was preferable. The key object is the patient-specific conditional risk trajectory: forward martingale structure represents coherent risk updating across stages, while reverse-martingale ideas describe information loss when a richer predictor is replaced by a simpler score. The results demonstrate that the best-performing model is not always the best stage for clinical decision-making.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper recasts clinical prediction as an optimal stopping problem on patient risk trajectories using forward and reverse martingales, and shows across four datasets that the best model is not always the best stage to act, but the retrospective construction leaves the practical stopping rules open to bias.


The core contribution is treating the timing of clinical decisions as an explicit optimal stopping problem rather than just fitting a model to all available data at once. They define patient-specific conditional risk trajectories, use forward martingales to capture coherent updating as more information arrives, and reverse martingales to describe what is lost when moving to a simpler score. On four retrospective datasets the preferred stopping stage varied, sometimes favoring earlier action despite later models looking stronger on accuracy metrics. That framing is new in this literature and directly addresses a deployment question clinicians actually face: when is the added test or delay no longer worth it. The examples make the point concrete without overclaiming universality.

The main limitation is the data source. Retrospective cohorts record everything that happened, so the staged filtration is built after the fact. In real time, a decision to stop or continue changes which future tests occur and which patients remain observable, which can alter the conditional distributions the martingales are meant to represent. The paper does not appear to include sensitivity checks or prospective-style simulations that would bound how much this selection effect moves the optimal stopping times. Without those, the claim that best model and best stage diverge rests on trajectories that may not match what would be seen under sequential decision-making.

Readers working on sequential medical decisions or deployment of prediction tools would find the angle useful for thinking through timing trade-offs. The work is coherent enough on its own terms to merit referee time, though reviewers will likely press on the retrospective-to-prospective gap and ask for more detail on how the trajectories were estimated and validated. I would send it to review rather than desk reject.

Referee Report

3 major / 2 minor

Summary. The manuscript formulates sequential clinical prediction as an optimal-stopping problem under staged information. It models patient-specific conditional risk trajectories via forward martingales for coherent risk updating across stages and reverse-martingale structures for information loss when richer predictors are replaced by simpler scores. Illustrated on four retrospective clinical datasets, the results indicate that the preferred stopping stage varies substantially by setting and that the best-performing predictive model is not necessarily the optimal stage for clinical decision-making.

Significance. If the central claims hold after addressing data and estimation issues, the work offers a principled framework for incorporating timing, burden, and information costs into clinical prediction evaluation, moving beyond static accuracy metrics. The martingale approach supplies a coherent mathematical structure for sequential risk trajectories. The cross-dataset illustration is a strength, as is the explicit contrast between model performance and decision-stage optimality. However, the retrospective design limits immediate applicability without further validation of the trajectories under prospective sequential observation.

major comments (3)

Methods (trajectory estimation): The patient-specific conditional risk trajectories are constructed from fully observed retrospective records, yet the manuscript provides no explicit treatment of selection bias or altered filtrations that would arise when stopping decisions prospectively determine which future tests occur and which patients remain in the cohort. This assumption is load-bearing for the optimality claims and the conclusion that best model ≠ best stage.
Results (four datasets): The reported differences in preferred stopping stages lack visible derivation details, error quantification, confidence intervals, or sensitivity analyses for the martingale-based stopping rules, as well as explicit handling of post-hoc stage selection. Without these, it is difficult to assess whether the variation supports the central claim or reflects sampling variability in the retrospective cohorts.
Methods (martingale construction): It remains unclear whether the forward and reverse martingale trajectories are derived from first principles independent of the evaluation data or fitted to the same retrospective records used to assess stopping performance; the latter would introduce circularity that undermines the optimality guarantees.
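
The second major comment asks whether the variation in preferred stage could reflect sampling variability. One simple way to probe this, offered as a sketch and not as the manuscript's method, is to bootstrap patients and record how often each stage attains the lowest mean cost. The function name and the input layout (one realized cost per patient per stage) are assumptions made for the example.

```python
import numpy as np

def bootstrap_preferred_stage(per_patient_cost, n_boot=2000, seed=0):
    """Fraction of bootstrap resamples in which each stage has the lowest mean cost.

    per_patient_cost : (n_patients, n_stages) total cost each patient would incur
                       if everyone stopped at that stage (hypothetical input).
    """
    cost = np.asarray(per_patient_cost, dtype=float)
    n_patients, n_stages = cost.shape
    rng = np.random.default_rng(seed)
    wins = np.zeros(n_stages)
    for _ in range(n_boot):
        # Resample patients with replacement, then pick the cheapest stage.
        idx = rng.integers(0, n_patients, size=n_patients)
        wins[np.argmin(cost[idx].mean(axis=0))] += 1
    return wins / n_boot
```

A selection frequency close to 1 for one stage suggests the preference is stable under resampling; frequencies spread across stages suggest the reported preferred stage may not be distinguishable from its neighbours in that cohort. This treats each stage as a fixed policy and does not address the selection-bias concern in the first comment.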

minor comments (2)

Abstract: A brief statement of the specific clinical outcomes or dataset characteristics would help readers assess the scope of the four settings.
Notation: The precise definition of the staged filtration and the exact functional form of the reverse martingale (information loss) should be stated explicitly to permit replication and verification.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below, indicating where revisions will be made to strengthen the manuscript.


Referee: Methods (trajectory estimation): The patient-specific conditional risk trajectories are constructed from fully observed retrospective records, yet the manuscript provides no explicit treatment of selection bias or altered filtrations that would arise when stopping decisions prospectively determine which future tests occur and which patients remain in the cohort. This assumption is load-bearing for the optimality claims and the conclusion that best model ≠ best stage.

Authors: We agree that the retrospective design assumes complete observation of all stages, which does not capture the selection bias or filtration changes that would arise prospectively when stopping decisions censor future data. The martingale properties are defined conditionally on the observed filtration in the data at hand. We will add an explicit limitations subsection discussing this point, clarifying that the current results provide a benchmark under full observation and that prospective validation with adaptive data collection is required to confirm the optimality claims in practice.
revision: partial

Referee: Results (four datasets): The reported differences in preferred stopping stages lack visible derivation details, error quantification, confidence intervals, or sensitivity analyses for the martingale-based stopping rules, as well as explicit handling of post-hoc stage selection. Without these, it is difficult to assess whether the variation supports the central claim or reflects sampling variability in the retrospective cohorts.

Authors: We will revise the results section to include the full dynamic-programming derivation of the optimal stopping times from the estimated value functions. Bootstrap confidence intervals will be added for the patient-level stopping stages and expected costs, together with sensitivity analyses over cost parameters, discount factors, and model specifications. We will also clarify that stage selection follows directly from the optimal-stopping recursion rather than post-hoc comparison, and report the empirical distribution of stopping times across patients in each dataset.
revision: yes

Referee: Methods (martingale construction): It remains unclear whether the forward and reverse martingale trajectories are derived from first principles independent of the evaluation data or fitted to the same retrospective records used to assess stopping performance; the latter would introduce circularity that undermines the optimality guarantees.

Authors: The forward-martingale property follows from the tower property of conditional expectation and holds for any coherent risk process by construction, independent of any particular dataset. The reverse-martingale structure for information loss under reduced filtrations is likewise a general consequence of conditional-expectation projections. While the numerical trajectories are estimated from the retrospective records, the optimality guarantees apply to the estimated process itself. We will add a clarifying paragraph in the methods section making this distinction explicit and note that out-of-sample or cross-validated estimation can be used in extensions to reduce in-sample effects.
revision: yes

Circularity Check

0 steps flagged

No significant circularity; standard optimal stopping applied to staged data


The paper formulates sequential prediction as an optimal-stopping problem and invokes forward/reverse martingale properties to represent risk trajectories. These are standard stochastic-process tools applied to the clinical setting rather than derived from or fitted to the target conclusions. The empirical illustrations on four retrospective cohorts compare stopping stages to model performance without any quoted step reducing a prediction to its own inputs by construction, self-defining a quantity, or relying on a load-bearing self-citation chain. The central claim therefore remains an independent modeling result rather than a tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on standard probability concepts applied to clinical staging; no new physical entities are postulated.

axioms (2)

domain assumption: Patient information arrives in discrete stages and conditional risk estimates update coherently across stages.
Invoked to justify the forward-martingale representation of risk trajectories.
domain assumption: Information loss when moving from a richer predictor to a simpler score can be described by reverse-martingale properties.
Used to quantify the cost of stopping early.
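
The second assumption can be made concrete with a standard identity, given here as an illustration in assumed notation and not as the paper's own formulation. Suppose the simpler score uses coarser information G contained in the richer information F, and both risks are exact conditional probabilities. Then the simpler risk is the projection of the richer one, and the accuracy lost by coarsening equals the mean squared gap between the two.

```latex
% Assumed notation: G \subseteq F, with p_F and p_G the conditional risks given F and G.
\[
  p_{\mathcal{F}} = \mathbb{P}(Y = 1 \mid \mathcal{F}), \qquad
  p_{\mathcal{G}} = \mathbb{E}[\, p_{\mathcal{F}} \mid \mathcal{G} \,]
                  = \mathbb{P}(Y = 1 \mid \mathcal{G}), \qquad
  \mathcal{G} \subseteq \mathcal{F}.
\]
% Brier-score decomposition: the cross term vanishes because
% E[Y - p_F | F] = 0 and p_F - p_G is F-measurable.
\[
  \mathbb{E}\bigl[(Y - p_{\mathcal{G}})^2\bigr]
  = \mathbb{E}\bigl[(Y - p_{\mathcal{F}})^2\bigr]
  + \mathbb{E}\bigl[(p_{\mathcal{F}} - p_{\mathcal{G}})^2\bigr].
\]
```

The last term is non-negative, so coarsening never improves the Brier score under these assumptions. Whether stopping at the coarser stage is still preferable depends on the test and delay costs, which this identity does not include.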

pith-pipeline@v0.9.0 · 5456 in / 1258 out tokens · 45945 ms · 2026-05-08T11:05:41.510944+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Practical Boundary Degeneracy and Reverse-Martingale Limits in Sequential Binary Models
stat.ME 2026-05 unverdicted novelty 7.0

Finite sequential binary data support practical boundary probabilities via reverse-martingale limits rather than exact degeneracy, with a three-condition stopping rule that separates transient from genuine cases.

Reference graph

Works this paper leans on

16 extracted references · cited by 1 Pith paper

[1] Doob, J. L. (1953). Stochastic Processes. Wiley, New York.

[2] Williams, D. (1991). Probability with Martingales. Cambridge University Press.

[3] Wald, A. (1945). Sequential tests of statistical hypotheses. Annals of Mathematical Statistics, 16(2), 117–186.

[4] Siegmund, D. (1985). Sequential Analysis: Tests and Confidence Intervals. Springer, New York.

[5] Chow, Y. S., Robbins, H., and Siegmund, D. (1971). Great Expectations: The Theory of Optimal Stopping. Houghton Mifflin, Boston.

[6] Peskir, G. and Shiryaev, A. (2006). Optimal Stopping and Free-Boundary Problems. Birkhäuser, Basel.

[7] Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis, 2nd ed. Springer, New York.

[8] Berry, D. A. and Fristedt, B. (1985). Bandit Problems: Sequential Allocation of Experiments. Chapman and Hall, London.

[9] Spiegelhalter, D. J., Freedman, L. S., and Parmar, M. K. B. (1994). Bayesian approaches to randomized trials. Journal of the Royal Statistical Society, Series A, 157(3), 357–416.

[10] Armitage, P. (1975). Sequential Medical Trials, 2nd ed. Blackwell, Oxford.

[11] Vickers, A. J. and Elkin, E. B. (2006). Decision curve analysis: a novel method for evaluating prediction models. Medical Decision Making, 26(6), 565–574.

[12] Gneiting, T. and Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), 359–378.

[13] Wolberg, W. H., Street, W. N., and Mangasarian, O. L. (1994). Machine learning techniques to diagnose breast cancer from image-processed nuclear features of fine needle aspirates. Cancer Letters, 77, 163–171.

[14] Detrano, R., Janosi, A., Steinbrunn, W., et al. (1989). International application of a new probability algorithm for the diagnosis of coronary artery disease. American Journal of Cardiology, 64(5), 304–310.

[15] Smith, J. W., Everhart, J. E., Dickson, W. C., Knowler, W. C., and Johannes, R. S. (1988). Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. Proceedings of the Annual Symposium on Computer Application in Medical Care, 261–265.

[16] Pollard, T. J., Johnson, A. E. W., Raffa, J. D., Celi, L. A., Mark, R. G., and Badawi, O. (2018). The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Scientific Data, 5, 180178.