Evaluation of the npde performance for the evaluation of joint model with longitudinal and TTE data: an application in metastatic hormono-resistant prostate cancer
Pith reviewed 2026-05-08 19:06 UTC · model grok-4.3
The pith
Normalized prediction discrepancies extend to joint models by imputing the prediction discrepancies of censored event times from model predictions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that prediction discrepancies for unobserved censored event times can be imputed from a uniform distribution conditional on the model-predicted censoring probability, extending the npd and npde framework to joint longitudinal and survival models. They further show that combining the p-values from tests on each data component with a Bonferroni correction yields a test with appropriate type I error and sensitivity to alternatives in simulation studies based on metastatic prostate cancer data.
What carries the argument
The imputation of prediction discrepancies for censored event times from a uniform distribution up to the model-predicted probability of censoring, combined with Bonferroni correction to merge tests on longitudinal and time-to-event components.
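A minimal Python sketch of these two ingredients, under stated assumptions. The function names `impute_tte_pd` and `bonferroni_combined_p` are hypothetical, and the model-predicted distribution is passed in as a plain callable `cdf`; in a real joint model it would be estimated per subject by Monte Carlo simulation. The pd is taken here on the cumulative scale F(t) = 1 - S(t), so a subject censored at time c receives a draw from U[F(c), 1]. On the survival scale this is the same as a draw from U[0, S(c)], where S(c) is the predicted probability of still being event-free at c. The exact interval and the form of the adjusted p-value are assumptions: the source only states that the draw is uniform and based on the predicted censoring probability, and that a Bonferroni correction is applied.

```python
import math
import random


def impute_tte_pd(times, events, cdf, rng):
    """Return one prediction discrepancy (pd) per subject.

    times  : observed follow-up times
    events : True/1 if the event was observed, False/0 if right-censored
    cdf    : callable t -> model-predicted P(T <= t)  (assumed known here)
    rng    : random.Random instance used for the imputation draws
    """
    pds = []
    for t, observed in zip(times, events):
        f = cdf(t)
        if observed:
            pds.append(f)                    # observed event: pd = F(t)
        else:
            pds.append(rng.uniform(f, 1.0))  # censored: impute in [F(c), 1]
    return pds


def bonferroni_combined_p(p_long, p_tte):
    """Bonferroni-adjusted p-value for two component tests (assumed form)."""
    return min(1.0, 2.0 * min(p_long, p_tte))


if __name__ == "__main__":
    rng = random.Random(0)
    exp_cdf = lambda t: 1.0 - math.exp(-0.1 * t)  # illustrative hazard of 0.1
    print(impute_tte_pd([5.0, 10.0], [1, 0], exp_cdf, rng))
    print(bonferroni_combined_p(0.01, 0.50))
```

The combined test then rejects at level 5% when the adjusted p-value is at most 0.05, which is the same as either component p-value being at most 0.025.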
If this is right
- The combined test keeps type I error close to the nominal 5 percent in the settings used for every type of misspecification studied.
- Power increases with larger differences from the true model and with increasing sample size.
- Graphical displays of the discrepancies highlight larger deviations in survival functions or biomarker paths under misspecified models.
- The method permits routine use of npde-based checks on clinical data sets that include both repeated measures and event times.
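As a worked check on the first point, the size of a two-component Bonferroni test can be bounded directly. The sketch below assumes each component p-value is uniform under the null and uses the hypothetical symbols p_long and p_TTE for the longitudinal and time-to-event p-values. The first line holds whatever the dependence between the two components; the second line additionally assumes independence, which a joint model with shared random effects need not satisfy.

```latex
% alpha = 0.05, each component tested at alpha/2 = 0.025
\begin{align*}
P(\text{reject} \mid H_0)
  &\le P(p_{\text{long}} \le \alpha/2) + P(p_{\text{TTE}} \le \alpha/2)
   = 0.025 + 0.025 = 0.05, \\
P(\text{reject} \mid H_0)
  &= 1 - (1 - 0.025)^2 = 0.049375
  \quad \text{if the two p-values are independent.}
\end{align*}
```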
Where Pith is reading between the lines
- The same imputation step could be applied to joint models in other disease areas that track biomarkers alongside event risks.
- Software implementations of nonlinear mixed-effects models could incorporate this diagnostic to support model building in drug development.
- Performance under alternative censoring patterns or with several longitudinal markers remains to be quantified.
Load-bearing premise
The imputation of prediction discrepancies for censored event times based on the model-predicted probability of censoring produces values that correctly represent the distribution under the assumed joint model.
What would settle it
Repeated simulations under the true joint model in which the combined test rejects at a rate materially above 5 percent, or a large-sample simulation in which the test fails to flag a known misspecification such as an incorrect link between biomarker trajectory and hazard.
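The first kind of check can be sketched in a few lines of Python for the time-to-event component alone. This is an illustration under strong simplifying assumptions, not the paper's simulation design: event times are exponential with a known hazard (a stand-in for the joint model), censoring is administrative at a fixed time, censored pd are imputed from U[F(c), 1], and uniformity is tested with a simple two-sided z-test on the mean of the normalised pd rather than the tests used by the authors. The function name `tte_rejection_rate` and all default values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def tte_rejection_rate(n_subjects=100, n_rep=2000, hazard=0.1,
                       censor_time=12.0, alpha=0.05, seed=1):
    """Estimate the rejection rate when the data come from the true model."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    f_c = 1.0 - math.exp(-hazard * censor_time)  # P(event before censoring)
    rejections = 0
    for _ in range(n_rep):
        total = 0.0
        for _ in range(n_subjects):
            t = rng.expovariate(hazard)             # true event time
            if t <= censor_time:
                pd = 1.0 - math.exp(-hazard * t)    # observed event: pd = F(t)
            else:
                pd = rng.uniform(f_c, 1.0)          # censored: impute in [F(c), 1]
            pd = min(max(pd, 1e-12), 1.0 - 1e-12)   # keep the normal quantile finite
            total += std_normal.inv_cdf(pd)         # npd = Phi^{-1}(pd)
        z = total / math.sqrt(n_subjects)           # N(0, 1) if npd are N(0, 1)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_rep


if __name__ == "__main__":
    print(tte_rejection_rate())
```

A rate that stays near `alpha` as `n_rep` grows is consistent with the premise; a rate well above it would point to a problem with the imputation or the test.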
Original abstract
Introduction: Joint models are increasingly used in clinical trials. An important part of model building is to properly assess the descriptive and predictive ability of these models. Normalised prediction discrepancies (npd) and normalised prediction distribution errors (npde) have been developed to evaluate graphically and statistically non-linear mixed effect models for continuous responses. In this work, we propose to use a combined test to evaluate joint models.

Methods: Prediction discrepancies (pd) are defined as the quantile of the observation within its predictive distribution and obtained by Monte-Carlo simulations. The pd for unobserved (censored) event times are imputed in a uniform distribution based on the model prediction of the probability of censoring, using a similar method as the one developed to handle data under the lower quantification limit (LOQ). We propose to combine the p-values of the tests on longitudinal data and on time-to-event (TTE) data, adjusted with a Bonferroni correction. We performed simulation studies based on a joint model characterising the relationship between prostate specific antigen biomarker (PSA) and survival in prostate cancer patients to evaluate the type I error and power of npd/npde to detect different types of model misspecifications.

Results: For all types of misspecifications, the type I error of the combined test was found to be close to the expected 5%. The power of the combined test to detect model misspecifications increased with the difference from the true model and as expected, with sample size. Graphically the power increase can be related to larger differences in the shape of the survival function or PSA evolution.

Conclusions: npd can be readily extended for event data by imputing the pd for censored event under the model. The test showed an adequate type I error, and was quite sensitive to alternative models tested.
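The Monte Carlo definition of the pd in the Methods can be illustrated with a short Python sketch. The helper names `prediction_discrepancy` and `npd` are hypothetical, the simulated values are assumed to be supplied by the fitted model, and moving a pd of exactly 0 or 1 to 1/(2K) or 1 - 1/(2K) is a convenience chosen for this sketch so that the normal quantile stays finite.

```python
from statistics import NormalDist


def prediction_discrepancy(y_obs, y_sim):
    """Empirical quantile of y_obs among K values simulated under the model."""
    k = len(y_sim)
    pd = sum(1 for y in y_sim if y < y_obs) / k
    # Keep pd strictly inside (0, 1) so the normal quantile is finite.
    return min(max(pd, 1.0 / (2 * k)), 1.0 - 1.0 / (2 * k))


def npd(pd):
    """Normalised prediction discrepancy: standard normal quantile of pd."""
    return NormalDist().inv_cdf(pd)


if __name__ == "__main__":
    simulated = [-1.0, 1.0, 2.0, 3.0]   # stand-in for model simulations
    pd = prediction_discrepancy(0.0, simulated)
    print(pd, npd(pd))
```

If the model is correct, the pd are expected to be uniform on [0, 1] and the npd standard normal, which is what the graphical and statistical checks then assess.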
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes extending normalized prediction discrepancies (npd) and normalized prediction distribution errors (npde) to evaluate joint models for longitudinal continuous responses and time-to-event (TTE) data. It defines prediction discrepancies via Monte Carlo simulation from the predictive distribution and imputes pd values for right-censored event times by drawing from a uniform distribution scaled by the model-predicted censoring probability, analogous to the LOQ handling method. A Bonferroni-adjusted combined test of the longitudinal and TTE components is introduced. Simulation studies based on a joint model for PSA biomarker trajectories and survival in metastatic hormone-resistant prostate cancer patients are used to assess type I error (reported near 5%) and power for detecting misspecifications, with results showing increasing power with greater model deviation and larger sample sizes.
Significance. If the imputation procedure and combined test are shown to be valid, the work supplies a practical diagnostic tool for joint models that are widely used in clinical trial analysis to link longitudinal biomarkers with survival. The simulation framework, which generates data from a known true model, provides direct evidence on type I error control and power, addressing a gap in graphical and formal assessment of joint model fit. This could improve model building in oncology applications where joint models are common.
major comments (2)
- [Methods] Methods section: The simulation study design is described at a high level but omits key load-bearing details such as the exact parameter values or functional forms used to create each misspecification scenario, the number of Monte Carlo replicates per dataset for pd computation, the sample sizes tested, the censoring rate and mechanism, and the precise definition of the 'combined test' statistic. These omissions prevent full verification of the reported type I error rates near 5% and the power results in the Results section.
- [Methods] Methods (imputation for censored TTE): The procedure for imputing pd for censored observations is stated to draw from a uniform distribution using the model-predicted probability of censoring. However, no explicit derivation or reference is given showing that the imputed values remain uniformly distributed on [0,1] under the null (i.e., when the joint model is correct), which is required for the subsequent npde tests and the combined p-value procedure to have correct size.
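For orientation, the uniformity property requested in the second major comment can be sketched in a simplified setting. The sketch assumes that the event time T has a continuous distribution function F under the correct model, that the censoring time c is fixed (or independent of T and conditioned on), and that the pd of a censored subject is drawn as U ~ Unif[F(c), 1]. This interval is an assumption about how the imputation is implemented; on the survival scale it corresponds to Unif[0, S(c)]. The sketch does not address pd computed from a marginal predictive distribution estimated by simulation.

```latex
% pd = F(T) if T <= c (event observed), pd = U ~ Unif[F(c), 1] if T > c (censored)
\begin{align*}
P(\mathrm{pd} \le u)
  &= P\bigl(F(T) \le u,\; T \le c\bigr) + P(T > c)\, P(U \le u), \qquad u \in [0, 1]. \\[4pt]
\text{If } u \le F(c):\quad
P(\mathrm{pd} \le u) &= P\bigl(F(T) \le u\bigr) + 0 = u. \\[4pt]
\text{If } u > F(c):\quad
P(\mathrm{pd} \le u) &= F(c) + \bigl(1 - F(c)\bigr)\,\frac{u - F(c)}{1 - F(c)} = u.
\end{align*}
```

In both cases P(pd ≤ u) = u, so under these assumptions the pd, observed or imputed, are uniform on [0, 1] when the model is correct.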
minor comments (2)
- [Abstract] Abstract and title: The title refers to 'npde performance' while the text alternates between npd and npde; ensure consistent terminology and clarify whether the combined test uses npd or the normalized version (npde).
- [Results] Results: The graphical illustrations of power are described qualitatively ('larger differences in the shape of the survival function or PSA evolution'); quantitative summaries (e.g., tables of power by misspecification type and n) would strengthen the presentation.
Simulated Author's Rebuttal
We thank the referee for their thorough review and constructive comments, which highlight important areas for improving the clarity and completeness of our manuscript. We address each major comment below and will revise the manuscript to incorporate the requested details.
- Referee: [Methods] Methods section: The simulation study design is described at a high level but omits key load-bearing details such as the exact parameter values or functional forms used to create each misspecification scenario, the number of Monte Carlo replicates per dataset for pd computation, the sample sizes tested, the censoring rate and mechanism, and the precise definition of the 'combined test' statistic. These omissions prevent full verification of the reported type I error rates near 5% and the power results in the Results section.
  Authors: We agree that the current Methods section provides only a high-level description and lacks the specific details needed for full verification and reproducibility. In the revised manuscript, we will expand this section to explicitly include the exact parameter values and functional forms used to generate each misspecification scenario, the number of Monte Carlo replicates per dataset for computing the pd values, the sample sizes tested in the simulations, the censoring rate and mechanism, and the precise mathematical definition of the combined test statistic (including the Bonferroni adjustment and how the p-values from the longitudinal and TTE components are combined). These additions will enable readers to directly verify the reported type I error rates near 5% and the power results.
  Revision: yes
- Referee: [Methods] Methods (imputation for censored TTE): The procedure for imputing pd for censored observations is stated to draw from a uniform distribution using the model-predicted probability of censoring. However, no explicit derivation or reference is given showing that the imputed values remain uniformly distributed on [0,1] under the null (i.e., when the joint model is correct), which is required for the subsequent npde tests and the combined p-value procedure to have correct size.
  Authors: We acknowledge that the manuscript does not include an explicit derivation or additional reference beyond noting the analogy to the LOQ handling method. The imputation procedure is intended to follow the established approach for handling data below the limit of quantification, which has been shown in prior npde literature to preserve uniformity under the null. In the revised Methods section, we will add a detailed step-by-step derivation demonstrating that the imputed pd values remain uniformly distributed on [0,1] when the joint model is correct, along with a specific reference to the relevant LOQ method. This will confirm the validity of the npde tests and the combined p-value procedure.
  Revision: yes
Circularity Check
Partial circularity in type I error validation due to imputation by construction
specific steps
- self-definitional, [Methods (abstract)]:
"The pd for unobserved (censored) event times are imputed in a uniform distribution based on the model prediction of the probability of censoring, using a similar method as the one developed to handle data under the lower quantification limit (LOQ). We propose to combine the p-values of the tests on longitudinal data and on time-to-event (TTE) data, adjusted with a Bonferroni correction."
The imputation draws pd values for censored observations directly from a uniform distribution conditioned on the model's predicted censoring probability. This rule guarantees that the resulting pd are uniformly distributed under the correct joint model by construction. Consequently, the TTE component of the combined test (and its type I error rate) has nominal properties tautologically from the imputation definition, rather than as an independent result of the evaluation.
full rationale
The paper proposes extending npde to joint models by imputing pd for censored TTE data from a uniform distribution scaled by the model-predicted censoring probability (analogous to LOQ handling), then combining tests via Bonferroni. Simulations from a known true model are used to assess type I error and power. The type I error result reduces to a tautology because the imputation rule enforces uniformity of pd under the assumed model by design, so nominal 5% control follows directly rather than providing independent confirmation. However, the power analysis against misspecifications and graphical evaluations remain substantive and non-circular. No load-bearing self-citations or other reductions are evident. This yields a moderate score reflecting one self-definitional aspect amid otherwise independent simulation-based checks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The joint model generates a correct predictive distribution for Monte-Carlo simulation of prediction discrepancies.
- ad hoc to paper: Imputation of pd for censored times from a uniform distribution using model-predicted censoring probability is statistically valid.