arxiv: 2605.11371 · v1 · submitted 2026-05-12 · 📊 stat.AP

Statistical evaluation of measurement precision in linear dose-response relationships via interlaboratory studies

Jun-ichi Takeshita, Tomomichi Suzuki, Yuto Ikeuchi

Pith reviewed 2026-05-13 02:11 UTC · model grok-4.3

classification 📊 stat.AP

keywords interlaboratory studies · dose-response · precision evaluation · linear mixed-effects model · ANOVA decomposition · repeatability · between-laboratory variance · F-tests

The pith

For balanced interlaboratory designs, a linear mixed-effects model yields exact ANOVA estimators of repeatability and between-laboratory variances along with F-tests for trend, intercept homogeneity, and slope homogeneity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a statistical framework to quantify measurement precision when dose-response relationships are measured across multiple laboratories. It models the data with laboratory-specific intercepts and slopes, then defines method-level repeatability and between-laboratory variances that match ISO 5725 definitions. For fully balanced designs that share the same dose levels and equal replication, the total sum of squares decomposes exactly into independent components, producing closed-form estimators and three F-tests. These tools let analysts measure overall precision and determine whether laboratory differences arise mainly from baseline shifts or from changes in sensitivity to dose.

Core claim

For fully balanced designs with common dose levels and equal replication, the total sum of squares decomposes exactly, closed-form ANOVA estimators recover the repeatability and between-laboratory variances, and three F-tests assess the overall dose-response trend, the homogeneity of intercepts, and the homogeneity of slopes across laboratories. This formulation quantifies precision directly at the method level and distinguishes whether between-laboratory discrepancies stem primarily from baseline shifts or from differences in sensitivity.
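One way to write out what this claim refers to is the standard balanced random-coefficient layout below. The notation is an assumption introduced here for illustration ($p$ laboratories, $q$ common dose levels, $n$ replicates per cell, centered doses $\tilde{x}_j$, and a laboratory effect $a_i$ measured at the mean dose); the paper's own symbols, and the way it splits the residual term, may differ.

```latex
% Assumed notation: labs i = 1..p, common doses j = 1..q, replicates k = 1..n
\begin{align*}
y_{ijk} &= (\alpha + a_i) + (\beta + b_i)\,\tilde{x}_j + e_{ijk},
  \qquad \tilde{x}_j = x_j - \bar{x}, \qquad S_{xx} = \sum_j \tilde{x}_j^2, \\
(a_i, b_i) &\ \text{jointly normal with variances } \sigma_a^2,\ \sigma_b^2,
  \qquad e_{ijk} \sim N(0, \sigma_r^2), \\[4pt]
\hat{\beta}_i &= \frac{1}{S_{xx}} \sum_j \tilde{x}_j\,\bar{y}_{ij},
  \qquad \hat{\beta} = \frac{1}{p} \sum_i \hat{\beta}_i, \\[4pt]
\underbrace{\sum_{i,j,k} (y_{ijk} - \bar{y})^2}_{SS_{\mathrm{total}}}
 &= \underbrace{p\,n\,S_{xx}\,\hat{\beta}^2}_{SS_{\mathrm{trend}}}
  + \underbrace{q\,n \sum_i (\bar{y}_i - \bar{y})^2}_{SS_{\mathrm{int}}}
  + \underbrace{n\,S_{xx} \sum_i (\hat{\beta}_i - \hat{\beta})^2}_{SS_{\mathrm{slope}}}
  + \underbrace{\sum_{i,j,k} (y_{ijk} - \bar{y}_i - \hat{\beta}_i \tilde{x}_j)^2}_{SS_{\mathrm{res}}}
\end{align*}
```

In this sketch the four components carry $1$, $p-1$, $p-1$, and $p(qn-2)$ degrees of freedom, which sum to $pqn-1$. The cross terms vanish because the centered doses sum to zero and every laboratory shares the same dose levels and replication, which is why the identity is exact only under full balance.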

What carries the argument

The linear mixed-effects model with laboratory-specific intercepts and slopes, together with the exact sum-of-squares decomposition that holds only under fully balanced designs.
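The decomposition can be checked numerically. The following is a minimal sketch, not the authors' code: the function name `ss_decomposition`, the array layout (labs × doses × replicates), and the simulated data are all assumptions made for illustration, and the residual term here lumps any lack of fit together with pure replicate error.

```python
import numpy as np

def ss_decomposition(y, x):
    """Split the total sum of squares for a fully balanced layout.

    y : array of shape (p, q, n), labs x dose levels x replicates
    x : array of shape (q,), dose levels shared by every lab
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    p, q, n = y.shape
    xc = x - x.mean()                     # centered doses
    sxx = np.sum(xc ** 2)

    grand_mean = y.mean()
    lab_mean = y.mean(axis=(1, 2))        # lab level at the mean dose
    cell_mean = y.mean(axis=2)            # shape (p, q)
    lab_slope = cell_mean @ xc / sxx      # per-lab least-squares slope
    common_slope = lab_slope.mean()       # equals the pooled slope when balanced

    # Lab-specific fitted lines, shape (p, q)
    fitted = lab_mean[:, None] + lab_slope[:, None] * xc[None, :]
    ss = {
        "trend": p * n * sxx * common_slope ** 2,
        "intercepts": q * n * np.sum((lab_mean - grand_mean) ** 2),
        "slopes": n * sxx * np.sum((lab_slope - common_slope) ** 2),
        "residual": np.sum((y - fitted[:, :, None]) ** 2),
        "total": np.sum((y - grand_mean) ** 2),
    }
    df = {"trend": 1, "intercepts": p - 1, "slopes": p - 1,
          "residual": p * (q * n - 2), "total": p * q * n - 1}
    return ss, df

# Illustrative data only: 5 labs, 4 doses, 3 replicates
rng = np.random.default_rng(0)
doses = np.array([0.0, 0.5, 1.0, 2.0])
y = rng.normal(size=(5, doses.size, 3)) + 1.5 * doses[None, :, None]
ss, df = ss_decomposition(y, doses)
parts = ss["trend"] + ss["intercepts"] + ss["slopes"] + ss["residual"]
print(np.isclose(parts, ss["total"]))
```

If one laboratory used different doses or a different number of replicates, the per-lab slopes would no longer share a common `sxx` and the four parts would generally stop adding up to the total.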

If this is right

Repeatability and between-laboratory variances become directly estimable without iterative numerical methods.
Analysts can test whether between-laboratory variation is driven more by intercept differences or by slope differences.
The three F-tests provide separate significance statements for overall dose response, baseline consistency, and sensitivity consistency.
Precision metrics remain defined even when laboratories differ systematically in level or in dose sensitivity.
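The points above can be illustrated with a small method-of-moments sketch. It is an assumption-laden illustration rather than the paper's formulas: the function `precision_estimates`, the dictionary keys, the equal-weight averaging over dose levels, and the choice of error term for each F-ratio follow standard balanced random-coefficient reasoning, and the sums of squares in the example are made up.

```python
def precision_estimates(ss, p, q, n, sxx):
    """Closed-form estimates from a balanced sum-of-squares table.

    ss      : dict with keys "trend", "intercepts", "slopes", "residual"
    p, q, n : labs, common dose levels, replicates per lab-dose cell
    sxx     : sum of squared centered doses
    """
    df_res = p * (q * n - 2)
    ms_trend = ss["trend"]                    # 1 degree of freedom
    ms_int = ss["intercepts"] / (p - 1)
    ms_slope = ss["slopes"] / (p - 1)
    ms_res = ss["residual"] / df_res

    var_r = ms_res                            # repeatability variance
    var_a = (ms_int - ms_res) / (q * n)       # lab level at the mean dose
    var_b = (ms_slope - ms_res) / (n * sxx)   # lab slope
    var_L = var_a + var_b * sxx / q           # equal-weight average over doses

    # Each entry: (F-ratio, (numerator df, denominator df))
    return {
        "var_r": var_r, "var_a": var_a, "var_b": var_b, "var_L": var_L,
        "F_trend": (ms_trend / ms_slope, (1, p - 1)),
        "F_intercepts": (ms_int / ms_res, (p - 1, df_res)),
        "F_slopes": (ms_slope / ms_res, (p - 1, df_res)),
    }

# Hypothetical table: 4 labs, doses 0/1/2 (sxx = 2), 2 replicates per cell
table = {"trend": 48.0, "intercepts": 9.0, "slopes": 6.0, "residual": 16.0}
est = precision_estimates(table, p=4, q=3, n=2, sxx=2.0)
print(est)
```

In this made-up table the level component accounts for about two thirds of the design-averaged between-laboratory variance and the slope component for the rest, which is the kind of attribution the intercept and slope tests are meant to support. The raw estimates can come out negative; the sketch returns them untruncated so that this remains visible.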

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same decomposition might guide sample-size planning for future balanced interlaboratory protocols.
When designs are mildly unbalanced, the closed-form estimators could serve as starting values for more general mixed-model fitting.
The separation of intercept and slope effects could help regulators decide whether harmonization efforts should target calibration offsets or assay sensitivity.

Load-bearing premise

The data-generating process follows a linear mixed-effects model with lab-specific intercepts and slopes, and the experimental design is fully balanced with common dose levels and equal replication.

What would settle it

In a fully balanced interlaboratory study, compute the proposed ANOVA estimators and observe whether they produce negative variance components or whether the three F-statistics fail to follow their expected central F distributions under the null hypotheses of no trend, intercept homogeneity, or slope homogeneity.
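A check of this kind can be rehearsed on simulated data before touching a real study. The sketch below is an illustration under assumptions, not the paper's procedure: it generates balanced data in which every laboratory shares one line (the slope-homogeneity null), forms the ratio of the slope mean square to the residual mean square, and compares its average with the mean $d_2/(d_2-2)$ of a central $F(d_1, d_2)$ distribution. The function name, design sizes, and parameter values are hypothetical.

```python
import numpy as np

def simulate_null_slope_F(p=6, q=4, n=3, reps=4000, seed=1):
    """Slope-homogeneity F-ratios from data with no lab-to-lab differences."""
    rng = np.random.default_rng(seed)
    x = np.arange(q, dtype=float)
    xc = x - x.mean()
    sxx = np.sum(xc ** 2)

    # Null model: one common line, unit-variance replicate noise only
    noise = rng.normal(size=(reps, p, q, n))
    y = 2.0 + 1.0 * x[None, None, :, None] + noise

    lab_mean = y.mean(axis=(2, 3))                     # (reps, p)
    lab_slope = y.mean(axis=3) @ xc / sxx              # (reps, p)
    fitted = lab_mean[..., None] + lab_slope[..., None] * xc   # (reps, p, q)

    ms_res = np.sum((y - fitted[..., None]) ** 2, axis=(1, 2, 3)) / (p * (q * n - 2))
    slope_dev = lab_slope - lab_slope.mean(axis=1, keepdims=True)
    ms_slope = n * sxx * np.sum(slope_dev ** 2, axis=1) / (p - 1)
    return ms_slope / ms_res

F = simulate_null_slope_F()
d2 = 6 * (4 * 3 - 2)                  # residual degrees of freedom, here 60
print("mean F:", F.mean(), "theoretical mean:", d2 / (d2 - 2))
print("share of negative slope-variance estimates:", np.mean(F < 1.0))
```

The last line is worth noting: whenever the ratio falls below one, the moment estimator of the slope variance is negative, and under this null that happens in a large share of replications. A negative estimate in a single study is therefore weak evidence against the model on its own; a systematic departure of the F-ratios from their reference distribution is the more informative signal.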

Original abstract

This paper proposes a framework for evaluating the statistical precision of measurement methods from interlaboratory studies where the outcome is a dose-response relationship summarized by a regression line. For such measurement methods, where a linear mixed-effects model is applied that allows laboratories to differ in both baseline level and dose-response slope, we define precision evaluation metrics specified in ISO 5725, repeatability and between-laboratory variances. These are method-level precision metrics, and the latter are constructed as design-averaged dose-specific between-laboratory variances over the dose levels and the participating laboratories. For fully balanced designs with common dose levels and equal replication, we obtain an exact decomposition of the total sum of squares, closed-form analysis of variance (ANOVA) estimators of the precision variances, and three associated $F$-tests targeting (i) the overall dose-response trend, (ii) homogeneity of intercepts, and (iii) homogeneity of slopes across laboratories. This formulation enables precision to be quantified and estimated directly and supports an evaluation of whether between-laboratory discrepancies are caused primarily by baseline shifts or by differences in sensitivity, in contrast to fixed-effect comparisons that only detect the presence of differences. Furthermore, we analyze data obtained from an interlaboratory study on observations in bronchoalveolar lavage fluid from experiments involving the intratracheal administration of nanomaterials to rats, using the proposed method as a case study.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

This paper gives exact ANOVA decompositions and three F-tests for repeatability and between-lab variance in balanced interlab dose-response studies under a random-coefficient mixed model.

The main contribution is a set of precision metrics that extend ISO 5725 to linear dose-response data. They define repeatability and between-laboratory variance from a mixed model with lab-specific intercepts and slopes, then average the between-lab component over the dose levels. For fully balanced designs they derive closed-form estimators from the sum-of-squares decomposition and add F-tests for the overall trend, intercept homogeneity, and slope homogeneity. That last part is useful because it separates baseline shifts from sensitivity differences across labs, which fixed-effect tests cannot do directly. The nanomaterial case study illustrates the workflow on real bronchoalveolar lavage data. The derivations follow standard balanced random-coefficient theory, so the exactness claim holds where the design assumptions are met. The main limitation is the restriction to fully balanced layouts with common doses and equal replication; unbalanced studies would require numerical methods instead of the closed forms. The linearity assumption is also taken as given without much sensitivity checking. The paper is aimed at measurement scientists and regulatory statisticians who run interlaboratory validation studies on dose-response assays. It is narrow but cleanly executed, and the central claims are verifiable from the stated model. I would send it to peer review.

Referee Report

1 major / 3 minor

Summary. The paper proposes a framework for evaluating measurement precision in interlaboratory studies of linear dose-response relationships. It applies a linear mixed-effects model allowing random laboratory-specific intercepts and slopes, defines repeatability and between-laboratory precision variances following ISO 5725, and constructs the latter as design-averaged dose-specific quantities. For fully balanced designs with common dose levels and equal replication, the paper claims an exact total sum-of-squares decomposition, closed-form ANOVA estimators of the precision variances, and three F-tests for the overall dose-response trend, homogeneity of intercepts, and homogeneity of slopes. The approach is demonstrated via a case study on bronchoalveolar lavage fluid observations from rat nanomaterial intratracheal administration experiments.

Significance. If the central derivations hold, the work provides a useful extension of ISO 5725-style precision metrics to dose-response settings by separating baseline shifts from sensitivity differences across laboratories. This distinction is practically relevant for standardizing measurement methods in toxicology and analytical chemistry, and the closed-form estimators for balanced designs represent a clear computational advantage over general REML fitting.

major comments (1)

The exact sum-of-squares decomposition and closed-form ANOVA estimators are asserted for fully balanced designs under the random-coefficient model, but the manuscript should explicitly equate the observed mean squares to their expectations (including the design-averaged between-laboratory variance functional) to confirm that the three F-tests are the standard ratios for fixed slope, random-intercept component, and random-slope component.
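For orientation, the expected mean squares the referee asks for take the following form in a standard balanced random-coefficient layout. This is a sketch under assumed notation, not a quotation from the manuscript: $p$ laboratories, $q$ common dose levels with centered values $\tilde{x}_j$ and $S_{xx} = \sum_j \tilde{x}_j^2$, $n$ replicates, fixed slope $\beta$, repeatability variance $\sigma_r^2$, variance $\sigma_a^2$ of the laboratory effect at the mean dose, slope variance $\sigma_b^2$, and equal weights over dose levels in the design average.

```latex
\begin{align*}
E[MS_{\mathrm{res}}]   &= \sigma_r^2, \\
E[MS_{\mathrm{int}}]   &= \sigma_r^2 + q\,n\,\sigma_a^2, \\
E[MS_{\mathrm{slope}}] &= \sigma_r^2 + n\,S_{xx}\,\sigma_b^2, \\
E[MS_{\mathrm{trend}}] &= \sigma_r^2 + n\,S_{xx}\,\sigma_b^2 + p\,n\,S_{xx}\,\beta^2, \\[4pt]
\sigma_L^2 &:= \frac{1}{q} \sum_{j=1}^{q} \operatorname{Var}\!\left(a_i + b_i \tilde{x}_j\right)
            = \sigma_a^2 + \frac{S_{xx}}{q}\,\sigma_b^2, \\
\hat{\sigma}_L^2 &= \frac{MS_{\mathrm{int}} + MS_{\mathrm{slope}} - 2\,MS_{\mathrm{res}}}{q\,n}.
\end{align*}
```

Under these expectations each test divides a mean square by the one whose expectation differs only in the term being tested: $MS_{\mathrm{trend}}/MS_{\mathrm{slope}}$ for $\beta = 0$, $MS_{\mathrm{int}}/MS_{\mathrm{res}}$ for $\sigma_a^2 = 0$, and $MS_{\mathrm{slope}}/MS_{\mathrm{res}}$ for $\sigma_b^2 = 0$. The covariance between $a_i$ and $b_i$ drops out of the design average because the centered doses sum to zero.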

minor comments (3)

In the case-study section, report the numerical values of the estimated variance components, the design-averaged between-laboratory variances at each dose, and the p-values of the three F-tests so that readers can assess the practical magnitude of intercept versus slope heterogeneity.
Clarify the precise definition and weighting scheme used for the design-averaged between-laboratory variance; although it is a well-defined functional of the estimated random-coefficient covariance matrix, the averaging weights over dose levels and laboratories should be stated explicitly.
Add a brief simulation study (or reference to one) confirming that the closed-form estimators recover the true variance components under the balanced design and the stated linear mixed model.
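The simulation requested in the last comment could look roughly like the sketch below. It is a hypothetical illustration, not the authors' study: the function `simulate_recovery`, the design sizes, the true variance values, and the independence of the laboratory level and slope effects are all assumptions chosen for convenience.

```python
import numpy as np

def simulate_recovery(p=8, q=4, n=3, var_r=1.0, var_a=0.5, var_b=0.25,
                      reps=2000, seed=7):
    """Average the closed-form estimates over many simulated balanced studies."""
    rng = np.random.default_rng(seed)
    x = np.arange(q, dtype=float)
    xc = x - x.mean()
    sxx = np.sum(xc ** 2)

    # Lab effects (level at the mean dose, slope) and replicate noise
    a = rng.normal(scale=np.sqrt(var_a), size=(reps, p, 1, 1))
    b = rng.normal(scale=np.sqrt(var_b), size=(reps, p, 1, 1))
    e = rng.normal(scale=np.sqrt(var_r), size=(reps, p, q, n))
    y = (10.0 + a) + (2.0 + b) * xc[None, None, :, None] + e

    lab_mean = y.mean(axis=(2, 3))                     # (reps, p)
    lab_slope = y.mean(axis=3) @ xc / sxx              # (reps, p)
    fitted = lab_mean[..., None] + lab_slope[..., None] * xc

    ms_res = np.sum((y - fitted[..., None]) ** 2, axis=(1, 2, 3)) / (p * (q * n - 2))
    mean_dev = lab_mean - lab_mean.mean(axis=1, keepdims=True)
    slope_dev = lab_slope - lab_slope.mean(axis=1, keepdims=True)
    ms_int = q * n * np.sum(mean_dev ** 2, axis=1) / (p - 1)
    ms_slope = n * sxx * np.sum(slope_dev ** 2, axis=1) / (p - 1)

    est_a = (ms_int - ms_res) / (q * n)
    est_b = (ms_slope - ms_res) / (n * sxx)
    return {
        "var_r": ms_res.mean(),
        "var_a": est_a.mean(),
        "var_b": est_b.mean(),
        "var_L": (est_a + est_b * sxx / q).mean(),
    }

averages = simulate_recovery()
print(averages)   # compare with var_r=1.0, var_a=0.5, var_b=0.25, var_L=0.8125
```

With the default doses 0, 1, 2, 3 the centered sum of squares is 5, so the design-averaged target is 0.5 + 0.25 · 5/4 = 0.8125. Averages close to the true values would support unbiasedness under the stated model; the sketch says nothing about behavior under nonlinearity or imbalance.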

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive evaluation and constructive comment. The suggestion to make the expected mean squares explicit improves the clarity of the ANOVA justification, and we have incorporated this into the revised manuscript.

Referee: The exact sum-of-squares decomposition and closed-form ANOVA estimators are asserted for fully balanced designs under the random-coefficient model, but the manuscript should explicitly equate the observed mean squares to their expectations (including the design-averaged between-laboratory variance functional) to confirm that the three F-tests are the standard ratios for fixed slope, random-intercept component, and random-slope component.

Authors: We agree that an explicit derivation of the expected mean squares strengthens the presentation. In the revised manuscript we have added a new subsection (Section 3.3) that equates each observed mean square to its model expectation under the balanced random-coefficient design. The derivation shows that the design-averaged between-laboratory variance functional appears precisely in the expectation of the mean square for the slope-homogeneity test, while the intercept-homogeneity test isolates the random-intercept component and the overall-trend test isolates the fixed slope. Consequently the three reported F-ratios are the standard ANOVA ratios for these respective hypotheses.

revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

The paper's central claims concern standard derivations for balanced linear mixed models: an exact sum-of-squares decomposition follows from the orthogonality of fixed and random effects in fully balanced designs with common dose levels and equal replication; closed-form ANOVA estimators arise by equating observed mean squares to their expectations under the random-coefficient model; and the three F-tests are the usual ratios of mean squares for the fixed slope, random-intercept component, and random-slope component. Precision metrics are defined directly from the model's variance components using ISO 5725 terminology, with the between-laboratory variance expressed as a design-averaged functional; these definitions do not reduce any claimed result to a fitted parameter by construction. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked, and the case-study application simply plugs the derived estimators into observed data without circular loops.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on the standard assumptions of linear mixed-effects models (normality, independence of random effects, correct specification of lab-specific intercepts and slopes) and on the applicability of ISO 5725 repeatability and reproducibility definitions to dose-response data. No free parameters, new entities, or ad-hoc axioms are introduced in the abstract description.

axioms (2)

domain assumption: A linear mixed-effects model with lab-specific random intercepts and slopes is appropriate for the interlaboratory dose-response data.
Invoked to define the variance components that become the precision metrics.
domain assumption: The ISO 5725 definitions of repeatability and between-laboratory variances extend directly to method-level summaries of regression lines.
Used to label the derived quantities as the target precision metrics.

pith-pipeline@v0.9.0 · 5548 in / 1506 out tokens · 72561 ms · 2026-05-13T02:11:36.811269+00:00 · methodology
