arxiv: 2606.10109 · v1 · pith:Q7WYLPHVnew · submitted 2026-06-08 · 🧬 q-bio.OT

When is Enough Enough? A Proposed Termination Point for the Number of Replicates in Computational Simulations

Pith reviewed 2026-06-27 14:01 UTC · model grok-4.3

classification 🧬 q-bio.OT
keywords computational simulation · replicate number · termination criterion · Ω test · frequentist statistics · in silico experiments

The pith

Simulations should stop adding replicates once the proposed Ω test, which is modeled on P-value logic, signals sufficient stability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper identifies that computational simulations can produce arbitrarily precise results simply by running more trials, creating ambiguity about what counts as enough data and wasting resources on extra runs. It proposes the Ω test as a uniform, objective stopping rule analogous to traditional frequentist P-tests. Adoption of this standard would let researchers terminate simulations at a theoretically grounded point rather than through ad-hoc choices. A reader would care because the method directly addresses efficiency and reproducibility in fields that rely on in silico experiments.
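
The claim that extra trials alone can drive a result to arbitrary precision can be made concrete with a small sketch. It assumes a two-sample z test with known, equal standard deviations and holds the observed difference fixed at a hypothetical 0.01 SD; the function name and all numbers are illustrative and do not come from the paper.

```python
import math


def two_sample_z_p_value(mean_diff, sd, n_per_arm):
    """Two-sided p-value for a two-sample z test with known, equal SDs."""
    standard_error = sd * math.sqrt(2.0 / n_per_arm)
    z = abs(mean_diff) / standard_error
    # Two-sided standard normal tail: P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


# Hypothetical scenario: two model configurations whose outputs differ by 0.01 SD.
MEAN_DIFF = 0.01
SD = 1.0

p_values = {
    n: two_sample_z_p_value(MEAN_DIFF, SD, n)
    for n in (100, 10_000, 1_000_000, 100_000_000)
}

for n, p in p_values.items():
    print(f"replicates per arm = {n:>11,d}   p = {p:.3g}")
```

Under these assumptions the p-value depends on the replicate count only through the standard error, so it falls from clearly non-significant toward zero as replicates are added, even though the underlying difference never changes.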

Core claim

The authors claim that the Ω test provides a simple, straightforward criterion for halting additional simulation trials once results stabilize in a manner comparable to how P-tests indicate statistical significance, thereby replacing arbitrary decisions about replicate count with a consistent, communicable rule.

What carries the argument

The Ω test, a proposed termination criterion designed to function like a frequentist P-test by objectively signaling when further replicates are unnecessary.

If this is right

  • Simulation studies could adopt a shared stopping rule that permits direct comparison of results across papers.
  • Computational resources would be allocated only until the Ω criterion is met, reducing unnecessary runs.
  • Interpretation of simulation outcomes would become less dependent on the number of trials performed.
  • Reviewers could evaluate whether a study met an explicit, pre-specified termination standard.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Fields that already use power analysis or convergence diagnostics might integrate the Ω test as an additional or alternative check.
  • If the test generalizes beyond the models tested in the paper, it could influence standards for reporting in agent-based modeling and similar domains.
  • The analogy to P-tests suggests the Ω test could be taught alongside classical statistics in methods courses for computational scientists.

Load-bearing premise

That an objective, uniform test for sufficient replicates can be defined without relying on arbitrary thresholds or post-hoc adjustments.

What would settle it

Running the Ω test on a set of established simulation models and checking whether it produces consistent stopping points that align with or contradict existing best-practice replicate counts across different model types.
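
A check of this kind could be sketched as follows. Because the abstract does not define the Ω test, the sketch substitutes a generic precision-based rule (stop once the confidence-interval half-width of the running mean falls below a fraction of the mean). The rule, the function names `run_until_stable` and `stopping_points`, the parameters `rel_tol`, `min_runs`, and `max_runs`, and the two toy models are all assumptions made for illustration, not the authors' method.

```python
import math
import random
import statistics


def run_until_stable(simulate_once, rel_tol=0.05, z=1.96, min_runs=30, max_runs=100_000):
    """Add replicates until the CI half-width of the mean is <= rel_tol * |mean|.

    Placeholder rule for illustration only; this is NOT the Ω test.
    """
    outputs = [simulate_once() for _ in range(min_runs)]
    while len(outputs) < max_runs:
        mean = statistics.fmean(outputs)
        half_width = z * statistics.stdev(outputs) / math.sqrt(len(outputs))
        if half_width <= rel_tol * abs(mean):
            break
        outputs.append(simulate_once())
    return len(outputs), statistics.fmean(outputs)


def stopping_points(model, seeds):
    """Replicate count at which the placeholder rule stops, one per seed."""
    points = []
    for seed in seeds:
        rng = random.Random(seed)
        n_runs, _ = run_until_stable(lambda: model(rng))
        points.append(n_runs)
    return points


# Hypothetical stand-ins for "established simulation models".
TOY_MODELS = {
    "low-variance (Gaussian, mean 10, sd 2)": lambda rng: rng.gauss(10.0, 2.0),
    "high-variance (exponential, mean 10)": lambda rng: rng.expovariate(0.1),
}

results = {
    name: stopping_points(model, seeds=range(5))
    for name, model in TOY_MODELS.items()
}

for name, points in results.items():
    print(f"{name}: stopping points {points}")
```

Similar stopping points across seeds within a model, and larger ones for the noisier model, are the kind of behavior such a check would look for. Note that `rel_tol` is itself a chosen threshold, which is exactly the sort of arbitrariness the load-bearing premise says the Ω test must avoid.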

Original abstract

Computational simulation provides a powerful toolkit for in silico experimentation. However, while the field has developed best practices for the design and implementation of such models, there remains ambiguity in discussions about how to understand and/or interpret their results due to their inherent ability to overwhelm traditional frequentist statistics by simply increasing the number of trials simulated. This fails the discipline in two ways: first, it leaves the community unsure of what constitutes a best practice for uniform understanding, and second, it potentially overburdens computational studies that burn clock cycles solely to ensure "enough runs to satisfy peers" without any theoretical underpinning for a definition of "enough". We propose a simple and straightforward standard for when to stop simulating additional trials, the Ω test, designed to be analogous to the function of traditional frequentist P-tests. Community adoption of a reasonable and uniform standard will permit more efficient computational experimentation and clearer communication/interpretation of the findings discovered in this way.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript identifies ambiguity in determining sufficient replicates for computational simulations, noting that unlimited trials can overwhelm frequentist statistics and lead to inefficient resource use without a theoretical basis for 'enough'. It proposes the Ω test as a simple, uniform standard analogous to p-tests for deciding when to terminate additional simulations, enabling better best practices and clearer interpretation of results.

Significance. A well-defined, non-arbitrary stopping rule for simulation replicates could standardize practices and improve efficiency if it were theoretically grounded and validated. The manuscript, however, supplies no such rule, so the work primarily restates a known issue without advancing a solution.

major comments (1)
  1. [Abstract] The central claim introduces the Ω test as 'a simple and straightforward standard' analogous to p-tests, yet provides no equation, algorithm, convergence criterion, derivation, or example. This absence makes it impossible to evaluate whether the test avoids arbitrary thresholds or rests on identifiable statistical principles.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review. We address the sole major comment below.

Point-by-point responses
  1. Referee: [Abstract] The central claim introduces the Ω test as 'a simple and straightforward standard' analogous to p-tests, yet provides no equation, algorithm, convergence criterion, derivation, or example. This absence makes it impossible to evaluate whether the test avoids arbitrary thresholds or rests on identifiable statistical principles.

    Authors: We agree with the referee that the abstract (and, by extension, the current manuscript) does not supply the equation, algorithm, convergence criterion, derivation, or example for the Ω test. This omission prevents evaluation of its statistical grounding. In the revised manuscript we will add a dedicated methods section that defines the Ω statistic mathematically, specifies the termination algorithm and convergence criterion, derives the test from first principles, and includes a concrete numerical example. These additions will allow direct assessment of whether the rule is non-arbitrary and rests on identifiable principles.

    Revision: yes

Circularity Check

0 steps flagged

No derivation chain present; proposal asserted without equations or reductions

Full rationale

The paper proposes the Ω test as a stopping rule for simulation replicates, analogous to p-values, but supplies no equations, algorithms, convergence criteria, or derivation from first principles, data, or prior results. With no claimed mathematical chain or fitted parameters to inspect, none of the enumerated circularity patterns (self-definitional, fitted-input prediction, self-citation load-bearing, etc.) can be exhibited. The central claim is a normative proposal rather than a derived result, so there is nothing for the circularity criteria to flag.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract only; no free parameters, axioms, or invented entities are specified beyond the named test itself.

pith-pipeline@v0.9.1-grok · 5702 in / 1096 out tokens · 31379 ms · 2026-06-27T14:01:42.898513+00:00 · methodology

