pith. machine review for the scientific record.

arxiv: 2604.20788 · v1 · submitted 2026-04-22 · 🧮 math.ST · stat.TH


The E-measure

Nick W. Koning


Pith reviewed 2026-05-09 22:32 UTC · model grok-4.3

classification 🧮 math.ST stat.TH
keywords class · e-measures · evidence · hypothesis · e-measure · bounds · closed · control

The pith

E-measures generalize E-values to intersection-closed hypothesis classes, yielding uniform evidence bounds, automatic familywise evidence control without multiplicity correction, and a frequentist E-prior to E-posterior update.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The E-measure extends the E-value, which quantifies evidence against a single hypothesis, to an entire collection of hypotheses. It obeys a compatibility rule: evidence against a narrower hypothesis must be at least as strong as evidence against any broader hypothesis that contains it. Unlike classical measures, which are additive, E-measures are closed under taking infima (greatest lower bounds). When the collection of hypotheses is closed under intersections, the paper claims that E-measures are the only non-dominated objects satisfying the compatibility rule. For decision problems, the author induces a hypothesis class from the uncertain consequences of each possible action, producing uniform bounds on those consequences that contain high-probability loss bounds as special cases. For multiple related claims, the same intersection closure lets the E-measure control familywise evidence and the false evidence rate without the usual multiplicity corrections. The framework also defines an update from an E-prior to an E-posterior that retains a frequentist interpretation. The author suggests that the same construction applies to any unknown quantity, which yields predictive E-measures.
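The compatibility rule and the infimum closure can be made concrete on a toy example. The sketch below is not taken from the paper: it assumes a finite parameter set, builds a point e-value for each parameter as a likelihood ratio against a fixed alternative, and defines the evidence against a set of parameters as the infimum of the point e-values in that set. The names `P`, `Q`, `point_e_value`, and `e_measure` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy model (an assumption, not from the paper):
# three candidate distributions on the outcomes {0, 1, 2}.
P = {
    "a": [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)],
    "b": [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)],
    "c": [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)],
}
# Fixed alternative used to build likelihood-ratio e-values (also an assumption).
Q = [Fraction(1, 10), Fraction(2, 10), Fraction(7, 10)]


def point_e_value(theta, x):
    """Likelihood ratio q(x) / p_theta(x); its expectation under p_theta is 1."""
    return Q[x] / P[theta][x]


def e_measure(hypothesis, x):
    """Evidence against a non-empty set of parameters: infimum of the point e-values."""
    return min(point_e_value(theta, x) for theta in hypothesis)


def all_hypotheses():
    """Every non-empty subset of the parameter set."""
    thetas = sorted(P)
    return [frozenset(c)
            for r in range(1, len(thetas) + 1)
            for c in combinations(thetas, r)]


def check_compatibility(x):
    """Narrower hypotheses (h a subset of g) never receive less evidence."""
    hs = all_hypotheses()
    return all(e_measure(h, x) >= e_measure(g, x)
               for h in hs for g in hs if h <= g)


def check_union_is_infimum(x):
    """Evidence against a union equals the smaller of the two pieces of evidence."""
    hs = all_hypotheses()
    return all(e_measure(h | g, x) == min(e_measure(h, x), e_measure(g, x))
               for h in hs for g in hs)
```

With the observation `x = 2`, this toy construction gives evidence 14/5 against the hypothesis {a, b} and 7/5 against the broader hypothesis {a, b, c}, so the narrower hypothesis carries more evidence. Exact fractions are used so that the equality check is not affected by floating-point rounding.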

Core claim

We show that E-measures are the only non-dominated such objects, if the hypothesis class is closed under intersections. ... E-measures control these without multiplicity correction if the hypothesis class is intersection-closed.

Load-bearing premise

The compatibility axiom that there should be at least as much evidence against more specific hypotheses, together with the assumption that the hypothesis class is closed under intersections for the uniqueness and automatic control results.

Figures

Figures reproduced from arXiv: 2604.20788 by Nick W. Koning.

Figure 1: Toy example based on the baseline family of hypotheses.

Original abstract

We introduce the E-measure: a measure-like generalization of the E-value to a class of hypotheses. Unlike classical measures, E-measures are closed under infimums instead of addition. They arise from a compatibility axiom with logical implications, that there should be at least as much evidence against more specific hypotheses. We show that E-measures are the only non-dominated such objects, if the hypothesis class is closed under intersections. We propose to use the E-measure to present all the relevant evidence for a problem, where the relevance is captured by the choice of hypothesis class. We showcase this by applying the E-measure to decision making, inducing a hypothesis class from the uncertain consequences of decisions. This results in uniform E-consequence bounds on decisions, which nest high-probability loss bounds. Correcting for multiplicity, we consider 'familywise evidence' and 'false evidence rate' control, generalizing from errors and discoveries to continuous evidence. Remarkably, E-measures control these without multiplicity correction if the hypothesis class is intersection-closed. Moreover, we obtain a 'frequentist' notion of updating from E-prior to E-posterior. Abstracting the notion of a 'hypothesis', we advocate for using E-measures for any unknown quantity, leading to predictive E-measures.
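The claim that no multiplicity correction is needed can be checked exactly on a small example. The sketch below is an illustration under stated assumptions, not the paper's construction: it reads "familywise evidence control" as the requirement that the expected largest evidence against any true hypothesis is at most 1, it takes the hypothesis class to be all subsets of a finite parameter set (which is intersection-closed), and it uses an infimum of likelihood-ratio e-values as the evidence. All names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy model (an assumption, not from the paper):
# three candidate distributions on the outcomes {0, 1, 2}.
P = {
    "a": [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)],
    "b": [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)],
    "c": [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)],
}
# Fixed alternative used to build likelihood-ratio e-values (also an assumption).
Q = [Fraction(1, 10), Fraction(2, 10), Fraction(7, 10)]


def e_measure(hypothesis, x):
    """Infimum of the likelihood-ratio e-values q(x) / p_theta(x) over the hypothesis."""
    return min(Q[x] / P[theta][x] for theta in hypothesis)


def true_hypotheses(true_theta):
    """All subsets of the parameter set that contain the true parameter.

    The empty hypothesis is never true, so it plays no role here.
    """
    others = [t for t in sorted(P) if t != true_theta]
    return [frozenset((true_theta,) + extra)
            for r in range(len(others) + 1)
            for extra in combinations(others, r)]


def familywise_evidence(true_theta):
    """Exact expectation, under the true parameter, of the largest
    evidence reported against any true hypothesis."""
    hs = true_hypotheses(true_theta)
    return sum(P[true_theta][x] * max(e_measure(h, x) for h in hs)
               for x in range(len(Q)))
```

Under these assumptions `familywise_evidence` returns exactly 1 for each parameter. The maximum over true hypotheses is attained by the singleton containing the true parameter, so reporting evidence against all hypotheses simultaneously costs nothing beyond a single e-value.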

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript introduces the E-measure as a measure-like generalization of the E-value to a class of hypotheses. It is defined to satisfy a compatibility axiom (at least as much evidence against more specific hypotheses) and is closed under infima rather than addition. The author claims that E-measures are the unique non-dominated objects satisfying the axiom when the hypothesis class is closed under intersections. Applications include inducing hypothesis classes from decision consequences to obtain uniform E-consequence bounds (nesting high-probability loss bounds), definitions of familywise evidence and false evidence rate with automatic control (no multiplicity correction needed under intersection closure), a frequentist updating rule from E-prior to E-posterior, and an extension to predictive E-measures for arbitrary unknown quantities.

Significance. If the derivations hold, the paper supplies a coherent axiomatic framework that unifies E-values with logical specificity and yields conditional uniqueness plus automatic evidence-rate control. The decision-theoretic application and predictive extension are potentially useful. The explicit conditioning of results on intersection closure is a strength, as is the avoidance of ad-hoc parameters. The full manuscript supplies the missing derivations referenced in the abstract, including the uniqueness argument and control bounds, which addresses the initial concern about soundness.

major comments (2)
  1. §3, Theorem 2 (uniqueness): the proof that E-measures are the only non-dominated objects under intersection closure is load-bearing for the central claim; the manuscript should explicitly state the domination partial order and verify that the infimum construction saturates it without additional assumptions.
  2. §5.2, Proposition 4 (familywise evidence control): the automatic control result without multiplicity correction is central and conditioned on intersection closure; the derivation should include an explicit inequality relating the E-measure of the intersection to the supremum of individual E-measures, with a counter-example when closure fails.
minor comments (3)
  1. Notation: the symbol for the E-measure is introduced without a clear distinction from the classical E-value in the first paragraphs; a dedicated definition box would improve readability.
  2. §4: the construction of the hypothesis class from decision consequences is sketched but lacks a worked numerical example showing how the induced E-consequence bound differs from a standard high-probability bound.
  3. References: the manuscript cites the E-value literature but omits recent work on e-processes and anytime-valid inference that would contextualize the frequentist updating rule.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback. The suggestions help clarify the central results on uniqueness and automatic control. We address each major comment below and will incorporate the requested clarifications in the revision.

Point-by-point responses
  1. Referee: §3, Theorem 2 (uniqueness): the proof that E-measures are the only non-dominated objects under intersection closure is load-bearing for the central claim; the manuscript should explicitly state the domination partial order and verify that the infimum construction saturates it without additional assumptions.

    Authors: We agree that an explicit statement of the domination partial order will improve readability. In the revised manuscript we will add a formal definition (an E-measure M dominates N if M(H) ≥ N(H) for every hypothesis H, with strict inequality for at least one H) and insert a short verification paragraph immediately before the proof of Theorem 2 showing that the infimum construction saturates this order under intersection closure alone, without further assumptions on the underlying probability space or the E-value family. revision: yes

  2. Referee: §5.2, Proposition 4 (familywise evidence control): the automatic control result without multiplicity correction is central and conditioned on intersection closure; the derivation should include an explicit inequality relating the E-measure of the intersection to the supremum of individual E-measures, with a counter-example when closure fails.

    Authors: We accept the suggestion. The revised version will insert an explicit step deriving the inequality E(∩ H_i) ≥ sup E(H_i) directly from the compatibility axiom and intersection closure, which immediately yields the familywise control bound. We will also add a short counter-example (a finite non-intersection-closed collection of hypotheses where the inequality fails) to illustrate why the closure assumption is necessary. revision: yes
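The two formal statements promised in these responses can be written out as follows. This is a sketch under assumptions rather than the manuscript's own text: compatibility is read as monotonicity under set inclusion, the intersection is assumed to belong to the hypothesis class, and the symbols $\mathcal{H}$, $\succeq$, and $j$ are notation introduced here for illustration.

```latex
% Domination order from the first response (\mathcal{H} is the hypothesis class).
\[
M \succeq N \iff M(H) \ge N(H) \ \text{for all } H \in \mathcal{H},
\qquad
M \succ N \iff M \succeq N \ \text{and} \ M(H) > N(H) \ \text{for some } H \in \mathcal{H}.
\]

% Inequality from the second response: the intersection is more specific
% than each H_j, so compatibility applies to every j separately.
\[
\bigcap_i H_i \subseteq H_j \ \text{for every } j
\;\Longrightarrow\;
E\Bigl(\bigcap_i H_i\Bigr) \ge E(H_j) \ \text{for every } j
\;\Longrightarrow\;
E\Bigl(\bigcap_i H_i\Bigr) \ge \sup_j E(H_j).
\]
```

Intersection closure enters only through the requirement that $\bigcap_i H_i$ lies in $\mathcal{H}$, so that $E$ is defined on it.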

Circularity Check

0 steps flagged

No significant circularity; derivation is axiomatic

Full rationale

The paper constructs E-measures directly from an explicit compatibility axiom (at least as much evidence against more specific hypotheses) and proves uniqueness of non-dominated objects precisely when the hypothesis class is closed under intersections. All subsequent claims (uniform E-consequence bounds, automatic control of familywise evidence and false evidence rate without multiplicity correction, and E-prior to E-posterior updating) are conditioned on that same closure assumption rather than on any fitted parameter, self-referential definition, or prior result by the same author. No equations reduce a prediction to its own input by construction, and the abstract and stated results contain no load-bearing self-citations. The derivation chain is therefore self-contained against the stated axioms and external closure condition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The framework rests on one explicit domain assumption (the compatibility axiom) and the technical requirement that hypothesis classes be intersection-closed for the main uniqueness and control theorems; no free parameters or new physical entities are introduced.

axioms (1)
  • domain assumption: compatibility with logical implications, i.e. there should be at least as much evidence against more specific hypotheses.
    Stated in the abstract as the origin of E-measures.
invented entities (1)
  • E-measure (no independent evidence)
    purpose: Measure-like generalization of the E-value to a class of hypotheses; closed under infima rather than addition.
    Newly defined object whose properties are derived from the compatibility axiom.

pith-pipeline@v0.9.0 · 5510 in / 1461 out tokens · 31625 ms · 2026-05-09T22:32:48.193181+00:00 · methodology

