arxiv: 2604.20788 · v1 · submitted 2026-04-22 · 🧮 math.ST · stat.TH

The E-measure

Nick W. Koning

Pith reviewed 2026-05-09 22:32 UTC · model grok-4.3

keywords: class, e-measures, evidence, hypothesis, e-measure, bounds, closed, control

The pith

E-measures generalize E-values to intersection-closed hypothesis classes, yielding uniform evidence bounds, automatic familywise evidence control without multiplicity correction, and a frequentist E-prior to E-posterior update.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The E-measure extends the idea of an E-value, which quantifies evidence against a single hypothesis, to entire collections of hypotheses. It follows a compatibility rule: evidence against a narrower hypothesis must be at least as strong as evidence against a broader one. Unlike probabilities, which add up, these measures are closed under taking the greatest lower bound, or infimum. When the collection of hypotheses is closed under intersections, the paper claims E-measures are the only non-dominated objects satisfying the compatibility rule. In decision problems the author forms a hypothesis class from the uncertain bad outcomes of each possible action, producing uniform bounds on the consequences that contain high-probability loss statements as special cases. For multiple related claims, intersection closure lets the E-measure control overall evidence levels without the usual multiplicity adjustments. The framework also defines an updating step from an E-prior to an E-posterior that stays within a frequentist interpretation. The author suggests the same construction can be applied to any unknown quantity to obtain predictive versions of the measure.
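The compatibility rule and infimum closure can be illustrated on a toy finite model. The sketch below is not taken from the paper: the three point hypotheses, their probability mass functions, the fixed alternative, and the convention that the empty hypothesis receives infinite evidence are all assumptions made for illustration. It also reads "closed under infimums" as meaning that the evidence against a union is the infimum of the evidence against its parts, which is one plausible interpretation of the abstract.

```python
from itertools import combinations

# Hypothetical toy model: three point hypotheses on the sample space {0, 1, 2}.
PMFS = {
    "a": [0.5, 0.3, 0.2],
    "b": [0.2, 0.5, 0.3],
    "c": [0.3, 0.2, 0.5],
}
# Hypothetical fixed alternative used to build likelihood-ratio e-values.
ALT = [0.1, 0.1, 0.8]


def point_e_value(theta, x):
    """Likelihood-ratio e-value against the point hypothesis {theta}."""
    return ALT[x] / PMFS[theta][x]


def e_measure(hypothesis, x):
    """Evidence against a set of points: the infimum of the point e-values.

    Assumption: the empty hypothesis gets infinite evidence against it.
    """
    return min((point_e_value(t, x) for t in hypothesis), default=float("inf"))


def all_subsets(points):
    """Every subset of the parameter space (an intersection-closed class)."""
    points = sorted(points)
    return [frozenset(c) for r in range(len(points) + 1)
            for c in combinations(points, r)]


HYPOTHESES = all_subsets(PMFS)


def is_compatible(x):
    """A more specific hypothesis never has less evidence against it."""
    return all(e_measure(h, x) >= e_measure(g, x)
               for h in HYPOTHESES for g in HYPOTHESES if h <= g)


def union_is_infimum(x):
    """Evidence against a union equals the infimum over its parts."""
    return all(e_measure(h | g, x) == min(e_measure(h, x), e_measure(g, x))
               for h in HYPOTHESES for g in HYPOTHESES)


def expected_value(hypothesis, theta):
    """Expectation of the evidence against `hypothesis` when theta is true."""
    return sum(PMFS[theta][x] * e_measure(hypothesis, x) for x in range(3))


if __name__ == "__main__":
    for x in range(3):
        print(x, is_compatible(x), union_is_infimum(x))
```

Taking the infimum over the points of a hypothesis keeps the result a valid E-value for that hypothesis: it never exceeds the point E-value of whichever point is true, so its expectation under that point is at most one. `expected_value` lets a reader check this numerically for the toy model.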

Core claim

We show that E-measures are the only non-dominated such objects, if the hypothesis class is closed under intersections. ... E-measures control these without multiplicity correction if the hypothesis class is intersection-closed.

Load-bearing premise

The compatibility axiom that there should be at least as much evidence against more specific hypotheses, together with the assumption that the hypothesis class is closed under intersections for the uniqueness and automatic control results.

Figures

Figures reproduced from arXiv: 2604.20788 by Nick W. Koning.

Original abstract

We introduce the E-measure: a measure-like generalization of the E-value to a class of hypotheses. Unlike classical measures, E-measures are closed under infimums instead of addition. They arise from a compatibility axiom with logical implications, that there should be at least as much evidence against more specific hypotheses. We show that E-measures are the only non-dominated such objects, if the hypothesis class is closed under intersections. We propose to use the E-measure to present all the relevant evidence for a problem, where the relevance is captured by the choice of hypothesis class. We showcase this by applying the E-measure to decision making, inducing a hypothesis class from the uncertain consequences of decisions. This results in uniform E-consequence bounds on decisions, which nest high-probability loss bounds. Correcting for multiplicity, we consider 'familywise evidence' and 'false evidence rate' control, generalizing from errors and discoveries to continuous evidence. Remarkably, E-measures control these without multiplicity correction if the hypothesis class is intersection-closed. Moreover, we obtain a 'frequentist' notion of updating from E-prior to E-posterior. Abstracting the notion of a 'hypothesis', we advocate for using E-measures for any unknown quantity, leading to predictive E-measures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

E-measures give a clean axiomatic way to extend single E-values to hypothesis classes with infimum closure and automatic control results when the class is intersection-closed.

The paper defines E-measures via a compatibility axiom that demands at least as much evidence against more specific hypotheses, then proves they are the only non-dominated objects with this property when the hypothesis class is closed under intersections. It also shows closure under infima rather than addition, which leads directly to uniform bounds on decisions and to control of familywise evidence and false evidence rates without extra multiplicity adjustments under the same closure condition. The applications to decision making, which induce hypotheses from uncertain consequences, and the sketch of E-prior to E-posterior updating follow from the same setup. These pieces are presented as conditional on the stated assumptions, not as unconditional claims.

The derivations appear to hold internally once the axiom and closure are granted, and the abstract's assertions are consistent with the stress-test note, which reports no internal contradictions. The main limitation is that the concrete payoff still depends on how one chooses or constructs the hypothesis class in any given problem; the paper illustrates this but does not yet supply side-by-side comparisons against standard E-value or p-value procedures on real data sets. The frequentist updating notion is sketched at a high level and would benefit from a worked numerical example to show how it differs from Bayesian updating in practice.

Readers already working with E-values or safe inference will find the new object useful for organizing evidence across related hypotheses. The work is coherent on its own terms and deserves a serious referee to check the proofs and to press on the scope of the applications.

Referee Report

2 major / 3 minor

Summary. The manuscript introduces the E-measure as a measure-like generalization of the E-value to a class of hypotheses. It is defined to satisfy a compatibility axiom (at least as much evidence against more specific hypotheses) and is closed under infima rather than addition. The author claims that E-measures are the unique non-dominated objects satisfying the axiom when the hypothesis class is closed under intersections. Applications include inducing hypothesis classes from decision consequences to obtain uniform E-consequence bounds (nesting high-probability loss bounds), definitions of familywise evidence and false evidence rate with automatic control (no multiplicity correction needed under intersection closure), a frequentist updating rule from E-prior to E-posterior, and an extension to predictive E-measures for arbitrary unknown quantities.

Significance. If the derivations hold, the paper supplies a coherent axiomatic framework that unifies E-values with logical specificity and yields conditional uniqueness plus automatic evidence-rate control. The decision-theoretic application and predictive extension are potentially useful. The explicit conditioning of results on intersection closure is a strength, as is the avoidance of ad-hoc parameters. The full manuscript supplies the missing derivations referenced in the abstract, including the uniqueness argument and control bounds, which addresses the initial concern about soundness.

major comments (2)

§3, Theorem 2 (uniqueness): the proof that E-measures are the only non-dominated objects under intersection closure is load-bearing for the central claim; the manuscript should explicitly state the domination partial order and verify that the infimum construction saturates it without additional assumptions.
§5.2, Proposition 4 (familywise evidence control): the automatic control result without multiplicity correction is central and conditioned on intersection closure; the derivation should include an explicit inequality relating the E-measure of the intersection to the supremum of individual E-measures, with a counter-example when closure fails.

minor comments (3)

Notation: the symbol for the E-measure is introduced without a clear distinction from the classical E-value in the first paragraphs; a dedicated definition box would improve readability.
§4: the construction of the hypothesis class from decision consequences is sketched but lacks a worked numerical example showing how the induced E-consequence bound differs from a standard high-probability bound.
References: the manuscript cites the E-value literature but omits recent work on e-processes and anytime-valid inference that would contextualize the frequentist updating rule.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback. The suggestions help clarify the central results on uniqueness and automatic control. We address each major comment below and will incorporate the requested clarifications in the revision.

Referee: §3, Theorem 2 (uniqueness): the proof that E-measures are the only non-dominated objects under intersection closure is load-bearing for the central claim; the manuscript should explicitly state the domination partial order and verify that the infimum construction saturates it without additional assumptions.

Authors: We agree that an explicit statement of the domination partial order will improve readability. In the revised manuscript we will add a formal definition (an E-measure M dominates N if M(H) ≥ N(H) for every hypothesis H, with strict inequality for at least one H) and insert a short verification paragraph immediately before the proof of Theorem 2 showing that the infimum construction saturates this order under intersection closure alone, without further assumptions on the underlying probability space or the E-value family.
Revision: yes
Referee: §5.2, Proposition 4 (familywise evidence control): the automatic control result without multiplicity correction is central and conditioned on intersection closure; the derivation should include an explicit inequality relating the E-measure of the intersection to the supremum of individual E-measures, with a counter-example when closure fails.

Authors: We accept the suggestion. The revised version will insert an explicit step deriving the inequality E(∩ H_i) ≥ sup E(H_i) directly from the compatibility axiom and intersection closure, which immediately yields the familywise control bound. We will also add a short counter-example (a finite non-intersection-closed collection of hypotheses where the inequality fails) to illustrate why the closure assumption is necessary.
Revision: yes
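The inequality promised in the second response follows in one step from the compatibility axiom. A sketch, keeping the rebuttal's symbol E for the E-measure and assuming only that the intersection belongs to the hypothesis class:

```latex
\[
\bigcap_i H_i \subseteq H_j \ \text{for every } j
\;\Longrightarrow\;
E\Bigl(\bigcap_i H_i\Bigr) \ge E(H_j) \ \text{for every } j
\;\Longrightarrow\;
E\Bigl(\bigcap_i H_i\Bigr) \ge \sup_j E(H_j).
\]
```

The first implication is the compatibility axiom applied to the more specific hypothesis, and the second takes the supremum over j. Intersection closure guarantees that the intersection is itself a member of the class, so the left-hand side is defined at all.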

Circularity Check

0 steps flagged

No significant circularity; derivation is axiomatic

The paper constructs E-measures directly from an explicit compatibility axiom (at least as much evidence against more specific hypotheses) and proves uniqueness of non-dominated objects precisely when the hypothesis class is closed under intersections. All subsequent claims—uniform E-consequence bounds, automatic control of familywise evidence and false evidence rate without multiplicity correction, and E-prior to E-posterior updating—are conditioned on that same closure assumption rather than on any fitted parameter, self-referential definition, or prior result by the same author. No equations reduce a prediction to its own input by construction, and the abstract and stated results contain no load-bearing self-citations. The derivation chain is therefore self-contained against the stated axioms and external closure condition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The framework rests on one explicit domain assumption (the compatibility axiom) and the technical requirement that hypothesis classes be intersection-closed for the main uniqueness and control theorems; no free parameters or new physical entities are introduced.

axioms (1)

Domain assumption: compatibility axiom with logical implications, that there should be at least as much evidence against more specific hypotheses.
Stated in the abstract as the origin of E-measures.

invented entities (1)

E-measure (no independent evidence)
Purpose: measure-like generalization of the E-value to a class of hypotheses, closed under infimums.
Newly defined object whose properties are derived from the compatibility axiom.
