Minimax unbiased estimation for finite populations with bounded outcomes

Patrick Lopatto; P. M. Aronow

arxiv: 2605.20572 · v1 · pith:TKF32V2Dnew · submitted 2026-05-20 · 🧮 math.ST · stat.ME · stat.TH

Pith reviewed 2026-05-21 02:45 UTC · model grok-4.3

classification: 🧮 math.ST · stat.ME · stat.TH

keywords: minimax estimation · finite population sampling · design-unbiased estimators · Horvitz-Thompson estimator · bounded outcomes · survey sampling · worst-case risk

The pith

When each unit's outcome is confined to a known interval, the minimax unbiased estimator for the population total is a midpoint-adjusted Horvitz-Thompson estimator paired with independent sampling whose probabilities are proportional to the interval lengths, capped at one.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves that for any fixed set of inclusion probabilities, the worst-case squared error of design-unbiased estimators of the finite population total is minimized when the sampling design renders the inclusion indicators pairwise independent. In this case the optimal estimator adjusts each sampled observation by subtracting the midpoint of its known interval before weighting by the inverse inclusion probability. The authors then optimize the choice of those inclusion probabilities under a budget on their sum and show that setting each probability to the minimum of one and a constant times the length of the corresponding interval yields the overall minimax strategy. This construction extends earlier minimax results from the subclass of linear estimators to the entire class of unbiased estimators.

Core claim

For any sampling design with positive inclusion probabilities, a sharp lower bound exists on the worst-case squared error over all possible outcomes in the product of the intervals [a_i, b_i]. Equality holds if and only if the inclusion indicators are pairwise independent, and the estimator that attains the bound is the midpoint-differenced Horvitz-Thompson estimator. Solving the joint optimization problem shows that the minimax design samples each unit independently with probability min(1, c times the interval length) for a constant c chosen to meet the size constraint.

What carries the argument

The midpoint-differenced Horvitz-Thompson estimator together with a sampling design that makes inclusion indicators pairwise independent.
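
A minimal sketch of the estimator, following the formula in the abstract: the sum of the midpoints m_i plus the inverse-probability-weighted deviations (y_i - m_i)/π_i over the sampled units. The function names `midpoint_ht_total` and `expected_estimate` and the example numbers are illustrative and not taken from the paper. The second function checks design-unbiasedness exactly, by enumerating every possible sample under independent inclusion.

```python
from itertools import product


def midpoint_ht_total(sample, y, a, b, pi):
    """Midpoint-differenced Horvitz-Thompson estimate of sum(y).

    sample: iterable of sampled unit indices
    y: outcomes (only the sampled entries are read)
    a, b: per-unit lower and upper bounds
    pi: per-unit inclusion probabilities, all assumed > 0
    """
    midpoints = [(lo + hi) / 2 for lo, hi in zip(a, b)]
    baseline = sum(midpoints)
    correction = sum((y[i] - midpoints[i]) / pi[i] for i in sample)
    return baseline + correction


def expected_estimate(y, a, b, pi):
    """Exact expectation under independent inclusion (all 2^N samples)."""
    total = 0.0
    for indicators in product([0, 1], repeat=len(y)):
        prob = 1.0
        for flag, p in zip(indicators, pi):
            prob *= p if flag else 1 - p
        sample = [i for i, flag in enumerate(indicators) if flag]
        total += prob * midpoint_ht_total(sample, y, a, b, pi)
    return total


# Hypothetical three-unit population.
a = [0.0, 1.0, 2.0]
b = [4.0, 3.0, 10.0]
y = [1.0, 2.5, 9.0]
pi = [0.5, 0.25, 1.0]

# The expectation matches the true total sum(y) = 12.5 up to rounding.
print(expected_estimate(y, a, b, pi), sum(y))
```

With an empty sample the estimate falls back to the sum of the midpoints, which is what makes the estimator's error depend only on how far each outcome sits from the centre of its interval.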

If this is right

Any pairwise-independent design achieves the lower bound for its given inclusion probabilities.
The estimator is admissible among unbiased estimators and is affine equivariant.
The construction extends Gabler's linear minimax result to the full class of design-unbiased estimators.
The optimal inclusion probabilities are set to min(1, c(b_i - a_i)), with c chosen so that the probabilities sum to the sample-size budget n.
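
The constant c is defined only implicitly, so a short sketch of one way to compute it may help. This is an assumed procedure, not one taken from the paper: sort the interval lengths, cap the longest intervals at probability one, one at a time, and rescale the rest until no uncapped probability exceeds one. The function name `minimax_inclusion_probabilities` is hypothetical, and the sketch assumes every interval has positive length and 0 < n ≤ N.

```python
def minimax_inclusion_probabilities(a, b, n):
    """Return (c, pi) with pi_i = min(1, c * (b_i - a_i)) and sum(pi) == n.

    Assumes b_i > a_i for every unit and 0 < n <= number of units.
    """
    lengths = [hi - lo for lo, hi in zip(a, b)]
    if any(d <= 0 for d in lengths):
        raise ValueError("every interval must have positive length")
    if not 0 < n <= len(lengths):
        raise ValueError("n must satisfy 0 < n <= N")

    order = sorted(lengths, reverse=True)
    remaining = sum(order)  # total length of the units not yet capped
    for k, largest_uncapped in enumerate(order):
        # With the k longest intervals capped at 1, the rest must sum to n - k.
        c = (n - k) / remaining
        if c * largest_uncapped <= 1:
            break
        remaining -= largest_uncapped
    else:
        # Only reachable through rounding when n == N: every unit is capped.
        c = 1 / min(lengths)

    pi = [min(1.0, c * d) for d in lengths]
    return c, pi


# Hypothetical example: interval lengths 4, 2, 8 and a budget of n = 2.
c, pi = minimax_inclusion_probabilities([0.0, 1.0, 2.0], [4.0, 3.0, 10.0], 2)
print(c, pi, sum(pi))
```

In this example the longest interval is capped at probability one and the remaining budget of one unit is split between the other two in proportion to their lengths.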

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Practitioners can apply the length-proportional probabilities to obtain explicit worst-case guarantees in surveys where item values have known bounds.
Poisson sampling with these probabilities draws each unit independently, so it satisfies the required pairwise independence exactly, at the cost of a random realized sample size.
The same rectangular-parameter-space argument may extend to minimax estimation of other functionals such as subpopulation totals.
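
A minimal sketch of how such a design could be drawn, assuming Poisson sampling in its usual survey-sampling sense of one independent Bernoulli draw per unit. The function name `poisson_sample` and the example probabilities are illustrative and not taken from the paper. Because the draws are independent, the realized sample size is random, and only its expectation equals the sum of the probabilities.

```python
import random


def poisson_sample(pi, rng):
    """Include unit i with probability pi[i], independently of all other units."""
    # rng.random() lies in [0, 1), so a unit with pi[i] == 1.0 is always included.
    return [i for i, p in enumerate(pi) if rng.random() < p]


rng = random.Random(0)          # fixed seed so the run is reproducible
pi = [2 / 3, 1 / 3, 1.0]        # hypothetical probabilities summing to 2

draws = [poisson_sample(pi, rng) for _ in range(10_000)]
sizes = [len(sample) for sample in draws]
mean_size = sum(sizes) / len(sizes)

# Realized sizes vary between 1 and 3; their average is close to sum(pi) = 2.
print(sorted(set(sizes)), mean_size)
```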

Load-bearing premise

The possible values of the outcomes fill the entire rectangular region given by the product of the individual intervals, and only design-unbiased estimators are considered.

What would settle it

An unbiased estimator or design that achieves a strictly smaller worst-case squared error than the midpoint-differenced Horvitz-Thompson estimator under the proposed inclusion probabilities would falsify the claim that the bound is sharp and attained only by this construction.

Original abstract

We study design-unbiased estimation of the finite-population total $\sum_{i=1}^N y_i$ when each outcome satisfies known bounds $y_i\in[a_i,b_i]$. For any sampling design with inclusion probabilities $\pi_i>0$, we prove a sharp lower bound on the worst-case squared error over the rectangular parameter space. This bound is attained if and only if the unit inclusion indicators are pairwise independent, in which case the minimax estimator is the midpoint-differenced Horvitz-Thompson estimator $\sum_{i=1}^N m_i+\sum_{i\in S}(y_i-m_i)/\pi_i$, with $m_i=(a_i+b_i)/{2}$. We then solve the joint design-and-estimation problem under the constraint $\sum_i \pi_i\le n$. We find that a minimax strategy samples units independently with probabilities $\pi_i^\ast=\min(1,c (b_i-a_i))$ where $c>0$ is chosen so that $\sum_i \pi_i^\ast=n$, and uses the midpoint-differenced estimator. This extends Gabler (1990)'s linear minimax result to the full class of design-unbiased estimators. We also show that the estimator is admissible among unbiased estimators and affine equivariant.
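
As a worked check on the abstract's formulas, the variance of the midpoint-differenced estimator can be computed directly when the inclusion indicators are pairwise independent. The indicator notation I_i (equal to one when unit i is sampled) is introduced here for the derivation. Reading the last line as the paper's sharp lower bound is an inference from the abstract's statement that the bound is attained in this case, not a formula quoted from the paper.

```latex
\begin{align*}
\hat T &= \sum_{i=1}^N m_i + \sum_{i=1}^N \frac{I_i\,(y_i - m_i)}{\pi_i},
\qquad \mathbb{E}[I_i] = \pi_i
\;\Longrightarrow\; \mathbb{E}[\hat T] = \sum_{i=1}^N y_i, \\
\operatorname{Var}(\hat T)
&= \sum_{i=1}^N \frac{\operatorname{Var}(I_i)}{\pi_i^2}\,(y_i - m_i)^2
 = \sum_{i=1}^N \frac{1-\pi_i}{\pi_i}\,(y_i - m_i)^2
\quad \text{(pairwise independent } I_i\text{)}, \\
\sup_{y \in \prod_i [a_i, b_i]} \operatorname{Var}(\hat T)
&= \sum_{i=1}^N \frac{1-\pi_i}{\pi_i}\,\frac{(b_i - a_i)^2}{4},
\quad \text{since } |y_i - m_i| \le \tfrac{b_i - a_i}{2}.
\end{align*}
```

The supremum is reached at any vertex of the rectangle, where every outcome sits at an endpoint of its interval.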

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Paper gives sharp minimax bound for all design-unbiased estimators and an explicit optimal independent sampling design under known interval bounds.

The key point is that this paper extends Gabler's 1990 linear minimax result to the full class of design-unbiased estimators for the finite-population total when each y_i lies in a known interval [a_i, b_i]. It proves a sharp lower bound on worst-case squared error that is attained exactly when the inclusion indicators are pairwise independent, and shows that the midpoint-differenced Horvitz-Thompson estimator achieves it. The authors then solve the joint problem: sample units independently with probabilities min(1, c(b_i - a_i)), that is, proportional to interval length and capped at one, with c chosen so that the probabilities sum to n, and pair that design with the same estimator. The rectangular parameter space lets the risk separate across units, which keeps the argument direct and yields an explicit, computable strategy. The results that the estimator is admissible among unbiased estimators and is affine equivariant are useful add-ons. The derivations rest on worst-case analysis rather than fitting or simulation, so the claims look internally consistent.

One soft spot is the strong rectangular assumption: real populations may satisfy joint constraints across units that the product space ignores, which could make the worst-case bound less informative in practice. The handling of the sum-to-n constraint via the scaling constant c is clean for large N but might need more detail for small populations, where the min(1, ·) cap binds often.

This is aimed at survey statisticians and sampling theorists who work with bounded outcomes and want robust design-plus-estimation rules. A reader who values explicit minimax strategies over asymptotic approximations will find it useful. It deserves a serious referee because the extension is substantive, the conditions for attainment are stated clearly, and the result is directly applicable.

Referee Report

1 major / 3 minor

Summary. The manuscript proves a sharp lower bound on the worst-case mean squared error of any design-unbiased estimator for the finite-population total when each unit outcome y_i lies in a known interval [a_i, b_i]. The bound is attained if and only if the inclusion indicators are pairwise independent; in that case the midpoint-differenced Horvitz-Thompson estimator achieves the bound. The authors then solve the joint design-estimation problem under the constraint ∑ π_i ≤ n on the expected sample size, obtaining independent Bernoulli sampling with inclusion probabilities π_i^* = min(1, c(b_i - a_i)), where c is chosen so that ∑ π_i^* = n, together with the same midpoint-differenced estimator. They further show that this estimator is admissible among unbiased estimators and is affine equivariant. The work extends Gabler (1990) from the linear subclass to all design-unbiased estimators.

Significance. If the derivations hold, the paper supplies a complete, explicit minimax solution for sampling and estimation under rectangular boundedness constraints. The separation of per-unit risk contributions via the product parameter space, the necessity of pairwise independence for attaining the bound, and the closed-form optimal design constitute a substantive advance in finite-population minimax theory. The admissibility result adds practical weight to the recommendation.

major comments (1)

[Theorem 3.2] The necessity direction of the pairwise-independence characterization (that the lower bound is attained only when inclusions are pairwise independent) is load-bearing for the claim that the independent Bernoulli design is uniquely minimax. A concrete verification that the cross-term expectations vanish if and only if Cov(I_i, I_j) = 0 for i ≠ j would strengthen the argument.

minor comments (3)

[Section 2] The definition of the midpoint m_i = (a_i + b_i)/2 is used repeatedly but first appears only after the statement of the main lower-bound theorem; introducing it in the notation section would improve readability.
[Section 4] The constant c in the optimal inclusion probabilities π_i^* is defined implicitly by the equation ∑ min(1, c(b_i - a_i)) = n. An explicit algorithm or closed-form expression for c (or a reference to one) would help readers implement the design.
[References] The citation to Gabler (1990) is given only by year; the full bibliographic details should be supplied in the references.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and the recommendation of minor revision. We address the single major comment below.

Referee: [Theorem 3.2] The necessity direction of the pairwise-independence characterization (that the lower bound is attained only when inclusions are pairwise independent) is load-bearing for the claim that the independent Bernoulli design is uniquely minimax. A concrete verification that the cross-term expectations vanish if and only if Cov(I_i, I_j) = 0 for i ≠ j would strengthen the argument.

Authors: We appreciate the suggestion to make the necessity direction more explicit. In the proof of Theorem 3.2 the worst-case MSE expands into a sum of per-unit variance terms plus cross terms of the form E[(I_i - π_i)(I_j - π_j)(y_i - m_i)(y_j - m_j)]/(π_i π_j), which equals Cov(I_i, I_j)(y_i - m_i)(y_j - m_j)/(π_i π_j) because the outcomes are fixed. Because the parameter space is a product of intervals, the sign of each (y_k - m_k) can be chosen independently; consequently, whenever Cov(I_i, I_j) ≠ 0, some choice of y makes the cross term strictly positive. The cross term vanishes for every y precisely when the covariance is zero. We will insert a short remark immediately after the statement of Theorem 3.2 that isolates this direct calculation and thereby renders the if-and-only-if claim fully concrete.
revision: yes
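
The effect of a nonzero covariance can be illustrated numerically on a two-unit population. The sketch below is an illustration under stated assumptions, not the paper's proof: it fixes the midpoint-differenced estimator and compares two designs with the same marginals π_1 = π_2 = 0.5, one independent and one that always samples exactly one unit. It therefore checks only this one estimator, not the lower bound over all unbiased estimators. The function name `worst_case_mse` and the designs are hypothetical.

```python
from itertools import product


def worst_case_mse(design, a, b, pi):
    """Largest MSE of the midpoint-differenced estimator over the box vertices.

    design: dict mapping a tuple of sampled indices to its probability.
    The MSE is a convex quadratic in y, so its maximum over the
    rectangle prod [a_i, b_i] is attained at a vertex.
    """
    m = [(lo + hi) / 2 for lo, hi in zip(a, b)]
    worst = 0.0
    for y in product(*zip(a, b)):  # every vertex of the rectangle
        total = sum(y)
        mse = 0.0
        for sample, prob in design.items():
            estimate = sum(m) + sum((y[i] - m[i]) / pi[i] for i in sample)
            mse += prob * (estimate - total) ** 2
        worst = max(worst, mse)
    return worst


a, b, pi = [0.0, 0.0], [1.0, 1.0], [0.5, 0.5]

# Independent inclusion: Cov(I_1, I_2) = 0.
independent = {(): 0.25, (0,): 0.25, (1,): 0.25, (0, 1): 0.25}
# Exactly one unit sampled: same marginals, Cov(I_1, I_2) = -0.25.
one_unit = {(0,): 0.5, (1,): 0.5}

print(worst_case_mse(independent, a, b, pi))  # 0.5
print(worst_case_mse(one_unit, a, b, pi))     # 1.0
```

The dependent design does well when the two outcomes agree and badly when they sit at opposite endpoints, which is the sign choice the rebuttal describes.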

Circularity Check

0 steps flagged

No significant objection identified

The paper establishes its central minimax result via direct mathematical arguments: a sharp lower bound on worst-case squared error is derived over the rectangular product space of intervals [a_i, b_i], shown to be attained precisely when inclusion indicators are pairwise independent, and the midpoint-differenced Horvitz-Thompson estimator is identified as the unique attaining estimator for any fixed marginals π_i. The subsequent design optimization then selects the specific π_i^* = min(1, c(b_i - a_i)) that minimizes this bound subject to ∑π_i = n. These steps rely on explicit use of the product structure to separate per-unit risk contributions and extend the earlier linear-minimax result of Gabler (1990) without any self-referential definitions, fitted inputs renamed as predictions, or load-bearing self-citations. The derivation chain is therefore self-contained against the stated assumptions and external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard finite-population sampling theory and the rectangular parameter space assumption; no free parameters are fitted to data, and no new entities are postulated.

axioms (2)

domain assumption Outcomes y_i lie in known fixed intervals [a_i, b_i] forming a rectangular parameter space.
Invoked to define the worst-case squared error and derive the sharp lower bound.
domain assumption Sampling designs have positive inclusion probabilities π_i > 0.
Required for the Horvitz-Thompson estimator to be defined and unbiased.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: rectangular parameter space Θ = ∏[a_i,b_i]

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages

[1] Aggarwal, O. P. Ann. Math. Statist.
[2] Bickel, P. J. and Lehmann, E. L. Ann. Statist.
[3] Cassel, C. M. and S. Some results on generalized difference estimation and generalized regression estimation for finite populations. 1976.
[4] Cheng, C.-S. and Li, K.-C. Ann. Statist.
[5] Deville, J.-C. and S. Calibration estimators in survey sampling. 1992.
[6] Gabler, S.
[7] Godambe, V. P. J. Roy. Statist. Soc. B.
[8] Godambe, V. P. and Joshi, V. M. Ann. Math. Statist.
[9] Horvitz, D. G. and Thompson, D. J. J. Amer. Statist. Assoc.
[10] The best strategy for estimating the mean of a finite population. The Annals of Statistics, 1979.
[11] Minimax estimation in simple random sampling. Statistics and probability: essays in honor of C.R. Rao. North-Holland Publishing Company.
[12] A conditional minimax approach in survey sampling. Metrika, 1988.
[13] Asymptotic analysis of minimax strategies in survey sampling. The Annals of Statistics, 1989.
