arXiv: 2605.01615 · v1 · submitted 2026-05-02 · stat.ME · stat.AP · stat.OT


Threshold Exceedance Estimation in Spatially Correlated Areal Data Using Maxima-Nominated Sampling

Mohammad Jafari Jozani


Pith reviewed 2026-05-09 17:42 UTC · model grok-4.3


keywords: exceedance probability, maxima-nominated sampling, spatial correlation, areal data, sampling design, threshold estimation, DUST-MNS, health surveillance


The pith

A maxima-nominated sampling design produces a closed-form estimator for the fraction of high-risk areas that has lower variance than standard sampling when the fraction is low to moderate.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a sampling strategy for estimating the share of areal units whose risk levels exceed a fixed threshold, a task common in health and environmental monitoring. It creates small groups of candidate areas, ranks them cheaply by judgment or covariates, and collects full measurements only on the highest-ranked area in each group. The design also spreads the measured units across space to lessen the impact of spatial autocorrelation. The author obtains exact expressions for the bias and variance of the resulting estimator and proves that it is more efficient than simple random sampling or a related sequential design whenever the true proportion lies below a cutoff that depends only on group size.

Core claim

The DUST-MNS design forms n candidate sets of size k and obtains final measurements only from the area judged highest risk in each set, yielding n measured areas from nk screened candidates. A closed-form estimator of the exceedance probability θ is derived from these measurements, together with its exact bias and variance. In the regime θ < θ⋆(k), where θ⋆(k) depends only on k, the DUST-MNS estimator has smaller variance than the estimators from simple random sampling and from DUST-SRS. The paper supplies guidance on choosing k, efficiency bounds under a Beta model, extensions to imperfect ranking, variance estimators, and bootstrap confidence intervals.
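The inversion behind a maxima-nominated estimator can be illustrated in the simplest possible setting. The sketch below assumes independent units and perfect ranking, so the measured maximum of a set exceeds the threshold exactly when at least one of its k candidates does, giving P(max exceeds) = 1 − (1 − θ)^k. The names `mns_estimate` and `compare_mse` are hypothetical, the spatial pps-DUST component is ignored entirely, and the paper's own estimator may differ in form.

```python
import random


def mns_estimate(max_exceeds, k):
    """Invert q = 1 - (1 - theta)^k, where q is the share of measured maxima above the threshold."""
    q_hat = sum(max_exceeds) / len(max_exceeds)
    return 1.0 - (1.0 - q_hat) ** (1.0 / k)


def compare_mse(theta, n, k, reps=2000, seed=0):
    """Monte Carlo MSE of the MNS inversion estimator and of the SRS sample proportion.

    Assumes independent units and perfect ranking (illustrative only).
    Both designs take n full measurements.
    """
    rng = random.Random(seed)
    se_mns = 0.0
    se_srs = 0.0
    for _ in range(reps):
        # Each measured unit is the maximum of k candidates: it exceeds
        # the threshold iff at least one candidate exceeds it.
        maxima = [any(rng.random() < theta for _ in range(k)) for _ in range(n)]
        srs = [rng.random() < theta for _ in range(n)]
        se_mns += (mns_estimate(maxima, k) - theta) ** 2
        se_srs += (sum(srs) / n - theta) ** 2
    return se_mns / reps, se_srs / reps


if __name__ == "__main__":
    mse_mns, mse_srs = compare_mse(theta=0.10, n=20, k=3)
    print(f"MNS MSE: {mse_mns:.5f}  SRS MSE: {mse_srs:.5f}")
```

Raising `theta` toward 1 in this toy comparison is a quick way to see the advantage of the maxima-based estimator erode, which is the qualitative behavior the regime condition describes.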

What carries the argument

The DUST-MNS design forms n sets of k candidates, ranks the candidates in each set by auxiliary information or judgment, and measures only the maximum in each set. It thereby combines maxima-nominated sampling with pps-DUST to control spatial spread and autocorrelation.
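How a sequential draw can encourage spatial spread is easier to see in code. The following is a minimal sketch of a distance-penalized, size-proportional selection in the spirit of DUST. The update rule (multiplying each weight by 1 − η^d after every pick), the function name `dust_like_select`, and the toy lattice are all assumptions made for illustration; the exact pps-DUST rule used in the paper is not given on this page and may differ.

```python
import math
import random


def dust_like_select(coords, sizes, n, eta, rng):
    """Sequentially pick n distinct units, down-weighting units near earlier picks.

    Assumed update (illustrative): after picking unit i, multiply each weight
    w_j by (1 - eta ** d_ij), where d_ij is Euclidean distance and 0 <= eta < 1.
    The picked unit has d_ii = 0, so its own weight drops to zero.
    """
    weights = [float(s) for s in sizes]  # initial draw is proportional to size
    chosen = []
    for _ in range(n):
        i = rng.choices(range(len(coords)), weights=weights, k=1)[0]
        chosen.append(i)
        xi, yi = coords[i]
        for j, (xj, yj) in enumerate(coords):
            weights[j] *= 1.0 - eta ** math.hypot(xj - xi, yj - yi)
    return chosen


if __name__ == "__main__":
    grid = [(x, y) for x in range(5) for y in range(5)]  # toy 5x5 lattice of areas
    picks = dust_like_select(grid, [1.0] * len(grid), n=6, eta=0.3, rng=random.Random(1))
    print(sorted(picks))
```

With `eta = 0` the penalty vanishes for every unit except the one just picked, so the sketch reduces to size-proportional sampling without replacement.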

If this is right

The estimator admits closed-form expressions for bias and variance that do not require simulation.
Efficiency gains hold for all θ below a cutoff θ⋆(k) that is determined solely by the set size k.
The method extends directly to imperfect ranking and supplies bootstrap confidence intervals.
Efficiency bounds are available under a Beta model for the underlying success probabilities.
Guidance is given for selecting the set size k to balance screening cost against variance reduction.
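Why a cutoff depending only on k can arise is visible in a first-order calculation. Under simplifying assumptions (independent units, perfect ranking, and a large-n delta-method approximation), the inversion estimator 1 − (1 − q̂)^{1/k} has approximate variance (1 − (1−θ)^k)(1−θ)^{2−k}/(k²n). Comparing this with the SRS variance θ(1−θ)/n reduces to the condition Σ_{m=0}^{k−1} (1−θ)^{−m} < k², in which n cancels. This is an assumed approximation for illustration, not the paper's exact θ⋆(k); `approx_cutoff` is a hypothetical helper that solves the boundary by bisection.

```python
def approx_cutoff(k, tol=1e-12):
    """Solve sum_{m=0}^{k-1} (1 - theta)^(-m) = k^2 for theta by bisection.

    Delta-method, iid, perfect-ranking approximation (illustrative only).
    """
    if k < 2:
        raise ValueError("k must be at least 2")

    def gap(theta):
        u = 1.0 - theta
        return sum(u ** (-m) for m in range(k)) - k * k

    # gap(0) = k - k^2 < 0 and gap(0.9) > 0 for k >= 2; gap is increasing in theta.
    lo, hi = 0.0, 0.9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for k in (2, 3, 4, 5):
        print(k, round(approx_cutoff(k), 4))
```

Under these assumptions the boundary for k = 2 is exactly 2/3, since 1 + 1/(1 − θ) = 4 gives θ = 2/3, and the approximate cutoff decreases across k = 2, 3, 4, 5.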

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same grouping-and-maxima logic could be tested on non-areal spatial data such as point-referenced pollution readings or gridded satellite imagery.
Sequential updating of the ranking information across multiple survey waves might further reduce the number of full measurements needed.
Simulations that vary the strength of spatial correlation beyond the Beta model would show whether the reported efficiency threshold θ⋆(k) remains stable.

Load-bearing premise

Areas within each candidate set can be ranked by risk with sufficient accuracy using only expert judgment, prior surveys, or cheap auxiliary covariates.
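The sensitivity to ranking quality can be explored with a small simulation. This sketch assumes a bivariate normal pair (concomitant x, latent risk y) with correlation `rho`, ranks each set by x, measures y on the top-ranked unit, and applies the naive inversion 1 − (1 − q̂)^{1/k} that is only valid under perfect ranking. The normal model, the function name `mean_mns_estimate`, and all parameter values are assumptions chosen for illustration and are not taken from the paper.

```python
import random
from statistics import NormalDist


def mean_mns_estimate(rho, theta=0.2, n=50, k=3, reps=400, seed=1):
    """Average inversion estimate when sets are ranked by a concomitant x with corr(x, y) = rho.

    y is the latent risk (standard normal here, an assumption); a unit exceeds
    the threshold when y > c, with c chosen so that P(y > c) = theta.
    """
    rng = random.Random(seed)
    c = NormalDist().inv_cdf(1.0 - theta)
    noise_sd = (1.0 - rho * rho) ** 0.5
    total = 0.0
    for _rep in range(reps):
        hits = 0
        for _set in range(n):
            best_x, best_y = None, None
            for _unit in range(k):
                x = rng.gauss(0.0, 1.0)
                y = rho * x + noise_sd * rng.gauss(0.0, 1.0)
                if best_x is None or x > best_x:
                    best_x, best_y = x, y  # nominate the unit with the largest x
            hits += best_y > c
        q_hat = hits / n
        total += 1.0 - (1.0 - q_hat) ** (1.0 / k)
    return total / reps


if __name__ == "__main__":
    for rho in (1.0, 0.7, 0.3):
        print(rho, round(mean_mns_estimate(rho), 3))
```

In this toy model the uncorrected inversion underestimates θ as `rho` falls, because the nominated unit is less often the true maximum. That is the kind of bias an imperfect-ranking extension has to account for.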

What would settle it

Apply the DUST-MNS estimator and an SRS estimator to the same fixed spatial population whose true exceedance proportion θ is known to lie above θ⋆(k) and compare their empirical mean squared errors.

Figures

Figures reproduced from arXiv: 2605.01615 by Mohammad Jafari Jozani.

**Figure 1.** Left: county-level stroke prevalence (age-adjusted, adults …

**Figure 2.** Left: empirical distribution of county-level stroke prevalence. The dashed vertical line marks …

**Figure 3.** Theoretical efficiency at the empirical parameter values. Left: working-model relative efficiency …

**Figure 4.** Comparison of Monte Carlo MSE values ($\times 10^4$) for the county stroke application under the two nonzero DUST autocorrelation settings, $\eta_0 = 0.15$ and $\eta_0 = 0.30$, two sample sizes $n \in \{10, 20\}$, and set sizes $k \in \{2, 3, 4, 5\}$ (for the MNS design). Within each panel, bars compare DUST-SRS, imperfect DUST-MNS, perfect DUST-MNS, and SRS.

Abstract

We study estimation of the proportion of areal units in a spatially correlated domain whose success probabilities exceed a prespecified threshold. Such problems arise in health surveillance, environmental monitoring, and social policy, where the goal is to estimate the fraction of high-risk areas. We propose a DUST-MNS design that combines maxima-nominated sampling (MNS) with the probability-proportional-to-size dependent unit sequential technique (pps-DUST), thereby promoting spatial spread while mitigating the effect of spatial autocorrelation. The design forms $n$ candidate sets of size $k$ and obtains final measurements only from the area judged to be at highest risk in each set, yielding $n$ measured areas from $nk$ screened candidates. Ranking may be based on expert judgment, prior surveys, or easily obtained auxiliary covariates. We derive a closed-form estimator of the exceedance probability $\theta$ based on data from DUST-MNS design, establish its bias and variance, and show that, in the rare-to-moderate exceedance regime $\theta<\theta^\star(k)$, the proposed DUST-MNS estimator outperforms its SRS and DUST-SRS counterparts, where $\theta^\star(k)$ depends only on $k$. We also provide guidance on the choice of $k$, derive efficiency bounds under a Beta model, extend the method to imperfect ranking, and develop variance estimation and bootstrap confidence intervals. An application to county-level stroke prevalence data from CDC PLACES, using diabetes prevalence as the ranking concomitant, illustrates the proposed approach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a closed-form estimator for exceedance probabilities under a new DUST-MNS design that beats SRS and DUST-SRS when θ is below a k-only cutoff, with bias/variance derivations and a CDC data example.


The main takeaway is that this work combines maxima-nominated sampling with pps-DUST to create a design that screens nk candidates but measures only n of them: the highest-ranked area in each set of k. From that they derive an explicit estimator for the exceedance probability θ, its exact bias and variance, and a clean dominance result: the new estimator has lower variance than both SRS and DUST-SRS whenever θ falls below θ⋆(k), a threshold that depends only on k. They also give guidance on picking k, extend the results to imperfect ranking, supply bootstrap intervals, and illustrate everything on county-level stroke prevalence data using diabetes prevalence as the ranking covariate.

The derivations are self-contained, and the efficiency comparisons are stated with the regime condition made explicit rather than hidden. The Beta-model bound and the real-data application are useful additions.

The soft spots are straightforward. Gains require decent ranking quality within each set of k; when the auxiliary covariate or expert judgment is weak the advantage shrinks, though they do quantify the imperfect-ranking case. The outperformance is limited to the rare-to-moderate regime, so the method is not a universal replacement. Spatial spread is handled by the design probabilities rather than a full correlation model, which keeps things simple but leaves strong dependence to be addressed elsewhere.

This is aimed at survey statisticians and spatial analysts who work on threshold estimation in health or environmental areal data. The math is internally consistent and the claims are testable, so it deserves a serious referee who can verify the variance formulas and perhaps request additional simulations under varying correlation strengths.

Referee Report

2 major / 3 minor

Summary. The manuscript proposes the DUST-MNS design, which combines maxima-nominated sampling of size-k candidate sets with pps-DUST to promote spatial spread while reducing the impact of spatial autocorrelation. From the n measured units (the maxima within each set), it derives a closed-form estimator of the exceedance probability θ, supplies exact bias and variance formulas, proves that this estimator dominates SRS and DUST-SRS when θ < θ⋆(k) for a threshold depending only on k, gives guidance on selecting k, extends the results to imperfect ranking, develops variance estimators and bootstrap intervals, and demonstrates the method on CDC PLACES county-level stroke prevalence data ranked by diabetes prevalence.

Significance. If the closed-form derivations and regime-specific dominance hold, the paper supplies a theoretically transparent and practically implementable sampling strategy for estimating rare-to-moderate exceedance proportions in spatially dependent areal data. The explicit bias/variance expressions, the k-only dependence of the efficiency threshold, the Beta-model bounds, and the real-data illustration are strengths that would make the contribution useful for health surveillance and environmental monitoring applications.

major comments (2)

[§3.2] The efficiency comparison: the statement that DUST-MNS outperforms DUST-SRS for θ < θ⋆(k) relies on the exact variance expressions derived under the pps-DUST inclusion probabilities; the manuscript should verify that the spatial-spread mechanism does not alter the ordering of the variances when the ranking is imperfect, as this is load-bearing for the practical recommendation.
[§4.1] Eq. (12): the closed-form estimator is presented as unbiased under perfect ranking, but the subsequent bias formula under imperfect ranking (Eq. (15)) introduces a multiplicative factor that depends on the ranking accuracy parameter; the paper should state whether this factor is estimated from data or treated as known, because it affects the claimed closed-form property.

minor comments (3)

[§2.1] The notation for the candidate-set size k and the number of sets n is introduced in the abstract but first defined formally only in §2.1; a single early definition paragraph would improve readability.
[Figure 3] The efficiency curves use a log scale on the θ axis without labeling the break points at θ⋆(k) for k = 2, 3, 4; adding vertical reference lines at these values would make the regime condition visually immediate.
[§5.3] The bootstrap CI procedure in §5.3 is described algorithmically but lacks a small simulation check against the exact variance formula; a brief table comparing coverage would strengthen the implementation guidance.
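A coverage check of the kind requested could start from a sketch like the following. It is a naive nonparametric percentile bootstrap that resamples the n measured-maximum indicators with replacement and recomputes the inversion estimate 1 − (1 − q̂)^{1/k}. The paper's own bootstrap procedure is not described on this page, so the resampling scheme, the function names, and the toy data are assumptions; in particular, resampling sets independently ignores any spatial dependence induced by the design.

```python
import random


def mns_estimate(max_exceeds, set_size):
    """Inversion estimate under perfect ranking: 1 - (1 - q_hat)^(1/k)."""
    q_hat = sum(max_exceeds) / len(max_exceeds)
    return 1.0 - (1.0 - q_hat) ** (1.0 / set_size)


def percentile_bootstrap_ci(max_exceeds, set_size, level=0.95, n_boot=2000, seed=0):
    """Percentile interval from resampling the measured maxima with replacement."""
    rng = random.Random(seed)
    n = len(max_exceeds)
    stats = sorted(
        mns_estimate(rng.choices(max_exceeds, k=n), set_size) for _ in range(n_boot)
    )
    alpha = 1.0 - level
    lo_idx = int((alpha / 2) * (n_boot - 1))
    hi_idx = int((1 - alpha / 2) * (n_boot - 1))
    return stats[lo_idx], stats[hi_idx]


if __name__ == "__main__":
    # Toy data: 12 of 40 measured maxima exceed the threshold, with sets of size 3.
    data = [True] * 12 + [False] * 28
    print(mns_estimate(data, 3), percentile_bootstrap_ci(data, 3))
```

Wrapping this in an outer simulation loop with a known θ and counting how often the interval contains it would give the empirical coverage table the comment asks for.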

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and the recommendation for minor revision. We address each major comment below with clarifications and planned revisions.


Referee: [§3.2] The efficiency comparison: the statement that DUST-MNS outperforms DUST-SRS for θ < θ⋆(k) relies on the exact variance expressions derived under the pps-DUST inclusion probabilities; the manuscript should verify that the spatial-spread mechanism does not alter the ordering of the variances when the ranking is imperfect, as this is load-bearing for the practical recommendation.

Authors: We agree that verifying the variance ordering under imperfect ranking is necessary to support the practical recommendations. The dominance result and exact variance formulas in §3.2 are derived under perfect ranking. Section 4.1 extends the bias and variance expressions to imperfect ranking by incorporating a ranking accuracy parameter that multiplicatively adjusts the formulas. However, we did not re-derive or numerically confirm that the pps-DUST spatial-spread mechanism preserves the dominance ordering (DUST-MNS over DUST-SRS for θ < θ⋆(k)) under this extension. We will add a short analytical argument or targeted simulation study in the revised §3.2 or §4 showing that the relative variance ordering is maintained, as the inclusion probabilities affect both estimators in a comparable way. This addition will be included in the next version.

revision: yes

Referee: [§4.1] Eq. (12): the closed-form estimator is presented as unbiased under perfect ranking, but the subsequent bias formula under imperfect ranking (Eq. (15)) introduces a multiplicative factor that depends on the ranking accuracy parameter; the paper should state whether this factor is estimated from data or treated as known, because it affects the claimed closed-form property.

Authors: The estimator in Eq. (12) is closed-form and can be computed directly from the observed maxima without reference to the ranking accuracy parameter. The multiplicative factor appearing in the bias expression (Eq. (15)) is a theoretical modeling device used solely to characterize bias under imperfect ranking; it is treated as a known constant in the analysis, following the standard approach in the ranked-set and nominated sampling literature. This parameter is not estimated from the primary sample and is not required to compute the estimator, its variance estimator, or the bootstrap intervals. We will insert a clarifying sentence in §4.1 to make this distinction explicit, thereby preserving the closed-form claim for the estimator while addressing the referee's concern.

revision: yes

Circularity Check

0 steps flagged

No significant circularity


The paper derives a closed-form estimator for the exceedance probability θ directly from the DUST-MNS sampling design probabilities, along with explicit bias and variance formulas. The regime-specific dominance result is obtained by comparing these expressions to those for SRS and DUST-SRS, with the threshold θ⋆(k) depending only on the design parameter k. All steps are presented as mathematical consequences of the design construction and ranking mechanism without reducing to fitted parameters, self-referential definitions, or load-bearing self-citations. The argument is internally self-contained.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The approach rests on domain assumptions about ranking quality and spatial correlation structure, plus user choice of set size k; no new entities are postulated.

free parameters (1)

k
Candidate set size chosen by the user; determines the regime threshold θ⋆(k) and efficiency bounds under the Beta model.

axioms (2)

domain assumption: Ranking within candidate sets can be performed reliably using expert judgment, prior surveys, or auxiliary covariates.
Invoked to justify measuring only the judged maximum in each set of k areas.
domain assumption: The exceedance probability θ lies in the rare-to-moderate regime θ < θ⋆(k).
Required for the outperformance claim over SRS and DUST-SRS.


