pith. machine review for the scientific record.

arXiv: 2605.12760 · v1 · submitted 2026-05-12 · stat.ME · stat.AP

Recognition: 2 theorem links (Lean Theorem)

How long should a block be?


Pith reviewed 2026-05-14 19:44 UTC · model grok-4.3

classification: stat.ME, stat.AP
keywords: block maxima, extreme value analysis, block length, asymptotic relative efficiency, likelihood diagnostics, generalized extreme value distribution, environmental data, data censoring

The pith

Excessively long blocks reduce asymptotic relative efficiency in the block maxima method, and likelihood-based diagnostics can identify suitable lengths even with rounded or censored data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines the choice of block length in the block maximum method used in extreme value analysis, where the distribution of the maximum of m observations is approximated by a generalized extreme value (GEV) distribution. It shows that choosing blocks that are too long lowers asymptotic relative efficiency. It proposes likelihood-based approaches and graphical diagnostics for checking whether a given block length is suitable, and these tools allow for observations that are rounded or left-censored. The ideas are investigated in simulations and applied to wind speed, river flow, and rainfall data.
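The summary says the tools allow for rounded or left-censored observations. A standard way to build this into a likelihood, sketched here under our own assumptions rather than taken from the paper, is to replace the density of a value recorded to resolution `delta` by the probability of its rounding interval, and to let a maximum known only to lie below a censoring point c contribute log G(c). The function names and the use of `None` to mark a censored value are hypothetical.

```python
import math

def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function; xi == 0 is the Gumbel case."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower endpoint (xi > 0) or above the upper one (xi < 0)
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def loglik_rounded_censored(values, mu, sigma, xi, delta, censor_at=None):
    """Log-likelihood of block maxima recorded to resolution delta.

    Hypothetical encoding: an entry of None means "left-censored, known only to lie below censor_at".
    """
    total = 0.0
    for x in values:
        if x is None:
            # Left-censored maximum contributes log G(c)
            p = gev_cdf(censor_at, mu, sigma, xi)
        else:
            # Rounded maximum: the true value lies in [x - delta/2, x + delta/2)
            p = gev_cdf(x + delta / 2, mu, sigma, xi) - gev_cdf(x - delta / 2, mu, sigma, xi)
        if p <= 0.0:
            return -math.inf  # parameters incompatible with the data
        total += math.log(p)
    return total
```

Maximising this over (mu, sigma, xi) with a numerical optimiser gives a fit that respects the recording process; for small delta and no censoring it approximately equals the usual GEV log-likelihood plus n log(delta).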

Core claim

The authors establish that taking excessively long blocks reduces asymptotic relative efficiency in the block maxima method, and likelihood-based approaches together with graphical diagnostics can determine whether a proposed block length is suitable, allowing for rounding and left-censoring.
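A toy calculation makes the efficiency penalty concrete. It assumes a parent for which the GEV approximation is already exact at every block length: standard Gumbel observations, whose maxima over blocks of length m are Gumbel with location log m and scale 1. Longer blocks then bring no reduction in approximation error, and with the scale treated as known the maximum likelihood estimator of the parent location has variance close to 1/n = m/N for n blocks from N observations, so doubling m roughly halves the efficiency. This is an illustration under those assumptions, not the paper's asymptotic relative efficiency calculation; the function names are ours.

```python
import math
import random
import statistics

def block_maxima(xs, m):
    """Maxima of consecutive disjoint blocks of length m (an incomplete last block is dropped)."""
    return [max(xs[i * m:(i + 1) * m]) for i in range(len(xs) // m)]

def gumbel_location_mle(maxima):
    """MLE of a Gumbel location with scale fixed at 1: mu_hat = -log(mean(exp(-x)))."""
    return -math.log(statistics.fmean([math.exp(-x) for x in maxima]))

def location_estimator_variance(n_obs, m, reps, seed=1):
    """Monte Carlo variance of the parent-location estimate obtained with block length m."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # Standard Gumbel draws via -log(E) with E ~ Exp(1)
        xs = [-math.log(rng.expovariate(1.0)) for _ in range(n_obs)]
        # Maxima of m standard Gumbels are Gumbel(log m, 1), so subtract log m
        estimates.append(gumbel_location_mle(block_maxima(xs, m)) - math.log(m))
    return statistics.variance(estimates)

if __name__ == "__main__":
    v4 = location_estimator_variance(480, 4, reps=1000)
    v8 = location_estimator_variance(480, 8, reps=1000)
    print(f"variance ratio, m = 8 versus m = 4: {v8 / v4:.2f}")
```

The printed ratio should be near 2. For parents that are not max-stable, longer blocks also reduce approximation bias, which is the trade-off the paper's efficiency analysis addresses.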

What carries the argument

Block length m in the block maxima method, assessed through asymptotic relative efficiency calculations and checked with likelihood-based approaches and graphical diagnostics for the quality of the generalized extreme value approximation.
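The summary does not reproduce the paper's graphical checks. As a sketch of one standard check of this kind (our choice, not necessarily the paper's), a probability plot compares the plotting positions i/(n+1) of the sorted block maxima with the fitted distribution function evaluated at them; points far from the diagonal suggest a poor GEV approximation at that block length. A Gumbel fit is used for brevity, and all names are hypothetical.

```python
import math

def gumbel_cdf(x, mu, sigma):
    return math.exp(-math.exp(-(x - mu) / sigma))

def gumbel_quantile(p, mu, sigma):
    return mu - sigma * math.log(-math.log(p))

def probability_plot_points(maxima, cdf):
    """(empirical plotting position, fitted CDF) pairs for the sorted block maxima."""
    xs = sorted(maxima)
    n = len(xs)
    return [((i + 1) / (n + 1), cdf(x)) for i, x in enumerate(xs)]

def max_deviation(points):
    """Largest vertical distance from the diagonal: a crude numerical summary of the plot."""
    return max(abs(emp - fit) for emp, fit in points)

if __name__ == "__main__":
    fitted = lambda x: gumbel_cdf(x, 10.0, 2.0)
    # Maxima placed exactly at the fitted quantiles lie on the diagonal
    ideal = [gumbel_quantile(i / 21, 10.0, 2.0) for i in range(1, 21)]
    print(max_deviation(probability_plot_points(ideal, fitted)))
```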

Load-bearing premise

The data-generating process is close enough to the domain of attraction of a non-degenerate extreme-value limit that the block-maxima approximation remains useful once m is large enough.

What would settle it

A simulation or real dataset in which asymptotic relative efficiency does not decrease as block length increases, or in which the proposed diagnostics approve a clearly unsuitable block length.

Original abstract

The block maximum method, which is widely used in extreme value analysis, uses a generalized extreme value distribution to approximate that of the maximum of m observations. The quality of this approximation depends on the value of m and may be poor if m is too small. Surprisingly little attention has been paid to the choice of the block length, although a good choice is crucial to the success of the method. In this paper we assess the effect of taking excessively long blocks in terms of asymptotic relative efficiency, and propose likelihood-based approaches and graphical diagnostics to determine whether a proposed block length is suitable, allowing for potential rounding and left-censoring of observations. We investigate our ideas using simulation and illustrate them using wind speed, river flow and rainfall data.
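The abstract's point that the GEV approximation may be poor when m is too small can be checked exactly for one parent. If observations are standard exponential, the maximum of m of them, shifted by log m, has distribution function (1 - e^{-x}/m)^m, which tends to the Gumbel distribution function exp(-e^{-x}). The sketch below measures the largest gap on a grid; the choice of parent and grid is ours. For this parent the gap shrinks roughly in proportion to 1/m, whereas other parents, the normal being a classic example, converge much more slowly.

```python
import math

def exp_max_cdf(x, m):
    """P(max of m standard exponentials - log m <= x) = (1 - e^{-x}/m)^m, zero below -log m."""
    t = math.exp(-x)
    return (1.0 - t / m) ** m if t < m else 0.0

def gumbel_cdf(x):
    return math.exp(-math.exp(-x))

def sup_gap(m, lo=-3.0, hi=10.0, steps=1300):
    """Largest |exact - Gumbel| over an evenly spaced grid on [lo, hi]."""
    grid = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    return max(abs(exp_max_cdf(x, m) - gumbel_cdf(x)) for x in grid)

if __name__ == "__main__":
    for m in (1, 7, 30, 365):
        print(m, round(sup_gap(m), 5))
```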

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that choosing excessively long blocks in the block-maxima method reduces asymptotic relative efficiency relative to shorter blocks, and that likelihood-based tests together with graphical diagnostics can be used to assess whether a candidate block length yields an adequate GEV approximation, with extensions to handle rounding and left-censoring. These ideas are developed from first-principles extreme-value limits, evaluated in simulation, and illustrated on wind-speed, river-flow, and rainfall series.

Significance. If the diagnostics prove reliable, the work supplies a practical, model-based procedure for block-length selection that directly quantifies the efficiency penalty of over-long blocks and accommodates common data imperfections; this would be a useful addition to the EVT toolkit for practitioners who must choose m without strong prior knowledge of the parent distribution.

major comments (2)
  1. [Sections 3 and 4 (efficiency formulas and diagnostic derivations)] The asymptotic relative-efficiency derivations and the likelihood-based diagnostics both rest on the premise that block maxima converge to a non-degenerate GEV limit at a usable rate once m is large enough. The manuscript provides no analysis or simulation evidence for parent distributions whose convergence is slow (e.g., those with slowly-varying components or near-boundary regular variation), so the claimed efficiency loss and the diagnostic thresholds may not hold under the exact finite-sample conditions the paper targets.
  2. [Section 4 (likelihood-based approaches)] No theorem or formal argument is given establishing that the proposed likelihood-ratio or score tests control type-I error at the nominal level under the finite-m, finite-sample conditions of interest; the supporting evidence is entirely simulation-based and therefore does not guarantee the error-rate control asserted for practical use.
minor comments (2)
  1. [Section 2] Notation for the block length m and the number of blocks n is introduced without an explicit summary table; a small table listing all symbols and their meanings would improve readability.
  2. [Section 6] In the real-data illustrations the authors report the chosen block lengths but do not show the corresponding diagnostic plots for the final selected m; including these would allow readers to verify the decision process.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments. We address each major point below and indicate where the manuscript has been revised.

Point-by-point responses
  1. Referee: [Sections 3 and 4 (efficiency formulas and diagnostic derivations)] The asymptotic relative-efficiency derivations and the likelihood-based diagnostics both rest on the premise that block maxima converge to a non-degenerate GEV limit at a usable rate once m is large enough. The manuscript provides no analysis or simulation evidence for parent distributions whose convergence is slow (e.g., those with slowly-varying components or near-boundary regular variation), so the claimed efficiency loss and the diagnostic thresholds may not hold under the exact finite-sample conditions the paper targets.

    Authors: We thank the referee for this observation. The asymptotic relative-efficiency results in Section 3 are derived under the classical domain-of-attraction conditions that guarantee convergence to a non-degenerate GEV limit; they quantify the efficiency penalty incurred by choosing m larger than necessary once that limit is attained. The simulations in Section 5 already include distributions with different convergence rates (normal, exponential, Pareto with indices 1.5 and 3). For distributions exhibiting markedly slower convergence, the likelihood-based diagnostics and graphical checks developed in Section 4 are precisely intended to detect an inadequate GEV approximation, regardless of the underlying reason. In the revised manuscript we have added a short paragraph in Section 6 that explicitly cautions readers about slowly-varying components and near-boundary regular variation, and we have included one additional simulation example with a distribution known to converge slowly to illustrate that the diagnostics flag the problem. revision: partial

  2. Referee: [Section 4 (likelihood-based approaches)] No theorem or formal argument is given establishing that the proposed likelihood-ratio or score tests control type-I error at the nominal level under the finite-m, finite-sample conditions of interest; the supporting evidence is entirely simulation-based and therefore does not guarantee the error-rate control asserted for practical use.

    Authors: We agree that a finite-sample theorem establishing exact type-I error control under arbitrary parent distributions would be desirable but is technically difficult to obtain without imposing strong and unrealistic assumptions. The tests rely on the standard asymptotic chi-squared limit of the likelihood-ratio statistic as the number of blocks tends to infinity, which is the natural justification in this setting. Our simulation study (Section 5) is deliberately broad, covering multiple parent distributions, a range of block lengths, and varying numbers of blocks; the reported empirical type-I error rates remain close to the nominal 5% level once the number of blocks exceeds roughly 50. In the revision we have strengthened the wording in Section 4 to clarify the asymptotic basis, added a table summarizing empirical type-I error rates for smaller numbers of blocks, and noted the simulation-based nature of the finite-sample evidence. revision: partial
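The rebuttal relies on the chi-squared limit of the likelihood-ratio statistic and on simulated type-I error rates. The sketch below shows that calibration on a deliberately simple stand-in, not the paper's block-length test: a likelihood-ratio test of the location of Gumbel block maxima with unit scale, whose maximum likelihood estimate has a closed form. `chi2_sf` computes chi-squared upper-tail probabilities for integer degrees of freedom from the standard recurrence, and `empirical_size` estimates the null rejection rate for a given number of blocks; all names are ours.

```python
import math
import random

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-squared with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    # Start from df = 1 (erfc form) or df = 2 (exponential form) ...
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    # ... then step up with Q(k + 2) = Q(k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q

def gumbel_loglik(maxima, mu):
    """Log-likelihood of Gumbel(mu, 1) block maxima."""
    return sum(-(x - mu) - math.exp(-(x - mu)) for x in maxima)

def lr_test_location(maxima, mu0):
    """Likelihood-ratio test of H0: mu = mu0 for Gumbel(mu, 1) maxima, referred to chi-squared(1)."""
    mu_hat = -math.log(sum(math.exp(-x) for x in maxima) / len(maxima))  # closed-form MLE
    stat = 2.0 * (gumbel_loglik(maxima, mu_hat) - gumbel_loglik(maxima, mu0))
    return stat, chi2_sf(stat, 1)

def empirical_size(n_blocks, reps, alpha=0.05, seed=2):
    """Fraction of simulated null datasets rejected at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Null data: standard Gumbel maxima, drawn as -log(E) with E ~ Exp(1)
        maxima = [-math.log(rng.expovariate(1.0)) for _ in range(n_blocks)]
        _, p_value = lr_test_location(maxima, 0.0)
        rejections += p_value < alpha
    return rejections / reps

if __name__ == "__main__":
    print(f"empirical size with 50 blocks: {empirical_size(50, reps=2000):.3f}")
```

With 50 blocks the simulated rejection rate should fall near the nominal 5%. Tests of block length involve more parameters and would need a numerical optimiser, so their finite-sample size can differ.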

Circularity Check

0 steps flagged

No circularity: efficiency and diagnostics derived from standard EVT limits

Full rationale

The paper derives asymptotic relative efficiency for block maxima directly from the classical extreme-value limit theorems for the GEV approximation, without any reduction to fitted parameters or self-referential definitions. The proposed likelihood diagnostics and graphical checks compare a candidate block length against a finer partitioning under the same model family; this introduces only ordinary data dependence rather than forcing the conclusion by construction. No self-citation is load-bearing for the central claims, no ansatz is smuggled, and no known result is merely renamed. The derivation chain therefore remains independent of the paper's own outputs.
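The audit notes that the diagnostics compare a candidate block length with a finer partitioning under the same model family. One property that makes such comparisons possible is max-stability: if maxima over blocks of length m are GEV(mu, sigma, xi), then maxima over blocks of length km are GEV with the same shape xi, scale sigma k^xi and location mu + sigma (k^xi - 1)/xi, or mu + sigma log k when xi = 0. The sketch encodes that mapping and checks it against G(x)^k. How the paper builds its tests on this is not shown in the summary, and the function names are ours.

```python
import math

def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function; xi == 0 is the Gumbel case."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower endpoint (xi > 0) or above the upper one (xi < 0)
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def gev_params_for_k_blocks(mu, sigma, xi, k):
    """Parameters of the maximum of k independent GEV(mu, sigma, xi) variables (max-stability)."""
    if abs(xi) < 1e-12:
        return mu + sigma * math.log(k), sigma, 0.0
    return mu + sigma * (k ** xi - 1.0) / xi, sigma * k ** xi, xi

if __name__ == "__main__":
    # Hypothetical fit at block length m and the parameters it implies at block length 4m
    print(gev_params_for_k_blocks(20.0, 5.0, 0.1, 4))
```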

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claims rest on the standard extreme-value limit theorem for block maxima and on the usual regularity conditions for likelihood inference; no new entities are postulated and the only free parameters are the usual GEV shape, scale and location fitted to each blocking.

free parameters (1)
  • GEV parameters per blocking
    Shape, scale and location are estimated by maximum likelihood for each candidate block length; these are data-driven rather than fixed a priori.
axioms (2)
  • domain assumption: The underlying distribution belongs to the domain of attraction of a non-degenerate GEV limit
    Invoked in the efficiency calculations and in the interpretation of the diagnostics.
  • domain assumption: Maxima from different blocks are approximately independent
    Standard assumption for the block-maxima method; used throughout.
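For reference, the free parameters and the first axiom in this ledger take the following standard form (notation ours, with [y]_+ = max(y, 0)): the GEV distribution function with location mu, scale sigma > 0 and shape xi, and the domain-of-attraction condition for the maximum M_m of m observations, written here for the simplest case of independent, identically distributed observations with distribution function F.

```latex
% GEV distribution function: location \mu, scale \sigma > 0, shape \xi
G(x) = \exp\left\{ -\left[ 1 + \xi\,\frac{x - \mu}{\sigma} \right]_{+}^{-1/\xi} \right\},
\qquad
G(x) = \exp\left\{ -\exp\left( -\frac{x - \mu}{\sigma} \right) \right\} \quad \text{if } \xi = 0.

% Domain of attraction: M_m = \max(X_1, \dots, X_m) with X_i i.i.d. from F, and
% there exist constants a_m > 0 and b_m such that, for a non-degenerate G,
\Pr\left\{ \frac{M_m - b_m}{a_m} \le x \right\} = F^m(a_m x + b_m) \;\to\; G(x),
\qquad m \to \infty.
```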

pith-pipeline@v0.9.0 · 5414 in / 1377 out tokens · 44114 ms · 2026-05-14T19:44:57.765034+00:00 · methodology


Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · 1 internal anchor

  1. [1]

    Prescott and A

    P. Prescott and A. T. Walden , year = 1980, journal =

  2. [2]

    Mathematical Proceedings of the Cambridge Philosophical Society , volume = 24, number = 2, pages =

    Limiting forms of the frequency distribution of the largest or smallest member of a sample , author =. Mathematical Proceedings of the Cambridge Philosophical Society , volume = 24, number = 2, pages =

  3. [3]

    Revue math\'

    La distribution de la plus grande de n valeurs , author =. Revue math\'

  4. [4]

    Annals of Mathematics44(3), 423–453 (1943) https://doi.org/10.2307/1968974

    B. Gnedenko , year = 1943, journal =. Sur la distribution limite du terme maximum d'une s\'. doi:10.2307/1968974 , issn =

  5. [5]

    Bulletin of the American Mathematical Society , volume = 54, pages =

    Order statistics , author =. Bulletin of the American Mathematical Society , volume = 54, pages =. doi:10.1090/S0002-9904-1948-08936-4 , url =

  6. [6]

    Acta Mathematica Academiae Scientiarum Hungarica , volume = 4, number = 3, pages =

    On the theory of order statistics , author =. Acta Mathematica Academiae Scientiarum Hungarica , volume = 4, number = 3, pages =. doi:10.1007/BF02127580 , issn =

  7. [7]

    Quarterly Journal of the Royal Meteorological Society , volume = 81, pages =

    The frequency distribution of the annual maximum (or minimum) values of meteorological elements , author =. Quarterly Journal of the Royal Meteorological Society , volume = 81, pages =

  8. [8]

    doi:10.7312/gumb92958 , isbn = 9780231891318, url =

    Statistics of Extremes , author =. doi:10.7312/gumb92958 , isbn = 9780231891318, url =

  9. [9]

    Biometrika , volume = 48, number =

    Some methods of constructing exact tests , author =. Biometrika , volume = 48, number =. doi:10.1093/biomet/48.1-2.41 , issn =

  10. [10]

    Water Resources Research , volume = 6, pages =

    A stochastic model for flood analysis , author =. Water Resources Research , volume = 6, pages =. doi:10.1029/WR006i006p01641 , url =

  11. [11]

    Water Resources Research , volume = 7, pages =

    Some problems of flood analysis , author =. Water Resources Research , volume = 7, pages =. doi:10.1029/WR007i005p01144 , url =

  12. [12]

    Annals of Probability , volume = 2, pages =

    Residual life time at great age , author =. Annals of Probability , volume = 2, pages =. doi:10.1214/aop/1176996548 , url =

  13. [13]

    Zeitschrift f

    On extreme values in stationary sequences , author =. Zeitschrift f. doi:10.1007/BF00532947 , issn =

  14. [14]

    Annals of Statistics , volume = 3, pages =

    A simple general approach to inference about the tail of a distribution , author =. Annals of Statistics , volume = 3, pages =

  15. [15]

    The Annals of Statistics , publisher =

    Statistical inference using extreme order statistics , author =. The Annals of Statistics , publisher =

  16. [16]

    The exact distribution of extremes of a non-

    L. The exact distribution of extremes of a non-. Stochastic Processes and their Applications , volume = 5, number = 2, pages =. doi:10.1016/0304-4149(77)90026-6 , issn =

  17. [17]

    Journal of Applied Probability , volume = 17, number = 4, pages =

    An exponential. Journal of Applied Probability , volume = 17, number = 4, pages =. doi:10.2307/3213224 , url =

  18. [18]

    Studies in Econometrics, Time Series, and Multivariate Statistics , publisher =

    Maximum likelihood estimation in a latent variable problem , author =. Studies in Econometrics, Time Series, and Multivariate Statistics , publisher =. doi:10.1016/B978-0-12-398750-1.50008-5 , isbn =

  19. [19]

    Zeitschrift f

    Extremes and local dependence in stationary sequences , author =. Zeitschrift f. doi:10.1007/BF00532484 , issn =

  20. [20]

    doi:10.1007/978-1-4612-5449-2 , isbn =

    Extremes and Related Properties of Random Sequences and Processes , author =. doi:10.1007/978-1-4612-5449-2 , isbn =

  21. [21]

    Journal of Multivariate Analysis , volume = 13, number = 2, pages =

    Point processes and multivariate extreme values , author =. Journal of Multivariate Analysis , volume = 13, number = 2, pages =. doi:10.1016/0047-259X(83)90025-8 , issn =

  22. [22]

    Simulation of

    Chiaw-Hock Sim , year = 1986, journal =. Simulation of. doi:10.1080/03610918608812565 , url =

  23. [23]

    Approximations in extreme value theory , author =

  24. [24]

    and Resnick, Sidney I

    Davis, Richard A. and Resnick, Sidney I. , year = 1989, journal =. Basic properties and prediction of max-

  25. [25]

    Journal of the Royal Statistical Society

    Models for exceedances over high thresholds (with Discussion) , author =. Journal of the Royal Statistical Society. Series B. (Methodological) , volume = 52, number = 3, pages =. doi:10.1111/j.2517-6161.1990.tb01796.x , issn =

  26. [26]

    Journal of Applied Probability , volume = 28, number = 1, pages =

    Minification processes and their transformations , author =. Journal of Applied Probability , volume = 28, number = 1, pages =

  27. [27]

    Newton and Charles J

    Michael A. Newton and Charles J. Geyer , year = 1994, journal =. Bootstrap Recycling: A. doi:10.1080/01621459.1994.10476823 , url =

  28. [28]

    Journal of Computational and Graphical Statistics , publisher =

    Randomized Quantile Residuals , author =. Journal of Computational and Graphical Statistics , publisher =. doi:10.1080/10618600.1996.10474708 , url =

  29. [29]

    doi:10.1017/CBO9780511802843 , isbn =

    Bootstrap Methods and Their Application , author =. doi:10.1017/CBO9780511802843 , isbn =

  30. [30]

    Coles, Stuart , year = 2001, publisher =. An. doi:10.1007/978-1-4471-3675-0 , isbn =

  31. [31]

    Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences , volume = 360, number = 1796, pages =

    Floods: some probabilistic and statistical approaches , author =. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences , volume = 360, number = 1796, pages =. doi:10.1098/rsta.2002.1006 , issn =

  32. [32]

    Journal of Statistical Planning and Inference , volume = 103, number = 1, pages =

    Moving-maximum models for extrema of time series , author =. Journal of Statistical Planning and Inference , volume = 103, number = 1, pages =. doi:10.1016/S0378-3758(01)00197-5 , issn =

  33. [33]

    Statistical Models , author =

  34. [34]

    Statistics of Extremes: Theory and Applications , author =

  35. [35]

    Biometrika , volume = 94, number = 1, pages =

    Inference for clustered data using the independence loglikelihood , author =. Biometrika , volume = 94, number = 1, pages =. doi:10.1093/biomet/asm015 , issn =

  36. [36]

    and Peruccacci, S

    Guzzetti, F. and Peruccacci, S. and Rossi, M. and Stark, C. P. , year = 2007, journal =. Rainfall thresholds for the initiation of landslides in central and southern. doi:10.1007/s00703-007-0262-7 , issn =

  37. [37]

    Computational Statistics & Data Analysis , volume = 51, number = 7, pages =

    Improving the reliability of bootstrap tests with the fast double bootstrap , author =. Computational Statistics & Data Analysis , volume = 51, number = 7, pages =. doi:10.1016/j.csda.2006.04.001 , issn =

  38. [38]

    Extremes , volume = 10, number = 1, pages =

    Likelihood estimation of the extremal index , author =. Extremes , volume = 10, number = 1, pages =. doi:10.1007/s10687-007-0034-2 , issn =

  39. [39]

    and Callaghan, Terry V

    Jonasson, Christer and Sonesson, Mats and Christensen, Torben R. and Callaghan, Terry V. , year = 2012, day =. Environmental Monitoring and Research in the. AMBIO , volume = 41, number = 3, pages =. doi:10.1007/s13280-012-0301-6 , issn =

  40. [40]

    Extremes , volume = 18, number = 4, pages =

    An efficient semiparametric maxima estimator of the extremal index , author =. Extremes , volume = 18, number = 4, pages =. doi:10.1007/s10687-015-0221-5 , issn =

  41. [41]

    Extreme Value Modeling and Risk Analysis , location =

    Time series of extremes , author =. Extreme Value Modeling and Risk Analysis , location =. doi:10.1201/b19721 , editor =

  42. [42]

    A method of selecting the block size of

    Wang, Jixin and You, Shuang and Wu, Yuqian and Zhang, Yingshuang and Bin, Shibo , year = 2016, journal =. A method of selecting the block size of. doi:10.1155/2016/6372197 , url =

  43. [43]

    Econometric Reviews , publisher =

    Diagnostics for the bootstrap and fast double bootstrap , author =. Econometric Reviews , publisher =. doi:10.1080/07474938.2017.1307918 , url =

  44. [44]

    The Annals of Statistics , publisher =

    Weak convergence of a pseudo maximum likelihood estimator for the extremal index , author =. The Annals of Statistics , publisher =. doi:10.1214/17-aos1621 , url =

  45. [45]

    Journal of the Royal Statistical Society: Series B (Methodological) , volume = 27, number = 3, pages =

    Spacings , author =. Journal of the Royal Statistical Society: Series B (Methodological) , volume = 27, number = 3, pages =. doi:10.1111/j.2517-6161.1965.tb00602.x , issn =

  46. [46]

    Peaks over thresholds modeling with multivariate generalized

    Anna Kiriliouk and Holger Rootz\'. Peaks over thresholds modeling with multivariate generalized. Technometrics , publisher =. doi:10.1080/00401706.2018.1462738 , url =

  47. [47]

    Technical gazette , volume = 26, number = 5, pages =

    A new methodology for the block maxima approach in selecting the optimal block size , author =. Technical gazette , volume = 26, number = 5, pages =

  48. [48]

    An automatic procedure to select a block size in the continuous generalized extreme value model estimation , author =

  49. [49]

    Statistical Science , publisher =

    A horse race between the block maxima method and the peak-over-threshold approach , author =. Statistical Science , publisher =. doi:10.1214/20-sts795 , url =

  50. [50]

    Inference for extreme earthquake magnitudes accounting for a time-varying measurement process , author =

  51. [51]

    The Annals of Applied Statistics , publisher =

    Improved inference on risk measures for univariate extremes , author =. The Annals of Applied Statistics , publisher =. doi:10.1214/21-aoas1555 , url =

  52. [52]

    Statistics and Computing , volume = 32, number = 2, pages = 32, doi =

    Graphical test for discrete uniformity and its applications in goodness-of-fit evaluation and multiple sample comparison , author =. Statistics and Computing , volume = 32, number = 2, pages = 32, doi =

  53. [53]

    Journal of Statistical Software103(3), 1–26 (2022) https://doi.org/10.18637/jss.v103.i03

    Youngman, Benjamin D. , year = 2022, journal =. doi:10.18637/jss.v103.i03 , url =

  54. [54]

    The Annals of Statistics , publisher =

    On the disjoint and sliding block maxima method for piecewise stationary time series , author =. The Annals of Statistics , publisher =. doi:10.1214/23-aos2260 , url =

  55. [55]

    Journal of the Royal Statistical Society Series B: Statistical Methodology , volume = 88, number = 2, pages =

    Bootstrapping estimators based on the block maxima method , author =. Journal of the Royal Statistical Society Series B: Statistical Methodology , volume = 88, number = 2, pages =. doi:10.1093/jrsssb/qkaf060 , issn =

  56. [56]

    Choosing the threshold in extreme value analysis , author =

  57. [57]

    Environmetrics , volume = 37, number = 2, pages =

    Accounting for missing data when modelling block maxima , author =. Environmetrics , volume = 37, number = 2, pages =. doi:10.1002/env.70075 , url =

  58. [58]
  59. [59]

    doi:10.32614/R.manuals , url =

    R: A Language and Environment for Statistical Computing , author =. doi:10.32614/R.manuals , url =

  60. [60]

    Bayesian Mixture Models for Heterogeneous Extremes

    Bayesian Mixture Models for Heterogeneous Extremes , author =. doi:10.48550/arXiv.2509.15359 , url =

  61. [61]

    Canadian Journal of Statistics , volume = 50, number = 4, pages =

    Let's practice what we preach: Planning and interpreting simulation studies with design and analysis of experiments , author =. Canadian Journal of Statistics , volume = 50, number = 4, pages =. doi:10.1002/cjs.11719 , url =