pith. sign in

arxiv: 2606.11421 · v1 · pith:MVFIYOPFnew · submitted 2026-06-09 · 📊 stat.ME · math.ST· stat.CO· stat.TH

Second-Order Least Squares as a Special Case of the Polynomial Maximization Method

Pith reviewed 2026-06-27 12:09 UTC · model grok-4.3

classification 📊 stat.ME math.STstat.COstat.TH
keywords second-order least squarespolynomial maximization methodlinear regressionestimating equationsasymptotic variancenon-Gaussian errorsconditional homoskedasticity
0
0 comments X

The pith

Optimally weighted second-order least squares equals degree-two polynomial maximization for linear regression under conditional homoskedasticity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that for linear regression with errors having constant conditional variance but non-Gaussian distribution, optimally weighted second-order least squares and the degree-two generalized polynomial maximization method produce the identical population estimating equation. Both select the same optimal linear combination of the first two centered residual moments, solve the same normal equation, share the same influence function, and achieve the asymptotic variance c2 g2 / N with g2 depending on error skewness and excess kurtosis. Feasible plug-in versions of the two methods are therefore first-order equivalent. The equivalence is sharp and fails under heteroskedasticity, where the methods separate and PMM can lose consistency for asymmetric errors.

Core claim

Optimally weighted second-order least squares (SLS) and the degree-two generalized polynomial maximization method (PMM) are the same population estimating equation for linear regression with conditionally homoskedastic non-Gaussian errors: they choose the same optimal linear combination of the first two centered residual moments, solve one population normal system, share one influence function, and attain the common asymptotic variance c2 g2 / N -- the ordinary-least-squares slope-variance factor c2 scaled by the PMM variance-reduction coefficient g2=1-γ3²/(2+γ4).

What carries the argument

The shared population estimating equation formed by the optimal linear combination of the first two centered residual moments under conditional homoskedasticity.

If this is right

  • Feasible plug-in implementations of optimally weighted SLS and degree-two PMM are first-order equivalent.
  • Under heteroskedasticity the unconditional PMM and conditional SLS weighting separate, costing efficiency for symmetric errors and consistency for asymmetric errors.
  • Beyond degree two, PMM holds an efficiency reserve unreachable by SLS within its second-moment span.
  • For symmetric platykurtic errors SLS collapses to ordinary least squares for the slope while degree-three PMM exploits kurtosis information outside the SLS moment span.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Researchers applying either method to gain efficiency with non-normal errors are using equivalent procedures at the population level when conditional homoskedasticity holds.
  • The separation under heteroskedasticity suggests that tests for conditional variance constancy may be needed before relying on either estimator for efficiency gains.
  • Higher-degree PMM versions could be examined for further efficiency gains in settings where SLS remains restricted to second moments.

Load-bearing premise

The regression errors have constant conditional variance given the predictors.

What would settle it

An empirical or simulated dataset with heteroskedastic errors where the two methods produce statistically different slope estimates or where one is consistent and the other is not.

Figures

Figures reproduced from arXiv: 2606.11421 by Serhii Zabolotnii.

Figure 1
Figure 1. Figure 1: PMM2 and SLS become empirically indistinguishable as [PITH_FULL_IMAGE:figures/full_fig_p019_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: PMM3 captures a degree-3 efficiency reserve that SLS and GMM miss under symmetric [PITH_FULL_IMAGE:figures/full_fig_p020_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Control. Under Gaussian errors PMM2 and SLS incur no asymptotic efficiency loss [PITH_FULL_IMAGE:figures/full_fig_p021_3.png] view at source ↗
read the original abstract

We prove that optimally weighted second-order least squares (SLS) and the degree-two generalized polynomial maximization method (PMM) are the same population estimating equation for linear regression with conditionally homoskedastic non-Gaussian errors: they choose the same optimal linear combination of the first two centered residual moments, solve one population normal system, share one influence function, and attain the common asymptotic variance $c_2g_2/N$ -- the ordinary-least-squares slope-variance factor $c_2$ scaled by the PMM variance-reduction coefficient $g_2=1-\gamma_3^2/(2+\gamma_4)$ (with $\gamma_3,\gamma_4$ the error skewness and excess kurtosis). Feasible plug-in implementations are therefore first-order equivalent, with only higher-order finite-sample differences. The identity is sharp: under heteroskedasticity the unconditional PMM body and the conditional SLS weighting separate, costing efficiency for symmetric errors and consistency for asymmetric errors. Beyond degree two, PMM holds an efficiency reserve that SLS cannot reach within its second-moment span. For symmetric platykurtic errors SLS collapses to ordinary least squares for the slope, while degree-three PMM exploits kurtosis information outside the SLS moment span through a closed-form coefficient $g_3$; for canonical asymmetric laws this reserve is $30$--$50\%$ within the degree-three polynomial moment class. The Lean 4 development machine-checks the degree-specific algebraic core -- the closed forms for $g_2$ and $g_3$, the $g_2\le1$ result, the design cancellations, and the symmetric collapse -- while the general monotonicity $g_{S+1}\le g_S\le1$ is proved analytically by nesting. A Monte Carlo study illustrates the equivalence, the reserve, and the heteroskedastic boundary at finite samples.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 1 minor

Summary. The manuscript proves that optimally weighted second-order least squares (SLS) and the degree-two generalized polynomial maximization method (PMM) coincide as population estimating equations for linear regression under conditional homoskedasticity with non-Gaussian errors. They select the same optimal linear combination of the first two centered residual moments, solve identical normal equations, share an influence function, and attain the common asymptotic variance c₂g₂/N where g₂ = 1 - γ₃²/(2 + γ₄). The equivalence is shown to be sharp under heteroskedasticity (where unconditional PMM and conditional SLS weighting diverge), while higher-degree PMM retains efficiency reserves beyond the SLS second-moment span. The algebraic core (closed forms for g₂, g₃, g₂ ≤ 1, design cancellations, symmetric collapse) is machine-checked in Lean 4; monotonicity g_{S+1} ≤ g_S ≤ 1 is proved analytically by nesting; a Monte Carlo study illustrates finite-sample behavior and the heteroskedastic boundary.

Significance. If the result holds, the paper unifies two moment-based estimators by exhibiting an explicit algebraic identity at the population level, with the machine-checked Lean 4 proofs of the closed forms for g₂ and g₃, the g₂ ≤ 1 bound, design cancellations, and symmetric collapse constituting a verifiable strength. The analytic nesting proof for monotonicity and the Monte Carlo confirmation of the equivalence, reserve, and heteroskedasticity caveat further support the contribution. This clarifies when SLS is a special case of PMM and quantifies the efficiency gains available from higher polynomial moments for non-Gaussian errors.

minor comments (1)
  1. §3, after Eq. (12): the notation for the feasible plug-in estimators could be clarified by explicitly distinguishing the population g₂ from its sample analogue ĝ₂ in the statement of first-order equivalence.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the careful reading of the manuscript and for the positive assessment. We are pleased that the referee recommends acceptance.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper presents a direct population-level algebraic proof that optimally weighted SLS and degree-2 PMM coincide as estimating equations under conditional homoskedasticity, deriving the shared influence function, normal equations, and asymptotic variance c2 g2/N from the same linear combination of centered residual moments. The derivation relies on explicit moment conditions and design cancellations without fitted parameters renamed as predictions or self-referential definitions. The Lean 4 machine-check verifies the closed forms for g2, g3, the g2 ≤ 1 bound, and symmetric collapse, while the general monotonicity is proved by analytic nesting; these are independent verifications rather than load-bearing self-citations. The heteroskedasticity caveat is stated as a boundary condition with Monte Carlo confirmation. No step reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the linear regression model with conditionally homoskedastic errors and the existence of error moments up to order four; no free parameters are introduced or fitted, and no new entities are postulated.

axioms (2)
  • domain assumption Linear regression model Y = Xβ + ε with E(ε|X) = 0 and Var(ε|X) = σ² (conditional homoskedasticity)
    Invoked throughout the abstract as the setting where the SLS-PMM identity holds; the abstract explicitly states the identity is sharp under heteroskedasticity.
  • domain assumption Existence of third and fourth moments of the error distribution (skewness γ3 and excess kurtosis γ4)
    Required to define the efficiency coefficient g2 = 1 - γ3²/(2 + γ4) and the higher-degree reserve g3.

pith-pipeline@v0.9.1-grok · 5882 in / 1686 out tokens · 31378 ms · 2026-06-27T12:09:44.471545+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

23 extracted references · 7 canonical work pages

  1. [1]

    Peter J. Bickel. On adaptive estimation.The Annals of Statistics, 10(3):647–671, 1982. doi: 10.1214/aos/1176345863

  2. [2]

    Asymptotic efficiency in estimation with conditional moment restrictions

    Gary Chamberlain. Asymptotic efficiency in estimation with conditional moment restrictions. Journal of Econometrics, 34(3):305–334, 1987

  3. [3]

    On linear and quadratic estimating functions.Biometrika, 74(3):591–597,

    Martin Crowder. On linear and quadratic estimating functions.Biometrika, 74(3):591–597,

  4. [4]

    doi: 10.1093/biomet/74.3.591

  5. [5]

    V. P. Godambe. An optimum property of regular maximum likelihood estimation.The Annals of Mathematical Statistics, 31(4):1208–1211, 1960

  6. [6]

    V. P. Godambe and M. E. Thompson. An extension of quasi-likelihood estimation.Journal of Statistical Planning and Inference, 22(2):137–152, 1989

  7. [7]

    Large sample properties of generalized method of moments estimators

    Lars Peter Hansen. Large sample properties of generalized method of moments estimators. Econometrica, 50(4):1029–1054, 1982

  8. [8]

    Heyde.Quasi-Likelihood and Its Application: A General Approach to Optimal Parameter Estimation

    Christopher C. Heyde.Quasi-Likelihood and Its Application: A General Approach to Optimal Parameter Estimation. Springer, New York, 1997

  9. [9]

    J. R. M. Hosking. L-moments: analysis and estimation of distributions using linear combi- nations of order statistics.Journal of the Royal Statistical Society, Series B, 52(1):105–124, 1990

  10. [10]

    Huber.Robust Statistics

    Peter J. Huber.Robust Statistics. Wiley, 1981

  11. [11]

    The efficiency of the second-order nonlinear least squares estimator and its extension.Annals of the Institute of Statistical Mathematics, 64:751–764,

    Mijeong Kim and Yanyuan Ma. The efficiency of the second-order nonlinear least squares estimator and its extension.Annals of the Institute of Statistical Mathematics, 64:751–764,

  12. [12]

    doi: 10.1007/s10463-011-0332-y

  13. [13]

    Kunchenko.Polynomial Parameter Estimations of Close to Gaussian Random Vari- ables

    Yuriy P. Kunchenko.Polynomial Parameter Estimations of Close to Gaussian Random Vari- ables. Shaker Verlag, Aachen, 2002

  14. [14]

    Jerry M. Mendel. Tutorial on higher-order statistics (spectra) in signal processing and system theory.Proceedings of the IEEE, 79(3):278–305, 1991

  15. [15]

    Whitney K. Newey. Efficient estimation of models with conditional moment restrictions. In G. S. Maddala, C. R. Rao, and H. D. Vinod, editors,Handbook of Statistics, Vol. 11: Econo- metrics, pages 419–454. North-Holland, Amsterdam, 1993

  16. [16]

    Lindsay, and Bing Li

    Annie Qu, Bruce G. Lindsay, and Bing Li. Improving generalised estimating equations using quadratic inference functions.Biometrika, 87(4):823–836, 2000. doi: 10.1093/biomet/87.4.823

  17. [17]

    Estimation of nonlinear Berkson-type measurement error models.Statistica Sinica, 13:1201–1210, 2003

    Liqun Wang. Estimation of nonlinear Berkson-type measurement error models.Statistica Sinica, 13:1201–1210, 2003. 25

  18. [18]

    Second-order nonlinear least squares estimation.Annals of the Institute of Statistical Mathematics, 60:883–900, 2008

    Liqun Wang and Alexandre Leblanc. Second-order nonlinear least squares estimation.Annals of the Institute of Statistical Mathematics, 60:883–900, 2008. doi: 10.1007/s10463-007-0139-z

  19. [19]

    EstemPMM: Polynomial maximization method estimation.https://cran

    Serhii Zabolotnii. EstemPMM: Polynomial maximization method estimation.https://cran. r-project.org/package=EstemPMM, 2026. R package version 0.4.0

  20. [20]

    Warsza, and Oleksandr Tkachenko

    Serhii Zabolotnii, Zygmunt L. Warsza, and Oleksandr Tkachenko. Polynomial estimation of linear regression parameters for the asymmetric PDF of errors. InAutomation 2018, Advances in Intelligent Systems and Computing, pages 758–772, Cham, 2018. Springer. doi: 10.1007/ 978-3-319-77179-3_75

  21. [21]

    Warsza, and Oleksandr Tkachenko

    Serhii Zabolotnii, Zygmunt L. Warsza, and Oleksandr Tkachenko. Estimation of linear re- gression parameters of symmetric non-Gaussian errors by polynomial maximization method. InAutomation 2019, Advances in Intelligent Systems and Computing, pages 636–649, Cham,

  22. [22]
  23. [23]

    Serhii Zabolotnii, Oleksandr Tkachenko, Waldemar Nowakowski, and Zygmunt L. Warsza. Ap- plication of the polynomial maximization method for estimating nonlinear regression param- eters with non-Gaussian asymmetric errors. InAutomation 2024, Lecture Notes in Networks and Systems, pages 342–356, Cham, 2024. Springer. doi: 10.1007/978-3-031-78266-4_30. 26