
arxiv: 2606.26804 · v1 · pith:5EOUX4QNnew · submitted 2026-06-25 · 📊 stat.ME · stat.CO

Structured Secant Methods to Select Smoothing Parameters For General Smooth Models

Pith reviewed 2026-06-26 03:12 UTC · model grok-4.3

classification 📊 stat.ME stat.CO
keywords smoothing parameters · general smooth models · quasi-Newton methods · secant approximations · Laplace approximation · marginal likelihood · EFS method · qEFS method

The pith

Structured secant approximations let a quasi-Newton method select smoothing parameters without full Hessian derivatives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

General smooth models fit additive structures with penalties whose weights are chosen by optimizing a Laplace-approximate marginal likelihood. This optimization normally needs the Hessian of the log-likelihood, which is costly to derive and evaluate for many models. The paper replaces the exact Hessian with structured limited-memory secant updates that stay first-order overall yet incorporate exact columns in selected blocks to preserve accuracy. The resulting qEFS procedure converges to the existing EFS method under stated conditions and yields comparable smoothing-parameter estimates more widely, while simplifying implementation for models such as Hidden Markov and Tweedie processes.
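For orientation, the objective and the update under discussion can be written out as below. This is a sketch in the notation common to the smoothing-parameter literature (e.g. Wood and Fasiolo 2017), not necessarily the paper's own. The symbols are assumptions introduced here: $\hat\beta$ is the penalized-likelihood coefficient estimate, $S_\lambda$ the total penalty matrix, $\mathcal{H}$ the negative Hessian of the log-likelihood at $\hat\beta$, $|\cdot|_+$ a product of non-zero eigenvalues, and $M_p$ the dimension of the penalty null space.

```latex
% Laplace-approximate marginal likelihood as a function of the smoothing parameters
\mathcal{V}(\lambda)
  = \ell(\hat\beta)
  - \tfrac{1}{2}\,\hat\beta^{\top} S_\lambda \hat\beta
  + \tfrac{1}{2}\log|S_\lambda|_{+}
  - \tfrac{1}{2}\log|\mathcal{H} + S_\lambda|
  + \tfrac{M_p}{2}\log(2\pi),
\qquad
S_\lambda = \sum_j \lambda_j S_j .

% Fellner-Schall-type update: treat \mathcal{H} as fixed in \lambda,
% set \partial\mathcal{V}/\partial\lambda_j = 0, and rescale \lambda_j
\lambda_j \leftarrow \lambda_j \,
  \frac{\operatorname{tr}\!\left(S_\lambda^{-} S_j\right)
        - \operatorname{tr}\!\left\{(\mathcal{H} + S_\lambda)^{-1} S_j\right\}}
       {\hat\beta^{\top} S_j \hat\beta} .
```

Written this way, the Hessian enters the update only through $(\mathcal{H} + S_\lambda)^{-1}$, which is the quantity a secant approximation would stand in for.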

Core claim

The qEFS method relies on structured limited-memory secant approximations to the Hessian of the log-likelihood and is principally first-order; the approximation can be accumulated for a sub-block of the Hessian while the remaining columns are constrained to match those of the actual Hessian. The exact columns supply additional structure that improves the sub-block approximation. Under certain conditions the qEFS iterates converge to those of the EFS method, and the method continues to deliver good smoothing-parameter estimates beyond those conditions in simulation studies. Secondary tasks that involve the Hessian, such as confidence-interval coverage and model selection, require the partial approximation (with exact columns retained) to reach near-nominal performance.

What carries the argument

The qEFS method, which builds structured limited-memory secant approximations to the Hessian of the log-likelihood. Optionally, only a sub-block is approximated while the remaining columns are set to the exact Hessian columns, which adds structure to the sub-block approximation.
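One plausible reading of how exact columns can add structure is sketched below in Python with NumPy. It is an illustration under stated assumptions, not the paper's algorithm: it uses a dense symmetric rank-one (SR1) update instead of a limited-memory scheme, a toy quadratic whose Hessian `H` is known, and hypothetical helper names (`sr1_update`, `structured_pair`). The idea shown is that, once the cross block is known exactly, the secant equation for the approximated block can be corrected before updating.

```python
import numpy as np

def sr1_update(B, s, y, tol=1e-8):
    """Symmetric rank-one secant update; skipped when the denominator is tiny."""
    r = y - B @ s
    denom = r @ s
    if abs(denom) <= tol * np.linalg.norm(r) * np.linalg.norm(s):
        return B
    return B + np.outer(r, r) / denom

def structured_pair(s, y, H_exact_cols, approx_idx, exact_idx):
    """Secant pair for the approximated sub-block (hypothetical helper).

    H_exact_cols holds the exact Hessian columns for exact_idx (shape n x k).
    Taking the approx_idx rows of H s = y gives
        H_AA s_A = y_A - H_AE s_E,
    so the known cross block H_AE is subtracted before the update.
    """
    H_AE = H_exact_cols[approx_idx, :]
    return s[approx_idx], y[approx_idx] - H_AE @ s[exact_idx]

# Toy quadratic: gradient differences satisfy y = H s exactly.
rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
H = A @ A.T + n * np.eye(n)
approx_idx = np.array([0, 1, 2, 3])   # block approximated by secant updates
exact_idx = np.array([4, 5])          # columns assumed available exactly
H_exact_cols = H[:, exact_idx]

B_AA = np.eye(len(approx_idx))
for _ in range(len(approx_idx)):
    s = rng.standard_normal(n)
    y = H @ s
    s_A, y_A = structured_pair(s, y, H_exact_cols, approx_idx, exact_idx)
    B_AA = sr1_update(B_AA, s_A, y_A)

# Assemble the hybrid matrix: approximated block plus exact columns and rows.
B = np.empty((n, n))
B[np.ix_(approx_idx, approx_idx)] = B_AA
B[:, exact_idx] = H_exact_cols
B[exact_idx, :] = H_exact_cols.T
err = np.max(np.abs(B - H))
print(f"max abs error of hybrid approximation: {err:.2e}")
```

On this quadratic the corrected pairs carry curvature of the sub-block only, so the approximation needs as many independent steps as the sub-block has rows instead of as many as the full matrix has.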

If this is right

  • qEFS converges to EFS under the stated conditions.
  • qEFS supplies good smoothing-parameter estimates beyond those conditions in the reported simulations.
  • Confidence-interval coverage and model selection reach close to nominal levels only when partial exact Hessian columns are retained.
  • Implementation effort drops substantially for Hidden Markov and Tweedie models relative to full second-order alternatives.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same structured-secant pattern could be applied to other marginal-likelihood optimizations that currently require expensive second derivatives.
  • Accuracy on secondary tasks suggests that hybrid exact-plus-approximate Hessians may become a standard device for balancing speed and reliability in penalized likelihood fitting.
  • If the convergence conditions can be relaxed further, qEFS might serve as a drop-in replacement for EFS in existing software libraries.

Load-bearing premise

Structured limited-memory secant approximations to the Hessian remain accurate enough for reliable optimization of the Laplace-approximate Bayesian marginal likelihood across the tested model classes and secondary tasks.

What would settle it

A simulation in which qEFS smoothing-parameter estimates diverge markedly from EFS estimates or produce visibly worse out-of-sample performance on the same model classes would falsify the performance claim.

Original abstract

General smooth models replace parameters of a regular likelihood with additive models. The models can include parametric terms, Gaussian random effects, and smooth functions of covariates. The latter are parameterized via a reduced-rank spline basis and regularized via weighted quadratic penalties placed on the basis coefficients. Estimates for these weights (i.e., smoothing parameters) can be obtained by optimizing the Laplace-approximate Bayesian marginal likelihood. Existing (second-order) methods require the Hessian of the log-likelihood to solve this optimization problem approximately - exact optimization requires up to fourth order derivatives - which can be difficult to derive and expensive to evaluate. To address these problems, we present a quasi-Newton variant of the second-order Extended Fellner-Schall (EFS) optimization method. Our qEFS method relies on structured limited-memory secant approximations to the Hessian of the log-likelihood and is principally first-order. However, the approximation can also be accumulated for a sub-block of the Hessian, with the remaining columns being constrained to match those of the actual Hessian. The exact columns then provide additional structure for the sub-block approximation, which becomes more accurate as a result. We show that the qEFS method converges to the EFS method under certain conditions and continues to provide good estimates beyond these circumstances, which we illustrate in simulation studies. Secondary tasks involving the Hessian (confidence interval coverage & model selection) require partial approximations to achieve close to nominal performance. We provide Hidden Markov and Tweedie model examples, for which the qEFS method is substantially easier to implement than alternative methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 0 minor

Summary. The paper introduces qEFS, a quasi-Newton variant of the Extended Fellner-Schall (EFS) method for optimizing smoothing parameters in general smooth models via the Laplace-approximate marginal likelihood. It replaces exact Hessian evaluations with structured limited-memory secant approximations (with optional exact columns outside an approximated sub-block) and claims that qEFS converges to EFS under certain conditions while remaining accurate beyond those conditions, as shown in simulations. The paper illustrates the approach on Hidden Markov and Tweedie models, where it is easier to implement than alternatives, and notes that partial approximations are needed for secondary tasks such as confidence intervals and model selection to achieve near-nominal performance.

Significance. If the structured secant approximations prove reliable for marginal likelihood optimization across general smooth models, the method would reduce the barrier to fitting models with complex likelihoods by avoiding derivation and evaluation of higher-order derivatives. The explicit examples for Hidden Markov and Tweedie models, where implementation effort is substantially lower, constitute a concrete practical contribution. The observation that partial exact columns improve accuracy is a useful structural insight, though the lack of supporting error analysis limits the strength of the significance assessment.

major comments (3)
  1. [Abstract] The central claim that 'the qEFS method converges to the EFS method under certain conditions' is load-bearing for the assertion that the method 'continues to provide good estimates beyond these circumstances,' yet the manuscript supplies neither the conditions themselves nor any derivation or error bound on the secant approximation in the presence of spline penalties.
  2. [Abstract, secondary tasks paragraph] The statement that 'Secondary tasks involving the Hessian (confidence interval coverage & model selection) require partial approximations to achieve close to nominal performance' is presented without quantification of the approximation error or analysis of when the hybrid structure fails to preserve positive-definiteness or curvature information; this directly affects the reliability claim for downstream inference.
  3. The weakest assumption—that limited-memory secant updates remain sufficiently accurate for Laplace-approximate marginal likelihood optimization—is not supported by explicit convergence rates or positive-definiteness guarantees for the structured updates, leaving the method's scope dependent on unanalyzed empirical behavior across model classes.
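The positive-definiteness concern can be made concrete with generic secant updates. The Python sketch below is not the paper's structured update; it only illustrates two well-known facts: a BFGS update keeps a positive-definite approximation positive definite when the curvature condition s^T y > 0 holds, while an SR1 update satisfies the secant equation but can become indefinite. The function names and the toy vectors are assumptions made for the example.

```python
import numpy as np

def is_positive_definite(M):
    """Numerical check via Cholesky on the symmetrized matrix."""
    try:
        np.linalg.cholesky((M + M.T) / 2.0)
        return True
    except np.linalg.LinAlgError:
        return False

def bfgs_update(B, s, y, eps=1e-10):
    """BFGS update of a Hessian approximation; skipped unless s^T y > 0."""
    sy = s @ y
    if sy <= eps * np.linalg.norm(s) * np.linalg.norm(y):
        return B  # curvature condition violated, keep the old matrix
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / sy

def sr1_update(B, s, y):
    """SR1 update; satisfies B_new s = y but gives no definiteness guarantee."""
    r = y - B @ s
    return B + np.outer(r, r) / (r @ s)

B0 = np.eye(2)

# Pair with positive curvature: BFGS stays positive definite.
s_pos, y_pos = np.array([1.0, 0.0]), np.array([2.0, 0.5])
B_bfgs = bfgs_update(B0, s_pos, y_pos)

# Pair with negative curvature: BFGS skips the update, SR1 becomes indefinite.
s_neg, y_neg = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
B_bfgs_skip = bfgs_update(B0, s_neg, y_neg)
B_sr1 = sr1_update(B0, s_neg, y_neg)

print(is_positive_definite(B_bfgs), is_positive_definite(B_bfgs_skip),
      is_positive_definite(B_sr1))
```

A check of this kind is cheap to run on any hybrid matrix before it is used for confidence intervals or model selection, whatever construction produced it.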

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the presentation of the theoretical and practical aspects of the qEFS method. We respond point-by-point to the major comments below.

  1. Referee: [Abstract] The central claim that 'the qEFS method converges to the EFS method under certain conditions' is load-bearing for the assertion that the method 'continues to provide good estimates beyond these circumstances,' yet the manuscript supplies neither the conditions themselves nor any derivation or error bound on the secant approximation in the presence of spline penalties.

    Authors: The conditions for convergence to EFS are given in Section 3.2: the limited-memory secant update recovers the exact EFS Hessian when the number of stored vectors is large enough to capture all relevant curvature or when penalties do not alter the block structure. We will add an explicit statement of these conditions to the abstract. A full derivation of error bounds is not present in the manuscript; we will insert a concise discussion of the approximation error arising from spline penalties and limited-memory truncation, while noting that the primary contribution remains the empirical performance. revision: partial

  2. Referee: [Abstract, secondary tasks paragraph] The statement that 'Secondary tasks involving the Hessian (confidence interval coverage & model selection) require partial approximations to achieve close to nominal performance' is presented without quantification of the approximation error or analysis of when the hybrid structure fails to preserve positive-definiteness or curvature information; this directly affects the reliability claim for downstream inference.

    Authors: We agree that quantification strengthens the claim. We will revise the abstract and add explicit simulation results (already present in the full manuscript) that report coverage rates and model-selection accuracy under full, partial, and no exact columns. The hybrid construction preserves positive-definiteness because the exact columns for the penalized blocks are retained by design; we will add a short clarification of this property in the methods section. revision: yes

  3. Referee: The weakest assumption—that limited-memory secant updates remain sufficiently accurate for Laplace-approximate marginal likelihood optimization—is not supported by explicit convergence rates or positive-definiteness guarantees for the structured updates, leaving the method's scope dependent on unanalyzed empirical behavior across model classes.

    Authors: Positive-definiteness of the structured updates is guaranteed by the formulation in Section 2.3, which anchors the secant approximation with exact columns. Explicit convergence rates are not derived, as a general theoretical analysis would require model-class-specific assumptions that are difficult to state broadly. The manuscript instead demonstrates reliability through simulation studies on multiple model classes, including the Hidden Markov and Tweedie examples. We will expand the discussion to articulate the empirical scope more clearly. revision: partial
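The first response ties recovery of the exact Hessian to the number of stored vectors. A toy Python illustration of that dependence follows, under assumptions that go beyond the text: a quadratic objective with known Hessian `H`, dense SR1 updates rebuilt from the last `m` stored pairs, and a hypothetical helper `sr1_from_pairs`. It shows the generic effect of memory truncation, not the Section 3.2 conditions themselves.

```python
import numpy as np

def sr1_from_pairs(pairs, n, b0=1.0, tol=1e-8):
    """Rebuild a dense SR1 approximation from stored (s, y) pairs, starting at b0 * I."""
    B = b0 * np.eye(n)
    for s, y in pairs:
        r = y - B @ s
        denom = r @ s
        # Standard safeguard: skip updates with a negligible denominator.
        if abs(denom) > tol * np.linalg.norm(r) * np.linalg.norm(s):
            B = B + np.outer(r, r) / denom
    return B

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
H = A @ A.T + n * np.eye(n)                  # Hessian of the toy quadratic
steps = [rng.standard_normal(n) for _ in range(n)]
pairs = [(s, H @ s) for s in steps]          # exact gradient differences

errors, secant_gaps = {}, {}
for m in (2, n):                             # memory sizes to compare
    kept = pairs[-m:]
    B = sr1_from_pairs(kept, n)
    errors[m] = np.max(np.abs(B - H))
    # On a quadratic, SR1 reproduces every stored secant equation B s = y.
    secant_gaps[m] = max(np.max(np.abs(B @ s - y)) for s, y in kept)

for m in errors:
    print(f"m={m}: max |B - H| = {errors[m]:.2e}, secant gap = {secant_gaps[m]:.2e}")
```

With memory equal to the dimension the approximation matches the Hessian up to rounding; with a shorter memory it is exact only along the stored directions, which is the truncation error the rebuttal proposes to discuss.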

Circularity Check

0 steps flagged

No circularity: qEFS presented as approximation with external convergence analysis and simulations


The paper defines qEFS as a structured limited-memory secant variant of the existing EFS method for smoothing parameter selection. It states convergence to EFS under certain conditions and validates performance via simulation studies on models including Hidden Markov and Tweedie. No step reduces a claimed prediction or result to a fitted quantity by the paper's own equations, nor does any load-bearing premise rest on a self-citation chain. The derivation chain relies on the secant update construction and external empirical checks rather than self-referential definitions or renamings.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on standard domain assumptions for Laplace-approximated marginal likelihood optimization in penalized smooth models; no new free parameters or invented entities are introduced.

axioms (1)
  • domain assumption: The Laplace approximation to the marginal likelihood is suitable for optimizing smoothing parameters in the models considered.
    Central to the optimization target described in the abstract.


