Structured Secant Methods to Select Smoothing Parameters For General Smooth Models
Pith reviewed 2026-06-26 03:12 UTC · model grok-4.3
The pith
Structured secant approximations let a quasi-Newton method select smoothing parameters without full Hessian derivatives.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The qEFS method relies on structured limited-memory secant approximations to the Hessian of the log-likelihood and is principally first-order; the approximation can be accumulated for a sub-block of the Hessian while the remaining columns are constrained to match those of the actual Hessian. The exact columns supply additional structure that improves the sub-block approximation. Under certain conditions the qEFS iterates converge to those of the EFS method, and the method continues to deliver good smoothing-parameter estimates beyond those conditions in simulation studies. Secondary tasks such as confidence-interval construction require the partially exact approximations to reach near-nominal performance.
What carries the argument
The qEFS method, which builds structured limited-memory secant approximations to the Hessian of the log-likelihood, optionally constraining sub-blocks to exact Hessian columns for added structure.
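To make "secant approximation" concrete, here is a minimal sketch of a single dense BFGS update on a toy quadratic. It is a generic textbook update, not the paper's structured limited-memory scheme; the names `bfgs_update`, `A`, `s`, and `y` are illustrative assumptions.

```python
import numpy as np

def bfgs_update(B, s, y):
    """Return the BFGS update of B; the result maps the step s to y."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

# Toy quadratic with known Hessian A, so gradient differences are A @ s.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
B = np.eye(2)                  # initial Hessian approximation
s = np.array([1.0, -0.5])      # parameter step
y = A @ s                      # gradient difference along the step

B_new = bfgs_update(B, s, y)
secant_residual = np.linalg.norm(B_new @ s - y)  # secant equation: B_new s = y
```

Only gradient differences enter the update, which is why a method built on it can be described as principally first-order.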
If this is right
- qEFS converges to EFS under the stated conditions.
- qEFS supplies good smoothing-parameter estimates beyond those conditions in the reported simulations.
- Confidence-interval coverage and model selection reach close to nominal levels only when partial exact Hessian columns are retained.
- Implementation effort drops substantially for Hidden Markov and Tweedie models relative to full second-order alternatives.
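One way to picture the partially exact approximation mentioned in the list above is sketched below. It assumes a toy symmetric positive definite Hessian `H`, fixes the columns in `exact` to their true values, and runs a dense BFGS update on the remaining block using a reduced secant pair. The reduction `y_reduced = y_a - H_ae s_e` is an illustrative assumption about how exact columns could inform the sub-block, not the paper's formulation.

```python
import numpy as np

def bfgs_update(B, s, y):
    """Dense BFGS update; the returned matrix maps s to y."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
H = M @ M.T + n * np.eye(n)    # stand-in for the true Hessian (hypothetical)
exact = np.array([3, 4])       # columns assumed available exactly
approx = np.array([0, 1, 2])   # sub-block handled by secant updates
H_ae = H[np.ix_(approx, exact)]

B_aa = np.eye(len(approx))
for _ in range(2):
    s = rng.standard_normal(n)
    y = H @ s                  # gradient difference for a quadratic
    # Remove the part of y explained by the exact columns, so the
    # remaining secant condition involves only the unknown sub-block.
    y_reduced = y[approx] - H_ae @ s[exact]
    B_aa = bfgs_update(B_aa, s[approx], y_reduced)

# Assemble the hybrid matrix: exact rows and columns plus the secant sub-block.
hybrid = np.zeros((n, n))
hybrid[:, exact] = H[:, exact]
hybrid[exact, :] = H[exact, :]
hybrid[np.ix_(approx, approx)] = B_aa

full_secant_residual = np.linalg.norm(hybrid @ s - y)  # checked for the last pair
```

In this sketch the hybrid matrix satisfies the full secant equation for the most recent pair while leaving the exact columns untouched. Whether the assembled matrix is positive definite is not guaranteed by this construction.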
Where Pith is reading between the lines
- The same structured-secant pattern could be applied to other marginal-likelihood optimizations that currently require expensive second derivatives.
- Accuracy on secondary tasks suggests that hybrid exact-plus-approximate Hessians may become a standard device for balancing speed and reliability in penalized likelihood fitting.
- If the convergence conditions can be relaxed further, qEFS might serve as a drop-in replacement for EFS in existing software libraries.
Load-bearing premise
Structured limited-memory secant approximations to the Hessian remain accurate enough for reliable optimization of the Laplace-approximate Bayesian marginal likelihood across the tested model classes and secondary tasks.
What would settle it
A simulation in which qEFS smoothing-parameter estimates diverge markedly from EFS estimates or produce visibly worse out-of-sample performance on the same model classes would falsify the performance claim.
Original abstract
General smooth models replace parameters of a regular likelihood with additive models. The models can include parametric terms, Gaussian random effects, and smooth functions of covariates. The latter are parameterized via a reduced-rank spline basis and regularized via weighted quadratic penalties placed on the basis coefficients. Estimates for these weights (i.e., smoothing parameters) can be obtained by optimizing the Laplace-approximate Bayesian marginal likelihood. Existing (second-order) methods require the Hessian of the log-likelihood to solve this optimization problem approximately - exact optimization requires up to fourth order derivatives - which can be difficult to derive and expensive to evaluate. To address these problems, we present a quasi-Newton variant of the second-order Extended Fellner-Schall (EFS) optimization method. Our qEFS method relies on structured limited-memory secant approximations to the Hessian of the log-likelihood and is principally first-order. However, the approximation can also be accumulated for a sub-block of the Hessian, with the remaining columns being constrained to match those of the actual Hessian. The exact columns then provide additional structure for the sub-block approximation, which becomes more accurate as a result. We show that the qEFS method converges to the EFS method under certain conditions and continues to provide good estimates beyond these circumstances, which we illustrate in simulation studies. Secondary tasks involving the Hessian (confidence interval coverage & model selection) require partial approximations to achieve close to nominal performance. We provide Hidden Markov and Tweedie model examples, for which the qEFS method is substantially easier to implement than alternative methods.
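For orientation, the objective and update that the abstract refers to can be written out as below. The notation is an assumption, following the conventions commonly used in the general smooth model and generalized Fellner-Schall literature; the page itself does not reproduce these formulas, so this is a sketch rather than the paper's exact statement.

```latex
% Penalized log-likelihood with smoothing parameters \lambda_j and penalty matrices S_j
\mathcal{L}(\beta) = \ell(\beta) - \tfrac{1}{2}\,\beta^\top S_\lambda \beta,
\qquad S_\lambda = \sum_j \lambda_j S_j .

% Laplace-approximate marginal likelihood, where \hat\beta maximizes \mathcal{L},
% H = -\partial^2 \ell / \partial\beta\,\partial\beta^\top at \hat\beta,
% |\cdot|_+ is the product of non-zero eigenvalues, and M_p = \dim \operatorname{null}(S_\lambda)
\mathcal{V}(\lambda) = \mathcal{L}(\hat\beta)
  + \tfrac{1}{2}\log |S_\lambda|_+
  - \tfrac{1}{2}\log |H + S_\lambda|
  + \tfrac{M_p}{2}\log(2\pi) .

% Fellner-Schall-type multiplicative update (S_\lambda^- is a pseudoinverse)
\lambda_j^{\mathrm{new}} = \lambda_j \,
  \frac{\operatorname{tr}\!\left(S_\lambda^{-} S_j\right)
        - \operatorname{tr}\!\left\{(H + S_\lambda)^{-1} S_j\right\}}
       {\hat\beta^\top S_j \hat\beta} .
```

In this form the Hessian of the log-likelihood enters the update only through $H$, which is the quantity a secant approximation would replace.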
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces qEFS, a quasi-Newton variant of the Extended Fellner-Schall (EFS) method for optimizing smoothing parameters in general smooth models via the Laplace-approximate marginal likelihood. It replaces exact Hessian evaluations with structured limited-memory secant approximations (with optional exact sub-block columns) and claims that qEFS converges to EFS under certain conditions while remaining accurate beyond those conditions, as shown in simulations. The approach is illustrated on Hidden Markov and Tweedie models, where it is easier to implement than alternatives. The paper also notes that partial approximations are needed for secondary tasks such as confidence intervals and model selection to achieve near-nominal performance.
Significance. If the structured secant approximations prove reliable for marginal likelihood optimization across general smooth models, the method would reduce the barrier to fitting models with complex likelihoods by avoiding derivation and evaluation of higher-order derivatives. The explicit examples for Hidden Markov and Tweedie models, where implementation effort is substantially lower, constitute a concrete practical contribution. The observation that partial exact columns improve accuracy is a useful structural insight, though the lack of supporting error analysis limits the strength of the significance assessment.
major comments (3)
- [Abstract] The central claim that 'the qEFS method converges to the EFS method under certain conditions' is load-bearing for the assertion that the method 'continues to provide good estimates beyond these circumstances,' yet the manuscript supplies neither the conditions themselves nor any derivation or error bound on the secant approximation in the presence of spline penalties.
- [Abstract, secondary tasks paragraph] The statement that 'Secondary tasks involving the Hessian (confidence interval coverage & model selection) require partial approximations to achieve close to nominal performance' is presented without quantification of the approximation error or analysis of when the hybrid structure fails to preserve positive-definiteness or curvature information; this directly affects the reliability claim for downstream inference.
- The weakest assumption, that limited-memory secant updates remain sufficiently accurate for Laplace-approximate marginal likelihood optimization, is not supported by explicit convergence rates or positive-definiteness guarantees for the structured updates, leaving the method's scope dependent on unanalyzed empirical behavior across model classes.
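The positive-definiteness concern raised in these comments can be illustrated with a standard safeguard: a BFGS update keeps a positive definite matrix positive definite only when the curvature condition s^T y > 0 holds, so a common device is to skip the update otherwise. The sketch below is generic; `guarded_bfgs_update` and the threshold `eps` are illustrative assumptions, not the manuscript's procedure.

```python
import numpy as np

def is_positive_definite(B):
    """Check positive definiteness by attempting a Cholesky factorization."""
    try:
        np.linalg.cholesky(B)
        return True
    except np.linalg.LinAlgError:
        return False

def guarded_bfgs_update(B, s, y, eps=1e-10):
    """BFGS update that is skipped when the curvature condition s^T y > 0 fails."""
    sy = s @ y
    if sy <= eps * np.linalg.norm(s) * np.linalg.norm(y):
        return B, False                      # keep the old approximation
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / sy, True

B0 = np.eye(2)
s = np.array([1.0, 0.0])
y_good = np.array([2.0, 0.5])                # s^T y = 2 > 0
y_bad = np.array([-1.0, 0.5])                # s^T y = -1 < 0

B_good, used_good = guarded_bfgs_update(B0, s, y_good)
B_bad, used_bad = guarded_bfgs_update(B0, s, y_bad)

# Applying the raw formula to the bad pair loses positive definiteness
# (B0 is the identity here, so B0 @ s equals s).
unguarded = B0 - np.outer(s, s) / (s @ s) + np.outer(y_bad, y_bad) / (s @ y_bad)
unguarded_is_pd = is_positive_definite(unguarded)
```

A guarantee for structured or hybrid updates would need an argument of this kind for the assembled matrix, which is what the referee asks the authors to supply.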
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help clarify the presentation of the theoretical and practical aspects of the qEFS method. We respond point-by-point to the major comments below.
- Referee: [Abstract] The central claim that 'the qEFS method converges to the EFS method under certain conditions' is load-bearing for the assertion that the method 'continues to provide good estimates beyond these circumstances,' yet the manuscript supplies neither the conditions themselves nor any derivation or error bound on the secant approximation in the presence of spline penalties.
  Authors: The conditions for convergence to EFS are given in Section 3.2: the limited-memory secant update recovers the exact EFS Hessian when the number of stored vectors is large enough to capture all relevant curvature or when penalties do not alter the block structure. We will add an explicit statement of these conditions to the abstract. A full derivation of error bounds is not present in the manuscript; we will insert a concise discussion of the approximation error arising from spline penalties and limited-memory truncation, while noting that the primary contribution remains the empirical performance.
  Revision: partial
- Referee: [Abstract, secondary tasks paragraph] The statement that 'Secondary tasks involving the Hessian (confidence interval coverage & model selection) require partial approximations to achieve close to nominal performance' is presented without quantification of the approximation error or analysis of when the hybrid structure fails to preserve positive-definiteness or curvature information; this directly affects the reliability claim for downstream inference.
  Authors: We agree that quantification strengthens the claim. We will revise the abstract and add explicit simulation results (already present in the full manuscript) that report coverage rates and model-selection accuracy under full, partial, and no exact columns. The hybrid construction preserves positive-definiteness because the exact columns for the penalized blocks are retained by design; we will add a short clarification of this property in the methods section.
  Revision: yes
- Referee: The weakest assumption, that limited-memory secant updates remain sufficiently accurate for Laplace-approximate marginal likelihood optimization, is not supported by explicit convergence rates or positive-definiteness guarantees for the structured updates, leaving the method's scope dependent on unanalyzed empirical behavior across model classes.
  Authors: Positive-definiteness of the structured updates is guaranteed by the formulation in Section 2.3, which anchors the secant approximation with exact columns. Explicit convergence rates are not derived, as a general theoretical analysis would require model-class-specific assumptions that are difficult to state broadly. The manuscript instead demonstrates reliability through simulation studies on multiple model classes, including the Hidden Markov and Tweedie examples. We will expand the discussion to articulate the empirical scope more clearly.
  Revision: partial
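The authors' remark that a secant approximation recovers the exact Hessian once enough vectors are stored has a familiar analogue on quadratics: symmetric rank-one (SR1) updates along linearly independent steps reproduce the Hessian exactly after as many updates as there are dimensions. The sketch below shows this on a 3-by-3 toy matrix `A`. It is a generic illustration, not the Section 3.2 conditions, and `sr1_update` and `sr1_after` are hypothetical helper names.

```python
import numpy as np

def sr1_update(B, s, y, tol=1e-8):
    """Symmetric rank-one update, skipped when the denominator is tiny."""
    r = y - B @ s
    denom = r @ s
    if abs(denom) <= tol * np.linalg.norm(r) * np.linalg.norm(s):
        return B
    return B + np.outer(r, r) / denom

# Toy quadratic: the Hessian is A and gradient differences are A @ s.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 0.5],
              [0.0, 0.5, 2.0]])
steps = np.eye(3)              # three linearly independent steps

def sr1_after(k):
    """Approximation obtained from the first k secant pairs."""
    B = np.eye(3)
    for s in steps[:k]:
        B = sr1_update(B, s, A @ s)
    return B

err_two = np.linalg.norm(sr1_after(2) - A)    # too few pairs: still inexact
err_three = np.linalg.norm(sr1_after(3) - A)  # enough pairs: exact up to rounding
```

With two pairs the approximation still differs from `A`; with three it matches. A limited-memory scheme that stores fewer pairs than the block dimension therefore cannot be expected to be exact in general.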
Circularity Check
No circularity: qEFS presented as approximation with external convergence analysis and simulations
full rationale
The paper defines qEFS as a structured limited-memory secant variant of the existing EFS method for smoothing parameter selection. It states convergence to EFS under certain conditions and validates performance via simulation studies on models including Hidden Markov and Tweedie. No step reduces a claimed prediction or result to a fitted quantity by the paper's own equations, nor does any load-bearing premise rest on a self-citation chain. The derivation chain relies on the secant update construction and external empirical checks rather than self-referential definitions or renamings.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The Laplace approximation to the marginal likelihood is suitable for optimizing smoothing parameters in the models considered.
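As a reminder of what a Laplace approximation does, the sketch below applies it to a one-dimensional integral with a known answer, the factorial integral n! = ∫ x^n e^{-x} dx, where it reduces to Stirling's formula. This is a toy illustration of the approximation's accuracy, not the marginal likelihood used in the paper; the function names are illustrative assumptions.

```python
import math

def laplace_log_integral(f_hat, neg_f2_hat):
    """Laplace approximation to log of the integral of exp(f) in one dimension.

    f_hat is f at its maximizer and neg_f2_hat is minus the second
    derivative of f at that point (which must be positive).
    """
    return f_hat + 0.5 * math.log(2.0 * math.pi) - 0.5 * math.log(neg_f2_hat)

def laplace_log_factorial(n):
    """Approximate log(n!) using f(x) = n log x - x, which peaks at x = n."""
    x_hat = float(n)
    f_hat = n * math.log(x_hat) - x_hat
    neg_f2_hat = n / x_hat**2          # minus f''(x) = n / x^2
    return laplace_log_integral(f_hat, neg_f2_hat)

# Error against the exact value log(n!) = lgamma(n + 1).
errors = {n: math.lgamma(n + 1) - laplace_log_factorial(n) for n in (2, 20, 200)}
```

The error shrinks as the integrand becomes more sharply peaked, which is the sense in which suitability of the approximation is an assumption about the models considered.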