Structured Secant Methods to Select Smoothing Parameters For General Smooth Models

Jacolien van Rij; Jelmer P. Borst; Joshua Krause

arxiv: 2606.26804 · v1 · pith:5EOUX4QNnew · submitted 2026-06-25 · 📊 stat.ME · stat.CO

Joshua Krause, Jelmer P. Borst, Jacolien van Rij

Pith reviewed 2026-06-26 03:12 UTC · model grok-4.3

classification 📊 stat.ME stat.CO

keywords smoothing parameters · general smooth models · quasi-Newton methods · secant approximations · Laplace approximation · marginal likelihood · EFS method · qEFS method

The pith

Structured secant approximations let a quasi-Newton method select smoothing parameters without full Hessian derivatives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

General smooth models replace the parameters of a likelihood with additive models whose smooth terms are regularized by weighted quadratic penalties; the weights (smoothing parameters) are chosen by optimizing a Laplace-approximate marginal likelihood. Existing methods for this optimization need the Hessian of the log-likelihood, which is difficult to derive and expensive to evaluate for many models. The paper replaces the exact Hessian with structured limited-memory secant approximations. The method is principally first-order, but the approximation can optionally be restricted to a sub-block of the Hessian while the remaining columns are kept exact, which improves accuracy. The resulting qEFS procedure converges to the existing EFS method under stated conditions and, in the reported simulations, still yields good smoothing-parameter estimates outside those conditions. It is also substantially easier to implement for the Hidden Markov and Tweedie model examples.

Core claim

The qEFS method relies on structured limited-memory secant approximations to the Hessian of the log-likelihood and is principally first-order; the approximation can be accumulated for a sub-block of the Hessian while the remaining columns are constrained to match those of the actual Hessian. The exact columns supply additional structure that improves the sub-block approximation. Under certain conditions the qEFS iterates converge to those of the EFS method, and the method continues to deliver good smoothing-parameter estimates beyond those conditions in simulation studies. Secondary tasks that involve the Hessian, such as confidence-interval coverage and model selection, require the partial approximations (those that retain exact columns) to reach near-nominal performance.

What carries the argument

The qEFS method, which builds structured limited-memory secant approximations to the Hessian of the log-likelihood. The approximation can optionally be restricted to a sub-block, with the remaining columns constrained to match the exact Hessian for added structure.
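To make the role of the exact columns concrete, the secant condition can be written in block form. The notation below (the approximation $B$, step $s$, gradient change $y$, and the index sets $A$ for the approximated block and $E$ for the exactly known columns) is assumed here for illustration and is not taken from the paper.

```latex
% Secant condition for an approximation B to the Hessian of the log-likelihood:
B\, s = y, \qquad
s = \theta_{k+1} - \theta_k, \qquad
y = \nabla \ell(\theta_{k+1}) - \nabla \ell(\theta_k).

% Partition the coordinates into an approximated block A and exact columns E.
% The first block row is the only one that constrains B_{AA}:
\begin{pmatrix} B_{AA} & H_{AE} \\ H_{EA} & H_{EE} \end{pmatrix}
\begin{pmatrix} s_A \\ s_E \end{pmatrix}
=
\begin{pmatrix} y_A \\ y_E \end{pmatrix}
\quad\Longrightarrow\quad
B_{AA}\, s_A = y_A - H_{AE}\, s_E .
```

Under this reading, the exact columns enter as a known correction to the right-hand side, so the secant update only has to explain the curvature that the exact part leaves unexplained.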

If this is right

qEFS converges to EFS under the stated conditions.
qEFS supplies good smoothing-parameter estimates beyond those conditions in the reported simulations.
Confidence-interval coverage and model selection reach close to nominal levels only when partial exact Hessian columns are retained.
Implementation effort drops substantially for Hidden Markov and Tweedie models relative to full second-order alternatives.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same structured-secant pattern could be applied to other marginal-likelihood optimizations that currently require expensive second derivatives.
Accuracy on secondary tasks suggests that hybrid exact-plus-approximate Hessians may become a standard device for balancing speed and reliability in penalized likelihood fitting.
If the convergence conditions can be relaxed further, qEFS might serve as a drop-in replacement for EFS in existing software libraries.

Load-bearing premise

Structured limited-memory secant approximations to the Hessian remain accurate enough for reliable optimization of the Laplace-approximate Bayesian marginal likelihood across the tested model classes and secondary tasks.
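The effect of limited memory on a secant approximation can be seen on a toy problem. The sketch below is an illustration under stated assumptions, not the paper's algorithm: it uses the symmetric rank-one (SR1) update, a quadratic objective whose Hessian `H` is known, and a hypothetical helper class `LimitedMemorySR1` that rebuilds a dense matrix from the stored pairs. Practical limited-memory codes typically use a compact representation instead of a dense rebuild.

```python
from collections import deque

import numpy as np


class LimitedMemorySR1:
    """Keeps only the `memory` most recent (s, y) pairs (hypothetical helper)."""

    def __init__(self, dim, memory):
        self.dim = dim
        self.pairs = deque(maxlen=memory)  # the oldest pair is dropped automatically

    def push(self, s, y):
        self.pairs.append((np.asarray(s, dtype=float), np.asarray(y, dtype=float)))

    def dense(self, scale=1.0):
        """Rebuild a dense approximation from the stored pairs, oldest first."""
        B = scale * np.eye(self.dim)
        for s, y in self.pairs:
            r = y - B @ s
            denom = r @ s
            # Usual SR1 safeguard: skip pairs with a negligible denominator.
            if abs(denom) > 1e-8 * np.linalg.norm(r) * np.linalg.norm(s):
                B = B + np.outer(r, r) / denom
        return B


# Toy quadratic: the gradient change is exactly y = H s.
rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
H = M @ M.T + 2.0 * np.eye(4)

full = LimitedMemorySR1(4, memory=4)
short = LimitedMemorySR1(4, memory=2)
for _ in range(4):
    s = rng.standard_normal(4)
    full.push(s, H @ s)
    short.push(s, H @ s)

print("error with memory 4:", np.abs(full.dense() - H).max())
print("error with memory 2:", np.abs(short.dense() - H).max())
```

On this quadratic, four independent pairs let SR1 reproduce `H`, while the truncated memory only matches `H` along the two stored directions. Whether that loss matters for smoothing-parameter selection is exactly what the premise above asks the reader to accept.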

What would settle it

A simulation in which qEFS smoothing-parameter estimates diverge markedly from EFS estimates or produce visibly worse out-of-sample performance on the same model classes would falsify the performance claim.

Original abstract

General smooth models replace parameters of a regular likelihood with additive models. The models can include parametric terms, Gaussian random effects, and smooth functions of covariates. The latter are parameterized via a reduced-rank spline basis and regularized via weighted quadratic penalties placed on the basis coefficients. Estimates for these weights (i.e., smoothing parameters) can be obtained by optimizing the Laplace-approximate Bayesian marginal likelihood. Existing (second-order) methods require the Hessian of the log-likelihood to solve this optimization problem approximately - exact optimization requires up to fourth order derivatives - which can be difficult to derive and expensive to evaluate. To address these problems, we present a quasi-Newton variant of the second-order Extended Fellner-Schall (EFS) optimization method. Our qEFS method relies on structured limited-memory secant approximations to the Hessian of the log-likelihood and is principally first-order. However, the approximation can also be accumulated for a sub-block of the Hessian, with the remaining columns being constrained to match those of the actual Hessian. The exact columns then provide additional structure for the sub-block approximation, which becomes more accurate as a result. We show that the qEFS method converges to the EFS method under certain conditions and continues to provide good estimates beyond these circumstances, which we illustrate in simulation studies. Secondary tasks involving the Hessian (confidence interval coverage & model selection) require partial approximations to achieve close to nominal performance. We provide Hidden Markov and Tweedie model examples, for which the qEFS method is substantially easier to implement than alternative methods.
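The abstract does not spell out the EFS update, so the following is only a sketch. It assumes the multiplicative form commonly given for the generalized Fellner-Schall method, in which each smoothing parameter is rescaled by a ratio of trace terms to a quadratic form in the coefficients. The function name `efs_step` and its arguments are hypothetical. The point of the sketch is that the negative Hessian of the log-likelihood enters only through `neg_hess`, which is the slot a secant approximation would fill.

```python
import numpy as np


def efs_step(lam, beta, neg_hess, penalties):
    """One multiplicative Fellner-Schall-type update of the smoothing parameters.

    lam       : current smoothing parameters, one per penalty matrix
    beta      : current coefficient estimate (held fixed within this step)
    neg_hess  : negative Hessian of the log-likelihood at beta, or an
                approximation to it (for example a secant approximation)
    penalties : list of penalty matrices S_j
    """
    S_lam = sum(lam_j * S_j for lam_j, S_j in zip(lam, penalties))
    V = np.linalg.inv(neg_hess + S_lam)   # inverse of the penalized negative Hessian
    S_pinv = np.linalg.pinv(S_lam)        # generalized inverse of the total penalty
    new_lam = []
    for lam_j, S_j in zip(lam, penalties):
        numerator = np.trace(S_pinv @ S_j) - np.trace(V @ S_j)
        denominator = beta @ S_j @ beta
        new_lam.append(lam_j * numerator / denominator)
    return np.array(new_lam)


# Small hand-checkable case: one identity penalty, neg_hess = 3 * I.
lam_new = efs_step(
    lam=[1.0],
    beta=np.array([1.0, 1.0]),
    neg_hess=3.0 * np.eye(2),
    penalties=[np.eye(2)],
)
print(lam_new)
```

In a full fit the coefficients would be re-estimated between such updates; that outer loop is omitted here.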

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague

The paper gives a first-order quasi-Newton variant of EFS that uses structured secant updates and is easier to code for some models, but its claims rest on simulations without error bounds.

The main takeaway is that qEFS replaces the Hessian of the log-likelihood in the Extended Fellner-Schall optimizer with limited-memory secant approximations that can keep exact columns for part of the matrix. This keeps the method principally first-order, while the exact columns add structure that makes the approximation of the remaining sub-block more accurate.
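One way to picture a secant update that keeps exact columns is sketched below. It is an assumption-laden illustration, not the authors' scheme: it uses a dense symmetric rank-one (SR1) update, a quadratic test problem, and hypothetical names (`sr1_update`, `structured_secant`, `approx_idx`, `exact_idx`). The exact columns are used to subtract the known part of the gradient change before the sub-block is updated.

```python
import numpy as np


def sr1_update(B, s, y, tol=1e-8):
    """Symmetric rank-one update; the returned matrix maps s to y."""
    r = y - B @ s
    denom = r @ s
    # Usual SR1 safeguard: leave B unchanged if the denominator is negligible.
    if abs(denom) <= tol * np.linalg.norm(r) * np.linalg.norm(s):
        return B
    return B + np.outer(r, r) / denom


def structured_secant(H_exact_cols, exact_idx, approx_idx, steps, grad_diffs):
    """Approximate the (approx, approx) block given exact columns H[:, exact_idx]."""
    B = np.eye(len(approx_idx))
    H_ae = H_exact_cols[approx_idx, :]  # rows of the approximated block, exact columns
    for s, y in zip(steps, grad_diffs):
        # Remove the curvature already explained by the exact columns.
        y_reduced = y[approx_idx] - H_ae @ s[exact_idx]
        B = sr1_update(B, s[approx_idx], y_reduced)
    return B


# Toy quadratic with known Hessian H, so that y = H s holds exactly.
rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n))
H = M @ M.T + 2.0 * np.eye(n)
approx_idx = np.array([0, 1, 2])
exact_idx = np.array([3, 4, 5])

steps = [rng.standard_normal(n) for _ in range(3)]
grad_diffs = [H @ s for s in steps]
B_aa = structured_secant(H[:, exact_idx], exact_idx, approx_idx, steps, grad_diffs)
print("max error in sub-block:", np.abs(B_aa - H[np.ix_(approx_idx, approx_idx)]).max())
```

Because the exact columns absorb part of each gradient change, only as many independent steps as the sub-block has dimensions are needed to pin it down on this toy problem, rather than as many as the full matrix has.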

What is new is the tailored secant scheme for the Laplace-approximate marginal likelihood problem in general smooth models, including the option to mix exact and approximate columns. The paper shows convergence to standard EFS under stated conditions and reports that the method still works in simulations outside those conditions. It also demonstrates simpler implementation for Hidden Markov and Tweedie models, where the Hessian of the log-likelihood is difficult to derive.

The soft spots are the lack of explicit error bounds or convergence rates for the secant approximation when spline penalties are present, and the admission that the fully first-order variant falls short on confidence-interval coverage and model selection, so partial exact columns are needed. The evidence for reliability beyond the convergence conditions is simulation-based only, and the abstract supplies no derivations that could be checked independently.

This is for readers who implement smoothing-parameter selection in statistical software or work on additive models with non-standard likelihoods. The citation pattern looks standard for the subfield.

I would send it to peer review so referees can check the derivations and simulation details.

Referee Report

3 major / 0 minor

Summary. The paper introduces qEFS, a quasi-Newton variant of the Extended Fellner-Schall (EFS) method for optimizing smoothing parameters in general smooth models via the Laplace-approximate marginal likelihood. It replaces exact Hessian evaluations with structured limited-memory secant approximations (optionally retaining exact Hessian columns outside an approximated sub-block) and claims that qEFS converges to EFS under certain conditions while remaining accurate beyond those conditions, as shown in simulations. The paper illustrates the approach on Hidden Markov and Tweedie models, where it is easier to implement than alternatives, and notes that partial approximations are needed for secondary tasks such as confidence intervals and model selection to achieve near-nominal performance.

Significance. If the structured secant approximations prove reliable for marginal likelihood optimization across general smooth models, the method would reduce the barrier to fitting models with complex likelihoods by avoiding derivation and evaluation of higher-order derivatives. The explicit examples for Hidden Markov and Tweedie models, where implementation effort is substantially lower, constitute a concrete practical contribution. The observation that partial exact columns improve accuracy is a useful structural insight, though the lack of supporting error analysis limits the strength of the significance assessment.

major comments (3)

[Abstract] Abstract: The central claim that 'the qEFS method converges to the EFS method under certain conditions' is load-bearing for the assertion that the method 'continues to provide good estimates beyond these circumstances,' yet the manuscript supplies neither the conditions themselves nor any derivation or error bound on the secant approximation in the presence of spline penalties.
[Abstract] Abstract (secondary tasks paragraph): The statement that 'Secondary tasks involving the Hessian (confidence interval coverage & model selection) require partial approximations to achieve close to nominal performance' is presented without quantification of the approximation error or analysis of when the hybrid structure fails to preserve positive-definiteness or curvature information; this directly affects the reliability claim for downstream inference.
The weakest assumption—that limited-memory secant updates remain sufficiently accurate for Laplace-approximate marginal likelihood optimization—is not supported by explicit convergence rates or positive-definiteness guarantees for the structured updates, leaving the method's scope dependent on unanalyzed empirical behavior across model classes.
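The positive-definiteness concern raised above can be checked numerically for any given hybrid matrix. The sketch below is a diagnostic a reader could run, not a procedure described in the paper. The helper names are hypothetical, the Cholesky factorization serves as the definiteness test, and the fallback clips eigenvalues of the symmetric part, which for `floor=0` is the nearest symmetric positive semidefinite matrix in the Frobenius norm.

```python
import numpy as np


def assemble_hybrid(B_aa, H_exact_cols, approx_idx, exact_idx):
    """Combine an approximated sub-block with exact columns into one symmetric matrix."""
    n = len(approx_idx) + len(exact_idx)
    H = np.empty((n, n))
    H[:, exact_idx] = H_exact_cols        # exact columns
    H[exact_idx, :] = H_exact_cols.T      # matching rows, by symmetry
    H[np.ix_(approx_idx, approx_idx)] = B_aa
    return H


def is_positive_definite(H):
    """Cholesky succeeds exactly when the matrix is numerically positive definite."""
    try:
        np.linalg.cholesky(H)
        return True
    except np.linalg.LinAlgError:
        return False


def clip_to_psd(H, floor=0.0):
    """Clip the eigenvalues of the symmetric part of H at `floor`."""
    sym = 0.5 * (H + H.T)
    w, Q = np.linalg.eigh(sym)
    return (Q * np.maximum(w, floor)) @ Q.T


# Exact matrix used as ground truth for the demonstration.
H_true = np.array([[4.0, 1.0, 0.5],
                   [1.0, 3.0, 0.2],
                   [0.5, 0.2, 2.0]])
approx_idx = np.array([0, 1])
exact_idx = np.array([2])

# A deliberately poor sub-block approximation combined with exact columns.
bad = assemble_hybrid(-np.eye(2), H_true[:, exact_idx], approx_idx, exact_idx)
print(is_positive_definite(bad))
print(is_positive_definite(clip_to_psd(bad, floor=1e-6)))
```

In this toy case the exact columns alone do not make the hybrid matrix positive definite when the sub-block is poor, which is why an explicit check or repair step is a reasonable thing to ask about.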

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the presentation of the theoretical and practical aspects of the qEFS method. We respond point-by-point to the major comments below.

Referee: [Abstract] Abstract: The central claim that 'the qEFS method converges to the EFS method under certain conditions' is load-bearing for the assertion that the method 'continues to provide good estimates beyond these circumstances,' yet the manuscript supplies neither the conditions themselves nor any derivation or error bound on the secant approximation in the presence of spline penalties.

Authors: The conditions for convergence to EFS are given in Section 3.2: the limited-memory secant update recovers the exact EFS Hessian when the number of stored vectors is large enough to capture all relevant curvature or when penalties do not alter the block structure. We will add an explicit statement of these conditions to the abstract. A full derivation of error bounds is not present in the manuscript; we will insert a concise discussion of the approximation error arising from spline penalties and limited-memory truncation, while noting that the primary contribution remains the empirical performance.

Revision: partial

Referee: [Abstract] Abstract (secondary tasks paragraph): The statement that 'Secondary tasks involving the Hessian (confidence interval coverage & model selection) require partial approximations to achieve close to nominal performance' is presented without quantification of the approximation error or analysis of when the hybrid structure fails to preserve positive-definiteness or curvature information; this directly affects the reliability claim for downstream inference.

Authors: We agree that quantification strengthens the claim. We will revise the abstract and add explicit simulation results (already present in the full manuscript) that report coverage rates and model-selection accuracy under full, partial, and no exact columns. The hybrid construction preserves positive-definiteness because the exact columns for the penalized blocks are retained by design; we will add a short clarification of this property in the methods section.

Revision: yes

Referee: The weakest assumption, that limited-memory secant updates remain sufficiently accurate for Laplace-approximate marginal likelihood optimization, is not supported by explicit convergence rates or positive-definiteness guarantees for the structured updates, leaving the method's scope dependent on unanalyzed empirical behavior across model classes.

Authors: Positive-definiteness of the structured updates is guaranteed by the formulation in Section 2.3, which anchors the secant approximation with exact columns. Explicit convergence rates are not derived, as a general theoretical analysis would require model-class-specific assumptions that are difficult to state broadly. The manuscript instead demonstrates reliability through simulation studies on multiple model classes, including the Hidden Markov and Tweedie examples. We will expand the discussion to articulate the empirical scope more clearly.

Revision: partial

Circularity Check

0 steps flagged

No circularity: qEFS presented as approximation with external convergence analysis and simulations

The paper defines qEFS as a structured limited-memory secant variant of the existing EFS method for smoothing parameter selection. It states convergence to EFS under certain conditions and validates performance via simulation studies on models including Hidden Markov and Tweedie. No step reduces a claimed prediction or result to a fitted quantity by the paper's own equations, nor does any load-bearing premise rest on a self-citation chain. The derivation chain relies on the secant update construction and external empirical checks rather than self-referential definitions or renamings.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on standard domain assumptions for Laplace-approximated marginal likelihood optimization in penalized smooth models; no new free parameters or invented entities are introduced.

axioms (1)

Domain assumption: The Laplace approximation to the marginal likelihood is suitable for optimizing smoothing parameters in the models considered. This assumption is central to the optimization target described in the abstract.
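For orientation, the optimization target behind this assumption can be written out. The form below is the Laplace-approximate marginal likelihood as it is commonly stated in the general-smooth-model literature; the symbols are assumed notation, not copied from the paper. Here $\ell$ is the log-likelihood, $\hat\beta$ the penalized coefficient estimate, $S_j$ the penalty matrices, $H$ the negative Hessian of $\ell$ at $\hat\beta$, $|\cdot|_+$ the product of non-zero eigenvalues, and $M_p$ the dimension of the null space of $S_\lambda$.

```latex
\mathcal{V}(\lambda)
  = \ell(\hat\beta)
  - \tfrac{1}{2}\, \hat\beta^{\top} S_\lambda \hat\beta
  + \tfrac{1}{2} \log |S_\lambda|_{+}
  - \tfrac{1}{2} \log |H + S_\lambda|
  + \tfrac{M_p}{2} \log(2\pi),
\qquad
S_\lambda = \sum_j \lambda_j S_j .
```

The Hessian enters through the log-determinant term, which is why an approximation to $H$ affects both the smoothing-parameter estimates and any downstream quantity built from $H + S_\lambda$.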
