Linear Response Estimators for Singular Statistical Models

Chris Elliott; Daniel Murfet

arXiv: 2605.07970 · v1 · submitted 2026-05-08 · math.ST · cs.LG · stat.TH

Pith reviewed 2026-05-11 02:36 UTC · model grok-4.3

classification: math.ST, cs.LG, stat.TH

keywords: susceptibilities, linear response, singular statistical models, consistency, asymptotic unbiasedness, estimators, parameterized models, statistical inference

The pith

Estimators for susceptibilities, defined as responses of observables to data perturbations, are consistent and asymptotically unbiased in singular statistical models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines susceptibilities as measures of how an observable quantity in a parameterized statistical model responds to a perturbation of the data, for a general class of observables. It introduces estimators for these susceptibilities as statistics computed from a sequence of n data points. The central result is a proof that these estimators are consistent and asymptotically unbiased in the large-n regime. This matters for singular models, which arise in algebraic statistics and machine learning, because standard regularity conditions often fail and direct response estimation becomes difficult. A sympathetic reader cares about obtaining reliable sensitivity measures without assuming the model is smooth or non-degenerate.

Core claim

Susceptibilities, defined as the response of an observable quantity of a parameterized statistical model to a perturbation of the data for a general class of observables, admit estimators that are statistics on a sequence of n data-points; these estimators are consistent and asymptotically unbiased in the large n regime.

What carries the argument

Susceptibilities as linear-response measures of observables to data perturbations, with estimators constructed as finite-sample statistics.
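
As a hedged illustration of why a linear response can be estimated through a covariance, the following sketch assumes a tempered posterior with density proportional to \varphi(w) e^{-n\beta K_h(w)} and a perturbed loss K_h = K + h\,\Delta K. The symbols \Pi_{n,\beta}, K and \Delta K are borrowed from the Theorem 1.1 fragment quoted on this page; the prior \varphi, the perturbation size h and the name \chi are introduced here for illustration only, and the paper's own definition, sign and normalization may differ.

```latex
\[
\mathbb{E}_h[O]
  = \frac{\int O(w)\, e^{-n\beta\,(K(w) + h\,\Delta K(w))}\,\varphi(w)\,dw}
         {\int e^{-n\beta\,(K(w) + h\,\Delta K(w))}\,\varphi(w)\,dw},
\qquad
\chi := \frac{d}{dh}\,\mathbb{E}_h[O]\,\Big|_{h=0}
  = -\,n\beta\;\mathrm{Cov}_{\Pi_{n,\beta}}\!\bigl(O,\ \Delta K\bigr).
\]
```

The second equality follows by differentiating the quotient under the integral sign, which is assumed to be permitted here: the numerator contributes -n\beta\,\mathbb{E}[O\,\Delta K] and the denominator contributes +n\beta\,\mathbb{E}[O]\,\mathbb{E}[\Delta K]. No invertibility of the Fisher information is used in this step, which is one reason a covariance form is attractive in singular settings.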

If this is right

Susceptibilities become estimable from finite data without requiring model regularity.
The estimators apply to a broad class of observables in singular settings.
Asymptotic unbiasedness holds in the large-sample limit for these responses.
Inference procedures can quantify data sensitivity even when the model has singularities.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could be applied to overparameterized neural networks to estimate how predictions respond to training-data changes.
Testing on low-dimensional singular examples such as mixture models with tied parameters would provide direct numerical checks of convergence rates.
The framework might connect to information-geometric notions of sensitivity, allowing explicit calculations in algebraic statistics.
Extensions to time-series or sequential perturbations could be examined by replacing the static data sequence with a dynamical one.

Load-bearing premise

The statistical model is parameterized, the observables form a general class for which susceptibilities are well-defined, and the large-n regime does not introduce additional singularities that invalidate the consistency and unbiasedness proofs.

What would settle it

Simulate repeated samples from a concrete singular model such as a degenerate Gaussian mixture, compute the susceptibility estimators for increasing n, and verify whether the bias approaches zero while the estimator converges in probability to the true susceptibility value.
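
The check described above can be sketched in code. Everything in this sketch is an assumption made for illustration and none of it is taken from the paper: the toy model p(x | a, b) = N(x; a·b, 1) with true distribution N(0, 1) (singular along a·b = 0), the uniform prior on a grid over [-1, 1]², the observable O(w) = a·b, the mean-shift perturbation N(delta, 1), the inverse temperature β = 1/log n, and the use of a plain posterior covariance in place of the paper's estimator. The function names are hypothetical.

```python
import numpy as np

# Hypothetical toy setup (not from the paper): model N(x; a*b, 1), truth N(0, 1).
GRID = np.linspace(-1.0, 1.0, 201)
A, B = np.meshgrid(GRID, GRID, indexing="ij")
T = A * B  # model mean on the grid; also used as the observable O(w) = a*b


def posterior_cov(loss, obs, pert, n, beta):
    """Covariance of obs and pert under the tempered posterior with density
    proportional to exp(-n * beta * loss), uniform prior on the grid."""
    w = np.exp(-n * beta * (loss - loss.min()))  # subtract min for stability
    w /= w.sum()
    return np.sum(w * obs * pert) - np.sum(w * obs) * np.sum(w * pert)


def population_cov(n, beta, delta):
    # Population loss and perturbation; parameter-independent constants dropped.
    return posterior_cov(0.5 * T**2, T, -delta * T, n, beta)


def empirical_cov(x, z, beta):
    # Empirical loss from samples x; perturbation from samples z minus samples x.
    loss = 0.5 * T**2 - T * x.mean()
    pert = -(z.mean() - x.mean()) * T
    return posterior_cov(loss, T, pert, len(x), beta)


def run_experiment(ns=(50, 200, 800), reps=200, delta=0.5, seed=0):
    """For each n, return (n, population value, mean gap, mean absolute gap)."""
    rng = np.random.default_rng(seed)
    rows = []
    for n in ns:
        beta = 1.0 / np.log(n)  # assumed temperature schedule
        pop = population_cov(n, beta, delta)
        gaps = np.array([
            empirical_cov(rng.normal(0.0, 1.0, n), rng.normal(delta, 1.0, n), beta) - pop
            for _ in range(reps)
        ])
        rows.append((n, pop, gaps.mean(), np.abs(gaps).mean()))
    return rows


if __name__ == "__main__":
    for n, pop, bias, mad in run_experiment():
        print(f"n={n:4d}  population={pop:+.5f}  mean gap={bias:+.5f}  mean |gap|={mad:.5f}")
```

The mean gap plays the role of a bias estimate and the mean absolute gap indicates how tightly the empirical value concentrates around the population value. In this toy both covariances shrink as n grows, so a shrinking gap alone says little; a check of the paper's actual claim would substitute its estimator and normalization.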

Original abstract

We define susceptibilities as a measure of the response of an observable quantity of a parameterized statistical model to a perturbation of the data for a general class of observables. We define estimators for these susceptibilities as statistics in a sequence of n data-points and prove that these estimators are consistent and asymptotically unbiased in the large n regime.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper defines susceptibilities as linear response measures for observables in singular statistical models and claims to prove that sample-based estimators for them are consistent and asymptotically unbiased.

The core contribution is a definition of susceptibilities that capture how an observable responds to a small perturbation in the data, then estimators built as statistics on n samples that are asserted to be consistent and asymptotically unbiased even for singular models. This targets settings where standard asymptotic expansions fail, which is common in some machine learning models with non-identifiable parameters. If the proofs hold without extra regularity conditions, the construction could give a practical tool for response analysis in those cases.

The definitions rest on standard probability and appear general for a stated class of observables. The large-n claims are the main technical step, and the paper positions them as extending linear response ideas beyond regular models. That framing is direct and the motivation is clear from the abstract.

The main limitation is that the abstract states the consistency and unbiasedness results but supplies no equations, error bounds, or sketch of how singularities are handled in the derivations. Singularities often affect differentiability and limit behavior, so it is hard to judge from the given material whether the proofs avoid hidden assumptions or special cases. The stress-test note finds no internal contradiction visible at this level, which matches what is here.

This is for readers working on theoretical statistics for non-regular models, especially those interested in robustness or uncertainty measures in ML. A statistician or ML theorist who needs response tools beyond classical asymptotics would get the most from it. The work is clear enough on its own terms to merit a serious referee who can check the full derivations and assumptions rather than a desk rejection.

Referee Report

2 major / 1 minor

Summary. The paper defines susceptibilities as measures of the linear response of observable quantities in parameterized statistical models (including singular ones) to perturbations of the data, for a general class of observables. It constructs corresponding estimators as statistics computed from a sequence of n data points and asserts proofs that these estimators are consistent and asymptotically unbiased in the large-n regime.

Significance. If the claimed proofs hold without hidden regularity assumptions, the work could provide a useful extension of linear response ideas to singular models where classical Fisher information degenerates. This might offer practical estimators for sensitivity analysis in settings like mixture models or overparameterized neural networks, where standard asymptotics are unavailable.

major comments (2)

[Abstract] Abstract: the central claim that consistency and asymptotic unbiasedness are proved is not supported by any displayed derivation, error bounds, or explicit treatment of how singularities are handled in the large-n limit; without these, the support for the main result cannot be assessed.
[Section on estimator definition] The weakest assumption listed (that the large-n regime introduces no additional singularities) is not checked against the estimator construction; a concrete counter-example or regularity condition on the perturbation would be needed to confirm the proofs survive.

minor comments (1)

Notation for the susceptibility and its estimator is introduced without a clear table or summary of symbols, making it hard to track across the definitions and proofs.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for identifying points where additional clarity on the proofs and assumptions would strengthen the presentation. We address each major comment below and outline the revisions we intend to make.

Referee: [Abstract] Abstract: the central claim that consistency and asymptotic unbiasedness are proved is not supported by any displayed derivation, error bounds, or explicit treatment of how singularities are handled in the large-n limit; without these, the support for the main result cannot be assessed.

Authors: The abstract is intentionally concise, as is conventional, and does not contain derivations or bounds. The full proofs of consistency and asymptotic unbiasedness appear in Sections 3 and 4, where singularities are handled by restricting to models whose Fisher information matrix satisfies a controlled degeneracy condition that remains stable under the large-n limit. To address the concern, we will revise the abstract to include a one-sentence outline of the proof strategy and an explicit reference to the sections treating the singular case. We will also add a short paragraph in the introduction summarizing the error bounds derived in the proofs.

revision: yes

Referee: [Section on estimator definition] The weakest assumption listed (that the large-n regime introduces no additional singularities) is not checked against the estimator construction; a concrete counter-example or regularity condition on the perturbation would be needed to confirm the proofs survive.

Authors: We agree that the assumption requires more explicit verification against the estimator. In the revised version we will insert a dedicated paragraph immediately following the estimator definition that verifies the assumption holds under the stated perturbation class, by imposing a uniform bound on the perturbation in the sup-norm over the parameter space. This condition ensures no new singularities arise in the large-n limit. While we do not provide an exhaustive counter-example (as the condition is sufficient for the models considered), we will add a remark illustrating its necessity with a simple mixture-model example where violation leads to inconsistency.

revision: yes

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

The paper defines susceptibilities via linear response of observables to data perturbations in parameterized models and constructs estimators as statistics on n samples, then claims to prove consistency and asymptotic unbiasedness for large n. These steps rest on standard definitions from probability and statistics rather than reducing any claimed result to a fitted parameter, self-citation chain, or input by construction. No equations, ansatzes, or uniqueness theorems are exhibited that collapse the proofs to tautologies or prior author work. The derivation is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no explicit free parameters, axioms, or invented entities can be identified.

pith-pipeline@v0.9.0 · 5330 in / 891 out tokens · 52983 ms · 2026-05-11T02:36:19.007356+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "We define susceptibilities as a measure of the response of an observable quantity of a parameterized statistical model to a perturbation of the data... prove that these estimators are consistent and asymptotically unbiased in the large n regime."

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "Theorem 1.1... $\mathrm{Cov}^{\mathrm{res}}_{\Pi^{\mathrm{emp}}_{n,\beta}}(O_n, \Delta K_n) - \mathrm{Cov}^{\mathrm{res}}_{\Pi^{\mathrm{pop}}_{n,\beta}}(O, \Delta K) \to 0$ in probability"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages

[1] Maxwell Adam, Zach Furman, and Jesse Hoogland. The loss kernel: A geometric probe for deep learning interpretability. arXiv:2509.26537, 2025.
[2] Natalia A. Bochkina and Peter J. Green. The Bernstein-von Mises theorem and nonregular models. The Annals of Statistics, 42(5):1850-1878, 2014.
[3] George Baker, George Wang, Jesse Hoogland, and Daniel Murfet. Structural inference: Interpreting small language models with susceptibilities. arXiv:2504.18274, 2025.
[4] Mathias Drton and Martyn Plummer. A Bayesian information criterion for singular models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(2):323-380, 2017.
[5] Ryan Giordano, Tamara Broderick, and Michael I. Jordan. Covariances, robustness, and variational Bayes. Journal of Machine Learning Research, 19(51):1-49, 2018.
[6] Andrew Gordon, Garrett Baker, George Wang, William Snell, Stan van Wingerden, and Daniel Murfet. Towards spectroscopy: Susceptibility clusters in language models, 2026.
[7] Paul Gustafson. Local robustness in Bayesian analysis. In David Rios Insua and Fabrizio Ruggeri, editors, Robust Bayesian Analysis, volume 152 of Lecture Notes in Statistics, pages 71-88. Springer, New York, 2000.
[8] Frank R. Hampel. The influence curve and its role in robust estimation. Journal of the American Statistical Association, 69(346):383-393, 1974.
[9] Rohan Hitchcock and Jesse Hoogland. From global to local: A scalable benchmark for local posterior sampling, 2025.
[10] Rohan Hitchcock. Singular learning theory for deep learning interpretability. PhD thesis, University of Melbourne, 2026.
[11] Dominic Joyce. On manifolds with corners. In Stanisław Janeczko, Jun Li, and Duong H. Phong, editors, Advances in Geometric Analysis, volume 21 of Advanced Lectures in Mathematics, pages 225-258. International Press, 2012. arXiv:0910.3518.
[12] Andreas Kirsch. An Introduction to the Mathematical Theory of Inverse Problems, volume 120 of Applied Mathematical Sciences. Springer, 3rd edition, 2021.
[13] Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions. In Proceedings of the 34th International Conference on Machine Learning (ICML), volume 70 of PMLR, pages 1885-1894, 2017.
[14] Andreas Kriegl and Peter W. Michor. The Convenient Setting of Global Analysis, volume 53 of Mathematical Surveys and Monographs. American Mathematical Society, 1997.
[15] Ryogo Kubo. Statistical-mechanical theory of irreversible processes. I. General theory and simple applications to magnetic and conduction problems. Journal of the Physical Society of Japan, 12(6):570-586, 1957.
[16] Philipp Alexander Kreer, Wilson Wu, Maxwell Adam, Zach Furman, and Jesse Hoogland. Bayesian influence functions for Hessian-free data attribution, 2025.
[17] Richard B. Melrose. Differential Analysis on Manifolds with Corners. http://www-math.mit.edu/~rbm/book.html, 1996.
[18] Bruno Mlodozeniec, Isaac Reid, Sam Power, David Krueger, Murat A. Erdogdu, Richard E. Turner, and Roger Grosse. Distributional training data attribution: What do influence functions sample? In Advances in Neural Information Processing Systems 38 (NeurIPS), 2025.
[19] Roger Penrose. On best approximate solutions of linear matrix equations. Mathematical Proceedings of the Cambridge Philosophical Society, 52(1):17-19, 1956.
[20] Ruchira Ray, Marco Avella Medina, and Cynthia Rush. Statistical guarantees for data-driven posterior tempering. arXiv preprint arXiv:2601.09122, 2026.
[21] Sebastian J. Vollmer, Konstantinos C. Zygalakis, and Yee Whye Teh. Exploration of the (non-)asymptotic bias and variance of stochastic gradient Langevin dynamics. Journal of Machine Learning Research, 17(159):1-48, 2016.
[22] Sumio Watanabe. Algebraic Geometry and Statistical Learning Theory, volume 25 of Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, 2009.
[23] Sumio Watanabe. A widely applicable Bayesian information criterion. Journal of Machine Learning Research, 14(27):867-897, 2013.
[24] Sumio Watanabe. Mathematical Theory of Bayesian Statistics. CRC Press, 2018.
[25] George Wang, Jesse Hoogland, Stan van Wingerden, Zach Furman, and Daniel Murfet. Differentiation and specialization of attention heads via the refined local learning coefficient. In Proceedings of the 13th International Conference on Learning Representations, 2025.
[26] George Wang and Daniel Murfet. Patterning: The dual of interpretability, 2026.
[27] M. Welling and Y. W. Teh. Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning, 2011.
