Recognition: 2 theorem links
Stabilised weighted data subsampling for accelerated inference in models with recursive likelihoods
Pith reviewed 2026-05-14 17:57 UTC · model grok-4.3
The pith
Stabilised weighted subsampling yields unbiased log-likelihood estimates for faster inference in recursive models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central discovery is a stabilisation framework for weighted data subsampling that produces an unbiased estimator of the log-likelihood in models with recursive likelihoods. By assigning higher inclusion probabilities to early observations, the method reduces the expected depth of recursion. Theoretical results guide hyperparameter choices that keep the decay rate inside an interval preventing both variance explosion and excessive cost. An analogous unbiased estimator is obtained for the log-likelihood gradient. When these estimators are inserted into standard inference algorithms, applications to conditional volatility models deliver substantial computational speed-ups while maintaining inferential accuracy.
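A minimal sketch of such an estimator, assuming independent Bernoulli inclusion of each observation under a power-law probability schedule; the names `decay` and `floor` are illustrative hyperparameters, not the paper's truncated power-law (TPD) notation.

```python
import numpy as np

def subsampled_loglik(loglik_terms, decay=0.5, floor=0.01, rng=None):
    """Horvitz-Thompson-style estimate of sum_t loglik_terms[t].

    Term t is included independently with probability p_t, decaying in t
    so that early observations are favoured; included terms are reweighted
    by 1/p_t, giving E[estimate] = sum_t loglik_terms[t] whenever p_t > 0.
    """
    rng = np.random.default_rng() if rng is None else rng
    loglik_terms = np.asarray(loglik_terms, dtype=float)
    T = len(loglik_terms)
    t = np.arange(1, T + 1, dtype=float)
    p = np.clip(t ** -decay, floor, 1.0)   # power-law decay with a tail floor
    included = rng.random(T) < p
    return float(np.sum(loglik_terms[included] / p[included]))
```

The tail floor keeps every inclusion probability bounded away from zero, mirroring the role the paper's tail floor fraction appears to play in preventing variance explosion.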
What carries the argument
Stabilised weighted subsampling, which uses controlled decay of sampling probabilities to produce an unbiased log-likelihood estimator while reducing recursion depth.
If this is right
- The estimators serve as generic building blocks that can be embedded in stochastic optimisation, variational Bayes, and Markov chain Monte Carlo frameworks (a minimal stochastic-optimisation sketch follows this list).
- In standard and threshold GARCH models the method produces substantial computational speed-ups while maintaining inferential accuracy.
- It outperforms uniform subsampling and compares favourably with recent stochastic gradient and divide-and-conquer MCMC methods for dependent data.
- An unbiased gradient estimator is available to support gradient-based inference.
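As a sketch of the stochastic-optimisation embedding, the loop below runs a Robbins-Monro update [33] driven by the reweighted subsample score. For brevity it uses an i.i.d. Gaussian mean model, where the per-observation score is available in closed form, rather than a recursive GARCH likelihood; the decay exponent 0.5, the floor 0.01, and the step size are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(3.0, 1.0, size=100_000)     # toy data with true mean 3
t = np.arange(1, len(y) + 1, dtype=float)
p = np.clip(t ** -0.5, 0.01, 1.0)          # decaying inclusion probabilities

def subsampled_score(mu):
    """Unbiased estimate of d/dmu of sum_t log N(y_t | mu, 1)."""
    included = rng.random(len(y)) < p
    return np.sum((y[included] - mu) / p[included])

mu, lr = 0.0, 1e-6
for _ in range(2000):
    mu += lr * subsampled_score(mu)        # Robbins-Monro update
print(round(mu, 3))                        # settles near the sample mean of y
```

Because the score estimate is unbiased, the iterates target the same optimum as full-data gradient ascent; only the per-step noise and cost change with the decay schedule.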
Where Pith is reading between the lines
- The same stabilisation principle could be tested on other recursive structures such as state-space or sequential hierarchical models.
- Adaptive tuning of the decay hyperparameter might allow the method to operate in streaming or online settings without retuning.
- If the variance-cost balance holds across a wider class of dependent-data problems, routine subsampling could become standard even for moderate-sized recursive likelihoods.
- Extensions to non-time-series recursive computations remain open for direct empirical checks.
Load-bearing premise
That hyperparameter tuning can restrict the decay of sampling probabilities in a way that simultaneously controls estimator variance and computational cost without introducing bias into the likelihood estimate.
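Under the simplest independent-inclusion reading of this premise (an assumption for this sketch, not necessarily the paper's exact design), the trade-off can be written down directly. With independent indicators $z_t \sim \mathrm{Bernoulli}(p_t)$ and log-likelihood terms $\ell_t$,

$$\hat{\ell} = \sum_{t=1}^{T} \frac{z_t}{p_t}\,\ell_t, \qquad \operatorname{Var}\big(\hat{\ell}\big) = \sum_{t=1}^{T} \frac{1-p_t}{p_t}\,\ell_t^{2}, \qquad \text{expected cost} \propto \sum_{t=1}^{T} p_t.$$

Faster decay shrinks the cost term (and the expected recursion depth, driven by the largest included index) but inflates the $(1-p_t)/p_t$ variance factors; the stabilisation interval for the decay rate is the region where neither side degenerates.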
What would settle it
Applying the stabilised estimator to a large GARCH dataset and finding either that estimator variance is substantially higher than under full-data inference or that average recursion cost is not reduced would falsify the central claim.
Original abstract
Inference for models with recursively defined likelihoods is computationally demanding, limiting scalability to large datasets. We propose a stabilised weighted subsampling methodology for accelerated inference based on an unbiased estimator of the log-likelihood. By assigning higher sampling probabilities to early observations, the method reduces the effective depth of recursive likelihood evaluations and hence expected computational cost. However, slow decay leads to frequent inclusion of late observations and high computational cost, while overly aggressive decay can substantially inflate estimator variance. We develop a stabilisation framework, underpinned by theoretical results, that restricts the decay of the sampling probabilities to avoid both variance and computational pathologies through principled hyperparameter tuning. We further consider an unbiased subsampling estimator of the log-likelihood gradient, enabling gradient-based inference. The proposed estimators are generic building blocks for subsampling-based inference and can be embedded within frameworks including stochastic optimisation, variational Bayes, and Markov chain Monte Carlo. Applications to conditional volatility models, including standard and threshold generalised autoregressive conditional heteroskedasticity models, demonstrate substantial computational speed-ups while maintaining inferential accuracy. The proposed approach outperforms uniform subsampling and compares favourably with recent stochastic gradient and divide-and-conquer MCMC methods for dependent data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a stabilised weighted subsampling methodology for accelerated inference in models with recursively defined likelihoods. It constructs an unbiased estimator of the log-likelihood by assigning higher sampling probabilities to early observations, thereby reducing the expected depth of recursive evaluations and computational cost. A stabilisation framework, supported by theoretical results, controls the decay rate of these probabilities through hyperparameter tuning to avoid both high variance and high cost pathologies. The approach is extended to an unbiased estimator of the log-likelihood gradient. The estimators are presented as generic building blocks embeddable in stochastic optimisation, variational Bayes, and MCMC. Applications to standard and threshold GARCH models are used to demonstrate substantial speed-ups while preserving inferential accuracy, with favourable comparisons to uniform subsampling and existing stochastic gradient/divide-and-conquer methods for dependent data.
Significance. If the unbiasedness and stabilisation claims hold, the work offers a principled route to scalable inference for large time-series datasets with recursive likelihood structures, particularly volatility models. The generic framing and embedding potential across multiple inference frameworks constitute a clear strength, as does the explicit handling of the variance-cost trade-off via hyperparameter control. The empirical demonstrations on GARCH-type models provide concrete evidence of practical utility, though the significance ultimately hinges on the verifiability of the supporting theory.
major comments (2)
- [Abstract and §3] The central claim that the weighted subsampling estimator remains unbiased while the stabilisation framework restricts probability decay without introducing bias is load-bearing, yet the provided description does not include the explicit expectation calculation or the theorem establishing that the hyperparameter-tuned decay preserves unbiasedness; this must be shown in detail to support the variance-control guarantee.
- [§4.2] (stabilisation framework) The assertion that principled hyperparameter tuning simultaneously avoids high estimator variance and high computational cost is not accompanied by a concrete bound or sensitivity analysis linking the decay rate to the recursive depth; without this, the speed-accuracy trade-off claim for GARCH applications rests on unverified assumptions.
minor comments (2)
- [Abstract] The abstract mentions applications to conditional volatility models but does not specify the exact dataset sizes or number of replications used in the timing and accuracy comparisons; these details should be added for reproducibility.
- [§2] Notation for the sampling probabilities p_i and the stabilisation hyperparameter should be introduced with a clear definition before the theoretical results are stated.
Simulated Author's Rebuttal
We thank the referee for their thoughtful review and constructive comments on our manuscript. We address each major comment below and will revise the manuscript to provide the requested clarifications and additional analysis, thereby strengthening the presentation of the theoretical results.
Point-by-point responses
- Referee: [Abstract and §3] The central claim that the weighted subsampling estimator remains unbiased while the stabilisation framework restricts probability decay without introducing bias is load-bearing, yet the provided description does not include the explicit expectation calculation or the theorem establishing that the hyperparameter-tuned decay preserves unbiasedness; this must be shown in detail to support the variance-control guarantee.
Authors: We appreciate the referee drawing attention to this foundational aspect. The unbiasedness of the weighted subsampling estimator is established explicitly in Theorem 1 of Section 3, where the expectation is computed directly from the sampling probabilities to equal the true log-likelihood. The stabilisation framework restricts decay rates through hyperparameters but does not alter this expectation, as the tuning operates on the probability schedule without introducing bias. To address the concern, we will expand Section 3 with the full step-by-step expectation derivation and proof that stabilisation preserves unbiasedness, and we will revise the abstract to reference Theorem 1 explicitly. revision: yes
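For reference, under the independent-inclusion design assumed in the sketches above (the paper's Theorem 1 presumably covers its own sampling scheme), the requested expectation calculation is one line:

$$\mathbb{E}\big[\hat{\ell}\big] = \sum_{t=1}^{T} \frac{\mathbb{E}[z_t]}{p_t}\,\ell_t = \sum_{t=1}^{T} \frac{p_t}{p_t}\,\ell_t = \sum_{t=1}^{T} \ell_t = \ell \quad \text{for any schedule with } p_t > 0,$$

which is why retuning the decay changes variance and cost but cannot shift the mean.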
- Referee: [§4.2] The assertion that principled hyperparameter tuning simultaneously avoids high estimator variance and high computational cost is not accompanied by a concrete bound or sensitivity analysis linking the decay rate to the recursive depth; without this, the speed-accuracy trade-off claim for GARCH applications rests on unverified assumptions.
Authors: We agree that an explicit bound and sensitivity analysis would strengthen the claims. Section 4.2 currently provides theoretical variance bounds in terms of the decay parameter and illustrates the hyperparameter-controlled trade-off, but we acknowledge that a direct sensitivity analysis connecting decay rate to expected recursive depth is not fully elaborated. In the revision we will add this analysis, deriving a concrete bound on expected recursion depth as a function of the stabilisation hyperparameters and including a sensitivity study for the GARCH models to confirm the speed-accuracy trade-off. revision: yes
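A Monte Carlo version of the promised sensitivity analysis is straightforward to prototype. The sketch below estimates the expected largest included index, a proxy for expected recursion depth, as a function of an assumed power-law decay exponent `lam`; the schedule and constants are illustrative rather than the paper's TPD weights.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 10_000
t = np.arange(1, T + 1, dtype=float)

def expected_max_index(lam, reps=200):
    """Monte Carlo estimate of E[max included index] under p_t = t**(-lam)."""
    p = np.minimum(t ** -lam, 1.0)
    depths = []
    for _ in range(reps):
        idx = np.nonzero(rng.random(T) < p)[0]
        depths.append(idx[-1] + 1 if idx.size else 0)
    return float(np.mean(depths))

for lam in (0.25, 0.5, 1.0, 2.0):
    print(lam, expected_max_index(lam))    # depth falls sharply as decay steepens
```

Pairing this depth curve with the variance formula above, swept over the same `lam` grid, gives the empirical variance-cost frontier the referee asks for.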
Circularity Check
No significant circularity detected in derivation chain
Full rationale
The paper presents a stabilised weighted subsampling method built on an unbiased log-likelihood estimator, with a stabilisation framework, derived from theoretical results, that controls sampling-probability decay via hyperparameter tuning. No step reduces by construction to fitted inputs, self-definitions, or load-bearing self-citations; the central claims rest on external theoretical grounding, and the GARCH applications do not lean on the paper's own equations or prior self-references as their sole justification. The derivation is self-contained given the stated assumptions.
Axiom & Free-Parameter Ledger
free parameters (1)
- stabilisation hyperparameter controlling decay rate
axioms (1)
- domain assumption: The weighted subsampling estimator remains unbiased for the log-likelihood when the sampling probabilities are stabilised.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · tag: unclear
  The relation between the paper passage and the cited Recognition theorem is unclear.
  Paper passage: "We develop a stabilisation framework... truncated power-law decaying (TPD) weights... tail floor fraction c... $\mathbb{E}(u_{\max}) \le t_\star + (T - t_\star)\bigl(1 - (1-\varepsilon)^m\bigr)$"
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
  The relation between the paper passage and the cited Recognition theorem is unclear.
  Paper passage: "Lemma 3... variance... $O(T^{\lambda-2})$ for power-law... exponential growth for ED"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Ai, M., Yu, J., Zhang, H., and Wang, H. (2021). Optimal subsampling algorithms for big data regressions. Statistica Sinica, 31(2):749–772.
- [2] Aicher, C., Putcha, S., Nemeth, C., Fearnhead, P., and Fox, E. (2025). Stochastic gradient MCMC for nonlinear state space models. Bayesian Analysis, 20(1):83–105.
- [3] Amari, S.-i. (1998). Natural gradient works efficiently in learning. Neural Computation, 10(2):251–276.
- [4] Bardenet, R., Doucet, A., and Holmes, C. (2014). Towards scaling up Markov chain Monte Carlo: An adaptive subsampling approach. Proceedings of the 31st International Conference on Machine Learning, pages 405–413.
- [5] Bardenet, R., Doucet, A., and Holmes, C. (2017). On Markov chain Monte Carlo methods for tall data. Journal of Machine Learning Research, 18(47):1–43.
- [6] Bauwens, L. and Lubrano, M. (1998). Bayesian inference on GARCH models using the Gibbs sampler. The Econometrics Journal, 1:C23–C46.
- [7] Baydin, A. G., Pearlmutter, B. A., Radul, A. A., and Siskind, J. M. (2018). Automatic differentiation in machine learning: A survey. Journal of Machine Learning Research, 18(153):1–43.
- [8] Blei, D. M., Kucukelbir, A., and McAuliffe, J. D. (2017). Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518):859–877.
- [9] Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3):307–327.
- [10] Chen, C.-F. (1985). On asymptotic normality of limiting density functions with Bayesian implications. Journal of the Royal Statistical Society Series B: Statistical Methodology, 47(3):540–546.
- [11] Chen, T., Fox, E., and Guestrin, C. (2014). Stochastic gradient Hamiltonian Monte Carlo. In International Conference on Machine Learning, pages 1683–1691. PMLR.
- [12] Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica, 50(4):987–1007.
- [13] Fiorentini, G., Calzolari, G., and Panattoni, L. (1996). Analytic derivatives and the computation of GARCH estimates. Journal of Applied Econometrics, 11(4):399–417.
- [14] Glosten, L. R., Jagannathan, R., and Runkle, D. E. (1993). On the relation between the expected value and the volatility of the nominal excess return on stocks. The Journal of Finance, 48(5):1779–1801.
- [15] Gunawan, D., Tran, M.-N., and Kohn, R. (2017). Fast inference for intractable likelihood problems using variational Bayes. arXiv preprint arXiv:1705.06679.
- [16] Haario, H., Saksman, E., and Tamminen, J. (2001). An adaptive Metropolis algorithm. Bernoulli, 7(2):223–242.
- [17] Hansen, P. R. and Lunde, A. (2005). A forecast comparison of volatility models: Does anything beat a GARCH(1,1)? Journal of Applied Econometrics, 20(7):873–889.
- [18] Huang, D., Wang, H., and Yao, Q. (2008). Estimating GARCH models: When to use what? The Econometrics Journal, 11(1):27–38.
- [19] Kingma, D. P. and Ba, J. (2015). Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations.
- [20] Kingma, D. P. and Welling, M. (2014). Auto-encoding variational Bayes. In Proceedings of the 2nd International Conference on Learning Representations (ICLR).
- [21] Korattikara, A., Chen, Y., and Welling, M. (2014). Austerity in MCMC land: Cutting the Metropolis-Hastings budget. Proceedings of the 31st International Conference on Machine Learning, pages 181–189.
- [22] Li, D., Clements, A., and Drovandi, C. (2021). Efficient Bayesian estimation for GARCH-type models via sequential Monte Carlo. Econometrics and Statistics, 19:22–46.
- [23] Ma, Y.-A., Foti, N. J., and Fox, E. B. (2017). Stochastic gradient MCMC methods for hidden Markov models. In International Conference on Machine Learning, pages 2265–2274. PMLR.
- [24] Magris, M. and Iosifidis, A. (2023). Variational inference for GARCH-family models. In Proceedings of the Fourth ACM International Conference on AI in Finance, pages 541–548.
- [25] Martens, J. (2020). New insights and perspectives on the natural gradient method. Journal of Machine Learning Research, 21(146):1–76.
- [26] Mikosch, T. and Straumann, D. (2002). Whittle estimation in a heavy-tailed GARCH(1,1) model. Stochastic Processes and their Applications, 100:187–222.
- [27] Nemeth, C. and Fearnhead, P. (2021). Stochastic gradient Markov chain Monte Carlo. Journal of the American Statistical Association, 116(533):433–450.
- [28] Ong, V. M.-H., Nott, D. J., and Smith, M. S. (2018). Gaussian variational approximation with a factor covariance structure. Journal of Computational and Graphical Statistics, 27(3):465–478.
- [29]
- [30] Polyak, B. T. and Juditsky, A. B. (1992). Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization, 30(4):838–855.
- [31] Quiroz, M., Kohn, R., Villani, M., and Tran, M.-N. (2019). Speeding up MCMC by efficient data subsampling. Journal of the American Statistical Association, 114(526):831–843.
- [32] Quiroz, M., Nott, D. J., and Kohn, R. (2023). Gaussian variational approximations for high-dimensional state space models. Bayesian Analysis, 18(3):989–1016.
- [33] Robbins, H. and Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, 22(3):400–407.
- [34] Salomone, R., Quiroz, M., Kohn, R., Villani, M., and Tran, M.-N. (2020). Spectral subsampling MCMC for stationary time series. In International Conference on Machine Learning, pages 8449–8458. PMLR.
- [35] Särndal, C.-E., Swensson, B., and Wretman, J. (2003). Model Assisted Survey Sampling. Springer Science & Business Media.
- [36] Titsias, M. and Lázaro-Gredilla, M. (2014). Doubly stochastic variational Bayes for non-conjugate inference. In Proceedings of the 31st International Conference on Machine Learning, volume 32, pages 1971–1979.
- [37] Villani, M., Quiroz, M., Kohn, R., and Salomone, R. (2024). Spectral subsampling MCMC for stationary multivariate time series with applications to vector ARTFIMA processes. Econometrics and Statistics, 32:98–121.
- [38] Wales, D. J. and Doye, J. P. (1997). Global optimization by basin-hopping and the lowest energy structures of Lennard-Jones clusters containing up to 110 atoms. The Journal of Physical Chemistry A, 101(28):5111–5116.
- [39] Wang, H., Zhu, R., and Ma, P. (2018). Optimal subsampling for large sample logistic regression. Journal of the American Statistical Association, 113(522):829–844.
- [40] Whittle, P. (1953). Estimation and information in stationary time series. Arkiv för Matematik, 2(5):423–434.
- [41] Winker, P. and Maringer, D. (2006). The convergence of optimization based GARCH estimators: Theory and application. In Rizzi, A. and Vichi, M., editors, Compstat 2006 – Proceedings in Computational Statistics, pages 483–494.
- [42] Xu, M., Quiroz, M., Kohn, R., and Sisson, S. A. (2019). Variance reduction properties of the reparameterization trick. In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, volume 89 of Proceedings of Machine Learning Research, pages 2711–2720. PMLR.
- [43] Xuan, H., Maestrini, L., Chen, F., and Grazian, C. (2024). Stochastic variational inference for GARCH models. Statistics and Computing, 34(1):45.
- [44] Yao, Y. and Wang, H. (2021). A review on optimal subsampling methods for massive datasets. Journal of Data Science, 19(1):151–172.
- [45] Zakoian, J.-M. (1994). Threshold heteroskedastic models. Journal of Economic Dynamics and Control, 18(5):931–955.
- [46] Ceperley, D. and Dewing, M. (1999). The penalty method for random walks with uncertain energies. The Journal of Chemical Physics, 110(20):9812–9820.
- [47] Chopin, N. (2002). A sequential particle filter method for static models. Biometrika, 89(3):539–552.
- [48] Dang, K.-D., Quiroz, M., Kohn, R., Tran, M.-N., and Villani, M. (2019). Hamiltonian Monte Carlo with energy conserving subsampling. Journal of Machine Learning Research, 20(100):1–31.
- [49] Doucet, A., Pitt, M. K., Deligiannidis, G., and Kohn, R. (2015). Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator. Biometrika, 102:295–313.
- [50] Gorham, J. and Mackey, L. (2017). Measuring sample quality with kernels. In International Conference on Machine Learning, pages 1292–1301. PMLR.
- [51] Gunawan, D., Dang, K.-D., Quiroz, M., Kohn, R., and Tran, M.-N. (2020). Subsampling sequential Monte Carlo for static Bayesian models. Statistics and Computing, 30(6):1741–1758.
- [52] Johndrow, J. E., Pillai, N. S., and Smith, A. (2020). No free lunch for approximate MCMC. arXiv preprint arXiv:2010.12514.
- [53] Knuth, D. E. (1993). Johann Faulhaber and sums of powers. Mathematics of Computation, 61(203):277–294.
- [54] Magnus, J. R. and Neudecker, H. (2019). Matrix Differential Calculus with Applications in Statistics and Econometrics. John Wiley & Sons.
- [55] Pitt, M. K., dos Santos Silva, R., Giordani, P., and Kohn, R. (2012). On some properties of Markov chain Monte Carlo simulation methods based on the particle filter. Journal of Econometrics, 171(2):134–151.
- [56] Poyiadjis, G., Doucet, A., and Singh, S. S. (2011). Particle approximations of the score and observed information matrix in state space models with application to parameter estimation. Biometrika, 98(1):65–80.
- [57]
- [58] Quiroz, M. and Tran, M.-N. (2023). Bayesian Analysis of Big Data via Subsampling Markov Chain Monte Carlo, pages 1–6. John Wiley & Sons, Ltd.
- [59] Quiroz, M., Tran, M.-N., Villani, M., and Kohn, R. (2018a). Speeding up MCMC by delayed acceptance and data subsampling. Journal of Computational and Graphical Statistics, 27(1):12–22.
- [60] Quiroz, M., Villani, M., Kohn, R., Tran, M.-N., and Dang, K.-D. (2018b). Subsampling MCMC – an introduction for the survey statistician. Sankhya A, 80:33–69.
- [61] Rudolf, D., Smith, A., and Quiroz, M. (2026). Perturbations of Markov chains. In Craiu, R. V., Vats, D., Jones, G. L., Brooks, S., Gelman, A., and Meng, X.-L., editors, Handbook of Markov Chain Monte Carlo, pages 527–568. Chapman and Hall/CRC, 2nd edition.
- [62] Scott, S. L. (2017). Comparing consensus Monte Carlo strategies for distributed Bayesian computation. Brazilian Journal of Probability and Statistics, 31(4):668–685.
- [63] Scott, S. L., Blocker, A. W., Bonassi, F. V., Chipman, H. A., George, E. I., and McCulloch, R. E. (2022). Bayes and big data: The consensus Monte Carlo algorithm. In Big Data and Information Theory, pages 8–18. Routledge.
- [64] Stan Development Team (2026). CmdStanR: The R interface to CmdStan. R package.