pith. machine review for the scientific record. sign in

arxiv: 2605.13397 · v1 · submitted 2026-05-13 · 📊 stat.ME · stat.CO

Recognition: 2 theorem links

· Lean Theorem

Stabilised weighted data subsampling for accelerated inference in models with recursive likelihoods

Authors on Pith no claims yet

Pith reviewed 2026-05-14 17:57 UTC · model grok-4.3

classification 📊 stat.ME stat.CO
keywords weighted subsamplingrecursive likelihoodunbiased estimatorGARCH modelsstochastic optimisationMarkov chain Monte Carlovariational Bayescomputational statistics
0
0 comments X

The pith

Stabilised weighted subsampling yields unbiased log-likelihood estimates for faster inference in recursive models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method to accelerate inference when likelihoods are defined recursively, as in many time-series models. It subsamples observations with weights that give higher probability to early data points, thereby shortening the average length of recursive computations while keeping the log-likelihood estimator unbiased. A stabilisation rule, supported by theory, limits how quickly these probabilities can decay so that estimator variance stays controlled without driving computation back up. The same construction supplies an unbiased gradient estimator, allowing the subsampling step to plug directly into optimisation, variational Bayes, or MCMC routines. Demonstrations on GARCH and threshold GARCH models show clear reductions in run time with no loss in the accuracy of the recovered parameters.

Core claim

The central discovery is a stabilisation framework for weighted data subsampling that produces an unbiased estimator of the log-likelihood in models with recursive likelihoods. By assigning higher inclusion probabilities to early observations, the method reduces the expected depth of recursion. Theoretical results guide hyperparameter choices that keep the decay rate inside an interval preventing both variance explosion and excessive cost. An analogous unbiased estimator is obtained for the log-likelihood gradient. When these estimators are inserted into standard inference algorithms, applications to conditional volatility models deliver substantial computational speed-ups while preserving (

What carries the argument

Stabilised weighted subsampling, which uses controlled decay of sampling probabilities to produce an unbiased log-likelihood estimator while reducing recursion depth.

If this is right

  • The estimators serve as generic building blocks that can be embedded in stochastic optimisation, variational Bayes, and Markov chain Monte Carlo frameworks.
  • In standard and threshold GARCH models the method produces substantial computational speed-ups while maintaining inferential accuracy.
  • It outperforms uniform subsampling and compares favourably with recent stochastic gradient and divide-and-conquer MCMC methods for dependent data.
  • An unbiased gradient estimator is available to support gradient-based inference.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same stabilisation principle could be tested on other recursive structures such as state-space or sequential hierarchical models.
  • Adaptive tuning of the decay hyperparameter might allow the method to operate in streaming or online settings without retuning.
  • If the variance-cost balance holds across a wider class of dependent-data problems, routine subsampling could become standard even for moderate-sized recursive likelihoods.
  • Extensions to non-time-series recursive computations remain open for direct empirical checks.

Load-bearing premise

That hyperparameter tuning can restrict the decay of sampling probabilities in a way that simultaneously controls estimator variance and computational cost without introducing bias into the likelihood estimate.

What would settle it

Applying the stabilised estimator to a large GARCH dataset and finding either estimator variance substantially higher than full-data inference or no reduction in average recursion cost would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.13397 by Aishwarya Bhaskaran, Matias Quiroz, Thomas Goodwin, Zixuan Wang.

Figure 1
Figure 1. Figure 1: Truncated power-law decaying (TPD) sampling probabilities in (2.13) with [PITH_FULL_IMAGE:figures/full_fig_p011_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: E(umax) in Lemma 8(i) as a function of m (both normalised by T) for T = 10,000 (left panel) and T = 100,000 (right panel) with the hyperparameters t ⋆ = 1,000 and b = 100 taking the same values as in the applications. The figure shows the results for several ε obtained using different c; see the legend. 15 [PITH_FULL_IMAGE:figures/full_fig_p015_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: E(umax) in Lemma 8(i) (normalised by T, with T = 100,000) as a function of c and m (where m is treated as continuous for visualisation), shown for three values of t ⋆ (see panel titles), with b = 100. The values of T and b correspond to those in the applications, as does t ⋆ in the middle panel. The colour scale (right) represents the expected computational cost ratio, where red indicates no compute saving… view at source ↗
Figure 4
Figure 4. Figure 4: Observations yt for the Dow Jones 30 index, constructed as one-minute log-returns over the period 2023-05-15 to 2024-06-28 (T = 100,000) and rescaled to have unit sample standard deviation. The figure shows that the log-returns can exhibit deviations of 30-40 unconditional stan￾27 [PITH_FULL_IMAGE:figures/full_fig_p027_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Constrained objective function E(umax) defined in (2.20) (normalised by T, with T = 100,000) and described in Section 2.7, shown as a function of c and m (where m is treated as continuous for visualisation). The black curve marks the variance constraint, and the dashed vertical line marks the safeguard lower bound constraint. Results are shown for three choices of Rmax = 1/cmin (see panel titles), with b =… view at source ↗
Figure 6
Figure 6. Figure 6: Results for the GARCH(1, 1) model with normal errors fitted to the Dow Jones data. The figure shows boxplots of the evidence lower bound (ELBO) differences ∆ELBO defined in (4.5), where positive values indicate ELBO values higher than the median full-data VB ELBO, which is marked with a dashed horizontal line. Subsampling variational Bayes (VB) results are shown for different sampling schemes, with full-da… view at source ↗
Figure 7
Figure 7. Figure 7: Posterior marginal distributions under the original parameterisation [PITH_FULL_IMAGE:figures/full_fig_p035_7.png] view at source ↗
read the original abstract

Inference for models with recursively defined likelihoods is computationally demanding, limiting scalability to large datasets. We propose a stabilised weighted subsampling methodology for accelerated inference based on an unbiased estimator of the log-likelihood. By assigning higher sampling probabilities to early observations, the method reduces the effective depth of recursive likelihood evaluations and hence expected computational cost. However, slow decay leads to frequent inclusion of late observations and high computational cost, while overly aggressive decay can substantially inflate estimator variance. We develop a stabilisation framework, underpinned by theoretical results, that restricts the decay of the sampling probabilities to avoid both variance and computational pathologies through principled hyperparameter tuning. We further consider an unbiased subsampling estimator of the log-likelihood gradient, enabling gradient-based inference. The proposed estimators are generic building blocks for subsampling-based inference and can be embedded within frameworks including stochastic optimisation, variational Bayes, and Markov chain Monte Carlo. Applications to conditional volatility models, including standard and threshold generalised autoregressive conditional heteroskedasticity models, demonstrate substantial computational speed-ups while maintaining inferential accuracy. The proposed approach outperforms uniform subsampling and compares favourably with recent stochastic gradient and divide-and-conquer MCMC methods for dependent data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a stabilised weighted subsampling methodology for accelerated inference in models with recursively defined likelihoods. It constructs an unbiased estimator of the log-likelihood by assigning higher sampling probabilities to early observations, thereby reducing the expected depth of recursive evaluations and computational cost. A stabilisation framework, supported by theoretical results, controls the decay rate of these probabilities through hyperparameter tuning to avoid both high variance and high cost pathologies. The approach is extended to an unbiased estimator of the log-likelihood gradient. The estimators are presented as generic building blocks embeddable in stochastic optimisation, variational Bayes, and MCMC. Applications to standard and threshold GARCH models are used to demonstrate substantial speed-ups while preserving inferential accuracy, with favourable comparisons to uniform subsampling and existing stochastic gradient/divide-and-conquer methods for dependent data.

Significance. If the unbiasedness and stabilisation claims hold, the work offers a principled route to scalable inference for large time-series datasets with recursive likelihood structures, particularly volatility models. The generic framing and embedding potential across multiple inference frameworks constitute a clear strength, as does the explicit handling of the variance-cost trade-off via hyperparameter control. The empirical demonstrations on GARCH-type models provide concrete evidence of practical utility, though the significance ultimately hinges on the verifiability of the supporting theory.

major comments (2)
  1. [Abstract and §3] Abstract and §3: The central claim that the weighted subsampling estimator remains unbiased while the stabilisation framework restricts probability decay without introducing bias is load-bearing, yet the provided description does not include the explicit expectation calculation or the theorem establishing that the hyperparameter-tuned decay preserves unbiasedness; this must be shown in detail to support the variance-control guarantee.
  2. [§4.2] §4.2 (stabilisation framework): The assertion that principled hyperparameter tuning simultaneously avoids high estimator variance and high computational cost is not accompanied by a concrete bound or sensitivity analysis linking the decay rate to the recursive depth; without this, the speed-accuracy trade-off claim for GARCH applications rests on unverified assumptions.
minor comments (2)
  1. [Abstract] The abstract mentions applications to conditional volatility models but does not specify the exact dataset sizes or number of replications used in the timing and accuracy comparisons; these details should be added for reproducibility.
  2. [§2] Notation for the sampling probabilities p_i and the stabilisation hyperparameter should be introduced with a clear definition before the theoretical results are stated.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and constructive comments on our manuscript. We address each major comment below and will revise the manuscript to provide the requested clarifications and additional analysis, thereby strengthening the presentation of the theoretical results.

read point-by-point responses
  1. Referee: [Abstract and §3] The central claim that the weighted subsampling estimator remains unbiased while the stabilisation framework restricts probability decay without introducing bias is load-bearing, yet the provided description does not include the explicit expectation calculation or the theorem establishing that the hyperparameter-tuned decay preserves unbiasedness; this must be shown in detail to support the variance-control guarantee.

    Authors: We appreciate the referee drawing attention to this foundational aspect. The unbiasedness of the weighted subsampling estimator is established explicitly in Theorem 1 of Section 3, where the expectation is computed directly from the sampling probabilities to equal the true log-likelihood. The stabilisation framework restricts decay rates through hyperparameters but does not alter this expectation, as the tuning operates on the probability schedule without introducing bias. To address the concern, we will expand Section 3 with the full step-by-step expectation derivation and proof that stabilisation preserves unbiasedness, and we will revise the abstract to reference Theorem 1 explicitly. revision: yes

  2. Referee: [§4.2] The assertion that principled hyperparameter tuning simultaneously avoids high estimator variance and high computational cost is not accompanied by a concrete bound or sensitivity analysis linking the decay rate to the recursive depth; without this, the speed-accuracy trade-off claim for GARCH applications rests on unverified assumptions.

    Authors: We agree that an explicit bound and sensitivity analysis would strengthen the claims. Section 4.2 currently provides theoretical variance bounds in terms of the decay parameter and illustrates the hyperparameter-controlled trade-off, but we acknowledge that a direct sensitivity analysis connecting decay rate to expected recursive depth is not fully elaborated. In the revision we will add this analysis, deriving a concrete bound on expected recursion depth as a function of the stabilisation hyperparameters and including a sensitivity study for the GARCH models to confirm the speed-accuracy trade-off. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper presents a stabilised weighted subsampling method built on an unbiased log-likelihood estimator, with a stabilisation framework derived from theoretical results controlling sampling-probability decay via hyperparameter tuning. No steps reduce by construction to fitted inputs, self-definitions, or load-bearing self-citations; the central claims rest on external theoretical grounding and are applied to GARCH models without evident internal reduction to the paper's own equations or prior self-references as the sole justification. The derivation remains self-contained against the stated assumptions.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the existence of an unbiased weighted estimator whose variance can be controlled by restricting probability decay, plus the assumption that hyperparameter tuning can be done in a principled way without post-hoc adjustments.

free parameters (1)
  • stabilisation hyperparameter controlling decay rate
    Chosen to restrict sampling-probability decay and thereby avoid variance and cost pathologies.
axioms (1)
  • domain assumption The weighted subsampling estimator remains unbiased for the log-likelihood when probabilities are stabilised.
    Invoked to justify the estimator as a drop-in replacement for full-data likelihood.

pith-pipeline@v0.9.0 · 5513 in / 1216 out tokens · 32255 ms · 2026-05-14T17:57:21.911396+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages · 1 internal anchor

  1. [1]

    Ai, M., Yu, J., Zhang, H., and Wang, H. (2021). Optimal subsampling algorithms for big data regressions. Statistica Sinica , 31(2):749--772

  2. [2]

    Aicher, C., Putcha, S., Nemeth, C., Fearnhead, P., and Fox, E. (2025). Stochastic gradient MCMC for nonlinear state space models. Bayesian Analysis , 20(1):83 -- 105

  3. [3]

    Amari, S.-i. (1998). Natural gradient works efficiently in learning. Neural Computation , 10(2):251--276

  4. [4]

    Bardenet, R., Doucet, A., and Holmes, C. (2014). Towards scaling up M arkov chain M onte C arlo: A n adaptive subsampling approach. Proceedings of the 31st International Conference on Machine Learning , pages 405--413

  5. [5]

    Bardenet, R., Doucet, A., and Holmes, C. (2017). On M arkov chain M onte C arlo methods for tall data. Journal of Machine Learning Research , 18(47):1--43

  6. [6]

    and Lubrano, M

    Bauwens, L. and Lubrano, M. (1998). Bayesian inference on GARCH models using the G ibbs sampler. The Econometrics Journal , 1:C23--C46

  7. [7]

    G., Pearlmutter, B

    Baydin, A. G., Pearlmutter, B. A., Radul, A. A., and Siskind, J. M. (2018). Automatic differentiation in machine learning: A survey. Journal of Machine Learning Research , 18(153):1--43

  8. [8]

    M., Kucukelbir, A., and McAuliffe, J

    Blei, D. M., Kucukelbir, A., and McAuliffe, J. D. (2017). Variational inference: A review for statisticians. Journal of the American Statistical Association , 112(518):859--877

  9. [9]

    Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics , 31(3):307--327

  10. [10]

    Chen, C.-F. (1985). On asymptotic normality of limiting density functions with B ayesian implications. Journal of the Royal Statistical Society Series B: Statistical Methodology , 47(3):540--546

  11. [11]

    Chen, T., Fox, E., and Guestrin, C. (2014). Stochastic gradient H amiltonian M onte C arlo. In International Conference on Machine Learning , pages 1683--1691. PMLR

  12. [12]

    Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of U nited K ingdom inflation. Econometrica , 50(4):987--1007

  13. [13]

    Fiorentini, G., Calzolari, G., and Panattoni, L. (1996). Analytic derivatives and the computation of GARCH estimates. Journal of Applied Econometrics , 11(4):399--417

  14. [14]

    R., Jagannathan, R., and Runkle, D

    Glosten, L. R., Jagannathan, R., and Runkle, D. E. (1993). On the relation between the expected value and the volatility of the nominal excess return on stocks. The Journal of Finance , 48(5):1779--1801

  15. [15]

    Gunawan, D., Tran, M.-N., and Kohn, R. (2017). Fast inference for intractable likelihood problems using variational B ayes. arXiv preprint arXiv:1705.06679

  16. [16]

    Haario, H., Saksman, E., and Tamminen, J. (2001). An adaptive M etropolis algorithm . Bernoulli , 7(2):223 -- 242

  17. [17]

    Hansen, P. R. and Lunde, A. (2005). A forecast comparison of volatility models: D oes anything beat a GARCH ( 1 , 1 )? Journal of Applied Econometrics , 20(7):873--889

  18. [18]

    Huang, D., Wang, H., and Yao, Q. (2008). Estimating GARCH models: W hen to use what? The Econometrics Journal , 11(1):27--38

  19. [19]

    Kingma, D. P. and Ba, J. (2015). Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations

  20. [20]

    Kingma, D. P. and Welling, M. (2014). Auto-encoding variational B ayes. In Proceedings of the 2nd International Conference on Learning Representations (ICLR) 2014

  21. [21]

    Korattikara, A., Chen, Y., and Welling, M. (2014). Austerity in MCMC land: C utting the M etropolis- H astings budget. Proceedings of the 31st International Conference on Machine Learning , pages 181--189

  22. [22]

    Li, D., Clements, A., and Drovandi, C. (2021). Efficient B ayesian estimation for GARCH -type models via sequential M onte C arlo. Econometrics and Statistics , 19:22--46

  23. [23]

    J., and Fox, E

    Ma, Y.-A., Foti, N. J., and Fox, E. B. (2017). Stochastic gradient MCMC methods for hidden M arkov models. In International Conference on Machine Learning , pages 2265--2274. PMLR

  24. [24]

    and Iosifidis, A

    Magris, M. and Iosifidis, A. (2023). Variational inference for GARCH -family models. In Proceedings of the Fourth ACM International Conference on AI in Finance , pages 541--548

  25. [25]

    Martens, J. (2020). New insights and perspectives on the natural gradient method. Journal of Machine Learning Research , 21(146):1--76

  26. [26]

    and Straumann, D

    Mikosch, T. and Straumann, D. (2002). Whittle estimation in a heavy-tailed GARCH (1,1) model. Stochastic Processes and their Applications , 100:187--222

  27. [27]

    and Fearnhead, P

    Nemeth, C. and Fearnhead, P. (2021). Stochastic gradient M arkov chain M onte C arlo. Journal of the American Statistical Association , 116(533):433--450

  28. [28]

    M.-H., Nott, D

    Ong, V. M.-H., Nott, D. J., and Smith, M. S. (2018). Gaussian variational approximation with a factor covariance structure. Journal of Computational and Graphical Statistics , 27(3):465--478

  29. [29]

    Ou, R., Astfalck, L., Sen, D., and Dunson, D. (2025). Scalable B ayesian inference for time series via divide-and-conquer. arXiv preprint arXiv:2106.11043v4

  30. [30]

    Polyak, B. T. and Juditsky, A. B. (1992). Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization , 30(4):838--855

  31. [31]

    Quiroz, M., Kohn, R., Villani, M., and Tran, M.-N. (2019). Speeding up MCMC by efficient data subsampling. Journal of the American Statistical Association , 114(526):831--843

  32. [32]

    J., and Kohn, R

    Quiroz, M., Nott, D. J., and Kohn, R. (2023). G aussian variational approximations for high-dimensional state space models. Bayesian Analysis , 18(3):989 -- 1016

  33. [33]

    and Monro, S

    Robbins, H. and Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics , 22(3):400--407

  34. [34]

    Salomone, R., Quiroz, M., Kohn, R., Villani, M., and Tran, M.-N. (2020). Spectral subsampling MCMC for stationary time series. In International Conference on Machine Learning , pages 8449--8458. PMLR

  35. [35]

    S \"a rndal, C.-E., Swensson, B., and Wretman, J. (2003). Model Assisted Survey Sampling . Springer Science & Business Media

  36. [36]

    and Lázaro-Gredilla, M

    Titsias, M. and Lázaro-Gredilla, M. (2014). Doubly stochastic variational B ayes for non-conjugate inference. In Proceedings of the 31st International Conference on Machine Learning , volume 32, pages 1971--1979

  37. [37]

    Villani, M., Quiroz, M., Kohn, R., and Salomone, R. (2024). Spectral subsampling MCMC for stationary multivariate time series with applications to vector ARTFIMA processes. Econometrics and Statistics , 32:98--121

  38. [38]

    Wales, D. J. and Doye, J. P. (1997). Global optimization by basin-hopping and the lowest energy structures of L ennard- J ones clusters containing up to 110 atoms. The Journal of Physical Chemistry A , 101(28):5111--5116

  39. [39]

    Wang, H., Zhu, R., and Ma, P. (2018). Optimal subsampling for large sample logistic regression. Journal of the American Statistical Association , 113(522):829--844

  40. [40]

    Whittle, P. (1953). Estimation and information in stationary time series. Arkiv f \"o r Matematik , 2(5):423--434

  41. [41]

    and Maringer, D

    Winker, P. and Maringer, D. (2006). The convergence of optimization based GARCH estimators: T heory and application. In Rizzi, A. and Vichi, M., editors, Compstat 2006 - Proceedings in Computational Statistics , pages 483--494

  42. [42]

    Xu, M., Quiroz, M., Kohn, R., and Sisson, S. A. (2019). Variance reduction properties of the reparameterization trick. In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics , volume 89 of Proceedings of Machine Learning Research , pages 2711--2720. PMLR

  43. [43]

    Xuan, H., Maestrini, L., Chen, F., and Grazian, C. (2024). Stochastic variational inference for GARCH models. Statistics and Computing , 34(1):45

  44. [44]

    and Wang, H

    Yao, Y. and Wang, H. (2021). A review on optimal subsampling methods for massive datasets. Journal of Data Science , 19(1):151--172

  45. [45]

    Zakoian, J.-M. (1994). Threshold heteroskedastic models. Journal of Economic Dynamics and Control , 18(5):931--955

  46. [46]

    and Dewing, M

    Ceperley, D. and Dewing, M. (1999). The penalty method for random walks with uncertain energies. The Journal of Chemical Physics , 110(20):9812--9820

  47. [47]

    Chopin, N. (2002). A sequential particle filter method for static models. Biometrika , 89(3):539--552

  48. [48]

    Dang, K.-D., Quiroz, M., Kohn, R., Tran, M.-N., and Villani, M. (2019). Hamiltonian M onte C arlo with energy conserving subsampling. Journal of Machine Learning Research , 20(100):1--31

  49. [49]

    K., Deligiannidis, G., and Kohn, R

    Doucet, A., Pitt, M. K., Deligiannidis, G., and Kohn, R. (2015). Efficient implementation of M arkov chain M onte C arlo when using an unbiased likelihood estimator. Biometrika , 102:295--313

  50. [50]

    and Mackey, L

    Gorham, J. and Mackey, L. (2017). Measuring sample quality with kernels. In International Conference on Machine Learning , pages 1292--1301. PMLR

  51. [51]

    Gunawan, D., Dang, K.-D., Quiroz, M., Kohn, R., and Tran, M.-N. (2020). Subsampling sequential M onte C arlo for static B ayesian models. Statistics and Computing , 30(6):1741--1758

  52. [52]

    No free lunch for approximate mcmc

    Johndrow, J. E., Pillai, N. S., and Smith, A. (2020). No free lunch for approximate MCMC . arXiv preprint arXiv:2010.12514

  53. [53]

    Knuth, D. E. (1993). Johann F aulhaber and sums of powers. Mathematics of Computation , 61(203):277--294

  54. [54]

    Magnus, J. R. and Neudecker, H. (2019). Matrix Differential Calculus with Applications in Statistics and Econometrics . John Wiley & Sons

  55. [55]

    K., dos Santos Silva, R., Giordani, P., and Kohn, R

    Pitt, M. K., dos Santos Silva, R., Giordani, P., and Kohn, R. (2012). On some properties of M arkov chain M onte C arlo simulation methods based on the particle filter. Journal of Econometrics , 171(2):134--151

  56. [56]

    Poyiadjis, G., Doucet, A., and Singh, S. S. (2011). Particle approximations of the score and observed information matrix in state space models with application to parameter estimation. Biometrika , 98(1):65--80

  57. [57]

    Prado, E., Nemeth, C., and Sherlock, C. (2026). Metropolis-- H astings with scalable subsampling. arXiv preprint arXiv:2407.19602v2

  58. [58]

    and Tran, M.-N

    Quiroz, M. and Tran, M.-N. (2023). Bayesian Analysis of Big Data via Subsampling M arkov Chain M onte C arlo , pages 1--6. John Wiley & Sons, Ltd

  59. [59]

    Quiroz, M., Tran, M.-N., Villani, M., and Kohn, R. (2018a). Speeding up MCMC by delayed acceptance and data subsampling. Journal of Computational and Graphical Statistics , 27(1):12--22

  60. [60]

    Quiroz, M., Villani, M., Kohn, R., Tran, M.-N., and Dang, K.-D. (2018b). Subsampling MCMC - A n introduction for the survey statistician. Sankhya A , 80:33--69

  61. [61]

    Rudolf, D., Smith, A., and Quiroz, M. (2026). Perturbations of M arkov chains. In Craiu, R. V., Vats, D., Jones, G. L., Brooks, S., Gelman, A., and Meng, X.-L., editors, Handbook of Markov Chain Monte Carlo , pages 527--568. Chapman and Hall/CRC, 2 edition

  62. [62]

    Scott, S. L. (2017). Comparing consensus M onte C arlo strategies for distributed B ayesian computation. Brazilian Journal of Probability and Statistics , 31(4):668--685

  63. [63]

    L., Blocker, A

    Scott, S. L., Blocker, A. W., Bonassi, F. V., Chipman, H. A., George, E. I., and McCulloch, R. E. (2022). Bayes and big data: T he consensus M onte C arlo algorithm. In Big Data and Information Theory , pages 8--18. Routledge

  64. [64]

    Cmd S tan R : T he R interface to C md S tan

    Stan Development Team (2026). Cmd S tan R : T he R interface to C md S tan. R package