Recognition: 2 theorem links
Stabilised weighted data subsampling for accelerated inference in models with recursive likelihoods
Pith reviewed 2026-05-14 17:57 UTC · model grok-4.3
The pith
Stabilised weighted subsampling yields unbiased log-likelihood estimates for faster inference in recursive models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central discovery is a stabilisation framework for weighted data subsampling that produces an unbiased estimator of the log-likelihood in models with recursive likelihoods. By assigning higher inclusion probabilities to early observations, the method reduces the expected depth of recursion. Theoretical results guide hyperparameter choices that keep the decay rate inside an interval preventing both variance explosion and excessive cost. An analogous unbiased estimator is obtained for the log-likelihood gradient. When these estimators are inserted into standard inference algorithms, applications to conditional volatility models deliver substantial computational speed-ups while maintaining inferential accuracy.
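A minimal sketch of such an estimator, assuming independent Bernoulli inclusion of each observation under a power-law probability schedule; the names `decay` and `floor` are illustrative hyperparameters, not the paper's truncated power-law (TPD) notation.

```python
import numpy as np

def subsampled_loglik(loglik_terms, decay=0.5, floor=0.01, rng=None):
    """Horvitz-Thompson-style estimate of sum_t loglik_terms[t].

    Term t is included independently with probability p_t, decaying in t
    so that early observations are favoured; included terms are reweighted
    by 1/p_t, giving E[estimate] = sum_t loglik_terms[t] whenever p_t > 0.
    """
    rng = np.random.default_rng() if rng is None else rng
    loglik_terms = np.asarray(loglik_terms, dtype=float)
    T = len(loglik_terms)
    t = np.arange(1, T + 1, dtype=float)
    p = np.clip(t ** -decay, floor, 1.0)   # power-law decay with a tail floor
    included = rng.random(T) < p
    return float(np.sum(loglik_terms[included] / p[included]))
```

The tail floor keeps every inclusion probability bounded away from zero, mirroring the role the paper's tail floor fraction appears to play in preventing variance explosion.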
What carries the argument
Stabilised weighted subsampling, which uses controlled decay of sampling probabilities to produce an unbiased log-likelihood estimator while reducing recursion depth.
If this is right
- The estimators serve as generic building blocks that can be embedded in stochastic optimisation, variational Bayes, and Markov chain Monte Carlo frameworks (a minimal stochastic-optimisation sketch follows this list).
- In standard and threshold GARCH models the method produces substantial computational speed-ups while maintaining inferential accuracy.
- It outperforms uniform subsampling and compares favourably with recent stochastic gradient and divide-and-conquer MCMC methods for dependent data.
- An unbiased gradient estimator is available to support gradient-based inference.
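As a sketch of the stochastic-optimisation embedding, the loop below runs a Robbins-Monro update [33] driven by the reweighted subsample score. For brevity it uses an i.i.d. Gaussian mean model, where the per-observation score is available in closed form, rather than a recursive GARCH likelihood; the decay exponent 0.5, the floor 0.01, and the step size are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(3.0, 1.0, size=100_000)     # toy data with true mean 3
t = np.arange(1, len(y) + 1, dtype=float)
p = np.clip(t ** -0.5, 0.01, 1.0)          # decaying inclusion probabilities

def subsampled_score(mu):
    """Unbiased estimate of d/dmu of sum_t log N(y_t | mu, 1)."""
    included = rng.random(len(y)) < p
    return np.sum((y[included] - mu) / p[included])

mu, lr = 0.0, 1e-6
for _ in range(2000):
    mu += lr * subsampled_score(mu)        # Robbins-Monro update
print(round(mu, 3))                        # settles near the sample mean of y
```

Because the score estimate is unbiased, the iterates target the same optimum as full-data gradient ascent; only the per-step noise and cost change with the decay schedule.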
Where Pith is reading between the lines
- The same stabilisation principle could be tested on other recursive structures such as state-space or sequential hierarchical models.
- Adaptive tuning of the decay hyperparameter might allow the method to operate in streaming or online settings without retuning.
- If the variance-cost balance holds across a wider class of dependent-data problems, routine subsampling could become standard even for moderate-sized recursive likelihoods.
- Extensions to non-time-series recursive computations remain open for direct empirical checks.
Load-bearing premise
That hyperparameter tuning can restrict the decay of sampling probabilities in a way that simultaneously controls estimator variance and computational cost without introducing bias into the likelihood estimate.
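Under the simplest independent-inclusion reading of this premise (an assumption for this sketch, not necessarily the paper's exact design), the trade-off can be written down directly. With independent indicators $z_t \sim \mathrm{Bernoulli}(p_t)$ and log-likelihood terms $\ell_t$,

$$\hat{\ell} = \sum_{t=1}^{T} \frac{z_t}{p_t}\,\ell_t, \qquad \operatorname{Var}\big(\hat{\ell}\big) = \sum_{t=1}^{T} \frac{1-p_t}{p_t}\,\ell_t^{2}, \qquad \text{expected cost} \propto \sum_{t=1}^{T} p_t.$$

Faster decay shrinks the cost term (and the expected recursion depth, driven by the largest included index) but inflates the $(1-p_t)/p_t$ variance factors; the stabilisation interval for the decay rate is the region where neither side degenerates.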
What would settle it
Applying the stabilised estimator to a large GARCH dataset and finding either that estimator variance is substantially higher than under full-data inference or that average recursion cost is not reduced would falsify the central claim.
Original abstract
Inference for models with recursively defined likelihoods is computationally demanding, limiting scalability to large datasets. We propose a stabilised weighted subsampling methodology for accelerated inference based on an unbiased estimator of the log-likelihood. By assigning higher sampling probabilities to early observations, the method reduces the effective depth of recursive likelihood evaluations and hence expected computational cost. However, slow decay leads to frequent inclusion of late observations and high computational cost, while overly aggressive decay can substantially inflate estimator variance. We develop a stabilisation framework, underpinned by theoretical results, that restricts the decay of the sampling probabilities to avoid both variance and computational pathologies through principled hyperparameter tuning. We further consider an unbiased subsampling estimator of the log-likelihood gradient, enabling gradient-based inference. The proposed estimators are generic building blocks for subsampling-based inference and can be embedded within frameworks including stochastic optimisation, variational Bayes, and Markov chain Monte Carlo. Applications to conditional volatility models, including standard and threshold generalised autoregressive conditional heteroskedasticity models, demonstrate substantial computational speed-ups while maintaining inferential accuracy. The proposed approach outperforms uniform subsampling and compares favourably with recent stochastic gradient and divide-and-conquer MCMC methods for dependent data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a stabilised weighted subsampling methodology for accelerated inference in models with recursively defined likelihoods. It constructs an unbiased estimator of the log-likelihood by assigning higher sampling probabilities to early observations, thereby reducing the expected depth of recursive evaluations and computational cost. A stabilisation framework, supported by theoretical results, controls the decay rate of these probabilities through hyperparameter tuning to avoid both high variance and high cost pathologies. The approach is extended to an unbiased estimator of the log-likelihood gradient. The estimators are presented as generic building blocks embeddable in stochastic optimisation, variational Bayes, and MCMC. Applications to standard and threshold GARCH models are used to demonstrate substantial speed-ups while preserving inferential accuracy, with favourable comparisons to uniform subsampling and existing stochastic gradient/divide-and-conquer methods for dependent data.
Significance. If the unbiasedness and stabilisation claims hold, the work offers a principled route to scalable inference for large time-series datasets with recursive likelihood structures, particularly volatility models. The generic framing and embedding potential across multiple inference frameworks constitute a clear strength, as does the explicit handling of the variance-cost trade-off via hyperparameter control. The empirical demonstrations on GARCH-type models provide concrete evidence of practical utility, though the significance ultimately hinges on the verifiability of the supporting theory.
major comments (2)
- [Abstract and §3] The central claim that the weighted subsampling estimator remains unbiased while the stabilisation framework restricts probability decay without introducing bias is load-bearing, yet the provided description does not include the explicit expectation calculation or the theorem establishing that the hyperparameter-tuned decay preserves unbiasedness; this must be shown in detail to support the variance-control guarantee.
- [§4.2] (stabilisation framework) The assertion that principled hyperparameter tuning simultaneously avoids high estimator variance and high computational cost is not accompanied by a concrete bound or sensitivity analysis linking the decay rate to the recursive depth; without this, the speed-accuracy trade-off claim for GARCH applications rests on unverified assumptions.
minor comments (2)
- [Abstract] The abstract mentions applications to conditional volatility models but does not specify the exact dataset sizes or number of replications used in the timing and accuracy comparisons; these details should be added for reproducibility.
- [§2] Notation for the sampling probabilities p_i and the stabilisation hyperparameter should be introduced with a clear definition before the theoretical results are stated.
Simulated Author's Rebuttal
We thank the referee for their thoughtful review and constructive comments on our manuscript. We address each major comment below and will revise the manuscript to provide the requested clarifications and additional analysis, thereby strengthening the presentation of the theoretical results.
Point-by-point responses
- Referee: [Abstract and §3] The central claim that the weighted subsampling estimator remains unbiased while the stabilisation framework restricts probability decay without introducing bias is load-bearing, yet the provided description does not include the explicit expectation calculation or the theorem establishing that the hyperparameter-tuned decay preserves unbiasedness; this must be shown in detail to support the variance-control guarantee.
Authors: We appreciate the referee drawing attention to this foundational aspect. The unbiasedness of the weighted subsampling estimator is established explicitly in Theorem 1 of Section 3, where the expectation is computed directly from the sampling probabilities to equal the true log-likelihood. The stabilisation framework restricts decay rates through hyperparameters but does not alter this expectation, as the tuning operates on the probability schedule without introducing bias. To address the concern, we will expand Section 3 with the full step-by-step expectation derivation and proof that stabilisation preserves unbiasedness, and we will revise the abstract to reference Theorem 1 explicitly. revision: yes
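For reference, under the independent-inclusion design assumed in the sketches above (the paper's Theorem 1 presumably covers its own sampling scheme), the requested expectation calculation is one line:

$$\mathbb{E}\big[\hat{\ell}\big] = \sum_{t=1}^{T} \frac{\mathbb{E}[z_t]}{p_t}\,\ell_t = \sum_{t=1}^{T} \frac{p_t}{p_t}\,\ell_t = \sum_{t=1}^{T} \ell_t = \ell \quad \text{for any schedule with } p_t > 0,$$

which is why retuning the decay changes variance and cost but cannot shift the mean.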
- Referee: [§4.2] The assertion that principled hyperparameter tuning simultaneously avoids high estimator variance and high computational cost is not accompanied by a concrete bound or sensitivity analysis linking the decay rate to the recursive depth; without this, the speed-accuracy trade-off claim for GARCH applications rests on unverified assumptions.
Authors: We agree that an explicit bound and sensitivity analysis would strengthen the claims. Section 4.2 currently provides theoretical variance bounds in terms of the decay parameter and illustrates the hyperparameter-controlled trade-off, but we acknowledge that a direct sensitivity analysis connecting decay rate to expected recursive depth is not fully elaborated. In the revision we will add this analysis, deriving a concrete bound on expected recursion depth as a function of the stabilisation hyperparameters and including a sensitivity study for the GARCH models to confirm the speed-accuracy trade-off. revision: yes
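A Monte Carlo version of the promised sensitivity analysis is straightforward to prototype. The sketch below estimates the expected largest included index, a proxy for expected recursion depth, as a function of an assumed power-law decay exponent `lam`; the schedule and constants are illustrative rather than the paper's TPD weights.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 10_000
t = np.arange(1, T + 1, dtype=float)

def expected_max_index(lam, reps=200):
    """Monte Carlo estimate of E[max included index] under p_t = t**(-lam)."""
    p = np.minimum(t ** -lam, 1.0)
    depths = []
    for _ in range(reps):
        idx = np.nonzero(rng.random(T) < p)[0]
        depths.append(idx[-1] + 1 if idx.size else 0)
    return float(np.mean(depths))

for lam in (0.25, 0.5, 1.0, 2.0):
    print(lam, expected_max_index(lam))    # depth falls sharply as decay steepens
```

Pairing this depth curve with the variance formula above, swept over the same `lam` grid, gives the empirical variance-cost frontier the referee asks for.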
Circularity Check
No significant circularity detected in derivation chain
Full rationale
The paper presents a stabilised weighted subsampling method built on an unbiased log-likelihood estimator, with a stabilisation framework, derived from theoretical results, that controls sampling-probability decay via hyperparameter tuning. No step reduces by construction to fitted inputs, self-definitions, or load-bearing self-citations; the central claims rest on external theoretical grounding, and the GARCH applications do not lean on the paper's own equations or prior self-references as their sole justification. The derivation is self-contained given the stated assumptions.
Axiom & Free-Parameter Ledger
free parameters (1)
- stabilisation hyperparameter controlling decay rate
axioms (1)
- domain assumption: The weighted subsampling estimator remains unbiased for the log-likelihood when the sampling probabilities are stabilised.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · tag: unclear
  The relation between the paper passage and the cited Recognition theorem is unclear.
  Paper passage: "We develop a stabilisation framework... truncated power-law decaying (TPD) weights... tail floor fraction c... $\mathbb{E}(u_{\max}) \le t_\star + (T - t_\star)\bigl(1 - (1-\varepsilon)^m\bigr)$"
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
  The relation between the paper passage and the cited Recognition theorem is unclear.
  Paper passage: "Lemma 3... variance... $O(T^{\lambda-2})$ for power-law... exponential growth for ED"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Ai, M., Yu, J., Zhang, H., and Wang, H. (2021). Optimal subsampling algorithms for big data regressions. Statistica Sinica, 31(2):749–772.
- [2] Aicher, C., Putcha, S., Nemeth, C., Fearnhead, P., and Fox, E. (2025). Stochastic gradient MCMC for nonlinear state space models. Bayesian Analysis, 20(1):83–105.
- [3] Amari, S.-i. (1998). Natural gradient works efficiently in learning. Neural Computation, 10(2):251–276.
- [4] Bardenet, R., Doucet, A., and Holmes, C. (2014). Towards scaling up Markov chain Monte Carlo: An adaptive subsampling approach. Proceedings of the 31st International Conference on Machine Learning, pages 405–413.
- [5] Bardenet, R., Doucet, A., and Holmes, C. (2017). On Markov chain Monte Carlo methods for tall data. Journal of Machine Learning Research, 18(47):1–43.
- [6] Bauwens, L. and Lubrano, M. (1998). Bayesian inference on GARCH models using the Gibbs sampler. The Econometrics Journal, 1:C23–C46.
- [7] Baydin, A. G., Pearlmutter, B. A., Radul, A. A., and Siskind, J. M. (2018). Automatic differentiation in machine learning: A survey. Journal of Machine Learning Research, 18(153):1–43.
- [8] Blei, D. M., Kucukelbir, A., and McAuliffe, J. D. (2017). Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518):859–877.
- [9] Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3):307–327.
- [10] Chen, C.-F. (1985). On asymptotic normality of limiting density functions with Bayesian implications. Journal of the Royal Statistical Society Series B: Statistical Methodology, 47(3):540–546.
- [11] Chen, T., Fox, E., and Guestrin, C. (2014). Stochastic gradient Hamiltonian Monte Carlo. In International Conference on Machine Learning, pages 1683–1691. PMLR.
- [12] Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica, 50(4):987–1007.
- [13] Fiorentini, G., Calzolari, G., and Panattoni, L. (1996). Analytic derivatives and the computation of GARCH estimates. Journal of Applied Econometrics, 11(4):399–417.
- [14] Glosten, L. R., Jagannathan, R., and Runkle, D. E. (1993). On the relation between the expected value and the volatility of the nominal excess return on stocks. The Journal of Finance, 48(5):1779–1801.
- [15] Gunawan, D., Tran, M.-N., and Kohn, R. (2017). Fast inference for intractable likelihood problems using variational Bayes. arXiv preprint arXiv:1705.06679.
- [16] Haario, H., Saksman, E., and Tamminen, J. (2001). An adaptive Metropolis algorithm. Bernoulli, 7(2):223–242.
- [17] Hansen, P. R. and Lunde, A. (2005). A forecast comparison of volatility models: Does anything beat a GARCH(1,1)? Journal of Applied Econometrics, 20(7):873–889.
- [18] Huang, D., Wang, H., and Yao, Q. (2008). Estimating GARCH models: When to use what? The Econometrics Journal, 11(1):27–38.
- [19] Kingma, D. P. and Ba, J. (2015). Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations.
- [20] Kingma, D. P. and Welling, M. (2014). Auto-encoding variational Bayes. In Proceedings of the 2nd International Conference on Learning Representations (ICLR).
- [21] Korattikara, A., Chen, Y., and Welling, M. (2014). Austerity in MCMC land: Cutting the Metropolis-Hastings budget. Proceedings of the 31st International Conference on Machine Learning, pages 181–189.
- [22] Li, D., Clements, A., and Drovandi, C. (2021). Efficient Bayesian estimation for GARCH-type models via sequential Monte Carlo. Econometrics and Statistics, 19:22–46.
- [23] Ma, Y.-A., Foti, N. J., and Fox, E. B. (2017). Stochastic gradient MCMC methods for hidden Markov models. In International Conference on Machine Learning, pages 2265–2274. PMLR.
- [24] Magris, M. and Iosifidis, A. (2023). Variational inference for GARCH-family models. In Proceedings of the Fourth ACM International Conference on AI in Finance, pages 541–548.
- [25] Martens, J. (2020). New insights and perspectives on the natural gradient method. Journal of Machine Learning Research, 21(146):1–76.
- [26] Mikosch, T. and Straumann, D. (2002). Whittle estimation in a heavy-tailed GARCH(1,1) model. Stochastic Processes and their Applications, 100:187–222.
- [27] Nemeth, C. and Fearnhead, P. (2021). Stochastic gradient Markov chain Monte Carlo. Journal of the American Statistical Association, 116(533):433–450.
- [28] Ong, V. M.-H., Nott, D. J., and Smith, M. S. (2018). Gaussian variational approximation with a factor covariance structure. Journal of Computational and Graphical Statistics, 27(3):465–478.
- [29]
- [30] Polyak, B. T. and Juditsky, A. B. (1992). Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization, 30(4):838–855.
- [31] Quiroz, M., Kohn, R., Villani, M., and Tran, M.-N. (2019). Speeding up MCMC by efficient data subsampling. Journal of the American Statistical Association, 114(526):831–843.
- [32] Quiroz, M., Nott, D. J., and Kohn, R. (2023). Gaussian variational approximations for high-dimensional state space models. Bayesian Analysis, 18(3):989–1016.
- [33] Robbins, H. and Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, 22(3):400–407.
- [34] Salomone, R., Quiroz, M., Kohn, R., Villani, M., and Tran, M.-N. (2020). Spectral subsampling MCMC for stationary time series. In International Conference on Machine Learning, pages 8449–8458. PMLR.
- [35] Särndal, C.-E., Swensson, B., and Wretman, J. (2003). Model Assisted Survey Sampling. Springer Science & Business Media.
- [36] Titsias, M. and Lázaro-Gredilla, M. (2014). Doubly stochastic variational Bayes for non-conjugate inference. In Proceedings of the 31st International Conference on Machine Learning, volume 32, pages 1971–1979.
- [37] Villani, M., Quiroz, M., Kohn, R., and Salomone, R. (2024). Spectral subsampling MCMC for stationary multivariate time series with applications to vector ARTFIMA processes. Econometrics and Statistics, 32:98–121.
- [38] Wales, D. J. and Doye, J. P. (1997). Global optimization by basin-hopping and the lowest energy structures of Lennard-Jones clusters containing up to 110 atoms. The Journal of Physical Chemistry A, 101(28):5111–5116.
- [39] Wang, H., Zhu, R., and Ma, P. (2018). Optimal subsampling for large sample logistic regression. Journal of the American Statistical Association, 113(522):829–844.
- [40] Whittle, P. (1953). Estimation and information in stationary time series. Arkiv för Matematik, 2(5):423–434.
- [41] Winker, P. and Maringer, D. (2006). The convergence of optimization based GARCH estimators: Theory and application. In Rizzi, A. and Vichi, M., editors, Compstat 2006 – Proceedings in Computational Statistics, pages 483–494.
- [42] Xu, M., Quiroz, M., Kohn, R., and Sisson, S. A. (2019). Variance reduction properties of the reparameterization trick. In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, volume 89 of Proceedings of Machine Learning Research, pages 2711–2720. PMLR.
- [43] Xuan, H., Maestrini, L., Chen, F., and Grazian, C. (2024). Stochastic variational inference for GARCH models. Statistics and Computing, 34(1):45.
- [44] Yao, Y. and Wang, H. (2021). A review on optimal subsampling methods for massive datasets. Journal of Data Science, 19(1):151–172.
- [45] Zakoian, J.-M. (1994). Threshold heteroskedastic models. Journal of Economic Dynamics and Control, 18(5):931–955.
- [46] Ceperley, D. and Dewing, M. (1999). The penalty method for random walks with uncertain energies. The Journal of Chemical Physics, 110(20):9812–9820.
- [47] Chopin, N. (2002). A sequential particle filter method for static models. Biometrika, 89(3):539–552.
- [48] Dang, K.-D., Quiroz, M., Kohn, R., Tran, M.-N., and Villani, M. (2019). Hamiltonian Monte Carlo with energy conserving subsampling. Journal of Machine Learning Research, 20(100):1–31.
- [49] Doucet, A., Pitt, M. K., Deligiannidis, G., and Kohn, R. (2015). Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator. Biometrika, 102:295–313.
- [50] Gorham, J. and Mackey, L. (2017). Measuring sample quality with kernels. In International Conference on Machine Learning, pages 1292–1301. PMLR.
- [51] Gunawan, D., Dang, K.-D., Quiroz, M., Kohn, R., and Tran, M.-N. (2020). Subsampling sequential Monte Carlo for static Bayesian models. Statistics and Computing, 30(6):1741–1758.
- [52] Johndrow, J. E., Pillai, N. S., and Smith, A. (2020). No free lunch for approximate MCMC. arXiv preprint arXiv:2010.12514.
- [53] Knuth, D. E. (1993). Johann Faulhaber and sums of powers. Mathematics of Computation, 61(203):277–294.
- [54] Magnus, J. R. and Neudecker, H. (2019). Matrix Differential Calculus with Applications in Statistics and Econometrics. John Wiley & Sons.
- [55] Pitt, M. K., dos Santos Silva, R., Giordani, P., and Kohn, R. (2012). On some properties of Markov chain Monte Carlo simulation methods based on the particle filter. Journal of Econometrics, 171(2):134–151.
- [56] Poyiadjis, G., Doucet, A., and Singh, S. S. (2011). Particle approximations of the score and observed information matrix in state space models with application to parameter estimation. Biometrika, 98(1):65–80.
- [57]
- [58] Quiroz, M. and Tran, M.-N. (2023). Bayesian Analysis of Big Data via Subsampling Markov Chain Monte Carlo, pages 1–6. John Wiley & Sons, Ltd.
- [59] Quiroz, M., Tran, M.-N., Villani, M., and Kohn, R. (2018a). Speeding up MCMC by delayed acceptance and data subsampling. Journal of Computational and Graphical Statistics, 27(1):12–22.
- [60] Quiroz, M., Villani, M., Kohn, R., Tran, M.-N., and Dang, K.-D. (2018b). Subsampling MCMC – an introduction for the survey statistician. Sankhya A, 80:33–69.
- [61] Rudolf, D., Smith, A., and Quiroz, M. (2026). Perturbations of Markov chains. In Craiu, R. V., Vats, D., Jones, G. L., Brooks, S., Gelman, A., and Meng, X.-L., editors, Handbook of Markov Chain Monte Carlo, pages 527–568. Chapman and Hall/CRC, 2nd edition.
- [62] Scott, S. L. (2017). Comparing consensus Monte Carlo strategies for distributed Bayesian computation. Brazilian Journal of Probability and Statistics, 31(4):668–685.
- [63] Scott, S. L., Blocker, A. W., Bonassi, F. V., Chipman, H. A., George, E. I., and McCulloch, R. E. (2022). Bayes and big data: The consensus Monte Carlo algorithm. In Big Data and Information Theory, pages 8–18. Routledge.
- [64] Stan Development Team (2026). CmdStanR: The R interface to CmdStan. R package.