pith. sign in

arxiv: 2605.28426 · v1 · pith:C2E3YT55new · submitted 2026-05-27 · 💻 cs.DC · cs.NA· math.NA

Fault Tolerance of Accelerated Asynchronous Fixed-Point Iterations on Flexible Computing Infrastructure

Pith reviewed 2026-06-29 09:50 UTC · model grok-4.3

classification 💻 cs.DC cs.NAmath.NA
keywords asynchronous iterationAnderson accelerationfault tolerancestraggler tolerancefixed-point methodsJacobi methodvalue iterationself-consistent field
0
0 comments X

The pith

Anderson acceleration survives asynchrony only when staleness perturbs the fixed-point evaluation rather than corrupting iterates directly.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether Anderson acceleration offsets inconsistency from stale data in asynchronous fixed-point iterations run on distributed systems. Experiments cover the Jacobi method, value iteration for Bellman equations, and Hartree-Fock SCF iteration, all executed via the Ray framework with controlled delays. Asynchronous execution yields consistent wall-clock speedups of 2.9x to 16.9x across all three methods regardless of acceleration. Acceleration succeeds only when staleness enters as a perturbation to the map evaluation, as in value iteration and SCF, but fails when stale values overwrite the accelerated iterate, as in block Jacobi. The entry point of staleness, rather than map contraction or smoothness, controls whether the accelerator survives.

Core claim

Asynchronous execution provides wall-clock speedups of 2.9x for Jacobi, 7.7x for value iteration, and 16.9x for SCF independent of acceleration. Anderson acceleration fails categorically under iterate-level corruption but retains its benefits under evaluation-level perturbation, making the staleness mechanism the primary determinant of whether acceleration survives asynchronous execution.

What carries the argument

The distinction between iterate-level corruption, where stale worker returns directly overwrite portions of the accelerated iterate, and evaluation-level perturbation, where staleness acts as a bounded perturbation to the fixed-point map evaluation.

If this is right

  • Straggler tolerance holds universally, delivering the reported speedups with a 100 ms delayed worker in every tested method.
  • Anderson acceleration loses all benefit under iterate-level corruption but keeps its improvement under evaluation-level perturbation.
  • The location where staleness enters the computation overrides contraction norm or smoothness as the deciding factor for accelerator survival.
  • The observed behavior matches the perturbation analysis cited from Toth et al. (2017).

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Solvers intended for asynchronous use should route updates to prevent direct overwrite of accelerated iterates when acceleration is applied.
  • The staleness-mechanism distinction could be tested on additional fixed-point problems to check whether it holds beyond the three methods studied.
  • Distributed execution frameworks could detect the type of staleness and conditionally enable or disable acceleration.

Load-bearing premise

That the difference between iterate-level corruption and evaluation-level perturbation observed in these three methods is the main driver of acceleration behavior and that the pattern generalizes beyond the tested cases and Ray framework.

What would settle it

Modify the block Jacobi implementation so that staleness enters only as an evaluation perturbation rather than direct iterate overwrite, then check whether Anderson acceleration regains its convergence benefit under asynchrony.

Figures

Figures reproduced from arXiv: 2605.28426 by Evan Coleman, Masha Sosonkina.

Figure 1
Figure 1. Figure 1: Jacobi straggler tolerance. Left: residual vs. worker-updates (sync curves overlap; async fans right). Right: residual vs. wall time [PITH_FULL_IMAGE:figures/full_fig_p008_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Anderson acceleration for async Jacobi. (a) Window size: sync improves monotonically; async always above the no-Anderson [PITH_FULL_IMAGE:figures/full_fig_p009_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Block coupling threshold. Multi-sweep local solves become transformative above [PITH_FULL_IMAGE:figures/full_fig_p010_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: VI convergence on Garnet(500,4,5,𝛾=0.95). Anderson(5) accelerates both sync and async. Damping hurts. Dashed = async (median of 10 realizations). tightens with 𝛾, explaining why Anderson’s benefit grows with the discount factor while simultaneously predicting that it will diminish as 𝛾 → 1. Selection strategy interacts with coupling density (Section 3.5). For Jacobi, block size is the primary factor and gr… view at source ↗
Figure 5
Figure 5. Figure 5: Anderson benefit grows with 𝛾. At 𝛾 = 0.99, sync Anderson reduces iterations by 1.7× but async needs more budget. Orange = Anderson; blue = plain. 0 5000 10000 15000 Worker updates 10 7 10 5 10 3 10 1 Bellman residual (a) Bellman residual Sync+AA Async fixed (k=125) Uniform k=50 Greedy k=25 Greedy k=50 0 5000 10000 15000 Worker updates 10 7 10 5 10 3 10 1 Error vs V* (b) Error vs V* VI selection strategies… view at source ↗
Figure 6
Figure 6. Figure 6: VI selection strategies. Greedy outperforms uniform at every [PITH_FULL_IMAGE:figures/full_fig_p011_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: VI straggler tolerance on Garnet(500,4,5, [PITH_FULL_IMAGE:figures/full_fig_p012_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Stochastic SCF convergence at 𝑈/|𝑡| = 2.5 with damped mixing (10 realizations per delay). (a) Fraction finding the correct fixed point. (b) Energy error per realization; circles = correct FP, crosses = wrong FP. At intermediate correlation (𝑈 /|𝑡| = 2.5, 8 atoms), the SCF energy landscape admits multiple fixed points and async convergence becomes stochastic: different Ray scheduling realizations lead to di… view at source ↗
Figure 9
Figure 9. Figure 9: SCF convergence. (a) 8 atoms, 𝑈/|𝑡| = 2: DIIS corrects async bias (177 vs. 28 iterations). (b) 20 atoms, 𝑈/|𝑡| = 4: damped SCF shows wall-clock straggler tolerance despite incomplete convergence. At strong correlation (𝑈 /|𝑡| = 4, 20 atoms), even synchronous DIIS diverges and damped SCF barely converges, but the straggler tolerance remains dramatic: 16.9× wall-clock speedup at 100 ms delay. All three regim… view at source ↗
read the original abstract

Asynchronous iterative methods tolerate straggling processors by allowing workers to proceed with stale data, but at a cost: the iterates become inconsistent, potentially degrading convergence. We investigate whether convergence accelerators such as Anderson acceleration compensate for this degradation. We experimentally study three fixed-point iterations: the Jacobi method for sparse linear systems, value iteration for the Bellman equation, and the Hartree--Fock self-consistent field (SCF) iteration. The experiments are conducted using a high-performance execution framework Ray, which abstracts the complexity of distributed systems and enables code parallelization and fault injection with minimal changes. We establish two main results. First, straggler tolerance is universal: asynchronous execution provides wall-clock speedups of $2.9\times$ (Jacobi), $7.7\times$ (VI), and $16.9\times$ (SCF) over synchronous execution with a 100\,ms-delayed worker, independent of whether acceleration is used. Second, Anderson acceleration's effectiveness under asynchrony depends on where staleness enters the computation. We identify two staleness mechanisms: iterate-level corruption, where stale worker returns directly overwrite portions of the accelerated iterate (as in block Jacobi), and evaluation-level perturbation, where staleness acts as a bounded perturbation to the fixed-point map evaluation (as in VI and SCF). Anderson acceleration fails categorically under the first mechanism but retains its benefits under the second, consistent with the perturbation analysis of Toth et al.\ (2017). This distinction, rather than the contraction norm or smoothness of the map, is the primary determinant of whether acceleration survives asynchronous execution.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper experimentally studies Anderson acceleration for asynchronous fixed-point iterations on three maps (block Jacobi for linear systems, value iteration for Bellman equations, and Hartree-Fock SCF) using the Ray framework. It reports that asynchrony yields wall-clock speedups of 2.9× (Jacobi), 7.7× (VI), and 16.9× (SCF) independent of acceleration, and that Anderson acceleration fails categorically under iterate-level corruption but retains benefits under evaluation-level perturbation; the authors conclude that this staleness-mechanism distinction, rather than contraction norm or smoothness, is the primary determinant of whether acceleration survives asynchrony, consistent with Toth et al. (2017) perturbation analysis.

Significance. If the mechanism distinction is confirmed, the results supply practical guidance on when convergence accelerators remain useful in distributed asynchronous settings with stragglers. The choice of Ray for fault injection and parallelization supports reproducibility, and the direct experimental link to existing perturbation theory is a constructive contribution.

major comments (1)
  1. [Abstract and experimental results] Abstract and experimental results: the claim that the iterate-level vs. evaluation-level staleness distinction 'rather than the contraction norm or smoothness of the map, is the primary determinant' is not supported by the design. The three maps differ simultaneously in staleness mechanism, contraction properties, and smoothness; no controlled variation holds the mechanism fixed while changing the norm or Lipschitz constant, leaving the observed categorical failure in Jacobi versus retention in VI/SCF open to confounding by map properties.
minor comments (1)
  1. The abstract and results sections report experimental outcomes but provide no details on number of runs, statistical tests, error bars, or controls for confounding factors, which limits verifiability of the speedup and failure-mode claims.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading and the constructive critique of our experimental design and claims. We respond to the major comment below.

read point-by-point responses
  1. Referee: [Abstract and experimental results] Abstract and experimental results: the claim that the iterate-level vs. evaluation-level staleness distinction 'rather than the contraction norm or smoothness of the map, is the primary determinant' is not supported by the design. The three maps differ simultaneously in staleness mechanism, contraction properties, and smoothness; no controlled variation holds the mechanism fixed while changing the norm or Lipschitz constant, leaving the observed categorical failure in Jacobi versus retention in VI/SCF open to confounding by map properties.

    Authors: We agree that the experimental design does not provide a controlled isolation of the staleness mechanism while varying contraction norm or smoothness; the three maps were selected as representative applications rather than as a factorial design. The observed categorical difference (failure under iterate-level corruption in Jacobi, retention under evaluation-level perturbation in VI and SCF) is consistent with the perturbation analysis of Toth et al. (2017), but confounding by map properties cannot be excluded on the basis of these experiments alone. In the revised version we will (i) replace the phrasing 'is the primary determinant' in the abstract with 'is consistent with the mechanism distinction being the primary determinant according to existing perturbation theory' and (ii) add a limitations paragraph in the discussion section that explicitly notes the lack of controlled variation and calls for future work that holds the map fixed while varying only the injection point of staleness. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical results are self-contained observations

full rationale

The paper reports direct experimental measurements of wall-clock speedups and convergence behavior under injected delays in the Ray framework for three distinct fixed-point maps. Claims about staleness mechanisms (iterate-level vs. evaluation-level) and Anderson acceleration survival are presented as outcomes of those controlled runs rather than quantities derived from equations or parameters fitted inside the paper. The single external citation to Toth et al. (2017) supplies an independent perturbation analysis and does not form a self-citation chain. No load-bearing step reduces by construction to a fitted input, self-definition, or renamed known result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on standard domain assumptions from numerical analysis and distributed systems; no free parameters, new axioms, or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption Fixed-point iterations are expected to converge when the underlying map satisfies appropriate contraction or smoothness conditions.
    This background assumption is required for the experiments on convergence behavior to be meaningful.

pith-pipeline@v0.9.1-grok · 5825 in / 1342 out tokens · 28844 ms · 2026-06-29T09:50:24.540478+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Residual-Weighted Randomized Jacobi: Sharpened Bounds via Residual Concentration and Asynchronous Extension

    math.NA 2026-05 unverdicted novelty 7.0

    Residual-weighted randomized Jacobi with ℓ=2 selection sharpens one-step convergence by exactly the inverse participation ratio ν² of the residual and extends the analysis to asynchronous execution where the same quan...

Reference graph

Works this paper leans on

35 extracted references · 22 canonical work pages · cited by 1 Pith paper · 1 internal anchor

  1. [1]

    Anderson

    Donald G. Anderson. 1965. Iterative Procedures for Nonlinear Integral Equations.J. ACM12, 4 (1965), 547–560. doi:10.1145/321296.321305

  2. [2]

    Quintana-Ortí

    Hartwig Anzt, Jack Dongarra, and Enrique S. Quintana-Ortí. 2019. Fine-grained bit-flip protection for relaxation methods.Journal of Computational Science36 (2019). doi:10.1016/j.jocs.2016.11.013 14

  3. [3]

    Archibald, Ken I.M

    Thomas W. Archibald, Ken I.M. McKinnon, and Lyn C. Thomas. 1995.On the Generation of Markov Decision Processes. Technical Report. University of Edinburgh. Journal of the Operational Research Society, 46(3):354–361

  4. [4]

    Haim Avron, Alex Druinsky, and Anshul Gupta. 2015. Revisiting Asynchronous Linear Solvers: Provable Convergence Rate Through Randomization. J. ACM62, 6 (2015), 51:1–51:27. doi:10.1145/2814566

  5. [5]

    Banerjee, Phanish Suryanarayana, and John E

    Amartya S. Banerjee, Phanish Suryanarayana, and John E. Pask. 2016. Periodic Pulay Method for Robust and Efficient Convergence Acceleration of Self-Consistent Field Iterations.Chemical Physics Letters647 (2016), 31–35. doi:10.1016/j.cplett.2016.01.033

  6. [6]

    Gérard M. Baudet. 1978. Asynchronous Iterative Methods for Multiprocessors.J. ACM25, 2 (1978), 226–244. doi:10.1145/322063.322067

  7. [7]

    Bertsekas and John N

    Dimitri P. Bertsekas and John N. Tsitsiklis. 1989.Parallel and Distributed Computation: Numerical Methods. Prentice Hall

  8. [8]

    Bertsekas and John N

    Dimitri P. Bertsekas and John N. Tsitsiklis. 1996.Neuro-Dynamic Programming. Athena Scientific, Belmont, MA

  9. [9]

    Daniel Chazan and Willard Miranker. 1969. Chaotic Relaxation.Linear Algebra Appl.2, 2 (1969), 199–222. doi:10.1016/0024-3795(69)90028-7

  10. [10]

    Zizhong Chen. 2013. Online-ABFT: An online algorithm based fault tolerance scheme for soft error detection in iterative methods. InProc. 18th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP). 167–176. doi:10.1145/2442516.2442533

  11. [11]

    Edmond Chow, Hartwig Anzt, and Jack Dongarra. 2015. Asynchronous Iterative Algorithm for Computing Incomplete Factorizations on GPUs. InHigh Performance Computing (ISC High Performance 2015) (Lecture Notes in Computer Science, Vol. 9137). Springer, 1–16. doi:10.1007/978-3-319-20119-1_1

  12. [12]

    Edmond Chow, Andreas Frommer, and Daniel B. Szyld. 2021. Asynchronous Richardson Iterations: Theory and Practice.Numerical Algorithms87 (2021), 1635–1651. doi:10.1007/s11075-020-01023-3

  13. [13]

    Evan Coleman, Erik J Jensen, and Masha Sosonkina. 2021. Fault recovery methods for asynchronous linear solvers.International Journal of Parallel Programming49, 1 (2021), 51–80

  14. [14]

    Evan Coleman and Masha Sosonkina. 2018. Self-stabilizing fine-grained parallel incomplete LU factorization.Sustainable Computing: Informatics and Systems19 (2018), 291–304

  15. [15]

    Rebholz, and Mengying Xiao

    Caleb Evans, Sara Pollock, Leo G. Rebholz, and Mengying Xiao. 2020. A Proof That Anderson Acceleration Improves the Convergence Rate in Linearly Converging Fixed-Point Methods (But Not in Those Converging Quadratically).SIAM J. Numer. Anal.58, 1 (2020), 788–810. doi:10.1137/19M1245384

  16. [16]

    Haw-ren Fang and Yousef Saad. 2009. Two Classes of Multisecant Methods for Nonlinear Acceleration.Numerical Linear Algebra with Applications 16, 3 (2009), 197–221. doi:10.1002/nla.617

  17. [17]

    Andreas Frommer and Daniel B. Szyld. 2000. On Asynchronous Iterations.J. Comput. Appl. Math.123, 1–2 (2000), 201–216. doi:10.1016/S0377- 0427(00)00409-X

  18. [18]

    Matthieu Geist and Bruno Scherrer. 2018. Anderson Acceleration for Reinforcement Learning. InEuropean Workshop on Reinforcement Learning (EWRL). arXiv:1809.09501

  19. [19]

    Erik J Jensen, Evan Coleman, and Masha Sosonkina. 2018. Using modeling to improve the performance of asynchronous jacobi. InProceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA). The Steering Committee of The World Congress in Computer Science, Computer . . . , 117–126

  20. [20]

    Erik J Jensen, Evan Coleman, and Masha Sosonkina. 2019. Predictive modeling of the performance of asynchronous iterative methods: EJ Jensen et al.The Journal of Supercomputing75, 8 (2019), 5084–5105

  21. [21]

    Jordan, and Ion Stoica

    Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, Zongheng Yang, William Paul, Michael I. Jordan, and Ion Stoica. 2018. Ray: A Distributed Framework for Emerging AI Applications. In13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). USENIX Association, 561–577

  22. [22]

    Kimio Ohno. 1964. Some Remarks on the Pariser–Parr–Pople Method.Theoretica Chimica Acta2, 3 (1964), 219–227. doi:10.1007/BF00528281

  23. [23]

    Rudolph Pariser and Robert G. Parr. 1953. A Semi-Empirical Theory of the Electronic Spectra and Electronic Structure of Complex Unsaturated Molecules. I.The Journal of Chemical Physics21, 3 (1953), 466–471. doi:10.1063/1.1698929

  24. [24]

    John A. Pople. 1953. Electron Interaction in Unsaturated Hydrocarbons.Transactions of the Faraday Society49 (1953), 1375–1385. doi:10.1039/ TF9534901375

  25. [25]

    Péter Pulay. 1980. Convergence Acceleration of Iterative Sequences. The Case of SCF Iteration.Chemical Physics Letters73, 2 (1980), 393–398. doi:10.1016/0009-2614(80)80396-4

  26. [26]

    Thorsten Rohwedder and Reinhold Schneider. 2011. An Analysis for the DIIS Acceleration Method Used in Quantum Chemistry Calculations. Journal of Mathematical Chemistry49, 9 (2011), 1889–1914. doi:10.1007/s10910-011-9863-y

  27. [27]

    Piyush Sao and Richard Vuduc. 2013. Self-stabilizing iterative solvers. InProc. Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA). 1–8. doi:10.1145/2530268.2530272

  28. [28]

    Wenjie Shi, Shiji Song, Hui Wu, Ya-Chien Hsu, Cheng Wu, and Gao Huang. 2019. Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning.Advances in Neural Information Processing Systems32 (2019)

  29. [29]

    Ke Sun, Yafei Li, Yingnan Chen, Yingbin Lin, Hongyuan Zha, and Baoyuan Wu. 2021. Damped Anderson Mixing for Deep Reinforcement Learning: Acceleration, Convergence, and Stabilization. InAdvances in Neural Information Processing Systems, Vol. 34

  30. [30]

    Pyscf: the python-based simulations of chemistry framework,

    Qiming Sun, Timothy C. Berkelbach, Nick S. Blunt, George H. Booth, Sheng Guo, Zhendong Li, Junzi Liu, James D. McClain, Elvira R. Sayfutyarova, Sandeep Sharma, Sebastian Wouters, and Garnet Kin-Lic Chan. 2018. PySCF: The Python-Based Simulations of Chemistry Framework.WIREs Computational Molecular Science8, 1 (2018), e1340. doi:10.1002/wcms.1340

  31. [31]

    Attila Szabo and Neil S. Ostlund. 1996.Modern Quantum Chemistry: Introduction to Advanced Electronic Structure Theory(revised ed.). Dover Publications. 15

  32. [32]

    Austin Ellis, Thomas Evans, Steven Hamilton, C

    Alex Toth, J. Austin Ellis, Thomas Evans, Steven Hamilton, C. T. Kelley, Roger Pawlowski, and Stuart Slattery. 2017. Local Improvement Results for Anderson Acceleration with Inaccurate Function Evaluations.SIAM Journal on Scientific Computing39, 5 (2017), S47–S65. doi:10.1137/16M1080677

  33. [33]

    Alex Toth and C. T. Kelley. 2015. Convergence Analysis for Anderson Acceleration.SIAM J. Numer. Anal.53, 2 (2015), 805–819. doi:10.1137/130919398

  34. [34]

    Walker and Peng Ni

    Homer F. Walker and Peng Ni. 2011. Anderson Acceleration for Fixed-Point Iterations.SIAM J. Numer. Anal.49, 4 (2011), 1715–1735. doi:10.1137/ 10078356X

  35. [35]

    Junzi Zhang, Brendan O’Donoghue, and Stephen Boyd. 2020. Globally Convergent Type-I Anderson Acceleration for Non-Smooth Fixed-Point Iterations.SIAM Journal on Optimization30, 4 (2020), 3170–3197. doi:10.1137/18M1232772 16