pith. sign in

arxiv: 2605.29304 · v1 · pith:GF5OUP3Snew · submitted 2026-05-28 · 🧮 math.NA · cs.NA

On subspace-constrained preconditioning for randomized iterative methods

Pith reviewed 2026-06-29 06:19 UTC · model grok-4.3

classification 🧮 math.NA cs.NA
keywords subspace-constrained preconditioningrandomized iterative methodslinear systemsQR factorizationconvergence analysispreconditioning
0
0 comments X

The pith

A QR-like factorization allows subspace-constrained preconditioning to reduce randomized linear solvers to smaller systems with linear convergence in expectation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper refines subspace-constrained preconditioning for randomized iterative methods that solve linear systems. The approach introduces a QR-like factorization to convert the system into an equivalent block-orthogonal form without relying on full-rank assumptions from earlier methods. The reformulation shrinks the problem to a smaller linear system possessing a favorable distribution of singular values when an appropriate initial point is used. The framework operates implicitly inside the iterations, avoiding the expense of building an explicit preconditioner or preconditioned matrix. It further incorporates orthogonalized search directions derived from stochastic gradients along with accelerated versions, and establishes linear convergence in expectation for the overall procedure.

Core claim

The subspace-constrained preconditioning framework, realized through an implicit QR-like factorization, transforms the original linear system into block-orthogonal form. This step reduces the task to solving a smaller linear system with improved singular value properties from a suitable starting vector. The resulting algorithmic structure converges linearly in expectation while remaining computationally efficient by not constructing preconditioners explicitly.

What carries the argument

The QR-like factorization that produces a block-orthogonal equivalent system, enabling reduction to a smaller problem with favorable singular values.

If this is right

  • The technique eliminates the need for full-rank assumptions required in previous subspace preconditioning work.
  • Implementation stays implicit, sidestepping the high cost of forming a complete preconditioned system.
  • Orthogonal search directions constructed from stochastic gradients support accelerated variants of the method.
  • Linear convergence holds in expectation under the stated conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The reduction to smaller systems may extend naturally to other classes of iterative solvers beyond randomized ones.
  • Choosing the initial point to achieve the favorable singular value distribution could be explored through adaptive strategies in practice.
  • The implicit nature suggests the approach scales well to very large sparse systems where explicit matrices are impractical.

Load-bearing premise

The reformulation reduces the problem to solving a smaller linear system with a favorable singular value distribution, provided an appropriate initial point is employed.

What would settle it

Running the method on a test linear system with a chosen initial point and checking whether the expected residual or error decreases at a constant linear rate per iteration; failure to observe this rate would challenge the convergence claim.

Figures

Figures reproduced from arXiv: 2605.29304 by Deren Han, Hou-Duo Qi, Jiaxin Xie, Yonghan Sun.

Figure 1
Figure 1. Figure 1: Figures illustrate the singular value distribution of AIrP and A. The title of each subplot indicates the corresponding values of mp. The coefficient matrices are generated as (5000, 1000, 900, 300, 300, 2) matrices with RL = [900, 1000], RM = [300, 400], and RS = [50, 150]. factorizations, and adaptive updates required in their row-selection procedures. SkCPQR￾IS-Krylov-PS reduces the CPU time compared wi… view at source ↗
Figure 2
Figure 2. Figure 2: Figures depict the evolution of RSE with respect to the num￾ber of iterations (top) and the CPU time (bottom). The title of each sub￾plot indicates the corresponding values of mp. The coefficient matrices are generated as (5000, 1000, 900, 300, 300, 2) matrices with RL = [900, 1000], RM = [300, 400], and RS = [50, 150]. The other parameters are fixed as q = 50 and ℓ = 10. mp also decreases the CPU time, wh… view at source ↗
Figure 3
Figure 3. Figure 3: Performance of SqNorm-IS-Krylov-PS with different values of q, ℓ, and mp. The top row shows the number of full iterations k · q mr , and the bottom row shows the CPU time. Each subplot title indicates the corresponding values of mp. The coefficient matrices are generated as (1024, 128, 128, 16, 16, 2) matrices with RL = [900, 1000], RM = [300, 400], and RS = [50, 150] [PITH_FULL_IMAGE:figures/full_fig_p02… view at source ↗
Figure 4
Figure 4. Figure 4: Performance of IS-Krylov-PS and SqNorm-IS-Krylov-PS for lin￾ear systems with coefficient matrices from LIBSVM [5]. Figures depict the evolution of RSE with respect to the number of iterations and the CPU time. Each plot title indicates the dataset name and data dimensions. We set q = 300 and ℓ = 10. right-hand side is set to b = Ax∗ . The iterative methods are terminated when their solution accuracy reache… view at source ↗
Figure 5
Figure 5. Figure 5: The figures illustrate the evolution of CPU time with respect to the number of rows m. The title of each subplot indicates the corre￾sponding values of n and mp. The coefficient matrices are generated as (m, n, n, 100, 100, 2) matrices with RL = [900, 1000], RM = [300, 400], and RS = [50, 150]. The other parameters are fixed as ℓ = 50 and q = 30 [PITH_FULL_IMAGE:figures/full_fig_p027_5.png] view at source ↗
read the original abstract

In this paper, we further investigate and refine the subspace-constrained preconditioning technique to enhance the theoretical and numerical convergence properties of randomized iterative methods for solving linear systems. In particular, we design a QR-like factorization that transforms the original linear system into an equivalent block-orthogonal form, thus avoiding the full-rank assumptions adopted in existing work. Moreover, this reformulation reduces the problem to solving a smaller linear system with a favorable singular value distribution, provided an appropriate initial point is employed. The proposed framework can be implemented implicitly within the iteration and does not require explicitly constructing either a preconditioner matrix or a preconditioned linear system, which eliminates the prohibitive cost of forming a fully preconditioned system. Furthermore, we construct orthogonalized search directions from stochastic gradients and develop accelerated variants of the framework. We prove that the proposed algorithmic framework converges linearly in expectation. Numerical experiments demonstrate the benefits of the proposed preconditioning strategy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops a subspace-constrained preconditioning framework for randomized iterative methods solving linear systems. It introduces a QR-like factorization that converts the system to an equivalent block-orthogonal form without requiring full-rank assumptions, reduces the problem to a smaller linear system whose singular values are claimed to be favorable (provided an appropriate initial point), implements the approach implicitly without forming explicit preconditioners, constructs orthogonalized search directions from stochastic gradients, develops accelerated variants, proves linear convergence in expectation, and reports numerical benefits.

Significance. If the linear convergence holds without restrictive conditions on the initial point, the work would strengthen the theoretical foundation for randomized solvers by providing an implicit preconditioning strategy that avoids full-rank assumptions and expensive matrix constructions. The combination of the factorization reformulation, implicit implementation, and accelerated variants could improve practical efficiency for large-scale systems.

major comments (2)
  1. [Abstract / reformulation section] Abstract and the reformulation section: the reduction to a smaller linear system with favorable singular value distribution is stated to hold only 'provided an appropriate initial point is employed.' This condition appears load-bearing for both the spectral claim and the subsequent linear convergence proof; the manuscript must either supply a general, constructive procedure for obtaining such an initial point that works for arbitrary instances or clarify that the guarantee is conditional on a specially chosen start.
  2. [Convergence theorem] Convergence theorem (presumably §4 or the main theorem): the proof of linear convergence in expectation must be checked against whether it relies on the same initial-point condition used for the singular-value benefit. If the theorem statement does not explicitly restrict the initial vector, the derivation should be examined for an implicit assumption that would invalidate the result for generic starts.
minor comments (2)
  1. [Reformulation section] Clarify the precise relationship between the block-orthogonal form obtained by the QR-like factorization and the original system; an explicit statement of equivalence (including any rank or consistency conditions) would aid readability.
  2. [Implementation section] The description of the implicit implementation should include a short pseudocode or algorithmic outline showing how the factorization step is folded into the iteration without explicit matrix construction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and valuable comments on our manuscript. We address the two major comments point by point below, indicating the revisions we will make.

read point-by-point responses
  1. Referee: [Abstract / reformulation section] Abstract and the reformulation section: the reduction to a smaller linear system with favorable singular value distribution is stated to hold only 'provided an appropriate initial point is employed.' This condition appears load-bearing for both the spectral claim and the subsequent linear convergence proof; the manuscript must either supply a general, constructive procedure for obtaining such an initial point that works for arbitrary instances or clarify that the guarantee is conditional on a specially chosen start.

    Authors: We agree that the dependence on an appropriate initial point for the favorable singular-value claim should be stated more explicitly. In the revised manuscript we will (i) retain the existing phrasing in the abstract but add a dedicated remark in the reformulation section that the spectral benefit is conditional, and (ii) supply a constructive, easily implementable procedure: the initial vector is taken as the orthogonal projection of an arbitrary starting guess onto the column space of the sketching matrix (or, equivalently, the solution of a small least-squares problem whose size is independent of the original dimension). This choice is always feasible and can be computed implicitly within the same iteration framework. We will also update the abstract to reference this procedure. revision: yes

  2. Referee: [Convergence theorem] Convergence theorem (presumably §4 or the main theorem): the proof of linear convergence in expectation must be checked against whether it relies on the same initial-point condition used for the singular-value benefit. If the theorem statement does not explicitly restrict the initial vector, the derivation should be examined for an implicit assumption that would invalidate the result for generic starts.

    Authors: We have re-examined the proof of the main convergence theorem. The linear rate is derived under the same block-orthogonal reformulation that yields the favorable singular values; consequently the analysis does rely on the initial vector satisfying the condition used for the spectral claim. The current theorem statement does not make this restriction explicit. In the revision we will add the precise assumption on the initial vector to the theorem statement and to the statement of the main result, and we will insert a short paragraph immediately after the theorem that recalls why the chosen initialization satisfies the hypothesis. No other changes to the proof are required. revision: yes

Circularity Check

0 steps flagged

No circularity: convergence proof is independent of fitted inputs or self-referential definitions

full rationale

The paper derives linear convergence in expectation for the subspace-constrained framework via reformulation to an equivalent block-orthogonal system (via QR-like factorization) followed by standard analysis of randomized iterative methods on the reduced system. The 'appropriate initial point' is an explicit modeling assumption stated in the abstract, not a hidden fit or self-definition. No equations reduce a claimed prediction to a fitted parameter by construction, and no load-bearing step relies on a self-citation chain that itself lacks external verification. The approach builds on prior randomized methods but adds an explicit factorization step whose spectral benefit is analyzed directly rather than assumed via renaming or smuggling.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard linear algebra tools and domain assumptions from randomized iterative methods; no free parameters, invented entities, or ad-hoc axioms are described in the abstract.

axioms (1)
  • domain assumption A QR-like factorization exists that transforms the linear system into an equivalent block-orthogonal form without requiring full rank.
    Invoked to avoid full-rank assumptions of prior work and enable the reduction to a smaller system.

pith-pipeline@v0.9.1-grok · 5687 in / 1167 out tokens · 31648 ms · 2026-06-29T06:19:22.854054+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

86 extracted references · 19 canonical work pages · 8 internal anchors

  1. [1]

    Amsel, Y

    N. Amsel, Y. Baumann, P. Beckman, P. B¨ urgisser, C. Cama˜ no, et al. Linear systems and eigenvalue problems: open questions from a Simons workshop.arXiv preprint arXiv:2602.05394, 2026

  2. [2]

    K. B. Athreya and S. N. Lahiri.Measure theory and probability theory. Springer, New York, 2006

  3. [3]

    Avron, P

    H. Avron, P. Maymounkov, and S. Toledo. Blendenpik: supercharging LAPACK’s least-squares solver. SIAM J. Sci. Comput., 32(3):1217–1236, 2010

  4. [4]

    M. Benzi. Preconditioning techniques for large linear systems: a survey.J. Comput. Phys., 182(2):418– 477, 2002

  5. [5]

    Chang and C.-J

    C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines.ACM Trans. Intell. Syst. Technol., 2(3):1–27, 2011

  6. [6]

    Y. Chen, E. N. Epperly, J. A. Tropp, and R. J. Webber. Randomly pivoted Cholesky: practical ap- proximation of a kernel matrix with few entry evaluations.Comm. Pure Appl. Math., 78(5):995–1041, 2025

  7. [7]

    Cortinovis and D

    A. Cortinovis and D. Kressner. Adaptive randomized pivoting for column subset selection, DEIM, and low-rank approximation.SIAM J. Matrix Anal. Appl., 47(1):25–47, 2026

  8. [8]

    E. J. Craig. The n-step iteration procedures.J. Math. Phys., 34(1-4):64–73, 1955

  9. [9]

    J. W. Demmel. The probability that a numerical analysis problem is difficult.Math. Comp., 50(182):449– 480, 1988

  10. [10]

    Derezi´ nski and M

    M. Derezi´ nski and M. W. Mahoney. Recent and upcoming developments in randomized numerical linear algebra for machine learning. InProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 6470–6479, 2024

  11. [11]

    Derezi´ nski and J

    M. Derezi´ nski and J. Yang. Solving dense linear systems faster than via preconditioning. InProceedings of the 56th Annual ACM Symposium on Theory of Computing, pages 1118–1129, 2024

  12. [12]

    Deshpande, L

    A. Deshpande, L. Rademacher, S. S. Vempala, and G. Wang. Matrix approximation and projective clustering via volume sampling.Theory Comput., 2(1):225–247, 2006

  13. [13]

    Y. Dong, C. Chen, P.-G. Martinsson, and K. Pearce. Robust blockwise random pivoting: fast and accurate adaptive interpolative decomposition.SIAM J. Matrix Anal. Appl., 46(3):1791–1815, 2025

  14. [14]

    Dong and P.-G

    Y. Dong and P.-G. Martinsson. Simpler is better: a comparative study of randomized pivoting algo- rithms for CUR and interpolative decompositions.Adv. Comput. Math., 49(4):66, 2023

  15. [15]

    J. A. Duersch and M. Gu. Randomized QR with column pivoting.SIAM J. Sci. Comput., 39(4):C263– C291, 2017

  16. [16]

    J. A. Duersch and M. Gu. Randomized projection for rank-revealing matrix factorizations and low-rank approximations.SIAM Rev., 62(3):661–682, 2020

  17. [17]

    Eckart and G

    C. Eckart and G. Young. The approximation of one matrix by another of lower rank.Psychometrika, 1(3):211–218, 1936. SUBSPACE-CONSTRAINED PRECONDITIONING FOR RIM 29

  18. [18]

    Ehrig and P

    R. Ehrig and P. Deuflhard. GMERR—an error-minimizing variant of GMRES. Technical Report SC- 97-63, ZIB, 1997

  19. [19]

    E. N. Epperly. Adaptive randomized pivoting and volume sampling.arXiv preprint arXiv:2510.02513, 2025

  20. [20]

    E. N. Epperly, M. Meier, and Y. Nakatsukasa. Fast randomized least-squares solvers can be just as accurate and stable as classical direct solvers.Commun. Pure Appl. Math., 79(2):293–339, 2026

  21. [21]

    Fletcher.Practical methods of optimization

    R. Fletcher.Practical methods of optimization. John Wiley & Sons, Chichester, 2013

  22. [22]

    Frieze, R

    A. Frieze, R. Kannan, and S. Vempala. Fast Monte-Carlo algorithms for finding low-rank approxima- tions.J. ACM, 51(6):1025–1041, 2004

  23. [23]

    S. Garg, A. S. Berahas, and M. Derezi´ nski. Second-order information promotes mini-batch robustness in variance-reduced gradients.J. Mach. Learn. Res., 26(306):1–49, 2025

  24. [24]

    Garrigos and R

    G. Garrigos and R. M. Gower. Handbook of convergence theorems for (stochastic) gradient methods. arXiv preprint arXiv:2301.11235, 2023

  25. [25]

    A. Gaul, M. H. Gutknecht, J. Liesen, and R. Nabben. A framework for deflated and augmented Krylov subspace methods.SIAM J. Matrix Anal. Appl., 34(2):495–518, 2013

  26. [26]

    G. H. Golub and C. F. Van Loan.Matrix computations. Johns Hopkins University Press, Baltimore, 2013

  27. [27]

    Goswami and B

    A. Goswami and B. V. Rao.Measure theory for analysis and probability. Springer, Singapore, 2025

  28. [28]

    R. M. Gower, N. Loizou, X. Qian, A. Sailanbayev, E. Shulgin, and P. Richt´ arik. SGD: general analysis and improved rates. InProceedings of the 36th International Conference on Machine Learning, pages 5200–5209, 2019

  29. [29]

    R. M. Gower and P. Richt´ arik. Randomized iterative methods for linear systems.SIAM J. Matrix Anal. Appl., 36(4):1660–1690, 2015

  30. [30]

    L. Guo, R. Xiang, D. Han, and J. Xie. Enhanced randomized Douglas-Rachford method: Improved probabilities and adaptive momentum.arXiv preprint arXiv:2506.10261, 2025

  31. [31]

    Halko, P.-G

    N. Halko, P.-G. Martinsson, and J. A. Tropp. Finding structure with randomness: probabilistic algo- rithms for constructing approximate matrix decompositions.SIAM Rev., 53(2):217–288, 2011

  32. [32]

    D. Han, Y. Su, and J. Xie. Randomized Douglas–Rachford methods for linear systems: improved accuracy and efficiency.SIAM J. Optim., 34(1):1045–1070, 2024

  33. [33]

    Han and J

    D. Han and J. Xie. On pseudoinverse-free randomized methods for linear systems: unified framework and acceleration.Optim. Methods Softw., 41(1):82–117, 2026

  34. [34]

    I. C. Ipsen and A. K. Saibaba. Many (most?) column subset selection criteria are NP hard.arXiv preprint arXiv:2511.02740, 2025

  35. [35]

    Kaczmarz

    S. Kaczmarz. Angen¨ aherte aufl¨ osung von systemen linearer glei-chungen.Bull. Int. Acad. Pol. Sic. Let., Cl. Sci. Math. Nat., pages 355–357, 1937

  36. [36]

    Ke and H

    Y. Ke and H. Luo. Robust Kaczmarz methods for nearly singular linear systems.arXiv preprint arXiv:2602.21916, 2026

  37. [37]

    C. T. Kelley.Iterative methods for linear and nonlinear equations. SIAM, Philadelphia, 1995

  38. [38]

    D. P. Kingma and J. Ba. Adam: a method for stochastic optimization.arXiv preprint arXiv:1412.6980, 2014

  39. [39]

    S. P. Kolodziej, M. Aznaveh, M. Bullock, J. David, T. A. Davis, M. Henderson, Y. Hu, and R. Sand- strom. The SuiteSparse matrix collection website interface.J. Open Source Softw., 4(35):1244, 2019

  40. [40]

    D. Kovalev. SGD with adaptive preconditioning: unified analysis and momentum acceleration.arXiv preprint arXiv:2506.23803, 2025

  41. [41]

    X.-L. Li. Preconditioned stochastic gradient descent.IEEE Trans. Neural Netw. Learn. Syst., 29(5):1454–1466, 2017

  42. [42]

    Liu and S

    J. Liu and S. Wright. An accelerated randomized Kaczmarz algorithm.Math. Comp., 85(297):153–178, 2016

  43. [43]

    Loizou and P

    N. Loizou and P. Richt´ arik. Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods.Comput. Optim. Appl., 77(3):653–710, 2020

  44. [44]

    Lok and E

    J. Lok and E. Rebrova. A subspace constrained randomized Kaczmarz method for structure or external knowledge exploitation.Linear Algebra Appl., 698:220–260, 2024. 30 YONGHAN SUN, HOU-DUO QI, DEREN HAN, AND JIAXIN XIE

  45. [45]

    Lok and E

    J. Lok and E. Rebrova. Subspace-constrained randomized coordinate descent for linear systems with good low-rank matrix approximations.arXiv preprint arXiv:2506.09394, 2025

  46. [46]

    D. A. Lorenz and M. Winkler. Minimal error momentum Bregman-Kaczmarz.Linear Algebra Appl., 709:416–448, 2025

  47. [47]

    Ma and D

    A. Ma and D. Needell. Stochastic gradient descent for linear systems with missing data.Numer. Math. Theory Methods Appl., 12(1):1–20, 2019

  48. [48]

    Martens and R

    J. Martens and R. Grosse. Optimizing neural networks with Kronecker-factored approximate curvature. arXiv preprint arXiv:1503.05671, 2015

  49. [49]

    Martinsson and J

    P.-G. Martinsson and J. A. Tropp. Randomized numerical linear algebra: foundations and algorithms. Acta Numer., 29:403–572, 2020

  50. [50]

    Meier, Y

    M. Meier, Y. Nakatsukasa, A. Townsend, and M. Webb. Are sketch-and-precondition least squares solvers numerically stable?SIAM J. Matrix Anal. Appl., 45(2):905–929, 2024

  51. [51]

    X. Meng, M. A. Saunders, and M. W. Mahoney. LSRN: a parallel iterative solver for strongly over- or underdetermined systems.SIAM J. Sci. Comput., 36(2):C95–C118, 2014

  52. [52]

    Spectrum Approximation Beyond Fast Matrix Multiplication: Algorithms and Hardness

    C. Musco, P. Netrapalli, A. Sidford, S. Ubaru, and D. P. Woodruff. Spectrum approximation beyond fast matrix multiplication: algorithms and hardness.arXiv preprint arXiv:1704.04163, 2017

  53. [53]

    I. Necoara. Faster randomized block Kaczmarz algorithms.SIAM J. Matrix Anal. Appl., 40(4):1425– 1452, 2019

  54. [54]

    Needell, N

    D. Needell, N. Srebro, and R. Ward. Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm.Math. Program., 155:549–573, 2016

  55. [55]

    Needell and J

    D. Needell and J. A. Tropp. Paved with good intentions: analysis of a randomized block Kaczmarz method.Linear Algebra Appl., 441:199–221, 2014

  56. [56]

    Nemirovski, A

    A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming.SIAM J. Optim., 19(4):1574–1609, 2009

  57. [57]

    Pearce, C

    K. Pearce, C. Chen, Y. Dong, and P.-G. Martinsson. Adaptive parallelizable algorithms for interpolative decompositions via partially pivoted LU.Numer. Linear Algebra Appl., 32(1):e70002, 2025

  58. [58]

    K. J. Pearce and P.-G. Martinsson. Randomized algorithms for low-rank matrix and tensor decompo- sitions.arXiv preprint arXiv:2512.05286, 2025

  59. [59]

    S. J. Reddi, S. Kale, and S. Kumar. On the convergence of Adam and beyond.arXiv preprint arXiv:1904.09237, 2019

  60. [60]

    J. Rieger. Generalized Gearhart–Koshy acceleration for the Kaczmarz method.Math. Comp., 92:1251– 1272, 2023

  61. [61]

    Robbins and S

    H. Robbins and S. Monro. A stochastic approximation method.Ann. Math. Statist., 22:400–407, 1951

  62. [62]

    Saad.Iterative methods for sparse linear systems

    Y. Saad.Iterative methods for sparse linear systems. SIAM, Philadelphia, 2003

  63. [63]

    Saad and M

    Y. Saad and M. H. Schultz. GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems.SIAM J. Sci. Stat. Comput., 7(3):856–869, 1986

  64. [64]

    Sch¨ opfer and D

    F. Sch¨ opfer and D. A. Lorenz. Linear convergence of the randomized sparse Kaczmarz method.Math. Program., 173(1):509–536, 2019

  65. [65]

    N. N. Schraudolph. Fast curvature matrix-vector products for second-order gradient descent.Neural Comput., 14(7):1723–1738, 2002

  66. [66]

    A. J. Scott and G. P. Styan. On a separation theorem for generalized eigenvalues and a problem in the analysis of sample surveys.Linear Algebra Appl., 70:209–224, 1985

  67. [67]

    Scott and M

    J. Scott and M. T˚ uma. Sparse linear least-squares problems.Acta Numer., 34:891–1010, 2025

  68. [68]

    Design Criteria for SGD Preconditioners: Local Conditioning, Noise Floors, and Basin Stability

    M. Scott, T. Xu, Z. Tang, A. Pichette-Emmons, Q. Ye, Y. Saad, and Y. Xi. Designing preconditioners for SGD: local conditioning, noise floors, and basin stability.arXiv preprint arXiv:2511.19716, 2025

  69. [69]

    J. R. Shewchuk. An introduction to the conjugate gradient method without the agonizing pain. Technical Report CMU-CS-94-125, Carnegie Mellon University, Pittsburgh, PA, 1994

  70. [70]

    Strohmer and R

    T. Strohmer and R. Vershynin. A randomized Kaczmarz algorithm with exponential convergence.J. Fourier Anal. Appl., 15(2):262–278, 2009

  71. [71]

    Y. Su, D. Han, Y. Zeng, and J. Xie. On greedy multi-step inertial randomized Kaczmarz method for solving linear systems.Calcolo, 61(4):68, 2024. SUBSPACE-CONSTRAINED PRECONDITIONING FOR RIM 31

  72. [72]

    Y. Sun, D. Han, and J. Xie. Connecting randomized iterative methods with Krylov subspaces.arXiv preprint arXiv:2505.20602, 2025

  73. [73]

    Y. Wang, Y. Sun, D. Han, and J. Xie. Linear convergence of Gearhart–Koshy accelerated Kaczmarz methods for tensor linear systems.arXiv preprint arXiv:2604.05816, 2026

  74. [74]

    R. Weiss. Error-minimizing Krylov subspace methods.SIAM J. Sci. Comput., 15(3):511–527, 1994

  75. [75]

    R. Weiss. A theoretical overview of Krylov subspace methods.Appl. Numer. Math., 19(3):207–233, 1995

  76. [76]

    Xiang, J

    R. Xiang, J. Xie, and Q. Zhang. Randomized block Kaczmarz with volume sampling: momentum acceleration and efficient implementation.arXiv preprint arXiv:2503.13941, 2025

  77. [77]

    Xie, H.-D

    J. Xie, H.-D. Qi, and D. Han. Randomized iterative methods for generalized absolute value equations: solvability and error bounds.SIAM J. Optim., 35(3):1731–1760, 2025

  78. [78]

    Xie and Z

    J. Xie and Z. Xu. Subset selection for matrices with fixed blocks.Israel J. Math., 245(1):1–26, 2021

  79. [79]

    Q. Ye. Preconditioning for accelerated gradient descent optimization and regularization.arXiv preprint arXiv:2410.00232, 2024

  80. [80]

    Randomized conjugate gradient least squares

    Y. Zeng, J.-F. Cai, D. Han, and J. Xie. Randomized conjugate gradient least squares.arXiv preprint arXiv:2605.25034, 2026

Showing first 80 references.