On subspace-constrained preconditioning for randomized iterative methods

Deren Han; Hou-Duo Qi; Jiaxin Xie; Yonghan Sun

arxiv: 2605.29304 · v1 · pith:GF5OUP3Snew · submitted 2026-05-28 · 🧮 math.NA · cs.NA

Yonghan Sun, Hou-Duo Qi, Deren Han, Jiaxin Xie

Pith reviewed 2026-06-29 06:19 UTC · model grok-4.3

keywords: subspace-constrained preconditioning, randomized iterative methods, linear systems, QR factorization, convergence analysis, preconditioning

The pith

A QR-like factorization allows subspace-constrained preconditioning to reduce randomized linear solvers to smaller systems with linear convergence in expectation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper refines subspace-constrained preconditioning for randomized iterative methods that solve linear systems. The approach introduces a QR-like factorization to convert the system into an equivalent block-orthogonal form without relying on full-rank assumptions from earlier methods. The reformulation shrinks the problem to a smaller linear system possessing a favorable distribution of singular values when an appropriate initial point is used. The framework operates implicitly inside the iterations, avoiding the expense of building an explicit preconditioner or preconditioned matrix. It further incorporates orthogonalized search directions derived from stochastic gradients along with accelerated versions, and establishes linear convergence in expectation for the overall procedure.

Core claim

The subspace-constrained preconditioning framework, realized through an implicit QR-like factorization, transforms the original linear system into block-orthogonal form. This step reduces the task to solving a smaller linear system with improved singular value properties from a suitable starting vector. The resulting algorithmic structure converges linearly in expectation while remaining computationally efficient by not constructing preconditioners explicitly.
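To make "not constructing preconditioners explicitly" concrete, here is a minimal sketch in the spirit of the subspace-constrained randomized Kaczmarz method of Lok and Rebrova [44], not the paper's own algorithm. The function name `subspace_constrained_rk`, the choice of block, and uniform row sampling are assumptions made for illustration. The sketch also assumes the selected block has full row rank, which is the kind of assumption the paper's QR-like factorization is said to remove.

```python
import numpy as np

def subspace_constrained_rk(A, b, block, iters, seed=0):
    """Kaczmarz steps confined to {x : A[block] x = b[block]}.

    The projector P = I - Q Q^T is never formed; it is applied to one row at a time.
    """
    rng = np.random.default_rng(seed)
    A_blk, b_blk = A[block], b[block]
    # Orthonormal basis Q for the row space of the block
    # (illustrative assumption: A_blk has full row rank).
    Q, _ = np.linalg.qr(A_blk.T)
    # Start on the constraint set: minimum-norm solution of the block equations.
    x = np.linalg.lstsq(A_blk, b_blk, rcond=None)[0]
    others = np.setdiff1d(np.arange(A.shape[0]), block)
    for _ in range(iters):
        i = rng.choice(others)                 # uniform sampling, chosen for simplicity
        d = A[i] - Q @ (Q.T @ A[i])            # P a_i via two thin matrix-vector products
        nrm2 = d @ d
        if nrm2 < 1e-14:                       # row (numerically) inside the block's row space
            continue
        x = x - (A[i] @ x - b[i]) / nrm2 * d   # step stays inside the constraint set
    return x

# Small consistent test problem (synthetic data).
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 30))
x_star = rng.standard_normal(30)
b = A @ x_star
block = np.arange(10)
x = subspace_constrained_rk(A, b, block, iters=3000)
err = np.linalg.norm(x - x_star)
```

Because every search direction `d` is orthogonal to the rows of the block, the block equations remain satisfied throughout, and the iteration effectively works on a lower-dimensional problem without a preconditioned matrix ever being stored.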

What carries the argument

The QR-like factorization that produces a block-orthogonal equivalent system, enabling reduction to a smaller problem with favorable singular values.

If this is right

The technique eliminates the need for full-rank assumptions required in previous subspace preconditioning work.
Implementation stays implicit, sidestepping the high cost of forming a complete preconditioned system.
Orthogonal search directions constructed from stochastic gradients support accelerated variants of the method.
Linear convergence holds in expectation under the stated conditions.
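For readers less familiar with the term, linear convergence in expectation is usually stated as a geometric bound on the mean squared error. The symbols below ($x_k$ for the $k$-th iterate, $x_*$ for the target solution, $\rho$ for the contraction factor) are notation assumed here for illustration; the paper's exact norm, constants, and conditions on $x_0$ may differ.

```latex
\mathbb{E}\left[\|x_k - x_*\|_2^2\right] \le \rho^{k}\,\|x_0 - x_*\|_2^2,
\qquad 0 \le \rho < 1 .
```

For plain randomized Kaczmarz with squared-row-norm sampling on a consistent full-column-rank system, the known factor is $\rho = 1 - \sigma_{\min}^2(A)/\|A\|_F^2$ [70], which is why a more favorable singular value distribution translates into a faster rate.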

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The reduction to smaller systems may extend naturally to other classes of iterative solvers beyond randomized ones.
Choosing the initial point to achieve the favorable singular value distribution could be explored through adaptive strategies in practice.
The implicit nature suggests the approach scales well to very large sparse systems where explicit matrices are impractical.

Load-bearing premise

The reformulation reduces the problem to solving a smaller linear system with a favorable singular value distribution, provided an appropriate initial point is employed.

What would settle it

Run the method on a test linear system from a chosen initial point and check whether the expected residual or error shrinks by a roughly constant factor per iteration, which is what linear convergence in expectation predicts; failure to observe this would challenge the convergence claim.
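A check of this kind could look like the sketch below. It uses plain randomized Kaczmarz [70] as a stand-in solver, since the paper's algorithm is not reproduced on this page; the function names, problem sizes, and number of trials are illustrative assumptions. The idea is to average the squared error over several independent runs and fit a straight line to its logarithm.

```python
import numpy as np

def randomized_kaczmarz(A, b, x0, iters, rng):
    """Plain randomized Kaczmarz with squared-row-norm sampling (stand-in solver)."""
    row_norms_sq = np.sum(A * A, axis=1)
    probs = row_norms_sq / row_norms_sq.sum()
    x = x0.copy()
    history = [x.copy()]
    for _ in range(iters):
        i = rng.choice(A.shape[0], p=probs)
        x = x - (A[i] @ x - b[i]) / row_norms_sq[i] * A[i]
        history.append(x.copy())
    return history

def mean_sq_error_curve(A, b, x_star, iters, trials, seed=0):
    """Average ||x_k - x_star||^2 over independent runs to estimate the expectation."""
    rng = np.random.default_rng(seed)
    errs = np.zeros(iters + 1)
    for _ in range(trials):
        hist = randomized_kaczmarz(A, b, np.zeros(A.shape[1]), iters, rng)
        errs += np.array([np.sum((x - x_star) ** 2) for x in hist])
    return errs / trials

# Synthetic consistent system (illustrative sizes).
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 20))
x_star = rng.standard_normal(20)
b = A @ x_star

curve = mean_sq_error_curve(A, b, x_star, iters=400, trials=20)
# Slope of log(error) against the iteration count gives the per-iteration factor.
slope = np.polyfit(np.arange(curve.size), np.log(curve), 1)[0]
empirical_rate = np.exp(slope)
# Known worst-case factor for this stand-in method: 1 - sigma_min(A)^2 / ||A||_F^2.
bound = 1.0 - np.linalg.svd(A, compute_uv=False)[-1] ** 2 / np.sum(A * A)
print(f"empirical factor per iteration: {empirical_rate:.4f}, theoretical bound: {bound:.4f}")
```

A log-error curve that is close to a straight line with an empirical factor below 1 is consistent with linear convergence in expectation; a curve that flattens early would not be.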

Figures

Figures reproduced from arXiv: 2605.29304 by Deren Han, Hou-Duo Qi, Jiaxin Xie, Yonghan Sun.

**Figure 2.** Figures depict the evolution of RSE with respect to the number of iterations (top) and the CPU time (bottom). The title of each subplot indicates the corresponding values of mp. The coefficient matrices are generated as (5000, 1000, 900, 300, 300, 2) matrices with RL = [900, 1000], RM = [300, 400], and RS = [50, 150]. The other parameters are fixed as q = 50 and ℓ = 10. mp also decreases the CPU time, wh…

**Figure 3.** Performance of SqNorm-IS-Krylov-PS with different values of q, ℓ, and mp. The top row shows the number of full iterations k · q mr, and the bottom row shows the CPU time. Each subplot title indicates the corresponding values of mp. The coefficient matrices are generated as (1024, 128, 128, 16, 16, 2) matrices with RL = [900, 1000], RM = [300, 400], and RS = [50, 150].

**Figure 4.** Performance of IS-Krylov-PS and SqNorm-IS-Krylov-PS for linear systems with coefficient matrices from LIBSVM [5]. Figures depict the evolution of RSE with respect to the number of iterations and the CPU time. Each plot title indicates the dataset name and data dimensions. We set q = 300 and ℓ = 10. The right-hand side is set to b = Ax∗. The iterative methods are terminated when their solution accuracy reache…

**Figure 5.** The figures illustrate the evolution of CPU time with respect to the number of rows m. The title of each subplot indicates the corresponding values of n and mp. The coefficient matrices are generated as (m, n, n, 100, 100, 2) matrices with RL = [900, 1000], RM = [300, 400], and RS = [50, 150]. The other parameters are fixed as ℓ = 50 and q = 30.

Original abstract

In this paper, we further investigate and refine the subspace-constrained preconditioning technique to enhance the theoretical and numerical convergence properties of randomized iterative methods for solving linear systems. In particular, we design a QR-like factorization that transforms the original linear system into an equivalent block-orthogonal form, thus avoiding the full-rank assumptions adopted in existing work. Moreover, this reformulation reduces the problem to solving a smaller linear system with a favorable singular value distribution, provided an appropriate initial point is employed. The proposed framework can be implemented implicitly within the iteration and does not require explicitly constructing either a preconditioner matrix or a preconditioned linear system, which eliminates the prohibitive cost of forming a fully preconditioned system. Furthermore, we construct orthogonalized search directions from stochastic gradients and develop accelerated variants of the framework. We prove that the proposed algorithmic framework converges linearly in expectation. Numerical experiments demonstrate the benefits of the proposed preconditioning strategy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Refines subspace preconditioning via QR-like factorization to drop full-rank assumptions and enable implicit use, but linear convergence in expectation requires a special initial point whose general availability is unclear.

The paper takes an existing subspace-constrained preconditioning approach for randomized iterative solvers and adds a QR-like factorization step. This produces an equivalent block-orthogonal system without needing full column rank on the constraint matrix, then reduces the work to a smaller system whose singular values are claimed to be favorable. The whole thing runs implicitly inside the iteration so no explicit preconditioner matrix is ever formed. They also orthogonalize the search directions from stochastic gradients and give accelerated variants, plus a proof of linear convergence in expectation.

The implicit implementation and the rank relaxation are the concrete advances. Both address real practical costs in large-scale linear systems, and the framework stays within the randomized Kaczmarz-style literature it cites.

The load-bearing assumption is the initial point. The abstract states the favorable singular-value reduction holds only when an appropriate initial point is employed. If the paper supplies no general construction or guarantee that works for arbitrary right-hand sides and starting vectors, then the convergence rate does not apply to standard starts and the spectral benefit is conditional rather than automatic. That is the main soft spot; everything else follows from prior randomized-method results.

The work is aimed at specialists already using or analyzing randomized iterative methods for linear systems. A reader in that narrow area can extract the factorization trick and the implicit implementation details. It is solid enough on its own terms to merit referee time rather than a desk reject, though the initial-point condition will need clarification in review.

Referee Report

2 major / 2 minor

Summary. The paper develops a subspace-constrained preconditioning framework for randomized iterative methods solving linear systems. It introduces a QR-like factorization that converts the system to an equivalent block-orthogonal form without requiring full-rank assumptions, reduces the problem to a smaller linear system whose singular values are claimed to be favorable (provided an appropriate initial point), implements the approach implicitly without forming explicit preconditioners, constructs orthogonalized search directions from stochastic gradients, develops accelerated variants, proves linear convergence in expectation, and reports numerical benefits.

Significance. If the linear convergence holds without restrictive conditions on the initial point, the work would strengthen the theoretical foundation for randomized solvers by providing an implicit preconditioning strategy that avoids full-rank assumptions and expensive matrix constructions. The combination of the factorization reformulation, implicit implementation, and accelerated variants could improve practical efficiency for large-scale systems.

major comments (2)

[Abstract / reformulation section] Abstract and the reformulation section: the reduction to a smaller linear system with favorable singular value distribution is stated to hold only 'provided an appropriate initial point is employed.' This condition appears load-bearing for both the spectral claim and the subsequent linear convergence proof; the manuscript must either supply a general, constructive procedure for obtaining such an initial point that works for arbitrary instances or clarify that the guarantee is conditional on a specially chosen start.
[Convergence theorem] Convergence theorem (presumably §4 or the main theorem): the proof of linear convergence in expectation must be checked against whether it relies on the same initial-point condition used for the singular-value benefit. If the theorem statement does not explicitly restrict the initial vector, the derivation should be examined for an implicit assumption that would invalidate the result for generic starts.

minor comments (2)

[Reformulation section] Clarify the precise relationship between the block-orthogonal form obtained by the QR-like factorization and the original system; an explicit statement of equivalence (including any rank or consistency conditions) would aid readability.
[Implementation section] The description of the implicit implementation should include a short pseudocode or algorithmic outline showing how the factorization step is folded into the iteration without explicit matrix construction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and valuable comments on our manuscript. We address the two major comments point by point below, indicating the revisions we will make.

Referee: [Abstract / reformulation section] Abstract and the reformulation section: the reduction to a smaller linear system with favorable singular value distribution is stated to hold only 'provided an appropriate initial point is employed.' This condition appears load-bearing for both the spectral claim and the subsequent linear convergence proof; the manuscript must either supply a general, constructive procedure for obtaining such an initial point that works for arbitrary instances or clarify that the guarantee is conditional on a specially chosen start.

Authors: We agree that the dependence on an appropriate initial point for the favorable singular-value claim should be stated more explicitly. In the revised manuscript we will (i) keep the conditional wording in the abstract, extended with a brief reference to the initialization procedure, and add a dedicated remark in the reformulation section that the spectral benefit is conditional, and (ii) supply a constructive, easily implementable procedure: the initial vector is taken as the orthogonal projection of an arbitrary starting guess onto the column space of the sketching matrix (or, equivalently, the solution of a small least-squares problem whose size is independent of the original dimension). This choice is always feasible and can be computed implicitly within the same iteration framework. revision: yes
Referee: [Convergence theorem] Convergence theorem (presumably §4 or the main theorem): the proof of linear convergence in expectation must be checked against whether it relies on the same initial-point condition used for the singular-value benefit. If the theorem statement does not explicitly restrict the initial vector, the derivation should be examined for an implicit assumption that would invalidate the result for generic starts.

Authors: We have re-examined the proof of the main convergence theorem. The linear rate is derived under the same block-orthogonal reformulation that yields the favorable singular values; consequently the analysis does rely on the initial vector satisfying the condition used for the spectral claim. The current theorem statement does not make this restriction explicit. In the revision we will add the precise assumption on the initial vector to the theorem statement and to the statement of the main result, and we will insert a short paragraph immediately after the theorem that recalls why the chosen initialization satisfies the hypothesis. No other changes to the proof are required. revision: yes
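The initialization described in the first simulated response is a standard least-squares computation, sketched below under stated assumptions. The helper name `project_onto_range`, the matrix `S`, and its dimensions are hypothetical; the paper's actual sketching matrix and the subspace it projects onto are not given on this page.

```python
import numpy as np

def project_onto_range(S, guess):
    """Orthogonal projection of `guess` onto range(S) via a least-squares solve."""
    # coeffs minimizes ||S c - guess||_2, so S @ coeffs is the closest point in range(S).
    coeffs = np.linalg.lstsq(S, guess, rcond=None)[0]
    return S @ coeffs

rng = np.random.default_rng(0)
n, p = 500, 8
S = rng.standard_normal((n, p))   # hypothetical sketching matrix with p << n
guess = rng.standard_normal(n)    # arbitrary starting guess
x0 = project_onto_range(S, guess)
residual = guess - x0             # component of the guess orthogonal to range(S)
```

The defining property is that `residual` is orthogonal to every column of `S`, and projecting `x0` a second time leaves it unchanged.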

Circularity Check

0 steps flagged

No circularity: convergence proof is independent of fitted inputs or self-referential definitions

The paper derives linear convergence in expectation for the subspace-constrained framework via reformulation to an equivalent block-orthogonal system (via QR-like factorization) followed by standard analysis of randomized iterative methods on the reduced system. The 'appropriate initial point' is an explicit modeling assumption stated in the abstract, not a hidden fit or self-definition. No equations reduce a claimed prediction to a fitted parameter by construction, and no load-bearing step relies on a self-citation chain that itself lacks external verification. The approach builds on prior randomized methods but adds an explicit factorization step whose spectral benefit is analyzed directly rather than assumed via renaming or smuggling.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard linear algebra tools and domain assumptions from randomized iterative methods; no free parameters, invented entities, or ad-hoc axioms are described in the abstract.

axioms (1)

domain assumption: A QR-like factorization exists that transforms the linear system into an equivalent block-orthogonal form without requiring full rank.
Invoked to avoid full-rank assumptions of prior work and enable the reduction to a smaller system.

pith-pipeline@v0.9.1-grok · 5687 in / 1167 out tokens · 31648 ms · 2026-06-29T06:19:22.854054+00:00 · methodology

Reference graph

Works this paper leans on

86 extracted references · 19 canonical work pages · 8 internal anchors

[1] N. Amsel, Y. Baumann, P. Beckman, P. Bürgisser, C. Camaño, et al. Linear systems and eigenvalue problems: open questions from a Simons workshop. arXiv preprint arXiv:2602.05394, 2026.
[2] K. B. Athreya and S. N. Lahiri. Measure theory and probability theory. Springer, New York, 2006.
[3] H. Avron, P. Maymounkov, and S. Toledo. Blendenpik: supercharging LAPACK's least-squares solver. SIAM J. Sci. Comput., 32(3):1217–1236, 2010.
[4] M. Benzi. Preconditioning techniques for large linear systems: a survey. J. Comput. Phys., 182(2):418–477, 2002.
[5] C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol., 2(3):1–27, 2011.
[6] Y. Chen, E. N. Epperly, J. A. Tropp, and R. J. Webber. Randomly pivoted Cholesky: practical approximation of a kernel matrix with few entry evaluations. Comm. Pure Appl. Math., 78(5):995–1041, 2025.
[7] A. Cortinovis and D. Kressner. Adaptive randomized pivoting for column subset selection, DEIM, and low-rank approximation. SIAM J. Matrix Anal. Appl., 47(1):25–47, 2026.
[8] E. J. Craig. The n-step iteration procedures. J. Math. Phys., 34(1-4):64–73, 1955.
[9] J. W. Demmel. The probability that a numerical analysis problem is difficult. Math. Comp., 50(182):449–480, 1988.
[10] M. Dereziński and M. W. Mahoney. Recent and upcoming developments in randomized numerical linear algebra for machine learning. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 6470–6479, 2024.
[11] M. Dereziński and J. Yang. Solving dense linear systems faster than via preconditioning. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, pages 1118–1129, 2024.
[12] A. Deshpande, L. Rademacher, S. S. Vempala, and G. Wang. Matrix approximation and projective clustering via volume sampling. Theory Comput., 2(1):225–247, 2006.
[13] Y. Dong, C. Chen, P.-G. Martinsson, and K. Pearce. Robust blockwise random pivoting: fast and accurate adaptive interpolative decomposition. SIAM J. Matrix Anal. Appl., 46(3):1791–1815, 2025.
[14] Y. Dong and P.-G. Martinsson. Simpler is better: a comparative study of randomized pivoting algorithms for CUR and interpolative decompositions. Adv. Comput. Math., 49(4):66, 2023.
[15] J. A. Duersch and M. Gu. Randomized QR with column pivoting. SIAM J. Sci. Comput., 39(4):C263–C291, 2017.
[16] J. A. Duersch and M. Gu. Randomized projection for rank-revealing matrix factorizations and low-rank approximations. SIAM Rev., 62(3):661–682, 2020.
[17] C. Eckart and G. Young. The approximation of one matrix by another of lower rank. Psychometrika, 1(3):211–218, 1936.
[18] R. Ehrig and P. Deuflhard. GMERR - an error-minimizing variant of GMRES. Technical Report SC-97-63, ZIB, 1997.
[19] E. N. Epperly. Adaptive randomized pivoting and volume sampling. arXiv preprint arXiv:2510.02513, 2025.
[20] E. N. Epperly, M. Meier, and Y. Nakatsukasa. Fast randomized least-squares solvers can be just as accurate and stable as classical direct solvers. Comm. Pure Appl. Math., 79(2):293–339, 2026.
[21] R. Fletcher. Practical methods of optimization. John Wiley & Sons, Chichester, 2013.
[22] A. Frieze, R. Kannan, and S. Vempala. Fast Monte-Carlo algorithms for finding low-rank approximations. J. ACM, 51(6):1025–1041, 2004.
[23] S. Garg, A. S. Berahas, and M. Dereziński. Second-order information promotes mini-batch robustness in variance-reduced gradients. J. Mach. Learn. Res., 26(306):1–49, 2025.
[24] G. Garrigos and R. M. Gower. Handbook of convergence theorems for (stochastic) gradient methods. arXiv preprint arXiv:2301.11235, 2023.
[25] A. Gaul, M. H. Gutknecht, J. Liesen, and R. Nabben. A framework for deflated and augmented Krylov subspace methods. SIAM J. Matrix Anal. Appl., 34(2):495–518, 2013.
[26] G. H. Golub and C. F. Van Loan. Matrix computations. Johns Hopkins University Press, Baltimore, 2013.
[27] A. Goswami and B. V. Rao. Measure theory for analysis and probability. Springer, Singapore, 2025.
[28] R. M. Gower, N. Loizou, X. Qian, A. Sailanbayev, E. Shulgin, and P. Richtárik. SGD: general analysis and improved rates. In Proceedings of the 36th International Conference on Machine Learning, pages 5200–5209, 2019.
[29] R. M. Gower and P. Richtárik. Randomized iterative methods for linear systems. SIAM J. Matrix Anal. Appl., 36(4):1660–1690, 2015.
[30] L. Guo, R. Xiang, D. Han, and J. Xie. Enhanced randomized Douglas-Rachford method: Improved probabilities and adaptive momentum. arXiv preprint arXiv:2506.10261, 2025.
[31] N. Halko, P.-G. Martinsson, and J. A. Tropp. Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev., 53(2):217–288, 2011.
[32] D. Han, Y. Su, and J. Xie. Randomized Douglas–Rachford methods for linear systems: improved accuracy and efficiency. SIAM J. Optim., 34(1):1045–1070, 2024.
[33] D. Han and J. Xie. On pseudoinverse-free randomized methods for linear systems: unified framework and acceleration. Optim. Methods Softw., 41(1):82–117, 2026.
[34] I. C. Ipsen and A. K. Saibaba. Many (most?) column subset selection criteria are NP hard. arXiv preprint arXiv:2511.02740, 2025.
[35] S. Kaczmarz. Angenäherte Auflösung von Systemen linearer Gleichungen. Bull. Int. Acad. Pol. Sci. Let., Cl. Sci. Math. Nat., pages 355–357, 1937.
[36] Y. Ke and H. Luo. Robust Kaczmarz methods for nearly singular linear systems. arXiv preprint arXiv:2602.21916, 2026.
[37] C. T. Kelley. Iterative methods for linear and nonlinear equations. SIAM, Philadelphia, 1995.
[38] D. P. Kingma and J. Ba. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[39] S. P. Kolodziej, M. Aznaveh, M. Bullock, J. David, T. A. Davis, M. Henderson, Y. Hu, and R. Sandstrom. The SuiteSparse matrix collection website interface. J. Open Source Softw., 4(35):1244, 2019.
[40] D. Kovalev. SGD with adaptive preconditioning: unified analysis and momentum acceleration. arXiv preprint arXiv:2506.23803, 2025.
[41] X.-L. Li. Preconditioned stochastic gradient descent. IEEE Trans. Neural Netw. Learn. Syst., 29(5):1454–1466, 2017.
[42] J. Liu and S. Wright. An accelerated randomized Kaczmarz algorithm. Math. Comp., 85(297):153–178, 2016.
[43] N. Loizou and P. Richtárik. Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods. Comput. Optim. Appl., 77(3):653–710, 2020.
[44] J. Lok and E. Rebrova. A subspace constrained randomized Kaczmarz method for structure or external knowledge exploitation. Linear Algebra Appl., 698:220–260, 2024.
[45] J. Lok and E. Rebrova. Subspace-constrained randomized coordinate descent for linear systems with good low-rank matrix approximations. arXiv preprint arXiv:2506.09394, 2025.
[46] D. A. Lorenz and M. Winkler. Minimal error momentum Bregman-Kaczmarz. Linear Algebra Appl., 709:416–448, 2025.
[47] A. Ma and D. Needell. Stochastic gradient descent for linear systems with missing data. Numer. Math. Theory Methods Appl., 12(1):1–20, 2019.
[48] J. Martens and R. Grosse. Optimizing neural networks with Kronecker-factored approximate curvature. arXiv preprint arXiv:1503.05671, 2015.
[49] P.-G. Martinsson and J. A. Tropp. Randomized numerical linear algebra: foundations and algorithms. Acta Numer., 29:403–572, 2020.
[50] M. Meier, Y. Nakatsukasa, A. Townsend, and M. Webb. Are sketch-and-precondition least squares solvers numerically stable? SIAM J. Matrix Anal. Appl., 45(2):905–929, 2024.
[51] X. Meng, M. A. Saunders, and M. W. Mahoney. LSRN: a parallel iterative solver for strongly over- or underdetermined systems. SIAM J. Sci. Comput., 36(2):C95–C118, 2014.
[52] C. Musco, P. Netrapalli, A. Sidford, S. Ubaru, and D. P. Woodruff. Spectrum approximation beyond fast matrix multiplication: algorithms and hardness. arXiv preprint arXiv:1704.04163, 2017.
[53] I. Necoara. Faster randomized block Kaczmarz algorithms. SIAM J. Matrix Anal. Appl., 40(4):1425–1452, 2019.
[54] D. Needell, N. Srebro, and R. Ward. Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm. Math. Program., 155:549–573, 2016.
[55] D. Needell and J. A. Tropp. Paved with good intentions: analysis of a randomized block Kaczmarz method. Linear Algebra Appl., 441:199–221, 2014.
[56] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM J. Optim., 19(4):1574–1609, 2009.
[57] K. Pearce, C. Chen, Y. Dong, and P.-G. Martinsson. Adaptive parallelizable algorithms for interpolative decompositions via partially pivoted LU. Numer. Linear Algebra Appl., 32(1):e70002, 2025.
[58] K. J. Pearce and P.-G. Martinsson. Randomized algorithms for low-rank matrix and tensor decompositions. arXiv preprint arXiv:2512.05286, 2025.
[59] S. J. Reddi, S. Kale, and S. Kumar. On the convergence of Adam and beyond. arXiv preprint arXiv:1904.09237, 2019.
[60] J. Rieger. Generalized Gearhart–Koshy acceleration for the Kaczmarz method. Math. Comp., 92:1251–1272, 2023.
[61] H. Robbins and S. Monro. A stochastic approximation method. Ann. Math. Statist., 22:400–407, 1951.
[62] Y. Saad. Iterative methods for sparse linear systems. SIAM, Philadelphia, 2003.
[63] Y. Saad and M. H. Schultz. GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 7(3):856–869, 1986.
[64] F. Schöpfer and D. A. Lorenz. Linear convergence of the randomized sparse Kaczmarz method. Math. Program., 173(1):509–536, 2019.
[65] N. N. Schraudolph. Fast curvature matrix-vector products for second-order gradient descent. Neural Comput., 14(7):1723–1738, 2002.
[66] A. J. Scott and G. P. Styan. On a separation theorem for generalized eigenvalues and a problem in the analysis of sample surveys. Linear Algebra Appl., 70:209–224, 1985.
[67] J. Scott and M. Tůma. Sparse linear least-squares problems. Acta Numer., 34:891–1010, 2025.
[68] M. Scott, T. Xu, Z. Tang, A. Pichette-Emmons, Q. Ye, Y. Saad, and Y. Xi. Designing preconditioners for SGD: local conditioning, noise floors, and basin stability. arXiv preprint arXiv:2511.19716, 2025.
[69] J. R. Shewchuk. An introduction to the conjugate gradient method without the agonizing pain. Technical Report CMU-CS-94-125, Carnegie Mellon University, Pittsburgh, PA, 1994.
[70] T. Strohmer and R. Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.
[71] Y. Su, D. Han, Y. Zeng, and J. Xie. On greedy multi-step inertial randomized Kaczmarz method for solving linear systems. Calcolo, 61(4):68, 2024.
[72] Y. Sun, D. Han, and J. Xie. Connecting randomized iterative methods with Krylov subspaces. arXiv preprint arXiv:2505.20602, 2025.
[73] Y. Wang, Y. Sun, D. Han, and J. Xie. Linear convergence of Gearhart–Koshy accelerated Kaczmarz methods for tensor linear systems. arXiv preprint arXiv:2604.05816, 2026.
[74] R. Weiss. Error-minimizing Krylov subspace methods. SIAM J. Sci. Comput., 15(3):511–527, 1994.
[75] R. Weiss. A theoretical overview of Krylov subspace methods. Appl. Numer. Math., 19(3):207–233, 1995.
[76] R. Xiang, J. Xie, and Q. Zhang. Randomized block Kaczmarz with volume sampling: momentum acceleration and efficient implementation. arXiv preprint arXiv:2503.13941, 2025.
[77] J. Xie, H.-D. Qi, and D. Han. Randomized iterative methods for generalized absolute value equations: solvability and error bounds. SIAM J. Optim., 35(3):1731–1760, 2025.
[78] J. Xie and Z. Xu. Subset selection for matrices with fixed blocks. Israel J. Math., 245(1):1–26, 2021.
[79] Q. Ye. Preconditioning for accelerated gradient descent optimization and regularization. arXiv preprint arXiv:2410.00232, 2024.
[80] Y. Zeng, J.-F. Cai, D. Han, and J. Xie. Randomized conjugate gradient least squares. arXiv preprint arXiv:2605.25034, 2026.

Showing first 80 references.

[1] [1]

Amsel, Y

N. Amsel, Y. Baumann, P. Beckman, P. B¨ urgisser, C. Cama˜ no, et al. Linear systems and eigenvalue problems: open questions from a Simons workshop.arXiv preprint arXiv:2602.05394, 2026

work page arXiv 2026

[2] [2]

K. B. Athreya and S. N. Lahiri.Measure theory and probability theory. Springer, New York, 2006

2006

[3] [3]

Avron, P

H. Avron, P. Maymounkov, and S. Toledo. Blendenpik: supercharging LAPACK’s least-squares solver. SIAM J. Sci. Comput., 32(3):1217–1236, 2010

2010

[4] [4]

M. Benzi. Preconditioning techniques for large linear systems: a survey.J. Comput. Phys., 182(2):418– 477, 2002

2002

[5] [5]

Chang and C.-J

C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines.ACM Trans. Intell. Syst. Technol., 2(3):1–27, 2011

2011

[6] [6]

Y. Chen, E. N. Epperly, J. A. Tropp, and R. J. Webber. Randomly pivoted Cholesky: practical ap- proximation of a kernel matrix with few entry evaluations.Comm. Pure Appl. Math., 78(5):995–1041, 2025

2025

[7] [7]

Cortinovis and D

A. Cortinovis and D. Kressner. Adaptive randomized pivoting for column subset selection, DEIM, and low-rank approximation.SIAM J. Matrix Anal. Appl., 47(1):25–47, 2026

2026

[8] [8]

E. J. Craig. The n-step iteration procedures.J. Math. Phys., 34(1-4):64–73, 1955

1955

[9] [9]

J. W. Demmel. The probability that a numerical analysis problem is difficult.Math. Comp., 50(182):449– 480, 1988

1988

[10] [10]

Derezi´ nski and M

M. Derezi´ nski and M. W. Mahoney. Recent and upcoming developments in randomized numerical linear algebra for machine learning. InProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 6470–6479, 2024

2024

[11] M. Dereziński and J. Yang. Solving dense linear systems faster than via preconditioning. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, pages 1118–1129, 2024.

[12] A. Deshpande, L. Rademacher, S. S. Vempala, and G. Wang. Matrix approximation and projective clustering via volume sampling. Theory Comput., 2(1):225–247, 2006.

[13] Y. Dong, C. Chen, P.-G. Martinsson, and K. Pearce. Robust blockwise random pivoting: fast and accurate adaptive interpolative decomposition. SIAM J. Matrix Anal. Appl., 46(3):1791–1815, 2025.

[14] Y. Dong and P.-G. Martinsson. Simpler is better: a comparative study of randomized pivoting algorithms for CUR and interpolative decompositions. Adv. Comput. Math., 49(4):66, 2023.

[15] J. A. Duersch and M. Gu. Randomized QR with column pivoting. SIAM J. Sci. Comput., 39(4):C263–C291, 2017.

[16] J. A. Duersch and M. Gu. Randomized projection for rank-revealing matrix factorizations and low-rank approximations. SIAM Rev., 62(3):661–682, 2020.

[17] C. Eckart and G. Young. The approximation of one matrix by another of lower rank. Psychometrika, 1(3):211–218, 1936.

[18] R. Ehrig and P. Deuflhard. GMERR—an error-minimizing variant of GMRES. Technical Report SC-97-63, ZIB, 1997.

[19] E. N. Epperly. Adaptive randomized pivoting and volume sampling. arXiv preprint arXiv:2510.02513, 2025.

[20] E. N. Epperly, M. Meier, and Y. Nakatsukasa. Fast randomized least-squares solvers can be just as accurate and stable as classical direct solvers. Commun. Pure Appl. Math., 79(2):293–339, 2026.

[21] R. Fletcher. Practical methods of optimization. John Wiley & Sons, Chichester, 2013.

[22] A. Frieze, R. Kannan, and S. Vempala. Fast Monte-Carlo algorithms for finding low-rank approximations. J. ACM, 51(6):1025–1041, 2004.

[23] S. Garg, A. S. Berahas, and M. Dereziński. Second-order information promotes mini-batch robustness in variance-reduced gradients. J. Mach. Learn. Res., 26(306):1–49, 2025.

[24] G. Garrigos and R. M. Gower. Handbook of convergence theorems for (stochastic) gradient methods. arXiv preprint arXiv:2301.11235, 2023.

[25] A. Gaul, M. H. Gutknecht, J. Liesen, and R. Nabben. A framework for deflated and augmented Krylov subspace methods. SIAM J. Matrix Anal. Appl., 34(2):495–518, 2013.

[26] G. H. Golub and C. F. Van Loan. Matrix computations. Johns Hopkins University Press, Baltimore, 2013.

[27] A. Goswami and B. V. Rao. Measure theory for analysis and probability. Springer, Singapore, 2025.

[28] R. M. Gower, N. Loizou, X. Qian, A. Sailanbayev, E. Shulgin, and P. Richtárik. SGD: general analysis and improved rates. In Proceedings of the 36th International Conference on Machine Learning, pages 5200–5209, 2019.

[29] R. M. Gower and P. Richtárik. Randomized iterative methods for linear systems. SIAM J. Matrix Anal. Appl., 36(4):1660–1690, 2015.

[30] L. Guo, R. Xiang, D. Han, and J. Xie. Enhanced randomized Douglas-Rachford method: Improved probabilities and adaptive momentum. arXiv preprint arXiv:2506.10261, 2025.

[31] N. Halko, P.-G. Martinsson, and J. A. Tropp. Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev., 53(2):217–288, 2011.

[32] D. Han, Y. Su, and J. Xie. Randomized Douglas–Rachford methods for linear systems: improved accuracy and efficiency. SIAM J. Optim., 34(1):1045–1070, 2024.

[33] D. Han and J. Xie. On pseudoinverse-free randomized methods for linear systems: unified framework and acceleration. Optim. Methods Softw., 41(1):82–117, 2026.

[34] I. C. Ipsen and A. K. Saibaba. Many (most?) column subset selection criteria are NP hard. arXiv preprint arXiv:2511.02740, 2025.

[35] S. Kaczmarz. Angenäherte Auflösung von Systemen linearer Gleichungen. Bull. Int. Acad. Pol. Sci. Let., Cl. Sci. Math. Nat., pages 355–357, 1937.

[36] Y. Ke and H. Luo. Robust Kaczmarz methods for nearly singular linear systems. arXiv preprint arXiv:2602.21916, 2026.

[37] C. T. Kelley. Iterative methods for linear and nonlinear equations. SIAM, Philadelphia, 1995.

[38] D. P. Kingma and J. Ba. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[39] S. P. Kolodziej, M. Aznaveh, M. Bullock, J. David, T. A. Davis, M. Henderson, Y. Hu, and R. Sandstrom. The SuiteSparse matrix collection website interface. J. Open Source Softw., 4(35):1244, 2019.

[40] D. Kovalev. SGD with adaptive preconditioning: unified analysis and momentum acceleration. arXiv preprint arXiv:2506.23803, 2025.

[41] X.-L. Li. Preconditioned stochastic gradient descent. IEEE Trans. Neural Netw. Learn. Syst., 29(5):1454–1466, 2017.

[42] J. Liu and S. Wright. An accelerated randomized Kaczmarz algorithm. Math. Comp., 85(297):153–178, 2016.

[43] N. Loizou and P. Richtárik. Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods. Comput. Optim. Appl., 77(3):653–710, 2020.

[44] J. Lok and E. Rebrova. A subspace constrained randomized Kaczmarz method for structure or external knowledge exploitation. Linear Algebra Appl., 698:220–260, 2024.

[45] J. Lok and E. Rebrova. Subspace-constrained randomized coordinate descent for linear systems with good low-rank matrix approximations. arXiv preprint arXiv:2506.09394, 2025.

[46] D. A. Lorenz and M. Winkler. Minimal error momentum Bregman-Kaczmarz. Linear Algebra Appl., 709:416–448, 2025.

[47] A. Ma and D. Needell. Stochastic gradient descent for linear systems with missing data. Numer. Math. Theory Methods Appl., 12(1):1–20, 2019.

[48] J. Martens and R. Grosse. Optimizing neural networks with Kronecker-factored approximate curvature. arXiv preprint arXiv:1503.05671, 2015.

[49] P.-G. Martinsson and J. A. Tropp. Randomized numerical linear algebra: foundations and algorithms. Acta Numer., 29:403–572, 2020.

[50] M. Meier, Y. Nakatsukasa, A. Townsend, and M. Webb. Are sketch-and-precondition least squares solvers numerically stable? SIAM J. Matrix Anal. Appl., 45(2):905–929, 2024.

[51] X. Meng, M. A. Saunders, and M. W. Mahoney. LSRN: a parallel iterative solver for strongly over- or underdetermined systems. SIAM J. Sci. Comput., 36(2):C95–C118, 2014.

[52] C. Musco, P. Netrapalli, A. Sidford, S. Ubaru, and D. P. Woodruff. Spectrum approximation beyond fast matrix multiplication: algorithms and hardness. arXiv preprint arXiv:1704.04163, 2017.

[53] I. Necoara. Faster randomized block Kaczmarz algorithms. SIAM J. Matrix Anal. Appl., 40(4):1425–1452, 2019.

[54] D. Needell, N. Srebro, and R. Ward. Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm. Math. Program., 155:549–573, 2016.

[55] D. Needell and J. A. Tropp. Paved with good intentions: analysis of a randomized block Kaczmarz method. Linear Algebra Appl., 441:199–221, 2014.

[56] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM J. Optim., 19(4):1574–1609, 2009.

[57] K. Pearce, C. Chen, Y. Dong, and P.-G. Martinsson. Adaptive parallelizable algorithms for interpolative decompositions via partially pivoted LU. Numer. Linear Algebra Appl., 32(1):e70002, 2025.

[58] K. J. Pearce and P.-G. Martinsson. Randomized algorithms for low-rank matrix and tensor decompositions. arXiv preprint arXiv:2512.05286, 2025.

[59] S. J. Reddi, S. Kale, and S. Kumar. On the convergence of Adam and beyond. arXiv preprint arXiv:1904.09237, 2019.

[60] J. Rieger. Generalized Gearhart–Koshy acceleration for the Kaczmarz method. Math. Comp., 92:1251–1272, 2023.

[61] H. Robbins and S. Monro. A stochastic approximation method. Ann. Math. Statist., 22:400–407, 1951.

[62] Y. Saad. Iterative methods for sparse linear systems. SIAM, Philadelphia, 2003.

[63] Y. Saad and M. H. Schultz. GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 7(3):856–869, 1986.

[64] F. Schöpfer and D. A. Lorenz. Linear convergence of the randomized sparse Kaczmarz method. Math. Program., 173(1):509–536, 2019.

[65] N. N. Schraudolph. Fast curvature matrix-vector products for second-order gradient descent. Neural Comput., 14(7):1723–1738, 2002.

[66] A. J. Scott and G. P. Styan. On a separation theorem for generalized eigenvalues and a problem in the analysis of sample surveys. Linear Algebra Appl., 70:209–224, 1985.

[67] J. Scott and M. Tůma. Sparse linear least-squares problems. Acta Numer., 34:891–1010, 2025.

[68] M. Scott, T. Xu, Z. Tang, A. Pichette-Emmons, Q. Ye, Y. Saad, and Y. Xi. Designing preconditioners for SGD: local conditioning, noise floors, and basin stability. arXiv preprint arXiv:2511.19716, 2025.

[69] J. R. Shewchuk. An introduction to the conjugate gradient method without the agonizing pain. Technical Report CMU-CS-94-125, Carnegie Mellon University, Pittsburgh, PA, 1994.

[70] T. Strohmer and R. Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.

[71] Y. Su, D. Han, Y. Zeng, and J. Xie. On greedy multi-step inertial randomized Kaczmarz method for solving linear systems. Calcolo, 61(4):68, 2024.

[72] Y. Sun, D. Han, and J. Xie. Connecting randomized iterative methods with Krylov subspaces. arXiv preprint arXiv:2505.20602, 2025.

[73] Y. Wang, Y. Sun, D. Han, and J. Xie. Linear convergence of Gearhart–Koshy accelerated Kaczmarz methods for tensor linear systems. arXiv preprint arXiv:2604.05816, 2026.

[74] R. Weiss. Error-minimizing Krylov subspace methods. SIAM J. Sci. Comput., 15(3):511–527, 1994.

[75] R. Weiss. A theoretical overview of Krylov subspace methods. Appl. Numer. Math., 19(3):207–233, 1995.

[76] R. Xiang, J. Xie, and Q. Zhang. Randomized block Kaczmarz with volume sampling: momentum acceleration and efficient implementation. arXiv preprint arXiv:2503.13941, 2025.

[77] J. Xie, H.-D. Qi, and D. Han. Randomized iterative methods for generalized absolute value equations: solvability and error bounds. SIAM J. Optim., 35(3):1731–1760, 2025.

[78] J. Xie and Z. Xu. Subset selection for matrices with fixed blocks. Israel J. Math., 245(1):1–26, 2021.

[79] Q. Ye. Preconditioning for accelerated gradient descent optimization and regularization. arXiv preprint arXiv:2410.00232, 2024.

[80] Y. Zeng, J.-F. Cai, D. Han, and J. Xie. Randomized conjugate gradient least squares. arXiv preprint arXiv:2605.25034, 2026.