Towards Universal Convergence of Backward Error in Linear System Solvers

Elizaveta Rebrova; Micha{\l} Derezi\'nski; Yuji Nakatsukasa

arxiv: 2604.16075 · v2 · pith:QQEI464Znew · submitted 2026-04-17 · 🧮 math.NA · cs.DS· cs.LG· cs.NA· math.OC

Towards Universal Convergence of Backward Error in Linear System Solvers

Micha{\l} Derezi\'nski , Yuji Nakatsukasa , Elizaveta Rebrova This is my paper

Pith reviewed 2026-05-10 08:06 UTC · model grok-4.3

classification 🧮 math.NA cs.DScs.LGcs.NAmath.OC

keywords backward errorRichardson iterationpositive semidefinite matricesiterative solverslinear systemsKrylov methodsconvergence ratesnumerical linear algebra

0 comments

The pith

Richardson iteration achieves at most 1/k relative backward error after k steps on any positive semidefinite linear system, independent of condition number.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves that the classical Richardson iteration for solving Ax = b produces iterates whose relative backward error is bounded by 1/k after exactly k steps whenever A is positive semidefinite. Backward error measures the smallest relative perturbation to A and b that would make the current iterate an exact solution, which is often the quantity that matters for stability and practical accuracy. Because the bound holds without any dependence on the eigenvalues or conditioning of A, the method yields an O(n^{2}/epsilon) algorithm that reaches any target backward error epsilon. The authors also introduce MINBERR, which minimizes backward error directly inside a Krylov subspace and obtains a faster 1/k^{2} rate, and they extend the approach to general matrices via normal equations with empirically observed 1/k convergence.

Core claim

For any positive semidefinite matrix A and vector b, the k-th Richardson iterate x_k satisfies that the relative backward error is at most 1/k: there exist perturbations Delta A and Delta b with ||Delta A||/||A|| + ||Delta b||/||b|| <= 1/k such that (A + Delta A)x_k = b + Delta b. This bound is derived directly from the residual contraction properties of the iteration on PSD matrices and does not involve the condition number. The same framework yields an O(1/k^{2}) rate when the iterate is chosen to minimize backward error over the Krylov subspace, which is realized by the MINBERR algorithm.

What carries the argument

The residual contraction of Richardson iteration on PSD matrices that directly controls the minimal perturbation size making the current approximate solution exact.

Load-bearing premise

The matrix must be positive semidefinite for the unconditional 1/k backward error bound to hold.

What would settle it

Compute the relative backward error of the k-th Richardson iterate on a positive semidefinite matrix whose condition number exceeds k; if that error ever exceeds 1/k for any such matrix, the claim is false.

Figures

Figures reproduced from arXiv: 2604.16075 by Elizaveta Rebrova, Micha{\l} Derezi\'nski, Yuji Nakatsukasa.

**Figure 2.** Figure 2: Comparing MINRES with MINBERR on the Small-Outlier(2000, κ, σn−1) synthetic task, while varying κ and σn−1. MINBERR is denoted by solid lines, while MINRES is shown using dotted lines. suggesting that the bound in Theorem 3.3 is often tight for typical ill-conditioned problems. On the other hand, MINBERR often outperforms its theoretical 1/k2 rate, especially later in the convergence. The behavior of CG an… view at source ↗

**Figure 3.** Figure 3: Comparing MINBERR-NE against Richardson-NE, LSMR, and LSQR on the [PITH_FULL_IMAGE:figures/full_fig_p021_3.png] view at source ↗

**Figure 4.** Figure 4: Evaluating MINBERR-NE on the Small-Outlier(2000, κ, σn−1) synthetic task treated as a general non-symmetric linear system. We use dotted lines to denote MINBERR-NE running on the original task, while solid lines denote MINBERR-NE running on the matrix perturbed with Gaussian noise (in both cases, backward error is computed with respect to the original task). with a scaled identity (see Section 3), but in t… view at source ↗

read the original abstract

The quest for an algorithm that solves an $n\times n$ linear system in $O(n^2)$ time complexity, or $O(n^2 \text{poly}(1/\epsilon))$ when solving up to $\epsilon$ relative error, is a long-standing open problem in numerical linear algebra and theoretical computer science. There are two predominant paradigms for measuring relative error: forward error (i.e., distance from the output to the optimum solution) and backward error (i.e., distance to the nearest problem solved by the output). In most prior studies, convergence of iterative linear system solvers is measured via various notions of forward error, and as a result, depends heavily on the conditioning of the input. Yet, the numerical analysis literature has long advocated for backward error as the more practically relevant notion of approximation. In this work, we show that -- surprisingly -- the classical and simple Richardson iteration incurs at most $1/k$ (relative) backward error after $k$ iterations on any positive semidefinite (PSD) linear system, irrespective of its condition number. This universal convergence rate implies an $O(n^2/\epsilon)$ complexity algorithm for solving a PSD linear system to $\epsilon$ backward error, and we establish similar or better complexity when using a variety of Krylov solvers beyond Richardson. Then, by directly minimizing backward error over a Krylov subspace, we attain an even faster $O(1/k^2)$ universal rate, and we turn this into an efficient algorithm, MINBERR, with complexity $O(n^2/\sqrt\epsilon)$. Finally, we extend this approach via normal equations to solving general linear systems in $O(n^2\log(n)/\epsilon)$ time complexity. We report strong numerical performance of our algorithms on benchmark problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Richardson iteration on PSD systems gets 1/k backward error independent of condition number, a direct spectral consequence that leads to O(n^2/eps) solvers.

read the letter

The central observation is that Richardson iteration with step 1/||A||_2 produces at most 1/k relative backward error after k steps on any PSD matrix, no matter how large the condition number. This follows because components in near-null directions keep a residual of order 1 while the corresponding part of the iterate grows linearly in k, so the standard backward-error formula eta = ||b - A x|| / (||A|| ||x|| + ||b||) decays as 1/k. The paper uses this to claim an O(n^2/eps) algorithm for epsilon backward error on PSD systems, which is the main practical payoff since forward-error rates usually pay for kappa(A).

Referee Report

0 major / 3 minor

Summary. The paper claims that the classical Richardson iteration achieves at most 1/k relative backward error after k iterations on any positive semidefinite linear system, independent of condition number. This yields an O(n²/ε) algorithm for ε-backward-error solutions. Similar or better rates are established for other Krylov solvers; a new MINBERR algorithm that directly minimizes backward error over the Krylov subspace attains an O(1/k²) universal rate and O(n²/√ε) complexity. For general (non-PSD) systems the normal-equations approach is shown empirically to give O(1/k) convergence, with strong numerical performance reported on benchmarks.

Significance. If the central derivation holds, the result is significant: it supplies the first explicit conditioning-independent convergence guarantees in the backward-error metric, which the numerical-analysis literature has long preferred over forward error. The argument rests on the spectral radius of the Richardson operator being at most 1 for PSD matrices and on the standard normwise backward-error formula, yielding a clean, parameter-free 1/k bound. The construction of MINBERR and the empirical extension to general matrices via normal equations add practical value and suggest new solver design principles focused on backward-error minimization.

minor comments (3)

The complexity statements (O(n²/ε) and O(n²/√ε)) should explicitly state the underlying arithmetic model and whether they count matrix-vector products or full matrix operations.
The precise definition of relative backward error η = ||b − A x|| / (||A|| ||x|| + ||b||) is used in the abstract and introduction; a short dedicated paragraph or equation early in the paper would aid readers.
The section reporting numerical experiments would benefit from a brief statement of the benchmark matrices, stopping criteria, and any data-exclusion rules applied.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We are grateful to the referee for their supportive review and recommendation for minor revision. The referee's summary correctly captures the key contributions of our work on universal convergence rates for backward error in linear system solvers.

read point-by-point responses

Referee: The paper claims that the classical Richardson iteration achieves at most 1/k relative backward error after k iterations on any positive semidefinite linear system, independent of condition number. This yields an O(n²/ε) algorithm for ε-backward-error solutions. Similar or better rates are established for other Krylov solvers; a new MINBERR algorithm that directly minimizes backward error over the Krylov subspace attains an O(1/k²) universal rate and O(n²/√ε) complexity. For general (non-PSD) systems the normal-equations approach is shown empirically to give O(1/k) convergence, with strong numerical performance reported on benchmarks.

Authors: We thank the referee for this accurate encapsulation of our results. The claims are as stated in the manuscript, and we stand by the derivations and empirical findings presented. No changes are required. revision: no

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The claimed 1/k backward-error bound for Richardson iteration on PSD systems is derived directly from the spectral radius of the iteration matrix (at most 1 when the step size is 1/||A||_2) and the standard normwise backward-error formula. For eigenvalues in [0, lambda_max], the residual component stays O(1) while the iterate component grows linearly in k, producing eta <= 1/k independent of condition number. The PSD assumption is explicitly required and stated; the derivation does not reduce to any fitted parameter, self-definition, or self-citation chain. The extension to general matrices is presented only as empirical observation, not a proven claim. No load-bearing ansatz, uniqueness theorem, or renaming of known results appears in the provided text. The result is self-contained against the mathematical properties of PSD matrices and Krylov subspaces.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on standard properties of positive semidefinite matrices and Krylov subspace methods; no free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption The input matrix is positive semidefinite
Required for the unconditional 1/k backward error bound on Richardson iteration.

pith-pipeline@v0.9.0 · 5647 in / 1240 out tokens · 68099 ms · 2026-05-10T08:06:33.910136+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Well-Conditioned Oblivious Perturbations in Linear Space
cs.DS 2026-04 unverdicted novelty 8.0

An O(n)-randomness perturbation combining a dense deterministic pattern matrix with a non-uniform sparse dependent perturbation reduces condition numbers to O(n) for any input matrix.