arxiv: 2605.08500 · v1 · submitted 2026-05-08 · 💻 cs.IT · math.IT

Recognition: 2 theorem links

· Lean Theorem

Non-binary LDPC codes for Data Storage

Irina Bocharova , Boris Kudryashov , Henk D.L. Hollmann , Vitaly Skachek

Authors on Pith no claims yet

Pith reviewed 2026-05-12 01:31 UTC · model grok-4.3

classification 💻 cs.IT math.IT

keywords non-binary LDPC codesultimate distanceerasure channeldata storageminimum distanceGallager ensembleparity-check matrixReed-Solomon codes

0 comments

The pith

Non-binary LDPC codes can achieve the ultimate distance that bounds their minimum distance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines non-binary LDPC codes over q-ary fields for recovering from disk failures modeled as erasures. It defines the ultimate distance as an upper bound on the minimum distance for codes whose parity-check matrices are formed by replacing nonzero entries of a binary base matrix with field elements. A random-coding bound is derived on the number of non-correctable erasure patterns for the regular Gallager ensemble under maximum-likelihood decoding. Algorithms are presented to compute the ultimate distance and to find the actual minimum distance at low complexity. Concrete examples are constructed where the codes meet the ultimate distance.

Core claim

Non-binary LDPC codes whose parity-check matrices arise by substituting elements of a q-ary finite field into the nonzero positions of a binary base matrix can attain the ultimate distance, an upper bound on minimum distance, with supporting random-coding bounds and search algorithms for the Gallager ensemble.

What carries the argument

The ultimate distance, defined as the upper bound on minimum distance that follows from the lifted structure of the parity-check matrix obtained by replacing nonzero entries of a binary base matrix with symbols from a q-ary finite field.

If this is right

Such codes can correct up to their achieved minimum distance worth of erasures in storage systems.
The random-coding bound supplies a performance limit for regular non-binary LDPC codes in the q-ary erasure channel.
Low-complexity algorithms make distance computation feasible for designing practical codes.
These constructions compete with Reed-Solomon codes for disk-failure recovery.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Achieving the ultimate distance may yield practical complexity advantages over full MDS codes in large storage arrays.
The lifting construction could be applied to irregular base matrices or other field sizes to widen the design space.
Finite-length simulations would show how tightly the random-coding bound is approached in actual codes.
The same distance bounds might inform code choice for related erasure models in distributed storage.

Load-bearing premise

The parity-check matrix obtained by replacing nonzero entries of a binary base matrix with q-ary field elements has an ultimate distance that truly upper-bounds the code minimum distance, and the random-coding bound holds for the Gallager ensemble under maximum-likelihood decoding.

What would settle it

Construction of a non-binary LDPC code from such a lifted matrix whose minimum distance exceeds the computed ultimate distance, or observation of more uncorrectable erasure patterns than the random-coding bound predicts.

Figures

Figures reproduced from arXiv: 2605.08500 by Boris Kudryashov, Henk D.L. Hollmann, Irina Bocharova, Vitaly Skachek.

**Figure 3.** Figure 3: Numerically computed block error probability in QEC for LDPC code [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

read the original abstract

In modern data storage systems, non-binary LDPC codes for recovering from disk failures are increasingly considered strong competitors to MDS codes such as Reed-Solomon codes. Since disk failures can be modeled as erasures, we analyze non-binary LDPC codes over a $q$-ary field in the $q$-ary erasure channel, relative to MDS codes. Our focus is on non-binary LDPC codes whose parity-check matrix is obtained by replacing the non-zero entries of a binary base matrix by elements of a $q$-ary finite field. For such LDPC codes, we introduce the notion of ultimate distance, which upper-bounds their minimum distance. We derive a random-coding bound on the number of non-correctable erasure patterns for the Gallager ensemble of regular non-binary LDPC codes under maximum-likelihood decoding. An algorithm for finding the ultimate distance is presented. A low-complexity algorithm for searching for the minimum distance of the non-binary LDPC code is proposed. Finally, we construct examples of non-binary LDPC codes achieving the ultimate distance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper defines ultimate distance for non-binary LDPC codes from binary bases and claims constructions that meet it, along with algorithms and a random-coding bound, but the min-distance check rests on a low-complexity search whose completeness is unclear.

read the letter

The main point is that these authors define ultimate distance as an upper bound on minimum distance for non-binary LDPC codes built by lifting a binary base matrix over a finite field. They give an algorithm to compute that bound, a low-complexity search for actual minimum distance, and explicit constructions that reach the bound. They also extend a random-coding argument to bound the number of bad erasure patterns in the Gallager ensemble under ML decoding for the q-ary erasure channel.

Referee Report

2 major / 2 minor

Summary. The manuscript analyzes non-binary LDPC codes over q-ary finite fields for the q-ary erasure channel, modeling disk failures in data storage. It defines the 'ultimate distance' as an upper bound on minimum distance for codes obtained by replacing nonzero entries of a binary base matrix with field elements. A random-coding bound is derived on the number of non-correctable erasure patterns for the Gallager ensemble of regular non-binary LDPC codes under maximum-likelihood decoding. Algorithms are presented for computing the ultimate distance and for low-complexity minimum-distance search; examples are constructed that are claimed to achieve the ultimate distance.

Significance. If the central claim holds—that the constructed codes meet the ultimate distance exactly and the random-coding bound is informative for these codes—the work supplies a new design criterion and verification tools for non-binary LDPC codes that could serve as lower-complexity alternatives to Reed-Solomon codes in erasure recovery. The explicit algorithms and ensemble bound constitute concrete strengths that would aid reproducibility and further design.

major comments (2)

[low-complexity minimum-distance search algorithm] The section describing the low-complexity algorithm for searching the minimum distance: the algorithm is asserted to confirm that constructed codes have minimum distance equal to the ultimate distance, yet the description does not establish that the procedure exhaustively enumerates all finite-field linear dependencies among columns or rules out early termination. Without such a guarantee, undetected codewords of weight below the ultimate distance remain possible, so the reported examples only satisfy an upper bound rather than equality.
[definition of ultimate distance] The paragraph introducing the ultimate distance: while the quantity is defined from the support structure of the base matrix after field-element replacement, no explicit proof is supplied that it is always an upper bound on minimum distance (i.e., that every nonzero codeword weight is at most this value). The subsequent claim that examples achieve equality therefore rests on an unproven bounding relation.

minor comments (2)

[Abstract] The abstract states that examples achieve the ultimate distance but omits the specific parameters (q, length, rate, base-matrix dimensions) of those examples; adding them would allow immediate assessment of practicality.
[random-coding bound derivation] Notation for the Gallager ensemble and the random-coding bound would benefit from an explicit equation number when first introduced, to facilitate cross-reference with the later algorithmic sections.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below and will revise the paper to improve clarity and rigor.

read point-by-point responses

Referee: The section describing the low-complexity algorithm for searching the minimum distance: the algorithm is asserted to confirm that constructed codes have minimum distance equal to the ultimate distance, yet the description does not establish that the procedure exhaustively enumerates all finite-field linear dependencies among columns or rules out early termination. Without such a guarantee, undetected codewords of weight below the ultimate distance remain possible, so the reported examples only satisfy an upper bound rather than equality.

Authors: We acknowledge that the current description of the low-complexity search algorithm is insufficiently detailed regarding exhaustiveness. The procedure is designed to enumerate column subsets in increasing order of size while solving for field-element coefficients that satisfy linear dependence, with termination only after all subsets up to the ultimate distance have been checked. We will revise the section to include explicit pseudocode, a complexity argument, and a proof that no early termination occurs before all relevant dependencies are examined, thereby confirming that equality to the ultimate distance is rigorously established for the examples. revision: yes
Referee: The paragraph introducing the ultimate distance: while the quantity is defined from the support structure of the base matrix after field-element replacement, no explicit proof is supplied that it is always an upper bound on minimum distance (i.e., that every nonzero codeword weight is at most this value). The subsequent claim that examples achieve equality therefore rests on an unproven bounding relation.

Authors: We agree that the manuscript would benefit from an explicit proof that the ultimate distance upper-bounds the minimum distance for any nonzero field-element assignment. The definition is based on the binary support structure, which guarantees the existence of a linearly dependent set of columns whose dependence holds independently of the specific nonzero values chosen from the field. We will add a formal proof (likely as a new lemma or subsection) demonstrating this bounding property, which will also strengthen the equality claims for the constructed examples. revision: yes

Circularity Check

0 steps flagged

No significant circularity; ultimate distance is an independent upper bound verified by explicit construction and search

full rationale

The paper defines ultimate distance directly from the support structure of the binary base matrix after non-zero entries are replaced by field elements; this is an a priori upper bound on minimum distance, not derived from the codes themselves. It then gives a separate algorithm to compute this bound and a low-complexity search procedure to locate the actual minimum distance of specific constructed codes. The claim that certain codes achieve equality is an empirical verification step, not a reduction by construction or renaming of a fitted quantity. The random-coding bound on the Gallager ensemble is an independent ensemble analysis under ML decoding and does not substitute for or loop back into the per-code distance claim. No self-citations, ansatzes, or uniqueness theorems are invoked in the provided text to force the result. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The paper introduces the invented notion of ultimate distance as an upper bound derived from the base-matrix construction; standard finite-field arithmetic and ensemble properties are assumed without new axioms listed.

axioms (1)

standard math Properties of finite fields and linear codes over them
Invoked throughout the analysis of q-ary LDPC codes and erasure patterns.

invented entities (1)

ultimate distance no independent evidence
purpose: Upper bound on the minimum distance of the non-binary LDPC codes constructed from binary base matrices
Newly defined quantity that the constructions are shown to achieve.

pith-pipeline@v0.9.0 · 5488 in / 1183 out tokens · 40196 ms · 2026-05-12T01:31:19.152631+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We introduce the notion of ultimate distance, which upper-bounds their minimum distance... algorithm for finding the ultimate distance... low-complexity algorithm for searching for the minimum distance

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages

[1]

A survey of the past, present, and future of erasure coding for storage systems,

Z. Shen, Y . Cai, K. Cheng, P. P. Lee, X. Li, Y . Hu, and J. Shu, “A survey of the past, present, and future of erasure coding for storage systems,” ACM Transactions on Storage, vol. 21, no. 1, pp. 1–39, 2025

work page 2025
[2]

Codes for distributed storage,

V . Ramkumar, S. Balaji, B. Sasidharan, M. Vajha, M. N. Krishnan, and P. V . Kumar, “Codes for distributed storage,”Foundations and Trends in Communications and Information Theory, vol. 19, no. 4, pp. 547–813, 2022

work page 2022
[3]

LDPC code design for distributed storage: Balancing repair bandwidth, reliability, and storage overhead,

H. Park, D. Lee, and J. Moon, “LDPC code design for distributed storage: Balancing repair bandwidth, reliability, and storage overhead,” IEEE Trans. Comm., vol. 66, no. 2, pp. 507–520, 2017

work page 2017
[4]

Exploring high performance distributed file storage using LDPC codes,

B. Gaidioz, B. Koblitz, and N. Santos, “Exploring high performance distributed file storage using LDPC codes,”Parallel Computing, vol. 33, no. 4-5, pp. 264–274, 2007

work page 2007
[5]

Large LDPC codes for big data storage,

W. Yongmei, C. Fengmin, and L. K. Cher, “Large LDPC codes for big data storage,” inProceedings of the ASE BigData & SocialInformatics 2015, 2015, pp. 1–6

work page 2015
[6]

A combinatorial methodology for optimizing non-binary graph-based codes: Theoretical analysis and applications in data storage,

A. Hareedy, C. Lanka, N. Guo, and L. Dolecek, “A combinatorial methodology for optimizing non-binary graph-based codes: Theoretical analysis and applications in data storage,”IEEE Trans.Inf. Theory, vol. 65, no. 4, pp. 2128–2154, 2018

work page 2018
[7]

Codes for energy- efficient data storage systems,

I. Bocharova, B. Kudryashov, and V . Skachek, “Codes for energy- efficient data storage systems,” inISITA, 2026 (submitted)

work page 2026
[8]

A recursive approach to low complexity codes,

R. M. Tanner, “A recursive approach to low complexity codes,”IEEE Trans. Inf. Theory, vol. 27, no. 5, pp. 533–547, 1981

work page 1981
[9]

A recursive approach to low complexity codes,

R. Tanner, “A recursive approach to low complexity codes,”IEEE Trans.Inf. Theory, vol. 27, no. 5, pp. 533–547, 1981

work page 1981
[10]

van Lint and R

J. van Lint and R. Wilson,A course in combinatorics, 2nd ed. Cam- bridge University Press, 2001

work page 2001
[11]

Computing the stopping distance of a Tanner graph is NP-hard,

K. M. Krishnan and P. Shankar, “Computing the stopping distance of a Tanner graph is NP-hard,”IEEE Trans. Inf. Theory, vol. 53, no. 6, pp. 2278–2280, 2007

work page 2007
[12]

Bounds on the error probability of block codes over the𝑞-ary erasure channel,

G. Liva, E. Paolini, and M. Chiani, “Bounds on the error probability of block codes over the𝑞-ary erasure channel,”IEEE Trans. Comm., vol. 61, no. 6, pp. 2156–2165, 2013

work page 2013
[13]

On ensembles of low-density parity-check codes: Asymptotic distance distributions,

S. Litsyn and V . Shevelev, “On ensembles of low-density parity-check codes: Asymptotic distance distributions,”IEEE Trans. Inf. Theory, vol. 48, no. 4, pp. 887–908, 2002

work page 2002
[14]

R. G. Gallager,Low-density parity-check codes. M.I.T. Press: Cam- bridge, MA, 1963

work page 1963