Recognition: 2 theorem links
· Lean TheoremNon-binary LDPC codes for Data Storage
Pith reviewed 2026-05-12 01:31 UTC · model grok-4.3
The pith
Non-binary LDPC codes can achieve the ultimate distance that bounds their minimum distance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Non-binary LDPC codes whose parity-check matrices arise by substituting elements of a q-ary finite field into the nonzero positions of a binary base matrix can attain the ultimate distance, an upper bound on minimum distance, with supporting random-coding bounds and search algorithms for the Gallager ensemble.
What carries the argument
The ultimate distance, defined as the upper bound on minimum distance that follows from the lifted structure of the parity-check matrix obtained by replacing nonzero entries of a binary base matrix with symbols from a q-ary finite field.
If this is right
- Such codes can correct up to their achieved minimum distance worth of erasures in storage systems.
- The random-coding bound supplies a performance limit for regular non-binary LDPC codes in the q-ary erasure channel.
- Low-complexity algorithms make distance computation feasible for designing practical codes.
- These constructions compete with Reed-Solomon codes for disk-failure recovery.
Where Pith is reading between the lines
- Achieving the ultimate distance may yield practical complexity advantages over full MDS codes in large storage arrays.
- The lifting construction could be applied to irregular base matrices or other field sizes to widen the design space.
- Finite-length simulations would show how tightly the random-coding bound is approached in actual codes.
- The same distance bounds might inform code choice for related erasure models in distributed storage.
Load-bearing premise
The parity-check matrix obtained by replacing nonzero entries of a binary base matrix with q-ary field elements has an ultimate distance that truly upper-bounds the code minimum distance, and the random-coding bound holds for the Gallager ensemble under maximum-likelihood decoding.
What would settle it
Construction of a non-binary LDPC code from such a lifted matrix whose minimum distance exceeds the computed ultimate distance, or observation of more uncorrectable erasure patterns than the random-coding bound predicts.
Figures
read the original abstract
In modern data storage systems, non-binary LDPC codes for recovering from disk failures are increasingly considered strong competitors to MDS codes such as Reed-Solomon codes. Since disk failures can be modeled as erasures, we analyze non-binary LDPC codes over a $q$-ary field in the $q$-ary erasure channel, relative to MDS codes. Our focus is on non-binary LDPC codes whose parity-check matrix is obtained by replacing the non-zero entries of a binary base matrix by elements of a $q$-ary finite field. For such LDPC codes, we introduce the notion of ultimate distance, which upper-bounds their minimum distance. We derive a random-coding bound on the number of non-correctable erasure patterns for the Gallager ensemble of regular non-binary LDPC codes under maximum-likelihood decoding. An algorithm for finding the ultimate distance is presented. A low-complexity algorithm for searching for the minimum distance of the non-binary LDPC code is proposed. Finally, we construct examples of non-binary LDPC codes achieving the ultimate distance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript analyzes non-binary LDPC codes over q-ary finite fields for the q-ary erasure channel, modeling disk failures in data storage. It defines the 'ultimate distance' as an upper bound on minimum distance for codes obtained by replacing nonzero entries of a binary base matrix with field elements. A random-coding bound is derived on the number of non-correctable erasure patterns for the Gallager ensemble of regular non-binary LDPC codes under maximum-likelihood decoding. Algorithms are presented for computing the ultimate distance and for low-complexity minimum-distance search; examples are constructed that are claimed to achieve the ultimate distance.
Significance. If the central claim holds—that the constructed codes meet the ultimate distance exactly and the random-coding bound is informative for these codes—the work supplies a new design criterion and verification tools for non-binary LDPC codes that could serve as lower-complexity alternatives to Reed-Solomon codes in erasure recovery. The explicit algorithms and ensemble bound constitute concrete strengths that would aid reproducibility and further design.
major comments (2)
- [low-complexity minimum-distance search algorithm] The section describing the low-complexity algorithm for searching the minimum distance: the algorithm is asserted to confirm that constructed codes have minimum distance equal to the ultimate distance, yet the description does not establish that the procedure exhaustively enumerates all finite-field linear dependencies among columns or rules out early termination. Without such a guarantee, undetected codewords of weight below the ultimate distance remain possible, so the reported examples only satisfy an upper bound rather than equality.
- [definition of ultimate distance] The paragraph introducing the ultimate distance: while the quantity is defined from the support structure of the base matrix after field-element replacement, no explicit proof is supplied that it is always an upper bound on minimum distance (i.e., that every nonzero codeword weight is at most this value). The subsequent claim that examples achieve equality therefore rests on an unproven bounding relation.
minor comments (2)
- [Abstract] The abstract states that examples achieve the ultimate distance but omits the specific parameters (q, length, rate, base-matrix dimensions) of those examples; adding them would allow immediate assessment of practicality.
- [random-coding bound derivation] Notation for the Gallager ensemble and the random-coding bound would benefit from an explicit equation number when first introduced, to facilitate cross-reference with the later algorithmic sections.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below and will revise the paper to improve clarity and rigor.
read point-by-point responses
-
Referee: The section describing the low-complexity algorithm for searching the minimum distance: the algorithm is asserted to confirm that constructed codes have minimum distance equal to the ultimate distance, yet the description does not establish that the procedure exhaustively enumerates all finite-field linear dependencies among columns or rules out early termination. Without such a guarantee, undetected codewords of weight below the ultimate distance remain possible, so the reported examples only satisfy an upper bound rather than equality.
Authors: We acknowledge that the current description of the low-complexity search algorithm is insufficiently detailed regarding exhaustiveness. The procedure is designed to enumerate column subsets in increasing order of size while solving for field-element coefficients that satisfy linear dependence, with termination only after all subsets up to the ultimate distance have been checked. We will revise the section to include explicit pseudocode, a complexity argument, and a proof that no early termination occurs before all relevant dependencies are examined, thereby confirming that equality to the ultimate distance is rigorously established for the examples. revision: yes
-
Referee: The paragraph introducing the ultimate distance: while the quantity is defined from the support structure of the base matrix after field-element replacement, no explicit proof is supplied that it is always an upper bound on minimum distance (i.e., that every nonzero codeword weight is at most this value). The subsequent claim that examples achieve equality therefore rests on an unproven bounding relation.
Authors: We agree that the manuscript would benefit from an explicit proof that the ultimate distance upper-bounds the minimum distance for any nonzero field-element assignment. The definition is based on the binary support structure, which guarantees the existence of a linearly dependent set of columns whose dependence holds independently of the specific nonzero values chosen from the field. We will add a formal proof (likely as a new lemma or subsection) demonstrating this bounding property, which will also strengthen the equality claims for the constructed examples. revision: yes
Circularity Check
No significant circularity; ultimate distance is an independent upper bound verified by explicit construction and search
full rationale
The paper defines ultimate distance directly from the support structure of the binary base matrix after non-zero entries are replaced by field elements; this is an a priori upper bound on minimum distance, not derived from the codes themselves. It then gives a separate algorithm to compute this bound and a low-complexity search procedure to locate the actual minimum distance of specific constructed codes. The claim that certain codes achieve equality is an empirical verification step, not a reduction by construction or renaming of a fitted quantity. The random-coding bound on the Gallager ensemble is an independent ensemble analysis under ML decoding and does not substitute for or loop back into the per-code distance claim. No self-citations, ansatzes, or uniqueness theorems are invoked in the provided text to force the result. The derivation chain therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math Properties of finite fields and linear codes over them
invented entities (1)
-
ultimate distance
no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We introduce the notion of ultimate distance, which upper-bounds their minimum distance... algorithm for finding the ultimate distance... low-complexity algorithm for searching for the minimum distance
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
A survey of the past, present, and future of erasure coding for storage systems,
Z. Shen, Y . Cai, K. Cheng, P. P. Lee, X. Li, Y . Hu, and J. Shu, “A survey of the past, present, and future of erasure coding for storage systems,” ACM Transactions on Storage, vol. 21, no. 1, pp. 1–39, 2025
work page 2025
-
[2]
Codes for distributed storage,
V . Ramkumar, S. Balaji, B. Sasidharan, M. Vajha, M. N. Krishnan, and P. V . Kumar, “Codes for distributed storage,”Foundations and Trends in Communications and Information Theory, vol. 19, no. 4, pp. 547–813, 2022
work page 2022
-
[3]
H. Park, D. Lee, and J. Moon, “LDPC code design for distributed storage: Balancing repair bandwidth, reliability, and storage overhead,” IEEE Trans. Comm., vol. 66, no. 2, pp. 507–520, 2017
work page 2017
-
[4]
Exploring high performance distributed file storage using LDPC codes,
B. Gaidioz, B. Koblitz, and N. Santos, “Exploring high performance distributed file storage using LDPC codes,”Parallel Computing, vol. 33, no. 4-5, pp. 264–274, 2007
work page 2007
-
[5]
Large LDPC codes for big data storage,
W. Yongmei, C. Fengmin, and L. K. Cher, “Large LDPC codes for big data storage,” inProceedings of the ASE BigData & SocialInformatics 2015, 2015, pp. 1–6
work page 2015
-
[6]
A. Hareedy, C. Lanka, N. Guo, and L. Dolecek, “A combinatorial methodology for optimizing non-binary graph-based codes: Theoretical analysis and applications in data storage,”IEEE Trans.Inf. Theory, vol. 65, no. 4, pp. 2128–2154, 2018
work page 2018
-
[7]
Codes for energy- efficient data storage systems,
I. Bocharova, B. Kudryashov, and V . Skachek, “Codes for energy- efficient data storage systems,” inISITA, 2026 (submitted)
work page 2026
-
[8]
A recursive approach to low complexity codes,
R. M. Tanner, “A recursive approach to low complexity codes,”IEEE Trans. Inf. Theory, vol. 27, no. 5, pp. 533–547, 1981
work page 1981
-
[9]
A recursive approach to low complexity codes,
R. Tanner, “A recursive approach to low complexity codes,”IEEE Trans.Inf. Theory, vol. 27, no. 5, pp. 533–547, 1981
work page 1981
-
[10]
J. van Lint and R. Wilson,A course in combinatorics, 2nd ed. Cam- bridge University Press, 2001
work page 2001
-
[11]
Computing the stopping distance of a Tanner graph is NP-hard,
K. M. Krishnan and P. Shankar, “Computing the stopping distance of a Tanner graph is NP-hard,”IEEE Trans. Inf. Theory, vol. 53, no. 6, pp. 2278–2280, 2007
work page 2007
-
[12]
Bounds on the error probability of block codes over the𝑞-ary erasure channel,
G. Liva, E. Paolini, and M. Chiani, “Bounds on the error probability of block codes over the𝑞-ary erasure channel,”IEEE Trans. Comm., vol. 61, no. 6, pp. 2156–2165, 2013
work page 2013
-
[13]
On ensembles of low-density parity-check codes: Asymptotic distance distributions,
S. Litsyn and V . Shevelev, “On ensembles of low-density parity-check codes: Asymptotic distance distributions,”IEEE Trans. Inf. Theory, vol. 48, no. 4, pp. 887–908, 2002
work page 2002
-
[14]
R. G. Gallager,Low-density parity-check codes. M.I.T. Press: Cam- bridge, MA, 1963
work page 1963
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.