pith. sign in

arxiv: 2606.27179 · v1 · pith:6AW2OVR4new · submitted 2026-06-25 · 💻 cs.IT · cs.DM· math.CO· math.IT· math.NT

Linear Code Conversion in the Merge Regime: General Bounds and Reed--Muller Constructions

Pith reviewed 2026-06-26 02:07 UTC · model grok-4.3

classification 💻 cs.IT cs.DMmath.COmath.ITmath.NT
keywords linear code conversionmerge regimegeneralized Hamming weightsReed-Muller codeserasure codeswrite costread costdistributed storage
0
0 comments X

The pith

Universal lower bounds on write and read costs for converting arbitrary linear codes during merges are derived and attained by Reed-Muller constructions for write cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies how to convert data between two different linear erasure codes when storage nodes are merged, without paying the full cost of re-encoding everything from scratch. It derives general lower bounds on the number of symbols that must be written or read during this conversion, expressed in terms of unchanged and read symbols. These bounds are then sharpened by incorporating the generalized Hamming weights of the target code, which track support growth across subcodes and yield tighter results than minimum-distance arguments alone whenever the weight hierarchy has jumps. The same framework recovers earlier special-case bounds and is applied to produce explicit Reed-Muller convertible codes via the Plotkin decomposition. For natural parameter choices, the constructions meet the write-cost lower bound exactly.

Core claim

For scalar linear code conversion in the merge regime, universal lower bounds exist on the write and read costs measured by unchanged and read symbols; these bounds can be refined using generalized Hamming weights to become strictly stronger when the final code exhibits nontrivial jumps in its generalized Hamming weight hierarchy, the framework recovers known bounds for important special cases, and Reed-Muller codes constructed via the Plotkin decomposition attain the derived write-cost lower bound in a natural parameter regime while the read-cost analysis is sharp for one initial block.

What carries the argument

Universal lower bounds on conversion costs refined by the generalized Hamming weights of the target code, which capture support-growth properties of its subcodes.

If this is right

  • The bounds recover all previously known results for important special cases of code conversion.
  • The refinement is strictly stronger than minimum-distance arguments exactly when the final code has jumps in its generalized Hamming weight hierarchy.
  • Reed-Muller codes attain the write-cost lower bound for natural parameter regimes via the Plotkin decomposition.
  • The generalized-Hamming-weight analysis is tight for read cost on one initial block but leaves a gap on the other block.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Storage systems could switch erasure-code parameters on the fly with far lower overhead than full re-encoding if these bounds are met by practical codes.
  • The same bounding technique might be applied to conversion regimes other than merging or to non-linear codes.
  • Other algebraic code families could be checked to see whether they also saturate the write-cost bound.

Load-bearing premise

That generalized Hamming weights yield strictly sharper cost estimates than minimum-distance arguments alone precisely when the target code has nontrivial jumps in its weight hierarchy.

What would settle it

An explicit pair of linear codes with a nontrivial generalized Hamming weight jump for which the minimum-distance-only bound already matches the refined bound, or a merge conversion whose measured write cost falls below the stated lower bound.

read the original abstract

Erasure codes are a core component of most existing large-scale distributed storage systems, ensuring reliability against node failures. Recent work has shown that adapting code parameters to changing node failure rates can lead to significant storage savings. The default approach is to re-encode the data under a new code, which consumes substantial system resources. Code conversion was introduced to reduce this cost. However, existing work has mainly focused on conversions within specific classes of codes. In this paper, we study scalar linear code conversion in the merge regime for arbitrary linear codes. We derive universal lower bounds on the write and read costs in terms of unchanged and read symbols. The bounds are refined using generalized Hamming weights, which capture support-growth properties of subcodes and can give sharper estimates than minimum-distance-only arguments. We show that the framework recovers known bounds for important special cases and can be strictly stronger when the final code has nontrivial jumps in its generalized Hamming weight hierarchy. We then apply the framework to Reed-Muller codes and construct explicit Reed-Muller convertible codes using the Plotkin decomposition. For a natural Reed-Muller parameter regime, the construction attains the derived write-cost lower bound. For the read cost, the generalized-Hamming-weight analysis is sharp for one initial block, while a gap remains for the other block.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper studies scalar linear code conversion in the merge regime for arbitrary linear codes. It derives universal lower bounds on write and read costs in terms of unchanged and read symbols, refines these bounds using generalized Hamming weights to capture subcode support growth, shows that the framework recovers known bounds for special cases and can be strictly stronger for codes with nontrivial jumps in the GHW hierarchy, and constructs explicit Reed-Muller convertible codes via Plotkin decomposition that attain the write-cost lower bound in a natural parameter regime (with a remaining gap for read cost on one block).

Significance. If the derivations and attainability hold, the work provides a general framework extending code conversion beyond specific code families, enabling more efficient adaptation of erasure codes to varying node failure rates in distributed storage. Strengths include the explicit recovery of prior bounds, the use of generalized Hamming weights for sharper estimates when the hierarchy has jumps, and the explicit RM constructions meeting the write-cost bound.

minor comments (2)
  1. [Abstract] Abstract: the phrase 'natural Reed-Muller parameter regime' is used without a precise parameter range; a brief inline definition or reference to the relevant section would improve accessibility.
  2. [Abstract] The abstract notes a remaining gap for read cost on one initial block; a short statement in the introduction or conclusion on whether closing this gap is left for future work would clarify the scope.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper derives universal lower bounds on write/read costs directly from general properties of linear codes and the definition of generalized Hamming weights, without any reduction to fitted parameters or self-referential inputs. The Reed-Muller construction applies the standard Plotkin decomposition to attain the write-cost bound in a stated regime, and the framework's recovery of known special-case bounds serves as a consistency check rather than a circular dependency. No load-bearing self-citations, uniqueness theorems imported from the authors' prior work, or ansatzes smuggled via citation are present in the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work rests on standard properties of linear codes and generalized Hamming weights; no free parameters, new entities, or ad-hoc axioms are indicated in the abstract.

axioms (2)
  • standard math Linear codes are vector spaces over finite fields with standard support and weight properties
    Invoked throughout the abstract for erasure-code conversion.
  • domain assumption Generalized Hamming weights capture support-growth properties of subcodes
    Used explicitly to refine bounds beyond minimum-distance arguments.

pith-pipeline@v0.9.1-grok · 5777 in / 1391 out tokens · 22874 ms · 2026-06-26T02:07:39.777165+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

24 extracted references

  1. [1]

    Convertible codes for data and device heterogeneity,

    A. Gruica, B. Jany, and S. Kruglik, “Convertible codes for data and device heterogeneity,” in2026 IEEE In- ternational Symposium on Information Theory (ISIT), 2026, pp. 1–5

  2. [2]

    Erasure coding for distributed storage: An overview,

    S. Balaji, M. N. Krishnan, M. Vajha, V . Ramkumar, B. Sasidharan, and P. V . Kumar, “Erasure coding for distributed storage: An overview,”Science China Infor- mation Sciences, vol. 61, no. 10, p. 100 301, 2018

  3. [3]

    A survey of the past, present, and future of erasure coding for storage systems,

    Z. Shen et al., “A survey of the past, present, and future of erasure coding for storage systems,”ACM Transactions on Storage, vol. 21, no. 1, pp. 1–39, 2025

  4. [4]

    Roth,Introduction to coding theory

    R. Roth,Introduction to coding theory. Cambridge University Press, 2006

  5. [5]

    On the locality of codeword symbols,

    P. Gopalan, C. Huang, H. Simitci, and S. Yekhanin, “On the locality of codeword symbols,”IEEE Transactions on Information Theory, vol. 58, no. 11, pp. 6925–6934, 2012

  6. [6]

    On the service rate region of Reed–Muller codes,

    H. Ly, E. Soljanin, and V . Lalitha, “On the service rate region of Reed–Muller codes,”IEEE Transactions on Information Theory, pp. 1–1, 2026

  7. [7]

    Maximal achievable service rates of codes and connections to combinatorial de- signs,

    H. Ly and E. Soljanin, “Maximal achievable service rates of codes and connections to combinatorial de- signs,”arXiv preprint arXiv:2506.16983, 2025

  8. [8]

    Bounds and constructions of codes with all-symbol locality and availability,

    S. Kruglik and A. Frolov, “Bounds and constructions of codes with all-symbol locality and availability,” in2017 IEEE International Symposium on Information Theory (ISIT), IEEE, 2017, pp. 1023–1027

  9. [9]

    Zigzag codes revisited: From optimal rebuilding to small skip cost and small fields,

    W. Zhang, H. M. Kiah, and S. H. Dau, “Zigzag codes revisited: From optimal rebuilding to small skip cost and small fields,”arXiv preprint arXiv:2509.23090, 2025

  10. [10]

    Enabling optimal access and error correction for the repair of Reed– Solomon codes,

    Z. Chen, M. Ye, and A. Barg, “Enabling optimal access and error correction for the repair of Reed– Solomon codes,”IEEE Transactions on Information Theory, vol. 66, no. 12, pp. 7439–7456, 2020

  11. [11]

    Cluster storage systems gotta have HeART: Improving storage efficiency by exploiting disk-reliability heterogeneity,

    S. Kadekodi, K. V . Rashmi, and G. R. Ganger, “Cluster storage systems gotta have HeART: Improving storage efficiency by exploiting disk-reliability heterogeneity,” inProceedings of the 17th USENIX Conference on File and Storage Technologies (FAST), 2019, pp. 345–358

  12. [12]

    Convertible codes: En- abling efficient conversion of coded data in distributed storage,

    F. Maturana and K. Rashmi, “Convertible codes: En- abling efficient conversion of coded data in distributed storage,”IEEE Transactions on Information Theory, vol. 68, no. 7, pp. 4392–4407, 2022

  13. [13]

    Bandwidth cost of code conversions in distributed storage: Fundamental limits and optimal constructions,

    F. Maturana and K. Rashmi, “Bandwidth cost of code conversions in distributed storage: Fundamental limits and optimal constructions,”IEEE Transactions on In- formation Theory, vol. 69, no. 8, pp. 4993–5008, 2023

  14. [14]

    Access- optimal linear MDS convertible codes for all parame- ters,

    F. Maturana, V . C. Mukka, and K. Rashmi, “Access- optimal linear MDS convertible codes for all parame- ters,” in2020 IEEE International Symposium on Infor- mation Theory (ISIT), IEEE, 2020, pp. 577–582

  15. [15]

    On MDS convertible codes in the merge regime,

    V . Ramkumar, X. Kong, G. Yeswanth Sai, M. Vajha, and M. Nikhil Krishnan, “On MDS convertible codes in the merge regime,”IEEE Transactions on Information Theory, pp. 1–1, 2026

  16. [16]

    On low field size constructions of access-optimal convertible codes,

    S. Chopra, F. Maturana, and K. Rashmi, “On low field size constructions of access-optimal convertible codes,” in2024 IEEE International Symposium on Information Theory (ISIT), 2024, pp. 1456–1461

  17. [17]

    Bounds and optimal constructions of generalized merge-convertible codes for code conversion into LRCs,

    H. Shi, W. Fang, and Y . Gao, “Bounds and optimal constructions of generalized merge-convertible codes for code conversion into LRCs,”IEEE Transactions on Information Theory, vol. 72, no. 5, pp. 2841–2860, 2026

  18. [18]

    Locally repairable con- vertible codes: Improved lower bound and general con- struction,

    S. Ge, H. Cai, and X. Tang, “Locally repairable con- vertible codes: Improved lower bound and general con- struction,”IEEE Transactions on Information Theory, vol. 72, no. 5, pp. 2915–2930, 2026

  19. [19]

    Locally repairable convertible codes with op- timal access costs,

    X. Kong, “Locally repairable convertible codes with op- timal access costs,”IEEE Transactions on Information Theory, vol. 70, no. 9, pp. 6239–6257, 2024

  20. [20]

    MDS generalized convert- ible code,

    S. Ge, H. Cai, and X. Tang, “MDS generalized convert- ible code,”arXiv preprint arXiv:2407.14304, 2024

  21. [21]

    Generalized Hamming weights for lin- ear codes,

    V . K. Wei, “Generalized Hamming weights for lin- ear codes,”IEEE Transactions on Information Theory, vol. 37, no. 5, pp. 1412–1418, 1991

  22. [22]

    On the generalized Hamming weights of (r,δ)-locally repairable codes,

    J. Hao and B. Chen, “On the generalized Hamming weights of (r,δ)-locally repairable codes,”IEEE Access, vol. 8, pp. 149 706–149 713, 2020

  23. [23]

    F. J. MacWilliams and N. J. A. Sloane,The Theory of Error-Correcting Codes. Elsevier, 1977, vol. 16

  24. [24]

    Reed–Muller codes: Theory and algorithms,

    E. Abbe, A. Shpilka, and M. Ye, “Reed–Muller codes: Theory and algorithms,”IEEE Transactions on Infor- mation Theory, vol. 67, no. 6, pp. 3251–3277, 2020