pith. sign in

arxiv: 2605.23076 · v1 · pith:QOVVT7ZYnew · submitted 2026-05-21 · 💻 cs.IT · math.IT

Improved Torn Paper Coding via Local Alignment

Pith reviewed 2026-05-25 05:00 UTC · model grok-4.3

classification 💻 cs.IT math.IT
keywords torn paper channellocal alignmentglobal alignmentTPC-LPcapacity gapfragment deletioncoding scheme
0
0 comments X

The pith

Local alignment recovers global positions from individual torn-paper fragments using only local data, raising achievable rates and closing the gap to capacity in length-dependent deletion models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that a new local-alignment technique can identify each fragment's original position by examining only the bits inside that fragment, rather than statistics collected across all received pieces. Because the method works on shorter fragments that prior schemes had to discard, the decoder extracts usable data from a larger fraction of the channel output. In the generalized TPC-LP model, where every fragment shorter than a logarithmic threshold is lost, the same local method produces rates whose additive gap to the channel capacity can be driven to zero simply by raising the threshold. The result matters because torn-paper channels model real packet fragmentation and reassembly problems in which global coordination is costly or unavailable.

Core claim

The local alignment strategy identifies global alignment bits inside each fragment using only information local to that fragment. This permits the decoder to locate fragments shorter than the minimum length required by earlier interleaved-pilot constructions. For the class of TPC-LP channels that delete all fragments below a logarithmic length threshold while permitting arbitrary length-dependent deletion probabilities on longer fragments, the resulting coding scheme achieves an arbitrarily small additive gap to capacity as the threshold grows.

What carries the argument

Local alignment: the identification of global alignment bits within each fragment from local information alone.

If this is right

  • Fragments below the length previously treated as erasures can now contribute to the decoded codeword.
  • The achievable rate exceeds that of the interleaved-pilot scheme for any fixed minimum fragment length.
  • In TPC-LP channels the additive gap to capacity vanishes with increasing logarithmic threshold.
  • The same local rule applies to any fragment whose length exceeds the chosen threshold, regardless of the deletion probability on longer pieces.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The technique may reduce the need for global coordination in other unordered-fragment models such as DNA storage or network packet reassembly.
  • Decoder implementations could trade a modest increase in per-fragment computation for a substantial reduction in the fraction of discarded data.
  • If the logarithmic threshold can be replaced by a slower-growing function while preserving the capacity-gap result, the scheme would tolerate even more aggressive length-dependent losses.

Load-bearing premise

Local information inside one fragment is enough to determine that fragment's original position without reference to global statistics across the whole received set.

What would settle it

A concrete calculation, for a fixed logarithmic threshold, of the largest achievable rate minus the TPC-LP capacity; if this difference fails to approach zero as the threshold is increased, the claimed gap result is false.

Figures

Figures reproduced from arXiv: 2605.23076 by Junsheng Liu, Netanel Raviv.

Figure 1
Figure 1. Figure 1: A comparison between the rates of [12], the rates of [8], the channel capacity [12], and the rates in this work. Torn paper coding was initially proposed in the seminal work by Shomorony and Vahid [12], which established the channel capacity and proposed initial encoding strategies. This framework was later generalized by Ravi, Vahid and Shomorony [11] to account for lost pieces, i.e., where each fragment … view at source ↗
Figure 2
Figure 2. Figure 2: A sketch of torn paper coding. Ref. [12] also shows that purely random codes achieve this capacity. As purely random codes are impractical for encoding and decoding, the authors introduce an interleaving framework based on combining a De Bruijn sequence with a shifted pseudorandom erasure code. In this framework, the De Bruijn sequence serves as a global pilot structure for fragment localization; however, … view at source ↗
Figure 3
Figure 3. Figure 3: A sketch of the decoding process. Algorithm 1 Local alignment. 1: Input: Fragment f = (f[0],f[1], . . . ,f[l − 1]) of length l ≥ (1 + δ)m log n. 2: Goal: Identify the substring pf of the pilot sequence p in the fragment f, and a substring y i f of the codeword ¯sv(i) in f, for each i ∈ [m − 1]. 3: Initialize empty strings w0, w1, . . . , wm−1. 4: for i = 0 to l − 1 do 5: Append f[i] to wi mod m 6: end for … view at source ↗
read the original abstract

In the torn paper channel, a transmitted codeword is broken at random locations into fragments that arrive at the decoder in an unordered manner. A central theoretical challenge within this model is global alignment -- the task of determining each fragment's original position -- in order to faithfully reconstruct the entire codeword. Prior work by Shomorony and Vahid introduced an interleaved-pilot scheme that successfully achieved a vanishing error probability. However, their alignment strategy relies heavily on global statistics, requiring fragments to exceed a minimum length and effectively discarding many shorter ones as erasures, which results in rates significantly below capacity. To address this gap, we propose an improved coding scheme that achieves a provable rate increase through a novel approach we call \textit{local alignment}. This approach identifies global alignment bits within each fragment using only local information, allowing the decoder to determine the positions of fragments that are shorter than those used in previous work. Consequently, the decoder can extract information from a much larger fraction of the channel output than in previous work, yielding significantly higher rates. Furthermore, we extend our analysis to torn paper coding with lost pieces (TPC-LP), a generalized model that accounts for length-dependent fragment deletion. For a class of TPC-LP channels that delete all fragments below a logarithmic length threshold while allowing arbitrary length-dependent deletion probabilities for longer fragments, we show that the proposed local alignment strategy achieves an arbitrarily small additive gap to capacity as the threshold increases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes a local alignment scheme for the torn paper channel that extracts global position information from individual fragments using only local statistics, in contrast to prior interleaved-pilot methods that rely on global fragment statistics and discard short fragments. The scheme is analyzed for both the standard torn-paper model and the TPC-LP generalization (length-dependent fragment deletions). The central claim is that, for the class of TPC-LP channels that erase all fragments shorter than a logarithmic threshold while permitting arbitrary length-dependent deletion probabilities on longer fragments, the local-alignment decoder achieves rates within an arbitrarily small additive gap of capacity as the threshold parameter grows.

Significance. If the local-alignment argument is valid, the result materially improves upon Shomorony-Vahid by recovering information from a larger fraction of the output and by closing the additive gap to capacity under a broad class of deletion laws. The manuscript supplies no machine-checked proofs or reproducible code, but the capacity-gap statement is a falsifiable prediction that could be tested numerically for concrete deletion schedules.

major comments (1)
  1. [Abstract / TPC-LP analysis section] The TPC-LP capacity-gap claim (abstract and the section analyzing the generalized model) asserts that local identification of alignment bits succeeds with high probability for every length-dependent deletion law on fragments above the log threshold, without reference to the global multiset of received fragments. The provided abstract supplies no derivation or error-bound sketch showing that this holds when deletion probabilities on longer fragments are chosen adversarially so that surviving fragments are locally similar or sparse; if the proof relies on an implicit uniformity or cross-fragment argument, the stated result for the entire class would not follow.
minor comments (1)
  1. Notation for the local alignment bits and the threshold parameter should be introduced with explicit definitions before the capacity-gap statement is invoked.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful review and the constructive comment on the TPC-LP analysis. We address the major comment point by point below.

read point-by-point responses
  1. Referee: [Abstract / TPC-LP analysis section] The TPC-LP capacity-gap claim (abstract and the section analyzing the generalized model) asserts that local identification of alignment bits succeeds with high probability for every length-dependent deletion law on fragments above the log threshold, without reference to the global multiset of received fragments. The provided abstract supplies no derivation or error-bound sketch showing that this holds when deletion probabilities on longer fragments are chosen adversarially so that surviving fragments are locally similar or sparse; if the proof relies on an implicit uniformity or cross-fragment argument, the stated result for the entire class would not follow.

    Authors: The local-alignment decoder identifies alignment information from each fragment using only its own local statistics; the TPC-LP section derives the misalignment probability bound directly from the per-fragment length and the length-dependent deletion probability p(l) for that fragment alone. The bound is obtained via a union bound over local windows inside the fragment and holds uniformly for any choice of p(l) on lengths above the logarithmic threshold, because the code construction places alignment bits so that local uniqueness is guaranteed with probability decaying exponentially in fragment length regardless of the global multiset. No cross-fragment uniformity or joint statistics are invoked. Consequently the additive gap to capacity vanishes as the threshold grows, for every member of the stated class of TPC-LP channels. We agree that the abstract would benefit from an explicit one-sentence error-bound sketch; we will add it and expand the derivation in the TPC-LP section for clarity. revision: partial

Circularity Check

0 steps flagged

No circularity; derivation is self-contained

full rationale

The paper introduces a local alignment coding scheme for the torn paper channel and its TPC-LP generalization. The central result is an information-theoretic proof that the scheme achieves rates with arbitrarily small additive gap to capacity for the stated channel class as the length threshold grows. No equations reduce a claimed rate or prediction to a fitted input by construction, no self-citations form a load-bearing chain, and the alignment procedure is defined and analyzed directly from the channel model without renaming known results or smuggling ansatzes. The derivation therefore stands on independent coding arguments rather than tautological reduction to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review performed on abstract alone; no explicit free parameters, axioms, or invented entities are stated in the provided text. The channel models themselves are treated as given domain assumptions.

axioms (1)
  • domain assumption Torn paper channel and TPC-LP model definitions as standard in the referenced prior work
    The analysis presupposes these channel models without re-deriving them.

pith-pipeline@v0.9.0 · 5779 in / 1181 out tokens · 20910 ms · 2026-05-25T05:00:00.323208+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages

  1. [1]

    Low cost dna data storage using photolithographic synthesis and advanced information reconstruction and error correction,

    P. L. Antkowiak et al., “Low cost dna data storage using photolithographic synthesis and advanced information reconstruction and error correction,”Nature communications, vol. 11, no. 1, p. 5345, 2020

  2. [2]

    Adversarial torn-paper codes,

    D. Bar-Lev, S. Marcovich, E. Yaakobi, and Y . Yehezkeally, “Adversarial torn-paper codes,”IEEE Trans. on Inf. Th., vol. 69, no. 10, pp. 6414–6427, 2023

  3. [3]

    Forward error correction for dna data storage,

    M. Blawat et al., “Forward error correction for dna data storage,”Procedia Computer Science, vol. 80, pp. 1011–1022, 2016

  4. [4]

    A logic-gated nanorobot for targeted transport of molecular payloads,

    S. M. Douglas, I. Bachelet, and G. M. Church, “A logic-gated nanorobot for targeted transport of molecular payloads,”Science, vol. 335, no. 6070, pp. 831–834, 2012

  5. [5]

    Towards practical, high-capacity, low-maintenance information storage in synthesized DNA,

    N. Goldman et al., “Towards practical, high-capacity, low-maintenance information storage in synthesized DNA,”nature, vol. 494, no. 7435, pp. 77–80, 2013

  6. [6]

    Runlength-limited sequences,

    K. A. S. Immink, “Runlength-limited sequences,”Proceedings of the IEEE, vol. 78, no. 11, pp. 1745–1759, 1990

  7. [7]

    Efficient nested hash reassembly codes for torn paper channels,

    X. Jiao, B. Jiao, J. Mu, and H. Han, “Efficient nested hash reassembly codes for torn paper channels,”IEEE Transactions on Communications, 2025

  8. [8]

    Improvement for torn paper codes by local alignment technique,

    L. Junsheng and N. Raviv, “Improvement for torn paper codes by local alignment technique,”2026 IEEE Intl. Symp. on Inf. Th. (ISIT), 2026

  9. [9]

    Single fragment forensic coding from van der corput sets,

    J. Liu and N. Raviv, “Single fragment forensic coding from van der corput sets,” in2025 IEEE International Symposium on Information Theory (ISIT), IEEE, 2025, pp. 1–6

  10. [10]

    DNA merge-sort: A family of nested varshamov-tenengolts reassembly codes for out-of-order media,

    S. Nassirpour, I. Shomorony, and A. Vahid, “DNA merge-sort: A family of nested varshamov-tenengolts reassembly codes for out-of-order media,”IEEE Transactions on Communications, vol. 72, no. 3, pp. 1303– 1317, 2023

  11. [11]

    Recovering a message from an incomplete set of noisy fragments,

    A. N. Ravi, A. Vahid, and I. Shomorony, “Recovering a message from an incomplete set of noisy fragments,” arXiv preprint arXiv:2407.05544, 2024

  12. [12]

    Torn-paper coding,

    I. Shomorony and A. Vahid, “Torn-paper coding,”IEEE Transactions on Information Theory, vol. 67, no. 12, pp. 7904–7913, 2021

  13. [13]

    Error correction for DNA storage,

    J. Sima, N. Raviv, M. Schwartz, and J. Bruck, “Error correction for DNA storage,”IEEE BITS the Inf. Th. Magazine, vol. 3, no. 3, pp. 78–94, 2023

  14. [14]

    Break-resilient codes for forensic 3d fingerprinting,

    C. Wang, J. Sima, and N. Raviv, “Break-resilient codes for forensic 3d fingerprinting,” in2024 IEEE Intl. Symp. on Inf. Th. (ISIT), IEEE, 2024, pp. 3148–3153

  15. [15]

    Secure information embedding and extraction in forensic 3d fingerprinting,

    C. Wang et al., “Secure information embedding and extraction in forensic 3d fingerprinting,”USENIX Security, 2025

  16. [16]

    [Online]

    Wikipedia contributors,Directionality (molecular biology) — Wikipedia, the free encyclopedia, [Online; accessed 31-March-2026], 2026. [Online]. Available: https://en.wikipedia.org/wiki/Directionality_(molecular_ biology) APPENDIXA OMITTED PROOF Proof of Lemma 10.FixE ∈ T, lett=|E|, and letH E denote the submatrix ofHformed by the columns indexed byE. A bi...

  17. [17]

    Since rank deficiency occurs if any individual column belongs to the span of its predecessors, we can apply the union bound to sum these individual failure probabilities

    The probability that the(i+ 1)-th randomly chosen column belongs to this span (an individual failure) is 2i/2r = 2i−r. Since rank deficiency occurs if any individual column belongs to the span of its predecessors, we can apply the union bound to sum these individual failure probabilities. Thus, the probability thatH E is rank deficient is at most: Pr H (r...