Improved Torn Paper Coding via Local Alignment

Junsheng Liu; Netanel Raviv

arxiv: 2605.23076 · v1 · pith:QOVVT7ZYnew · submitted 2026-05-21 · 💻 cs.IT · math.IT

Improved Torn Paper Coding via Local Alignment

Junsheng Liu , Netanel Raviv This is my paper

Pith reviewed 2026-05-25 05:00 UTC · model grok-4.3

classification 💻 cs.IT math.IT

keywords torn paper channellocal alignmentglobal alignmentTPC-LPcapacity gapfragment deletioncoding scheme

0 comments

The pith

Local alignment recovers global positions from individual torn-paper fragments using only local data, raising achievable rates and closing the gap to capacity in length-dependent deletion models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that a new local-alignment technique can identify each fragment's original position by examining only the bits inside that fragment, rather than statistics collected across all received pieces. Because the method works on shorter fragments that prior schemes had to discard, the decoder extracts usable data from a larger fraction of the channel output. In the generalized TPC-LP model, where every fragment shorter than a logarithmic threshold is lost, the same local method produces rates whose additive gap to the channel capacity can be driven to zero simply by raising the threshold. The result matters because torn-paper channels model real packet fragmentation and reassembly problems in which global coordination is costly or unavailable.

Core claim

The local alignment strategy identifies global alignment bits inside each fragment using only information local to that fragment. This permits the decoder to locate fragments shorter than the minimum length required by earlier interleaved-pilot constructions. For the class of TPC-LP channels that delete all fragments below a logarithmic length threshold while permitting arbitrary length-dependent deletion probabilities on longer fragments, the resulting coding scheme achieves an arbitrarily small additive gap to capacity as the threshold grows.

What carries the argument

Local alignment: the identification of global alignment bits within each fragment from local information alone.

If this is right

Fragments below the length previously treated as erasures can now contribute to the decoded codeword.
The achievable rate exceeds that of the interleaved-pilot scheme for any fixed minimum fragment length.
In TPC-LP channels the additive gap to capacity vanishes with increasing logarithmic threshold.
The same local rule applies to any fragment whose length exceeds the chosen threshold, regardless of the deletion probability on longer pieces.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The technique may reduce the need for global coordination in other unordered-fragment models such as DNA storage or network packet reassembly.
Decoder implementations could trade a modest increase in per-fragment computation for a substantial reduction in the fraction of discarded data.
If the logarithmic threshold can be replaced by a slower-growing function while preserving the capacity-gap result, the scheme would tolerate even more aggressive length-dependent losses.

Load-bearing premise

Local information inside one fragment is enough to determine that fragment's original position without reference to global statistics across the whole received set.

What would settle it

A concrete calculation, for a fixed logarithmic threshold, of the largest achievable rate minus the TPC-LP capacity; if this difference fails to approach zero as the threshold is increased, the claimed gap result is false.

Figures

Figures reproduced from arXiv: 2605.23076 by Junsheng Liu, Netanel Raviv.

**Figure 1.** Figure 1: A comparison between the rates of [12], the rates of [8], the channel capacity [12], and the rates in this work. Torn paper coding was initially proposed in the seminal work by Shomorony and Vahid [12], which established the channel capacity and proposed initial encoding strategies. This framework was later generalized by Ravi, Vahid and Shomorony [11] to account for lost pieces, i.e., where each fragment … view at source ↗

**Figure 2.** Figure 2: A sketch of torn paper coding. Ref. [12] also shows that purely random codes achieve this capacity. As purely random codes are impractical for encoding and decoding, the authors introduce an interleaving framework based on combining a De Bruijn sequence with a shifted pseudorandom erasure code. In this framework, the De Bruijn sequence serves as a global pilot structure for fragment localization; however, … view at source ↗

**Figure 3.** Figure 3: A sketch of the decoding process. Algorithm 1 Local alignment. 1: Input: Fragment f = (f[0],f[1], . . . ,f[l − 1]) of length l ≥ (1 + δ)m log n. 2: Goal: Identify the substring pf of the pilot sequence p in the fragment f, and a substring y i f of the codeword ¯sv(i) in f, for each i ∈ [m − 1]. 3: Initialize empty strings w0, w1, . . . , wm−1. 4: for i = 0 to l − 1 do 5: Append f[i] to wi mod m 6: end for … view at source ↗

read the original abstract

In the torn paper channel, a transmitted codeword is broken at random locations into fragments that arrive at the decoder in an unordered manner. A central theoretical challenge within this model is global alignment -- the task of determining each fragment's original position -- in order to faithfully reconstruct the entire codeword. Prior work by Shomorony and Vahid introduced an interleaved-pilot scheme that successfully achieved a vanishing error probability. However, their alignment strategy relies heavily on global statistics, requiring fragments to exceed a minimum length and effectively discarding many shorter ones as erasures, which results in rates significantly below capacity. To address this gap, we propose an improved coding scheme that achieves a provable rate increase through a novel approach we call \textit{local alignment}. This approach identifies global alignment bits within each fragment using only local information, allowing the decoder to determine the positions of fragments that are shorter than those used in previous work. Consequently, the decoder can extract information from a much larger fraction of the channel output than in previous work, yielding significantly higher rates. Furthermore, we extend our analysis to torn paper coding with lost pieces (TPC-LP), a generalized model that accounts for length-dependent fragment deletion. For a class of TPC-LP channels that delete all fragments below a logarithmic length threshold while allowing arbitrary length-dependent deletion probabilities for longer fragments, we show that the proposed local alignment strategy achieves an arbitrarily small additive gap to capacity as the threshold increases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Local alignment lets them keep shorter fragments and close the gap to capacity in TPC-LP, but the arbitrary-deletion case needs checking.

read the letter

The main point is that this paper swaps the global-statistics alignment from Shomorony and Vahid for a local method that pulls alignment bits from each fragment on its own. That change lets them use shorter pieces instead of treating them as erasures, which raises the rate and, for the TPC-LP class with a log-length deletion threshold, drives the additive gap to capacity to zero as the threshold grows. The extension to that generalized model is the clearest new piece relative to the cited prior work. It is a direct fix for the rate loss that came from discarding too many fragments, and the framing around DNA storage and packet reassembly is reasonable for the setting. The analysis appears to engage the literature without obvious circularity or unfalsifiable claims. The soft spot is the stress-test concern: the result is stated for arbitrary length-dependent deletion probabilities on fragments above the threshold. If those probabilities can be set so the surviving fragments are too sparse or locally similar, local identification of alignment bits could fail without cross-fragment statistics. The abstract says the local strategy works for the whole class, so the proof must rule that out; if it does, the claim holds, but that step is load-bearing and worth verifying in the full write-up. This is a targeted coding-theory paper for readers already working on torn-paper or fragment-reassembly channels. It is not reshaping the field but gives a concrete improvement that deserves referee time to check the local-alignment argument and the capacity-gap derivation.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes a local alignment scheme for the torn paper channel that extracts global position information from individual fragments using only local statistics, in contrast to prior interleaved-pilot methods that rely on global fragment statistics and discard short fragments. The scheme is analyzed for both the standard torn-paper model and the TPC-LP generalization (length-dependent fragment deletions). The central claim is that, for the class of TPC-LP channels that erase all fragments shorter than a logarithmic threshold while permitting arbitrary length-dependent deletion probabilities on longer fragments, the local-alignment decoder achieves rates within an arbitrarily small additive gap of capacity as the threshold parameter grows.

Significance. If the local-alignment argument is valid, the result materially improves upon Shomorony-Vahid by recovering information from a larger fraction of the output and by closing the additive gap to capacity under a broad class of deletion laws. The manuscript supplies no machine-checked proofs or reproducible code, but the capacity-gap statement is a falsifiable prediction that could be tested numerically for concrete deletion schedules.

major comments (1)

[Abstract / TPC-LP analysis section] The TPC-LP capacity-gap claim (abstract and the section analyzing the generalized model) asserts that local identification of alignment bits succeeds with high probability for every length-dependent deletion law on fragments above the log threshold, without reference to the global multiset of received fragments. The provided abstract supplies no derivation or error-bound sketch showing that this holds when deletion probabilities on longer fragments are chosen adversarially so that surviving fragments are locally similar or sparse; if the proof relies on an implicit uniformity or cross-fragment argument, the stated result for the entire class would not follow.

minor comments (1)

Notation for the local alignment bits and the threshold parameter should be introduced with explicit definitions before the capacity-gap statement is invoked.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful review and the constructive comment on the TPC-LP analysis. We address the major comment point by point below.

read point-by-point responses

Referee: [Abstract / TPC-LP analysis section] The TPC-LP capacity-gap claim (abstract and the section analyzing the generalized model) asserts that local identification of alignment bits succeeds with high probability for every length-dependent deletion law on fragments above the log threshold, without reference to the global multiset of received fragments. The provided abstract supplies no derivation or error-bound sketch showing that this holds when deletion probabilities on longer fragments are chosen adversarially so that surviving fragments are locally similar or sparse; if the proof relies on an implicit uniformity or cross-fragment argument, the stated result for the entire class would not follow.

Authors: The local-alignment decoder identifies alignment information from each fragment using only its own local statistics; the TPC-LP section derives the misalignment probability bound directly from the per-fragment length and the length-dependent deletion probability p(l) for that fragment alone. The bound is obtained via a union bound over local windows inside the fragment and holds uniformly for any choice of p(l) on lengths above the logarithmic threshold, because the code construction places alignment bits so that local uniqueness is guaranteed with probability decaying exponentially in fragment length regardless of the global multiset. No cross-fragment uniformity or joint statistics are invoked. Consequently the additive gap to capacity vanishes as the threshold grows, for every member of the stated class of TPC-LP channels. We agree that the abstract would benefit from an explicit one-sentence error-bound sketch; we will add it and expand the derivation in the TPC-LP section for clarity. revision: partial

Circularity Check

0 steps flagged

No circularity; derivation is self-contained

full rationale

The paper introduces a local alignment coding scheme for the torn paper channel and its TPC-LP generalization. The central result is an information-theoretic proof that the scheme achieves rates with arbitrarily small additive gap to capacity for the stated channel class as the length threshold grows. No equations reduce a claimed rate or prediction to a fitted input by construction, no self-citations form a load-bearing chain, and the alignment procedure is defined and analyzed directly from the channel model without renaming known results or smuggling ansatzes. The derivation therefore stands on independent coding arguments rather than tautological reduction to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review performed on abstract alone; no explicit free parameters, axioms, or invented entities are stated in the provided text. The channel models themselves are treated as given domain assumptions.

axioms (1)

domain assumption Torn paper channel and TPC-LP model definitions as standard in the referenced prior work
The analysis presupposes these channel models without re-deriving them.

pith-pipeline@v0.9.0 · 5779 in / 1181 out tokens · 20910 ms · 2026-05-25T05:00:00.323208+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages

[1]

Low cost dna data storage using photolithographic synthesis and advanced information reconstruction and error correction,

P. L. Antkowiak et al., “Low cost dna data storage using photolithographic synthesis and advanced information reconstruction and error correction,”Nature communications, vol. 11, no. 1, p. 5345, 2020

work page 2020
[2]

Adversarial torn-paper codes,

D. Bar-Lev, S. Marcovich, E. Yaakobi, and Y . Yehezkeally, “Adversarial torn-paper codes,”IEEE Trans. on Inf. Th., vol. 69, no. 10, pp. 6414–6427, 2023

work page 2023
[3]

Forward error correction for dna data storage,

M. Blawat et al., “Forward error correction for dna data storage,”Procedia Computer Science, vol. 80, pp. 1011–1022, 2016

work page 2016
[4]

A logic-gated nanorobot for targeted transport of molecular payloads,

S. M. Douglas, I. Bachelet, and G. M. Church, “A logic-gated nanorobot for targeted transport of molecular payloads,”Science, vol. 335, no. 6070, pp. 831–834, 2012

work page 2012
[5]

Towards practical, high-capacity, low-maintenance information storage in synthesized DNA,

N. Goldman et al., “Towards practical, high-capacity, low-maintenance information storage in synthesized DNA,”nature, vol. 494, no. 7435, pp. 77–80, 2013

work page 2013
[6]

Runlength-limited sequences,

K. A. S. Immink, “Runlength-limited sequences,”Proceedings of the IEEE, vol. 78, no. 11, pp. 1745–1759, 1990

work page 1990
[7]

Efficient nested hash reassembly codes for torn paper channels,

X. Jiao, B. Jiao, J. Mu, and H. Han, “Efficient nested hash reassembly codes for torn paper channels,”IEEE Transactions on Communications, 2025

work page 2025
[8]

Improvement for torn paper codes by local alignment technique,

L. Junsheng and N. Raviv, “Improvement for torn paper codes by local alignment technique,”2026 IEEE Intl. Symp. on Inf. Th. (ISIT), 2026

work page 2026
[9]

Single fragment forensic coding from van der corput sets,

J. Liu and N. Raviv, “Single fragment forensic coding from van der corput sets,” in2025 IEEE International Symposium on Information Theory (ISIT), IEEE, 2025, pp. 1–6

work page 2025
[10]

DNA merge-sort: A family of nested varshamov-tenengolts reassembly codes for out-of-order media,

S. Nassirpour, I. Shomorony, and A. Vahid, “DNA merge-sort: A family of nested varshamov-tenengolts reassembly codes for out-of-order media,”IEEE Transactions on Communications, vol. 72, no. 3, pp. 1303– 1317, 2023

work page 2023
[11]

Recovering a message from an incomplete set of noisy fragments,

A. N. Ravi, A. Vahid, and I. Shomorony, “Recovering a message from an incomplete set of noisy fragments,” arXiv preprint arXiv:2407.05544, 2024

work page arXiv 2024
[12]

Torn-paper coding,

I. Shomorony and A. Vahid, “Torn-paper coding,”IEEE Transactions on Information Theory, vol. 67, no. 12, pp. 7904–7913, 2021

work page 2021
[13]

Error correction for DNA storage,

J. Sima, N. Raviv, M. Schwartz, and J. Bruck, “Error correction for DNA storage,”IEEE BITS the Inf. Th. Magazine, vol. 3, no. 3, pp. 78–94, 2023

work page 2023
[14]

Break-resilient codes for forensic 3d fingerprinting,

C. Wang, J. Sima, and N. Raviv, “Break-resilient codes for forensic 3d fingerprinting,” in2024 IEEE Intl. Symp. on Inf. Th. (ISIT), IEEE, 2024, pp. 3148–3153

work page 2024
[15]

Secure information embedding and extraction in forensic 3d fingerprinting,

C. Wang et al., “Secure information embedding and extraction in forensic 3d fingerprinting,”USENIX Security, 2025

work page 2025
[16]

[Online]

Wikipedia contributors,Directionality (molecular biology) — Wikipedia, the free encyclopedia, [Online; accessed 31-March-2026], 2026. [Online]. Available: https://en.wikipedia.org/wiki/Directionality_(molecular_ biology) APPENDIXA OMITTED PROOF Proof of Lemma 10.FixE ∈ T, lett=|E|, and letH E denote the submatrix ofHformed by the columns indexed byE. A bi...

work page 2026
[17]

Since rank deficiency occurs if any individual column belongs to the span of its predecessors, we can apply the union bound to sum these individual failure probabilities

The probability that the(i+ 1)-th randomly chosen column belongs to this span (an individual failure) is 2i/2r = 2i−r. Since rank deficiency occurs if any individual column belongs to the span of its predecessors, we can apply the union bound to sum these individual failure probabilities. Thus, the probability thatH E is rank deficient is at most: Pr H (r...

work page

[1] [1]

Low cost dna data storage using photolithographic synthesis and advanced information reconstruction and error correction,

P. L. Antkowiak et al., “Low cost dna data storage using photolithographic synthesis and advanced information reconstruction and error correction,”Nature communications, vol. 11, no. 1, p. 5345, 2020

work page 2020

[2] [2]

Adversarial torn-paper codes,

D. Bar-Lev, S. Marcovich, E. Yaakobi, and Y . Yehezkeally, “Adversarial torn-paper codes,”IEEE Trans. on Inf. Th., vol. 69, no. 10, pp. 6414–6427, 2023

work page 2023

[3] [3]

Forward error correction for dna data storage,

M. Blawat et al., “Forward error correction for dna data storage,”Procedia Computer Science, vol. 80, pp. 1011–1022, 2016

work page 2016

[4] [4]

A logic-gated nanorobot for targeted transport of molecular payloads,

S. M. Douglas, I. Bachelet, and G. M. Church, “A logic-gated nanorobot for targeted transport of molecular payloads,”Science, vol. 335, no. 6070, pp. 831–834, 2012

work page 2012

[5] [5]

Towards practical, high-capacity, low-maintenance information storage in synthesized DNA,

N. Goldman et al., “Towards practical, high-capacity, low-maintenance information storage in synthesized DNA,”nature, vol. 494, no. 7435, pp. 77–80, 2013

work page 2013

[6] [6]

Runlength-limited sequences,

K. A. S. Immink, “Runlength-limited sequences,”Proceedings of the IEEE, vol. 78, no. 11, pp. 1745–1759, 1990

work page 1990

[7] [7]

Efficient nested hash reassembly codes for torn paper channels,

X. Jiao, B. Jiao, J. Mu, and H. Han, “Efficient nested hash reassembly codes for torn paper channels,”IEEE Transactions on Communications, 2025

work page 2025

[8] [8]

Improvement for torn paper codes by local alignment technique,

L. Junsheng and N. Raviv, “Improvement for torn paper codes by local alignment technique,”2026 IEEE Intl. Symp. on Inf. Th. (ISIT), 2026

work page 2026

[9] [9]

Single fragment forensic coding from van der corput sets,

J. Liu and N. Raviv, “Single fragment forensic coding from van der corput sets,” in2025 IEEE International Symposium on Information Theory (ISIT), IEEE, 2025, pp. 1–6

work page 2025

[10] [10]

DNA merge-sort: A family of nested varshamov-tenengolts reassembly codes for out-of-order media,

S. Nassirpour, I. Shomorony, and A. Vahid, “DNA merge-sort: A family of nested varshamov-tenengolts reassembly codes for out-of-order media,”IEEE Transactions on Communications, vol. 72, no. 3, pp. 1303– 1317, 2023

work page 2023

[11] [11]

Recovering a message from an incomplete set of noisy fragments,

A. N. Ravi, A. Vahid, and I. Shomorony, “Recovering a message from an incomplete set of noisy fragments,” arXiv preprint arXiv:2407.05544, 2024

work page arXiv 2024

[12] [12]

Torn-paper coding,

I. Shomorony and A. Vahid, “Torn-paper coding,”IEEE Transactions on Information Theory, vol. 67, no. 12, pp. 7904–7913, 2021

work page 2021

[13] [13]

Error correction for DNA storage,

J. Sima, N. Raviv, M. Schwartz, and J. Bruck, “Error correction for DNA storage,”IEEE BITS the Inf. Th. Magazine, vol. 3, no. 3, pp. 78–94, 2023

work page 2023

[14] [14]

Break-resilient codes for forensic 3d fingerprinting,

C. Wang, J. Sima, and N. Raviv, “Break-resilient codes for forensic 3d fingerprinting,” in2024 IEEE Intl. Symp. on Inf. Th. (ISIT), IEEE, 2024, pp. 3148–3153

work page 2024

[15] [15]

Secure information embedding and extraction in forensic 3d fingerprinting,

C. Wang et al., “Secure information embedding and extraction in forensic 3d fingerprinting,”USENIX Security, 2025

work page 2025

[16] [16]

[Online]

Wikipedia contributors,Directionality (molecular biology) — Wikipedia, the free encyclopedia, [Online; accessed 31-March-2026], 2026. [Online]. Available: https://en.wikipedia.org/wiki/Directionality_(molecular_ biology) APPENDIXA OMITTED PROOF Proof of Lemma 10.FixE ∈ T, lett=|E|, and letH E denote the submatrix ofHformed by the columns indexed byE. A bi...

work page 2026

[17] [17]

Since rank deficiency occurs if any individual column belongs to the span of its predecessors, we can apply the union bound to sum these individual failure probabilities

The probability that the(i+ 1)-th randomly chosen column belongs to this span (an individual failure) is 2i/2r = 2i−r. Since rank deficiency occurs if any individual column belongs to the span of its predecessors, we can apply the union bound to sum these individual failure probabilities. Thus, the probability thatH E is rank deficient is at most: Pr H (r...

work page