Improved Torn Paper Coding via Local Alignment
Pith reviewed 2026-05-25 05:00 UTC · model grok-4.3
The pith
Local alignment recovers global positions from individual torn-paper fragments using only local data, raising achievable rates and closing the gap to capacity in length-dependent deletion models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The local alignment strategy identifies global alignment bits inside each fragment using only information local to that fragment. This permits the decoder to locate fragments shorter than the minimum length required by earlier interleaved-pilot constructions. For the class of TPC-LP channels that delete all fragments below a logarithmic length threshold while permitting arbitrary length-dependent deletion probabilities on longer fragments, the resulting coding scheme achieves an arbitrarily small additive gap to capacity as the threshold grows.
What carries the argument
Local alignment: the identification of global alignment bits within each fragment from local information alone.
If this is right
- Fragments below the length previously treated as erasures can now contribute to the decoded codeword.
- The achievable rate exceeds that of the interleaved-pilot scheme for any fixed minimum fragment length.
- In TPC-LP channels the additive gap to capacity vanishes with increasing logarithmic threshold.
- The same local rule applies to any fragment whose length exceeds the chosen threshold, regardless of the deletion probability on longer pieces.
Where Pith is reading between the lines
- The technique may reduce the need for global coordination in other unordered-fragment models such as DNA storage or network packet reassembly.
- Decoder implementations could trade a modest increase in per-fragment computation for a substantial reduction in the fraction of discarded data.
- If the logarithmic threshold can be replaced by a slower-growing function while preserving the capacity-gap result, the scheme would tolerate even more aggressive length-dependent losses.
Load-bearing premise
Local information inside one fragment is enough to determine that fragment's original position without reference to global statistics across the whole received set.
What would settle it
A concrete calculation, for a fixed logarithmic threshold, of the largest achievable rate minus the TPC-LP capacity; if this difference fails to approach zero as the threshold is increased, the claimed gap result is false.
Figures
read the original abstract
In the torn paper channel, a transmitted codeword is broken at random locations into fragments that arrive at the decoder in an unordered manner. A central theoretical challenge within this model is global alignment -- the task of determining each fragment's original position -- in order to faithfully reconstruct the entire codeword. Prior work by Shomorony and Vahid introduced an interleaved-pilot scheme that successfully achieved a vanishing error probability. However, their alignment strategy relies heavily on global statistics, requiring fragments to exceed a minimum length and effectively discarding many shorter ones as erasures, which results in rates significantly below capacity. To address this gap, we propose an improved coding scheme that achieves a provable rate increase through a novel approach we call \textit{local alignment}. This approach identifies global alignment bits within each fragment using only local information, allowing the decoder to determine the positions of fragments that are shorter than those used in previous work. Consequently, the decoder can extract information from a much larger fraction of the channel output than in previous work, yielding significantly higher rates. Furthermore, we extend our analysis to torn paper coding with lost pieces (TPC-LP), a generalized model that accounts for length-dependent fragment deletion. For a class of TPC-LP channels that delete all fragments below a logarithmic length threshold while allowing arbitrary length-dependent deletion probabilities for longer fragments, we show that the proposed local alignment strategy achieves an arbitrarily small additive gap to capacity as the threshold increases.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a local alignment scheme for the torn paper channel that extracts global position information from individual fragments using only local statistics, in contrast to prior interleaved-pilot methods that rely on global fragment statistics and discard short fragments. The scheme is analyzed for both the standard torn-paper model and the TPC-LP generalization (length-dependent fragment deletions). The central claim is that, for the class of TPC-LP channels that erase all fragments shorter than a logarithmic threshold while permitting arbitrary length-dependent deletion probabilities on longer fragments, the local-alignment decoder achieves rates within an arbitrarily small additive gap of capacity as the threshold parameter grows.
Significance. If the local-alignment argument is valid, the result materially improves upon Shomorony-Vahid by recovering information from a larger fraction of the output and by closing the additive gap to capacity under a broad class of deletion laws. The manuscript supplies no machine-checked proofs or reproducible code, but the capacity-gap statement is a falsifiable prediction that could be tested numerically for concrete deletion schedules.
major comments (1)
- [Abstract / TPC-LP analysis section] The TPC-LP capacity-gap claim (abstract and the section analyzing the generalized model) asserts that local identification of alignment bits succeeds with high probability for every length-dependent deletion law on fragments above the log threshold, without reference to the global multiset of received fragments. The provided abstract supplies no derivation or error-bound sketch showing that this holds when deletion probabilities on longer fragments are chosen adversarially so that surviving fragments are locally similar or sparse; if the proof relies on an implicit uniformity or cross-fragment argument, the stated result for the entire class would not follow.
minor comments (1)
- Notation for the local alignment bits and the threshold parameter should be introduced with explicit definitions before the capacity-gap statement is invoked.
Simulated Author's Rebuttal
We thank the referee for the careful review and the constructive comment on the TPC-LP analysis. We address the major comment point by point below.
read point-by-point responses
-
Referee: [Abstract / TPC-LP analysis section] The TPC-LP capacity-gap claim (abstract and the section analyzing the generalized model) asserts that local identification of alignment bits succeeds with high probability for every length-dependent deletion law on fragments above the log threshold, without reference to the global multiset of received fragments. The provided abstract supplies no derivation or error-bound sketch showing that this holds when deletion probabilities on longer fragments are chosen adversarially so that surviving fragments are locally similar or sparse; if the proof relies on an implicit uniformity or cross-fragment argument, the stated result for the entire class would not follow.
Authors: The local-alignment decoder identifies alignment information from each fragment using only its own local statistics; the TPC-LP section derives the misalignment probability bound directly from the per-fragment length and the length-dependent deletion probability p(l) for that fragment alone. The bound is obtained via a union bound over local windows inside the fragment and holds uniformly for any choice of p(l) on lengths above the logarithmic threshold, because the code construction places alignment bits so that local uniqueness is guaranteed with probability decaying exponentially in fragment length regardless of the global multiset. No cross-fragment uniformity or joint statistics are invoked. Consequently the additive gap to capacity vanishes as the threshold grows, for every member of the stated class of TPC-LP channels. We agree that the abstract would benefit from an explicit one-sentence error-bound sketch; we will add it and expand the derivation in the TPC-LP section for clarity. revision: partial
Circularity Check
No circularity; derivation is self-contained
full rationale
The paper introduces a local alignment coding scheme for the torn paper channel and its TPC-LP generalization. The central result is an information-theoretic proof that the scheme achieves rates with arbitrarily small additive gap to capacity for the stated channel class as the length threshold grows. No equations reduce a claimed rate or prediction to a fitted input by construction, no self-citations form a load-bearing chain, and the alignment procedure is defined and analyzed directly from the channel model without renaming known results or smuggling ansatzes. The derivation therefore stands on independent coding arguments rather than tautological reduction to its own inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Torn paper channel and TPC-LP model definitions as standard in the referenced prior work
Reference graph
Works this paper leans on
-
[1]
P. L. Antkowiak et al., “Low cost dna data storage using photolithographic synthesis and advanced information reconstruction and error correction,”Nature communications, vol. 11, no. 1, p. 5345, 2020
work page 2020
-
[2]
D. Bar-Lev, S. Marcovich, E. Yaakobi, and Y . Yehezkeally, “Adversarial torn-paper codes,”IEEE Trans. on Inf. Th., vol. 69, no. 10, pp. 6414–6427, 2023
work page 2023
-
[3]
Forward error correction for dna data storage,
M. Blawat et al., “Forward error correction for dna data storage,”Procedia Computer Science, vol. 80, pp. 1011–1022, 2016
work page 2016
-
[4]
A logic-gated nanorobot for targeted transport of molecular payloads,
S. M. Douglas, I. Bachelet, and G. M. Church, “A logic-gated nanorobot for targeted transport of molecular payloads,”Science, vol. 335, no. 6070, pp. 831–834, 2012
work page 2012
-
[5]
Towards practical, high-capacity, low-maintenance information storage in synthesized DNA,
N. Goldman et al., “Towards practical, high-capacity, low-maintenance information storage in synthesized DNA,”nature, vol. 494, no. 7435, pp. 77–80, 2013
work page 2013
-
[6]
K. A. S. Immink, “Runlength-limited sequences,”Proceedings of the IEEE, vol. 78, no. 11, pp. 1745–1759, 1990
work page 1990
-
[7]
Efficient nested hash reassembly codes for torn paper channels,
X. Jiao, B. Jiao, J. Mu, and H. Han, “Efficient nested hash reassembly codes for torn paper channels,”IEEE Transactions on Communications, 2025
work page 2025
-
[8]
Improvement for torn paper codes by local alignment technique,
L. Junsheng and N. Raviv, “Improvement for torn paper codes by local alignment technique,”2026 IEEE Intl. Symp. on Inf. Th. (ISIT), 2026
work page 2026
-
[9]
Single fragment forensic coding from van der corput sets,
J. Liu and N. Raviv, “Single fragment forensic coding from van der corput sets,” in2025 IEEE International Symposium on Information Theory (ISIT), IEEE, 2025, pp. 1–6
work page 2025
-
[10]
DNA merge-sort: A family of nested varshamov-tenengolts reassembly codes for out-of-order media,
S. Nassirpour, I. Shomorony, and A. Vahid, “DNA merge-sort: A family of nested varshamov-tenengolts reassembly codes for out-of-order media,”IEEE Transactions on Communications, vol. 72, no. 3, pp. 1303– 1317, 2023
work page 2023
-
[11]
Recovering a message from an incomplete set of noisy fragments,
A. N. Ravi, A. Vahid, and I. Shomorony, “Recovering a message from an incomplete set of noisy fragments,” arXiv preprint arXiv:2407.05544, 2024
-
[12]
I. Shomorony and A. Vahid, “Torn-paper coding,”IEEE Transactions on Information Theory, vol. 67, no. 12, pp. 7904–7913, 2021
work page 2021
-
[13]
Error correction for DNA storage,
J. Sima, N. Raviv, M. Schwartz, and J. Bruck, “Error correction for DNA storage,”IEEE BITS the Inf. Th. Magazine, vol. 3, no. 3, pp. 78–94, 2023
work page 2023
-
[14]
Break-resilient codes for forensic 3d fingerprinting,
C. Wang, J. Sima, and N. Raviv, “Break-resilient codes for forensic 3d fingerprinting,” in2024 IEEE Intl. Symp. on Inf. Th. (ISIT), IEEE, 2024, pp. 3148–3153
work page 2024
-
[15]
Secure information embedding and extraction in forensic 3d fingerprinting,
C. Wang et al., “Secure information embedding and extraction in forensic 3d fingerprinting,”USENIX Security, 2025
work page 2025
-
[16]
Wikipedia contributors,Directionality (molecular biology) — Wikipedia, the free encyclopedia, [Online; accessed 31-March-2026], 2026. [Online]. Available: https://en.wikipedia.org/wiki/Directionality_(molecular_ biology) APPENDIXA OMITTED PROOF Proof of Lemma 10.FixE ∈ T, lett=|E|, and letH E denote the submatrix ofHformed by the columns indexed byE. A bi...
work page 2026
-
[17]
The probability that the(i+ 1)-th randomly chosen column belongs to this span (an individual failure) is 2i/2r = 2i−r. Since rank deficiency occurs if any individual column belongs to the span of its predecessors, we can apply the union bound to sum these individual failure probabilities. Thus, the probability thatH E is rank deficient is at most: Pr H (r...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.