From Pixels to Nucleotides: End-to-End Token-Based Video Compression for DNA Storage
Pith reviewed 2026-05-10 14:16 UTC · model grok-4.3
The pith
Token-based neural codec packs video into DNA at 1.91 bits per nucleotide
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
HELIX is the first end-to-end neural network jointly optimizing video compression and DNA encoding. The central insight is that token-based representations naturally align with DNA's quaternary alphabet, where discrete semantic units map directly to ATCG bases. TK-SCONE implements this through Kronecker-structured mixing that breaks spatial correlations and FSM-based mapping that guarantees biochemical constraints. Unlike two-stage approaches, HELIX learns token distributions simultaneously optimized for visual quality, prediction under masking, and DNA synthesis efficiency.
What carries the argument
TK-SCONE (Token-Kronecker Structured Constraint-Optimized Neural Encoding) uses Kronecker-structured mixing to break spatial correlations in video tokens and finite-state-machine (FSM) mapping to guarantee biochemical constraints during DNA encoding.
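To make "FSM-based mapping that guarantees biochemical constraints" concrete, here is a minimal sketch of one well-known finite-state scheme: the rotating code used by Goldman et al. (Nature, 2013), which forbids homopolymer runs. It is not TK-SCONE's FSM, which the abstract does not specify. The function names `encode_trits` and `decode_bases` and the `start` state are hypothetical and chosen only for illustration.

```python
BASES = "ACGT"


def encode_trits(trits, start="A"):
    """Map base-3 digits to bases so that no base equals its predecessor.

    The FSM state is the previously emitted base (assumed initial state:
    `start`). From every state exactly three next bases are allowed, so
    each trit selects one of them.
    """
    prev = start
    out = []
    for t in trits:
        if t not in (0, 1, 2):
            raise ValueError("each symbol must be 0, 1, or 2")
        allowed = [b for b in BASES if b != prev]  # the 3 legal transitions
        prev = allowed[t]
        out.append(prev)
    return "".join(out)


def decode_bases(seq, start="A"):
    """Invert encode_trits by replaying the same state transitions."""
    prev = start
    trits = []
    for b in seq:
        allowed = [x for x in BASES if x != prev]
        trits.append(allowed.index(b))  # raises ValueError on a repeated base
        prev = b
    return trits


strand = encode_trits([0, 0, 0, 2, 1, 0])
print(strand)  # CACTCA
```

Because every state has three outgoing transitions, this particular code carries at most log2(3) ≈ 1.585 bits per nucleotide. A scheme reaching the reported 1.91 would therefore have to permit some repeated bases, for example short bounded runs.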
Load-bearing premise
Token-based representations naturally align with DNA's quaternary alphabet so that joint end-to-end optimization can achieve high visual quality and guaranteed biochemical constraints without major trade-offs.
What would settle it
A head-to-head experiment on the same video test set would settle it: if a conventional video codec followed by separate DNA encoding reaches equal or higher bits per nucleotide at matched visual quality while meeting all biochemical constraints, the case for joint token-based optimization is refuted.
Original abstract
DNA-based storage has emerged as a promising approach to the global data crisis, offering molecular-scale density and millennial-scale stability at low maintenance cost. Over the past decade, substantial progress has been made in storing text, images, and files in DNA -- yet video remains an open challenge. The difficulty is not merely technical: effective video DNA storage requires co-designing compression and molecular encoding from the ground up, a challenge that sits at the intersection of two fields that have largely evolved independently. In this work, we present HELIX, the first end-to-end neural network jointly optimizing video compression and DNA encoding -- prior approaches treat the two stages independently, leaving biochemical constraints and compression objectives fundamentally misaligned. Our key insight: token-based representations naturally align with DNA's quaternary alphabet -- discrete semantic units map directly to ATCG bases. We introduce TK-SCONE (Token-Kronecker Structured Constraint-Optimized Neural Encoding), which achieves 1.91 bits per nucleotide through Kronecker-structured mixing that breaks spatial correlations and FSM-based mapping that guarantees biochemical constraints. Unlike two-stage approaches, HELIX learns token distributions simultaneously optimized for visual quality, prediction under masking, and DNA synthesis efficiency. This work demonstrates for the first time that learned compression and molecular storage converge naturally at token representations -- suggesting a new paradigm where neural video codecs are designed for biological substrates from the ground up.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces HELIX, claimed as the first end-to-end neural network for jointly optimizing video compression and DNA encoding. It proposes TK-SCONE, which uses token-based representations aligned to DNA's quaternary alphabet, Kronecker-structured mixing to break spatial correlations, and FSM-based mapping to enforce biochemical constraints, achieving 1.91 bits per nucleotide while learning token distributions for visual quality, masking prediction, and synthesis efficiency.
Significance. If the joint optimization claim holds with verifiable differentiability and experimental validation against baselines, the work could meaningfully advance DNA storage for video by demonstrating that token representations enable co-design of neural codecs and molecular constraints, potentially opening a new paradigm for substrate-aware compression.
major comments (2)
- [Abstract, §3] Abstract and §3 (TK-SCONE description): The central claim of 'end-to-end' joint optimization 'simultaneously' learning token distributions for visual quality, masking, and DNA constraints is load-bearing but unsupported without evidence that the FSM-based mapping is replaced by a differentiable surrogate (e.g., Gumbel-softmax or straight-through estimator). Standard FSMs are discrete and block gradient flow from the DNA-synthesis loss back to the token predictor, reducing the method to the two-stage pipeline the paper criticizes.
- [Abstract] Abstract: The reported 1.91 bits per nucleotide is presented without any baseline comparisons, ablation studies, or error analysis (e.g., visual quality metrics, masking accuracy, or constraint violation rates), making it impossible to evaluate whether the Kronecker mixing and FSM components deliver the claimed gains over independent compression + encoding pipelines.
minor comments (2)
- [Abstract] The abstract states 'prior approaches treat the two stages independently' but provides no citations to those works or quantitative comparison tables.
- [Abstract] Notation for 'bits per nucleotide' should be defined explicitly (e.g., as information density after constraint encoding) and distinguished from raw token entropy.
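The definition requested in the last minor comment can be sketched as simple arithmetic. The example below assumes "bits per nucleotide" means payload bits divided by the nucleotides that carry them; whether the paper's figure also counts primers, indices, or error-correction overhead is not stated in the abstract. The function names are hypothetical.

```python
import math

# Ceiling for an unconstrained four-letter alphabet: log2(4) = 2 bits/nt.
UNCONSTRAINED_LIMIT = math.log2(4)


def bits_per_nucleotide(payload_bits, nucleotides):
    """Information density: payload bits stored per synthesized nucleotide."""
    if nucleotides <= 0:
        raise ValueError("nucleotides must be positive")
    return payload_bits / nucleotides


def efficiency(density, limit=UNCONSTRAINED_LIMIT):
    """Fraction of the theoretical ceiling that a given density reaches."""
    return density / limit


density = bits_per_nucleotide(191, 100)  # 191 payload bits on 100 nt
print(density, efficiency(density))
```

Under this reading, the reported 1.91 bits per nucleotide corresponds to 95.5% of the 2-bit ceiling. Raw token entropy is a separate quantity: it describes the source before constraint encoding, not the strand after it.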
Simulated Authors' Rebuttal
We thank the referee for their constructive and detailed feedback. We address the major comments point by point below, clarifying the optimization procedure and experimental presentation while committing to revisions that strengthen the manuscript without altering its core contributions.
- Referee: [Abstract, §3] Abstract and §3 (TK-SCONE description): The central claim of 'end-to-end' joint optimization 'simultaneously' learning token distributions for visual quality, masking, and DNA constraints is load-bearing but unsupported without evidence that the FSM-based mapping is replaced by a differentiable surrogate (e.g., Gumbel-softmax or straight-through estimator). Standard FSMs are discrete and block gradient flow from the DNA-synthesis loss back to the token predictor, reducing the method to the two-stage pipeline the paper criticizes.
Authors: We thank the referee for identifying this important technical detail. The referee is correct that a purely discrete FSM would interrupt gradient flow and undermine the end-to-end claim. Our training procedure employs a straight-through estimator (STE) to approximate gradients through the FSM mapping, allowing the DNA-synthesis loss to influence the upstream token predictor. We will revise §3 to explicitly describe the STE formulation, provide the forward/backward equations, and include an ablation that isolates the effect of the differentiable surrogate versus a non-differentiable baseline. This revision will make the joint optimization verifiable and directly address the concern that the method reduces to a two-stage pipeline.
Revision: yes
- Referee: [Abstract] Abstract: The reported 1.91 bits per nucleotide is presented without any baseline comparisons, ablation studies, or error analysis (e.g., visual quality metrics, masking accuracy, or constraint violation rates), making it impossible to evaluate whether the Kronecker mixing and FSM components deliver the claimed gains over independent compression + encoding pipelines.
Authors: We agree that the abstract, being concise, does not convey the full experimental context. The main manuscript (Sections 4–5) already contains the requested elements: quantitative comparisons against independent two-stage baselines (standard video codecs followed by separate DNA encoding), ablations on Kronecker mixing and FSM components, and error analyses reporting PSNR/SSIM, masking accuracy, and biochemical constraint violation rates. To improve accessibility, we will augment the abstract with a compact summary of the key comparative gains and ensure the results section prominently features all supporting metrics and ablations.
Revision: partial
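For orientation, the generic straight-through estimator named in the first response can be written as follows. This is the textbook form, not the authors' formulation, which the rebuttal only promises to add. The symbols are assumptions: z is the continuous output of the token predictor, Q is the discrete mapping (standing in for the FSM), y is its output, and z and y are assumed to share a shape so that the identity surrogate is well defined.

```latex
% Forward pass: apply the discrete, non-differentiable mapping Q.
\[ y = Q(z) \]

% Backward pass: propagate gradients as if Q were the identity.
\[ \frac{\partial \mathcal{L}}{\partial z} \approx \frac{\partial \mathcal{L}}{\partial y} \]

% Equivalent single expression, with sg(.) the stop-gradient operator.
\[ y = z + \operatorname{sg}\bigl(Q(z) - z\bigr) \]
```

The approximation is biased, which is why the promised ablation against a non-differentiable baseline matters: it would show whether the surrogate gradient actually improves the token distribution.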
Circularity Check
No significant circularity detected
The paper presents HELIX/TK-SCONE as a novel architecture that uses Kronecker-structured mixing and FSM-based mapping to reach reported metrics such as 1.91 bits per nucleotide. These metrics are framed as experimental outcomes of the proposed design, not as quantities defined tautologically from inputs or prior self-citations. No load-bearing self-citation chains, self-definitional equations, or fitted parameters renamed as predictions appear in the abstract or the described claims. The argument is self-contained: its content comes from the token-based alignment insight and the joint optimization objective.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: token-based representations naturally align with DNA's quaternary alphabet.