ACEAPEX: Parallel LZ77 Decoding via Encode-Time Absolute Offset Resolution
2 Pith papers cite this work.
abstract
LZ77-based codecs exhibit a fundamental sequential bottleneck in decoding: each back-reference depends on previously decompressed data, which prevents multi-core scaling. We present ACEAPEX, a parallel LZ77 codec that stores all back-references as absolute positions in the decompressed output and organizes data into self-contained 1 MB blocks, enabling embarrassingly parallel block-level decoding. Integrated into lzbench, ACEAPEX achieves 10,160 MB/s on an EPYC 4344P (8 cores) and 10,869 MB/s on an EPYC 9575F for FASTQ genomic data, up to 3.1x faster than zstd -3 at comparable compression ratios. We further implement a GPU wavefront decoder on an NVIDIA H100 SXM, measuring 44.0 GB/s on enwik9 and 20.3 GB/s on FASTQ (wavefront match phase, verified bit-perfect). With a depth-limited encoder variant (-1.5% ratio on enwik9), GPU throughput reaches 77.2 GB/s on a single H100 and 249.9 GB/s on two H100s in an NVLink configuration. To our knowledge, this is the first reported GPU LZ77 decoder with a near-standard compression ratio that is verified byte-for-byte.
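The idea of absolute-offset back-references in self-contained blocks can be illustrated with a small sketch. This is not the ACEAPEX format or implementation: the token layout (`("lit", data)` and `("copy", src, length)`), the function names `decode_block` and `decode_parallel`, and the rule that a copy may only read from its own block are assumptions made for illustration. What it shows is that once every copy names an absolute position in the decompressed output, each block can be decoded into its own slice of a shared buffer without waiting for any other block.

```python
from concurrent.futures import ThreadPoolExecutor


def decode_block(out, block_start, tokens):
    """Decode one self-contained block into out, starting at block_start.

    Hypothetical token format (an assumption, not the ACEAPEX bitstream):
      ("lit", data)          copy raw bytes to the output
      ("copy", src, length)  copy `length` bytes from ABSOLUTE output position `src`
    """
    pos = block_start
    for tok in tokens:
        if tok[0] == "lit":
            data = tok[1]
            out[pos:pos + len(data)] = data
            pos += len(data)
        else:
            _, src, length = tok
            # Self-contained block: the source must already be written by this block.
            if not (block_start <= src < pos):
                raise ValueError("back-reference leaves the block")
            # Byte-wise copy so that overlapping matches (src + length > pos) work.
            for i in range(length):
                out[pos + i] = out[src + i]
            pos += length
    return pos - block_start  # number of bytes this block produced


def decode_parallel(blocks, total_size, workers=4):
    """blocks: iterable of (block_start, tokens); the blocks write disjoint ranges."""
    out = bytearray(total_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(decode_block, out, start, toks)
                   for start, toks in blocks]
        written = sum(f.result() for f in futures)
    if written != total_size:
        raise ValueError("decoded size does not match the expected size")
    return bytes(out)


blocks = [
    (0, [("lit", b"ACGT"), ("copy", 0, 8)]),     # overlapping match repeats ACGT
    (12, [("lit", b"NN"), ("copy", 12, 4)]),     # second block starts at byte 12
]
result = decode_parallel(blocks, total_size=18)
```

Because no block reads outside its own range, the order in which the workers finish does not affect the result. The thread pool here only demonstrates that independence; it is not meant to reproduce the throughput figures reported in the abstract.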
fields: cs.DC (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
citing papers explorer
- Compressed-Resident Genomics: Full-Pipeline Device-Resident GPU LZ77 Decode with Position-Invariant Random Access
  A full device-resident GPU LZ77 decoder for genomics reaches 260 GB/s throughput, 0.362 ms random read access, and range decoding for 50 GB files while remaining bit-perfect.
- Unified Position-Invariant Random Access Through Two Compression Layers via Absolute-Offset Coordinates: A Bit-Perfect Device-Resident Proof
  Absolute-offset design enables unified position-invariant random access through entropy and match compression layers with one coordinate and bit-perfect verification.