pith. machine review for the scientific record.

arxiv: 2604.19035 · v1 · submitted 2026-04-21 · 💻 cs.IT · math.IT

Recognition: unknown

LLM-Viterbi: Semantic-Aware Decoding for Convolutional Codes

Authors on Pith · no claims yet

Pith reviewed 2026-05-10 02:16 UTC · model grok-4.3

classification 💻 cs.IT math.IT
keywords convolutional codes · Viterbi decoding · large language models · semantic communication · AWGN channels · error correction · ByT5 model · joint likelihood decoding

The pith

A Viterbi decoder that periodically scores paths with a fine-tuned language model selects transmissions that are both channel-consistent and linguistically coherent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that conventional bit-level Viterbi decoding can be improved for text by keeping multiple candidate paths alive and scoring them with a ByT5 model that has been fine-tuned on clean text. At selected steps the decoder multiplies the usual channel likelihoods by the language-model probability that the path forms a plausible sentence, then prunes to the highest joint-score survivor. On AWGN channels with rate-1/2 convolutional codes of constraint length 3 this produces about 1.5 dB extra coding gain in block error rate and more than 50 percent higher semantic similarity between sent and decoded text. The gain arises because natural language statistics remain strong enough to correct residual bit errors that pure channel metrics cannot resolve. The same joint-likelihood idea is stated to apply to any structured source whose statistics can be captured by a suitable model.

Core claim

The LLM-Viterbi decoder integrates LLM priors into Viterbi decoding for text transmission over AWGN channels. It maintains multiple candidate paths during decoding and periodically evaluates path reliabilities with a fine-tuned Byte-level T5 (ByT5) language model. By combining channel reliability metrics with the LLM's semantic probability, it outputs the path that maximizes the joint likelihood of channel observations and linguistic coherence.
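The abstract stops at "joint likelihood" without giving the combination rule (the referee flags this below). One natural reading, stated here as an assumption rather than the authors' definition, is a log-domain weighted sum evaluated whenever the language model is consulted:

    S(path) = log p(y | path) + λ · log p_LM(t1 t2 … tj)

where log p(y | path) is the accumulated Viterbi channel metric over the observations so far, p_LM is the fine-tuned ByT5 probability of the first j decoded characters, and λ is a weighting coefficient the paper does not specify. Survivors are the paths with the largest S, and λ = 0 recovers ordinary Viterbi pruning.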

What carries the argument

The LLM-Viterbi decoder, which augments the standard branch metric with periodic semantic probabilities supplied by a fine-tuned ByT5 model so that surviving paths are ranked by the product of channel likelihood and linguistic coherence.
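In code form, and with the same caveat that the combination rule is assumed rather than published, a minimal sketch of the mechanism looks like the following. The beam approximation (in place of exact per-state list-Viterbi bookkeeping), the pruning depth at LM checkpoints, the weight lam, the default N = 5, and the stand-in lm_logprob scorer are all choices made here, not the paper's.

# Minimal sketch of the LLM-Viterbi idea under stated assumptions.
# The paper publishes no pseudocode; beam size, LM-checkpoint pruning depth,
# the weight `lam`, and the stand-in `lm_logprob` scorer are assumptions.
import numpy as np

G = [0b111, 0b101]   # (7, 5)_oct generators, constraint length K = 3
K = 3

def conv_output(state, bit):
    """Encoder outputs and next state for one hypothesized info bit."""
    reg = (bit << (K - 1)) | state              # register (u_i, u_{i-1}, u_{i-2})
    out = [bin(reg & g).count("1") & 1 for g in G]
    return out, reg >> 1

def branch_metric(r_pair, out_bits):
    """Negative squared distance to the BPSK points (0 -> +1, 1 -> -1)."""
    ref = np.array([1.0 - 2.0 * b for b in out_bits])
    return -float(np.sum((np.asarray(r_pair) - ref) ** 2))

def bits_to_text(bits):
    """Regroup decoded bits into 8-bit ASCII characters."""
    n = len(bits) - len(bits) % 8
    return "".join(chr(int("".join(map(str, bits[j:j + 8])), 2))
                   for j in range(0, n, 8))

def llm_viterbi(received, lm_logprob, N=5, beam=16, lam=1.0):
    """received: 2 soft values per info bit; lm_logprob: str -> log-probability."""
    paths = [(0, [], 0.0)]                      # (state, info bits, channel metric)
    for i in range(len(received) // 2):
        r = received[2 * i: 2 * i + 2]
        extended = []
        for state, bits, metric in paths:
            for b in (0, 1):
                out, nxt = conv_output(state, b)
                extended.append((nxt, bits + [b], metric + branch_metric(r, out)))
        extended.sort(key=lambda p: p[2], reverse=True)
        paths = extended[:beam]                 # channel-only beam pruning
        # periodic LM evaluation every N characters (8 bits per character)
        if (i + 1) % (8 * N) == 0:
            chars = (i + 1) // 8
            rescored = [(m + lam * lm_logprob(bits_to_text(b[:8 * chars])), s, b, m)
                        for s, b, m in paths]
            rescored.sort(key=lambda p: p[0], reverse=True)
            paths = [(s, b, m) for _, s, b, m in rescored[:max(1, beam // 4)]]
    # output the survivor with the best joint channel + LM score
    best = max(paths, key=lambda p: p[2] + lam * lm_logprob(bits_to_text(p[1])))
    return bits_to_text(best[1])

A decoder of this shape reduces to channel-only beam search when lam = 0, which is one concrete way to ask the referee's question about whether the joint rule recovers standard Viterbi.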

If this is right

  • The method yields measurable BLER improvement and semantic similarity gain for short-constraint-length convolutional codes.
  • The decoder requires no change to the transmitted code or the encoder.
  • The joint-likelihood framework extends in principle to any data source whose statistics can be modeled by a suitable generative model.
  • Performance gains are reported for both error-rate and semantic-fidelity metrics on the same transmissions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could reduce required transmit power for a target semantic quality in bandwidth-limited text links.
  • Similar periodic semantic scoring might be added to other soft-output decoders such as BCJR or list decoding.
  • The technique suggests a general pattern for hybrid classical-AI receivers whenever the source has strong internal structure that survives partial corruption.

Load-bearing premise

The fine-tuned ByT5 language model can reliably assign higher probability to the correct linguistic sequence than to erroneous alternatives even when those sequences still contain some residual bit errors.

What would settle it

A replication with the same convolutional code and AWGN channel on a held-out text corpus: if the LLM-Viterbi decoder there shows equal or higher block error rate than ordinary Viterbi, the claimed gain does not hold.

Figures

Figures reproduced from arXiv: 2604.19035 by Branka Vucetic, Chentao Yue, Jiafu Hao, Yonghui Li, Zhengtong Li.

Figure 1. The trellis and encoder of the (1, 7oct/5oct) code with character-level tokenization, where each ti corresponds to a single character, enabling compatibility with the ByT5 language model (see Section III-E). Each character is represented using 8-bit ASCII encoding to form the binary sequence u = (u1, u2, . . . , uL) of length L = 8LT; any other symbol-wise source coding scheme is applicable.

Figure 3. Prefix-based periodic pruning with N = 5. Paths with the same prefix are grouped and evaluated together at positions j = 5, 10. Instead of evaluating paths after every character, LLM-based pruning is performed periodically, every N characters: whenever the decoder reaches character position j = k·N for k = 1, 2, 3, . . ., LLM evaluation is triggered.

Figure 4. Performance comparison and semantic similarity (full caption and figure at source).

Figure 5. Performance comparison and semantic similarity (full caption and figure at source).
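The source and encoder chain described by Figure 1 is concrete enough to restate as a small sketch. Only the 8-bit ASCII mapping and the (7, 5)_oct generators come from the caption; the zero-flush termination shown here is an assumption, since the excerpt does not say how the trellis is terminated.

# Minimal sketch of the Figure 1 pipeline:
# characters -> 8-bit ASCII -> rate-1/2 (7, 5)_oct convolutional code.
# Termination by flushing with K-1 zero bits is an assumption.
G = [0b111, 0b101]   # generator polynomials 7, 5 (octal)
K = 3                # constraint length

def text_to_bits(text):
    return [int(b) for ch in text for b in format(ord(ch) & 0xFF, "08b")]

def conv_encode(bits):
    state = 0                      # 2-bit register (u_{i-1}, u_{i-2})
    coded = []
    for u in bits + [0] * (K - 1): # flush back to the all-zero state
        reg = (u << (K - 1)) | state
        coded += [bin(reg & g).count("1") & 1 for g in G]
        state = reg >> 1
    return coded

if __name__ == "__main__":
    msg = "hello"
    u = text_to_bits(msg)          # L = 8 * L_T = 40 bits
    c = conv_encode(u)             # 2 * (L + K - 1) coded bits
    print(len(u), len(c))          # 40 84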
read the original abstract

Traditional wireless communications rely solely on bit-level channel coding for error correction, without exploiting the inherent linguistic structure of the data source. This paper proposes a large language model (LLM) Viterbi decoder that integrates LLM priors into the Viterbi decoding for text transmission over AWGN channels. The proposed decoder maintains multiple candidate paths during the Viterbi decoding and periodically evaluates path reliabilities using a fine-tuned Byte-level T5 (ByT5) language model. By combining channel reliability metrics with semantic probability from the LLM, it outputs the path that maximizes the joint likelihood of channel observations and linguistic coherence. Simulations show that our decoder achieves significant performance gains over conventional Viterbi decoding in terms of both block error rate (BLER) and semantic similarity. For convolutional codes with constraint length 3, it achieves approximately 1.5 dB more coding gain in BLER, with over 50% improvements in semantic similarity. The framework can extend to other structured data sources beyond text.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes LLM-Viterbi, a decoder for convolutional codes over AWGN channels that augments the Viterbi algorithm by maintaining multiple candidate paths and periodically scoring them with a fine-tuned ByT5 language model to incorporate semantic priors. The decoder selects the path maximizing a joint likelihood of channel observations and linguistic coherence, claiming approximately 1.5 dB additional coding gain in BLER and over 50% improvement in semantic similarity for constraint length 3 codes compared to standard Viterbi decoding. The framework is presented as extensible to other structured data sources.

Significance. If the performance gains prove reproducible and the LLM integration remains effective on noisy inputs, the work would represent a meaningful step toward semantic-aware channel decoding by explicitly leveraging linguistic structure in text transmission. It offers a practical mechanism for combining bit-level reliability with higher-level priors and could stimulate further research at the intersection of coding theory and language models.

major comments (3)
  1. [Simulations] Simulations section: the reported 1.5 dB BLER coding gain and 50% semantic-similarity improvement are stated without error bars, number of Monte Carlo trials, or any description of the ByT5 fine-tuning dataset, hyperparameters, or training procedure, rendering the quantitative claims unverifiable from the given information.
  2. [Proposed Decoder] Proposed method: the exact rule for combining channel reliability metrics with the LLM-derived semantic probability score is not specified (no equation or weighting scheme is provided), so it is impossible to determine whether the joint-likelihood selection reduces to standard Viterbi under realistic conditions or how sensitive the gains are to this combination.
  3. [Methodology] Methodology: no experiment or analysis demonstrates that the fine-tuned ByT5 model, when presented with byte sequences containing residual channel errors, continues to assign strictly higher probability to linguistically coherent continuations than to low-Hamming-distance alternatives; this untested link is load-bearing for the claimed gains.
minor comments (2)
  1. [Abstract] Abstract: the phrase 'over 50% improvements in semantic similarity' does not define the similarity metric or the baseline against which the improvement is measured.
  2. [Proposed Decoder] The manuscript would benefit from pseudocode or a clear algorithmic description of the periodic LLM evaluation step within the Viterbi trellis.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We sincerely thank the referee for the detailed and insightful comments on our manuscript. We value the feedback highlighting both the potential impact and areas needing clarification. We address each major comment below and outline the revisions planned for the next version of the paper.

read point-by-point responses
  1. Referee: [Simulations] Simulations section: the reported 1.5 dB BLER coding gain and 50% semantic-similarity improvement are stated without error bars, number of Monte Carlo trials, or any description of the ByT5 fine-tuning dataset, hyperparameters, or training procedure, rendering the quantitative claims unverifiable from the given information.

    Authors: We agree that the simulations section requires additional details to ensure reproducibility and verifiability. In the revised manuscript, we will augment the Simulations section with error bars on all performance curves, explicitly state the number of Monte Carlo trials performed for each SNR point, and provide a full description of the ByT5 fine-tuning dataset (including source, size, and preprocessing steps), along with the training hyperparameters and procedure. These additions will directly address the verifiability concern. revision: yes

  2. Referee: [Proposed Decoder] Proposed method: the exact rule for combining channel reliability metrics with the LLM-derived semantic probability score is not specified (no equation or weighting scheme is provided), so it is impossible to determine whether the joint-likelihood selection reduces to standard Viterbi under realistic conditions or how sensitive the gains are to this combination.

    Authors: We acknowledge that the combination mechanism was presented at a high level without an explicit equation. In the revised manuscript, we will add a precise mathematical definition of the joint path metric, specifying how the standard Viterbi channel reliability term is combined with the LLM-derived semantic score (including the weighting coefficient and its selection rationale). This formulation will clarify the selection rule and permit direct analysis of its relation to conventional Viterbi and sensitivity to the weighting parameter. revision: yes

  3. Referee: [Methodology] Methodology: no experiment or analysis demonstrates that the fine-tuned ByT5 model, when presented with byte sequences containing residual channel errors, continues to assign strictly higher probability to linguistically coherent continuations than to low-Hamming-distance alternatives; this untested link is load-bearing for the claimed gains.

    Authors: This is a substantive point concerning the robustness of the LLM prior under noise. While the end-to-end results support the overall approach, the manuscript does not contain a dedicated isolation experiment for the LLM's behavior on noisy byte sequences. In the revision, we will include a new analysis or figure that evaluates the fine-tuned ByT5 probabilities on pairs of byte sequences (coherent vs. low-Hamming-distance incoherent) with injected residual errors, thereby directly testing the load-bearing assumption. revision: yes
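As an illustration of what that isolation experiment could look like (not the authors' protocol), the snippet below scores a clean sentence and a low-Hamming-distance corruption with an off-the-shelf ByT5 checkpoint. The model name, the empty-prompt scoring recipe, and the bit-flip corruption model are all assumptions made for the sketch.

# Hedged sketch of the proposed isolation experiment: does a ByT5-style
# model assign a higher score to the clean sentence than to a
# low-Hamming-distance corruption?  Model choice, scoring recipe, and
# corruption model are illustrative assumptions only.
import random
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tok = AutoTokenizer.from_pretrained("google/byt5-small")
model = T5ForConditionalGeneration.from_pretrained("google/byt5-small").eval()

def sequence_logprob(text):
    """Total log-probability of `text` as decoder output given an empty prompt."""
    enc = tok("", return_tensors="pt")
    labels = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(input_ids=enc.input_ids, labels=labels)
    return -out.loss.item() * labels.shape[1]   # loss is mean NLL per label token

def flip_random_bits(text, n_flips=2, seed=0):
    """Flip a few bits in the 8-bit ASCII representation (toy residual errors)."""
    rng = random.Random(seed)
    data = bytearray(text.encode("ascii", errors="replace"))
    for _ in range(n_flips):
        i = rng.randrange(len(data))
        data[i] ^= 1 << rng.randrange(7)        # stay within 7-bit ASCII
    return data.decode("ascii", errors="replace")

clean = "the cat sat on the mat"
noisy = flip_random_bits(clean)
print(clean, sequence_logprob(clean))
print(noisy, sequence_logprob(noisy))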

Circularity Check

0 steps flagged

No circularity; empirical gains are simulation-based and independent of internal fits

full rationale

The paper defines a joint-likelihood decoder that augments standard Viterbi path metrics with scores from a separately fine-tuned ByT5 model. No equation reduces the reported 1.5 dB BLER gain or semantic-similarity improvement to a quantity that was fitted inside the same experiment; the LLM component is trained on clean text outside the decoding trials, and the comparison baseline is the unmodified Viterbi algorithm. No self-citations are invoked as load-bearing uniqueness theorems, no ansatz is smuggled, and no known result is merely renamed. The derivation chain therefore remains non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The central claim rests on the unproven effectiveness of the LLM semantic scorer when applied to noisy candidate sequences; no free parameters or axioms are explicitly listed in the abstract, and the one invented entity flagged below is that semantic score itself.

invented entities (1)
  • LLM semantic probability score (no independent evidence)
    purpose: To quantify linguistic coherence of candidate decoded paths
    The ByT5 model is invoked as an external prior but no independent validation of its accuracy on noisy inputs is supplied in the abstract.

pith-pipeline@v0.9.0 · 5477 in / 1216 out tokens · 136532 ms · 2026-05-10T02:16:27.475783+00:00 · methodology

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Semantic Ordered Statistics Decoding

    cs.IT · 2026-05 · unverdicted · novelty 7.0

    Sem-OSD injects byte-level LM priors into OSD via fused scoring and dual TEP families, achieving BLER below finite-blocklength bounds and 1.5 dB gain over Fossorier OSD on BCH and RS codes.

Reference graph

Works this paper leans on

16 extracted references · 2 canonical work pages · cited by 1 Pith paper · 1 internal anchor

  1. [1]

    A mathematical theory of communication,

    C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948.

  2. [2]

    Coding for noisy channels,

    P. Elias, “Coding for noisy channels,” in IRE WESCON Convention Record, vol. 2, 1955, pp. 94–104.

  3. [3]

    Low-density parity-check codes,

    R. Gallager, “Low-density parity-check codes,” IRE Transactions on Information Theory, vol. 8, no. 1, pp. 21–28, 1962.

  4. [4]

    Source-controlled channel decoding,

    J. Hagenauer, “Source-controlled channel decoding,” IEEE Transactions on Communications, vol. 43, no. 9, pp. 2449–2457, 1995.

  5. [5]

    Deep learning enabled semantic communication systems,

    H. Xie, Z. Qin, G. Y. Li, and B.-H. Juang, “Deep learning enabled semantic communication systems,” IEEE Transactions on Signal Processing, vol. 69, pp. 2663–2675, 2021.

  6. [6]

    Beyond transmitting bits: Context, semantics, and task-oriented communications,

    D. Gündüz, Z. Qin, I. E. Aguerri, H. S. Dhillon, Z. Yang, A. Yener, K. K. Wong, and C.-B. Chae, “Beyond transmitting bits: Context, semantics, and task-oriented communications,” IEEE Journal on Selected Areas in Communications, vol. 41, no. 1, pp. 5–41, 2022.

  7. [7]

    Deep joint source-channel coding for wireless image transmission,

    E. Bourtsoulatze, D. B. Kurka, and D. Gündüz, “Deep joint source-channel coding for wireless image transmission,” IEEE Transactions on Cognitive Communications and Networking, vol. 5, no. 3, pp. 567–579, 2019.

  8. [8]

    Generative AI for physical layer communications: A survey,

    N. Van Huynh, J. Wang, H. Du, D. T. Hoang, D. Niyato, D. N. Nguyen, D. I. Kim, and K. B. Letaief, “Generative AI for physical layer communications: A survey,” IEEE Transactions on Cognitive Communications and Networking, vol. 10, no. 3, pp. 706–728, 2024.

  9. [9]

    Short wins long: Short codes with language model semantic correction outperform long codes,

    J. Hao, C. Yue, H. Chang, B. Vucetic, and Y. Li, “Short wins long: Short codes with language model semantic correction outperform long codes,” arXiv preprint arXiv:2505.08536, 2025.

  10. [10]

    List Viterbi decoding algorithms with applications,

    N. Seshadri and C.-E. Sundberg, “List Viterbi decoding algorithms with applications,” IEEE Transactions on Communications, vol. 42, no. 2/3/4, pp. 313–323, 1994.

  11. [11]

    ByT5: Towards a token-free future with pre-trained byte-to-byte models,

    L. Xue, A. Barua, N. Constant, R. Al-Rfou, S. Narang, M. Kale, A. Roberts, and C. Raffel, “ByT5: Towards a token-free future with pre-trained byte-to-byte models,” Transactions of the Association for Computational Linguistics, vol. 10, pp. 291–306, 2022.

  12. [12]

    Error bounds for convolutional codes and an asymptotically optimum decoding algorithm,

    A. Viterbi, “Error bounds for convolutional codes and an asymptotically optimum decoding algorithm,” IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, 1967.

  13. [13]

    Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

    N. Reimers and I. Gurevych, “Sentence-BERT: Sentence embeddings using Siamese BERT-networks,” arXiv preprint arXiv:1908.10084, 2019.

  14. [14]

    A large annotated corpus for learning natural language inference,

    S. Bowman, G. Angeli, C. Potts, and C. D. Manning, “A large annotated corpus for learning natural language inference,” in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015, pp. 632–642.

  15. [15]

    Transformers: State-of-the-art natural language processing,

    T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz et al., “Transformers: State-of-the-art natural language processing,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2020, pp. 38–45.

  16. [16]

    PyTorch: An imperative style, high-performance deep learning library,

    A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga et al., “PyTorch: An imperative style, high-performance deep learning library,” Advances in Neural Information Processing Systems, vol. 32, 2019.