
arxiv: 2605.21553 · v1 · pith:TQKELNH7new · submitted 2026-05-20 · 💻 cs.LG · cs.IT · eess.IV · math.IT

TONIC: Token-Centric Semantic Communication for Task-Oriented Wireless Systems

Pith reviewed 2026-05-22 00:37 UTC · model grok-4.3

keywords semantic communication · task-oriented wireless · token-centric design · unequal error protection · transformer completion · foundation models · image classification · wireless channels

The pith

A token-centric wireless framework protects the most task-relevant tokens and repairs unreliable ones with a receiver completion model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces TONIC to fix the mismatch between bit-accurate wireless transmission and the token sequences that foundation models actually use for tasks. The transmitter turns input samples into tokens, scores each token's importance for the downstream task, and gives stronger protection to high-utility tokens inside a fixed channel-use budget. At the receiver, low-confidence tokens are turned into erasures rather than wrong substitutions, then a Transformer model fills the gaps before the task is completed. Experiments on image classification demonstrate higher accuracy than separation schemes, pixel-domain joint coding, and other token baselines over AWGN, Rayleigh, and Rician channels. A sympathetic reader would care because the design matches scarce radio resources directly to what the model needs instead of treating every bit as equally important.

Core claim

TONIC converts each source sample into tokens and estimates token-level task relevance at the transmitter to apply utility-aware unequal error protection. At the receiver, token confidence gates unreliable outputs into recoverable erasures, which a Transformer-based completion model restores before final task inference. The paper reports higher accuracy than baselines under matched communication budgets.

What carries the argument

Utility-aware unequal error protection at the transmitter combined with confidence-aware gating and Transformer-based token completion at the receiver.
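
A minimal sketch of how protection strength could track token utility inside a fixed channel-use budget. The function name `allocate_channel_uses`, the `min_uses` floor, and the proportional largest-remainder rule are illustrative assumptions; this summary does not specify the paper's actual grouping or code-rate assignment.

```python
def allocate_channel_uses(utilities, budget, min_uses=1):
    """Split `budget` channel uses across tokens in proportion to utility.

    Hypothetical rule: every token gets `min_uses`, and the remaining uses
    are shared proportionally, with leftovers going to the largest remainders.
    """
    n = len(utilities)
    if budget < n * min_uses:
        raise ValueError("budget too small to give every token min_uses")
    spare = budget - n * min_uses
    total = sum(utilities)
    if total <= 0:
        shares = [spare / n] * n  # no utility information: fall back to uniform
    else:
        shares = [spare * u / total for u in utilities]
    alloc = [min_uses + int(s) for s in shares]
    leftover = budget - sum(alloc)
    # Hand out the uses lost to flooring, largest fractional part first.
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


# Three tokens with utilities 5, 3, 2 sharing 13 channel uses.
print(allocate_channel_uses([5, 3, 2], 13))  # [6, 4, 3]
```

The allocation always sums to the budget, which mirrors the matched-budget constraint described above. A proportional rule is only one possible choice.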

If this is right

  • Task accuracy rises when protection strength tracks token utility instead of treating all tokens equally.
  • Receiver gating converts likely errors into correctable erasures before completion occurs.
  • The modular split between protection, gating, and completion supports separate tuning of each part.
  • Gains persist over AWGN, Rayleigh, and Rician channels when total channel uses are held constant.
  • The approach outperforms both classical separation methods and end-to-end pixel or token baselines on image classification.
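
The gating step in the second bullet can be sketched as a simple threshold on per-token confidence. The names `gate_tokens`, `erased_positions`, and the `MASK` sentinel are hypothetical, and a single fixed threshold is an assumption; the paper's rule is described as utility-aware.

```python
MASK = None  # hypothetical sentinel marking an erased token


def gate_tokens(tokens, confidences, threshold):
    """Keep a decoded token only if its confidence reaches the threshold.

    Low-confidence decisions become erasures (MASK) instead of being passed
    on as possibly wrong substitutions.
    """
    if len(tokens) != len(confidences):
        raise ValueError("tokens and confidences must have the same length")
    return [tok if conf >= threshold else MASK
            for tok, conf in zip(tokens, confidences)]


def erased_positions(gated):
    """Positions a completion model would be asked to fill in."""
    return [i for i, tok in enumerate(gated) if tok is MASK]


gated = gate_tokens([7, 3, 9], [0.9, 0.2, 0.6], threshold=0.5)
print(gated)                    # [7, None, 9]
print(erased_positions(gated))  # [1]
```

The completion model itself is not sketched here; only the positions it would need to restore are exposed.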

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same token-relevance idea could be tested on text or multimodal tasks where foundation models already operate on discrete units.
  • Lowering the need for perfect bit recovery might reduce transmit power in energy-constrained edge devices.
  • A direct test would compare end-to-end latency when the completion model runs on-device versus when it is offloaded.

Load-bearing premise

That token-level task relevance can be estimated accurately enough at the transmitter and that the Transformer completion model can reliably restore the masked tokens after gating.

What would settle it

An experiment in which replacing the relevance-based protection with uniform allocation or removing the completion model causes the performance advantage to vanish across the tested channels.
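
A toy scoring function shows how such an ablation could be compared numerically. Everything here is an assumption for illustration: the model treats each channel use as an independent copy that is lost with probability `loss_prob`, counts a token as lost only when all its copies are lost, and uses made-up utilities. It is not the paper's experiment and produces no claim about its results.

```python
def expected_utility_loss(utilities, uses, loss_prob):
    """Expected utility lost under a toy repetition-over-erasure model.

    A token with n channel uses is lost with probability loss_prob ** n.
    """
    return sum(u * loss_prob ** n for u, n in zip(utilities, uses))


utilities = [8, 1, 1]   # hypothetical task-relevance scores
uniform = [2, 2, 2]     # equal protection, 6 channel uses in total
weighted = [4, 1, 1]    # utility-aware protection, same total

loss_uniform = expected_utility_loss(utilities, uniform, 0.5)
loss_weighted = expected_utility_loss(utilities, weighted, 0.5)
print(loss_uniform, loss_weighted)  # 2.5 1.5
```

In this toy setting the utility-aware split loses less expected utility at the same budget. The settling experiment described above asks whether the real system's advantage disappears when that split is replaced by the uniform one.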

Figures

Figures reproduced from arXiv: 2605.21553 by Kezhi Wang, Sige Liu.

Figure 1. Conventional bit-centric communication versus the proposed token-centric TONIC framework.
Figure 2. Online runtime workflow of TONIC. Let $E \in \mathbb{R}^{K \times D}$ denote the token embedding table, where $D$ is the embedding dimension. The embedding of token position $i$ is $e_i = E[t_i, :] \in \mathbb{R}^{D}$, and the stacked embedding sequence is $Z = [e_1, \ldots, e_L]^{T} \in \mathbb{R}^{L \times D}$. In TONIC, the communicated object is the discrete token sequence $t$, while the embedding sequence $Z$ serves as the representation on which token-utility estimation,…
Figure 3. Mechanism decomposition of TONIC: utility-aware token grouping, confidence-aware gating and erasure shaping, and generative completion for
Figure 4. Offline learning and calibration pipeline.
Figure 5. Accuracy versus SNR under AWGN, Rayleigh fading, and Rician fading.
Figure 6. Accuracy versus communication budget under Rayleigh fading.
Figure 7. Token-level reliability of TONIC under Rayleigh fading.
Figure 8. Task accuracy versus token-level error metrics under Rayleigh fading
Figure 9. Illustration of the utility-aware grouping and protection mechanism of TONIC for a representative sample.
Figure 10. Detokenized images along the TONIC recovery pipeline, shown for qualitative intuition only; visual fidelity is not the optimization target.
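
The notation in the Figure 2 caption amounts to a table lookup: each token index selects one row of the embedding table. The sketch below illustrates this with plain lists. The function name `embed_tokens` and the toy table values are assumptions made here, not values from the paper.

```python
def embed_tokens(tokens, table):
    """Return Z, an L x D list whose i-th row is table[t_i]."""
    k = len(table)
    for t in tokens:
        if not 0 <= t < k:
            raise IndexError(f"token index {t} outside table of size {k}")
    return [list(table[t]) for t in tokens]


# Toy embedding table with K = 3 entries of dimension D = 2.
E = [[0.0, 0.1],
     [1.0, 1.1],
     [2.0, 2.1]]

t = [2, 0, 2]            # discrete token sequence of length L = 3
Z = embed_tokens(t, E)   # stacked embeddings, shape L x D
print(Z)                 # [[2.0, 2.1], [0.0, 0.1], [2.0, 2.1]]
```

Only the index sequence `t` would be transmitted in this picture; `Z` is recomputed from the shared table wherever it is needed.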
Original abstract

Tokens are becoming the basic units through which foundation models represent and process information for understanding and inference. However, traditional wireless communication, centered on bit-level fidelity, faces a mismatch between what is transmitted reliably and what downstream models actually consume. This mismatch calls for a communication design that directly accounts for token-level task relevance and downstream model requirements, rather than treating all transmitted bits as equally important. In this paper, we propose TONIC, a token-centric semantic communication framework for task-oriented wireless systems. The transmitter converts each source sample into a sequence of tokens, estimates token-level task relevance, and allocates protection through utility-aware unequal error protection under a fixed channel-use budget. At the receiver, token-level confidence is used to gate unreliable decisions, turning harmful substitutions into recoverable erasures before a Transformer-based completion model restores the masked tokens for final task inference. Our framework combines transmitter-side semantic-aware protection with receiver-side confidence-aware gating in a modular and interpretable architecture, rather than relying solely on fully black-box end-to-end learning. We further establish a utility-aware Bayes-risk interpretation for the receiver-side gating rule and study its interaction with unequal protection and completion. Experimental results on image classification show that TONIC consistently outperforms separation-based schemes, the pixel-domain DeepJSCC baseline, and token-domain baselines under matched communication budgets over AWGN, Rayleigh, and Rician channels.
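
For readers unfamiliar with the three channel models named in the abstract, the following sketch draws unit-power fading coefficients for each. It uses textbook definitions of AWGN, Rayleigh, and Rician channels; the function names, the default Rician K-factor, and the unit-power symbol convention are assumptions, and the paper's simulation settings are not given in this summary.

```python
import math
import random


def channel_gain(kind, rng, k_factor=1.0):
    """Draw one complex fading coefficient h with E[|h|^2] = 1."""
    if kind == "awgn":
        return complex(1.0, 0.0)  # no fading, noise only
    # Scattered component: circularly symmetric complex Gaussian, unit power.
    scatter = complex(rng.gauss(0.0, math.sqrt(0.5)),
                      rng.gauss(0.0, math.sqrt(0.5)))
    if kind == "rayleigh":
        return scatter
    if kind == "rician":
        los = math.sqrt(k_factor / (k_factor + 1.0))  # line-of-sight part
        return los + math.sqrt(1.0 / (k_factor + 1.0)) * scatter
    raise ValueError(f"unknown channel kind: {kind}")


def transmit(symbol, snr_db, kind, rng, k_factor=1.0):
    """Return (y, h) with y = h * symbol + n for a unit-power symbol."""
    h = channel_gain(kind, rng, k_factor)
    noise_var = 10.0 ** (-snr_db / 10.0)
    sigma = math.sqrt(noise_var / 2.0)  # per real dimension
    n = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
    return h * symbol + n, h


def mean_gain_power(kind, rng, trials=20000, k_factor=1.0):
    """Monte Carlo estimate of E[|h|^2]; should be close to 1 for all kinds."""
    total = sum(abs(channel_gain(kind, rng, k_factor)) ** 2
                for _ in range(trials))
    return total / trials
```

Normalizing every channel to unit average gain is one common way to keep SNR comparable across channel types.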

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes TONIC, a token-centric semantic communication framework for task-oriented wireless systems. The transmitter converts source samples into token sequences, estimates token-level task relevance, and applies utility-aware unequal error protection under a fixed channel-use budget. At the receiver, token-level confidence gates unreliable decisions (converting substitutions to erasures), after which a Transformer-based completion model restores masked tokens for downstream task inference. The framework is presented as modular and interpretable, with a utility-aware Bayes-risk interpretation of the gating rule. Experiments on image classification tasks report that TONIC consistently outperforms separation-based schemes, pixel-domain DeepJSCC, and token-domain baselines under matched budgets over AWGN, Rayleigh, and Rician channels.

Significance. If the empirical results hold, TONIC advances semantic communication by aligning transmission protection with token-level task relevance and downstream model needs, offering a modular alternative to fully end-to-end learned systems. The explicit utility-aware Bayes-risk interpretation for gating and its interaction with unequal protection provide theoretical grounding that could aid reproducibility and extension. Strengths include the interpretable architecture and evaluation across multiple channel models with matched communication budgets.

major comments (2)
  1. [§5] §5 (Experimental Results): The central claim of consistent outperformance over baselines is load-bearing, yet the provided description remains high-level without specific quantitative metrics (e.g., accuracy deltas, SNR points), error bars, dataset details, or ablation results on the gating and completion modules. This prevents verification of the magnitude and robustness of gains.
  2. [Receiver-side gating] Receiver-side gating and Bayes-risk interpretation (around the utility-aware rule): The interpretation is presented as grounding the gating decision, but it is unclear whether the derivation accounts for estimation errors in transmitter-side token relevance or assumes perfect relevance knowledge; if the latter, this could undermine optimality under realistic channel and estimation conditions.
minor comments (2)
  1. [Abstract] Abstract: While concise, it could briefly note the specific image classification datasets and task metrics used to give readers immediate context for the reported outperformance.
  2. [Notation] Notation and figures: Ensure consistent symbols for token relevance scores, confidence thresholds, and channel-use budgets across text, equations, and diagrams to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and for recognizing the potential of TONIC to advance semantic communication through its token-centric and modular design. We address each major comment below and indicate the corresponding revisions.

Point-by-point responses
  1. Referee: [§5] §5 (Experimental Results): The central claim of consistent outperformance over baselines is load-bearing, yet the provided description remains high-level without specific quantitative metrics (e.g., accuracy deltas, SNR points), error bars, dataset details, or ablation results on the gating and completion modules. This prevents verification of the magnitude and robustness of gains.

    Authors: We agree that the current presentation of results in §5 is high-level and that additional quantitative detail is required for verification. In the revised manuscript we will expand §5 to report concrete accuracy deltas (e.g., percentage-point gains over each baseline at representative SNR values), error bars obtained from repeated trials, full dataset specifications, and ablation studies that isolate the contributions of the gating rule and the completion model. revision: yes

  2. Referee: [Receiver-side gating] Receiver-side gating and Bayes-risk interpretation (around the utility-aware rule): The interpretation is presented as grounding the gating decision, but it is unclear whether the derivation accounts for estimation errors in transmitter-side token relevance or assumes perfect relevance knowledge; if the latter, this could undermine optimality under realistic channel and estimation conditions.

    Authors: The Bayes-risk derivation is presented under the modeling assumption that token relevance is known when analyzing the optimality of the gating threshold. In the implemented system, relevance is estimated from the source. We will revise the manuscript to explicitly state this modeling assumption, discuss its implications for realistic estimation error, and add a brief sensitivity analysis or performance bound that quantifies degradation under imperfect relevance estimates. revision: partial
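
One way to see how a utility-aware gating threshold can arise is the worked sketch below. It is not the paper's derivation. The costs $c_{\mathrm{sub}}$ and $c_{\mathrm{era}}$, the confidence $p_i$, and the assumption that the utility $u_i$ is known exactly are all introduced here for illustration.

```latex
% Assumed setup (not taken from the paper):
%   u_i > 0   : task utility of token i, assumed known
%   p_i       : receiver's posterior confidence that its hard decision is correct
%   c_sub > 0 : cost per unit utility of an undetected substitution
%   c_era > 0 : fixed cost of handing an erasure to the completion model
\begin{align}
R_{\mathrm{keep}}(i)  &= (1 - p_i)\, u_i\, c_{\mathrm{sub}}, \\
R_{\mathrm{erase}}(i) &= c_{\mathrm{era}}, \\
\text{erase token } i
  \iff R_{\mathrm{erase}}(i) < R_{\mathrm{keep}}(i)
  &\iff p_i < 1 - \frac{c_{\mathrm{era}}}{u_i\, c_{\mathrm{sub}}}.
\end{align}
```

Under these assumptions the threshold rises with $u_i$, so higher-utility tokens need higher confidence to avoid erasure, and tokens with $u_i c_{\mathrm{sub}} \le c_{\mathrm{era}}$ are never erased. Replacing $u_i$ with an estimate $\hat{u}_i$ shifts the threshold, which is the sensitivity the referee's second comment raises.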

Circularity Check

0 steps flagged

No significant circularity


The paper presents a modular token-centric semantic communication framework consisting of transmitter-side token relevance estimation with unequal error protection and receiver-side confidence gating plus Transformer-based token completion. No equations, derivations, or parameter-fitting steps are described in the abstract or framework overview that reduce by construction to the inputs or to self-citations. Experimental claims of outperformance are based on comparisons against baselines under matched budgets across channels, with no evidence that results are forced by definition or by load-bearing self-citation chains. The architecture is presented as interpretable and independent of fully end-to-end black-box learning.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

An abstract-only review yields no explicit free parameters, axioms, or invented entities; the framework relies on unstated assumptions about the accuracy of token-relevance estimation and the effectiveness of the completion model.

pith-pipeline@v0.9.0 · 5777 in / 931 out tokens · 41938 ms · 2026-05-22T00:37:37.240664+00:00 · methodology

