TONIC: Token-Centric Semantic Communication for Task-Oriented Wireless Systems

Kezhi Wang; Sige Liu

arxiv: 2605.21553 · v1 · pith:TQKELNH7 · submitted 2026-05-20 · 💻 cs.LG · cs.IT · eess.IV · math.IT



Pith reviewed 2026-05-22 00:37 UTC · model grok-4.3

classification 💻 cs.LG · cs.IT · eess.IV · math.IT

keywords semantic communication · task-oriented wireless · token-centric design · unequal error protection · transformer completion · foundation models · image classification · wireless channels


The pith

A token-centric wireless framework protects the most task-relevant tokens and repairs unreliable ones with a receiver completion model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces TONIC to fix the mismatch between bit-accurate wireless transmission and the token sequences that foundation models actually use for tasks. The transmitter turns input samples into tokens, scores each token's importance for the downstream task, and gives stronger protection to high-utility tokens inside a fixed channel-use budget. At the receiver, low-confidence tokens are turned into erasures rather than wrong substitutions, then a Transformer model fills the gaps before the task is completed. Experiments on image classification demonstrate higher accuracy than separation schemes, pixel-domain joint coding, and other token baselines over AWGN, Rayleigh, and Rician channels. A sympathetic reader would care because the design matches scarce radio resources directly to what the model needs instead of treating every bit as equally important.

Core claim

TONIC converts each source sample into tokens, estimates token-level task relevance at the transmitter to apply utility-aware unequal error protection, and at the receiver uses token confidence to gate unreliable outputs into recoverable erasures that a Transformer-based completion model then restores for final task inference, yielding higher accuracy than baselines under matched communication budgets.

What carries the argument

Utility-aware unequal error protection at the transmitter combined with confidence-aware gating and Transformer-based token completion at the receiver.
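A minimal sketch of how utility-aware unequal error protection could fit a fixed channel-use budget. It is not the paper's allocator: tokens are ranked by an estimated utility and cut into equal-size groups, each group picks a code rate from a small menu (3/4, 1/2, 1/3), and a greedy loop strengthens the highest-utility groups first while the total stays within budget. The helper names (`channel_uses`, `allocate_uep`), the 10-bit token indices (a 1024-entry codebook), QPSK at 2 bits per symbol, and the 180-channel-use budget are all illustrative assumptions.

```python
import math
from fractions import Fraction

def channel_uses(n_tokens, bits_per_token, code_rate, bits_per_symbol):
    """Channel uses needed to send n_tokens tokens at one code rate and modulation."""
    coded_bits = math.ceil(Fraction(n_tokens * bits_per_token) / code_rate)
    return math.ceil(Fraction(coded_bits, bits_per_symbol))

def allocate_uep(utilities, n_groups, rates, bits_per_token, bits_per_symbol, budget):
    """Greedy utility-aware UEP; rates are ordered weakest (highest) to strongest (lowest)."""
    # Rank token positions by estimated utility, highest first, and cut into groups.
    order = sorted(range(len(utilities)), key=lambda i: utilities[i], reverse=True)
    size = math.ceil(len(order) / n_groups)
    groups = [order[k:k + size] for k in range(0, len(order), size)]

    def total_uses(levels):
        return sum(channel_uses(len(g), bits_per_token, rates[lv], bits_per_symbol)
                   for g, lv in zip(groups, levels))

    levels = [0] * len(groups)  # every group starts at the weakest protection
    if total_uses(levels) > budget:
        raise ValueError("budget cannot cover even the weakest protection level")
    upgraded = True
    while upgraded:
        upgraded = False
        for g in range(len(groups)):  # scan groups in descending-utility order
            if levels[g] + 1 < len(rates):
                trial = levels[:g] + [levels[g] + 1] + levels[g + 1:]
                if total_uses(trial) <= budget:
                    levels, upgraded = trial, True
                    break
    return groups, [rates[lv] for lv in levels], total_uses(levels)

rates = [Fraction(3, 4), Fraction(1, 2), Fraction(1, 3)]
utilities = [i / 16 for i in range(16)]  # hypothetical utility scores for 16 tokens
groups, chosen, used = allocate_uep(utilities, 4, rates,
                                    bits_per_token=10, bits_per_symbol=2, budget=180)
print(groups[0], [str(r) for r in chosen], used)
# → [15, 14, 13, 12] ['1/3', '1/3', '3/4', '3/4'] 174
```

With these numbers, uniform rate-1/2 protection for all 16 tokens would use 160 channel uses, while the greedy variant spends 174 of the 180 available on the two highest-utility groups at rate 1/3 and leaves the rest at rate 3/4. Whether that trade improves task accuracy is an empirical question this sketch does not answer.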

If this is right

Task accuracy rises when protection strength tracks token utility instead of treating all tokens equally.
Receiver gating converts likely errors into correctable erasures before completion occurs.
The modular split between protection, gating, and completion supports separate tuning of each part.
Gains persist over AWGN, Rayleigh, and Rician channels when total channel uses are held constant.
The approach outperforms both classical separation methods and end-to-end pixel or token baselines on image classification.
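The three channel models named above can be illustrated with a short Monte Carlo sketch. It assumes uncoded BPSK, perfect channel knowledge at the receiver, unit-power flat fading, and a Rician K-factor of 4; none of these settings are stated in this summary, so the sketch only shows how the channels differ at the physical layer, not how TONIC was evaluated. `draw_gain` and `bpsk_ber` are hypothetical helper names.

```python
import math
import random

def draw_gain(channel, rng, k_factor=4.0):
    """Draw one complex channel coefficient h, normalized so that E[|h|^2] = 1."""
    if channel == "awgn":
        return complex(1.0, 0.0)
    # Circularly symmetric CN(0, 1) scattered component.
    scatter = complex(rng.gauss(0.0, math.sqrt(0.5)), rng.gauss(0.0, math.sqrt(0.5)))
    if channel == "rayleigh":
        return scatter
    if channel == "rician":
        # Line-of-sight term plus scattered term, weighted by the K-factor.
        los = math.sqrt(k_factor / (k_factor + 1.0))
        return los + math.sqrt(1.0 / (k_factor + 1.0)) * scatter
    raise ValueError(f"unknown channel: {channel}")

def bpsk_ber(channel, snr_db, n_bits=100_000, seed=0):
    """Monte Carlo bit error rate of coherent BPSK (Es = 1, SNR = Es/N0)."""
    rng = random.Random(seed)
    sigma = math.sqrt(10 ** (-snr_db / 10) / 2)  # noise std per real dimension
    errors = 0
    for _ in range(n_bits):
        x = rng.choice((-1.0, 1.0))
        h = draw_gain(channel, rng)
        y = h * x + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        # Perfect CSI: de-rotate by conj(h) and decide on the sign of the real part.
        x_hat = 1.0 if (h.conjugate() * y).real >= 0 else -1.0
        errors += x_hat != x
    return errors / n_bits

results = {ch: bpsk_ber(ch, snr_db=10.0) for ch in ("awgn", "rician", "rayleigh")}
print(results)
```

At 10 dB the Rayleigh estimate should land near the closed-form value $\tfrac{1}{2}\bigl(1-\sqrt{\gamma/(1+\gamma)}\bigr) \approx 0.023$, the Rician case well below it, and AWGN close to zero, which is why the fading channels are the harder test for any token-protection scheme.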

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same token-relevance idea could be tested on text or multimodal tasks where foundation models already operate on discrete units.
Lowering the need for perfect bit recovery might reduce transmit power in energy-constrained edge devices.
A direct test would compare end-to-end latency when the completion model runs on-device versus when it is offloaded.

Load-bearing premise

That token-level task relevance can be estimated accurately enough at the transmitter and that the Transformer completion model can reliably restore the masked tokens after gating.
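One common way to score token relevance is occlusion: mask one position at a time and record how much the downstream task score drops. It is offered here only as a stand-in, because this summary does not say which estimator TONIC uses. The sketch assumes the transmitter can query a task model through a scoring function (`score_fn`, hypothetical), which is itself part of the premise being questioned; the toy scorer, `MASK_ID`, and the token ids are illustrative.

```python
from typing import Callable, Sequence

MASK_ID = -1  # hypothetical id standing in for a masked token

def occlusion_utility(tokens: Sequence[int],
                      score_fn: Callable[[Sequence[int]], float]) -> list:
    """Utility of position i = task score drop when only token i is masked."""
    base = score_fn(tokens)
    utilities = []
    for i in range(len(tokens)):
        occluded = list(tokens)
        occluded[i] = MASK_ID
        utilities.append(base - score_fn(occluded))
    return utilities

# Toy task score: position-weighted count of a class-cue token (id 42).
WEIGHTS = [1, 5, 1, 3]

def toy_score(seq):
    return sum(w for tok, w in zip(seq, WEIGHTS) if tok == 42)

print(occlusion_utility([3, 42, 5, 42], toy_score))  # → [0, 5, 0, 3]
```

Occlusion costs one extra forward pass per token (L + 1 in total), which bears directly on the transmitter-side complexity the premise relies on; gradient- or attention-based scores are cheaper alternatives with their own accuracy trade-offs.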

What would settle it

An experiment in which replacing the relevance-based protection with uniform allocation or removing the completion model causes the performance advantage to vanish across the tested channels.

Figures

Figures reproduced from arXiv: 2605.21553 by Kezhi Wang, Sige Liu.

**Figure 2.** Online runtime workflow of TONIC. Let $E \in \mathbb{R}^{K \times D}$ denote the token embedding table, where $D$ is the embedding dimension. The embedding of token position $i$ is $e_i = E[t_i, :] \in \mathbb{R}^{D}$, and the stacked embedding sequence is $Z = [e_1, \ldots, e_L]^{T} \in \mathbb{R}^{L \times D}$. In TONIC, the communicated object is the discrete token sequence $t$, while the embedding sequence $Z$ serves as the representation on which token-utility estimation, …

**Figure 3.** Mechanism decomposition of TONIC: utility-aware token grouping, confidence-aware gating and erasure shaping, and generative completion for …

**Figure 4.** Offline learning and calibration pipeline.

**Figure 3.** High-utility groups correspond to positions that are …

**Figure 5.** Accuracy versus SNR under AWGN, Rayleigh fading, and Rician fading.

**Figure 6.** Accuracy versus communication budget under Rayleigh fading.

**Figure 7.** Token-level reliability of TONIC under Rayleigh fading.

**Figure 8.** Task accuracy versus token-level error metrics under Rayleigh fading

**Figure 9.** Illustration of the utility-aware grouping and protection mechanism of TONIC for a representative sample.

**Figure 10.** Detokenized images along the TONIC recovery pipeline, shown for qualitative intuition only; visual fidelity is not the optimization target.
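The notation in the Figure 2 caption can be made concrete with a small sketch: the representation $Z$ is a row lookup into $E$, while what crosses the channel is the index sequence $t$. The sizes below (codebook $K = 1024$, $D = 256$, $L = 256$ tokens, float32 embeddings) and the helper names `embed` and `token_payload_bits` are assumptions for illustration only.

```python
import math
import random

def embed(tokens, table):
    """Z = [e_1, ..., e_L]^T with e_i = E[t_i, :]; returns an L x D list of rows."""
    return [table[t] for t in tokens]

def token_payload_bits(seq_len, codebook_size):
    """Bits to send the discrete sequence t with fixed-length indices."""
    return seq_len * math.ceil(math.log2(codebook_size))

K, D, L = 1024, 256, 256  # hypothetical codebook size, embedding dim, sequence length
rng = random.Random(0)
E = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(K)]
t = [rng.randrange(K) for _ in range(L)]
Z = embed(t, E)

print(len(Z), len(Z[0]))         # → 256 256
print(token_payload_bits(L, K))  # → 2560 bits for the token indices
print(L * D * 32)                # → 2097152 bits if Z were sent as float32
```

Under these assumptions, sending $t$ instead of $Z$ needs about 800 times fewer raw bits, which is the practical reason token-domain schemes protect indices rather than embeddings; the catch is that a single index error swaps in an entirely different row of $E$.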

Original abstract

Tokens are becoming the basic units through which foundation models represent and process information for understanding and inference. However, traditional wireless communication, centered on bit-level fidelity, faces a mismatch between what is transmitted reliably and what downstream models actually consume. This mismatch calls for a communication design that directly accounts for token-level task relevance and downstream model requirements, rather than treating all transmitted bits as equally important. In this paper, we propose TONIC, a token-centric semantic communication framework for task-oriented wireless systems. The transmitter converts each source sample into a sequence of tokens, estimates token-level task relevance, and allocates protection through utility-aware unequal error protection under a fixed channel-use budget. At the receiver, token-level confidence is used to gate unreliable decisions, turning harmful substitutions into recoverable erasures before a Transformer-based completion model restores the masked tokens for final task inference. Our framework combines transmitter-side semantic-aware protection with receiver-side confidence-aware gating in a modular and interpretable architecture, rather than relying solely on fully black-box end-to-end learning. We further establish a utility-aware Bayes-risk interpretation for the receiver-side gating rule and study its interaction with unequal protection and completion. Experimental results on image classification show that TONIC consistently outperforms separation-based schemes, the pixel-domain DeepJSCC baseline, and token-domain baselines under matched communication budgets over AWGN, Rayleigh, and Rician channels.
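The utility-aware Bayes-risk view of gating can be illustrated with a Chow-style reject rule. This is a sketch under assumed costs, not the paper's derivation: accepting the most likely token $\hat{k}_i$ is assumed to cost $u_i$ when it is wrong, and erasing it is assumed to cost a fixed $c_e$ (the residual risk handed to the completion model). Token $i$ is then erased when $u_i\,(1 - p_i(\hat{k}_i)) > c_e$, i.e. when its confidence falls below $\tau_i = 1 - c_e/u_i$, so high-utility tokens are erased more readily. `gate`, `MASK`, and the cost values are hypothetical.

```python
MASK = -1  # hypothetical sentinel for an erased position

def gate(posteriors, utilities, erase_cost):
    """Utility-weighted reject rule: keep the argmax token unless erasing has lower risk."""
    decisions = []
    for probs, u in zip(posteriors, utilities):
        k_hat = max(range(len(probs)), key=probs.__getitem__)
        accept_risk = u * (1.0 - probs[k_hat])  # expected loss of a wrong substitution
        decisions.append(MASK if accept_risk > erase_cost else k_hat)
    return decisions

posteriors = [
    [0.70, 0.20, 0.10],  # uncertain, high-utility token
    [0.70, 0.20, 0.10],  # same confidence, low-utility token
    [0.01, 0.01, 0.98],  # confident, high-utility token
]
print(gate(posteriors, utilities=[1.0, 0.2, 1.0], erase_cost=0.1))  # → [-1, 0, 2]
```

The same confidence (0.7) yields an erasure for the high-utility token and a hard decision for the low-utility one, which is the intended interaction between gating and unequal protection. If erasures were also charged in proportion to $u_i$, the utility would cancel and the threshold would become uniform, so how the erasure cost is modelled matters.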

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

TONIC frames a modular token-level semantic comm system that pairs transmitter relevance estimation and unequal protection with receiver gating plus Transformer completion, reporting gains over baselines on image classification.


The main thing to know is that TONIC tries to close the gap between bit-level wireless transmission and what foundation models actually consume by working at the token level in a modular way rather than a fully end-to-end learned one. The transmitter turns samples into tokens, scores their task relevance, and applies utility-aware unequal error protection within a fixed channel-use budget. The receiver then uses per-token confidence to gate unreliable decisions into erasures and feeds the result to a Transformer completion model before final inference. The authors also give the gating rule a Bayes-risk interpretation that ties it to the protection choices.

On the experiments, the paper shows TONIC beating separation schemes, pixel-domain DeepJSCC, and other token baselines across AWGN, Rayleigh, and Rician channels under matched budgets on image classification. The modular split and the explicit utility view are the clearest advances over prior semantic communication work, which tends to stay black-box. The setup is straightforward to follow, and the channel models and task are standard, which makes the comparisons easy to interpret.

The soft spots sit mainly in the evaluation details and assumptions. The abstract claims consistent outperformance, but the visible text gives no numbers, error bars, dataset sizes, or ablation tables, so the size and robustness of the gains are hard to judge without the full results section. The central assumptions (that token relevance can be estimated accurately at the transmitter and that the completion model can reliably recover masked tokens) look plausible for classification but could be brittle on other tasks or noisier channels. If the full paper supplies solid ablations and statistical support, those concerns shrink; otherwise they remain the main things to check.

This paper is aimed at people working on task-oriented wireless systems and semantic communication for AI models. A reader who wants interpretable designs that mix protection and post-processing rather than pure learned pipelines would get the most out of it. I would send it to peer review. The framework is clearly motivated, the architecture is reproducible in principle, and the experimental scope is broad enough to merit referee time, even if more quantitative detail is needed.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes TONIC, a token-centric semantic communication framework for task-oriented wireless systems. The transmitter converts source samples into token sequences, estimates token-level task relevance, and applies utility-aware unequal error protection under a fixed channel-use budget. At the receiver, token-level confidence gates unreliable decisions (converting substitutions to erasures), after which a Transformer-based completion model restores masked tokens for downstream task inference. The framework is presented as modular and interpretable, with a utility-aware Bayes-risk interpretation of the gating rule. Experiments on image classification tasks report that TONIC consistently outperforms separation-based schemes, pixel-domain DeepJSCC, and token-domain baselines under matched budgets over AWGN, Rayleigh, and Rician channels.

Significance. If the empirical results hold, TONIC advances semantic communication by aligning transmission protection with token-level task relevance and downstream model needs, offering a modular alternative to fully end-to-end learned systems. The explicit utility-aware Bayes-risk interpretation for gating and its interaction with unequal protection provide theoretical grounding that could aid reproducibility and extension. Strengths include the interpretable architecture and evaluation across multiple channel models with matched communication budgets.

major comments (2)

§5 (Experimental Results): The central claim of consistent outperformance over baselines is load-bearing, yet the provided description remains high-level without specific quantitative metrics (e.g., accuracy deltas, SNR points), error bars, dataset details, or ablation results on the gating and completion modules. This prevents verification of the magnitude and robustness of gains.
Receiver-side gating and Bayes-risk interpretation (around the utility-aware rule): The interpretation is presented as grounding the gating decision, but it is unclear whether the derivation accounts for estimation errors in transmitter-side token relevance or assumes perfect relevance knowledge; if the latter, this could undermine optimality under realistic channel and estimation conditions.

minor comments (2)

Abstract: While concise, the abstract could briefly name the image classification datasets and task metrics used, giving readers immediate context for the reported outperformance.
Notation and figures: Ensure consistent symbols for token relevance scores, confidence thresholds, and channel-use budgets across text, equations, and diagrams to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and for recognizing the potential of TONIC to advance semantic communication through its token-centric and modular design. We address each major comment below and indicate the corresponding revisions.


Referee: §5 (Experimental Results): The central claim of consistent outperformance over baselines is load-bearing, yet the provided description remains high-level without specific quantitative metrics (e.g., accuracy deltas, SNR points), error bars, dataset details, or ablation results on the gating and completion modules. This prevents verification of the magnitude and robustness of gains.

Authors: We agree that the current presentation of results in §5 is high-level and that additional quantitative detail is required for verification. In the revised manuscript we will expand §5 to report concrete accuracy deltas (e.g., percentage-point gains over each baseline at representative SNR values), error bars obtained from repeated trials, full dataset specifications, and ablation studies that isolate the contributions of the gating rule and the completion model. revision: yes
Referee: Receiver-side gating and Bayes-risk interpretation (around the utility-aware rule): The interpretation is presented as grounding the gating decision, but it is unclear whether the derivation accounts for estimation errors in transmitter-side token relevance or assumes perfect relevance knowledge; if the latter, this could undermine optimality under realistic channel and estimation conditions.

Authors: The Bayes-risk derivation is presented under the modeling assumption that token relevance is known when analyzing the optimality of the gating threshold. In the implemented system, relevance is estimated from the source. We will revise the manuscript to explicitly state this modeling assumption, discuss its implications for realistic estimation error, and add a brief sensitivity analysis or performance bound that quantifies degradation under imperfect relevance estimates. revision: partial
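The error bars promised in the rebuttal could be computed from repeated trials as a mean with a Student-t 95% confidence interval. A minimal sketch, assuming independent runs (for example, different random seeds for the channel noise); the accuracy values are dummy placeholders, not results from the paper, and `mean_ci95` is a hypothetical helper.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, indexed by degrees of freedom (n - 1).
T95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
       6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def mean_ci95(values):
    """Return (mean, half-width) of a 95% t-interval for 2 to 11 independent trials."""
    n = len(values)
    if n - 1 not in T95:
        raise ValueError("this sketch only tabulates 2 to 11 trials")
    half_width = T95[n - 1] * statistics.stdev(values) / math.sqrt(n)
    return statistics.mean(values), half_width

dummy_accuracy = [0.80, 0.82, 0.81, 0.79, 0.83]  # placeholder values, one per seed
mean, hw = mean_ci95(dummy_accuracy)
print(f"{mean:.3f} ± {hw:.3f}")  # → 0.810 ± 0.020
```

With only a handful of seeds the t critical value is noticeably larger than the normal-approximation 1.96, so the number of trials should be reported alongside each interval.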

Circularity Check

0 steps flagged

No significant circularity


The paper presents a modular token-centric semantic communication framework consisting of transmitter-side token relevance estimation with unequal error protection and receiver-side confidence gating plus Transformer-based token completion. No equations, derivations, or parameter-fitting steps are described in the abstract or framework overview that reduce by construction to the inputs or to self-citations. Experimental claims of outperformance are based on comparisons against baselines under matched budgets across channels, with no evidence that results are forced by definition or by load-bearing self-citation chains. The architecture is presented as interpretable and independent of fully end-to-end black-box learning.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; framework relies on unstated assumptions about token relevance estimation accuracy and completion model effectiveness.

pith-pipeline@v0.9.0 · 5777 in / 931 out tokens · 41938 ms · 2026-05-22T00:37:37.240664+00:00 · methodology


Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages

[1] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” in Adv. Neural Inf. Process. Syst., 2017, pp. 5998–6008.

[2] B. Bai, “Forget bit, it is all about token: Towards semantic information theory for LLMs,” 2025, technical report.

[3] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2021.

[4] A. van den Oord, O. Vinyals, and K. Kavukcuoglu, “Neural discrete representation learning,” in Adv. Neural Inf. Process. Syst., 2017, pp. 6306–6315.

[5] P. Esser, R. Rombach, and B. Ommer, “Taming transformers for high-resolution image synthesis,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2021, pp. 12873–12883.

[6] H. Wei, W. Ni, W. Wang, W. Xu, D. Niyato, and P. Zhang, “Token communication in the era of large models: An information bottleneck-based approach,” IEEE Wireless Commun. Lett., vol. 15, pp. 186–190, Oct. 2026.

[7] L. Qiao, M. B. Mashhadi, Z. Gao, R. Schober, and D. Gündüz, “ToDMA: Large model-driven token-domain multiple access for semantic communications,” May 2025.

[8] C. E. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. J., vol. 27, no. 3–4, pp. 379–423, 623–656, 1948.

[9] D. Gündüz, Z. Qin, I. E. Aguerri, H. S. Dhillon, Z. Yang, A. Yener, K.-K. Wong, and C.-B. Chae, “Beyond transmitting bits: Context, semantics, and task-oriented communications,” IEEE J. Sel. Areas Commun., vol. 41, no. 1, pp. 5–41, Jan. 2023.

[10] G. Xin, P. Fan, and K. B. Letaief, “Semantic communication: A survey of its theoretical development,” Entropy, vol. 26, no. 2, p. 102, 2024.

[11] H. Chang, H. Zhang, L. Jiang, C. Liu, and W. T. Freeman, “MaskGIT: Masked generative image transformer,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2022, pp. 11315–11325.

[12] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2022, pp. 10684–10695.

[13] L. Guo, W. Chen, Y. Sun, B. Ai, N. Pappas, and T. Q. S. Quek, “Diffusion-driven semantic communication for generative models with bandwidth constraints,” IEEE Trans. Wireless Commun., vol. 24, no. 8, pp. 6490–6503, Aug. 2025.

[14] H. Xie, Z. Qin, G. Y. Li, and B.-H. Juang, “Deep learning enabled semantic communication systems,” IEEE Trans. Signal Process., vol. 69, pp. 2663–2675, Apr. 2021.

[15] J. Shao, Y. Mao, and J. Zhang, “Learning task-oriented communication for edge inference: An information bottleneck approach,” IEEE J. Sel. Areas Commun., vol. 40, no. 1, pp. 197–211, Jan. 2022.

[16] Z. Weng and Z. Qin, “Semantic communication systems for speech transmission,” IEEE J. Sel. Areas Commun., vol. 39, no. 8, pp. 2434–2444, Aug. 2021.

[17] H. Xie, Z. Qin, and G. Y. Li, “Task-oriented multi-user semantic communications for VQA task,” IEEE Wireless Commun. Lett., vol. 11, no. 3, pp. 553–557, 2022.

[18] S. Ma, W. Qiao, Y. Wu, H. Li, G. Shi, D. Gao, Y. Shi, S. Li, and N. Al-Dhahir, “Task-oriented explainable semantic communications,” IEEE Trans. Wireless Commun., vol. 22, no. 12, pp. 9248–9262, 2023.

[19] E. Bourtsoulatze, D. B. Kurka, and D. Gündüz, “Deep joint source-channel coding for wireless image transmission,” IEEE Trans. Cogn. Commun. Netw., vol. 5, no. 3, pp. 567–579, 2019.

[20] D. B. Kurka and D. Gündüz, “Bandwidth-agile image transmission with deep joint source-channel coding,” IEEE Trans. Wireless Commun., vol. 20, no. 12, pp. 8081–8095, 2021.

[21] T.-Y. Tung, D. B. Kurka, M. Jankowski, and D. Gündüz, “DeepJSCC-Q: Constellation constrained deep joint source-channel coding,” IEEE J. Sel. Areas Inf. Theory, vol. 3, no. 4, pp. 720–731, 2022.

[22] M. Yang, C. Bian, and H.-S. Kim, “Deep joint source-channel coding for wireless image transmission with OFDM,” in Proc. IEEE Int. Conf. Commun. (ICC), 2021, pp. 1–6.

[23] K. Yang, S. Wang, J. Dai, X. Qin, K. Niu, and P. Zhang, “SwinJSCC: Taming Swin transformer for deep joint source-channel coding,” IEEE Trans. Cogn. Commun. Netw., vol. 11, no. 1, pp. 90–104, 2025.

[24] J. Ying, Z. Qin, Y. Feng, L. Wang, and X. Tao, “Joint semantic-channel coding and modulation for token communications,” IEEE Trans. Wireless Commun., vol. 25, pp. 8179–8193, 2026.

[25] J. Peng, H. Xing, Z. Xiao, L. Xu, and X. Lei, “Large model empowered multi-modal semantic communication with selective tokens for training,” IEEE Signal Process. Lett., vol. 32, pp. 2967–2971, 2025.

[26] F. Solat, J. Lee, M. Seif, D. Niyato, and H. V. Poor, “Federated learning-enabled hybrid language models for communication-efficient token transmission,” IEEE Internet Things J., vol. 12, no. 24, pp. 53574–53592, 2025.

[27] J. Huang, K. Yuan, C. Huang, and K. Huang, “D2-JSCC: Digital deep joint source-channel coding for semantic communications,” in Proc. IEEE Int. Symp. Pers., Indoor, Mobile Radio Commun. (PIMRC), 2024, pp. 1–7.

[28] C. Bian, Y. Shao, H. Wu, E. Ozfatura, and D. Gündüz, “Process-and-forward: Deep joint source-channel coding over cooperative relay networks,” IEEE J. Sel. Areas Commun., vol. 43, no. 4, pp. 1118–1134, 2025.

[29] K.-H. Lee, H.-H. Choi, and J.-R. Lee, “Attention-driven semantic transmission scheme for AI-native wireless communications,” IEEE Commun. Lett., vol. 30, pp. 287–291, 2026.

[30] X. Wei, H. Tong, N. Yang, and C. Yin, “Language-oriented semantic communication for image transmission with fine-tuned diffusion model,” in Proc. 16th Int. Conf. Wireless Commun. Signal Process. (WCSP), 2024.

[31] W. Yuan, J. Ren, C. Wang, R. Zhang, J. Wei, D. I. Kim, and S. Cui, “Generative semantic communication for joint image transmission and segmentation,” in Proc. IEEE Int. Conf. Commun. Workshops (ICC Workshops), 2025, pp. 1110–1115.

[32] C. Xu, M. B. Mashhadi, Y. Ma, R. Tafazolli, and J. Wang, “Generative semantic communications with foundation models: Perception-error analysis and semantic-aware power allocation,” IEEE J. Sel. Areas Commun., vol. 43, no. 7, pp. 2493–2505, 2025.
