Recognition: unknown
Semantics-Aware Hierarchical Token Communication: Clustering, Bit Mapping, and Power Allocation
Pith reviewed 2026-05-07 05:37 UTC · model grok-4.3
The pith
By clustering semantically similar tokens and mapping them to hierarchical bits with more power on cluster prefixes, H-TokCom limits semantic distortion from noise compared to flat token mappings.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that by clustering tokens according to their semantic similarity and mapping them hierarchically to bits—assigning a common prefix to all tokens in a cluster and a distinguishing suffix to each individual token, with greater transmit power allocated to the prefix—errors confined to the suffix bits map the received token to another within the same cluster. This results in only limited semantic distortion, as measured by similarity metrics, rather than arbitrary semantic shifts that occur in conventional flat bit mappings. Consequently, the framework achieves higher semantic similarity across signal-to-noise ratios, with a demonstrated increase from 0.206 to 0.279 at 3 dB
What carries the argument
Semantic clustering of tokens followed by hierarchical prefix-suffix bit assignment with unequal power allocation favoring the prefix.
If this is right
- Semantic similarity increases from 0.206 to 0.279 at 3 dB SNR on the COCO dataset, a gain of 0.073 or 35.4 percent.
- Gains in semantic similarity hold across the full range of tested signal-to-noise ratios.
- Semantic distortion stays limited whenever cluster prefix bits arrive correctly, even if suffix bits flip.
- The design requires advance semantic clustering of the token vocabulary before any transmission occurs.
Where Pith is reading between the lines
- The same prefix-protection idea could be paired with standard error-correcting codes applied only to the cluster bits for further reliability.
- Low-power devices sending image or text tokens might achieve usable meaning at lower transmit energy than flat mappings allow.
- The clustering step could be revisited periodically if the semantic space of tokens drifts over time or across domains.
Load-bearing premise
Tokens must be grouped beforehand into clusters where members are close enough in meaning that replacing one with another from the group does not greatly alter the communicated semantics.
What would settle it
If measurements show that the proportion of received tokens falling outside their sent cluster exceeds a threshold that erases the reported similarity gain at low SNR, the robustness would be falsified.
Figures
read the original abstract
Despite the rise of token communication (TokCom) as a new paradigm beyond traditional bit communication, existing approaches have primarily adopted artificial intelligence (AI)-centric designs that rely on semantic recovery via large models. Meanwhile, their physical-layer designs, such as token-bit mapping and power allocation, remain conventional and do not reflect token-level semantics. These semantics-agnostic designs can lead to significant semantic loss, particularly at low signal-to-noise ratio (SNR) levels. To address this issue, we propose hierarchical TokCom (H-TokCom), a framework that embeds semantic structure directly into physical-layer design. The key idea is to group semantically similar tokens into clusters and hierarchically assign their bit representations, where each token is represented by a cluster-level prefix and a token-specific suffix. As long as the cluster bits are correctly delivered, errors in the suffix bits typically map the received token to another within the same semantic cluster, resulting in only limited semantic distortion. This robustness is further strengthened by allocating more transmit power to the prefix bits than to the suffix bits. Simulation results show that H-TokCom achieves substantial semantic-similarity gains over conventional TokCom across the considered SNR range, increasing the semantic similarity from $0.206$ to $0.279$ at $\gamma=3$ dB on COCO, corresponding to a gain of $0.073$ $(35.4\%)$.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes hierarchical token communication (H-TokCom) to make physical-layer design semantics-aware. Semantically similar tokens are pre-clustered; each token is encoded with a shared cluster-level prefix and a token-specific suffix. More transmit power is allocated to the prefix bits. The central claim is that correct prefix delivery confines suffix-bit errors to the same semantic cluster, producing only limited distortion under a semantic similarity metric. Simulations on the COCO dataset report that H-TokCom raises semantic similarity from 0.206 to 0.279 at 3 dB SNR (35.4 % relative gain) compared with conventional TokCom.
Significance. If the clustering procedure and intra-cluster similarity assumptions are validated, the work supplies a concrete, low-complexity physical-layer mechanism that embeds token semantics into bit mapping and power allocation. This moves beyond purely AI-centric recovery and could be useful for low-SNR token-based links. The reported absolute gain of 0.073 is non-trivial, but its reproducibility and generality depend on the missing methodological details.
major comments (3)
- [§3] §3 (Proposed Framework, clustering subsection): The manuscript states that tokens are grouped by semantic similarity and that suffix errors map to another token inside the same cluster, yet supplies no description of the embedding model, similarity metric, clustering algorithm (k-means, hierarchical, etc.), or chosen number of clusters. Without these parameters or a table of measured intra-cluster versus inter-cluster similarity scores, the limited-distortion premise cannot be evaluated and the simulation gains remain unverifiable.
- [§4] §4 (Simulation Results): The abstract reports performance on COCO at γ = 3 dB, but does not state whether the number of clusters or the prefix/suffix power-allocation ratio were selected or tuned on the same COCO split used for final evaluation. If they were, the 0.073 gain may be optimistically biased; an ablation or cross-validation statement is required to support the claim.
- [§3.2] §3.2 (Bit Mapping and Power Allocation): The hierarchical prefix-suffix construction and unequal power allocation are presented as the source of robustness, but no analytical expression or bound is given that relates prefix error probability to the resulting semantic similarity under the chosen metric. The simulation results therefore stand alone without a supporting derivation that would allow extrapolation beyond the reported SNR points.
minor comments (2)
- [§4] The notation for SNR (γ) and the exact definition of the semantic similarity metric used in the COCO experiments should be stated explicitly in the simulation section rather than left implicit.
- [Figures in §4] Figure captions for the semantic-similarity versus SNR curves should include the exact number of clusters and the power-allocation ratio employed, so that readers can reproduce the operating point.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which have helped us identify opportunities to improve the clarity, reproducibility, and theoretical grounding of the manuscript. We address each major comment below and have revised the paper accordingly to incorporate the requested details and analyses.
read point-by-point responses
-
Referee: [§3] §3 (Proposed Framework, clustering subsection): The manuscript states that tokens are grouped by semantic similarity and that suffix errors map to another token inside the same cluster, yet supplies no description of the embedding model, similarity metric, clustering algorithm (k-means, hierarchical, etc.), or chosen number of clusters. Without these parameters or a table of measured intra-cluster versus inter-cluster similarity scores, the limited-distortion premise cannot be evaluated and the simulation gains remain unverifiable.
Authors: We agree that these details are necessary for reproducibility and to substantiate the limited-distortion assumption. In the revised manuscript we expand Section 3 with a complete description of the clustering procedure, including the embedding model (a pre-trained transformer), the similarity metric (cosine similarity on embeddings), the algorithm (k-means), and the number of clusters. We also add a table reporting measured average intra-cluster versus inter-cluster similarities to validate that suffix errors remain semantically limited. revision: yes
-
Referee: [§4] §4 (Simulation Results): The abstract reports performance on COCO at γ = 3 dB, but does not state whether the number of clusters or the prefix/suffix power-allocation ratio were selected or tuned on the same COCO split used for final evaluation. If they were, the 0.073 gain may be optimistically biased; an ablation or cross-validation statement is required to support the claim.
Authors: The concern about potential optimistic bias is valid. The number of clusters and power-allocation ratio were selected on a held-out validation subset distinct from the test split used for the reported results. In the revision we explicitly document this data separation and add an ablation study in Section 4 that varies both parameters while reporting semantic similarity on the held-out test data. revision: yes
-
Referee: [§3.2] §3.2 (Bit Mapping and Power Allocation): The hierarchical prefix-suffix construction and unequal power allocation are presented as the source of robustness, but no analytical expression or bound is given that relates prefix error probability to the resulting semantic similarity under the chosen metric. The simulation results therefore stand alone without a supporting derivation that would allow extrapolation beyond the reported SNR points.
Authors: We acknowledge the value of an analytical relation. Section 3.2 currently offers a qualitative argument that correct prefix reception confines errors to the same cluster. Deriving a tight closed-form bound is non-trivial because the semantic similarity metric is embedding-based and non-linear. In the revised manuscript we add a probabilistic analysis that relates prefix error probability to expected semantic similarity under a uniform-within-cluster assumption, together with a discussion of its limitations; the simulations remain the primary empirical support. revision: partial
Circularity Check
No significant circularity; performance claims are simulation-driven and self-contained.
full rationale
The paper proposes H-TokCom by describing a clustering step on tokens, hierarchical prefix/suffix bit mapping, and unequal power allocation. The central quantitative claim (semantic similarity rising from 0.206 to 0.279 at 3 dB) is obtained directly from end-to-end Monte-Carlo simulation on the COCO dataset under the stated channel model. No equation in the abstract or described framework reduces a fitted parameter to a prediction, no self-citation supplies a load-bearing uniqueness theorem, and the clustering procedure is presented as an input design choice whose effect is then measured rather than assumed by definition. Because the reported gains are empirical outcomes rather than algebraic identities or self-referential fits, the derivation chain does not collapse to its inputs.
Axiom & Free-Parameter Ledger
free parameters (2)
- number of clusters
- power allocation ratio between prefix and suffix
axioms (2)
- standard math AWGN or similar memoryless channel model
- domain assumption Semantic similarity metric remains meaningful when tokens differ only in suffix bits
Reference graph
Works this paper leans on
-
[1]
Improving language understanding by generative pre-training,
A. Radford, K. Narasimhan, T. Salimanset al., “Improving language understanding by generative pre-training,” OpenAI technical report, 2018
2018
-
[2]
ToDMA: Large model- driven Token-domain multiple access for semantic communications,
L. Qiao, M. B. Mashhadi, Z. Gao,et al., “ToDMA: Large model- driven Token-domain multiple access for semantic communications,” arXiv preprint arXiv:2505.10946, 2025
-
[3]
Token communications: A large model-driven framework for cross-modal context-aware semantic communications,
L. Qiao, M. B. Mashhadi, Z. Gaoet al., “Token communications: A large model-driven framework for cross-modal context-aware semantic communications,”IEEE Wireless Commun., vol. 32, no. 5, pp. 80–88, 2025
2025
-
[4]
Text-guided Token communication for wireless image transmission,
B. Liu, L. Qiao, Y . Wanget al., “Text-guided Token communication for wireless image transmission,” inProc. IEEE/CIC ICCC, 2025, pp. 1–6
2025
-
[5]
Language-oriented communication with semantic coding and knowledge distillation for text-to-image genera- tion,
H. Nam, J. Park, J. Choiet al., “Language-oriented communication with semantic coding and knowledge distillation for text-to-image genera- tion,” inProc. IEEE ICASSP, 2024, pp. 13 506–13 510
2024
-
[6]
Token-domain multiple access: Exploiting semantic orthogonality for collision mitigation,
L. Qiao, M. B. Mashhadi, Z. Gaoet al., “Token-domain multiple access: Exploiting semantic orthogonality for collision mitigation,”arXiv preprint arXiv:2502.06118, 2025
-
[7]
Short wins long: Short codes with language model semantic correction outperform long codes,
J. Hao, C. Yue, H. Changet al., “Short wins long: Short codes with language model semantic correction outperform long codes,”arXiv preprint arXiv:2505.08536, 2025
-
[8]
Semantic packet aggregation for Token communication via genetic beam search,
S. Lee, J. Park, J. Choiet al., “Semantic packet aggregation for Token communication via genetic beam search,” inProc. IEEE SPAWC, 2025, pp. 1–5
2025
-
[9]
Context-aware iterative token detection and masked transmission for wireless token communication,
J. Shin, J. Park, J. Parket al., “Context-aware iterative token detection and masked transmission for wireless token communication,” inProc. AAAI Workshop, 2026, to appear
2026
-
[10]
Robustifying token communica- tion systems through conformal risk control,
C. Wang, Z. Chen, T. Q. S. Queket al., “Robustifying token communica- tion systems through conformal risk control,” inProc. AAAI Workshop, 2026, to appear
2026
-
[11]
Deep learning enabled semantic communication systems,
H. Xie, Z. Qin, G. Y . Liet al., “Deep learning enabled semantic communication systems,”IEEE Trans. Signal Process., vol. 69, pp. 2663–2675, 2021
2021
-
[12]
D. Cer, Y . Yang, S.-y. Konget al., “Universal Sentence Encoder,”arXiv preprint arXiv:1803.11175, 2018
work page Pith review arXiv 2018
-
[13]
Sentence-bert: Sentence embeddings using siamese bert-networks,
N. Reimers and I. Gurevych, “Sentence-bert: Sentence embeddings using siamese bert-networks,” inProc. EMNLP-IJCNLP, 2019, pp. 3982– 3992
2019
-
[14]
A tutorial on spectral clustering,
U. V on Luxburg, “A tutorial on spectral clustering,”Stat. Comput., vol. 17, no. 4, pp. 395–416, 2007
2007
-
[15]
Analysis of the clustering properties of the hilbert space-filling curve,
B. Moon, H. V . Jagadish, C. Faloutsoset al., “Analysis of the clustering properties of the hilbert space-filling curve,”IEEE Trans. Knowl. Data Eng., vol. 13, no. 1, pp. 124–141, 2001
2001
-
[16]
Microsoft COCO: Common Objects in Context
T.-Y . Lin, M. Maire, S. Belongieet al., “Microsoft COCO: Common objects in context,”arXiv preprint arXiv:1405.0312, 2015
work page internal anchor Pith review arXiv 2015
-
[17]
Natural language un- derstanding with the Quora question pairs dataset,
L. Sharma, L. Graesser, N. Nangiaet al., “Natural language un- derstanding with the Quora question pairs dataset,”arXiv preprint arXiv:1907.01041, 2019
-
[18]
From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions,
P. Young, A. Lai, M. Hodosh, and J. Hockenmaier, “From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions,”Trans. Assoc. Comput. Linguist., vol. 2, pp. 67–78, 2014
2014
-
[19]
sentence-transformers/all-minilm-l6-v2,
Sentence-Transformers, “sentence-transformers/all-minilm-l6-v2,” Hug- ging Face model card, 2025
2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.