WiFo-MiSAC: A Wireless Foundation Model for Multimodal Sensing and Communication Integration via Synesthesia of Machines (SoM)
Pith reviewed 2026-05-10 04:01 UTC · model grok-4.3
The pith
WiFo-MiSAC unifies wireless sensing and communication by tokenizing signals into a shared space for self-supervised learning with disentangled experts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central discovery is that a task-agnostic foundation model can integrate multimodal sensing and communication through signal tokenization into a unified space, followed by pre-training that combines masked reconstruction with contrastive alignment. The SS-DMoE architecture decouples shared and specific representations to allow beneficial interaction without cross-modal interference. This yields state-of-the-art performance on downstream tasks including beam prediction and channel estimation, robust few-shot adaptation, and seamless incorporation of new modalities.
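The combined pre-training objective described above (masked reconstruction plus contrastive alignment) can be sketched in a few lines. The paper's exact loss forms, weights, and temperatures are not given in this review, so the MAE-style masked MSE, the symmetric InfoNCE form, and the `lambda_c` and `tau` values below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_reconstruction_loss(tokens, recon, mask):
    """MSE over masked token positions only (MAE-style objective; assumed form)."""
    diff = (tokens - recon) ** 2
    return diff[mask].mean()

def contrastive_alignment_loss(z_a, z_b, tau=0.07):
    """Symmetric InfoNCE between paired modality embeddings (CLIP-style; assumed).
    Row i of z_a and z_b are embeddings of the same scene in two modalities."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / tau                              # (B, B) similarities
    ls_ab = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    ls_ba = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    return -0.5 * (np.diag(ls_ab).mean() + np.diag(ls_ba).mean())

# Toy batch: 4 scenes, 16 tokens of dim 8 per scene, 8-dim scene embeddings.
tokens = rng.normal(size=(4, 16, 8))
recon  = tokens + 0.1 * rng.normal(size=tokens.shape)       # imperfect reconstruction
mask   = rng.random(size=(4, 16)) < 0.75                    # ~75% of tokens masked
z_rf, z_sense = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))

lambda_c = 0.5                                              # assumed loss weight
total = masked_reconstruction_loss(tokens, recon, mask) \
        + lambda_c * contrastive_alignment_loss(z_rf, z_sense)
print(f"total pre-training loss: {total:.4f}")
```

The masked term forces each modality's tokens to be individually predictable; the contrastive term pulls paired cross-modal embeddings together, which is where the alignment across sensing and communication would come from under this reading.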
What carries the argument
The shared-specific disentangled mixture-of-experts (SS-DMoE) architecture, which decouples modality-shared and modality-specific representations from tokenized heterogeneous signals.
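The review gives no architectural detail beyond the shared/specific split, so the following is a minimal sketch under assumptions: every token passes through one always-active shared expert, a router selects a single modality-specific expert per token, and the two paths are summed. The class name, top-1 routing, and ReLU experts are all hypothetical choices, not the paper's design.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

class SSDMoELayer:
    """Illustrative shared-specific disentangled MoE layer (assumed structure).

    A shared expert captures modality-common structure for every token,
    while a learned router picks one modality-specific expert per token.
    """
    def __init__(self, dim, n_specific, seed=0):
        r = np.random.default_rng(seed)
        self.w_shared = r.normal(scale=0.1, size=(dim, dim))
        self.w_spec = [r.normal(scale=0.1, size=(dim, dim)) for _ in range(n_specific)]
        self.w_router = r.normal(scale=0.1, size=(dim, n_specific))

    def forward(self, x):
        shared = relu(x @ self.w_shared)          # modality-shared path, all tokens
        logits = x @ self.w_router                # per-token routing scores
        choice = logits.argmax(axis=-1)           # top-1 specific expert per token
        specific = np.stack([relu(x[i] @ self.w_spec[c])
                             for i, c in enumerate(choice)])
        return shared + specific, choice

layer = SSDMoELayer(dim=8, n_specific=3)
tokens = rng.normal(size=(6, 8))                  # 6 tokens, dim 8
out, routing = layer.forward(tokens)
print(out.shape, routing)
```

The disentanglement claim then amounts to saying the shared path absorbs cross-modal structure while the routed experts stay modality-pure; the referee's request for an interference metric targets exactly this separation.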
If this is right
- State-of-the-art performance is achieved on beam prediction and channel estimation tasks.
- The model adapts effectively to new tasks using only a small number of examples.
- New modalities integrate seamlessly into the existing model.
- The approach provides a scalable backbone for integrated sensing and communication systems.
Where Pith is reading between the lines
- Joint processing of sensing and communication may allow wireless systems to operate more efficiently by sharing learned features.
- The tokenization method could be tested in other multimodal environments, such as combining radar with visual data.
- Few-shot capabilities suggest potential use in rapidly changing wireless conditions like mobile networks.
Load-bearing premise
That converting different wireless signals into tokens in a single space and applying the SS-DMoE architecture will separate shared and specific features without causing interference between modalities.
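The premise of a single token space can be made concrete with a patch-and-project sketch, in the style of ViT-like tokenizers: each modality is cut into non-overlapping patches and linearly projected into one shared dimension. The patch sizes, feature layouts, and projection matrices below are illustrative assumptions; the paper's actual tokenizer is not described in this review.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 16  # assumed shared token dimension

def tokenize(signal, patch, proj):
    """Flatten non-overlapping patches of a modality-specific array and
    project each into the shared D-dimensional token space."""
    n = signal.shape[0] // patch
    patches = signal[:n * patch].reshape(n, -1)       # (n_tokens, patch * feat)
    return patches @ proj                             # (n_tokens, D)

# Two heterogeneous inputs: a complex CSI matrix (stacked real/imag parts)
# and a 1-D radar range profile; shapes and patch sizes are illustrative.
csi = rng.normal(size=(32, 4)) + 1j * rng.normal(size=(32, 4))
csi_feat = np.concatenate([csi.real, csi.imag], axis=1)   # (32, 8)
radar = rng.normal(size=(64, 2))

proj_csi = rng.normal(scale=0.1, size=(4 * 8, D))     # patch of 4 rows x 8 feats
proj_radar = rng.normal(scale=0.1, size=(8 * 2, D))   # patch of 8 rows x 2 feats

csi_tokens = tokenize(csi_feat, patch=4, proj=proj_csi)
radar_tokens = tokenize(radar, patch=8, proj=proj_radar)

# Both modalities now live in the same token space and can be concatenated
# into one sequence for a shared backbone.
unified = np.concatenate([csi_tokens, radar_tokens], axis=0)
print(unified.shape)  # (16, 16): 8 CSI tokens + 8 radar tokens
```

The load-bearing question is whether such a projection preserves enough modality-specific information, which is exactly what the axiom ledger below flags as a domain assumption.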
What would settle it
If the model shows higher error rates than specialized models on beam prediction when modalities are combined in ways not seen during pre-training, this would challenge the claim of improved generalization without cross-modal interference.
Original abstract
Current learning-based wireless methods struggle with generalization due to the fragmented processing of communication and sensing data. WiFo-MiSAC addresses this as a task-agnostic foundation model that tokenizes heterogeneous signals into a unified space for self-supervised pre-training. A shared-specific disentangled mixture-of-experts (SS-DMoE) architecture is employed to decouple modality-shared and modality-specific representations, facilitating interaction without cross-modal interference. By combining masked reconstruction with contrastive alignment, the model achieves state-of-the-art performance across downstream tasks, including beam prediction and channel estimation. Experimental results demonstrate robust few-shot adaptation and seamless integration of new modalities, positioning WiFo-MiSAC as a scalable backbone for future integrated sensing and communication systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes WiFo-MiSAC, a task-agnostic foundation model for multimodal sensing and communication integration via Synesthesia of Machines. It tokenizes heterogeneous wireless signals (RF and sensing) into a unified space for self-supervised pre-training, employs a shared-specific disentangled mixture-of-experts (SS-DMoE) architecture to separate modality-shared and modality-specific representations, and combines masked reconstruction with contrastive alignment. The model claims state-of-the-art performance on downstream tasks including beam prediction and channel estimation, along with robust few-shot adaptation and seamless integration of new modalities.
Significance. If the experimental claims are substantiated, this work could advance integrated sensing and communication systems by offering a scalable, unified backbone that addresses generalization limitations of fragmented task-specific models. The self-supervised pre-training strategy and emphasis on disentangled multimodal representations represent a potentially valuable direction for handling heterogeneous wireless data with reduced reliance on labeled datasets.
Major comments (2)
- [Abstract] The abstract asserts state-of-the-art results on beam prediction and channel estimation but supplies no experimental details, baselines, error bars, or data-exclusion rules. Without these elements the numerical support for the central performance claims cannot be evaluated.
- [SS-DMoE Architecture] The claim that the SS-DMoE successfully decouples modality-shared and modality-specific representations without cross-modal interference is load-bearing for the generalization and few-shot adaptation results. Given that wireless modalities share physical-layer statistics such as multipath and Doppler, the manuscript requires explicit ablations or an interference metric to demonstrate that the mixture-of-experts routing and contrastive alignment prevent leakage and deliver gains over fragmented baselines.
Minor comments (1)
- The abstract would benefit from a concise statement of the number of modalities and dataset sizes used in the pre-training and downstream evaluations to contextualize the reported robustness.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment in detail below and indicate the revisions made to strengthen the work.
Point-by-point responses
Referee: [Abstract] The abstract asserts state-of-the-art results on beam prediction and channel estimation but supplies no experimental details, baselines, error bars, or data-exclusion rules. Without these elements the numerical support for the central performance claims cannot be evaluated.
Authors: We agree that the abstract is currently high-level and does not provide the requested experimental specifics, which limits immediate evaluation of the claims. In the revised manuscript we expand the abstract to include the primary baselines (task-specific models and prior multimodal approaches), the reported performance gains with associated error bars from the main experiments, and a brief reference to the data processing protocol detailed in Section 4.1. The full experimental setup, including any exclusion criteria, remains in the body of the paper. revision: yes
Referee: [SS-DMoE Architecture] The claim that the SS-DMoE successfully decouples modality-shared and modality-specific representations without cross-modal interference is load-bearing for the generalization and few-shot adaptation results. Given that wireless modalities share physical-layer statistics such as multipath and Doppler, the manuscript requires explicit ablations or an interference metric to demonstrate that the mixture-of-experts routing and contrastive alignment prevent leakage and deliver gains over fragmented baselines.
Authors: The referee correctly notes that the decoupling property is central to our generalization and few-shot results and that shared physical-layer effects could induce leakage. While the manuscript already reports ablation studies comparing SS-DMoE against non-disentangled MoE variants and shows corresponding gains in downstream tasks, we acknowledge that a direct interference metric is not yet present. In the revision we add (i) an explicit ablation isolating the effect of the shared-specific routing and (ii) a quantitative interference metric (normalized mutual information between shared and modality-specific expert activations) together with routing visualizations. These additions directly quantify leakage under shared statistics such as multipath and Doppler and confirm the contribution of the contrastive alignment term. revision: yes
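The rebuttal names normalized mutual information between shared and modality-specific expert activations as the proposed interference metric. One standard way to estimate it is by histogram discretization; the bin count, the normalization by the mean of the marginal entropies, and the toy activation streams below are assumptions for illustration, not the authors' protocol.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats of a probability vector (zeros skipped)."""
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def normalized_mutual_information(a, b, bins=8):
    """Plug-in NMI between two 1-D activation streams via a joint histogram.
    Normalizing by the mean of the marginal entropies is one common choice."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    mi = entropy(px) + entropy(py) - entropy(pxy.ravel())
    denom = 0.5 * (entropy(px) + entropy(py))
    return mi / denom if denom > 0 else 0.0

rng = np.random.default_rng(3)
shared_act = rng.normal(size=2000)

# Disentangled case: specific activations independent of shared -> NMI near 0.
independent = rng.normal(size=2000)
# Leakage case: specific activations nearly copy the shared signal -> NMI near 1.
leaky = shared_act + 0.05 * rng.normal(size=2000)

print(f"NMI (independent): {normalized_mutual_information(shared_act, independent):.3f}")
print(f"NMI (leaky):       {normalized_mutual_information(shared_act, leaky):.3f}")
```

Under this metric, successful disentanglement would show up as NMI near zero between shared and specific activations across modalities, while leakage of shared physical-layer statistics would push it toward one.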
Circularity Check
No circularity: empirical self-supervised model with held-out evaluation
Full rationale
The paper presents a foundation model trained via masked reconstruction and contrastive alignment on tokenized multimodal wireless signals, then evaluated on downstream tasks such as beam prediction and channel estimation. No equations, first-principles derivations, or predictions appear that reduce by construction to fitted inputs or self-citations. Performance claims rest on experimental results with few-shot adaptation and new-modality integration, which are falsifiable on held-out data. The SS-DMoE architecture and SoM framing are design choices justified by training objectives rather than tautological definitions or load-bearing self-citations that collapse the central result.
Axiom & Free-Parameter Ledger
Free parameters (1)
- Neural network weights and hyperparameters
Axioms (1)
- Domain assumption: heterogeneous wireless signals from different modalities can be tokenized into a shared latent space without prohibitive information loss.
Invented entities (2)
- SS-DMoE architecture (no independent evidence)
- WiFo-MiSAC model (no independent evidence)
Reference graph
Works this paper leans on
- [1] X. Cheng, D. Duan, S. Gao, and L. Yang, "Integrated Sensing and Communications (ISAC) for Vehicular Communication Networks (VCN)," IEEE Internet Things J., vol. 9, no. 23, pp. 23441–23451, Jul. 2022.
- [2] F. Liu et al., "Integrated Sensing and Communications: Toward Dual-Functional Wireless Networks for 6G and Beyond," IEEE J. Sel. Areas Commun., vol. 40, no. 6, pp. 1728–1767, Mar. 2022.
- [3] Y. Jiang et al., "Integrated Sensing and Communication for Low Altitude Economy: Opportunities and Challenges," IEEE Commun. Mag., vol. 64, no. 12, pp. 72–78, Apr. 2025.
- [4] X. Cheng et al., "Intelligent Multi-Modal Sensing-Communication Integration: Synesthesia of Machines," IEEE Commun. Surv. Tutorials, vol. 26, no. 1, pp. 258–301, Firstquarter 2024.
- [5] H. Zhang, S. Gao, X. Cheng, and L. Yang, "Integrated Sensing and Communications Toward Proactive Beamforming in mmWave V2I via Multi-Modal Feature Fusion (MMFF)," IEEE Trans. Wireless Commun., vol. 23, no. 11, pp. 15721–15735, Nov. 2024.
- [6] Y. Yang, F. Gao, C. Xing, J. An, and A. Alkhateeb, "Deep Multimodal Learning: Merging Sensory Data for Massive MIMO Channel Prediction," IEEE J. Sel. Areas Commun., vol. 39, no. 7, pp. 1885–1898, Jul. 2020.
- [7] R. Mundlamuri, R. Gangula, C. K. Thomas, F. Kaltenberger, and W. Saad, "Sensing Aided Channel Estimation in Wideband Millimeter-Wave MIMO Systems," in IEEE Int. Conf. Commun. Workshops (ICC Workshops), Rome, Italy, May 2023, pp. 1404–1409.
- [8] U. Demirhan and A. Alkhateeb, "Radar Aided Proactive Blockage Prediction in Real-World Millimeter Wave Systems," in IEEE Int. Conf. Commun. (ICC), Seoul, South Korea, May 2022, pp. 4547–4552.
- [9] A. Klautau, N. González-Prelcic, and R. W. Heath, "LIDAR Data for Deep Learning-Based mmWave Beam-Selection," IEEE Wireless Commun. Lett., vol. 8, no. 3, pp. 909–912, Feb. 2019.
- [10] M. Alrabeiah, A. Hredzak, and A. Alkhateeb, "Millimeter Wave Base Stations with Cameras: Vision-Aided Beam and Blockage Prediction," in IEEE Veh. Technol. Conf. (VTC2020-Spring), Antwerp, Belgium, May 2020, pp. 1–5.
- [11] G. Charan et al., "Towards Real-World 6G Drone Communication: Position and Camera Aided Beam Prediction," in IEEE Global Commun. Conf. (GLOBECOM), Rio de Janeiro, Brazil, Dec. 2022, pp. 2951–2956.
- [12] H. Wang et al., "SAM-Med3D: A Vision Foundation Model for General-Purpose Segmentation on Volumetric Medical Images," IEEE Trans. Neural Networks Learn. Syst., vol. 36, pp. 17599–17612, Oct. 2025.
- [13] Y. Zhang et al., "DMAE-EEG: A Pretraining Framework for EEG Spatiotemporal Representation Learning," IEEE Trans. Neural Networks Learn. Syst., vol. 36, pp. 17664–17678, Oct. 2025.
- [14] X. Cheng, B. Liu, X. Liu, E. Liu, and Z. Huang, "Foundation Model Empowered Synesthesia of Machines (SoM): AI-Native Intelligent Multi-Modal Sensing-Communication Integration," IEEE Trans. Network Sci. Eng., vol. 13, pp. 762–782, 2026.
- [15] X. Cheng, B. Liu, X. Liu, and X. Cai, "Large Wireless Foundation Models: Stronger over Bigger," arXiv preprint arXiv:2601.10963, 2026.
- [16] S. Alikhani, G. Charan, and A. Alkhateeb, "Large Wireless Model (LWM): A Foundation Model for Wireless Channels," arXiv preprint arXiv:2411.08872, 2024.
- [17] B. Liu, S. Gao, X. Liu, X. Cheng, and L. Yang, "WiFo: Wireless Foundation Model for Channel Prediction," Sci. China Inf. Sci., vol. 68, no. 8, p. 162302, May 2025.
- [18] X. Liu, S. Gao, B. Liu, X. Cheng, and L. Yang, "WiFo-CF: Wireless Foundation Model for CSI Feedback," arXiv preprint arXiv:2508.04068, 2025.
- [19] B. Liu, X. Liu, S. Gao, X. Cheng, and L. Yang, "Foundation Model for Intelligent Wireless Communications," arXiv preprint arXiv:2511.22222, 2025.
- [20] O. Mashaal and H. Abou-Zeid, "IQFM – A Wireless Foundation Model for I/Q Streams in AI-Native 6G," IEEE Open J. Commun. Soc., vol. 7, pp. 1426–1441, Feb. 2026.
- [21] H. Zhang, S. Gao, and X. Cheng, "WiFo-M2: Plug-and-Play Multi-Modal Sensing via Foundation Model to Empower Wireless Communications," arXiv preprint arXiv:2601.09179, 2026.
- [22] M. Farzanullah, H. Zhang, A. B. Sediq, A. Afana, and M. Erol-Kantarci, "Wireless Multimodal Foundation Model (WMFM): Integrating Vision and Communication Modalities for 6G ISAC Systems," arXiv preprint arXiv:2512.23897, 2025.
- [23] T. Yan et al., "FRCL-MNER: A Finer Grained Rank-Based Contrastive Learning Framework for Multimodal NER," IEEE Trans. Neural Networks Learn. Syst., vol. 36, pp. 10779–10793, Jun. 2025.
- [24] V. Yazdnian and Y. Ghasempour, "A Multi-Modal Foundational Model for Wireless Communication and Sensing," arXiv preprint arXiv:2602.04016, 2026.
- [25] U. Demirhan and A. Alkhateeb, "Radar Aided 6G Beam Prediction: Deep Learning Algorithms and Real-World Demonstration," in IEEE Wireless Commun. Netw. Conf. (WCNC), Austin, USA, Apr. 2022, pp. 2655–2660.
- [26] A. Dosovitskiy et al., "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale," in Int. Conf. Learn. Representations (ICLR), Vienna, Austria, May 2021, pp. 1–22.
- [27] B. Zhang and R. Sennrich, "Root Mean Square Layer Normalization," Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 32, Dec. 2019.
- [28] S. Bai et al., "Qwen3-VL Technical Report," arXiv preprint arXiv:2511.21631, 2025.
- [29] D. Lepikhin et al., "GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding," arXiv preprint arXiv:2006.16668, 2020.
- [30] W. Fedus, B. Zoph, and N. Shazeer, "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity," J. Mach. Learn. Res., vol. 23, no. 120, pp. 1–39, Jan. 2022.
- [31] X. Cheng et al., "M3SC: A Generic Dataset for Mixed Multi-Modal (MMM) Sensing and Communication Integration," China Commun., vol. 20, no. 11, pp. 13–29, Nov. 2023.
- [32] X. Cheng et al., "SynthSoM: A Synthetic Intelligent Multi-Modal Sensing-Communication Dataset for Synesthesia of Machines (SoM)," Sci. Data, vol. 12, no. 819, May 2025.
- [33] A. Alkhateeb et al., "DeepSense 6G: A Large-Scale Real-World Multi-Modal Sensing and Communication Dataset," IEEE Commun. Mag., vol. 61, no. 9, pp. 122–128, Sep. 2023.
- [34] B. Liu, X. Liu, S. Gao, X. Cheng, and L. Yang, "LLM4CP: Adapting Large Language Models for Channel Prediction," J. Commun. Inf. Networks, vol. 9, no. 2, pp. 113–125, Jun. 2024.
- [35] X. Liu, S. Gao, B. Liu, X. Cheng, and L. Yang, "LLM4WM: Adapting LLM for Wireless Multi-Tasking," IEEE Trans. Mach. Learn. Commun. Networking, vol. 3, pp. 835–847, Jul. 2025.
- [36] H. Jiang, M. Cui, D. W. K. Ng, and L. Dai, "Accurate Channel Prediction Based on Transformer: Making Mobility Negligible," IEEE J. Sel. Areas Commun., vol. 40, no. 9, pp. 2717–2732, Jul. 2022.
- [37] D. Luan and J. S. Thompson, "Channelformer: Attention Based Neural Solution for Wireless Channel Estimation and Effective Online Training," IEEE Trans. Wireless Commun., vol. 22, no. 10, pp. 6562–6577, Oct. 2023.
- [38] M. Alrabeiah and A. Alkhateeb, "Deep Learning for mmWave Beam and Blockage Prediction Using Sub-6 GHz Channels," IEEE Trans. Commun., vol. 68, no. 9, pp. 5504–5518, Sep. 2020.
- [39] A. Salihu, S. Schwarz, and M. Rupp, "Attention Aided CSI Wireless Localization," in IEEE Workshop Signal Process. Adv. Wireless Commun. (SPAWC), Oulu, Finland, Jul. 2022, pp. 1–5.
- [40] Y. Nam and J. Choi, "Multi-Modal Variable-Rate CSI Reconstruction for FDD Massive MIMO Systems," arXiv preprint arXiv:2501.11926, 2025.