CSI-JEPA: Towards Foundation Representations for Ubiquitous Sensing with Minimal Supervision
Pith reviewed 2026-05-15 04:50 UTC · model grok-4.3
The pith
CSI-JEPA learns reusable temporal-spectral representations from unlabeled Wi-Fi channel state information through masked prediction of latent features.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CSI-JEPA pretrains an encoder on unlabeled CSI by predicting latent features of masked channel-response amplitude windows from visible context, using time-subcarrier tokenization and channel variation-aware masking to respect CSI physical structure, then freezes the encoder as a backbone for multiple downstream sensing tasks with only lightweight adapters.
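The time-subcarrier tokenization in this claim can be illustrated with a minimal NumPy sketch. It assumes a real-valued amplitude window of shape (T, S), with time samples by subcarriers, cut into non-overlapping rectangular patches. The function name `tokenize_csi` and the patch sizes are hypothetical, because the paper's exact tokenizer is not described here.

```python
import numpy as np

def tokenize_csi(window: np.ndarray, patch_t: int, patch_s: int) -> np.ndarray:
    """Split a CSI amplitude window of shape (T, S) into non-overlapping
    (patch_t x patch_s) time-subcarrier patches and flatten each patch.

    Returns shape (num_tokens, patch_t * patch_s), ordered time-major:
    all subcarrier patches of the first time slice come first.
    Patch sizes are hypothetical hyperparameters, not values from the paper.
    """
    T, S = window.shape
    if T % patch_t or S % patch_s:
        raise ValueError("window dimensions must be divisible by patch sizes")
    n_t, n_s = T // patch_t, S // patch_s
    # (n_t, patch_t, n_s, patch_s) -> (n_t, n_s, patch_t, patch_s)
    patches = window.reshape(n_t, patch_t, n_s, patch_s).transpose(0, 2, 1, 3)
    return patches.reshape(n_t * n_s, patch_t * patch_s)
```

An 8 × 6 window with 4 × 3 patches yields 4 tokens of 12 values each. The time-major ordering places patches that share a time slice next to each other in the token sequence.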
What carries the argument
Masked predictive coding on tokenized CSI amplitude windows, where the model predicts latent features of variation-rich masked regions from surrounding visible context.
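The masked-prediction objective can be sketched in the general JEPA style. A context encoder sees only visible tokens, an exponential-moving-average (EMA) target encoder sees all tokens, and a predictor regresses the target encoder's latents at the masked positions. This is a forward-pass sketch under simplifying assumptions: one-layer tanh encoders, a mean-pooled context in the predictor, a fixed momentum of 0.996, and hypothetical function names. A real implementation would use Transformer encoders and backpropagate only through the context encoder and predictor.

```python
import numpy as np

def encode(tokens: np.ndarray, W: np.ndarray) -> np.ndarray:
    # Stand-in encoder (assumption): one linear layer followed by tanh.
    return np.tanh(tokens @ W)

def jepa_loss(tokens, mask, W_ctx, W_tgt, W_pred, pos):
    """Latent-space prediction loss for one sample.

    tokens: (N, D) token vectors; mask: (N,) bool, True marks a target token.
    The context encoder sees only visible tokens. The target encoder sees all
    tokens, and its outputs at target positions are the regression targets
    (no gradient would flow into them in a real implementation).
    pos: (N, H) positional embeddings telling the predictor which target to predict.
    """
    ctx = encode(tokens[~mask], W_ctx)        # (N_visible, H)
    targets = encode(tokens, W_tgt)[mask]     # (N_target, H)
    pooled = ctx.mean(axis=0)                 # crude context summary, (H,)
    pred = (pooled + pos[mask]) @ W_pred      # (N_target, H)
    return float(np.mean((pred - targets) ** 2))

def ema_update(W_tgt, W_ctx, momentum=0.996):
    # Target encoder tracks the context encoder by exponential moving average.
    return momentum * W_tgt + (1.0 - momentum) * W_ctx
```

Computing the loss in latent space, not on raw amplitudes, is what separates this objective from masked-autoencoder reconstruction.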
If this is right
- On seven diverse Wi-Fi sensing tasks, CSI-JEPA reaches higher mean accuracy than supervised baselines (up to 10.64 percentage points over a supervised Transformer) while using far less labeled data.
- The same pretrained encoder supports multiple objectives by adding separate lightweight adapters rather than retraining entire models.
- Label budgets for new deployments can be reduced by up to 98 percent while maintaining competitive performance.
- Temporal-spectral structure in CSI is explicitly respected during pretraining through dimension-specific tokenization and variation-aware masking.
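One way to read the 98 percent figure is as a worked ratio. This reading assumes that label savings equal one minus the ratio of the labels CSI-JEPA needs to match a supervised baseline to the labels that baseline uses; the paper's exact matched-budget formula is not reproduced here. Under that assumption, a 98 percent saving means matching the baseline with 2 percent of its labels, for example 20 labeled samples instead of 1,000. The function name is hypothetical.

```python
def label_savings(labels_used: int, labels_baseline: int) -> float:
    """Fraction of labels saved when a method matches the baseline's accuracy
    using `labels_used` labels, versus `labels_baseline` for the baseline.
    This definition of matched-budget savings is an assumption."""
    if labels_baseline <= 0 or labels_used < 0:
        raise ValueError("baseline count must be positive and used count non-negative")
    return 1.0 - labels_used / labels_baseline
```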
Where Pith is reading between the lines
- Similar masked-prediction pretraining could extend to other radio-frequency sensing modalities that share time-frequency structure.
- Foundation-style encoders for sensing would allow rapid adaptation across users, devices, and rooms without repeated large-scale labeling campaigns.
- Deployment cost models for Wi-Fi sensing systems would shift from data-collection expense to compute for one-time pretraining.
- The approach implies that explicit modeling of channel variation during masking is more effective than uniform random masking for radio signals.
Load-bearing premise
Representations learned from masked prediction on unlabeled CSI transfer effectively to new tasks and settings using only lightweight adapters without major domain shift.
What would settle it
Measure whether accuracy on a new device or environment falls below supervised baselines when the pretrained encoder is frozen and only adapters are trained on limited labels from that setting.
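The frozen-backbone protocol in this test can be sketched as a linear probe. The backbone weights stay fixed, and only a small softmax-regression adapter is fit on the few labels from the new setting. The tanh random-projection backbone stand-in, the linear adapter, the hyperparameters, and the function names are all assumptions for illustration; the paper's adapters may be different.

```python
import numpy as np

def frozen_features(x: np.ndarray, W_frozen: np.ndarray) -> np.ndarray:
    # Frozen backbone stand-in (assumption): fixed weights, never updated here.
    return np.tanh(x @ W_frozen)

def train_linear_adapter(feats, labels, n_classes, lr=0.1, epochs=300):
    """Fit a softmax-regression adapter on frozen features with full-batch
    gradient descent on cross-entropy. Only W and b are trained."""
    n, h = feats.shape
    W = np.zeros((h, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                   # d(mean loss)/d(logits)
        W -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def adapter_accuracy(feats, labels, W, b) -> float:
    return float(np.mean(np.argmax(feats @ W + b, axis=1) == labels))
```

To run the test described above, compare `adapter_accuracy` on held-out data from the new device or room against a supervised model trained on the same limited labels.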
Original abstract
Channel state information (CSI) provides a widely available sensing modality for human and environment perception, but existing CSI sensing models usually rely on task-specific supervised training and require substantial labeled data for each task, device, user, or environment. This limits their scalability in practical deployments where unlabeled CSI is abundant but labeled data is costly to collect. In this paper, we present CSI-JEPA, a self-supervised predictive representation learning framework for label-efficient, multi-task Wi-Fi sensing. CSI-JEPA learns reusable temporal-spectral representations from unlabeled CSI samples by predicting latent features of masked channel regions from visible context. To better match the physical structure of CSI, CSI-JEPA tokenizes channel-response amplitude windows along the time and subcarrier dimensions. It then introduces a channel variation-aware masking strategy that samples predictive targets from regions with stronger local temporal and subcarrier-domain variations. After pretraining, the encoder is frozen and used as a backbone, with lightweight task-specific adapters added for downstream sensing tasks. We evaluate CSI-JEPA on seven real-world Wi-Fi sensing tasks spanning diverse objectives and deployment settings. The results show that CSI-JEPA improves downstream sensing performance over competitive baselines, achieving up to 10.64 percentage points mean accuracy gain over state-of-the-art supervised Transformer and matched-budget label savings of up to 98.0%.
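Channel variation-aware masking can be sketched under two assumptions. First, a patch's local variation score is taken as the sum of absolute first differences along time and along subcarriers, averaged over the patch. Second, target patches are drawn without replacement with probability proportional to that score. The score definition, the `eps` floor, and the function names are assumptions, because the abstract does not give CSI-JEPA's exact score, threshold, or sampling distribution. Patches are indexed time-major.

```python
import numpy as np

def variation_scores(window: np.ndarray, patch_t: int, patch_s: int) -> np.ndarray:
    """Per-patch local variation of a (T, S) amplitude window: mean absolute
    first difference along time plus along subcarriers, averaged within each
    (patch_t x patch_s) patch. Returns a flat, time-major array of scores."""
    T, S = window.shape
    d_t = np.abs(np.diff(window, axis=0, append=window[-1:, :]))  # (T, S)
    d_s = np.abs(np.diff(window, axis=1, append=window[:, -1:]))  # (T, S)
    v = d_t + d_s
    n_t, n_s = T // patch_t, S // patch_s
    v = v[: n_t * patch_t, : n_s * patch_s]          # drop any ragged edge
    return v.reshape(n_t, patch_t, n_s, patch_s).mean(axis=(1, 3)).ravel()

def sample_variation_aware_mask(scores, mask_ratio, rng, eps=1e-6):
    """Choose round(mask_ratio * N) target patches without replacement, with
    probability proportional to variation score. eps keeps flat regions
    selectable. Returns a boolean mask where True marks a target patch."""
    n = scores.size
    k = int(round(mask_ratio * n))
    p = scores + eps
    p = p / p.sum()
    idx = rng.choice(n, size=k, replace=False, p=p)
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    return mask
```

Sampling in proportion to the score, instead of thresholding hard, still lets low-variation patches become targets occasionally. That is one possible design; a fixed threshold would be the alternative.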
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. CSI-JEPA introduces a self-supervised JEPA-style framework that pretrains a Transformer encoder on unlabeled CSI amplitude windows by predicting latent features of masked regions, using a channel variation-aware masking strategy on time-subcarrier tokenized inputs. After pretraining, the encoder is frozen and paired with lightweight task-specific adapters for transfer to seven real-world Wi-Fi sensing tasks, where it reports accuracy gains of up to 10.64 percentage points and label savings of up to 98% relative to supervised Transformer baselines.
Significance. If the empirical results are robust, the work provides a practical path toward foundation representations for ubiquitous CSI sensing, substantially lowering the labeled-data barrier for multi-task, multi-environment deployment. The combination of physically motivated masking and frozen-encoder transfer is a clear strength that aligns with successful self-supervised paradigms in other modalities.
major comments (3)
- [§4] §4 (Experiments) and Table 2: the reported maximum gains (10.64 pp accuracy, 98% label savings) are presented as headline results without per-task breakdowns, error bars, or statistical significance tests; this information is load-bearing for the central claim that CSI-JEPA outperforms matched-budget supervised Transformers across diverse tasks.
- [§3.2] §3.2 (Channel variation-aware masking): the precise definition of the variation threshold, the sampling distribution over high-variation regions, and the exact masking ratio schedule are described only qualitatively; without these details the strategy cannot be reproduced or ablated, undermining claims that the masking is tailored to CSI physical structure.
- [§4.3] §4.3 (Ablation studies): the paper does not report an ablation isolating the contribution of the variation-aware masking versus standard random masking, which is necessary to establish that the proposed strategy is responsible for the observed transfer gains rather than the JEPA objective alone.
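The requested ablation amounts to swapping the target sampler while keeping everything else fixed. The sketch below covers two pieces the comments flag as underspecified: a uniform random-masking baseline and a linear masking-ratio schedule, which is the schedule form the authors' response names. The start and end ratios and the function names are hypothetical placeholders, not the paper's values.

```python
import numpy as np

def linear_mask_ratio(step: int, total_steps: int, start=0.4, end=0.75) -> float:
    """Linearly interpolate the masking ratio from `start` to `end` over
    training. Steps beyond the schedule are clamped to `end`."""
    frac = min(max(step / max(total_steps - 1, 1), 0.0), 1.0)
    return start + (end - start) * frac

def random_mask(n_tokens: int, mask_ratio: float, rng) -> np.ndarray:
    """Uniform random-masking baseline: every token is equally likely to be a target."""
    k = int(round(mask_ratio * n_tokens))
    mask = np.zeros(n_tokens, dtype=bool)
    mask[rng.choice(n_tokens, size=k, replace=False)] = True
    return mask
```

Training the same JEPA model twice, once with this baseline and once with the variation-aware sampler, at the same schedule and seed, isolates the masking strategy's contribution.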
minor comments (2)
- [§3.1] Notation for the tokenized CSI windows (time-subcarrier patches) is introduced without an explicit equation; adding a compact definition (e.g., Eq. (3)) would improve clarity.
- [Figure 2] Figure 2 (architecture diagram) would benefit from explicit arrows indicating the flow of visible vs. masked tokens through the predictor.
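The following is one compact form such a definition could take. It assumes a real amplitude window \(\mathbf{X}\) with T time samples and S subcarriers, non-overlapping \(p_t \times p_s\) patches, a linear patch embedding \(\mathbf{E}\), and learned positional embeddings. All of this notation is hypothetical and is not the paper's Eq. (3).

```latex
\mathbf{X} \in \mathbb{R}^{T \times S}, \qquad
N = \frac{T}{p_t}\cdot\frac{S}{p_s}, \qquad
i = 1,\dots,\tfrac{T}{p_t}, \quad j = 1,\dots,\tfrac{S}{p_s},

\mathbf{x}_{i,j} = \operatorname{vec}\!\Big(\mathbf{X}\big[(i-1)p_t+1 : i\,p_t,\;(j-1)p_s+1 : j\,p_s\big]\Big) \in \mathbb{R}^{p_t p_s},

\mathbf{z}_{i,j} = \mathbf{E}\,\mathbf{x}_{i,j} + \mathbf{e}^{\mathrm{pos}}_{i,j},
\qquad \mathbf{E} \in \mathbb{R}^{d \times p_t p_s}, \quad \mathbf{e}^{\mathrm{pos}}_{i,j} \in \mathbb{R}^{d}.
```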
Simulated Author's Rebuttal
We thank the referee for the positive evaluation and constructive comments. We address each major point below and will revise the manuscript to strengthen reproducibility and empirical rigor.
- Referee: [§4] §4 (Experiments) and Table 2: the reported maximum gains (10.64 pp accuracy, 98% label savings) are presented as headline results without per-task breakdowns, error bars, or statistical significance tests; this information is load-bearing for the central claim that CSI-JEPA outperforms matched-budget supervised Transformers across diverse tasks.
Authors: We agree that per-task breakdowns with error bars and significance testing are necessary to support the central claims. In the revised manuscript we will expand Table 2 (and add a supplementary table) to report mean accuracy and standard deviation over five random seeds for each of the seven tasks, together with p-values from paired t-tests against the matched-budget supervised Transformer baselines. This will make the robustness of the reported gains transparent. Revision: yes.
- Referee: [§3.2] §3.2 (Channel variation-aware masking): the precise definition of the variation threshold, the sampling distribution over high-variation regions, and the exact masking ratio schedule are described only qualitatively; without these details the strategy cannot be reproduced or ablated, undermining claims that the masking is tailored to CSI physical structure.
Authors: We acknowledge that §3.2 currently provides only a qualitative description. In the revision we will add the exact implementation details used in our experiments: the mathematical definition of the local variation score, the precise threshold for selecting high-variation regions, the normalized sampling distribution, and the linear masking-ratio schedule. Pseudocode will also be included to ensure full reproducibility. Revision: yes.
- Referee: [§4.3] §4.3 (Ablation studies): the paper does not report an ablation isolating the contribution of the variation-aware masking versus standard random masking, which is necessary to establish that the proposed strategy is responsible for the observed transfer gains rather than the JEPA objective alone.
Authors: We agree that isolating the masking strategy is required. We will add a new ablation subsection in §4.3 that trains an otherwise identical JEPA model with standard random masking and reports downstream accuracy on the same seven tasks. This will quantify the incremental benefit of the channel variation-aware masking. Revision: yes.
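The per-task significance test promised in the first response can be sketched with the Python standard library: a paired t statistic over accuracies from matched seeds. The function name is hypothetical. The existing SciPy function `scipy.stats.ttest_rel` computes the same statistic together with its p-value.

```python
import math
import statistics

def paired_t(method_acc, baseline_acc):
    """Paired t statistic over matched runs (e.g., the same random seeds).

    Returns (mean_difference, sample_std_of_differences, t_statistic, dof).
    The two-sided p-value follows from a Student t distribution with
    dof = n - 1 degrees of freedom.
    """
    if len(method_acc) != len(baseline_acc) or len(method_acc) < 2:
        raise ValueError("need at least two paired runs of equal length")
    diffs = [m - b for m, b in zip(method_acc, baseline_acc)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)          # sample std (n - 1 denominator)
    if sd_d == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean_d / (sd_d / math.sqrt(n))
    return mean_d, sd_d, t, n - 1
```

With five seeds per task, as the response proposes, each test has 4 degrees of freedom. Pairing by seed removes run-to-run variation shared by both models.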
Circularity Check
No significant circularity; empirical claims rest on held-out evaluations
full rationale
The paper describes a JEPA-style masked-prediction pretraining pipeline on tokenized CSI amplitude windows, followed by frozen-encoder transfer with lightweight adapters. All reported gains (accuracy improvements and label savings) are presented as direct outcomes of empirical evaluation across seven real-world tasks. No equations, derivations, or parameter-fitting steps are shown that reduce the target quantities to the inputs by construction. The masking strategy is motivated by CSI physical structure rather than tautological redefinition, and the transfer protocol follows standard foundation-model practice without self-referential uniqueness theorems or ansatz smuggling. Any self-citations are peripheral and do not carry the central empirical claims.
Axiom & Free-Parameter Ledger
free parameters (1)
- masking ratio and variation threshold
axioms (1)
- domain assumption: unlabeled CSI contains sufficient latent structure for learning task-agnostic representations via masked prediction.
invented entities (1)
- channel variation-aware masking strategy (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
CSI-JEPA learns reusable temporal-spectral representations from unlabeled CSI samples by predicting latent features of masked channel regions from visible context... channel variation-aware masking strategy that samples predictive targets from regions with stronger local temporal and subcarrier-domain variations.
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
After pretraining, the encoder is frozen and used as a backbone, with lightweight task-specific adapters added for downstream sensing tasks.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.