LLM-AUG: Robust Wireless Data Augmentation with In-Context Learning in Large Language Models
Pith reviewed 2026-05-10 04:35 UTC · model grok-4.3
The pith
Large language models generate useful synthetic wireless data samples via in-context learning without any additional training.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the LLM-AUG framework uses in-context learning in LLMs to generate synthetic samples directly in the learned embedding space. This yields better performance than traditional and deep generative baselines on modulation and interference classification tasks, reaching near-oracle levels with only 15% labeled data, with relative gains of 67.6% and 35.7% over diffusion baselines on the RadioML and IC datasets and a 29.4% gain under low-SNR distribution shifts.
What carries the argument
The central mechanism is structured prompting of off-the-shelf LLMs to synthesize embedding-space samples that preserve class structure for downstream wireless classifiers.
Load-bearing premise
That prompts can direct an unmodified LLM to output synthetic points in the embedding space that align with the real data's class distributions.
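As an illustration of what this premise demands in practice, here is a minimal sketch of how such a structured prompt could be assembled from a handful of real embeddings. The instruction wording loosely follows the prompt excerpts surfaced in the reference graph below ([35]–[40]); the serialization precision and the `query_llm` call are assumptions, not the paper's documented pipeline.

```python
import numpy as np

def build_prompt(class_name: str, real_embeddings: np.ndarray, precision: int = 4) -> str:
    """Serialize a few real embeddings for one class into an in-context prompt.

    The instruction text paraphrases the prompt excerpts in [35]-[40]; the exact
    template used by LLM-AUG is not specified here, so treat this as a sketch.
    """
    serialized = "\n".join(
        ", ".join(f"{x:.{precision}f}" for x in vec) for vec in real_embeddings
    )
    return (
        f"Given a set of real embeddings for a modulation class {class_name}:\n"
        f"REAL EMBEDDINGS:\n{serialized}\n"
        "Generate one synthetic embedding vector that follows the same distribution, "
        "preserves class structure, and avoids outliers. "
        "Generate only the embedding, without any additional explanation or formatting."
    )

# Hypothetical usage: `query_llm` stands in for whatever chat-completion API is used.
# prompt = build_prompt("QPSK", real_embeddings[:8])
# reply = query_llm(prompt)
```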
What would settle it
If classifiers using the augmented dataset show no accuracy improvement over those using only the 15% real data in the low-shot or shifted SNR scenarios, the augmentation approach would be falsified.
Original abstract
Data scarcity remains a fundamental bottleneck in applying deep learning to wireless communication problems, particularly in scenarios where collecting labeled Radio Frequency (RF) data is expensive, time-consuming, or operationally constrained. This paper proposes LLM-AUG, a data augmentation framework that leverages in-context learning in large language models (LLMs) to generate synthetic training samples directly in a learned embedding space. Unlike conventional generative approaches that require training task-specific models, LLM-AUG performs data generation through structured prompting, enabling rapid adaptation in low-shot regimes. We evaluate LLM-AUG on two representative tasks: modulation classification and interference classification using the RadioML 2016.10A dataset, and the Interference Classification (IC) dataset respectively. Results show that LLM-AUG consistently outperforms traditional augmentation and deep generative baselines across low-shot settings and reaches near oracle performance using only 15% labeled data. LLM-AUG further demonstrates improved robustness under distribution shifts, yielding a 29.4% relative gain over diffusion-based augmentation at a lower SNR value. On the RadioML and IC datasets, LLM-AUG yields a relative gain of 67.6% and 35.7% over the diffusion-based baseline. The t-SNE visualizations further validate that synthetic samples generated by better preserve class structure in the embedding space, leading to more consistent and informative augmentations. These results demonstrate that LLMs can serve as effective and practical data augmenters for wireless machine learning, enabling robust and data-efficient learning in evolving wireless environments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes LLM-AUG, a data augmentation framework that uses in-context learning in off-the-shelf large language models to generate synthetic samples directly in a learned embedding space for wireless RF tasks. It evaluates the approach on modulation classification using the RadioML 2016.10A dataset and interference classification on the IC dataset, claiming consistent outperformance over traditional augmentation and deep generative (diffusion) baselines in low-shot regimes, near-oracle performance with only 15% labeled data, relative gains of 67.6% and 35.7% over diffusion baselines, and a 29.4% robustness gain under distribution shifts at lower SNR. t-SNE visualizations are cited as evidence that the synthetic samples preserve class structure.
Significance. If the empirical claims hold after detailed verification and reproduction, LLM-AUG would offer a training-free augmentation technique that leverages general-purpose LLMs for data-scarce wireless ML problems, potentially lowering barriers to applying deep learning in RF domains where labeled data collection is costly. The reported gains in low-shot accuracy and robustness under shifts would indicate practical utility beyond existing generative baselines.
major comments (3)
- [§3] §3 (Method): The pipeline for serializing real embeddings into text (e.g., comma-separated values or tokens), constructing the in-context prompt templates, and parsing LLM token outputs back into continuous vectors is not described. This is load-bearing for the central claim, as the skeptic concern correctly notes that nothing establishes the LLM respects the geometry or support of the embedding manifold; generated points could be biased interpolations or extrapolations.
- [§5] §5 (Experiments/Results): The reported quantitative gains (67.6%, 35.7%, 29.4% relative improvements, near-oracle at 15% data) are presented without error bars, number of independent runs, standard deviations, or statistical significance tests, and without details on baseline implementations or prompt engineering choices. This undermines confidence in the low-shot and robustness claims.
- [§6] §6 (Visualization/Validation): The t-SNE plots are described as validating preserved class structure, but no quantitative metrics (e.g., class-conditional distances, nearest-neighbor consistency, or distance to real data manifold) accompany them. Visual inspection alone does not rule out the possibility that gains arise from noisy or off-manifold samples.
minor comments (1)
- [Abstract] Abstract: The sentence 'synthetic samples generated by better preserve class structure' is grammatically incomplete and appears to contain a missing phrase or typo.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback, which has helped us identify areas for improvement in clarity and rigor. We address each major comment point by point below. We have revised the manuscript to incorporate additional details, experiments, and metrics as outlined in our responses.
Point-by-point responses
- Referee: [§3] §3 (Method): The pipeline for serializing real embeddings into text (e.g., comma-separated values or tokens), constructing the in-context prompt templates, and parsing LLM token outputs back into continuous vectors is not described. This is load-bearing for the central claim, as the skeptic concern correctly notes that nothing establishes the LLM respects the geometry or support of the embedding manifold; generated points could be biased interpolations or extrapolations.
Authors: We agree that the serialization, prompting, and parsing pipeline requires explicit description to support reproducibility and address concerns about manifold geometry. In the revised manuscript, we will add a dedicated subsection in §3 (with an accompanying appendix) that details: (1) the exact serialization format (embeddings converted to comma-separated floating-point values with fixed precision and normalization); (2) the full in-context prompt template structure, including system instructions, example formatting, and few-shot selection criteria; and (3) the parsing procedure (extracting numerical tokens from LLM output and reconstructing vectors via string-to-float conversion with error handling). On the geometry concern, we note that our method is empirical and relies on the LLM learning distributional patterns from provided examples rather than explicit manifold constraints; we will expand the discussion to acknowledge this limitation while emphasizing that downstream task performance and class-structure preservation provide indirect validation. These changes will be made. revision: yes
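To make the proposed parsing step concrete, the following is a minimal sketch of extracting numerical tokens from an LLM reply and reconstructing a fixed-dimension vector with basic error handling. The regular expression and the fallback behaviour are assumptions; the paper's actual parser is not described in this review.

```python
import re
import numpy as np

FLOAT_RE = re.compile(r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")

def parse_embedding(reply: str, dim: int) -> np.ndarray | None:
    """Extract numerical tokens from an LLM reply and rebuild an embedding vector.

    Returns None when the reply does not contain exactly `dim` parsable numbers,
    so malformed generations can be dropped or regenerated upstream.
    """
    values = [float(tok) for tok in FLOAT_RE.findall(reply)]
    if len(values) != dim:
        return None  # dimension mismatch: discard or re-prompt
    return np.asarray(values, dtype=np.float32)
```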
- Referee: [§5] §5 (Experiments/Results): The reported quantitative gains (67.6%, 35.7%, 29.4% relative improvements, near-oracle at 15% data) are presented without error bars, number of independent runs, standard deviations, or statistical significance tests, and without details on baseline implementations or prompt engineering choices. This undermines confidence in the low-shot and robustness claims.
Authors: We acknowledge that the absence of variability measures and implementation details weakens the presentation of results. In the revised version, we will: (1) report all key metrics with mean and standard deviation computed over at least five independent random seeds/runs; (2) add error bars to all figures and tables; (3) include a new paragraph detailing baseline implementations (e.g., diffusion model architectures, training hyperparameters, and data preprocessing); (4) specify prompt engineering choices (template variations tested and final selection criteria); and (5) perform and report paired t-tests or Wilcoxon tests for statistical significance of the reported gains. These additions will be incorporated into §5 and the experimental setup section. revision: yes
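As one way to carry out the proposed analysis, the sketch below aggregates accuracies over independent seeds and runs paired significance tests between LLM-AUG and the diffusion baseline. The five-seed setting and the per-seed accuracy arrays are placeholders, not results from the paper.

```python
import numpy as np
from scipy.stats import ttest_rel, wilcoxon

# Placeholder per-seed accuracies for two methods evaluated on the same splits (paired).
llm_aug_acc   = np.array([0.71, 0.69, 0.72, 0.70, 0.73])
diffusion_acc = np.array([0.62, 0.60, 0.64, 0.61, 0.63])

print(f"LLM-AUG:   {llm_aug_acc.mean():.3f} ± {llm_aug_acc.std(ddof=1):.3f}")
print(f"Diffusion: {diffusion_acc.mean():.3f} ± {diffusion_acc.std(ddof=1):.3f}")

t_stat, t_p = ttest_rel(llm_aug_acc, diffusion_acc)   # paired t-test
w_stat, w_p = wilcoxon(llm_aug_acc, diffusion_acc)    # paired, non-parametric alternative
print(f"paired t-test p={t_p:.4f}, Wilcoxon p={w_p:.4f}")
```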
- Referee: [§6] §6 (Visualization/Validation): The t-SNE plots are described as validating preserved class structure, but no quantitative metrics (e.g., class-conditional distances, nearest-neighbor consistency, or distance to real data manifold) accompany them. Visual inspection alone does not rule out the possibility that gains arise from noisy or off-manifold samples.
Authors: We agree that relying solely on qualitative t-SNE visualizations is insufficient for rigorous validation. In the revised manuscript, we will augment §6 with quantitative metrics computed on the embedding space: (1) class-conditional mean Euclidean distances between synthetic and real samples; (2) nearest-neighbor label consistency (fraction of synthetic samples whose k-NN in the combined real+synthetic set belong to the same class); and (3) a manifold proximity measure (average distance to the k-nearest real neighbors, compared against a null model of random points). These will be reported in a new table alongside the existing visualizations. We maintain that the primary evidence remains the downstream classification accuracy improvements, but these metrics will provide stronger support against concerns of off-manifold generation. revision: yes
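A minimal sketch of two of the proposed checks, assuming real and synthetic embeddings are available as arrays with class labels; the choice of k, the random-point null model, and the use of scikit-learn's NearestNeighbors are illustrative rather than the authors' implementation (class-conditional mean distances follow the same pattern and are omitted for brevity).

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_label_consistency(real_X, real_y, syn_X, syn_y, k=5):
    """Fraction of synthetic points whose k nearest real neighbours share their label."""
    nn = NearestNeighbors(n_neighbors=k).fit(real_X)
    _, idx = nn.kneighbors(syn_X)
    neighbor_labels = real_y[idx]                # shape (n_syn, k)
    return np.mean(neighbor_labels == syn_y[:, None])

def manifold_proximity(real_X, syn_X, k=5, rng=np.random.default_rng(0)):
    """Mean distance from synthetic points to their k nearest real neighbours,
    compared against random points drawn inside the real data's bounding box."""
    nn = NearestNeighbors(n_neighbors=k).fit(real_X)
    d_syn, _ = nn.kneighbors(syn_X)
    random_pts = rng.uniform(real_X.min(0), real_X.max(0), size=syn_X.shape)
    d_rand, _ = nn.kneighbors(random_pts)
    return d_syn.mean(), d_rand.mean()
```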
Circularity Check
No derivation chain; purely empirical evaluation
Full rationale
The paper describes an LLM-based data augmentation method via structured in-context prompting and reports empirical accuracy and robustness gains on RadioML and IC datasets against baselines. No equations, parameter fits, uniqueness theorems, or derivation steps are present. All claims reduce to experimental comparisons (e.g., 15% labeled data reaching near-oracle performance, 29.4% relative gain under distribution shift) rather than any mathematical reduction to inputs by construction. Self-citations, if any, are not load-bearing for the central results.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: LLMs with structured prompting can generate synthetic samples that preserve class structure in a learned embedding space for RF signals.
Reference graph
Works this paper leans on
- [1] M. Chen, U. Challita, W. Saad, C. Yin, and M. Debbah, "Artificial neural networks-based machine learning for wireless networks: A tutorial," IEEE Communications Surveys & Tutorials, vol. 21, no. 4, pp. 3039–3071, 2019.
- [2] T. Huynh-The, Q.-V. Pham, T.-V. Nguyen, T. T. Nguyen, R. Ruby, M. Zeng, and D.-S. Kim, "Automatic modulation classification: A deep architecture survey," IEEE Access, vol. 9, pp. 142950–142971, 2021.
- [3] N. Soltani, J. Zhang, B. Salehi, D. Roy, R. Nowak, and K. Chowdhury, "Learning from the best: Active learning for wireless communications," IEEE Wireless Communications, vol. 31, no. 4, pp. 177–183, 2024.
- [4] T. Li, L. Fan, Y. Yuan, and D. Katabi, "Unsupervised learning for human sensing using radio signals," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 3288–3297, 2022.
- [5] T. O'Shea and N. West, "Radio machine learning dataset generation with GNU Radio," in Proceedings of the 6th GNU Radio Conference, 2016.
- [6] Virginia Tech and Oklahoma State University.
- [7] H. Cao, C. Tan, Z. Gao, Y. Xu, G. Chen, P.-A. Heng, and S. Z. Li, "A survey on generative diffusion model," 2023.
- [8] A. Chiejina, B. Kim, K. Chowdhury, and V. K. Shah, "System-level analysis of adversarial attacks and defenses on intelligence in O-RAN based cellular networks," in Proceedings of the 17th ACM Conference on Security and Privacy in Wireless and Mobile Networks, pp. 237–247, 2024.
- [9] J.-H. Lee, S.-Y. Jeon, and E.-R. Jeong, "CNN-based spectrum sensing method for low probability of detection communication systems," Tehnički glasnik, vol. 19, pp. 581–586, Sep. 2025.
- [10] C. A. Harper, M. A. Thornton, and E. C. Larson, "Automatic modulation classification with deep neural networks," Electronics, vol. 12, no. 18, 2023.
- [11] S. L. Cocks, S. Dreo, and F. Dayoub, "AIMC-Spec: A benchmark dataset for automatic intrapulse modulation classification under variable noise conditions," IEEE Access, vol. 13, pp. 214067–214078, 2025.
- [12] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., "Language models are few-shot learners," Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901, 2020.
- [13] J. Wei, Y. Tay, R. Bommasani, C. Raffel, B. Zoph, S. Borgeaud, D. Yogatama, M. Bosma, D. Zhou, D. Metzler, et al., "Emergent abilities of large language models," arXiv preprint arXiv:2206.07682, 2022.
- [14] M. H. Naveed, U. S. Hashmi, N. Tajved, N. Sultan, and A. Imran, "Assessing deep generative models on time series network data," IEEE Access, vol. 10, pp. 64601–64617, 2022.
- [15] L. Li, Z. Zhang, and L. Yang, "Influence of autoencoder-based data augmentation on deep learning-based wireless communication," IEEE Wireless Communications Letters, vol. 10, no. 9, pp. 2090–2093, 2021.
- [16] M. Patel, X. Wang, and S. Mao, "Data augmentation with conditional GAN for automatic modulation classification," in Proceedings of the 2nd ACM Workshop on Wireless Security and Machine Learning (WiseML '20), New York, NY, USA, pp. 31–36, Association for Computing Machinery, 2020.
- [17] C. Pandey, V. Tiwari, A. L. Imoize, C.-T. Li, C.-C. Lee, and D. S. Roy, "5GT-GAN: Enhancing data augmentation for 5G-enabled mobile edge computing in smart cities," IEEE Access, vol. 11, pp. 120983–120996, 2023.
- [18] A. Ravishankar, S. Liu, M. Wang, T. Zhou, J. Zhou, A. Sharma, Z. Hu, L. Das, A. Sobirov, F. Siddique, F. Yu, S. Baek, Y. Luo, and M. Wang, "Fair benchmarking of emerging one-step generative models against multistep diffusion and flow models," 2026.
- [19] F.-A. Croitoru, V. Hondru, R. T. Ionescu, and M. Shah, "Diffusion models in vision: A survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 9, pp. 10850–10869, 2023.
- [20] H. Cao, C. Tan, Z. Gao, Y. Xu, G. Chen, P.-A. Heng, and S. Z. Li, "A survey on generative diffusion models," IEEE Transactions on Knowledge and Data Engineering, vol. 36, no. 7, pp. 2814–2830, 2024.
- [21] M. Li, P. Wang, Y. Dong, and Z. Wang, "Diffusion model empowered data augmentation for automatic modulation recognition," IEEE Wireless Communications Letters, vol. 14, no. 4, pp. 1224–1228, 2025.
- [22] S. Liu, F. Bronzino, P. Schmitt, A. N. Bhagoji, N. Feamster, H. G. Crespo, T. Coyle, and B. Ward, "LEAF: Navigating concept drift in cellular networks," Proceedings of the ACM on Networking, vol. 1, no. CoNEXT2, pp. 1–24, 2023.
- [23] V. Gudepu, B. Chirumamilla, V. R. Chintapalli, P. Castoldi, L. Valcarenghi, B. R. Tamma, and K. Kondepu, "Generative-AI for AI/ML model adaptive retraining in beyond 5G networks," arXiv preprint arXiv:2408.14827, 2024.
- [24] P. Gajjar, A. Chiejina, and V. K. Shah, "Preserving data privacy for ML-driven applications in open radio access networks," in 2024 IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN), pp. 339–346, IEEE, 2024.
- [25] C. O. S. Sorzano, J. Vargas, and A. P. Montano, "A survey of dimensionality reduction techniques," 2014.
- [26] G. Team, A. Kamath, J. Ferret, S. Pathak, N. Vieillard, R. Merhej, et al., "Gemma 3 technical report," 2025.
- [27] A. Singh, A. Fry, A. Perelman, A. Tart, A. Ganesh, A. El-Kishky, A. McLaughlin, A. Low, A. Ostrow, A. Ananthram, et al., "OpenAI GPT-5 system card," arXiv preprint arXiv:2601.03267, 2025.
- [28] OpenAI, "Introducing GPT-5.2," Dec. 2025.
- [29] N. F. Liu, K. Lin, J. Hewitt, A. Paranjape, M. Bevilacqua, F. Petroni, and P. Liang, "Lost in the middle: How language models use long contexts," Transactions of the Association for Computational Linguistics, vol. 12, pp. 157–173, 2024.
- [30] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," Advances in Neural Information Processing Systems, vol. 27, 2014.
- [31] D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," arXiv preprint arXiv:1312.6114, 2013.
- [32] J. Ho, A. Jain, and P. Abbeel, "Denoising diffusion probabilistic models," Advances in Neural Information Processing Systems, vol. 33, pp. 6840–6851, 2020.
- [33] No Augmentation: The no-augmentation baseline serves as a reference point, where the classifier is trained using only D real samples per class without any synthetic data generation. This setting reflects the performance achievable under strict low-data conditions and establishes a lower bound for comparison against augmentation-based methods. Since no addit...
- [34] GAN: The GAN baseline is implemented using a conditional generative adversarial framework comprising a ConditionalGenerator and ConditionalDiscriminator. The generator maps a latent noise vector of dimension zdim = 100, concatenated with class-conditioning information, to synthetic spectrogram samples, while the discriminator learns to distinguish...
- [35] Generate embeddings that are consistent with the distribution of the provided real samples
- [36] Maintain class-specific characteristics and avoid drifting into other class regions
- [37] Introduce controlled variation to improve diversity without adding excessive noise
- [38] Ensure generated samples lie close to the intrinsic class manifold in the embedding space
- [39] Avoid generating outliers/unrealistic samples
- [40] Focus on preserving relationships between real samples. Generate only the embedding, without any additional explanation or formatting. User Prompt: Given a set of real embeddings for a modulation class {class_name}: REAL EMBEDDINGS: {real_embedding_samples} Generate one synthetic embedding vector that: • follow the same distribution • preserve class struc...
- [41] VAE: The VAE baseline uses a conditional variational autoencoder with a 4-layer encoder and decoder architecture. The encoder maps input spectrograms into a latent distribution characterized by a mean and variance, while the decoder reconstructs samples conditioned on both the latent variables and class labels. The model is trained using the evidence low...
- [42] After training, D synthetic samples per class are generated by sampling from the learned latent distribution and decoding them into spectrogram space, yielding a total of 2D samples per class when combined with real data. While VAEs provide stable training and structured latent representations, they often produce overly smooth or blurred samples due to the...
- [43] DDPM: The diffusion baseline is implemented using a MiniDDPM architecture with a U-Net backbone, which models the data distribution through a multi-step denoising process. The forward process gradually adds Gaussian noise to the data over T = 200 steps, following a linear noise schedule with β ranging from 1 × 10⁻⁴ to 0.02, while the reverse process learn... (a minimal sketch of this forward process appears after this list)
- [44] During generation, the reverse diffusion process is used to synthesize D samples per class, resulting in 2D total samples per class when combined with real data. Diffusion models are generally more stable than GANs and provide improved sample diversity; however, they are computationally expensive and may still struggle to fully capture class-specific struc...
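For reference, a minimal sketch of the forward noising process described in [43], using the stated T = 200 linear schedule from 1 × 10⁻⁴ to 0.02; the MiniDDPM/U-Net denoiser and the reverse (generation) process are not reproduced here.

```python
import numpy as np

T = 200
betas = np.linspace(1e-4, 0.02, T)      # linear noise schedule from entry [43]
alpha_bars = np.cumprod(1.0 - betas)    # cumulative product of (1 - beta_t)

def q_sample(x0: np.ndarray, t: int, rng=np.random.default_rng()) -> np.ndarray:
    """Forward diffusion: noise a clean sample x0 to timestep t in closed form."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
```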
discussion (0)