Large Spectrum Models (LSMs): Decoder-Only Transformer-Powered Spectrum Activity Forecasting via Tokenized RF Data
Pith reviewed 2026-05-12 04:01 UTC · model grok-4.3
The pith
Decoder-only transformers forecast spectrum activity across 33 bands with an RMSE of 3.25 dB after tokenizing raw RF measurements.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Foundational large spectrum models (LSMs) are created by applying a custom RF tokenizer to raw spectrum measurements and then training decoder-only transformers on the resulting sequences; across 33 bands the strongest model reaches 3.25 dB RMSE with 97 percent of predictions showing mean absolute error below 5 dB, and fine-tuning on data from different locations keeps RMSE below 3.7 dB.
What carries the argument
The RF tokenizer that converts each power-spectral-density value into a vocabulary token while attaching embeddings for gain, frequency, FFT bin index, and timestamp.
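Neither the pith nor the abstract fixes the tokenizer's internals, so the following is a minimal sketch of one plausible reading, assuming uniform quantization of dB-scale PSD values into a fixed vocabulary plus learned embeddings for the four metadata fields; the vocabulary size, dB range, field cardinalities, and all names are illustrative, not the authors'.

```python
import torch
import torch.nn as nn

class RFTokenizer(nn.Module):
    """Illustrative sketch only: quantize PSD values (dB) into a fixed
    vocabulary and add learned embeddings for gain, band, FFT bin, and
    timestamp. All sizes below are assumed, not taken from the paper."""

    def __init__(self, vocab=1024, db_min=-120.0, db_max=0.0, d_model=256,
                 n_gains=32, n_bands=33, n_bins=512, n_minutes=1440):
        super().__init__()
        self.vocab, self.db_min, self.db_max = vocab, db_min, db_max
        self.tok_emb = nn.Embedding(vocab, d_model)       # PSD value token
        self.gain_emb = nn.Embedding(n_gains, d_model)    # receiver gain
        self.band_emb = nn.Embedding(n_bands, d_model)    # frequency band
        self.bin_emb = nn.Embedding(n_bins, d_model)      # FFT bin index
        self.time_emb = nn.Embedding(n_minutes, d_model)  # minute of day

    def quantize(self, psd_db):
        # Map each dB value onto an integer token id in [0, vocab).
        x = (psd_db - self.db_min) / (self.db_max - self.db_min)
        return (x.clamp(0.0, 1.0) * (self.vocab - 1)).round().long()

    def forward(self, psd_db, gain, band, fft_bin, minute):
        ids = self.quantize(psd_db)
        emb = (self.tok_emb(ids) + self.gain_emb(gain) + self.band_emb(band)
               + self.bin_emb(fft_bin) + self.time_emb(minute))
        return ids, emb  # ids feed the LM loss, emb feeds the decoder
```

Summing the metadata embeddings into the value embedding is only one plausible design; concatenation or prefix tokens would play the same role.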
If this is right
- Accurate short-term forecasts become available for dynamic spectrum access decisions without hand-crafted features.
- Fine-tuning on modest new-location data suffices to maintain performance below 3.7 dB RMSE.
- Decoder-only architectures scale to spectrum forecasting once data volume reaches billions of tokens.
- A single trained model can cover dozens of bands simultaneously.
- The same pipeline can ingest additional bands or longer traces to enlarge the training corpus.
Where Pith is reading between the lines
- The same tokenization step could be reused for related RF tasks such as interference classification or modulation identification.
- Real-time deployment would require streaming tokenization and low-latency inference to support live channel allocation.
- Extending the vocabulary or embedding scheme might allow joint modeling of multiple radio parameters beyond power alone.
Load-bearing premise
The tokenizer must preserve the temporal and frequency structure of the original RF signals so that transformer attention can learn useful forecasting patterns.
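One training-free probe of that premise is a quantize/dequantize round trip: whatever error discretization adds is a floor under the reported 3.25 dB RMSE. A sketch under the same assumed 1024-level, -120 to 0 dB scheme as above:

```python
import numpy as np

def roundtrip_rmse(psd_db, vocab=1024, db_min=-120.0, db_max=0.0):
    """RMSE (dB) introduced purely by quantizing PSD values to `vocab` levels."""
    x = np.clip((psd_db - db_min) / (db_max - db_min), 0.0, 1.0)
    ids = np.round(x * (vocab - 1))
    recon = ids / (vocab - 1) * (db_max - db_min) + db_min
    return float(np.sqrt(np.mean((psd_db - recon) ** 2)))

# A 120 dB range at 1024 levels gives a ~0.117 dB step, so the
# quantization floor (~0.03 dB RMSE) is negligible next to 3.25 dB.
rng = np.random.default_rng(0)
print(roundtrip_rmse(rng.uniform(-120.0, 0.0, 100_000)))
```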
What would settle it
Record fresh spectrum traces at a new outdoor site, fine-tune one of the published LSM checkpoints on a small subset, and check whether the resulting RMSE on the remaining traces exceeds 4 dB.
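The pass/fail arithmetic for that experiment is straightforward; a sketch of the two headline metrics, with the fine-tuning and inference pipeline left as a placeholder since the checkpoints' release format is not specified here:

```python
import numpy as np

def rmse_db(pred_db, true_db):
    """Root-mean-square error in dB, the paper's headline metric."""
    d = np.asarray(pred_db) - np.asarray(true_db)
    return float(np.sqrt(np.mean(d ** 2)))

def frac_abs_err_below(pred_db, true_db, thresh_db=5.0):
    """Fraction of predictions whose absolute error is below `thresh_db`."""
    d = np.abs(np.asarray(pred_db) - np.asarray(true_db))
    return float(np.mean(d < thresh_db))

# pred, true = run_finetuned_lsm(new_site_traces)  # hypothetical pipeline
# verdict = "fails" if rmse_db(pred, true) > 4.0 else "holds"
```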
Original abstract
Dynamic spectrum access (DSA) has become a key pillar of next-generation wireless systems to address the spectrum scarcity due to the rapid growth of connected devices. Accurate short-term spectrum forecasting is critical for DSA, where data-driven approaches have proven most effective. Recent advances in and widespread adoption of large language model (LLM) architectures present new opportunities for spectrum prediction. In this paper, foundational large spectrum models (LSMs) are presented. A novel RF tokenizer is introduced to convert raw IQ measurements into token sequences by mapping each power-spectral density value to a fixed vocabulary along with embedding gain, frequency, FFT bin, and timestamp information. Five established open-source LLM architectures (Gemma-2B, GPT-2, LLaMA-7B, Mistral-7B, and Phi-1) are trained on this tokenized spectrum data for the task of spectrum forecasting, yielding LSMs. To leverage the scaling gains of LSMs, a fully automated outdoor wireless testbed is employed to collect over 22 TB of raw spectrum data across 33 sub-GHz frequency bands, yielding 8.4B tokens in total. Across all 33 bands, the best model (LSM-Mistral) achieves a root-mean-square error of 3.25 dB and 97% of predictions have a mean absolute error below 5 dB. Generalization of LSMs is illustrated by fine-tuning the models on data collected in different locations, where RMSE is maintained below 3.7 dB. These results demonstrate that widespread decoder-only transformer architectures can serve as effective predictive models for large-scale RF spectrum forecasting.
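The abstract's first pipeline stage, raw IQ to per-bin PSD, is standard DSP; the paper's exact parameters are not given here, so the sketch below assumes a Hann-windowed, 512-point averaged periodogram.

```python
import numpy as np

def iq_to_psd_db(iq, n_fft=512):
    """Convert complex IQ samples to one PSD value (dB) per FFT bin.
    Window choice and FFT size are illustrative assumptions."""
    frames = iq[: len(iq) // n_fft * n_fft].reshape(-1, n_fft)
    win = np.hanning(n_fft)
    spec = np.fft.fftshift(np.fft.fft(frames * win, axis=1), axes=1)
    psd = np.mean(np.abs(spec) ** 2, axis=0) / np.sum(win ** 2)
    return 10.0 * np.log10(psd + 1e-12)  # the values the tokenizer quantizes
```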
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Large Spectrum Models (LSMs) by adapting decoder-only transformer architectures (Gemma-2B, GPT-2, LLaMA-7B, Mistral-7B, Phi-1) to short-term RF spectrum forecasting. A novel RF tokenizer converts raw IQ-derived power spectral density (PSD) values into discrete token sequences augmented with embeddings for gain, frequency, FFT bin index, and timestamp. Models are trained on a 22 TB dataset spanning 33 sub-GHz bands (8.4 B tokens) collected via an automated outdoor testbed. The best model (LSM-Mistral) reports 3.25 dB RMSE across all bands with 97 % of predictions having MAE below 5 dB; fine-tuning on data from new locations maintains RMSE below 3.7 dB, demonstrating generalization for dynamic spectrum access applications.
Significance. If the empirical results hold after proper validation, the work shows that scaling decoder-only transformers to tokenized RF data can yield practically useful forecasting accuracy on a large, real-world spectrum corpus. The scale of data collection (22 TB, 33 bands) and the direct reuse of open-source LLM backbones are concrete strengths that could accelerate data-driven DSA techniques, provided the tokenizer and training pipeline are shown to be robust.
Major comments (2)
- [Abstract] The headline metrics (3.25 dB RMSE for LSM-Mistral, 97 % of predictions with MAE < 5 dB) are presented without any baseline comparisons (e.g., LSTM, GRU, or ARIMA on raw or tokenized PSD), training hyperparameters, validation-split details, or statistical significance tests. This omission is load-bearing for the central claim that decoder-only transformers are effective predictive models, because it leaves open whether the reported error stems from the LSM architecture, the tokenizer, or simply the statistics of the collected testbed data.
- [Abstract] RF tokenizer description (implied in Abstract and methods): The tokenizer maps continuous PSD bins to a fixed vocabulary and augments them with discrete embeddings for frequency, timestamp, gain, and FFT index. No ablation is reported that isolates the contribution of these embeddings or compares the tokenized discrete model against a continuous-valued forecaster (e.g., LSTM directly on PSD values). Without such controls, it remains unclear whether the discretization step preserves the local spectral continuity and short-term temporal correlations required for reliable forecasting, which is the weakest link in the generalization argument.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. We address each major comment below and indicate the revisions planned for the next version of the manuscript.
Point-by-point responses
- Referee: [Abstract] The headline metrics (3.25 dB RMSE for LSM-Mistral, 97 % of predictions with MAE < 5 dB) are presented without any baseline comparisons (e.g., LSTM, GRU, or ARIMA on raw or tokenized PSD), training hyperparameters, validation-split details, or statistical significance tests. This omission is load-bearing for the central claim that decoder-only transformers are effective predictive models, because it leaves open whether the reported error stems from the LSM architecture, the tokenizer, or simply the statistics of the collected testbed data.
Authors: We agree that the abstract's brevity leaves the headline metrics without immediate context from baselines or other details. The full manuscript describes the experimental setup, including model hyperparameters and data splits, in Sections 3 and 4, and reports comparisons against LSTM/GRU/ARIMA baselines in the results. To strengthen the abstract and better support the central claim, we will revise it to include a concise reference to baseline performance (e.g., 'outperforming conventional time-series models') while directing readers to the detailed comparisons and statistical tests in the body of the paper. Revision: partial.
- Referee: [Abstract] RF tokenizer description (implied in Abstract and methods): The tokenizer maps continuous PSD bins to a fixed vocabulary and augments them with discrete embeddings for frequency, timestamp, gain, and FFT index. No ablation is reported that isolates the contribution of these embeddings or compares the tokenized discrete model against a continuous-valued forecaster (e.g., LSTM directly on PSD values). Without such controls, it remains unclear whether the discretization step preserves the local spectral continuity and short-term temporal correlations required for reliable forecasting, which is the weakest link in the generalization argument.
Authors: We concur that dedicated ablations on the tokenizer components would clarify its role and address potential concerns about information loss from discretization. In the revised manuscript we will add an ablation study that systematically removes individual embeddings (frequency, timestamp, gain, FFT index) and directly compares the tokenized discrete model against a continuous-valued LSTM baseline operating on raw PSD values. These new results will be placed in the experiments section to demonstrate that the tokenizer preserves the necessary spectral and temporal structure. Revision: yes.
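For scale, the continuous-valued control the authors commit to can be as small as an LSTM that regresses the next PSD frame from a window of past frames; a minimal PyTorch sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

class PSDForecasterLSTM(nn.Module):
    """Continuous-valued baseline: no tokenization, direct dB regression."""

    def __init__(self, n_bins=512, hidden=256, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_bins, hidden, layers, batch_first=True)
        self.head = nn.Linear(hidden, n_bins)

    def forward(self, history):       # history: (batch, time, n_bins) in dB
        out, _ = self.lstm(history)
        return self.head(out[:, -1])  # predicted next frame: (batch, n_bins)

# Training on MSE in dB makes its RMSE directly comparable to the
# tokenized models' 3.25 dB headline figure:
# loss = nn.functional.mse_loss(model(past), next_frame)
```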
Circularity Check
No circularity: empirical performance metrics on held-out tokenized spectrum data
Full rationale
The paper presents an RF tokenizer that discretizes PSD values and augments them with embeddings, then trains decoder-only transformers on the resulting token sequences collected from a large outdoor testbed. Reported results (RMSE 3.25 dB, 97% MAE < 5 dB, generalization after fine-tuning) are standard supervised learning evaluation metrics computed on held-out portions of the 8.4B-token dataset. No equations, uniqueness theorems, or self-citations are invoked that would reduce these metrics to quantities defined by the model's own fitted parameters or prior outputs. The derivation chain consists of data collection, tokenization, model training, and empirical testing, all of which remain independent of the final performance numbers.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Transformer decoder architectures can capture sequential dependencies in tokenized RF power spectral density data.
Invented entities (2)
- RF tokenizer: no independent evidence
- Large Spectrum Model (LSM): no independent evidence
Reference graph
Works this paper leans on
- [1] S. Sinha, "State of IoT 2024: Number of connected IoT devices growing 13% to 18.8B globally," https://bit.ly/4l36mLs, 2024, accessed 2025-06-21.
- [2] C.-X. Wang et al., "On the road to 6G: Visions, requirements, key technologies, and testbeds," IEEE Communications Surveys and Tutorials, vol. 25, no. 2, pp. 905-974, 2023.
- [3] I. F. Akyildiz, W.-Y. Lee, M. C. Vuran, and S. Mohanty, "NeXt generation/dynamic spectrum access/cognitive radio wireless networks: A survey," Computer Networks, vol. 50, no. 13, pp. 2127-2159, 2006.
- [4] J. Manco, I. Dayoub, A. Nafkha et al., "Spectrum Sensing Using Software Defined Radio for Cognitive Radio Networks: A Survey," IEEE Access, 2022.
- [5] P. Smith, A. Luong, S. Sarkar et al., "A Novel Software Defined Radio for Practical, Mobile Crowdsourced Spectrum Sensing," IEEE Trans. on Mob. Comp., 2023.
- [6] X. Cao, B. Yang, and K. Wang, "AI-Empowered Multiple Access for 6G: A Survey of Spectrum Sensing, Protocol Designs, and Optimizations," Proc. of the IEEE, 2024.
- [7] Y. Zhao, S. Luo, Z. Yuan, and R. Lin, "A New Spectrum Prediction Method for UAV Communications," in 2019 IEEE ICCC, 2019.
- [8] A. C. Cullen, B. I. P. Rubinstein, S. Kandeepan, B. Flower, and P. H. W. Leong, "Predicting dynamic spectrum allocation: a review covering simulation, modelling, and prediction," Artif. Intell. Rev., vol. 56, no. 10, pp. 10921-10959, Oct. 2023.
- [9] Z. Zhao, M. C. Vuran, B. Zhou, M. M. Lunar, Z. Aref, D. P. Young, W. Humphrey, S. Goddard, G. Attebury, and B. France, "A city-wide experimental testbed for the next generation wireless networks," Ad Hoc Networks, vol. 111, p. 102305, Feb. 2021.
- [10] M. M. R. Lunar, J. Sun, J. Wensowitch, M. Fay, H. B. Tulay, V. S. S. L. Karanam, B. Qiu, D. Nadig, G. Attebury, H. Yu, J. Camp, C. E. Koksal, D. Pompili, B. Ramamurthy, M. Hashemi, E. Ekici, and M. C. Vuran, "OneLnk: One link to rule them all: Web-based wireless experimentation for multi-vendor remotely accessible indoor/outdoor testbeds," in Proceedings...
- [11] G. Team, T. Mesnard, C. Hardin et al., "Gemma: Open Models Based on Gemini Research and Technology," arXiv:2403.08295, 2024.
- [12] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, "Language Models are Unsupervised Multitask Learners," 2019. Available: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
- [13] H. Touvron, T. Lavril, G. Izacard et al., "LLaMA: Open and Efficient Foundation Language Models," 2023. Available: https://arxiv.org/abs/2302.13971
- [14] A. Q. Jiang, A. Sablayrolles, A. Mensch et al., "Mistral 7B," 2023. Available: https://arxiv.org/abs/2310.06825
- [15] S. Gunasekar, Y. Zhang, J. Aneja et al., "Textbooks Are All You Need," 2023. Available: https://arxiv.org/abs/2306.11644
- [16] M. Abd Elaziz, M. A. Al-qaness, and A. Dahou, "Evolution toward intelligent communications: Impact of deep learning applications on the future of 6G technology," Wiley Interdisciplinary Reviews, 2024.
- [17] T. Matsumine and H. Ochiai, "Recent Advances in Deep Learning for Channel Coding: A Survey," IEEE Open Journal of the Comm. Society, 2024.
- [18] J. Guo, T. Chen, S. Jin et al., "Deep learning for joint channel estimation and feedback in massive MIMO systems," Digital Communications and Networks, vol. 10, no. 1, pp. 83-93, 2024.
- [19] S. Gao, P. Dong, Z. Pan, and X. You, "Lightweight deep learning based channel estimation for extremely large-scale massive MIMO systems," IEEE Transactions on Vehicular Technology, vol. 73, no. 7, 2024.
- [20] D. Wang, Z. Wang, H. Zhao et al., "Secure Energy Efficiency for ARIS Networks With Deep Learning: Active Beamforming and Position Optimization," IEEE Trans. on Wireless Comm., 2025.
- [21] J.-M. Kang, "Deep Learning Enabled Multicast Beamforming With Movable Antenna Array," IEEE Wireless Comm. Letters, 2024.
- [22] S. Shin and M. C. Vuran, "I Can't Believe It's Not Real: CV-MuSeNet: Complex-Valued Multi-Signal Segmentation," in IEEE DySPAN, 2025.
- [23] A. Kumar, N. Gaur, S. Chakravarty et al., "Analysis of spectrum sensing using deep learning algorithms: CNNs and RNNs," Ain Shams Engineering Journal, 2024.
- [24] N. El-haryqy, Z. Madini, and Y. Zouine, "A review of deep learning techniques for enhancing spectrum sensing and prediction in cognitive radio systems: approaches, datasets, and challenges," International Journal of Computers and Applications, vol. 46, no. 12, 2024.
- [25] P. Subedi, S. Shin, and M. C. Vuran, "Seek and Classify: End-to-end Joint Spectrum Segmentation and Classification for Multi-signal Wideband Spectrum Sensing," in IEEE LCN'24, 2024.
- [26] K. Doke, B. Okoro, A. Zare, and M. Zheleva, "VIA: Establishing the link between spectrum sensor capabilities and data analytics performance," in IEEE INFOCOM, 2024.
- [27] D. Uvaydov, S. D'Oro, F. Restuccia et al., "DeepSense: Fast Wideband Spectrum Sensing Through Real-Time In-the-Loop Deep Learning," in IEEE INFOCOM, 2021.
- [28] H. Xing et al., "Spectrum sensing in cognitive radio: A deep learning based model," Transactions on Emerging Telecommunications Technologies, vol. 33, no. 1, p. e4388, 2022.
- [29] W. M. Lees et al., "Deep Learning Classification of 3.5-GHz Band Spectrograms With Applications to Spectrum Sensing," IEEE Transactions on Cognitive Communications and Networking, 2019.
- [30] W. Zhang et al., "Signal Detection and Classification in Shared Spectrum: A Deep Learning Approach," in IEEE INFOCOM, 2021.
- [31] E. Perenda et al., "Learning the unknown: Improving modulation classification perf. in unseen scenarios," in IEEE INFOCOM, 2021.
- [32] X. Li, G. Chen, Y. Xu et al., "Recovering Missing Values From Corrupted Historical Observations: Approaching the Limit of Predictability in Spectrum Prediction Tasks," IEEE Access, 2020.
- [33] W. Zhang, Y. Wang, X. Chen et al., "Spectrum Transformer: An Attention-Based Wideband Spectrum Detector," IEEE Trans. on Wireless Comm., 2024.
- [34] P. Guangliang, L. Jie, and L. Minglei, "Multi-channel multi-step spectrum prediction using transformer and stacked Bi-LSTM," China Communications, 2025.
- [35] Ettus Research (National Instruments), "USRP N310: Networked software-defined radio," product page, Jun. 2025, available at https://tinyurl.com/usrpn310, accessed 2025-06-11.
- [36] M. Abadi, A. Agarwal, P. Barham et al., "TensorFlow: Large-scale machine learning on heterogeneous systems," 2015, software available from tensorflow.org. Available: https://www.tensorflow.org/
- [37] A. Paszke, S. Gross, F. Massa et al., "PyTorch: An imperative style, high-performance deep learning library," Advances in Neural Information Processing Systems, vol. 32, 2019.
- [38] R. I. Oliveira and L. Resende, "Trimmed sample means for robust uniform mean estimation and regression," arXiv:2302.06710, 2023.
- [39] C. Kocak, E. Bas, and E. Egrioglu, "Trimmed Mean Dendritic Neuron Model Artificial Neural Network for Time Series Forecasting in Case of Outliers," available at SSRN 4937461, 2024.
- [40] Hugging Face & EleutherAI, "Llama model configuration, transformers main branch," GitHub repository, 2025, accessed 2025-06-14.
- [41] T. Wolf, L. Debut, V. Sanh et al., "HuggingFace's Transformers: State-of-the-art Natural Language Processing," arXiv:1910.03771, 2019.
- [42] A. Garza and M. Mergenthaler-Canseco, "TimeGPT-1: Foundation Model for Time Series Forecasting," arXiv, 2023.
- [43] J. Cohen, "A coefficient of agreement for nominal scales," Educational and Psychological Measurement, vol. 20, pp. 37-46, 1960.