pith. machine review for the scientific record.

arxiv: 2605.10825 · v1 · submitted 2026-05-11 · 💻 cs.NI

Recognition: no theorem link

Large Spectrum Models (LSMs): Decoder-Only Transformer-Powered Spectrum Activity Forecasting via Tokenized RF Data

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 04:01 UTC · model grok-4.3

classification 💻 cs.NI
keywords spectrum forecasting · large spectrum models · RF tokenization · decoder-only transformers · dynamic spectrum access · power spectral density · wireless testbed data

The pith

Decoder-only transformers forecast spectrum activity across 33 bands with RMSE of 3.25 dB after tokenizing raw RF measurements.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that decoder-only transformer architectures, after conversion of raw IQ samples into token sequences, can be trained as large spectrum models to deliver short-term forecasts of power levels in sub-GHz bands. This matters because dynamic spectrum access systems require reliable predictions to allocate channels without causing interference amid growing device density. The approach collects over 22 TB of data to produce 8.4 billion tokens, trains five open-source LLM backbones, and shows that the best variant maintains low error while adapting via fine-tuning to new sites.

Core claim

Foundational large spectrum models (LSMs) are created by applying a custom RF tokenizer to raw spectrum measurements and then training decoder-only transformers on the resulting sequences; across 33 bands the strongest model reaches 3.25 dB RMSE, with 97% of predictions showing a mean absolute error below 5 dB, and fine-tuning on data from different locations keeps RMSE below 3.7 dB.

What carries the argument

The RF tokenizer that converts each power-spectral-density value into a vocabulary token while attaching embeddings for gain, frequency, FFT bin index, and timestamp.
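A minimal sketch of what such a tokenizer could look like, assuming a uniform quantization grid and the token ranges reported for the metadata (gain tokens 1–80, frequency tokens 1–33); all function names and the PSD range are illustrative assumptions, not the authors' code.

```python
def psd_to_token(psd_db, lo=-120.0, hi=0.0, vocab_size=1024):
    """Map a PSD value in dB onto one of `vocab_size` uniform quantization bins.
    The grid bounds and vocabulary size here are assumed, not taken from the paper."""
    clipped = min(max(psd_db, lo), hi)
    step = (hi - lo) / (vocab_size - 1)
    return round((clipped - lo) / step)

def tokenize_measurement(psd_db, gain_db, freq_index, fft_bin, t_index):
    """Bundle the PSD token with the metadata tokens the paper embeds alongside it."""
    return {
        "psd": psd_to_token(psd_db),
        "gain": int(gain_db) + 1,   # USRP N310 gain 0..79 dB -> tokens 1..80
        "freq": freq_index + 1,     # 33 center frequencies -> tokens 1..33
        "bin": fft_bin,             # FFT bin index
        "time": t_index,            # timestamp index
    }

tok = tokenize_measurement(psd_db=-63.4, gain_db=40, freq_index=4, fft_bin=128, t_index=7)
```

The transformer then sees the PSD token sequence with the metadata embeddings added, so attention can condition on where and when each power reading was taken.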

If this is right

  • Accurate short-term forecasts become available for dynamic spectrum access decisions without hand-crafted features.
  • Fine-tuning on modest new-location data suffices to maintain performance below 3.7 dB RMSE.
  • Decoder-only architectures scale to spectrum forecasting once data volume reaches billions of tokens.
  • A single trained model can cover dozens of bands simultaneously.
  • The same pipeline can ingest additional bands or longer traces to enlarge the training corpus.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same tokenization step could be reused for related RF tasks such as interference classification or modulation identification.
  • Real-time deployment would require streaming tokenization and low-latency inference to support live channel allocation.
  • Extending the vocabulary or embedding scheme might allow joint modeling of multiple radio parameters beyond power alone.

Load-bearing premise

The tokenizer must preserve the temporal and frequency structure of the original RF signals so that transformer attention can learn useful forecasting patterns.

What would settle it

Record fresh spectrum traces at a new outdoor site, fine-tune one of the published LSM checkpoints on a small subset, and check whether the resulting RMSE on the remaining traces exceeds 4 dB.
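The check described above can be sketched as follows, under assumed data shapes (flat lists of predicted and measured PSD values in dB); the function names are illustrative, but the two metrics match the paper's headline numbers (RMSE in dB, fraction of predictions with absolute error under 5 dB).

```python
import math

def rmse_db(pred, true):
    """Root-mean-square error over paired predicted/measured PSD values in dB."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))

def frac_within(pred, true, threshold_db=5.0):
    """Fraction of predictions whose absolute error is under the threshold."""
    return sum(abs(p - t) < threshold_db for p, t in zip(pred, true)) / len(pred)

# Toy held-out traces; a real test would use the remaining (non-fine-tuning) traces.
pred = [-70.1, -65.3, -80.0, -55.2]
true = [-72.0, -64.0, -79.5, -58.0]
claim_holds = rmse_db(pred, true) <= 4.0   # the falsification threshold proposed above
```

If `rmse_db` on the held-out traces exceeds 4 dB, the generalization claim fails at that site; staying below it is consistent with the reported fine-tuned RMSE of under 3.7 dB.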

Figures

Figures reproduced from arXiv: 2605.10825 by Mehmet C. Vuran, Mohammad Mosiur Lunar.

Figure 1. Data-driven spectrum prediction system.
Figure 2. Data collection setup at the core site [9].
Figure 3. Antenna array placement at the auxiliary site.
Figure 4. LSM architectures.
Figure 5. Average SD and average MAD across the entire dataset.
Figure 6. Average RMSE for the five LSM models.
Figure 7. Average RMSE on the auxiliary site dataset.
original abstract

Dynamic spectrum access (DSA) has become a key pillar of next-generation wireless systems to address the spectrum scarcity due to the rapid growth of connected devices. Accurate short-term spectrum forecasting is critical for DSA, where data-driven approaches have proven most effective. Recent advances in and widespread adoption of large language model (LLM) architectures present new opportunities for spectrum prediction. In this paper, foundational large spectrum models (LSMs) are presented. A novel RF tokenizer is introduced to convert raw IQ measurements into token sequences by mapping each power-spectral density value to a fixed vocabulary along with embedding gain, frequency, FFT bin, and timestamp information. Five established open-source LLM architectures (Gemma-2B, GPT-2, LLaMA-7B, Mistral-7B, and Phi-1) are trained on this tokenized spectrum data for the task of spectrum forecasting, yielding LSMs. To leverage the scaling gains of LSMs, a fully automated outdoor wireless testbed is employed to collect over 22 TB of raw spectrum data across 33 sub-GHz frequency bands, yielding 8.4B tokens in total. Across all 33 bands, the best model (LSM-Mistral) achieves a root-mean-square error of 3.25 dB and 97% of predictions have a mean absolute error below 5 dB. Generalization of LSMs is illustrated by fine-tuning the models on data collected in different locations, where RMSE is maintained below 3.7 dB. These results demonstrate that widespread decoder-only transformer architectures can serve as effective predictive models for large-scale RF spectrum forecasting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper introduces Large Spectrum Models (LSMs) by adapting decoder-only transformer architectures (Gemma-2B, GPT-2, LLaMA-7B, Mistral-7B, Phi-1) to short-term RF spectrum forecasting. A novel RF tokenizer converts raw IQ-derived power spectral density (PSD) values into discrete token sequences augmented with embeddings for gain, frequency, FFT bin index, and timestamp. Models are trained on a 22 TB dataset spanning 33 sub-GHz bands (8.4 B tokens) collected via an automated outdoor testbed. The best model (LSM-Mistral) reports 3.25 dB RMSE across all bands with 97 % of predictions having MAE below 5 dB; fine-tuning on data from new locations maintains RMSE below 3.7 dB, demonstrating generalization for dynamic spectrum access applications.

Significance. If the empirical results hold after proper validation, the work shows that scaling decoder-only transformers to tokenized RF data can yield practically useful forecasting accuracy on a large, real-world spectrum corpus. The scale of data collection (22 TB, 33 bands) and the direct reuse of open-source LLM backbones are concrete strengths that could accelerate data-driven DSA techniques, provided the tokenizer and training pipeline are shown to be robust.

major comments (2)
  1. [Abstract] Abstract: The headline metrics (3.25 dB RMSE for LSM-Mistral, 97 % of predictions with MAE < 5 dB) are presented without any baseline comparisons (e.g., LSTM, GRU, or ARIMA on raw or tokenized PSD), training hyperparameters, validation-split details, or statistical significance tests. This omission is load-bearing for the central claim that decoder-only transformers are effective predictive models, because it leaves open whether the reported error stems from the LSM architecture, the tokenizer, or simply the statistics of the collected testbed data.
  2. [Abstract] RF tokenizer description (implied in Abstract and methods): The tokenizer maps continuous PSD bins to a fixed vocabulary and augments them with discrete embeddings for frequency, timestamp, gain, and FFT index. No ablation is reported that isolates the contribution of these embeddings or compares the tokenized discrete model against a continuous-valued forecaster (e.g., LSTM directly on PSD values). Without such controls, it remains unclear whether the discretization step preserves the local spectral continuity and short-term temporal correlations required for reliable forecasting, which is the weakest link in the generalization argument.
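To make the second concern concrete: a round-trip through a uniform PSD quantizer (an assumed stand-in for the paper's tokenizer, whose exact grid the abstract does not specify) bounds the error introduced by discretization at half a bin width. Whether the "spectral continuity" the referee worries about survives then depends on that bin width relative to typical PSD fluctuations, which is exactly what an ablation against a continuous-valued forecaster would measure.

```python
def quantize(psd_db, lo=-120.0, hi=0.0, vocab_size=1024):
    """Uniformly quantize a PSD value in dB; returns the token and its
    reconstructed dB value. Grid bounds and vocabulary size are assumptions."""
    step = (hi - lo) / (vocab_size - 1)
    token = round((min(max(psd_db, lo), hi) - lo) / step)
    return token, lo + token * step

# Round-trip error never exceeds half a bin width (~0.059 dB on this grid),
# so discretization loss is small only if real PSD dynamics dwarf the bin size.
step = 120.0 / 1023
for x in [-110.7, -63.4, -12.05]:
    _, x_hat = quantize(x)
    assert abs(x - x_hat) <= step / 2 + 1e-9
```

This is an editorial illustration, not an analysis from the paper; the referee's point is that no such control appears in the reported experiments.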

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below and indicate the revisions planned for the next version of the manuscript.

point-by-point responses
  1. Referee: [Abstract] Abstract: The headline metrics (3.25 dB RMSE for LSM-Mistral, 97 % of predictions with MAE < 5 dB) are presented without any baseline comparisons (e.g., LSTM, GRU, or ARIMA on raw or tokenized PSD), training hyperparameters, validation-split details, or statistical significance tests. This omission is load-bearing for the central claim that decoder-only transformers are effective predictive models, because it leaves open whether the reported error stems from the LSM architecture, the tokenizer, or simply the statistics of the collected testbed data.

    Authors: We agree that the abstract's brevity leaves the headline metrics without immediate context from baselines or other details. The full manuscript describes the experimental setup, including model hyperparameters and data splits, in Sections 3 and 4, and reports comparisons against LSTM/GRU/ARIMA baselines in the results. To strengthen the abstract and better support the central claim, we will revise it to include a concise reference to baseline performance (e.g., 'outperforming conventional time-series models') while directing readers to the detailed comparisons and statistical tests in the body of the paper. revision: partial

  2. Referee: [Abstract] RF tokenizer description (implied in Abstract and methods): The tokenizer maps continuous PSD bins to a fixed vocabulary and augments them with discrete embeddings for frequency, timestamp, gain, and FFT index. No ablation is reported that isolates the contribution of these embeddings or compares the tokenized discrete model against a continuous-valued forecaster (e.g., LSTM directly on PSD values). Without such controls, it remains unclear whether the discretization step preserves the local spectral continuity and short-term temporal correlations required for reliable forecasting, which is the weakest link in the generalization argument.

    Authors: We concur that dedicated ablations on the tokenizer components would clarify its role and address potential concerns about information loss from discretization. In the revised manuscript we will add an ablation study that systematically removes individual embeddings (frequency, timestamp, gain, FFT index) and directly compares the tokenized discrete model against a continuous-valued LSTM baseline operating on raw PSD values. These new results will be placed in the experiments section to demonstrate that the tokenizer preserves the necessary spectral and temporal structure. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical performance metrics on held-out tokenized spectrum data

full rationale

The paper presents an RF tokenizer that discretizes PSD values and augments them with embeddings, then trains decoder-only transformers on the resulting token sequences collected from a large outdoor testbed. Reported results (RMSE 3.25 dB, 97% MAE < 5 dB, generalization after fine-tuning) are standard supervised learning evaluation metrics computed on held-out portions of the 8.4B-token dataset. No equations, uniqueness theorems, or self-citations are invoked that would reduce these metrics to quantities defined by the model's own fitted parameters or prior outputs. The derivation chain consists of data collection, tokenization, model training, and empirical testing, all of which remain independent of the final performance numbers.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The approach rests on the assumption that spectrum activity can be usefully represented as token sequences and that transformer scaling laws transfer to this domain. No explicit free parameters are stated in the abstract.

axioms (1)
  • domain assumption Transformer decoder architectures can capture sequential dependencies in tokenized RF power spectral density data
    Invoked when applying LLM training to spectrum forecasting task.
invented entities (2)
  • RF tokenizer no independent evidence
    purpose: Map raw IQ measurements to fixed-vocabulary tokens with embedded metadata
    New component introduced to enable LLM processing of spectrum data
  • Large Spectrum Model (LSM) no independent evidence
    purpose: Decoder-only transformer trained for spectrum activity forecasting
    New term and model class defined in the paper

pith-pipeline@v0.9.0 · 5605 in / 1258 out tokens · 45869 ms · 2026-05-12T04:01:35.972632+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · 5 internal anchors

  1. [1]

    State of IoT 2024: Num. of connected IoT devices growing 13% to 18.8B globally,

S. Sinha, “State of IoT 2024: Num. of connected IoT devices growing 13% to 18.8B globally,” https://bit.ly/4l36mLs, 2024, accessed: 2025-06-21

  2. [2]

    On the road to 6G: Visions, requirements, key technologies, and testbeds,

    C.-X. Wang et al., “On the road to 6G: Visions, requirements, key technologies, and testbeds,” IEEE Communications Surveys and Tutorials, vol. 25, no. 2, pp. 905–974, 2023

  3. [3]

    NeXt generation/dynamic spectrum access/cognitive radio wireless networks: A survey,

I. F. Akyildiz, W.-Y. Lee, M. C. Vuran, and S. Mohanty, “NeXt generation/dynamic spectrum access/cognitive radio wireless networks: A survey,” Computer Networks, vol. 50, no. 13, pp. 2127–2159, 2006

  4. [4]

    Spectrum Sensing Using Software Defined Radio for Cognitive Radio Networks: A Survey,

J. Manco, I. Dayoub, A. Nafkha et al., “Spectrum Sensing Using Software Defined Radio for Cognitive Radio Networks: A Survey,” IEEE Access, 2022

  5. [5]

    A Novel Software Defined Radio for Practical, Mobile Crowdsourced Spectrum Sensing,

P. Smith, A. Luong, S. Sarkar et al., “A Novel Software Defined Radio for Practical, Mobile Crowdsourced Spectrum Sensing,” IEEE Trans. on Mob. Comp., 2023

  6. [6]

    AI-Empowered Multiple Access for 6G: A Survey of Spectrum Sensing, Protocol Designs, and Optimizations,

    X. Cao, B. Yang, K. Wang et al., “AI-Empowered Multiple Access for 6G: A Survey of Spectrum Sensing, Protocol Designs, and Optimizations,” Proc. of the IEEE, 2024

  7. [7]

    A New Spectrum Prediction Method for UAV Communications,

    Y. Zhao, S. Luo, Z. Yuan, and R. Lin, “A New Spectrum Prediction Method for UAV Communications,” in 2019 IEEE ICCC, 2019

  8. [8]

    Predicting dynamic spectrum allocation: a review covering simulation, modelling, and prediction,

A. C. Cullen, B. I. P. Rubinstein, S. Kandeepan, B. Flower, and P. H. W. Leong, “Predicting dynamic spectrum allocation: a review covering simulation, modelling, and prediction,” Artif. Intell. Rev., vol. 56, no. 10, pp. 10921–10959, Oct. 2023

  9. [9]

    A city-wide experimental testbed for the next generation wireless networks,

Z. Zhao, M. C. Vuran, B. Zhou, M. M. Lunar, Z. Aref, D. P. Young, W. Humphrey, S. Goddard, G. Attebury, and B. France, “A city-wide experimental testbed for the next generation wireless networks,” Ad Hoc Networks, vol. 111, p. 102305, Feb. 2021

  10. [10]

    Onelnk: One link to rule them all: Web-based wireless experimentation for multi-vendor remotely accessible indoor/outdoor testbeds,

M. M. R. Lunar, J. Sun, J. Wensowitch, M. Fay, H. B. Tulay, V. S. S. L. Karanam, B. Qiu, D. Nadig, G. Attebury, H. Yu, J. Camp, C. E. Koksal, D. Pompili, B. Ramamurthy, M. Hashemi, E. Ekici, and M. C. Vuran, “Onelnk: One link to rule them all: Web-based wireless experimentation for multi-vendor remotely accessible indoor/outdoor testbeds,” in Proceedings...

  11. [11]

    Gemma: Open Models Based on Gemini Research and Technology

G. Team, T. Mesnard, C. Hardin et al., “Gemma: Open models based on gemini research and technology,” arXiv:2403.08295, 2024

  12. [12]

    Language Models are Unsupervised Multitask Learners,

A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Language Models are Unsupervised Multitask Learners,” 2019. [Online]. Available: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf

  13. [13]

    LLaMA: Open and Efficient Foundation Language Models

H. Touvron, T. Lavril, G. Izacard et al., “LLaMA: Open and Efficient Foundation Language Models,” 2023. [Online]. Available: https://arxiv.org/abs/2302.13971

  14. [14]

    Mistral 7B

A. Q. Jiang, A. Sablayrolles, A. Mensch et al., “Mistral 7B,” 2023. [Online]. Available: https://arxiv.org/abs/2310.06825

  15. [15]

    Textbooks Are All You Need

S. Gunasekar, Y. Zhang, J. Aneja et al., “Textbooks Are All You Need,” 2023. [Online]. Available: https://arxiv.org/abs/2306.11644

  16. [16]

    Evolution toward intelligent communications: Impact of deep learning applications on the future of 6G technology,

M. Abd Elaziz, M. A. Al-qaness, and A. Dahou, “Evolution toward intelligent communications: Impact of deep learning applications on the future of 6G technology,” Wiley Interdisciplinary Reviews, 2024

  17. [17]

    Recent Advances in Deep Learning for Channel Coding: A Survey,

T. Matsumine and H. Ochiai, “Recent Advances in Deep Learning for Channel Coding: A Survey,” IEEE Open Journal of the Comm. Society, 2024

  18. [18]

    Deep learning for joint channel estimation and feedback in massive mimo systems,

    J. Guo, T. Chen, S. Jin et al., “Deep learning for joint channel estimation and feedback in massive mimo systems,” Digital Communications and Networks, vol. 10, no. 1, pp. 83–93, 2024

  19. [19]

    Lightweight deep learning based channel estimation for extremely large-scale massive MIMO systems,

S. Gao, P. Dong, Z. Pan, and X. You, “Lightweight deep learning based channel estimation for extremely large-scale massive MIMO systems,” IEEE Transactions on Vehicular Technology, vol. 73, no. 7, 2024

  20. [20]

    Secure Energy Efficiency for ARIS Networks With Deep Learning: Active Beamforming and Position Optimization,

D. Wang, Z. Wang, H. Zhao et al., “Secure Energy Efficiency for ARIS Networks With Deep Learning: Active Beamforming and Position Optimization,” IEEE Trans. on Wireless Comm., 2025

  21. [21]

    Deep Learning Enabled Multicast Beamforming With Movable Antenna Array,

J.-M. Kang, “Deep Learning Enabled Multicast Beamforming With Movable Antenna Array,” IEEE Wireless Comm. Letters, 2024

  22. [22]

    I Can’t Believe It’s Not Real: CV-MuSeNet: Complex-Valued Multi-Signal Segmentation,

    S. Shin and M. C. Vuran, “I Can’t Believe It’s Not Real: CV-MuSeNet: Complex-Valued Multi-Signal Segmentation,” in IEEE DySPAN, 2025

  23. [23]

    Analysis of spectrum sensing using deep learning algorithms: CNNs and RNNs,

A. Kumar, N. Gaur, S. Chakravarty et al., “Analysis of spectrum sensing using deep learning algorithms: CNNs and RNNs,” Ain Shams Engineering Journal, 2024

  24. [24]

    A review of deep learning techniques for enhancing spectrum sensing and prediction in cognitive radio systems: approaches, datasets, and challenges,

N. El-haryqy, Z. Madini, and Y. Zouine, “A review of deep learning techniques for enhancing spectrum sensing and prediction in cognitive radio systems: approaches, datasets, and challenges,” International Journal of Computers and Applications, vol. 46, no. 12, 2024

  25. [25]

    Seek and Classify: End-to-end Joint Spectrum Segmentation and Classification for Multi-signal Wideband Spectrum Sensing,

    P. Subedi, S. Shin, and M. C. Vuran, “Seek and Classify: End-to-end Joint Spectrum Segmentation and Classification for Multi-signal Wideband Spectrum Sensing,” in IEEE LCN’24, 2024

  26. [26]

    VIA: Establishing the link between spectrum sensor capabilities and data analytics performance,

    K. Doke, B. Okoro, A. Zare, and M. Zheleva, “VIA: Establishing the link between spectrum sensor capabilities and data analytics performance,” in IEEE INFOCOM 2024

  27. [27]

    DeepSense: Fast Wideband Spectrum Sensing Through Real-Time In-the-Loop Deep Learning,

D. Uvaydov, S. D’Oro, F. Restuccia et al., “DeepSense: Fast Wideband Spectrum Sensing Through Real-Time In-the-Loop Deep Learning,” in IEEE INFOCOM 2021

  28. [28]

    Spectrum sensing in cognitive radio: A deep learning based model,

H. Xing et al., “Spectrum sensing in cognitive radio: A deep learning based model,” Transactions on Emerging Telecommunications Technologies, vol. 33, no. 1, p. e4388, 2022

  29. [29]

    Deep Learning Classification of 3.5-GHz Band Spectrograms With Applications to Spectrum Sensing,

    W. M. Lees et al., “Deep Learning Classification of 3.5-GHz Band Spectrograms With Applications to Spectrum Sensing,” IEEE Transactions on Cognitive Communications and Networking, 2019

  30. [30]

    Signal Detection and Classification in Shared Spectrum: A Deep Learning Approach,

    W. Zhang et al., “Signal Detection and Classification in Shared Spectrum: A Deep Learning Approach,” in IEEE INFOCOM 2021

  31. [31]

    Learning the unknown: Improving modulation classification perf. in unseen scenarios,

    E. Perenda et al., “Learning the unknown: Improving modulation classification perf. in unseen scenarios,” in IEEE INFOCOM 2021

  32. [32]

    Recovering Missing Values From Corrupted Historical Observations: Approaching the Limit of Predictability in Spectrum Prediction Tasks,

    X. Li, G. Chen, Y. Xu et al., “Recovering Missing Values From Corrupted Historical Observations: Approaching the Limit of Predictability in Spectrum Prediction Tasks,” IEEE Access, 2020

  33. [33]

    Spectrum Transformer: An Attention-Based Wideband Spectrum Detector,

W. Zhang, Y. Wang, X. Chen et al., “Spectrum Transformer: An Attention-Based Wideband Spectrum Detector,” IEEE Trans. on Wireless Comm., 2024

  34. [34]

    Multi-channel multi-step spectrum prediction using transformer and stacked Bi-LSTM,

    P. Guangliang, L. Jie, and L. Minglei, “Multi-channel multi-step spectrum prediction using transformer and stacked Bi-LSTM,” China Communications, 2025

  35. [35]

    USRP N310: Networked software-defined radio,

    Ettus Research (National Instruments), “USRP N310: Networked software-defined radio,” Product page, Ettus Research, Jun. 2025, available at https://tinyurl.com/usrpn310 (Accessed: 2025-06-11)

  36. [36]

    TensorFlow: Large-scale machine learning on heterogeneous systems,

M. Abadi, A. Agarwal, P. Barham et al., “TensorFlow: Large-scale machine learning on heterogeneous systems,” 2015, software available from tensorflow.org. [Online]. Available: https://www.tensorflow.org/

  37. [37]

    Pytorch: An imperative style, high-performance deep learning library,

A. Paszke, S. Gross, F. Massa et al., “Pytorch: An imperative style, high-performance deep learning library,” Advances in neural information processing systems, vol. 32, 2019

  38. [38]

    Trimmed sample means for robust uniform mean estimation and regression,

    R. I. Oliveira and L. Resende, “Trimmed sample means for robust uniform mean estimation and regression,” arXiv:2302.06710, 2023

  39. [39]

    Trimmed Mean Dendritic Neuron Model Artificial Neural Network for Time Series Forecasting in Case of Outliers,

C. Kocak, E. Bas, and E. Egrioglu, “Trimmed Mean Dendritic Neuron Model Artificial Neural Network for Time Series Forecasting in Case of Outliers,” Available at SSRN 4937461, 2024

  40. [40]

    Llama model configuration — transformers main branch,

    Hugging Face & EleutherAI, “Llama model configuration — transformers main branch,” GitHub Repository, 2025, accessed: 2025-06-14

  41. [41]

    HuggingFace's Transformers: State-of-the-art Natural Language Processing

T. Wolf, L. Debut, V. Sanh et al., “Huggingface’s transformers: State-of-the-art natural language processing,” arXiv:1910.03771, 2019

  42. [42]

    TimeGPT-1: Foundation Model for Time Series Forecasting,

    A. Garza and M. Mergenthaler-Canseco, “TimeGPT-1: Foundation Model for Time Series Forecasting,” arXiv, 2023

  43. [43]

    A coefficient of agreement for nominal scales,

    J. Cohen, “A coefficient of agreement for nominal scales,” Educ Psychol Meas, vol. 20, pp. 37–46, 1960