Step-adaptive multimodal fusion network with multi-scale cloud feature learning for ultra-short-term solar irradiance forecasting
Pith reviewed 2026-06-28 01:35 UTC · model grok-4.3
The pith
A step-adaptive multimodal fusion network with multi-scale cloud feature extraction improves ultra-short-term solar irradiance forecasts over prior methods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that its step-adaptive multimodal fusion network, built around InceptionNeXt for multi-scale multi-directional cloud image features, a step-adaptive low-frequency compensation unit, and TempAttnLSTM for global temporal modeling, delivers higher accuracy in ultra-short-term solar irradiance forecasting than existing approaches when evaluated on the NREL dataset and real stations in Shandong.
What carries the argument
InceptionNeXt extracts multi-scale spatial features from cloud images; the step-adaptive low-frequency compensation unit dynamically adjusts global information according to the prediction step; TempAttnLSTM models temporal dependencies after fusing image and meteorological time-series features.
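The summary above does not give the equations of the step-adaptive low-frequency compensation unit, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes the low-frequency component can be approximated by a moving average and that the unit scales this component with a sigmoid gate indexed by the prediction step. The names `low_frequency`, `step_adaptive_compensation`, and `step_logits` are hypothetical.

```python
import math


def low_frequency(features, window=3):
    """Moving average used here as a stand-in for the global low-frequency component."""
    n = len(features)
    half = window // 2
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        smoothed.append(sum(features[lo:hi]) / (hi - lo))
    return smoothed


def step_adaptive_compensation(features, step, step_logits):
    """Add the low-frequency term, scaled by a gate that depends on the forecast step."""
    gate = 1.0 / (1.0 + math.exp(-step_logits[step]))  # sigmoid of a per-step value
    low = low_frequency(features)
    return [x + gate * l for x, l in zip(features, low)]


# Toy feature vector and hypothetical per-step gate values (assumptions, not from the paper).
features = [0.0, 1.0, 0.0, 1.0]
step_logits = [-2.0, 0.0, 2.0]

near = step_adaptive_compensation(features, 0, step_logits)  # small gate
far = step_adaptive_compensation(features, 2, step_logits)   # large gate
```

In this sketch a fixed compensation strategy would correspond to using the same gate for every step; making the gate a function of `step` is what the term "step-adaptive" is taken to mean here.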
If this is right
- Spatial cloud dynamics captured from images reduce forecast error under complex weather compared with time-series-only models.
- Dynamic adjustment of low-frequency compensation to each prediction step enables more reliable multi-step outputs.
- Fusion of image-derived features with meteorological series produces more robust predictions than either modality alone.
- The overall architecture supports direct use in photovoltaic system dispatch and grid stability applications.
Where Pith is reading between the lines
- The step-adaptive unit could be tested on other sequence prediction tasks where optimal compensation changes with horizon length.
- Replacing InceptionNeXt with alternative multi-scale extractors would show whether the gains depend on that specific backbone.
- Application to wind or load forecasting under visual sky conditions would test transferability beyond solar irradiance.
Load-bearing premise
The three listed shortcomings of prior work are the main limits to accuracy and the new components fix them without adding offsetting errors or dataset-specific artifacts.
What would settle it
A head-to-head test on an independent dataset with different cloud patterns or forecast horizons in which the proposed model shows no accuracy gain over the strongest baselines.
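Such a head-to-head test could be scored per forecast step along the following lines. This is a sketch under stated assumptions: RMSE and MAE are the metrics the referee report mentions, the skill score `1 - RMSE_model / RMSE_baseline` is a common convention in solar forecasting rather than something the paper is known to report, and the numbers are toy placeholders, not results.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error over paired values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error over paired values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def per_step_scores(y_true, y_pred):
    """Score each forecast step separately; rows are samples, columns are steps."""
    n_steps = len(y_true[0])
    scores = []
    for k in range(n_steps):
        t = [row[k] for row in y_true]
        p = [row[k] for row in y_pred]
        scores.append({"step": k + 1, "rmse": rmse(t, p), "mae": mae(t, p)})
    return scores


def skill(rmse_model, rmse_baseline):
    """Positive values mean the model beats the baseline at that step."""
    return 1.0 - rmse_model / rmse_baseline


# Toy placeholder data (two samples, two forecast steps); not measurements.
y_true = [[100.0, 200.0], [300.0, 400.0]]
y_model = [[110.0, 190.0], [290.0, 420.0]]
y_baseline = [[120.0, 180.0], [280.0, 440.0]]

model_scores = per_step_scores(y_true, y_model)
baseline_scores = per_step_scores(y_true, y_baseline)
skills = [skill(m["rmse"], b["rmse"]) for m, b in zip(model_scores, baseline_scores)]
```

A skill that stays at or below zero across steps on the independent dataset would be the "no accuracy gain" outcome described above.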
Original abstract
Ultra-short-term solar irradiance prediction is critical for photovoltaic system dispatch and power grid stability. Existing approaches suffer from three key shortcomings: single time-series models cannot capture the spatial dynamics of clouds under complex conditions, standard convolutions inadequately represent multi-scale cloud features, and fixed low-frequency compensation strategies fail to adapt to different prediction steps. To address these issues, this paper proposes a multi-source data fusion model for ultra-short-term irradiance prediction. The model first employs InceptionNeXt to extract multi-scale, multi-directional spatial features from ground-based cloud images. A step-adaptive low-frequency compensation unit is then introduced to dynamically modulate global low-frequency information based on the prediction step. Finally, the enhanced image features are combined with meteorological time-series features, and a TempAttnLSTM network captures global temporal dependencies for multi-step prediction. Experiments on the public NREL dataset and practical photovoltaic stations in Shandong illustrate the effectiveness of the proposed method compared with several state-of-the-art approaches.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a step-adaptive multimodal fusion network for ultra-short-term solar irradiance forecasting. It uses InceptionNeXt to extract multi-scale and multi-directional spatial features from ground-based cloud images, introduces a step-adaptive low-frequency compensation unit to dynamically modulate global low-frequency information according to the prediction step, fuses the enhanced image features with meteorological time-series data, and employs a TempAttnLSTM network to capture global temporal dependencies for multi-step prediction. Experiments on the public NREL dataset and practical photovoltaic stations in Shandong are stated to demonstrate effectiveness relative to several state-of-the-art approaches.
Significance. If the claimed performance gains hold under rigorous validation, the work could advance ultra-short-term solar forecasting by jointly addressing spatial cloud dynamics via multi-scale convolutions and step-dependent low-frequency adaptation, which are relevant for photovoltaic dispatch and grid stability. The multimodal design and use of a public benchmark dataset are positive elements for potential reproducibility.
major comments (2)
- [Abstract] Abstract: the assertion that experiments 'illustrate the effectiveness' of the proposed method is unsupported by any quantitative results, error bars, ablation studies, dataset statistics, or specific metric values (e.g., RMSE/MAE improvements). This evidentiary gap is load-bearing for the central claim of superiority over SOTA methods.
- [Experiments] Experiments section (as summarized): without component-level ablations isolating the contribution of the step-adaptive low-frequency compensation unit versus InceptionNeXt or TempAttnLSTM, it is not possible to confirm that the new components address the three stated shortcomings without introducing offsetting errors or dataset-specific artifacts.
minor comments (1)
- [Abstract] Abstract: the three shortcomings of prior work are clearly enumerated, but the manuscript would benefit from briefly stating the magnitude of reported gains (even in the abstract) to allow readers to gauge practical significance.
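The component-level ablations requested in the major comments can be enumerated mechanically. The sketch below is illustrative only: the component labels follow the names used in the report, while `ablation_grid` and `single_removals` are hypothetical helpers, and nothing here reflects how the authors actually configure their experiments.

```python
from itertools import product

# Component labels taken from the report; the identifiers themselves are assumptions.
COMPONENTS = ("inceptionnext", "step_adaptive_lf_unit", "temp_attn_lstm")


def ablation_grid(components=COMPONENTS):
    """Return every on/off combination of the named components."""
    return [
        dict(zip(components, flags))
        for flags in product((True, False), repeat=len(components))
    ]


def single_removals(components=COMPONENTS):
    """Return the full model plus one variant per component with only that part disabled."""
    full = {c: True for c in components}
    variants = [("full", full)]
    for c in components:
        cfg = dict(full)
        cfg[c] = False
        variants.append((f"without_{c}", cfg))
    return variants


grid = ablation_grid()
removals = single_removals()
```

The single-removal list is the cheaper design and isolates each component's contribution; the full grid additionally exposes interactions, such as gains from the compensation unit that only appear when InceptionNeXt is present.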
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the abstract and experiments. We will revise the manuscript to strengthen the evidentiary basis for our claims while preserving the core contributions.
Point-by-point responses
- Referee: [Abstract] Abstract: the assertion that experiments 'illustrate the effectiveness' of the proposed method is unsupported by any quantitative results, error bars, ablation studies, dataset statistics, or specific metric values (e.g., RMSE/MAE improvements). This evidentiary gap is load-bearing for the central claim of superiority over SOTA methods.
  Authors: We agree that the abstract lacks specific quantitative support. In the revised version, the abstract will be updated to report key performance metrics (e.g., RMSE/MAE reductions on NREL and Shandong datasets) and direct comparisons to the referenced SOTA baselines, providing concrete evidence for the effectiveness claims.
  Revision: yes
- Referee: [Experiments] Experiments section (as summarized): without component-level ablations isolating the contribution of the step-adaptive low-frequency compensation unit versus InceptionNeXt or TempAttnLSTM, it is not possible to confirm that the new components address the three stated shortcomings without introducing offsetting errors or dataset-specific artifacts.
  Authors: We acknowledge the value of explicit component ablations. The revised manuscript will add dedicated ablation experiments that isolate the step-adaptive low-frequency compensation unit, InceptionNeXt multi-scale features, and TempAttnLSTM, quantifying their individual contributions and ruling out offsetting effects or dataset artifacts.
  Revision: yes
Circularity Check
No significant circularity
The paper presents an empirical architecture (InceptionNeXt for multi-scale cloud features, step-adaptive compensation unit, TempAttnLSTM for temporal fusion) whose central claim is measurable improvement on external public (NREL) and real-world (Shandong) datasets relative to prior SOTA methods. No equations, predictions, or uniqueness claims reduce by construction to fitted inputs, self-definitions, or load-bearing self-citations; the derivation chain consists of standard component motivations followed by independent experimental comparison. This is the most common honest outcome for applied ML papers whose value rests on external benchmarks rather than internal algebraic closure.
Axiom & Free-Parameter Ledger
free parameters (1)
- neural network hyperparameters and learned weights
axioms (2)
- domain assumption: InceptionNeXt extracts multi-scale multi-directional spatial features from cloud images
- domain assumption: TempAttnLSTM captures global temporal dependencies from fused features
invented entities (1)
- step-adaptive low-frequency compensation unit: no independent evidence