Visibility nowcasting in South Korea: a machine learning approach to class imbalance and distribution shift

Bong Gyun Shin; Chan Sik Lee; Hyesun Suh

arxiv: 2605.21507 · v1 · pith:3OVMVZ52new · submitted 2026-05-09 · ⚛️ physics.ao-ph · cs.AI · cs.CE · cs.LG

Pith reviewed 2026-05-22 01:58 UTC · model grok-4.3

classification ⚛️ physics.ao-ph · cs.AI · cs.CE · cs.LG

keywords visibility nowcasting · machine learning · class imbalance · distributional shift · Wasserstein distance · SHAP analysis · South Korea · atmospheric visibility

The pith

Visibility nowcasts in South Korean cities lose accuracy on new data because of shifts in meteorological and pollutant distributions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper sets out a machine learning framework for nowcasting atmospheric visibility across six major South Korean cities, where low-visibility events are rare and weather-pollution patterns evolve from year to year. The authors balance the 2018-2020 training records with SMOTENC and CTGAN, then combine machine-learning and deep-learning models into an ensemble before testing on 2021 data. They find that cross-validation scores do not hold up on the later period and trace the drop to a change in the underlying data distribution, shown by the Wasserstein distance on the single most important input variable according to SHAP values. A sympathetic reader cares because visibility predictions directly affect road safety and air-quality alerts, yet models built on past conditions can silently degrade when the environment itself shifts.

Core claim

The central claim is that an ensemble of machine learning and deep learning models, after SMOTENC and CTGAN are used to correct class imbalance in the scarce low-visibility cases, achieves strong results during cross-validation on 2018-2020 data yet shows a clear drop in predictive performance when applied to the 2021 test set. The authors attribute this degradation to a distributional shift between the training and test periods and support the attribution by computing the Wasserstein distance on the feature that SHAP analysis ranks as most influential.
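The oversampling step named in the claim can be illustrated with a minimal, dependency-free sketch in the style of SMOTE-NC. It is a simplified stand-in, not the authors' pipeline and not the imbalanced-learn implementation: the function name `smotenc_like`, the feature columns, and every data value are hypothetical. Continuous features are interpolated between a minority row and one of its nearest minority neighbours, and nominal features take the most frequent category among those neighbours.

```python
import random
from collections import Counter
from statistics import median, pstdev


def smotenc_like(minority, cont_idx, cat_idx, n_new, k=3, seed=0):
    """Generate n_new synthetic minority rows (simplified SMOTE-NC-style)."""
    rng = random.Random(seed)
    # A nominal mismatch costs the median std of the continuous features.
    penalty = median(pstdev([row[j] for row in minority]) for j in cont_idx)

    def dist(a, b):
        d = sum((a[j] - b[j]) ** 2 for j in cont_idx)
        d += sum(penalty ** 2 for j in cat_idx if a[j] != b[j])
        return d ** 0.5

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base row (excluding itself).
        others = [r for r in minority if r is not base]
        neighbours = sorted(others, key=lambda r: dist(base, r))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()
        new = list(base)
        for j in cont_idx:
            # Continuous features: interpolate on the segment base -> mate.
            new[j] = base[j] + gap * (mate[j] - base[j])
        for j in cat_idx:
            # Nominal features: most frequent category among the neighbours.
            new[j] = Counter(r[j] for r in neighbours).most_common(1)[0][0]
        synthetic.append(new)
    return synthetic


# Hypothetical low-visibility rows: [relative humidity %, PM2.5 ug/m3, season]
low_vis = [
    [95.0, 80.0, "winter"],
    [92.0, 75.0, "winter"],
    [97.0, 60.0, "spring"],
    [90.0, 88.0, "winter"],
    [94.0, 70.0, "spring"],
]
new_rows = smotenc_like(low_vis, cont_idx=[0, 1], cat_idx=[2], n_new=10)
```

Because every synthetic row is built only from existing minority rows, oversampling of this kind cannot supply conditions that were absent from 2018-2020, which is consistent with the degradation the paper reports on 2021 data.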

What carries the argument

The Wasserstein distance computed on the single highest-SHAP-importance feature, used to quantify and confirm the distributional shift between the 2018-2020 training window and the 2021 test window.
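For a single feature, the first-order Wasserstein distance between two samples equals the area between their empirical cumulative distribution functions. The sketch below computes it in plain Python under that definition; the function name `wasserstein_1d` and the sample values are illustrative assumptions, not data from the paper. `scipy.stats.wasserstein_distance` computes the same quantity for one-dimensional samples.

```python
from bisect import bisect_right


def wasserstein_1d(u, v):
    """First-order Wasserstein distance between two 1-D empirical samples."""
    u_sorted, v_sorted = sorted(u), sorted(v)
    grid = sorted(u_sorted + v_sorted)
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both empirical CDFs are constant on [left, right).
        cdf_u = bisect_right(u_sorted, left) / len(u_sorted)
        cdf_v = bisect_right(v_sorted, left) / len(v_sorted)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


# Hypothetical values of one feature in the training and test periods.
train_feature = [10.0, 12.0, 14.0, 16.0]
test_feature = [13.0, 15.0, 17.0, 19.0]  # same shape, shifted by 3 units
shift = wasserstein_1d(train_feature, test_feature)
```

The distance is expressed in the units of the feature, so a value is only interpretable relative to that feature's spread; a pure shift by a constant gives a distance equal to that constant.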

If this is right

Nowcasting systems for visibility must detect and adapt to year-to-year changes in the joint distribution of meteorological and air-pollutant variables.
Cross-validation scores on historical data cannot be taken as reliable indicators of future operational performance.
Operational visibility models require ongoing monitoring of input-feature distributions to maintain usefulness over time.
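The monitoring implication above can be sketched as a per-feature drift report that compares a reference window with a recent window. This is one possible design, not something the paper specifies: dividing the distance by the reference standard deviation, the alert threshold of 0.5, the feature names, and all values are assumptions made for illustration.

```python
from bisect import bisect_right
from statistics import pstdev


def w1(u, v):
    """1-D Wasserstein-1 distance via the area between empirical CDFs."""
    u, v = sorted(u), sorted(v)
    grid = sorted(u + v)
    return sum(
        abs(bisect_right(u, a) / len(u) - bisect_right(v, a) / len(v)) * (b - a)
        for a, b in zip(grid, grid[1:])
    )


def drift_report(reference, current, threshold=0.5):
    """Return {feature: (scaled distance, alert flag)} for each feature."""
    report = {}
    for name, ref_values in reference.items():
        scale = pstdev(ref_values) or 1.0  # guard against constant features
        score = w1(ref_values, current[name]) / scale
        report[name] = (score, score > threshold)
    return report


# Hypothetical reference (training-period) and current (recent) windows.
reference = {
    "humidity": [60, 65, 70, 75, 80],
    "wind_speed": [1.0, 2.0, 3.0, 4.0, 5.0],
}
current = {
    "humidity": [75, 80, 85, 90, 95],
    "wind_speed": [1.1, 2.1, 2.9, 4.0, 5.1],
}
report = drift_report(reference, current)
```

Scaling by the reference spread puts features with different units on a comparable footing, so a single threshold can be applied across meteorological and pollutant inputs.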

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Periodic retraining or online adaptation of the ensemble may be needed to keep pace with evolving environmental conditions.
The same imbalance-plus-shift problem is likely to appear in other time-series environmental forecasts such as air-quality or precipitation nowcasting.
Testing the framework on additional future years would reveal whether the performance decline accelerates or stabilizes.

Load-bearing premise

The premise is that the performance drop on the 2021 test set stems mainly from a change in data distribution rather than from model overfitting, alterations in measurement methods, or other unaccounted variables, and that the Wasserstein distance on one SHAP-selected feature is enough to establish this cause.

What would settle it

A finding that the Wasserstein distance on the top SHAP feature is small yet predictive skill on 2021 data remains low, or that retraining on data that includes periods closer to 2021 restores skill without any change to the shift measure, would undermine the claim that distributional shift is the primary driver.

Original abstract

Atmospheric visibility is a critical variable for transportation safety and air quality management, however, accurate prediction remains challenging due to the complex interactions between meteorological conditions and air pollutants, as well as the rarity of low-visibility events. This study introduces a machine learning framework to nowcast visibility in six major South Korean cities. To handle the imbalance in the 2018-2020 training data, we applied the Synthetic Minority Over-sampling Technique with Nominal and Continuous (SMOTENC) and Conditional Tabular Generative Adversarial Network (CTGAN). An ensemble approach combining machine learning and deep learning models was then used and evaluated on a 2021 test dataset. The results revealed a marked decline in predictive performance in the test set compared to the cross-validation phase. This degradation was attributed to a distributional shift between training and testing periods, which was quantitatively confirmed by measuring the Wasserstein distance of the most influential feature identified by SHAP analysis. In general, this study presents a methodology that aims to simultaneously address the dual challenges of data imbalance and temporal distributional shifts, and emphasizes the necessity of accounting for evolving external environmental factors when implementing nowcasting models on time-series data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Applies standard imbalance fixes and ensembles to visibility nowcasting but weakly attributes the test drop to shift using only one feature's Wasserstein distance.

The paper applies SMOTENC and CTGAN to oversample rare low-visibility cases in 2018-2020 South Korean city data, then ensembles several models and uses SHAP to flag key features before checking Wasserstein distance on the top one to explain why performance fell on the 2021 hold-out. That combination for this exact task is new enough to be worth noting, and the work does a clean job of laying out the practical pipeline for an environmental nowcasting problem where low-visibility events are scarce and conditions change over time. The data split is temporal and the metrics come from real observations, which keeps it grounded. The main soft spot is the central claim that distributional shift caused the drop. The only quantitative support is Wasserstein distance on a single post-hoc SHAP feature; there are no full-set divergence numbers, no joint-distribution checks, and no tests that rule out overfitting to the training window or changes in measurement practices. That leaves the attribution thinner than it needs to be for a load-bearing explanation. Readers working on similar urban air-quality or transport nowcasting tasks will find the case study useful as a worked example of handling imbalance and noticing shift, even if they have to supply their own stronger diagnostics. The paper is coherent on its own terms and engages the relevant literature without obvious contradictions, so it deserves a serious referee who can ask for the missing controls and broader shift metrics.

Referee Report

2 major / 1 minor

Summary. The paper introduces a machine learning framework for nowcasting visibility in six major South Korean cities using 2018-2020 training data. It applies SMOTENC and CTGAN to address class imbalance for rare low-visibility events, combines machine learning and deep learning models in an ensemble, and evaluates on a 2021 test set. The marked decline in performance relative to cross-validation is attributed to temporal distributional shift, with quantitative support from the Wasserstein distance computed on the single most influential feature identified by post-hoc SHAP analysis.

Significance. If the attribution to distributional shift is substantiated, the work provides a concrete example of handling both class imbalance and temporal shifts in environmental nowcasting, which is relevant for transportation safety and air quality applications. The choice of an independent metric (Wasserstein distance) on a data-driven feature offers a step toward falsifiable explanations in applied ML for atmospheric time series, though the current evidence is limited in scope.

major comments (2)

Abstract: The central claim that the observed performance decline on the 2021 test set is primarily caused by distributional shift rests on Wasserstein distance computed only for the single most influential feature from SHAP analysis. This does not establish the shift as the dominant cause without reporting divergence metrics across the full feature set or joint distributions, nor controlled comparisons isolating the shift from alternatives such as overfitting to 2018-2020 patterns or unmodeled changes in pollutant measurement protocols.
Abstract: No specific performance metrics (e.g., precision, recall, F1, or AUC with error bars), exact ensemble architectures, hyperparameter details, or the procedure for post-hoc SHAP feature selection are reported, which prevents assessment of whether the decline magnitude is consistent with the claimed shift or with other factors.
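The metrics requested in the second major comment can be reported with error bars along the following lines. This is a hedged sketch, not the authors' evaluation code: the labels are invented, the positive class is assumed to be the rare low-visibility class, and a percentile bootstrap is only one of several ways to obtain an interval.

```python
import random


def prf(y_true, y_pred):
    """Precision, recall and F1 for the positive (low-visibility) class, 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def bootstrap_f1_interval(y_true, y_pred, n_boot=1000, seed=0):
    """Approximate 95% percentile bootstrap interval for F1."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        # Resample cases with replacement and rescore.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(prf([y_true[i] for i in idx], [y_pred[i] for i in idx])[2])
    scores.sort()
    return scores[int(0.025 * n_boot)], scores[int(0.975 * n_boot) - 1]


# Hypothetical labels: 1 = low visibility (rare), 0 = normal visibility.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
precision, recall, f1 = prf(y_true, y_pred)
f1_low, f1_high = bootstrap_f1_interval(y_true, y_pred)
```

Reporting the same interval for cross-validation and for the 2021 test set would show whether the decline exceeds what resampling noise alone could produce. For time-ordered observations, a block bootstrap may be more appropriate than resampling individual cases.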

minor comments (1)

Abstract: The sentence beginning 'This degradation was attributed...' would benefit from a brief parenthetical note on the exact feature used for the Wasserstein calculation to improve immediate readability.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We are grateful to the referee for providing a thorough review of our manuscript. The comments have prompted us to clarify several aspects of our methodology and results. Below, we respond to each major comment in turn.

Referee: Abstract: The central claim that the observed performance decline on the 2021 test set is primarily caused by distributional shift rests on Wasserstein distance computed only for the single most influential feature from SHAP analysis. This does not establish the shift as the dominant cause without reporting divergence metrics across the full feature set or joint distributions, nor controlled comparisons isolating the shift from alternatives such as overfitting to 2018-2020 patterns or unmodeled changes in pollutant measurement protocols.

Authors: We acknowledge the validity of this concern. Our attribution to distributional shift is based on the most influential feature per SHAP, which we chose as a focused, interpretable approach. To strengthen this, we will add Wasserstein distance calculations for additional top features from the SHAP analysis in the revised manuscript. We will also discuss potential confounding factors like overfitting and measurement changes. However, a comprehensive set of controlled comparisons to definitively isolate the shift is not feasible within the current study scope and will be listed as a limitation.
revision: partial

Referee: Abstract: No specific performance metrics (e.g., precision, recall, F1, or AUC with error bars), exact ensemble architectures, hyperparameter details, or the procedure for post-hoc SHAP feature selection are reported, which prevents assessment of whether the decline magnitude is consistent with the claimed shift or with other factors.

Authors: We agree that the abstract should provide more quantitative context. In the revision, we will incorporate specific performance metrics including precision, recall, F1, and AUC with error bars for both cross-validation and the 2021 test set. We will also briefly describe the ensemble architecture (an ensemble of tree-based models and deep learning models), key hyperparameters, and the post-hoc SHAP feature selection procedure. These details are elaborated in the methods and results sections, but summarizing them in the abstract will improve accessibility.
revision: yes

standing simulated objections not resolved

Conducting controlled comparisons to fully isolate distributional shift from alternatives like overfitting or changes in measurement protocols.

Circularity Check

0 steps flagged

No significant circularity: empirical attribution relies on independent data metric

full rationale

The paper performs a standard temporal train-test split on real meteorological and pollutant data (2018-2020 training, 2021 testing), directly measures predictive performance drop on the held-out set, and attributes it to distributional shift via Wasserstein distance computed on the single highest-SHAP-importance feature. This metric is an external statistical comparison of observed data distributions and does not reduce to any fitted model parameter, self-definition, or self-citation chain by construction. The ensemble, SMOTENC, and CTGAN steps address imbalance separately from the shift diagnosis. No load-bearing step equates a claimed result to its own inputs; the derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard ML assumptions plus domain choices for handling imbalance and attributing shift; no new physical entities are postulated.

free parameters (2)

Ensemble model selection and hyperparameters
Specific models combined in the ensemble and their tuning parameters are chosen to optimize performance on the training data.
Definition of low-visibility class threshold
The boundary separating rare low-visibility events from the majority class is implicitly set to create the imbalance problem addressed by oversampling.
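How strongly the class boundary drives the imbalance can be seen by counting cases under several candidate thresholds. The sketch below is purely illustrative: the observations and the thresholds of 1, 5, and 10 km are hypothetical, since the threshold the paper actually uses is not stated here.

```python
def imbalance_ratio(visibility_km, threshold_km):
    """Count low/normal cases and the majority-to-minority ratio."""
    low = sum(1 for v in visibility_km if v < threshold_km)
    normal = len(visibility_km) - low
    ratio = normal / low if low else float("inf")
    return low, normal, ratio


# Hypothetical visibility observations in kilometres.
obs = [0.4, 0.9, 2.5, 4.8, 7.0, 9.5, 10.0, 12.0, 15.0, 18.0, 20.0, 20.0]

# Hypothetical candidate thresholds for the low-visibility class.
results = {thr: imbalance_ratio(obs, thr) for thr in (1.0, 5.0, 10.0)}
for thr, (low, normal, ratio) in results.items():
    print(f"threshold {thr:>4} km: {low} low, {normal} normal, ratio {ratio:.1f}")
```

A stricter threshold makes the minority class rarer and raises the amount of synthetic data the oversampling step must generate, which is why the ledger counts it as a free parameter.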

axioms (1)

domain assumption: The 2018-2020 training data distribution is sufficiently stationary within the period to allow effective model training despite known temporal variability in environmental data.
The paper trains on this fixed window and evaluates generalization to 2021 without adaptive mechanisms.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "This degradation was attributed to a distributional shift ... quantitatively confirmed by measuring the Wasserstein distance of the most influential feature identified by SHAP analysis."

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "we applied the Synthetic Minority Over-sampling Technique with Nominal and Continuous (SMOTENC) and Conditional Tabular Generative Adversarial Network (CTGAN)"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


work page doi:10.1016/j 2026
[50]

doi: 10.24963/ijcai.2022/

Rozemberczki, B., Watson, L., Bayer, P., Yang, H.-T., Kiss, O., Nilsson, S., Sarkar, R.: The shapley value in machine learning. In: Raedt, L.D. (ed.) Proceed- ings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22, pp. 5572–5579. International Joint Conferences on Artificial Intelli- gence Organization, Vienna, Austri...

work page doi:10.24963/ijcai.2022/ 2022
[51]

https://arxiv.org/abs/2505.03992

Briscoe, J., Kepler, G., Deford, D., Gebremedhin, A.: Algorithmic Accountability in Small Data: Sample-Size-Induced Bias Within Classification Metrics (2025). https://arxiv.org/abs/2505.03992

work page arXiv 2025
[52]

In: Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., Scarlett, J

Francazi, E., Baity-Jesi, M., Lucchi, A.: A theoretical analysis of the learn- ing dynamics under class imbalance. In: Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., Scarlett, J. (eds.) Proceedings of the 40th Inter- national Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 202, pp. 10285–10322. PMLR, Honolul...

work page 2023
[53]

Journal of Advances in Modeling Earth Sys- tems15(12), 2023–003792 (2023) https://doi.org/10.1029/2023MS003792 https://agupubs.onlinelibrary.wiley.com/doi/pdf/10.1029/2023MS003792

Smith, T.A., Penny, S.G., Platt, J.A., Chen, T.-C.: Temporal subsam- pling diminishes small spatial scales in recurrent neural network emulators of geophysical turbulence. Journal of Advances in Modeling Earth Sys- tems15(12), 2023–003792 (2023) https://doi.org/10.1029/2023MS003792 https://agupubs.onlinelibrary.wiley.com/doi/pdf/10.1029/2023MS003792. e202...

work page doi:10.1029/2023ms003792 2023
