Visibility nowcasting in South Korea: a machine learning approach to class imbalance and distribution shift
Pith reviewed 2026-05-22 01:58 UTC · model grok-4.3
The pith
Visibility nowcasts in South Korean cities lose accuracy on new data because of shifts in meteorological and pollutant distributions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that an ensemble of machine learning and deep learning models, after SMOTENC and CTGAN are used to correct class imbalance in the scarce low-visibility cases, achieves strong results during cross-validation on 2018-2020 data yet shows a clear drop in predictive performance when applied to the 2021 test set. The authors attribute this degradation to a distributional shift between the training and test periods and support the attribution by computing the Wasserstein distance on the feature that SHAP analysis ranks as most influential.
What carries the argument
The Wasserstein distance computed on the single highest-SHAP-importance feature, used to quantify and confirm the distributional shift between the 2018-2020 training window and the 2021 test window.
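As a rough illustration of the kind of check described here, the sketch below computes a one-dimensional Wasserstein distance between a training-window sample and a test-window sample of a single feature. The data are synthetic stand-ins, the helper name `feature_shift` is hypothetical, and dividing by the training standard deviation is an assumption made here for readability, not something the paper is known to do.

```python
import numpy as np
from scipy.stats import wasserstein_distance


def feature_shift(train_values, test_values):
    """1-D Wasserstein distance, expressed in training standard deviations."""
    train_values = np.asarray(train_values, dtype=float)
    test_values = np.asarray(test_values, dtype=float)
    scale = train_values.std()
    if scale == 0:
        raise ValueError("training feature is constant; cannot standardize")
    return wasserstein_distance(train_values / scale, test_values / scale)


# Synthetic stand-ins for one feature (not the paper's data).
rng = np.random.default_rng(0)
train_feature = rng.normal(loc=50.0, scale=10.0, size=5000)  # "2018-2020"
test_same = rng.normal(loc=50.0, scale=10.0, size=5000)      # no shift
test_shifted = rng.normal(loc=58.0, scale=10.0, size=5000)   # mean moved up

d_same = feature_shift(train_feature, test_same)
d_shift = feature_shift(train_feature, test_shifted)
print(f"no shift: {d_same:.3f}   shifted: {d_shift:.3f}")
```

A distance near zero says the two marginal distributions of that feature look alike; it says nothing about the other features or about how they vary together.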
If this is right
- Nowcasting systems for visibility must detect and adapt to year-to-year changes in the joint distribution of meteorological and air-pollutant variables.
- Cross-validation scores on historical data cannot be taken as reliable indicators of future operational performance.
- Operational visibility models require ongoing monitoring of input-feature distributions to maintain usefulness over time.
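The monitoring idea in the last point could look something like the sketch below, which compares each incoming batch of inputs against a fixed reference sample, feature by feature. The feature names `humidity` and `pm25`, the synthetic data, and the 0.2 alert threshold are all assumptions chosen for illustration; an operational threshold would need to be calibrated on real data.

```python
import numpy as np
from scipy.stats import wasserstein_distance


def drift_report(reference, batch, threshold=0.2):
    """Return {feature: (standardized distance, flagged)} for one batch."""
    report = {}
    for name, ref_values in reference.items():
        ref_values = np.asarray(ref_values, dtype=float)
        new_values = np.asarray(batch[name], dtype=float)
        scale = ref_values.std() or 1.0  # fall back to 1 for constant features
        distance = wasserstein_distance(ref_values / scale, new_values / scale)
        report[name] = (float(distance), bool(distance > threshold))
    return report


# Synthetic reference window and one incoming batch (hypothetical features).
rng = np.random.default_rng(1)
reference = {
    "humidity": rng.normal(60.0, 15.0, 4000),
    "pm25": rng.gamma(2.0, 12.0, 4000),
}
batch = {
    "humidity": rng.normal(60.0, 15.0, 1000),  # unchanged
    "pm25": rng.gamma(2.0, 18.0, 1000),        # heavier pollution
}

report = drift_report(reference, batch)
for name, (distance, flagged) in report.items():
    print(f"{name}: distance={distance:.3f} flagged={flagged}")
```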
Where Pith is reading between the lines
- Periodic retraining or online adaptation of the ensemble may be needed to keep pace with evolving environmental conditions.
- The same imbalance-plus-shift problem is likely to appear in other time-series environmental forecasts such as air-quality or precipitation nowcasting.
- Testing the framework on additional future years would reveal whether the performance decline accelerates or stabilizes.
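Testing on additional future years is commonly organized as a rolling-origin evaluation, in which the training window grows by one year at a time and the following year is held out. The sketch below only generates the splits; the years after 2021 are hypothetical, since the paper as summarized here uses 2018-2020 for training and 2021 for testing.

```python
def rolling_origin_splits(years, min_train_years=3):
    """Yield (train_years, test_year) pairs with an expanding training window."""
    ordered = sorted(set(years))
    for i in range(min_train_years, len(ordered)):
        yield ordered[:i], ordered[i]


# 2022 and 2023 are hypothetical extra years, used only to show the pattern.
splits = list(rolling_origin_splits([2018, 2019, 2020, 2021, 2022, 2023]))
for train_years, test_year in splits:
    print(f"train on {train_years[0]}-{train_years[-1]}, test on {test_year}")
```

Tracking the skill score across these splits would show whether the decline accelerates, stabilizes, or disappears once more recent years enter the training window.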
Load-bearing premise
The premise is twofold: that the performance drop on the 2021 test set stems mainly from a change in data distribution rather than from model overfitting, alterations in measurement methods, or other unaccounted variables, and that the Wasserstein distance on one SHAP-selected feature is enough to establish this cause.
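One way to probe whether a single feature is enough is a classifier two-sample test: train a classifier to tell training-period rows from test-period rows and read its cross-validated AUC as a measure of joint shift. This is a swapped-in technique, not the paper's method. The sketch below uses synthetic two-feature data in which both marginals stay standard normal and only the correlation changes, a case that per-feature distances would miss; the helper name `domain_auc` and the random-forest settings are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def domain_auc(train_X, test_X, seed=0):
    """Cross-validated ROC AUC for separating training rows from test rows."""
    X = np.vstack([train_X, test_X])
    y = np.concatenate([
        np.zeros(len(train_X), dtype=int),
        np.ones(len(test_X), dtype=int),
    ])
    clf = RandomForestClassifier(
        n_estimators=100, min_samples_leaf=5, random_state=seed
    )
    return cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()


# Synthetic data: identical marginals, different dependence structure.
rng = np.random.default_rng(0)
identity = [[1.0, 0.0], [0.0, 1.0]]
coupled = [[1.0, 0.8], [0.8, 1.0]]
period_a = rng.multivariate_normal([0.0, 0.0], identity, size=1000)
period_b_same = rng.multivariate_normal([0.0, 0.0], identity, size=1000)
period_b_joint = rng.multivariate_normal([0.0, 0.0], coupled, size=1000)

auc_none = domain_auc(period_a, period_b_same)
auc_joint = domain_auc(period_a, period_b_joint)
print(f"no shift AUC: {auc_none:.2f}   joint shift AUC: {auc_joint:.2f}")
```

An AUC close to 0.5 means the classifier cannot tell the periods apart; values well above 0.5 indicate that the joint distribution has moved, even when each feature looks unchanged on its own.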
What would settle it
A finding that the Wasserstein distance on the top SHAP feature is small yet predictive skill on 2021 data remains low, or that retraining on data that includes periods closer to 2021 restores skill without any change to the shift measure, would undermine the claim that distributional shift is the primary driver.
Original abstract
Atmospheric visibility is a critical variable for transportation safety and air quality management; however, accurate prediction remains challenging due to the complex interactions between meteorological conditions and air pollutants, as well as the rarity of low-visibility events. This study introduces a machine learning framework to nowcast visibility in six major South Korean cities. To handle the imbalance in the 2018-2020 training data, we applied the Synthetic Minority Over-sampling Technique with Nominal and Continuous (SMOTENC) and Conditional Tabular Generative Adversarial Network (CTGAN). An ensemble approach combining machine learning and deep learning models was then used and evaluated on a 2021 test dataset. The results revealed a marked decline in predictive performance in the test set compared to the cross-validation phase. This degradation was attributed to a distributional shift between training and testing periods, which was quantitatively confirmed by measuring the Wasserstein distance of the most influential feature identified by SHAP analysis. In general, this study presents a methodology that aims to simultaneously address the dual challenges of data imbalance and temporal distributional shifts, and emphasizes the necessity of accounting for evolving external environmental factors when implementing nowcasting models on time-series data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a machine learning framework for nowcasting visibility in six major South Korean cities using 2018-2020 training data. It applies SMOTENC and CTGAN to address class imbalance for rare low-visibility events, combines machine learning and deep learning models in an ensemble, and evaluates on a 2021 test set. The marked decline in performance relative to cross-validation is attributed to temporal distributional shift, with quantitative support from the Wasserstein distance computed on the single most influential feature identified by post-hoc SHAP analysis.
Significance. If the attribution to distributional shift is substantiated, the work provides a concrete example of handling both class imbalance and temporal shifts in environmental nowcasting, which is relevant for transportation safety and air quality applications. The choice of an independent metric (Wasserstein distance) on a data-driven feature offers a step toward falsifiable explanations in applied ML for atmospheric time series, though the current evidence is limited in scope.
major comments (2)
- Abstract: The central claim that the observed performance decline on the 2021 test set is primarily caused by distributional shift rests on Wasserstein distance computed only for the single most influential feature from SHAP analysis. This does not establish the shift as the dominant cause without reporting divergence metrics across the full feature set or joint distributions, nor controlled comparisons isolating the shift from alternatives such as overfitting to 2018-2020 patterns or unmodeled changes in pollutant measurement protocols.
- Abstract: No specific performance metrics (e.g., precision, recall, F1, or AUC with error bars), exact ensemble architectures, hyperparameter details, or the procedure for post-hoc SHAP feature selection are reported, which prevents assessment of whether the decline magnitude is consistent with the claimed shift or with other factors.
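The request for metrics with error bars can be met in several ways; one common option is a percentile bootstrap over test cases. The sketch below applies it to the F1 score on synthetic labels with roughly 10% positives. The helper names, the 5% label-flip rate, and the choice of 1000 resamples are assumptions for illustration and do not reflect the paper's results or procedure.

```python
import numpy as np


def f1_binary(y_true, y_pred):
    """F1 score for the positive class; returns 0.0 if it is undefined."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0


def bootstrap_f1(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap interval for F1."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = len(y_true)
    scores = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        scores[b] = f1_binary(y_true[idx], y_pred[idx])
    lower, upper = np.quantile(scores, [alpha / 2, 1 - alpha / 2])
    return f1_binary(y_true, y_pred), float(lower), float(upper)


# Synthetic imbalanced labels: about 10% positives, 5% of predictions flipped.
rng = np.random.default_rng(42)
y_true = rng.random(500) < 0.10
y_pred = y_true ^ (rng.random(500) < 0.05)

point, lower, upper = bootstrap_f1(y_true, y_pred)
print(f"F1 = {point:.3f} (95% interval {lower:.3f} to {upper:.3f})")
```

Because visibility observations are serially correlated, resampling individual cases probably understates the uncertainty; resampling whole days or events would be a more conservative variant.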
minor comments (1)
- Abstract: The sentence beginning 'This degradation was attributed...' would benefit from a brief parenthetical note on the exact feature used for the Wasserstein calculation to improve immediate readability.
Simulated Author's Rebuttal
We are grateful to the referee for providing a thorough review of our manuscript. The comments have prompted us to clarify several aspects of our methodology and results. Below, we respond to each major comment in turn.
Point-by-point responses
- Referee: Abstract: The central claim that the observed performance decline on the 2021 test set is primarily caused by distributional shift rests on Wasserstein distance computed only for the single most influential feature from SHAP analysis. This does not establish the shift as the dominant cause without reporting divergence metrics across the full feature set or joint distributions, nor controlled comparisons isolating the shift from alternatives such as overfitting to 2018-2020 patterns or unmodeled changes in pollutant measurement protocols.
  Authors: We acknowledge the validity of this concern. Our attribution to distributional shift is based on the most influential feature per SHAP, which we chose as a focused, interpretable approach. To strengthen this, we will add Wasserstein distance calculations for additional top features from the SHAP analysis in the revised manuscript. We will also discuss potential confounding factors like overfitting and measurement changes. However, a comprehensive set of controlled comparisons to definitively isolate the shift is not feasible within the current study scope and will be listed as a limitation.
  Revision: partial
- Referee: Abstract: No specific performance metrics (e.g., precision, recall, F1, or AUC with error bars), exact ensemble architectures, hyperparameter details, or the procedure for post-hoc SHAP feature selection are reported, which prevents assessment of whether the decline magnitude is consistent with the claimed shift or with other factors.
  Authors: We agree that the abstract should provide more quantitative context. In the revision, we will incorporate specific performance metrics including precision, recall, F1, and AUC with error bars for both cross-validation and the 2021 test set. We will also briefly describe the ensemble architecture (an ensemble of tree-based models and deep learning models), key hyperparameters, and the post-hoc SHAP feature selection procedure. These details are elaborated in the methods and results sections, but summarizing them in the abstract will improve accessibility.
  Revision: yes
- Conducting controlled comparisons to fully isolate distributional shift from alternatives like overfitting or changes in measurement protocols.
Circularity Check
No significant circularity: empirical attribution relies on independent data metric
full rationale
The paper performs a standard temporal train-test split on real meteorological and pollutant data (2018-2020 training, 2021 testing), directly measures predictive performance drop on the held-out set, and attributes it to distributional shift via Wasserstein distance computed on the single highest-SHAP-importance feature. This metric is an external statistical comparison of observed data distributions and does not reduce to any fitted model parameter, self-definition, or self-citation chain by construction. The ensemble, SMOTENC, and CTGAN steps address imbalance separately from the shift diagnosis. No load-bearing step equates a claimed result to its own inputs; the derivation remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (2)
- Ensemble model selection and hyperparameters
- Definition of low-visibility class threshold
axioms (1)
- domain assumption: The 2018-2020 training data distribution is sufficiently stationary within the period to allow effective model training despite known temporal variability in environmental data.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "This degradation was attributed to a distributional shift ... quantitatively confirmed by measuring the Wasserstein distance of the most influential feature identified by SHAP analysis."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "we applied the Synthetic Minority Over-sampling Technique with Nominal and Continuous (SMOTENC) and Conditional Tabular Generative Adversarial Network (CTGAN)"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.