arxiv: 2605.15083 · v1 · pith:OOFR3P3Onew · submitted 2026-05-14 · 💻 cs.LG · cs.AI

Novel Dynamic Batch-Sensitive Adam Optimiser for Vehicular Accident Injury Severity Prediction

Pith reviewed 2026-06-30 21:11 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords DBS-Adam · dynamic optimizer · batch difficulty score · imbalanced sequential data · accident injury severity · Bi-LSTM · gradient norm · learning rate scaling

The pith

DBS-Adam scales learning rates by batch difficulty scores from gradient norms to handle imbalanced sequential accident data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Dynamic Batch-Sensitive Adam (DBS-Adam), an optimizer that computes a batch difficulty score from exponential moving averages of gradient norms and batch loss, then scales the learning rate higher for difficult batches and lower for easier ones. This is tested by pairing the optimizer with Bi-Directional LSTM networks on vehicular accident injury severity prediction, after applying SMOTE-ENN resampling and Focal Loss to address class imbalance. The central goal is to improve convergence stability on minority classes in sequential data without extra architectural changes. Experiments across multiple seeds show DBS-Adam reaching 95.22 percent accuracy and 96.11 percent precision while delivering statistically significant gains over AMSGrad, AdamW, and AdaBound.

Core claim

DBS-Adam improves training on imbalanced sequential datasets by dynamically scaling the learning rate with a batch difficulty score derived from exponential moving averages of gradient norms and batch loss. It yields 95.22 percent test accuracy, 96.11 percent precision, 95.28 percent recall, 95.39 percent F1-score, and a test loss of 0.0086, and it outperforms AMSGrad, AdamW, and AdaBound, with a statistically significant gain on precision (p = 0.020).

What carries the argument

The batch difficulty score, formed from exponential moving averages of per-batch gradient norms and loss values, which directly multiplies the learning rate to increase updates on hard batches and decrease them on easy ones.
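To make the mechanism concrete, here is a minimal sketch of one way such a score could work. The paper's abstract gives no equations, so everything specific below is an assumption made for illustration: the class name `DifficultyScaler`, the ratio-to-EMA form of the score, the equal weighting of the gradient-norm and loss terms, and the clamp range are all hypothetical and are not the authors' formulation.

```python
class DifficultyScaler:
    """Hypothetical batch-difficulty learning-rate scaler (illustrative only)."""

    def __init__(self, beta_grad=0.9, beta_loss=0.9,
                 min_scale=0.5, max_scale=2.0, eps=1e-8):
        self.beta_grad = beta_grad    # assumed EMA decay for gradient norms
        self.beta_loss = beta_loss    # assumed EMA decay for batch loss
        self.min_scale = min_scale    # assumed lower clamp on the multiplier
        self.max_scale = max_scale    # assumed upper clamp on the multiplier
        self.eps = eps
        self.ema_grad = None
        self.ema_loss = None

    def scale(self, grad_norm, loss):
        """Return a learning-rate multiplier for the current batch."""
        # First batch: no history yet, so seed the EMAs and stay neutral.
        if self.ema_grad is None:
            self.ema_grad, self.ema_loss = grad_norm, loss
            return 1.0
        # Assumed score: how far this batch sits above or below its running averages.
        ratio_grad = grad_norm / (self.ema_grad + self.eps)
        ratio_loss = loss / (self.ema_loss + self.eps)
        difficulty = 0.5 * (ratio_grad + ratio_loss)
        # Update the running averages after scoring the batch.
        self.ema_grad = self.beta_grad * self.ema_grad + (1 - self.beta_grad) * grad_norm
        self.ema_loss = self.beta_loss * self.ema_loss + (1 - self.beta_loss) * loss
        # Clamp so that a single outlier batch cannot blow up the step size.
        return min(self.max_scale, max(self.min_scale, difficulty))
```

In a training loop the returned multiplier would be applied to the base learning rate before the usual Adam update. A batch whose gradient norm and loss exceed their running averages gets a multiplier above 1, and an easier batch gets one below 1.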

If this is right

  • DBS-Adam reaches 95.22 percent test accuracy and statistically significant precision gains over three standard Adam variants.
  • The optimizer integrates directly with Bi-LSTM, SMOTE-ENN, and Focal Loss to manage class imbalance in sequential accident records.
  • The resulting model supports real-time severity classification for emergency response planning.
  • Performance holds across five random seeds on the tested vehicular injury dataset.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The difficulty-score mechanism could be ported to other first-order optimizers to reduce reliance on external resampling for imbalance.
  • If the score remains stable across datasets, it may simplify pipelines for other time-series classification tasks with rare events.
  • Direct comparison on larger or noisier sequential datasets would test whether the gains generalize beyond the current accident records.

Load-bearing premise

The batch difficulty score supplies a stable, non-overfitting signal that genuinely aids convergence on minority classes rather than capturing noise from the particular accident dataset and resampling steps.

What would settle it

Re-run the Bi-LSTM experiments with DBS-Adam but replace the computed difficulty score with a fixed constant or random values and check whether the reported accuracy and precision advantages disappear.
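The three arms of that ablation can be sketched as a single switch on the learning-rate multiplier. This is a hedged illustration, not the authors' code: the function name `lr_multiplier`, the mode labels, and the [0.5, 2.0] range are hypothetical, and the `difficulty` argument stands in for whatever score DBS-Adam actually computes.

```python
import random


def lr_multiplier(mode, difficulty, rng, low=0.5, high=2.0):
    """Pick the learning-rate multiplier for one batch under an ablation arm.

    mode="score":    use the (clamped) difficulty score, as the method proposes
    mode="constant": ignore the score, which leaves the base optimizer unchanged
    mode="random":   draw a multiplier from the same range, unrelated to the batch
    """
    if mode == "score":
        return min(high, max(low, difficulty))
    if mode == "constant":
        return 1.0
    if mode == "random":
        return rng.uniform(low, high)
    raise ValueError(f"unknown ablation mode: {mode!r}")


# Example: the same batch under the three arms, with a seeded generator
# so that the random arm is reproducible across runs.
rng = random.Random(0)
arms = {mode: lr_multiplier(mode, 1.4, rng) for mode in ("score", "constant", "random")}
```

Keeping the random arm in the same range as the score arm matters: if the advantage survives with random multipliers, the gain would come from step-size variation in general rather than from the difficulty signal.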

Original abstract

The choice of optimiser is important in deep learning, as it strongly influences model efficiency and speed of convergence. However, many commonly used optimisers encounter difficulties when applied to imbalanced and sequential datasets, limiting their ability to capture patterns of minority classes. In this study, we propose Dynamic Batch-Sensitive Adam (DBS-Adam), an optimiser that dynamically scales the learning rate using a batch difficulty score derived from exponential moving averages of gradient norms and batch loss. DBS-Adam improves training stability and accelerates convergence by increasing updates for difficult batches and reducing them for easier ones. We evaluate DBS-Adam by integrating it with Bi-Directional LSTM networks for accident injury severity prediction, addressing class imbalance through SMOTE-ENN resampling and Focal Loss. Four experimental configurations compare baseline Bi-LSTM models and alternative architectures to assess optimiser impact. Rigorous comparison against state-of-the-art optimisers (AMSGrad, AdamW, AdaBound) across five random seeds demonstrated DBS-Adam's competitive performance with statistically significant precision improvements (p=0.020). Results indicate that DBS-Adam outperforms standard optimisation approaches, achieving 95.22% test accuracy, 96.11% precision, 95.28% recall, 95.39% F1-score, and a test loss of 0.0086. The proposed framework enables effective real-time accident severity classification for targeted emergency response and road safety interventions, demonstrating the value of DBS-Adam for learning from imbalanced sequential data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes Dynamic Batch-Sensitive Adam (DBS-Adam), an optimizer that dynamically scales the learning rate via a batch difficulty score computed from exponential moving averages of gradient norms and batch loss. It integrates this optimizer with Bi-Directional LSTM networks for vehicular accident injury severity prediction, using SMOTE-ENN resampling and Focal Loss to address class imbalance, and reports superior performance (95.22% accuracy, 96.11% precision) with p=0.020 over AMSGrad, AdamW, and AdaBound across five random seeds.

Significance. If the batch difficulty score supplies a stable, generalizable signal that improves convergence on minority classes in imbalanced sequential data without overfitting to dataset idiosyncrasies, the method could aid optimization in safety-critical applications such as real-time accident classification. The experimental setup with multiple optimizer baselines and a statistical test is a positive step, but the absence of the explicit formulation and isolating ablations prevents assessment of whether the reported gains arise from the claimed mechanism.

major comments (3)
  1. [Abstract and Proposed Method] Abstract and Proposed Method section: No equations are provided for the batch difficulty score (how EMA gradient norms and batch loss are combined) or the difficulty-to-learning-rate mapping function, despite these being the core of DBS-Adam and listed among the free parameters. This is load-bearing because the central performance claims (95.22% test accuracy, 96.11% precision, p=0.020) are attributed to this component.
  2. [Experiments] Experiments section: Only five random seeds are used and no ablation is reported that disables or randomizes the difficulty score while holding SMOTE-ENN resampling, Focal Loss, and Bi-LSTM architecture fixed. This undermines the claim that the score improves minority-class convergence rather than fitting noise in the particular accident dataset.
  3. [Results] Results section: The p=0.020 for precision improvement is reported without specifying the statistical test, providing variance or standard deviation across the five seeds, or applying multiple-comparison correction, making it impossible to evaluate the robustness of the statistical significance claim.
minor comments (2)
  1. [Abstract] The abstract states that four experimental configurations were compared but does not enumerate them or link them to the reported metrics.
  2. [Experiments] No learning curves or convergence plots are mentioned, which would help verify the claimed acceleration on difficult batches.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights important issues of clarity, experimental rigor, and statistical reporting. We address each major comment below and will incorporate revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Abstract and Proposed Method] Abstract and Proposed Method section: No equations are provided for the batch difficulty score (how EMA gradient norms and batch loss are combined) or the difficulty-to-learning-rate mapping function, despite these being the core of DBS-Adam and listed among the free parameters. This is load-bearing because the central performance claims (95.22% test accuracy, 96.11% precision, p=0.020) are attributed to this component.

    Authors: We agree that the explicit equations for the batch difficulty score and its mapping to the learning-rate scale factor are essential for reproducibility and for allowing readers to evaluate the claimed mechanism. Their omission from the current manuscript was an oversight in the presentation of the method. In the revised version we will insert the full mathematical definitions: the difficulty score computed from the EMA of per-batch gradient norms and loss values, the precise combination rule, and the functional form that converts the difficulty score into a multiplicative adjustment to the base learning rate. The free parameters and the values used in the reported experiments will also be listed explicitly.
    revision: yes

  2. Referee: [Experiments] Experiments section: Only five random seeds are used and no ablation is reported that disables or randomizes the difficulty score while holding SMOTE-ENN resampling, Focal Loss, and Bi-LSTM architecture fixed. This undermines the claim that the score improves minority-class convergence rather than fitting noise in the particular accident dataset.

    Authors: The existing experiments already hold the Bi-LSTM architecture, SMOTE-ENN resampling, and Focal Loss fixed while varying only the optimizer, thereby isolating the contribution of DBS-Adam relative to AMSGrad, AdamW, and AdaBound. Nevertheless, we acknowledge that an explicit ablation that turns the difficulty-score component on and off (or replaces it with a random scalar) would provide more direct evidence that the performance gain stems from the proposed mechanism rather than dataset-specific noise. We will add this ablation study to the revised manuscript. We will retain five random seeds for computational practicality but will also report per-seed metrics so that readers can assess variability.
    revision: partial

  3. Referee: [Results] Results section: The p=0.020 for precision improvement is reported without specifying the statistical test, providing variance or standard deviation across the five seeds, or applying multiple-comparison correction, making it impossible to evaluate the robustness of the statistical significance claim.

    Authors: We will revise the Results section to state the exact statistical test (a paired t-test across the five independent seeds), to report mean performance together with standard deviation for each metric, and to indicate whether any multiple-comparison correction was applied. If the original analysis did not include correction, we will either apply an appropriate correction (e.g., Bonferroni) or discuss the implications for the reported p-value. These additions will allow readers to judge the robustness of the significance claim.
    revision: yes
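For readers who want to see what the third response commits to, the sketch below computes a paired t statistic over per-seed scores and applies a Bonferroni correction to a list of raw p-values. It uses only the Python standard library. The function names are hypothetical, and no per-seed numbers from the paper are used because none are reported.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired samples, e.g. one score per seed."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


def bonferroni(p_values):
    """Multiply each raw p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

With five seeds the test has four degrees of freedom, so the t statistic must be large before a difference counts as significant. The correction also matters here: if 0.020 were the smallest raw p-value among three pairwise comparisons (one per baseline optimizer), its Bonferroni-adjusted value would be 0.060.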

Circularity Check

0 steps flagged

No significant circularity; empirical evaluation is independent of any self-referential derivation

Full rationale

The paper proposes DBS-Adam via an explicit construction (EMA-based batch difficulty score for dynamic LR scaling) and reports standard empirical results on a fixed dataset against baselines. No derivation chain, uniqueness theorem, or ansatz is presented that reduces the claimed performance metrics or the optimizer definition to its own fitted inputs by construction. The comparison uses external baselines (AMSGrad etc.) and reports p-values from statistical tests; these are not tautological. The evaluation is self-contained against the chosen benchmarks and does not rely on load-bearing self-citations or renaming of known results.

Axiom & Free-Parameter Ledger

2 free parameters · 0 axioms · 1 invented entity

The central claim rests on the empirical superiority of one particular functional form for the difficulty score; that form introduces at least two smoothing factors (the EMA decay rates) and a scaling function whose exact shape is not given in the abstract, all of which must be chosen or fitted.

free parameters (2)
  • EMA decay rates for gradient norm and loss
    Two exponential smoothing constants that control how quickly the difficulty score reacts to recent batches; their values are not stated and must be selected to obtain the reported numbers.
  • Difficulty-to-learning-rate mapping function
    The rule that converts the scalar difficulty score into a multiplicative adjustment of the base learning rate is not specified and therefore functions as an additional free parameter.
invented entities (1)
  • batch difficulty score (no independent evidence)
    purpose: Scalar that modulates per-batch learning rate inside Adam
    A derived quantity whose definition is internal to the optimizer; no independent physical or statistical justification is supplied beyond the claim that it improves convergence on this dataset.
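A short sketch shows why the EMA decay rates act as free parameters: the decay rate sets how many batches it takes for an old observation to lose half its weight, and therefore how jumpy or sluggish any score built on the average will be. The function names are hypothetical. The paper does not state its decay values, so the 0.9 and 0.99 mentioned afterwards are only common defaults used for illustration.

```python
import math


def ema_update(previous, value, beta):
    """One exponential-moving-average step with decay rate beta in (0, 1)."""
    return beta * previous + (1.0 - beta) * value


def half_life(beta):
    """Number of steps after which an observation's weight in the EMA has halved."""
    return math.log(0.5) / math.log(beta)
```

Under this definition a decay of 0.9 forgets half of an observation in roughly 6.6 batches, while 0.99 takes roughly 69. The same batch could therefore be scored as difficult under one setting and unremarkable under the other.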

pith-pipeline@v0.9.1-grok · 5832 in / 1685 out tokens · 30391 ms · 2026-06-30T21:11:38.187357+00:00 · methodology


Reference graph

Works this paper leans on

68 extracted references · 50 canonical work pages · 1 internal anchor

  1. [2]

    Exploring Optimization Dynamics: Hybrid Approaches Combining Adaptive and Traditional Techniques for Deep Learning Models,

    Y. Pattanaik et al., “Exploring Optimization Dynamics: Hybrid Approaches Combining Adaptive and Traditional Techniques for Deep Learning Models,” in 2025 4th International Conference on Distributed Computing and Electrical Circuits and Electronics (ICDCECE) , IEEE, Apr. 2025, pp. 1 –6. doi: 10.1109/ICDCECE65353.2025.11035942

  2. [3]

    A Study of the Optimization Algorithms in Deep Learning,

    R. Zaheer and H. Shaziya, “A Study of the Optimization Algorithms in Deep Learning,” in 2019 Third International Conference on Inventive Systems and Control (ICISC) , IEEE, Jan. 2019, pp. 536 –539. doi: 10.1109/ICISC44355.2019.9036442

  3. [4]

    A Comparison of Optimization Algorithms for Deep Learning,

    D. Soydaner, “A Comparison of Optimization Algorithms for Deep Learning,” Intern J Pattern Recognit Artif Intell, vol. 34, no. 13, p. 2052013, Dec. 2020, doi: 10.1142/S0218001420520138

  4. [5]

    Road safety in Nigeria: unravelling the challenges, measures, and strategies for improvement,

    C. Uzondu, S. Jamson, and G. Marsden, “Road safety in Nigeria: unravelling the challenges, measures, and strategies for improvement,” Int J Inj Contr Saf Promot , vol. 29, no. 4, pp. 522 –532, Oct. 2022, doi: 10.1080/17457300.2022.2087230

  5. [6]

    Road traffic accidents in Pakistan: unveiling the emergency service challenge,

    M. A. Abdullah and M. A. Yasin, “Road traffic accidents in Pakistan: unveiling the emergency service challenge,” Journal of Basic & Clinical Medical Sciences, vol. 2, pp. 1–3, Jan. 2024, doi: 10.58398/0002.000007

  6. [7]

    Building machine-learning models for reducing the severity of bicyclist road traffic injuries,

    S. Birfir, A. Elalouf, and T. Rosenbloom, “Building machine-learning models for reducing the severity of bicyclist road traffic injuries,” Transportation Engineering , vol. 12, p. 100179, Jun. 2023, doi: 10.1016/j.treng.2023.100179. 37

  7. [8]

    Severity Prediction of Traffic Accidents with Recurrent Neural Networks,

    M. Sameen and B. Pradhan, “Severity Prediction of Traffic Accidents with Recurrent Neural Networks,” Applied Sciences, vol. 7, no. 6, p. 476, Jun. 2017, doi: 10.3390/app7060476

  8. [9]

    Real-Time Crash Risk Prediction using Long Short-Term Memory Recurrent Neural Network,

    J. Yuan, M. Abdel-Aty, Y. Gong, and Q. Cai, “Real-Time Crash Risk Prediction using Long Short-Term Memory Recurrent Neural Network,” Transportation Research Record: Journal of the Transportation Research Board , vol. 2673, no. 4, pp. 314–326, Apr. 2019, doi: 10.1177/0361198119840611

  9. [10]

    Accident Prediction Models for Urban Unsignalized Intersections in British Columbia,

    T. Sayed and F. Rodriguez, “Accident Prediction Models for Urban Unsignalized Intersections in British Columbia,” Transportation Research Record: Journal of the Transportation Research Board , vol. 1665, no. 1, pp. 93–99, Jan. 1999, doi: 10.3141/1665-13

  10. [11]

    Towards an Accident Severity Prediction System with Logistic Regression,

    H. Mensouri, A. Azmani, and M. Azmani, “Towards an Accident Severity Prediction System with Logistic Regression,” in International Conference on Advanced Intelligent Systems for Sustainable Development , Springer, 2023, pp. 396–410. doi: 10.1007/978-3-031-26384-2_34

  11. [12]

    An Accident Prediction Model Based on ARIMA in Kuala Lumpur, Malaysia, Using Time Series of Actual Accidents and Related Data,

    B. Chong Choo, M. Abdul Razak, M. Z. Mohd Tohir, D. R. Awang Biak, and S. Syam, “An Accident Prediction Model Based on ARIMA in Kuala Lumpur, Malaysia, Using Time Series of Actual Accidents and Related Data,” Pertanika J Sci Technol, vol. 32, no. 3, pp. 1103–1122, Apr. 2024, doi: 10.47836/pjst.32.3.07

  12. [13]

    Comparing Prediction Performance for Crash Injury Severity Among Various Machine Learning and Statistical Methods,

    J. Zhang, Z. Li, Z. Pu, and C. Xu, “Comparing Prediction Performance for Crash Injury Severity Among Various Machine Learning and Statistical Methods,” IEEE Access , vol. 6, pp. 60079 –60087, 2018, doi: 10.1109/ACCESS.2018.2874979

  13. [14]

    Injury severity prediction of traffic crashes with ensemble machine learning techniques: a comparative study,

    A. Jamal et al. , “Injury severity prediction of traffic crashes with ensemble machine learning techniques: a comparative study,” Int J Inj Contr Saf Promot , vol. 28, no. 4, pp. 408 –427, Oct. 2021, doi: 10.1080/17457300.2021.1928233

  14. [15]

    Decision Tree Model for Non - Fatal Road Accident Injury,

    F. E. Sapri, N. S. Nordin, S. M. Hasan, W. F. Wan Yaacob, and S. A. Md Nasir, “Decision Tree Model for Non - Fatal Road Accident Injury,” Int J Adv Sci Eng Inf Technol , vol. 7, no. 1, p. 63, Feb. 2017, doi: 10.18517/ijaseit.7.1.1110

  15. [16]

    Performance of Traffic Accidents’ Prediction Models,

    H. R. Al-Masaeid and F. J. Khaled, “Performance of Traffic Accidents’ Prediction Models,” Jordan Journal of Civil Engineering, vol. 17, no. 1, pp. 34–44, Jan. 2023, doi: 10.14525/JJCE.v17i1.04

  16. [17]

    Using support vector machine models for crash injury severity analysis,

    Z. Li, P. Liu, W. Wang, and C. Xu, “Using support vector machine models for crash injury severity analysis,” Accid Anal Prev, vol. 45, pp. 478–486, Mar. 2012, doi: 10.1016/j.aap.2011.08.016

  17. [18]

    A review on neural network techniques for the prediction of road traffic accident severity,

    Md. E. Shaik, Md. M. Islam, and Q. S. Hossain, “A review on neural network techniques for the prediction of road traffic accident severity,” Asian Transport Studies , vol. 7, p. 100040, 2021, doi: 10.1016/j.eastsj.2021.100040

  18. [19]

    Intelligent Automated Interference for the Protection of Road Safety,

    G. Pant, R. Bahuguna, S. Pandey, A. Gehlot, S. P. Yadav, and R. K. Pachauri, “Intelligent Automated Interference for the Protection of Road Safety,” in 2023 International Conference on Computational Intelligence, Communication Technology and Networking (CICTN) , IEEE, Apr. 2023, pp. 87 –91. doi: 10.1109/CICTN57981.2023.10141086

  19. [20]

    Severity Prediction of Traffic Accident Using an Artificial Neural Network,

    S. Alkheder, M. Taamneh, and S. Taamneh, “Severity Prediction of Traffic Accident Using an Artificial Neural Network,” J Forecast, vol. 36, no. 1, pp. 100–108, Jan. 2017, doi: 10.1002/for.2425

  20. [21]

    Predicting Injury Severity Levels in Traffic Crashes: A Modeling Comparison,

    M. A. Abdel -Aty and H. T. Abdelwahab, “Predicting Injury Severity Levels in Traffic Crashes: A Modeling Comparison,” J Transp Eng , vol. 130, no. 2, pp. 204 –210, Mar. 2004, doi: 10.1061/(ASCE)0733 - 947X(2004)130:2(204)

  21. [22]

    Predicting Crash Injury Severity with Machine Learning Algorithm Synergized with Clustering Technique: A Promising Protocol,

    K. Assi, S. M. Rahman, U. Mansoor, and N. Ratrout, “Predicting Crash Injury Severity with Machine Learning Algorithm Synergized with Clustering Technique: A Promising Protocol,” Int J Environ Res Public Health, vol. 17, no. 15, p. 5497, Jul. 2020, doi: 10.3390/ijerph17155497

  22. [23]

    Utilization of Artificial Neural Networks (Ann) in Predicting Accidents Within Maharlika Highway San Pablo City, Laguna,

    Patrick Louie Jay R. Federizo, Marriel Bondad-Baet, Arhgy L. Batarlo, Paul Andrei Enriquez, Jimuel Edmon V. Landicho, and Juliana Marie B. Pareja, “Utilization of Artificial Neural Networks (Ann) in Predicting Accidents Within Maharlika Highway San Pablo City, Laguna,” International Journal of Latest Technology in Engineering Management & Applied Science ...

  23. [24]

    Prediction for traffic accident severity: comparing the artificial neural network, genetic algorithm, combined genetic algorithm and pattern search methods,

    M. M. Kunt, I. Aghayan, and N. Noii, “Prediction for traffic accident severity: comparing the artificial neural network, genetic algorithm, combined genetic algorithm and pattern search methods,” Transport, vol. 26, no. 4, pp. 353–366, Jan. 2012, doi: 10.3846/16484142.2011.635465

  24. [25]

    Severity prediction of motorcycle crashes with machine learning methods,

    L. Wahab and H. Jiang, “Severity prediction of motorcycle crashes with machine learning methods,” International Journal of Crashworthiness , vol. 25, no. 5, pp. 485 –492, Sep. 2020, doi: 10.1080/13588265.2019.1616885

  25. [26]

    Predicting and explaining severity of road accident using artificial intelligence techniques, SHAP and feature analysis,

    C. Panda, A. K. Mishra, A. K. Dash, and H. Nawab, “Predicting and explaining severity of road accident using artificial intelligence techniques, SHAP and feature analysis,” International Journal of Crashworthiness, vol. 28, no. 2, pp. 186–201, Mar. 2023, doi: 10.1080/13588265.2022.2074643. 38

  26. [27]

    A Survey of Neural Network Optimization Algorithms,

    C. Ji, “A Survey of Neural Network Optimization Algorithms,” in 2024 IEEE 4th International Conference on Data Science and Computer Application (ICDSCA) , IEEE, Nov. 2024, pp. 1 –7. doi: 10.1109/ICDSCA63855.2024.10859435

  27. [28]

    A modified Adam algorithm for deep neural network optimization,

    M. Reyad, A. M. Sarhan, and M. Arafa, “A modified Adam algorithm for deep neural network optimization,” Neural Comput Appl, vol. 35, no. 23, pp. 17095–17112, Aug. 2023, doi: 10.1007/s00521-023-08568-z

  28. [29]

    Combining Optimization Methods Using an Adaptive Meta Optimizer,

    N. Landro, I. Gallo, and R. La Grassa, “Combining Optimization Methods Using an Adaptive Meta Optimizer,” Algorithms, vol. 14, no. 6, p. 186, Jun. 2021, doi: 10.3390/a14060186

  29. [30]

    Distributed denial of service (DDoS) classification based on random forest model with backward elimination algorithm and grid search algorithm,

    M. S. Sawah, H. Elmannai, A. A. El -Bary, Kh. Lotfy, and O. E. Sheta, “Distributed denial of service (DDoS) classification based on random forest model with backward elimination algorithm and grid search algorithm,” Sci Rep, vol. 15, no. 1, p. 19063, May 2025, doi: 10.1038/s41598-025-03868-x

  30. [31]

    Improving air quality prediction using hybrid BPSO with BWAO for feature selection and hyperparameters optimization,

    M. S. Sawah, H. Elmannai, A. A. El -Bary, Kh. Lotfy, and O. E. Sheta, “Improving air quality prediction using hybrid BPSO with BWAO for feature selection and hyperparameters optimization,” Sci Rep, vol. 15, no. 1, p. 13176, Apr. 2025, doi: 10.1038/s41598-025-95983-y

  31. [32]

    GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks,

    Zhao Chen, Vijay Badrinarayanan, Chen-Yu Lee, and Andrew Rabinovich, “GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks,” Proceedings of the 35th International Conference on Machine Learning, 2018

  32. [33]

    Not All Samples Are Created Equal: Deep Learning with Importance Sampling,

    Angelos Katharopoulos and Francois Fleuret, “Not All Samples Are Created Equal: Deep Learning with Importance Sampling,” Proceedings of the 35th International Conference on Machine Learning , 2018

  33. [34]

    MentorNet: Learning Data -Driven Curriculum for Very Deep Neural Networks on Corrupted Labels,

    Lu Jiang, Zhengyuan Zhou, Thomas Leung, Li -Jia Li, and Li Fei -Fei, “MentorNet: Learning Data -Driven Curriculum for Very Deep Neural Networks on Corrupted Labels,” Proceedings of the 35th International Conference on Machine Learning, 2018

  34. [35]

    Automatic Curriculum Learning with Gradient Reward Signals,

    Ryan Campbell and Junsang Yoon, “Automatic Curriculum Learning with Gradient Reward Signals,” Dec. 2023

  35. [36]

    Convergence and Dynamical Behavior of the ADAM Algorithm for Nonconvex Stochastic Optimization,

    A. Barakat and P. Bianchi, “Convergence and Dynamical Behavior of the ADAM Algorithm for Nonconvex Stochastic Optimization,” SIAM Journal on Optimization , vol. 31, no. 1, pp. 244 –274, Jan. 2021, doi: 10.1137/19M1263443

  36. [37]

    Fault Diagnosis and Localization of Power Cables Using Bi -Directional Long Short Term Memory with Adam Optimizer,

    L. Song, “Fault Diagnosis and Localization of Power Cables Using Bi -Directional Long Short Term Memory with Adam Optimizer,” in 2024 4th International Conference on Mobile Networks and Wireless Communications (ICMNWC), IEEE, Dec. 2024, pp. 01–05. doi: 10.1109/ICMNWC63764.2024.10872012

  37. [38]

    Refining the Performance of Indonesian-Javanese Bilingual Neural Machine Translation Using Adam Optimizer,

    F. I. Putri, A. P. Wibawa, and L. H. Collante, “Refining the Performance of Indonesian-Javanese Bilingual Neural Machine Translation Using Adam Optimizer,” ILKOM Jurnal Ilmiah , vol. 16, no. 3, pp. 271 –282, Dec. 2024, doi: 10.33096/ilkom.v16i3.2467.271-282

  38. [39]

    Wireless Networks with Asynchronous Users

    P. K. Mondal, S. S. Khan, M. T. Imrog, M. A. A. Arman, M. M. Islam, and A. U. H. Rupak, “Exploring Authorial Style in Bangla Literature: LSTM and Bi -LSTM -Based Author Detection,” in 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT) , IEEE, Jun. 2024, pp. 1 –9. doi: 10.1109/ICCCNT61001.2024.10725023

  39. [40]

    Effective Adam -Optimized LSTM Neural Network for Electricity Price Forecasting,

    Z. Chang, Y. Zhang, and W. Chen, “Effective Adam -Optimized LSTM Neural Network for Electricity Price Forecasting,” in 2018 IEEE 9th International Conference on Software Engineering and Service Science (ICSESS), IEEE, Nov. 2018, pp. 245–248. doi: 10.1109/ICSESS.2018.8663710

  40. [41]

    Analysis and Synthesis of Adaptive Gradient Algorithms in Machine Learning: The Case of AdaBound and MAdamSSM,

    K. Chakrabarti and N. Chopra, “Analysis and Synthesis of Adaptive Gradient Algorithms in Machine Learning: The Case of AdaBound and MAdamSSM,” in 2022 IEEE 61st Conference on Decision and Control (CDC), IEEE, Dec. 2022, pp. 795–800. doi: 10.1109/CDC51059.2022.9992512

  41. [42]

    Convergence analysis of AdaBound with relaxed bound functions for non-convex optimization,

    J. Liu, J. Kong, D. Xu, M. Qi, and Y. Lu, “Convergence analysis of AdaBound with relaxed bound functions for non-convex optimization,” Neural Networks , vol. 145, pp. 300 –307, Jan. 2022, doi: 10.1016/j.neunet.2021.10.026

  42. [43]

    An improved Adam Algorithm using look -ahead,

    A. Zhu, Y. Meng, and C. Zhang, “An improved Adam Algorithm using look -ahead,” in Proceedings of the 2017 International Conference on Deep Learning Technologies , New York, NY, USA: ACM, Jun. 2017, pp. 19 –22. doi: 10.1145/3094243.3094249

  43. [44]

    Using Feature Selection to Reduce the Complexity in Analyzing the Injury Severity of Traffic Accidents,

    J.-T. Wei, H.-H. Wu, and K.-Y. Kou, “Using Feature Selection to Reduce the Complexity in Analyzing the Injury Severity of Traffic Accidents,” in 2011 International Joint Conference on Service Sciences, IEEE, May 2011, pp. 329–333. doi: 10.1109/IJCSS.2011.73

  44. [45]

    Hybrid feature selection-based machine learning Classification system for the prediction of injury severity in single and multiple -vehicle accidents,

    S. Zhang, A. Khattak, C. M. Matara, A. Hussain, and A. Farooq, “Hybrid feature selection-based machine learning Classification system for the prediction of injury severity in single and multiple -vehicle accidents,” PLoS One, vol. 17, no. 2, p. e0262941, Feb. 2022, doi: 10.1371/journal.pone.0262941

  45. [46]

    A data-driven, kinematic feature-based, near real-time algorithm for injury severity prediction of vehicle occupants,

    Q. Wang, S. Gan, W. Chen, Q. Li, and B. Nie, “A data-driven, kinematic feature-based, near real-time algorithm for injury severity prediction of vehicle occupants,” Accid Anal Prev , vol. 156, p. 106149, Jun. 2021, doi: 10.1016/j.aap.2021.106149. 39

  46. [47]

    Predicting Accident Severity: An Analysis of Factors Affecting Accident Severity Using Random Forest Model,

    A. Adefabi, S. Olisah, C. Obunadike, O. Oyetubo, E. Taiwo, and E. Tella, “Predicting Accident Severity: An Analysis of Factors Affecting Accident Severity Using Random Forest Model,” International Journal on Cybernetics & Informatics, vol. 12, no. 6, pp. 107–121, Oct. 2023, doi: 10.5121/ijci.2023.120609

  47. [48]

    Focal loss for dense object detection,

    T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2980–2988

  48. [49]

    On the Convergence Proof of AMSGrad and a New Version,

    P. T. Tran and L. T. Phong, “On the Convergence Proof of AMSGrad and a New Version,” IEEE Access, vol. 7, pp. 61706–61716, 2019, doi: 10.1109/ACCESS.2019.2916341

  49. [50]

    Framewise phoneme classification with bidirectional LSTM and other neural network architectures,

    A. Graves and J. Schmidhuber, “Framewise phoneme classification with bidirectional LSTM and other neural network architectures,” Neural Networks , vol. 18, no. 5 –6, pp. 602 –610, Jul. 2005, doi: 10.1016/j.neunet.2005.06.042

  50. [51]

    An exploration of dropout with rnns for natural language inference,

    A. Gajbhiye, S. Jaf, N. Al Moubayed, A. S. McGough, and S. Bradley, “An exploration of dropout with rnns for natural language inference,” in International conference on artificial neural networks , Springer, 2018, pp. 157 – 167

  51. [52]

    Dropout: a simple way to prevent neural networks from overfitting,

    N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The journal of machine learning research, vol. 15, no. 1, pp. 1929–1958, 2014

  52. [53]

    Road traffic accident dataset of addis ababa city,

    T. T. Bedane, “Road traffic accident dataset of addis ababa city,” Addis Ababa, 2020, doi: 10.17632/xytv86278f.1

  53. [54]

    The impact of imputation quality on machine learning classifiers for datasets with missing values,

    T. Shadbahr et al., “The impact of imputation quality on machine learning classifiers for datasets with missing values,” Communications Medicine, vol. 3, no. 1, p. 139, Oct. 2023, doi: 10.1038/s43856-023-00356-z

  54. [55]

    Pattern classification with missing data: a review,

    P. J. García-Laencina, J.-L. Sancho-Gómez, and A. R. Figueiras-Vidal, “Pattern classification with missing data: a review,” Neural Comput Appl, vol. 19, no. 2, pp. 263–282, Mar. 2010, doi: 10.1007/s00521-009-0295-6

  55. [56]

    Developing machine-learning-based models to diminish the severity of injuries sustained by pedestrians in road traffic incidents,

    A. Elalouf, S. Birfir, and T. Rosenbloom, “Developing machine-learning-based models to diminish the severity of injuries sustained by pedestrians in road traffic incidents,” Heliyon, vol. 9, no. 11, p. e21371, Nov. 2023, doi: 10.1016/j.heliyon.2023.e21371

  56. [57]

    Active label cleaning for improved dataset quality under resource constraints,

    M. Bernhardt et al., “Active label cleaning for improved dataset quality under resource constraints,” Nat Commun, vol. 13, no. 1, p. 1161, Mar. 2022, doi: 10.1038/s41467-022-28818-3

  57. [58]

    SMOTE: synthetic minority over-sampling technique,

    N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: synthetic minority over-sampling technique,” Journal of artificial intelligence research, vol. 16, pp. 321–357, 2002

  58. [59]

    A study of the behavior of several methods for balancing machine learning training data,

    G. E. Batista, R. C. Prati, and M. C. Monard, “A study of the behavior of several methods for balancing machine learning training data,” ACM SIGKDD explorations newsletter, vol. 6, no. 1, pp. 20–29, 2004

  59. [60]

    ADASYN: Adaptive synthetic sampling approach for imbalanced learning,

    H. He, Y. Bai, E. A. Garcia, and S. Li, “ADASYN: Adaptive synthetic sampling approach for imbalanced learning,” in 2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence), IEEE, 2008, pp. 1322–1328

  60. [61]

    Batch-balanced focal loss: a hybrid solution to class imbalance in deep learning,

    J. Singh et al., “Batch-balanced focal loss: a hybrid solution to class imbalance in deep learning,” Journal of Medical Imaging, vol. 10, no. 5, p. 51809, 2023

  61. [62]

    Unified focal loss: Generalising dice and cross entropy-based losses to handle class imbalanced medical image segmentation,

    M. Yeung, E. Sala, C.-B. Schönlieb, and L. Rundo, “Unified focal loss: Generalising dice and cross entropy-based losses to handle class imbalanced medical image segmentation,” Computerized Medical Imaging and Graphics, vol. 95, p. 102026, 2022

  62. [63]

    Road Accident Severity Prediction using Adaptive Custom Weight Initialization and Enhanced Focal Loss Integration Technique,

    R. Verma and M. M. Agarwal, “Road Accident Severity Prediction using Adaptive Custom Weight Initialization and Enhanced Focal Loss Integration Technique,” IETE J Res, pp. 1–13, 2025

  63. [64]

    Handling imbalanced medical datasets: review of a decade of research,

    M. Salmi, D. Atif, D. Oliva, A. Abraham, and S. Ventura, “Handling imbalanced medical datasets: review of a decade of research,” Artif Intell Rev, vol. 57, no. 10, p. 273, 2024

  64. [65]

    Using random forest to learn imbalanced data,

    C. Chen, A. Liaw, and L. Breiman, “Using random forest to learn imbalanced data,” University of California, Berkeley, vol. 110, no. 1–12, p. 24, 2004

  65. [66]

    Exploratory undersampling for class-imbalance learning,

    X.-Y. Liu, J. Wu, and Z.-H. Zhou, “Exploratory undersampling for class-imbalance learning,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 39, no. 2, pp. 539–550, 2008

  66. [67]

    The foundations of cost-sensitive learning,

    C. Elkan, “The foundations of cost-sensitive learning,” in International joint conference on artificial intelligence, Lawrence Erlbaum Associates Ltd, 2001, pp. 973–978

  67. [68]

    Class-balanced loss based on effective number of samples,

    Y. Cui, M. Jia, T.-Y. Lin, Y. Song, and S. Belongie, “Class-balanced loss based on effective number of samples,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 9268–9277

  68. [69]

    Decoupling representation and classifier for long-tailed recognition,

    B. Kang et al., “Decoupling representation and classifier for long-tailed recognition,” arXiv preprint arXiv:1910.09217, 2019