Novel Dynamic Batch-Sensitive Adam Optimiser for Vehicular Accident Injury Severity Prediction

Abdul Lateef-Yussiff; Alimatu Saadia-Yussiff; Charles Roland Haruna; Daniel Asare Kyei; Derry Emmanuel; Maame G. Asante-Mensah

arxiv: 2605.15083 · v1 · pith:OOFR3P3O · submitted 2026-05-14 · 💻 cs.LG · cs.AI


Daniel Asare Kyei, Alimatu Saadia-Yussiff, Maame G. Asante-Mensah, Abdul Lateef-Yussiff, Charles Roland Haruna, Derry Emmanuel

Pith reviewed 2026-06-30 21:11 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords DBS-Adam · dynamic optimizer · batch difficulty score · imbalanced sequential data · accident injury severity · Bi-LSTM · gradient norm · learning rate scaling


The pith

DBS-Adam scales learning rates by batch difficulty scores, derived from gradient norms and batch loss, to handle imbalanced sequential accident data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Dynamic Batch-Sensitive Adam (DBS-Adam). This optimizer computes a batch difficulty score from exponential moving averages of gradient norms and batch loss. It then scales the learning rate up for difficult batches and down for easier ones. The optimizer is tested with Bi-Directional LSTM networks on vehicular accident injury severity prediction, after SMOTE-ENN resampling and Focal Loss are applied to address class imbalance. The central goal is to improve convergence stability on minority classes in sequential data without architectural changes. Across five random seeds, DBS-Adam reaches 95.22 percent accuracy and 96.11 percent precision. Its precision improvement over AMSGrad, AdamW, and AdaBound is statistically significant (p = 0.020).

Core claim

DBS-Adam improves training on imbalanced sequential datasets by dynamically scaling the learning rate with a batch difficulty score derived from exponential moving averages of gradient norms and batch loss. It yields 95.22 percent test accuracy, 96.11 percent precision, 95.28 percent recall, 95.39 percent F1-score, and 0.0086 test loss. It outperforms AMSGrad, AdamW, and AdaBound, with a statistically significant precision improvement (p = 0.020).

What carries the argument

The batch difficulty score, formed from exponential moving averages of per-batch gradient norms and loss values, which directly multiplies the learning rate to increase updates on hard batches and decrease them on easy ones.
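The page does not give the actual formula, so what follows is only a minimal sketch under assumed choices. A hypothetical `BatchDifficultyTracker` keeps EMAs of the gradient norm and the batch loss. It scores each batch as the mean of its current-to-EMA ratios and clips that score into a learning-rate multiplier. The decay rates `beta_g` and `beta_l`, the equal weighting of the two ratios, and the clipping bounds are all illustrative assumptions, not values from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BatchDifficultyTracker:
    """Hypothetical batch difficulty score; the paper's exact formula is not given."""

    beta_g: float = 0.9      # assumed EMA decay for gradient norms
    beta_l: float = 0.9      # assumed EMA decay for batch losses
    min_scale: float = 0.5   # assumed lower clip on the learning-rate multiplier
    max_scale: float = 2.0   # assumed upper clip on the learning-rate multiplier
    eps: float = 1e-12       # guards against division by zero
    ema_grad: Optional[float] = None
    ema_loss: Optional[float] = None

    def update(self, grad_norm: float, loss: float) -> float:
        """Score this batch against the running EMAs, then fold it into them."""
        if self.ema_grad is None or self.ema_loss is None:
            # First batch: seed both EMAs so the first score is about 1.0.
            self.ema_grad, self.ema_loss = grad_norm, loss
        # Ratios above 1 mean this batch is harder than recent ones; below 1, easier.
        difficulty = 0.5 * (grad_norm / (self.ema_grad + self.eps)
                            + loss / (self.ema_loss + self.eps))
        # Update the exponential moving averages with the current batch.
        self.ema_grad = self.beta_g * self.ema_grad + (1 - self.beta_g) * grad_norm
        self.ema_loss = self.beta_l * self.ema_loss + (1 - self.beta_l) * loss
        # Clip the difficulty into a bounded learning-rate multiplier.
        return min(self.max_scale, max(self.min_scale, difficulty))
```

In use, a training loop would call `lr_t = base_lr * tracker.update(grad_norm, loss)` once per mini-batch. Using ratios makes the score independent of the absolute scale of the loss. Clipping keeps a single outlier batch from producing an extreme step.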

If this is right

DBS-Adam reaches 95.22 percent test accuracy and a statistically significant precision gain over three standard Adam variants (AMSGrad, AdamW, and AdaBound).
The optimizer integrates directly with Bi-LSTM, SMOTE-ENN, and Focal Loss to manage class imbalance in sequential accident records.
The resulting model supports real-time severity classification for emergency response planning.
Performance holds across five random seeds on the tested vehicular injury dataset.
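Focal Loss is one of the imbalance tools the optimizer is paired with, but its settings are not reported on this page. Below is a minimal single-example, multi-class sketch of the standard focal loss, -α_t (1 - p_t)^γ log p_t. Its default γ = 2 and its optional per-class weight vector `alpha` are assumptions here, not the paper's configuration.

```python
import math
from typing import List, Optional, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax via the log-sum-exp trick."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def focal_loss(logits: Sequence[float], target: int, gamma: float = 2.0,
               alpha: Optional[Sequence[float]] = None) -> float:
    """Multi-class focal loss for one example: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    log_p_t = log_softmax(logits)[target]
    p_t = math.exp(log_p_t)
    a_t = 1.0 if alpha is None else alpha[target]  # optional per-class weight
    # (1 - p_t)**gamma shrinks the loss of well-classified (high p_t) examples.
    return -a_t * (1.0 - p_t) ** gamma * log_p_t
```

With `gamma=0.0` and no `alpha`, this reduces to ordinary cross-entropy. Larger `gamma` down-weights easy examples, so minority-class mistakes contribute relatively more to each batch loss.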

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The difficulty-score mechanism could be ported to other first-order optimizers to reduce reliance on external resampling for imbalance.
If the score remains stable across datasets, it may simplify pipelines for other time-series classification tasks with rare events.
Direct comparison on larger or noisier sequential datasets would test whether the gains generalize beyond the current accident records.

Load-bearing premise

The batch difficulty score supplies a stable, non-overfitting signal that genuinely aids convergence on minority classes rather than capturing noise from the particular accident dataset and resampling steps.

What would settle it

Re-run the Bi-LSTM experiments with DBS-Adam but replace the computed difficulty score with a fixed constant or random values and check whether the reported accuracy and precision advantages disappear.
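The falsification test above could be wired into a training loop with a switch like the hypothetical helper below. The mode names, the default constant of 1.0, and the [0.5, 2.0] range for random scales are assumptions for illustration only. With a constant of 1.0, the update reduces to plain Adam.

```python
import random
from typing import Optional


def ablation_lr_scale(mode: str, computed_scale: float,
                      rng: Optional[random.Random] = None,
                      constant: float = 1.0, low: float = 0.5, high: float = 2.0) -> float:
    """Return the learning-rate multiplier for one batch in a hypothetical ablation.

    mode="computed": use the optimiser's own difficulty-based scale (full method).
    mode="constant": ignore the score entirely (constant=1.0 gives plain Adam).
    mode="random":   draw a scale uniformly from [low, high], destroying any signal.
    """
    if mode == "computed":
        return computed_scale
    if mode == "constant":
        return constant
    if mode == "random":
        rng = rng or random.Random()
        return rng.uniform(low, high)
    raise ValueError(f"unknown mode: {mode!r}")
```

Each mode would be run on the same five seeds, with SMOTE-ENN, Focal Loss, and the Bi-LSTM held fixed. A stricter version of the random control would shuffle the computed scales across batches. That keeps their distribution intact while breaking their link to batch difficulty.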

Original abstract

The choice of optimiser is important in deep learning, as it strongly influences model efficiency and speed of convergence. However, many commonly used optimisers encounter difficulties when applied to imbalanced and sequential datasets, limiting their ability to capture patterns of minority classes. In this study, we propose Dynamic Batch-Sensitive Adam (DBS-Adam), an optimiser that dynamically scales the learning rate using a batch difficulty score derived from exponential moving averages of gradient norms and batch loss. DBS-Adam improves training stability and accelerates convergence by increasing updates for difficult batches and reducing them for easier ones. We evaluate DBS-Adam by integrating it with Bi-Directional LSTM networks for accident injury severity prediction, addressing class imbalance through SMOTE-ENN resampling and Focal Loss. Four experimental configurations compare baseline Bi-LSTM models and alternative architectures to assess optimiser impact. Rigorous comparison against state-of-the-art optimisers (AMSGrad, AdamW, AdaBound) across five random seeds demonstrated DBS-Adam's competitive performance with statistically significant precision improvements (p=0.020). Results indicate that DBS-Adam outperforms standard optimisation approaches, achieving 95.22% test accuracy, 96.11% precision, 95.28% recall, 95.39% F1-score, and a test loss of 0.0086. The proposed framework enables effective real-time accident severity classification for targeted emergency response and road safety interventions, demonstrating the value of DBS-Adam for learning from imbalanced sequential data.
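The abstract describes the mechanism as scaling Adam's learning rate for each batch. Below is a minimal pure-Python sketch of that idea. It assumes the scale factor is simply multiplied into the step size of otherwise standard, bias-corrected Adam. The class name `ScaledAdam` and its `lr_scale` argument are hypothetical, and the paper may apply the factor differently.

```python
import math
from typing import List


class ScaledAdam:
    """Standard Adam whose step size is multiplied by an external per-batch factor.

    Hypothetical illustration: in DBS-Adam the factor would come from a batch
    difficulty score; here the caller simply passes it in.
    """

    def __init__(self, n_params: int, lr: float = 1e-3, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8) -> None:
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # first-moment (mean) estimates
        self.v = [0.0] * n_params  # second-moment (uncentred variance) estimates
        self.t = 0                 # step counter used for bias correction

    def step(self, params: List[float], grads: List[float],
             lr_scale: float = 1.0) -> List[float]:
        """Return updated parameters after one Adam step with lr * lr_scale."""
        self.t += 1
        lr_t = self.lr * lr_scale  # per-batch effective learning rate
        new_params = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias-corrected mean
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)  # bias-corrected variance
            new_params.append(p - lr_t * m_hat / (math.sqrt(v_hat) + self.eps))
        return new_params
```

In training, `lr_scale` would be recomputed for every mini-batch from that batch's gradient norm and loss. With `lr_scale=1.0`, the update is ordinary Adam, which is what makes the scaled and unscaled runs directly comparable.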

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note private letter to a colleague

DBS-Adam is a routine Adam tweak whose claimed gains rest on an unspecified difficulty score and experiments without ablations or robustness checks.


The main thing to know is that DBS-Adam is presented as a dynamic adjustment to Adam using a batch difficulty score, but the abstract supplies no equations or implementation details for that score, and the experiments lack ablations or robustness checks.

The paper applies this to Bi-LSTM models on a vehicular accident dataset, with SMOTE-ENN resampling and Focal Loss. They compare against AMSGrad, AdamW, and AdaBound over five random seeds and report a p-value of 0.020 for better precision, along with high accuracy, recall, and F1 scores.

This is a straightforward empirical study in the traffic safety domain. The idea of weighting updates by batch difficulty has appeared in curriculum learning work before, so the optimizer change itself is not a new principle.

The soft spots are clear. Without the actual formula for the difficulty score or tests that turn it off while keeping everything else fixed, it's impossible to tell if the gains come from the claimed mechanism or from extra fitting to this particular dataset and its imbalance. Five seeds is minimal for claiming statistical significance on imbalanced data, and the p-value is borderline.

A reader interested in practical tweaks for sequential imbalanced classification in safety applications might find the numbers useful to try. Someone looking for theoretical advance or well-controlled optimizer research will not.

I would not bring this to a reading group or cite it. The work shows basic engagement with the optimizer literature but stops short of the controls needed to make the claims convincing.

It does not deserve peer review until the method is specified and the experiments include proper ablations.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes Dynamic Batch-Sensitive Adam (DBS-Adam), an optimizer that dynamically scales the learning rate via a batch difficulty score computed from exponential moving averages of gradient norms and batch loss. It integrates this optimizer with Bi-Directional LSTM networks for vehicular accident injury severity prediction, using SMOTE-ENN resampling and Focal Loss to address class imbalance, and reports superior performance (95.22% accuracy, 96.11% precision) with p=0.020 over AMSGrad, AdamW, and AdaBound across five random seeds.

Significance. If the batch difficulty score supplies a stable, generalizable signal that improves convergence on minority classes in imbalanced sequential data without overfitting to dataset idiosyncrasies, the method could aid optimization in safety-critical applications such as real-time accident classification. The experimental setup with multiple optimizer baselines and a statistical test is a positive step, but the absence of the explicit formulation and isolating ablations prevents assessment of whether the reported gains arise from the claimed mechanism.

major comments (3)

[Abstract and Proposed Method] No equations are provided for the batch difficulty score (how EMA gradient norms and batch loss are combined) or for the difficulty-to-learning-rate mapping function. These are the core of DBS-Adam and are listed among the free parameters. This is load-bearing because the central performance claims (95.22% test accuracy, 96.11% precision, p=0.020) are attributed to this component.
[Experiments] Only five random seeds are used. No ablation is reported that disables or randomizes the difficulty score while holding SMOTE-ENN resampling, Focal Loss, and the Bi-LSTM architecture fixed. This undermines the claim that the score improves minority-class convergence rather than fitting noise in the particular accident dataset.
[Results] The p=0.020 for the precision improvement is reported without naming the statistical test, without variance or standard deviation across the five seeds, and without multiple-comparison correction. This makes it impossible to evaluate the robustness of the statistical significance claim.

minor comments (2)

[Abstract] The abstract states that four experimental configurations were compared but does not enumerate them or link them to the reported metrics.
[Experiments] No learning curves or convergence plots are mentioned, which would help verify the claimed acceleration on difficult batches.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights important issues of clarity, experimental rigor, and statistical reporting. We address each major comment below and will incorporate revisions to strengthen the manuscript.


Referee: [Abstract and Proposed Method] No equations are provided for the batch difficulty score (how EMA gradient norms and batch loss are combined) or for the difficulty-to-learning-rate mapping function. These are the core of DBS-Adam and are listed among the free parameters. This is load-bearing because the central performance claims (95.22% test accuracy, 96.11% precision, p=0.020) are attributed to this component.

Authors: We agree that the explicit equations for the batch difficulty score and its mapping to the learning-rate scale factor are essential for reproducibility and for allowing readers to evaluate the claimed mechanism. Their omission from the current manuscript was an oversight in the presentation of the method. In the revised version we will insert the full mathematical definitions: the difficulty score computed from the EMA of per-batch gradient norms and loss values, the precise combination rule, and the functional form that converts the difficulty score into a multiplicative adjustment to the base learning rate. The free parameters and the values used in the reported experiments will also be listed explicitly. revision: yes
Referee: [Experiments] Only five random seeds are used. No ablation is reported that disables or randomizes the difficulty score while holding SMOTE-ENN resampling, Focal Loss, and the Bi-LSTM architecture fixed. This undermines the claim that the score improves minority-class convergence rather than fitting noise in the particular accident dataset.

Authors: The existing experiments already hold the Bi-LSTM architecture, SMOTE-ENN resampling, and Focal Loss fixed while varying only the optimizer. This isolates the contribution of DBS-Adam relative to AMSGrad, AdamW, and AdaBound. Nevertheless, we acknowledge that a more direct test is needed: an explicit ablation that turns the difficulty-score component on and off, or replaces it with a random scalar. That would show whether the performance gain stems from the proposed mechanism rather than from dataset-specific noise, and we will add this ablation study to the revised manuscript. As for the number of random seeds, we will keep five for computational practicality but will also report per-seed metrics so that readers can assess variability. revision: partial
Referee: [Results] The p=0.020 for the precision improvement is reported without naming the statistical test, without variance or standard deviation across the five seeds, and without multiple-comparison correction. This makes it impossible to evaluate the robustness of the statistical significance claim.

Authors: We will revise the Results section to state the exact statistical test (a paired t-test across the five independent seeds), to report mean performance together with standard deviation for each metric, and to indicate whether any multiple-comparison correction was applied. If the original analysis did not include correction, we will either apply an appropriate correction (e.g., Bonferroni) or discuss the implications for the reported p-value. These additions will allow readers to judge the robustness of the significance claim. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical evaluation is independent of any self-referential derivation


The paper proposes DBS-Adam via an explicit construction (EMA-based batch difficulty score for dynamic LR scaling) and reports standard empirical results on a fixed dataset against baselines. No derivation chain, uniqueness theorem, or ansatz is presented that reduces the claimed performance metrics or the optimizer definition to its own fitted inputs by construction. The comparison uses external baselines (AMSGrad etc.) and reports p-values from statistical tests; these are not tautological. The evaluation is self-contained against the chosen benchmarks and does not rely on load-bearing self-citations or renaming of known results.

Axiom & Free-Parameter Ledger

2 free parameters · 0 axioms · 1 invented entity

The central claim rests on the empirical superiority of one particular functional form for the difficulty score; that form introduces at least two smoothing factors (the EMA decay rates) and a scaling function whose exact shape is not given in the abstract, all of which must be chosen or fitted.

free parameters (2)

EMA decay rates for gradient norm and loss
Two exponential smoothing constants that control how quickly the difficulty score reacts to recent batches; their values are not stated and must be selected to obtain the reported numbers.
Difficulty-to-learning-rate mapping function
The rule that converts the scalar difficulty score into a multiplicative adjustment of the base learning rate is not specified and therefore functions as an additional free parameter.
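The generic EMA recursion shows why the decay rates act as free parameters. It is written here for a single tracked quantity x_t (a per-batch gradient norm or loss) with decay β. This is the standard definition, not a formula stated in the paper.

```latex
m_t = \beta\, m_{t-1} + (1-\beta)\, x_t
    = (1-\beta)\sum_{k=0}^{t-1} \beta^{k}\, x_{t-k} + \beta^{t} m_0,
\qquad
k_{1/2} = \frac{\ln 2}{-\ln \beta}
```

The weight on a batch seen k steps ago decays as β^k, and k_{1/2} is the number of batches after which that weight has halved. For β = 0.9 the half-life is about 6.6 batches; for β = 0.99 it is about 69 batches. The choice therefore changes how quickly the difficulty score reacts to a run of hard batches.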

invented entities (1)

batch difficulty score (no independent evidence)
purpose: Scalar that modulates per-batch learning rate inside Adam
A derived quantity whose definition is internal to the optimizer; no independent physical or statistical justification is supplied beyond the claim that it improves convergence on this dataset.



Reference graph

Works this paper leans on

68 extracted references · 50 canonical work pages · 1 internal anchor

[2]

Exploring Optimization Dynamics: Hybrid Approaches Combining Adaptive and Traditional Techniques for Deep Learning Models,

Y. Pattanaik et al., “Exploring Optimization Dynamics: Hybrid Approaches Combining Adaptive and Traditional Techniques for Deep Learning Models,” in 2025 4th International Conference on Distributed Computing and Electrical Circuits and Electronics (ICDCECE) , IEEE, Apr. 2025, pp. 1 –6. doi: 10.1109/ICDCECE65353.2025.11035942

work page doi:10.1109/icdcece65353.2025.11035942 2025
[3]

A Study of the Optimization Algorithms in Deep Learning,

R. Zaheer and H. Shaziya, “A Study of the Optimization Algorithms in Deep Learning,” in 2019 Third International Conference on Inventive Systems and Control (ICISC) , IEEE, Jan. 2019, pp. 536 –539. doi: 10.1109/ICISC44355.2019.9036442

work page doi:10.1109/icisc44355.2019.9036442 2019
[4]

A Comparison of Optimization Algorithms for Deep Learning,

D. Soydaner, “A Comparison of Optimization Algorithms for Deep Learning,” Intern J Pattern Recognit Artif Intell, vol. 34, no. 13, p. 2052013, Dec. 2020, doi: 10.1142/S0218001420520138

work page doi:10.1142/s0218001420520138 2020
[5]

Road safety in Nigeria: unravelling the challenges, measures, and strategies for improvement,

C. Uzondu, S. Jamson, and G. Marsden, “Road safety in Nigeria: unravelling the challenges, measures, and strategies for improvement,” Int J Inj Contr Saf Promot , vol. 29, no. 4, pp. 522 –532, Oct. 2022, doi: 10.1080/17457300.2022.2087230

work page doi:10.1080/17457300.2022.2087230 2022
[6]

Road traffic accidents in Pakistan: unveiling the emergency service challenge,

M. A. Abdullah and M. A. Yasin, “Road traffic accidents in Pakistan: unveiling the emergency service challenge,” Journal of Basic & Clinical Medical Sciences, vol. 2, pp. 1–3, Jan. 2024, doi: 10.58398/0002.000007

work page doi:10.58398/0002.000007 2024
[7]

Building machine-learning models for reducing the severity of bicyclist road traffic injuries,

S. Birfir, A. Elalouf, and T. Rosenbloom, “Building machine-learning models for reducing the severity of bicyclist road traffic injuries,” Transportation Engineering , vol. 12, p. 100179, Jun. 2023, doi: 10.1016/j.treng.2023.100179. 37

work page doi:10.1016/j.treng.2023.100179 2023
[8]

Severity Prediction of Traffic Accidents with Recurrent Neural Networks,

M. Sameen and B. Pradhan, “Severity Prediction of Traffic Accidents with Recurrent Neural Networks,” Applied Sciences, vol. 7, no. 6, p. 476, Jun. 2017, doi: 10.3390/app7060476

work page doi:10.3390/app7060476 2017
[9]

Real-Time Crash Risk Prediction using Long Short-Term Memory Recurrent Neural Network,

J. Yuan, M. Abdel-Aty, Y. Gong, and Q. Cai, “Real-Time Crash Risk Prediction using Long Short-Term Memory Recurrent Neural Network,” Transportation Research Record: Journal of the Transportation Research Board , vol. 2673, no. 4, pp. 314–326, Apr. 2019, doi: 10.1177/0361198119840611

work page doi:10.1177/0361198119840611 2019
[10]

Accident Prediction Models for Urban Unsignalized Intersections in British Columbia,

T. Sayed and F. Rodriguez, “Accident Prediction Models for Urban Unsignalized Intersections in British Columbia,” Transportation Research Record: Journal of the Transportation Research Board , vol. 1665, no. 1, pp. 93–99, Jan. 1999, doi: 10.3141/1665-13

work page doi:10.3141/1665-13 1999
[11]

Towards an Accident Severity Prediction System with Logistic Regression,

H. Mensouri, A. Azmani, and M. Azmani, “Towards an Accident Severity Prediction System with Logistic Regression,” in International Conference on Advanced Intelligent Systems for Sustainable Development , Springer, 2023, pp. 396–410. doi: 10.1007/978-3-031-26384-2_34

work page doi:10.1007/978-3-031-26384-2_34 2023
[12]

An Accident Prediction Model Based on ARIMA in Kuala Lumpur, Malaysia, Using Time Series of Actual Accidents and Related Data,

B. Chong Choo, M. Abdul Razak, M. Z. Mohd Tohir, D. R. Awang Biak, and S. Syam, “An Accident Prediction Model Based on ARIMA in Kuala Lumpur, Malaysia, Using Time Series of Actual Accidents and Related Data,” Pertanika J Sci Technol, vol. 32, no. 3, pp. 1103–1122, Apr. 2024, doi: 10.47836/pjst.32.3.07

work page doi:10.47836/pjst.32.3.07 2024
[13]

Comparing Prediction Performance for Crash Injury Severity Among Various Machine Learning and Statistical Methods,

J. Zhang, Z. Li, Z. Pu, and C. Xu, “Comparing Prediction Performance for Crash Injury Severity Among Various Machine Learning and Statistical Methods,” IEEE Access , vol. 6, pp. 60079 –60087, 2018, doi: 10.1109/ACCESS.2018.2874979

work page doi:10.1109/access.2018.2874979 2018
[14]

Injury severity prediction of traffic crashes with ensemble machine learning techniques: a comparative study,

A. Jamal et al. , “Injury severity prediction of traffic crashes with ensemble machine learning techniques: a comparative study,” Int J Inj Contr Saf Promot , vol. 28, no. 4, pp. 408 –427, Oct. 2021, doi: 10.1080/17457300.2021.1928233

work page doi:10.1080/17457300.2021.1928233 2021
[15]

Decision Tree Model for Non - Fatal Road Accident Injury,

F. E. Sapri, N. S. Nordin, S. M. Hasan, W. F. Wan Yaacob, and S. A. Md Nasir, “Decision Tree Model for Non - Fatal Road Accident Injury,” Int J Adv Sci Eng Inf Technol , vol. 7, no. 1, p. 63, Feb. 2017, doi: 10.18517/ijaseit.7.1.1110

work page doi:10.18517/ijaseit.7.1.1110 2017
[16]

Performance of Traffic Accidents’ Prediction Models,

H. R. Al-Masaeid and F. J. Khaled, “Performance of Traffic Accidents’ Prediction Models,” Jordan Journal of Civil Engineering, vol. 17, no. 1, pp. 34–44, Jan. 2023, doi: 10.14525/JJCE.v17i1.04

work page doi:10.14525/jjce.v17i1.04 2023
[17]

Using support vector machine models for crash injury severity analysis,

Z. Li, P. Liu, W. Wang, and C. Xu, “Using support vector machine models for crash injury severity analysis,” Accid Anal Prev, vol. 45, pp. 478–486, Mar. 2012, doi: 10.1016/j.aap.2011.08.016

work page doi:10.1016/j.aap.2011.08.016 2012
[18]

A review on neural network techniques for the prediction of road traffic accident severity,

Md. E. Shaik, Md. M. Islam, and Q. S. Hossain, “A review on neural network techniques for the prediction of road traffic accident severity,” Asian Transport Studies , vol. 7, p. 100040, 2021, doi: 10.1016/j.eastsj.2021.100040

work page doi:10.1016/j.eastsj.2021.100040 2021
[19]

Intelligent Automated Interference for the Protection of Road Safety,

G. Pant, R. Bahuguna, S. Pandey, A. Gehlot, S. P. Yadav, and R. K. Pachauri, “Intelligent Automated Interference for the Protection of Road Safety,” in 2023 International Conference on Computational Intelligence, Communication Technology and Networking (CICTN) , IEEE, Apr. 2023, pp. 87 –91. doi: 10.1109/CICTN57981.2023.10141086

work page doi:10.1109/cictn57981.2023.10141086 2023
[20]

Severity Prediction of Traffic Accident Using an Artificial Neural Network,

S. Alkheder, M. Taamneh, and S. Taamneh, “Severity Prediction of Traffic Accident Using an Artificial Neural Network,” J Forecast, vol. 36, no. 1, pp. 100–108, Jan. 2017, doi: 10.1002/for.2425

work page doi:10.1002/for.2425 2017
[21]

Predicting Injury Severity Levels in Traffic Crashes: A Modeling Comparison,

M. A. Abdel -Aty and H. T. Abdelwahab, “Predicting Injury Severity Levels in Traffic Crashes: A Modeling Comparison,” J Transp Eng , vol. 130, no. 2, pp. 204 –210, Mar. 2004, doi: 10.1061/(ASCE)0733 - 947X(2004)130:2(204)

work page doi:10.1061/(asce)0733 2004
[22]

Predicting Crash Injury Severity with Machine Learning Algorithm Synergized with Clustering Technique: A Promising Protocol,

K. Assi, S. M. Rahman, U. Mansoor, and N. Ratrout, “Predicting Crash Injury Severity with Machine Learning Algorithm Synergized with Clustering Technique: A Promising Protocol,” Int J Environ Res Public Health, vol. 17, no. 15, p. 5497, Jul. 2020, doi: 10.3390/ijerph17155497

work page doi:10.3390/ijerph17155497 2020
[23]

Utilization of Artificial Neural Networks (Ann) in Predicting Accidents Within Maharlika Highway San Pablo City, Laguna,

Patrick Louie Jay R. Federizo, Marriel Bondad-Baet, Arhgy L. Batarlo, Paul Andrei Enriquez, Jimuel Edmon V. Landicho, and Juliana Marie B. Pareja, “Utilization of Artificial Neural Networks (Ann) in Predicting Accidents Within Maharlika Highway San Pablo City, Laguna,” International Journal of Latest Technology in Engineering Management & Applied Science ...

work page doi:10.51583/ijltemas.2025.140600049 2025
[24]

Prediction for traffic accident severity: comparing the artificial neural network, genetic algorithm, combined genetic algorithm and pattern search methods,

M. M. Kunt, I. Aghayan, and N. Noii, “Prediction for traffic accident severity: comparing the artificial neural network, genetic algorithm, combined genetic algorithm and pattern search methods,” Transport, vol. 26, no. 4, pp. 353–366, Jan. 2012, doi: 10.3846/16484142.2011.635465

work page doi:10.3846/16484142.2011.635465 2012
[25]

Severity prediction of motorcycle crashes with machine learning methods,

L. Wahab and H. Jiang, “Severity prediction of motorcycle crashes with machine learning methods,” International Journal of Crashworthiness , vol. 25, no. 5, pp. 485 –492, Sep. 2020, doi: 10.1080/13588265.2019.1616885

work page doi:10.1080/13588265.2019.1616885 2020
[26]

Predicting and explaining severity of road accident using artificial intelligence techniques, SHAP and feature analysis,

C. Panda, A. K. Mishra, A. K. Dash, and H. Nawab, “Predicting and explaining severity of road accident using artificial intelligence techniques, SHAP and feature analysis,” International Journal of Crashworthiness, vol. 28, no. 2, pp. 186–201, Mar. 2023, doi: 10.1080/13588265.2022.2074643. 38

work page doi:10.1080/13588265.2022.2074643 2023
[27]

A Survey of Neural Network Optimization Algorithms,

C. Ji, “A Survey of Neural Network Optimization Algorithms,” in 2024 IEEE 4th International Conference on Data Science and Computer Application (ICDSCA) , IEEE, Nov. 2024, pp. 1 –7. doi: 10.1109/ICDSCA63855.2024.10859435

work page doi:10.1109/icdsca63855.2024.10859435 2024
[28]

A modified Adam algorithm for deep neural network optimization,

M. Reyad, A. M. Sarhan, and M. Arafa, “A modified Adam algorithm for deep neural network optimization,” Neural Comput Appl, vol. 35, no. 23, pp. 17095–17112, Aug. 2023, doi: 10.1007/s00521-023-08568-z

work page doi:10.1007/s00521-023-08568-z 2023
[29]

Combining Optimization Methods Using an Adaptive Meta Optimizer,

N. Landro, I. Gallo, and R. La Grassa, “Combining Optimization Methods Using an Adaptive Meta Optimizer,” Algorithms, vol. 14, no. 6, p. 186, Jun. 2021, doi: 10.3390/a14060186

work page doi:10.3390/a14060186 2021
[30]

Distributed denial of service (DDoS) classification based on random forest model with backward elimination algorithm and grid search algorithm,

M. S. Sawah, H. Elmannai, A. A. El -Bary, Kh. Lotfy, and O. E. Sheta, “Distributed denial of service (DDoS) classification based on random forest model with backward elimination algorithm and grid search algorithm,” Sci Rep, vol. 15, no. 1, p. 19063, May 2025, doi: 10.1038/s41598-025-03868-x

work page doi:10.1038/s41598-025-03868-x 2025
[31]

Improving air quality prediction using hybrid BPSO with BWAO for feature selection and hyperparameters optimization,

M. S. Sawah, H. Elmannai, A. A. El -Bary, Kh. Lotfy, and O. E. Sheta, “Improving air quality prediction using hybrid BPSO with BWAO for feature selection and hyperparameters optimization,” Sci Rep, vol. 15, no. 1, p. 13176, Apr. 2025, doi: 10.1038/s41598-025-95983-y

work page doi:10.1038/s41598-025-95983-y 2025
[32]

GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks,

Zhao Chen, Vijay Badrinarayanan, Chen-Yu Lee, and Andrew Rabinovich, “GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks,” Proceedings of the 35th International Conference on Machine Learning, 2018

2018
[33]

Not All Samples Are Created Equal: Deep Learning with Importance Sampling,

Angelos Katharopoulos and Francois Fleuret, “Not All Samples Are Created Equal: Deep Learning with Importance Sampling,” Proceedings of the 35th International Conference on Machine Learning , 2018

2018
[34]

MentorNet: Learning Data -Driven Curriculum for Very Deep Neural Networks on Corrupted Labels,

Lu Jiang, Zhengyuan Zhou, Thomas Leung, Li -Jia Li, and Li Fei -Fei, “MentorNet: Learning Data -Driven Curriculum for Very Deep Neural Networks on Corrupted Labels,” Proceedings of the 35th International Conference on Machine Learning, 2018

2018
[35]

Automatic Curriculum Learning with Gradient Reward Signals,

Ryan Campbell and Junsang Yoon, “Automatic Curriculum Learning with Gradient Reward Signals,” Dec. 2023

2023
[36]

Convergence and Dynamical Behavior of the ADAM Algorithm for Nonconvex Stochastic Optimization,

A. Barakat and P. Bianchi, “Convergence and Dynamical Behavior of the ADAM Algorithm for Nonconvex Stochastic Optimization,” SIAM Journal on Optimization , vol. 31, no. 1, pp. 244 –274, Jan. 2021, doi: 10.1137/19M1263443

work page doi:10.1137/19m1263443 2021
[37]

Fault Diagnosis and Localization of Power Cables Using Bi -Directional Long Short Term Memory with Adam Optimizer,

L. Song, “Fault Diagnosis and Localization of Power Cables Using Bi -Directional Long Short Term Memory with Adam Optimizer,” in 2024 4th International Conference on Mobile Networks and Wireless Communications (ICMNWC), IEEE, Dec. 2024, pp. 01–05. doi: 10.1109/ICMNWC63764.2024.10872012

work page doi:10.1109/icmnwc63764.2024.10872012 2024
[38]

Refining the Performance of Indonesian-Javanese Bilingual Neural Machine Translation Using Adam Optimizer,

F. I. Putri, A. P. Wibawa, and L. H. Collante, “Refining the Performance of Indonesian-Javanese Bilingual Neural Machine Translation Using Adam Optimizer,” ILKOM Jurnal Ilmiah , vol. 16, no. 3, pp. 271 –282, Dec. 2024, doi: 10.33096/ilkom.v16i3.2467.271-282

work page doi:10.33096/ilkom.v16i3.2467.271-282 2024
[39]

Wireless Networks with Asynchronous Users

P. K. Mondal, S. S. Khan, M. T. Imrog, M. A. A. Arman, M. M. Islam, and A. U. H. Rupak, “Exploring Authorial Style in Bangla Literature: LSTM and Bi -LSTM -Based Author Detection,” in 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT) , IEEE, Jun. 2024, pp. 1 –9. doi: 10.1109/ICCCNT61001.2024.10725023

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1109/icccnt61001.2024.10725023 2024
[40]

Effective Adam -Optimized LSTM Neural Network for Electricity Price Forecasting,

Z. Chang, Y. Zhang, and W. Chen, “Effective Adam -Optimized LSTM Neural Network for Electricity Price Forecasting,” in 2018 IEEE 9th International Conference on Software Engineering and Service Science (ICSESS), IEEE, Nov. 2018, pp. 245–248. doi: 10.1109/ICSESS.2018.8663710

work page doi:10.1109/icsess.2018.8663710 2018
[41]

Analysis and Synthesis of Adaptive Gradient Algorithms in Machine Learning: The Case of AdaBound and MAdamSSM,

K. Chakrabarti and N. Chopra, “Analysis and Synthesis of Adaptive Gradient Algorithms in Machine Learning: The Case of AdaBound and MAdamSSM,” in 2022 IEEE 61st Conference on Decision and Control (CDC), IEEE, Dec. 2022, pp. 795–800. doi: 10.1109/CDC51059.2022.9992512

work page doi:10.1109/cdc51059.2022.9992512 2022
[42]

Convergence analysis of AdaBound with relaxed bound functions for non-convex optimization,

J. Liu, J. Kong, D. Xu, M. Qi, and Y. Lu, “Convergence analysis of AdaBound with relaxed bound functions for non-convex optimization,” Neural Networks , vol. 145, pp. 300 –307, Jan. 2022, doi: 10.1016/j.neunet.2021.10.026

work page doi:10.1016/j.neunet.2021.10.026 2022
[43]

A. Zhu, Y. Meng, and C. Zhang, “An improved Adam Algorithm using look-ahead,” in Proceedings of the 2017 International Conference on Deep Learning Technologies, New York, NY, USA: ACM, Jun. 2017, pp. 19–22. doi: 10.1145/3094243.3094249

[44]

J.-T. Wei, H.-H. Wu, and K.-Y. Kou, “Using Feature Selection to Reduce the Complexity in Analyzing the Injury Severity of Traffic Accidents,” in 2011 International Joint Conference on Service Sciences, IEEE, May 2011, pp. 329–333. doi: 10.1109/IJCSS.2011.73

[45]

S. Zhang, A. Khattak, C. M. Matara, A. Hussain, and A. Farooq, “Hybrid feature selection-based machine learning Classification system for the prediction of injury severity in single and multiple-vehicle accidents,” PLoS One, vol. 17, no. 2, p. e0262941, Feb. 2022, doi: 10.1371/journal.pone.0262941

[46]

Q. Wang, S. Gan, W. Chen, Q. Li, and B. Nie, “A data-driven, kinematic feature-based, near real-time algorithm for injury severity prediction of vehicle occupants,” Accid Anal Prev, vol. 156, p. 106149, Jun. 2021, doi: 10.1016/j.aap.2021.106149

[47]

A. Adefabi, S. Olisah, C. Obunadike, O. Oyetubo, E. Taiwo, and E. Tella, “Predicting Accident Severity: An Analysis of Factors Affecting Accident Severity Using Random Forest Model,” International Journal on Cybernetics & Informatics, vol. 12, no. 6, pp. 107–121, Oct. 2023, doi: 10.5121/ijci.2023.120609

[48]

T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2980–2988

[49]

P. T. Tran and L. T. Phong, “On the Convergence Proof of AMSGrad and a New Version,” IEEE Access, vol. 7, pp. 61706–61716, 2019, doi: 10.1109/ACCESS.2019.2916341

[50]

A. Graves and J. Schmidhuber, “Framewise phoneme classification with bidirectional LSTM and other neural network architectures,” Neural Networks, vol. 18, no. 5–6, pp. 602–610, Jul. 2005, doi: 10.1016/j.neunet.2005.06.042

[51]

A. Gajbhiye, S. Jaf, N. Al Moubayed, A. S. McGough, and S. Bradley, “An exploration of dropout with RNNs for natural language inference,” in International Conference on Artificial Neural Networks, Springer, 2018, pp. 157–167

[52]

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014

[53]

T. T. Bedane, “Road traffic accident dataset of Addis Ababa City,” Addis Ababa, 2020, doi: 10.17632/xytv86278f.1

[54]

T. Shadbahr et al., “The impact of imputation quality on machine learning classifiers for datasets with missing values,” Communications Medicine, vol. 3, no. 1, p. 139, Oct. 2023, doi: 10.1038/s43856-023-00356-z

[55]

P. J. García-Laencina, J.-L. Sancho-Gómez, and A. R. Figueiras-Vidal, “Pattern classification with missing data: a review,” Neural Comput Appl, vol. 19, no. 2, pp. 263–282, Mar. 2010, doi: 10.1007/s00521-009-0295-6

[56]

A. Elalouf, S. Birfir, and T. Rosenbloom, “Developing machine-learning-based models to diminish the severity of injuries sustained by pedestrians in road traffic incidents,” Heliyon, vol. 9, no. 11, p. e21371, Nov. 2023, doi: 10.1016/j.heliyon.2023.e21371

[57]

M. Bernhardt et al., “Active label cleaning for improved dataset quality under resource constraints,” Nat Commun, vol. 13, no. 1, p. 1161, Mar. 2022, doi: 10.1038/s41467-022-28818-3

[58]

N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002

[59]

G. E. Batista, R. C. Prati, and M. C. Monard, “A study of the behavior of several methods for balancing machine learning training data,” ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004

[60]

H. He, Y. Bai, E. A. Garcia, and S. Li, “ADASYN: Adaptive synthetic sampling approach for imbalanced learning,” in 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), IEEE, 2008, pp. 1322–1328

[61]

J. Singh et al., “Batch-balanced focal loss: a hybrid solution to class imbalance in deep learning,” Journal of Medical Imaging, vol. 10, no. 5, p. 51809, 2023

[62]

M. Yeung, E. Sala, C.-B. Schönlieb, and L. Rundo, “Unified focal loss: Generalising dice and cross entropy-based losses to handle class imbalanced medical image segmentation,” Computerized Medical Imaging and Graphics, vol. 95, p. 102026, 2022

[63]

R. Verma and M. M. Agarwal, “Road Accident Severity Prediction using Adaptive Custom Weight Initialization and Enhanced Focal Loss Integration Technique,” IETE J Res, pp. 1–13, 2025

[64]

M. Salmi, D. Atif, D. Oliva, A. Abraham, and S. Ventura, “Handling imbalanced medical datasets: review of a decade of research,” Artif Intell Rev, vol. 57, no. 10, p. 273, 2024

[65]

C. Chen, A. Liaw, and L. Breiman, “Using random forest to learn imbalanced data,” University of California, Berkeley, vol. 110, no. 1–12, p. 24, 2004

[66]

X.-Y. Liu, J. Wu, and Z.-H. Zhou, “Exploratory undersampling for class-imbalance learning,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 39, no. 2, pp. 539–550, 2008

[67]

C. Elkan, “The foundations of cost-sensitive learning,” in International Joint Conference on Artificial Intelligence, Lawrence Erlbaum Associates Ltd, 2001, pp. 973–978

[68]

Y. Cui, M. Jia, T.-Y. Lin, Y. Song, and S. Belongie, “Class-balanced loss based on effective number of samples,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 9268–9277

[69]

B. Kang et al., “Decoupling representation and classifier for long-tailed recognition,” arXiv preprint arXiv:1910.09217, 2019
