Deep Learning using Rectified Linear Units (ReLU)
The Rectified Linear Unit (ReLU) is a foundational activation function in artificial neural networks. Recent literature frequently misattributes its origin to the 2018 (initial) version of this paper, which exclusively investigated ReLU at the classification layer. This paper formally corrects the citation record by tracing the mathematical lineage of piecewise linear functions from early biological models to their definitive integration into deep learning by Nair & Hinton (2010). Alongside this historical rectification, we present a comprehensive empirical comparison of the ReLU, Hyperbolic Tangent (Tanh), and Logistic (Sigmoid) activation functions across image classification, text classification, and image reconstruction tasks. To ensure statistical robustness, we evaluated these functions using 10 independent randomized trials and assessed significance using the non-parametric Kruskal-Wallis $H$ test. The empirical data validates the theoretical limitations of saturating functions. Sigmoid failed to converge in deep convolutional vision tasks due to the vanishing gradient problem, yielding accuracies equivalent to random chance. Conversely, ReLU and Tanh exhibited stable convergence. ReLU achieved the highest mean accuracy and F1-score on image classification and text classification tasks, while Tanh yielded the highest peak signal-to-noise ratio (PSNR) in image reconstruction. Ultimately, this study confirms a statistically significant performance variance among activations, thus reaffirming the necessity of non-saturating functions in deep architectures, and restores proper historical attribution to prior literature.
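The vanishing-gradient behavior described in the abstract can be made concrete with a small sketch (illustrative only, not the authors' experimental code): the derivatives of sigmoid and tanh saturate toward zero away from the origin, while ReLU's derivative is exactly 1 for any positive input.

```python
import math

# Activation functions and their derivatives (illustrative sketch,
# not the paper's code).
def relu(x):
    return max(0.0, x)

def relu_grad(x):
    return 1.0 if x > 0 else 0.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x):
    return 1.0 - math.tanh(x) ** 2

# Backpropagation multiplies per-layer gradients. Sigmoid's
# derivative peaks at 0.25 (at x = 0), so a 10-layer chain shrinks
# by at least 0.25**10 ≈ 9.5e-7, while ReLU passes a gradient of 1
# through every positively activated unit.
for x in (0.0, 2.0, 5.0):
    print(f"x={x}: relu'={relu_grad(x):.4f}  "
          f"sigmoid'={sigmoid_grad(x):.6f}  tanh'={tanh_grad(x):.6f}")
```

Running the loop shows the saturating derivatives collapsing as |x| grows, which is the mechanism behind Sigmoid's failure to converge in the deep convolutional experiments.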
Forward citations
Cited by 15 Pith papers
-
QAP-Router: Tackling Qubit Routing as Dynamic Quadratic Assignment with Reinforcement Learning
QAP-Router models qubit routing as dynamic QAP and applies RL with a solution-aware Transformer to cut CNOT counts by 12-30% versus industry compilers on real circuit benchmarks.
-
Galaxy clusters in the LoTSS-DR3: Catalogues and detection pipeline for diffuse radio emission
A Radio U-Net pipeline produces pixel-level segmentation maps and probability scores for diffuse radio emission in 3822 galaxy clusters from LoTSS-DR3, yielding a high-confidence sample of 357 and confirming trends wi...
-
Machine learning isotope shifts in molecular energy levels
Neural network corrects residual errors in isotopologue energy extrapolations for CO2 (MAE reduction in >87% of levels vs Marvel) and transfers patterns to improve CO predictions in >93% of samples.
-
Winner-Take-All Spiking Transformer for Language Modeling
Winner-take-all spiking self-attention replaces softmax in spiking transformers to support language modeling on 16 datasets with spike-driven, energy-efficient architectures.
-
One-Step Score-Based Density Ratio Estimation
OS-DRE performs score-based density ratio estimation in one step by approximating the temporal score component with a closed-form RBF frame and providing error bounds from approximation theory.
-
Neural Assistive Impulses: Synthesizing Exaggerated Motions for Physics-based Characters
A hybrid neural policy operating in impulse space enables physics-based characters to track exaggerated, dynamically infeasible motions that standard DRL methods cannot stabilize.
-
Geometric Monomial (GEM): a family of rational 2N-differentiable activation functions
GEM is a new family of C^{2N}-smooth rational activation functions with variants that achieve performance on par with or exceeding GELU on ResNet, GPT-2, and BERT benchmarks.
-
Camera Control for Text-to-Image Generation via Learning Viewpoint Tokens
Viewpoint tokens learned on a mixed 3D-rendered and photorealistic dataset enable precise camera control in text-to-image generation while factorizing geometry from appearance and transferring to unseen object categories.
-
Force Field-Agnostic Phase Classification of Zeolitic Imidazolate Framework Polymorphs
Neural networks trained on molecular configurations from different force fields classify ZIF polymorph phases accurately in simulations and expose transition mechanisms without force-field bias.
-
Accelerating 4D Hyperspectral Imaging through Physics-Informed Neural Representation and Adaptive Sampling
A physics-informed MLP reconstructs high-fidelity 4D spectra from only 1/32 of the samples in experimental 2DIR hyperspectral imaging.
-
Machine Learning Enhanced Laser Spectroscopy for Multi-Species Gas Detection in Complex and Harsh Environments
Machine learning methods including denoising autoencoders, unsupervised interference mitigation, blind source separation, and certifiable classification are developed and experimentally validated to improve multi-spec...
-
Investigation of cardinality classification for bacterial colony counting using explainable artificial intelligence
XAI analysis identifies high visual similarity across colony cardinality classes as the primary limit on MicrobiaNet performance in bacterial colony counting, revising prior model assessments.
-
A Multi-head Attention Fusion Network for Industrial Prognostics under Discrete Operational Conditions
A multi-head attention fusion network integrates monotonic degradation trends, discrete operating state embeddings from clustering, and residual noise using BiLSTM and attention mechanisms to improve prognostic accura...
-
Agentic AI platforms for autonomous training and rule induction of human-human and virus-human protein-protein interactions
Agentic AI platforms autonomously train 87%-accurate PPI prediction models on protein-disjoint data and induce aligning human-readable rules for human-human and virus-human interactions.
-
Learning to count small and clustered objects with application to bacterial colonies
ACFamNet Pro reaches 9.64% mean normalized absolute error on bacterial colony images under 5-fold cross-validation, beating FamNet by 12.71%.
Reference graph
Works this paper leans on
-
[1]
Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray,...
-
[2]
TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. (2015). http://tensorflow.org/ Software available from tensorflow.org
- [3]
-
[4]
Abdulrahman Alalshekmubarak and Leslie S Smith. 2013. A novel approach combining recurrent neural network and support vector machines for time series classification. In Innovations in Information Technology (IIT), 2013 9th International Conference on. IEEE, 42–47
-
[5]
François Chollet et al. 2015. Keras. https://github.com/keras-team/keras. (2015)
-
[6]
Jan K Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua Bengio. 2015. Attention-based models for speech recognition. In Advances in Neural Information Processing Systems. 577–585
-
[7]
Richard HR Hahnloser, Rahul Sarpeshkar, Misha A Mahowald, Rodney J Douglas, and H Sebastian Seung. 2000. Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature 405, 6789 (2000), 947
-
[8]
J. D. Hunter. 2007. Matplotlib: A 2D graphics environment. Computing In Science & Engineering 9, 3 (2007), 90–95. https://doi.org/10.1109/MCSE.2007.55
-
[9]
Diederik Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
-
[10]
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems. 1097–1105
-
[11]
Yann LeCun, Corinna Cortes, and Christopher JC Burges. 2010. MNIST handwritten digit database. AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist 2 (2010)
-
[12]
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830
- [13]
-
[14]
Ludovic Trottier, Philippe Giguère, Brahim Chaib-draa, et al. 2017. Parametric exponential linear unit for deep convolutional neural networks. In Machine Learning and Applications (ICMLA), 2017 16th IEEE International Conference on. IEEE, 207–214
-
[15]
Stéfan van der Walt, S Chris Colbert, and Gael Varoquaux. 2011. The NumPy array: a structure for efficient numerical computation. Computing in Science & Engineering 13, 2 (2011), 22–30
- [16]
-
[17]
William H Wolberg, W Nick Street, and Olvi L Mangasarian. 1992. Breast cancer Wisconsin (diagnostic) data set. UCI Machine Learning Repository [http://archive.ics.uci.edu/ml/] (1992)
-
[18]
Han Xiao, Kashif Rasul, and Roland Vollgraf. 2017. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. (2017). arXiv:cs.LG/1708.07747
-
[19]
Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alexander J Smola, and Eduard H Hovy. 2016. Hierarchical Attention Networks for Document Classification. In HLT-NAACL. 1480–1489