Deep Learning using Rectified Linear Units (ReLU)
The Rectified Linear Unit (ReLU) is a foundational activation function in artificial neural networks. Recent literature frequently misattributes its origin to the 2018 (initial) version of this paper, which exclusively investigated ReLU at the classification layer. This paper formally corrects the citation record by tracing the mathematical lineage of piecewise linear functions from early biological models to their definitive integration into deep learning by Nair & Hinton (2010). Alongside this historical rectification, we present a comprehensive empirical comparison of the ReLU, Hyperbolic Tangent (Tanh), and Logistic (Sigmoid) activation functions across image classification, text classification, and image reconstruction tasks. To ensure statistical robustness, we evaluated these functions using 10 independent randomized trials and assessed significance using the non-parametric Kruskal-Wallis $H$ test. The empirical data validates the theoretical limitations of saturating functions. Sigmoid failed to converge in deep convolutional vision tasks due to the vanishing gradient problem, yielding accuracies equivalent to random chance. Conversely, ReLU and Tanh exhibited stable convergence. ReLU achieved the highest mean accuracy and F1-score on image classification and text classification tasks, while Tanh yielded the highest peak signal-to-noise ratio (PSNR) in image reconstruction. Ultimately, this study confirms a statistically significant performance variance among activations, thus reaffirming the necessity of non-saturating functions in deep architectures, and restores proper historical attribution to prior literature.
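The vanishing-gradient behavior described in the abstract can be made concrete with a small sketch (illustrative only, not the authors' experimental code): the derivatives of sigmoid and tanh saturate toward zero away from the origin, while ReLU's derivative is exactly 1 for any positive input.

```python
import math

# Activation functions and their derivatives (illustrative sketch,
# not the paper's code).
def relu(x):
    return max(0.0, x)

def relu_grad(x):
    return 1.0 if x > 0 else 0.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x):
    return 1.0 - math.tanh(x) ** 2

# Backpropagation multiplies per-layer gradients. Sigmoid's
# derivative peaks at 0.25 (at x = 0), so a 10-layer chain shrinks
# by at least 0.25**10 ≈ 9.5e-7, while ReLU passes a gradient of 1
# through every positively activated unit.
for x in (0.0, 2.0, 5.0):
    print(f"x={x}: relu'={relu_grad(x):.4f}  "
          f"sigmoid'={sigmoid_grad(x):.6f}  tanh'={tanh_grad(x):.6f}")
```

Running the loop shows the saturating derivatives collapsing as |x| grows, which is the mechanism behind Sigmoid's failure to converge in the deep convolutional experiments.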
Forward citations
Cited by 15 Pith papers
-
QAP-Router: Tackling Qubit Routing as Dynamic Quadratic Assignment with Reinforcement Learning
QAP-Router models qubit routing as dynamic QAP and applies RL with a solution-aware Transformer to cut CNOT counts by 12-30% versus industry compilers on real circuit benchmarks.
-
Galaxy clusters in the LoTSS-DR3: Catalogues and detection pipeline for diffuse radio emission
A Radio U-Net pipeline produces pixel-level segmentation maps and probability scores for diffuse radio emission in 3822 galaxy clusters from LoTSS-DR3, yielding a high-confidence sample of 357 and confirming trends wi...
-
Machine learning isotope shifts in molecular energy levels
Neural network corrects residual errors in isotopologue energy extrapolations for CO2 (MAE reduction in >87% of levels vs Marvel) and transfers patterns to improve CO predictions in >93% of samples.
-
Winner-Take-All Spiking Transformer for Language Modeling
Winner-take-all spiking self-attention replaces softmax in spiking transformers to support language modeling on 16 datasets with spike-driven, energy-efficient architectures.
-
One-Step Score-Based Density Ratio Estimation
OS-DRE performs score-based density ratio estimation in one step by approximating the temporal score component with a closed-form RBF frame and providing error bounds from approximation theory.
-
Neural Assistive Impulses: Synthesizing Exaggerated Motions for Physics-based Characters
A hybrid neural policy operating in impulse space enables physics-based characters to track exaggerated, dynamically infeasible motions that standard DRL methods cannot stabilize.
-
Geometric Monomial (GEM): a family of rational 2N-differentiable activation functions
GEM is a new family of C^{2N}-smooth rational activation functions with variants that achieve performance on par with or exceeding GELU on ResNet, GPT-2, and BERT benchmarks.
-
Camera Control for Text-to-Image Generation via Learning Viewpoint Tokens
Viewpoint tokens learned on a mixed 3D-rendered and photorealistic dataset enable precise camera control in text-to-image generation while factorizing geometry from appearance and transferring to unseen object categories.
-
Force Field-Agnostic Phase Classification of Zeolitic Imidazolate Framework Polymorphs
Neural networks trained on molecular configurations from different force fields classify ZIF polymorph phases accurately in simulations and expose transition mechanisms without force-field bias.
-
Accelerating 4D Hyperspectral Imaging through Physics-Informed Neural Representation and Adaptive Sampling
A physics-informed MLP reconstructs high-fidelity 4D spectra from only 1/32 of the samples in experimental 2DIR hyperspectral imaging.
-
Machine Learning Enhanced Laser Spectroscopy for Multi-Species Gas Detection in Complex and Harsh Environments
Machine learning methods including denoising autoencoders, unsupervised interference mitigation, blind source separation, and certifiable classification are developed and experimentally validated to improve multi-spec...
-
Investigation of cardinality classification for bacterial colony counting using explainable artificial intelligence
XAI analysis identifies high visual similarity across colony cardinality classes as the primary limit on MicrobiaNet performance in bacterial colony counting, revising prior model assessments.
-
A Multi-head Attention Fusion Network for Industrial Prognostics under Discrete Operational Conditions
A multi-head attention fusion network integrates monotonic degradation trends, discrete operating state embeddings from clustering, and residual noise using BiLSTM and attention mechanisms to improve prognostic accura...
-
Agentic AI platforms for autonomous training and rule induction of human-human and virus-human protein-protein interactions
Agentic AI platforms autonomously train 87%-accurate PPI prediction models on protein-disjoint data and induce aligning human-readable rules for human-human and virus-human interactions.
-
Learning to count small and clustered objects with application to bacterial colonies
ACFamNet Pro reaches 9.64% mean normalized absolute error on bacterial colony images under 5-fold cross-validation, beating FamNet by 12.71%.
Reference graph
Works this paper leans on
-
[1]
Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray,...
-
[2]
TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. (2015). http://tensorflow.org/ Software available from tensorflow.org
- [3]
-
[4]
Abdulrahman Alalshekmubarak and Leslie S Smith. 2013. A novel approach combining recurrent neural network and support vector machines for time series classification. In Innovations in Information Technology (IIT), 2013 9th International Conference on. IEEE, 42–47
-
[5]
François Chollet et al. 2015. Keras. https://github.com/keras-team/keras. (2015)
-
[6]
Jan K Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua Bengio. 2015. Attention-based models for speech recognition. In Advances in Neural Information Processing Systems. 577–585
-
[7]
Richard HR Hahnloser, Rahul Sarpeshkar, Misha A Mahowald, Rodney J Douglas, and H Sebastian Seung. 2000. Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature 405, 6789 (2000), 947
-
[8]
J. D. Hunter. 2007. Matplotlib: A 2D graphics environment. Computing In Science & Engineering 9, 3 (2007), 90–95. https://doi.org/10.1109/MCSE.2007.55
-
[9]
Diederik Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
-
[10]
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems. 1097–1105
-
[11]
Yann LeCun, Corinna Cortes, and Christopher JC Burges. 2010. MNIST handwritten digit database. AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist 2 (2010)
-
[12]
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830
- [13]
-
[14]
Ludovic Trottier, Philippe Giguère, Brahim Chaib-draa, et al. 2017. Parametric exponential linear unit for deep convolutional neural networks. In Machine Learning and Applications (ICMLA), 2017 16th IEEE International Conference on. IEEE, 207–214
-
[15]
Stéfan van der Walt, S Chris Colbert, and Gael Varoquaux. 2011. The NumPy array: a structure for efficient numerical computation. Computing in Science & Engineering 13, 2 (2011), 22–30
- [16]
-
[17]
William H Wolberg, W Nick Street, and Olvi L Mangasarian. 1992. Breast cancer Wisconsin (diagnostic) data set. UCI Machine Learning Repository [http://archive.ics.uci.edu/ml/] (1992)
-
[18]
Han Xiao, Kashif Rasul, and Roland Vollgraf. 2017. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. (2017). arXiv:cs.LG/1708.07747
-
[19]
Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alexander J Smola, and Eduard H Hovy. 2016. Hierarchical Attention Networks for Document Classification. In HLT-NAACL. 1480–1489