Pith: machine review for the scientific record.

arxiv: 1803.08375 · v3 · submitted 2018-03-22 · 💻 cs.NE · cs.CV · cs.LG · stat.ML

Recognition: unknown

Deep Learning using Rectified Linear Units (ReLU)

Abien Fred Agarap

classification 💻 cs.NE · cs.CV · cs.LG · stat.ML
keywords: relu · classification · functions · deep · image · linear · tanh · tasks
Abstract

The Rectified Linear Unit (ReLU) is a foundational activation function in artificial neural networks. Recent literature frequently misattributes its origin to the 2018 (initial) version of this paper, which exclusively investigated ReLU at the classification layer. This paper formally corrects the citation record by tracing the mathematical lineage of piecewise linear functions from early biological models to their definitive integration into deep learning by Nair & Hinton (2010). Alongside this historical rectification, we present a comprehensive empirical comparison of the ReLU, Hyperbolic Tangent (Tanh), and Logistic (Sigmoid) activation functions across image classification, text classification, and image reconstruction tasks. To ensure statistical robustness, we evaluated these functions using 10 independent randomized trials and assessed significance using the non-parametric Kruskal-Wallis $H$ test. The empirical data validates the theoretical limitations of saturating functions. Sigmoid failed to converge in deep convolutional vision tasks due to the vanishing gradient problem, yielding accuracies equivalent to random chance. Conversely, ReLU and Tanh exhibited stable convergence. ReLU achieved the highest mean accuracy and F1-score on image classification and text classification tasks, while Tanh yielded the highest peak signal-to-noise ratio in image reconstruction. Ultimately, this study confirms a statistically significant performance variance among activations, reaffirming the necessity of non-saturating functions in deep architectures, and restores proper historical attribution to prior literature.
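The three activations and the test statistic named in the abstract can be sketched in plain Python. This is a minimal illustration, not the paper's code: the scalar functions below are the standard textbook definitions, and `kruskal_h` computes the Kruskal-Wallis $H$ statistic with tie-averaged ranks but no tie correction or p-value (in practice one would use `scipy.stats.kruskal`, which handles both).

```python
import math

# The three activations compared in the paper, in scalar form.
def relu(x):     # max(0, x): non-saturating for x > 0
    return max(0.0, x)

def tanh(x):     # saturates at ±1; gradients vanish for large |x|
    return math.tanh(x)

def sigmoid(x):  # saturates at 0 and 1; prone to vanishing gradients
    return 1.0 / (1.0 + math.exp(-x))

# Kruskal-Wallis H statistic over k groups of samples
# (e.g. per-activation accuracies from the 10 randomized trials).
def kruskal_h(*groups):
    pooled = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n:
        # find the run of tied values starting at i
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        i = j
    # H = 12/(N(N+1)) * sum(R_i^2 / n_i) - 3(N+1)
    return 12.0 / (n * (n + 1)) * sum(
        rs ** 2 / len(g) for rs, g in zip(rank_sums, groups)
    ) - 3.0 * (n + 1)
```

For two fully separated groups such as `kruskal_h([1, 2, 3], [4, 5, 6])` the statistic is large (about 3.86), while identical groups give 0; the paper's reported significance comes from comparing $H$ against the chi-squared distribution with $k-1$ degrees of freedom.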

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 15 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. QAP-Router: Tackling Qubit Routing as Dynamic Quadratic Assignment with Reinforcement Learning

    quant-ph 2026-05 unverdicted novelty 7.0

    QAP-Router models qubit routing as dynamic QAP and applies RL with a solution-aware Transformer to cut CNOT counts by 12-30% versus industry compilers on real circuit benchmarks.

  2. Galaxy clusters in the LoTSS-DR3: Catalogues and detection pipeline for diffuse radio emission

    astro-ph.CO 2026-05 unverdicted novelty 7.0

    A Radio U-Net pipeline produces pixel-level segmentation maps and probability scores for diffuse radio emission in 3822 galaxy clusters from LoTSS-DR3, yielding a high-confidence sample of 357 and confirming trends wi...

  3. Machine learning isotope shifts in molecular energy levels

    astro-ph.EP 2026-04 unverdicted novelty 7.0

    Neural network corrects residual errors in isotopologue energy extrapolations for CO2 (MAE reduction in >87% of levels vs Marvel) and transfers patterns to improve CO predictions in >93% of samples.

  4. Winner-Take-All Spiking Transformer for Language Modeling

    cs.NE 2026-04 unverdicted novelty 7.0

    Winner-take-all spiking self-attention replaces softmax in spiking transformers to support language modeling on 16 datasets with spike-driven, energy-efficient architectures.

  5. One-Step Score-Based Density Ratio Estimation

    stat.ML 2026-04 unverdicted novelty 7.0

    OS-DRE performs score-based density ratio estimation in one step by approximating the temporal score component with a closed-form RBF frame and providing error bounds from approximation theory.

  6. Neural Assistive Impulses: Synthesizing Exaggerated Motions for Physics-based Characters

    cs.AI 2026-04 unverdicted novelty 7.0

    A hybrid neural policy operating in impulse space enables physics-based characters to track exaggerated, dynamically infeasible motions that standard DRL methods cannot stabilize.

  7. Geometric Monomial (GEM): a family of rational 2N-differentiable activation functions

    cs.LG 2026-04 unverdicted novelty 6.0

    GEM is a new family of C^{2N}-smooth rational activation functions with variants that achieve performance on par with or exceeding GELU on ResNet, GPT-2, and BERT benchmarks.

  8. Camera Control for Text-to-Image Generation via Learning Viewpoint Tokens

    cs.CV 2026-04 unverdicted novelty 6.0

    Viewpoint tokens learned on a mixed 3D-rendered and photorealistic dataset enable precise camera control in text-to-image generation while factorizing geometry from appearance and transferring to unseen object categories.

  9. Force Field-Agnostic Phase Classification of Zeolitic Imidazolate Framework Polymorphs

    cond-mat.mtrl-sci 2026-04 unverdicted novelty 6.0

    Neural networks trained on molecular configurations from different force fields classify ZIF polymorph phases accurately in simulations and expose transition mechanisms without force-field bias.

  10. Accelerating 4D Hyperspectral Imaging through Physics-Informed Neural Representation and Adaptive Sampling

    eess.IV 2026-04 unverdicted novelty 6.0

    A physics-informed MLP reconstructs high-fidelity 4D spectra from only 1/32 of the samples in experimental 2DIR hyperspectral imaging.

  11. Machine Learning Enhanced Laser Spectroscopy for Multi-Species Gas Detection in Complex and Harsh Environments

    physics.optics 2026-05 unverdicted novelty 5.0

    Machine learning methods including denoising autoencoders, unsupervised interference mitigation, blind source separation, and certifiable classification are developed and experimentally validated to improve multi-spec...

  12. Investigation of cardinality classification for bacterial colony counting using explainable artificial intelligence

    cs.CV 2026-04 unverdicted novelty 5.0

    XAI analysis identifies high visual similarity across colony cardinality classes as the primary limit on MicrobiaNet performance in bacterial colony counting, revising prior model assessments.

  13. A Multi-head Attention Fusion Network for Industrial Prognostics under Discrete Operational Conditions

    cs.LG 2026-04 unverdicted novelty 5.0

    A multi-head attention fusion network integrates monotonic degradation trends, discrete operating state embeddings from clustering, and residual noise using BiLSTM and attention mechanisms to improve prognostic accura...

  14. Agentic AI platforms for autonomous training and rule induction of human-human and virus-human protein-protein interactions

    cs.AI 2026-04 unverdicted novelty 4.0

    Agentic AI platforms autonomously train 87%-accurate PPI prediction models on protein-disjoint data and induce aligning human-readable rules for human-human and virus-human interactions.

  15. Learning to count small and clustered objects with application to bacterial colonies

    cs.CV 2026-04 unverdicted novelty 4.0

    ACFamNet Pro reaches 9.64% mean normalized absolute error on bacterial colony images under 5-fold cross-validation, beating FamNet by 12.71%.

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages · cited by 15 Pith papers · 2 internal anchors

  1. [1]

    Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray,...

  2. [2]

    TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. (2015). http://tensorflow.org/ Software available from tensorflow.org

  3. [3]

    Abien Fred Agarap. 2017. A Neural Network Architecture Combining Gated Recurrent Unit (GRU) and Support Vector Machine (SVM) for Intrusion Detection in Network Traffic Data. arXiv preprint arXiv:1709.03082 (2017)

  4. [4]

    Abdulrahman Alalshekmubarak and Leslie S Smith. 2013. A novel approach combining recurrent neural network and support vector machines for time series classification. In Innovations in Information Technology (IIT), 2013 9th International Conference on. IEEE, 42–47

  5. [5]

    François Chollet et al. 2015. Keras. https://github.com/keras-team/keras. (2015)

  6. [6]

    Jan K Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua Bengio. 2015. Attention-based models for speech recognition. In Advances in Neural Information Processing Systems. 577–585

  7. [7]

    Richard HR Hahnloser, Rahul Sarpeshkar, Misha A Mahowald, Rodney J Douglas, and H Sebastian Seung. 2000. Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature 405, 6789 (2000), 947

  8. [8]

    J. D. Hunter. 2007. Matplotlib: A 2D graphics environment. Computing In Science & Engineering 9, 3 (2007), 90–95. https://doi.org/10.1109/MCSE.2007.55

  9. [9]

    Diederik Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

  10. [10]

    Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems. 1097–1105

  11. [11]

    Yann LeCun, Corinna Cortes, and Christopher JC Burges. 2010. MNIST handwritten digit database. AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist 2 (2010)

  12. [12]

    F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830

  13. [13]

    Yichuan Tang. 2013. Deep learning using linear support vector machines. arXiv preprint arXiv:1306.0239 (2013)

  14. [14]

    Ludovic Trottier, Philippe Giguère, Brahim Chaib-draa, et al. 2017. Parametric exponential linear unit for deep convolutional neural networks. In Machine Learning and Applications (ICMLA), 2017 16th IEEE International Conference on. IEEE, 207–214

  15. [15]

    Stéfan van der Walt, S Chris Colbert, and Gael Varoquaux. 2011. The NumPy array: a structure for efficient numerical computation. Computing in Science & Engineering 13, 2 (2011), 22–30

  16. [16]

    Tsung-Hsien Wen, Milica Gasic, Nikola Mrksic, Pei-Hao Su, David Vandyke, and Steve Young. 2015. Semantically conditioned lstm-based natural language generation for spoken dialogue systems. arXiv preprint arXiv:1508.01745 (2015)

  17. [17]

    William H Wolberg, W Nick Street, and Olvi L Mangasarian. 1992. Breast cancer Wisconsin (diagnostic) data set. UCI Machine Learning Repository [http://archive.ics.uci.edu/ml/] (1992)

  18. [18]

    Han Xiao, Kashif Rasul, and Roland Vollgraf. 2017. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. (2017). arXiv:cs.LG/1708.07747

  19. [19]

    Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alexander J Smola, and Eduard H Hovy. 2016. Hierarchical Attention Networks for Document Classification. In HLT-NAACL. 1480–1489