
arxiv: 2605.24797 · v1 · pith:HDOCTOBQnew · submitted 2026-05-24 · 💻 cs.CV

HCL-FF: Hierarchical and Contrastive Learning for Forward-Forward Algorithm

Pith reviewed 2026-06-30 12:33 UTC · model grok-4.3

classification 💻 cs.CV
keywords Forward-Forward algorithm · hierarchical learning · contrastive learning · local training · image classification · CIFAR-10 · CIFAR-100 · Tiny-ImageNet

The pith

HCL-FF adds a coarse-to-fine hierarchical strategy and supervised contrastive objective to the Forward-Forward algorithm to fix layer coordination and semantic ambiguity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The Forward-Forward (FF) algorithm trains each layer locally through goodness objectives, but it lacks coordination across layers and leaves features semantically ambiguous. The paper introduces HCL-FF, which uses a hierarchical strategy to guide representations progressively from low-level cues to high-level semantics and a contrastive alignment to tie features to class labels. The reported experiments show accuracy gains over prior FF methods of +5.46% on CIFAR-10, +17.00% on CIFAR-100, and +12.51% on Tiny-ImageNet. A sympathetic reader would care because the changes keep the local training property intact while making the approach more competitive for practical vision tasks.
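For readers new to FF, the idea of a local goodness objective can be sketched in a few lines. This is a minimal illustration that assumes Hinton's original formulation (goodness as the sum of squared activations, compared against a threshold); the function names, the threshold `theta`, and the toy activations are illustrative assumptions, and the goodness function actually used in HCL-FF may differ.

```python
import math

def goodness(activations):
    """Goodness of one layer: sum of squared activations (assumed, after Hinton)."""
    return sum(a * a for a in activations)

def softplus(x):
    """Numerically simple softplus, log(1 + exp(x)); fine for small toy inputs."""
    return math.log1p(math.exp(x))

def ff_layer_loss(pos_acts, neg_acts, theta=2.0):
    """Local loss for a single layer: push positive goodness above theta
    and negative goodness below it. No other layer is involved."""
    return softplus(theta - goodness(pos_acts)) + softplus(goodness(neg_acts) - theta)

pos = [1.2, 0.9, 1.5]   # toy activations for a positive sample (hypothetical)
neg = [0.1, 0.2, 0.1]   # toy activations for a negative sample (hypothetical)

well_separated = ff_layer_loss(pos, neg)   # positive is high, negative is low
swapped = ff_layer_loss(neg, pos)          # the wrong way round, so a larger loss
```

Because the loss depends only on one layer's own activations, each layer can be updated without gradients from the layers above it, which is the locality property the pith refers to.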

Core claim

The paper claims that the Hierarchical and Contrastive Learning FF framework (HCL-FF) overcomes the lack of hierarchical coordination and semantically ambiguous features in the Forward-Forward algorithm by introducing a coarse-to-fine hierarchical learning strategy and a supervised contrastive objective, achieving new state-of-the-art performance among FF-based methods with accuracy gains of +5.46% on CIFAR-10, +17.00% on CIFAR-100, and +12.51% on Tiny-ImageNet.

What carries the argument

The coarse-to-fine hierarchical learning strategy that builds representations from low-level cues to high-level semantics, combined with a supervised contrastive objective that enforces class-discriminative alignment after goodness decoupling.
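To make the contrastive half of the mechanism concrete, here is a minimal sketch of a supervised contrastive loss in the style of Khosla et al. (reference 21). It is not the paper's implementation: the function name `supcon_loss`, the temperature of 0.1, and the toy feature vectors are assumptions chosen for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss over one batch, averaged over anchors
    that have at least one same-class partner."""
    z = [normalize(f) for f in features]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                for a in range(n) if a != i}
        m = max(sims.values())  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

labels = [0, 0, 1, 1]
clustered = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # classes agree with geometry
mixed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]      # classes cut across geometry

loss_clustered = supcon_loss(clustered, labels)
loss_mixed = supcon_loss(mixed, labels)
```

The loss is small when same-class features already point in the same direction and large when they do not, which is the sense in which such an objective enforces class-discriminative alignment.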

If this is right

  • FF-based training becomes competitive on complex image classification benchmarks while preserving layer-independent updates.
  • Representations gain semantic clarity and class discriminability without requiring global backpropagation.
  • The method provides a concrete way to add cross-layer coordination to any local goodness-based learner.
  • The reported gains are larger on the harder benchmarks: +17.00% on CIFAR-100 and +12.51% on Tiny-ImageNet, versus +5.46% on CIFAR-10.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same two additions could be tested on non-vision domains such as audio or text where local training is desirable.
  • The contrastive component might be adapted to fully unsupervised settings by replacing class labels with other similarity signals.
  • Hardware implementations that already support local updates could adopt HCL-FF with minimal extra communication between layers.

Load-bearing premise

The hierarchical strategy and contrastive objective can be added without undermining the purely local, layer-independent training that defines the Forward-Forward approach.

What would settle it

An ablation that removes either the hierarchical layering or the contrastive objective and checks whether the reported accuracy gains disappear on CIFAR-100.
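The ablation described above amounts to a two-by-two grid over the two additions. A minimal sketch of how those configurations could be enumerated follows; the key names `hierarchical` and `contrastive` and the helper functions are hypothetical, and no training code or results are implied.

```python
from itertools import product

def ablation_grid():
    """Enumerate the four on/off combinations of the two HCL-FF additions."""
    return [
        {"hierarchical": h, "contrastive": c}
        for h, c in product([False, True], repeat=2)
    ]

def describe(config):
    """Human-readable name for one configuration."""
    parts = [name for name, enabled in config.items() if enabled]
    return "FF + " + " + ".join(parts) if parts else "FF baseline"

configs = ablation_grid()
names = [describe(c) for c in configs]
```

Running each configuration on CIFAR-100 under the same schedule would show whether either component alone accounts for the reported gain.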

Figures

Figures reproduced from arXiv: 2605.24797 by C.-C. Jay Kuo, Hong-En Chen, Jie-En Yao.

Figure 1: (a) Overall architecture. (b) Structure of a residual block. (c) CW-Conv layer. Channels are partitioned into …
Figure 2: Hierarchy of CIFAR-10 classes constructed via cluster …
Figure 3: Layer-wise classification accuracy on CIFAR-10. The …
Figure 4: t-SNE visualization of goodness and goodness …
Figure 5: Visualization of class-wise goodness responses from the final layer of each residual block on the CIFAR-10 dataset. Each column …
Figure 6: Hierarchy of CIFAR-10 classes constructed via cluster …
Figure 7: Layer-wise super-class accuracy at different hierarchy …
Figure 9: CIFAR-10 training curves under a matched 150-epoch …
Original abstract

Deep neural networks trained with backpropagation have achieved outstanding performance in vision tasks but remain biologically implausible, computationally demanding, and difficult to interpret. The Forward-Forward (FF) algorithm offers a promising alternative by training each layer independently through local goodness objectives. However, its purely local optimization lacks hierarchical coordination across layers, and the decoupling of goodness from features leaves the representations unconstrained and semantically ambiguous. We propose a Hierarchical and Contrastive Learning FF framework (HCL-FF) to address these limitations. HCL-FF introduces (1) a coarse-to-fine hierarchical learning strategy that guides representations from low-level cues to high-level semantics, and (2) a supervised contrastive objective that enforces class-discriminative alignment after goodness decoupling. Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that HCL-FF achieves new state-of-the-art performance among FF-based methods, with notable accuracy gains of +5.46%, +17.00%, and +12.51%, respectively.
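One way to picture the coarse-to-fine strategy in the abstract is as a mapping from layer depth to label granularity: early layers are supervised with coarse super-classes and later layers with fine classes. The sketch below assumes an even split of layers across levels and uses a hand-written toy hierarchy over eight CIFAR-10 class names; both are illustrative assumptions, not the hierarchy of Figure 2 or the paper's own layer-to-level mapping.

```python
# Hypothetical hierarchy: each fine class maps to a path from coarse to fine.
HIERARCHY = {
    "airplane":   ("vehicle", "air-or-sea", "airplane"),
    "ship":       ("vehicle", "air-or-sea", "ship"),
    "automobile": ("vehicle", "road", "automobile"),
    "truck":      ("vehicle", "road", "truck"),
    "cat":        ("animal", "pet", "cat"),
    "dog":        ("animal", "pet", "dog"),
    "deer":       ("animal", "wild", "deer"),
    "frog":       ("animal", "wild", "frog"),
}
NUM_LEVELS = 3

def level_for_layer(layer, num_layers, num_levels=NUM_LEVELS):
    """Assign early layers to coarse levels and late layers to fine ones
    (an even split, assumed here for illustration)."""
    return min(layer * num_levels // num_layers, num_levels - 1)

def target_for_layer(fine_label, layer, num_layers):
    """Label that a given layer would be trained against for one sample."""
    return HIERARCHY[fine_label][level_for_layer(layer, num_layers)]

# Targets seen by each of six layers for a single "cat" image.
targets = [target_for_layer("cat", layer, 6) for layer in range(6)]
```

Each layer still receives only its own label, so a scheme of this kind adds a shared curriculum across depth without passing gradients between layers.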

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes HCL-FF, an extension of the Forward-Forward algorithm that adds a coarse-to-fine hierarchical learning strategy to coordinate representations across layers and a supervised contrastive objective to enforce class-discriminative features after goodness decoupling. Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet are reported to yield accuracy gains of +5.46%, +17.00%, and +12.51% respectively over prior FF-based methods, establishing new state-of-the-art results within the FF family.

Significance. If the empirical claims hold after full experimental details are supplied, the work would show that targeted global signals can be injected into layer-local training without destroying the locality that distinguishes FF from backpropagation. This would strengthen the case for FF-style algorithms as competitive, biologically motivated alternatives on standard vision benchmarks.

major comments (1)
  1. [Abstract] The central claim of new state-of-the-art performance rests on reported accuracy gains of +5.46%, +17.00%, and +12.51%, yet the abstract supplies no information on the precise baseline FF methods, their absolute accuracies, the number of independent runs, hyperparameter search protocol, or statistical significance tests. These omissions render the magnitude and reliability of the improvements impossible to assess from the provided text.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and positive evaluation of the work's potential impact. We address the sole major comment below and will revise the manuscript to strengthen the abstract.

Point-by-point responses
  1. Referee: [Abstract] The central claim of new state-of-the-art performance rests on reported accuracy gains of +5.46%, +17.00%, and +12.51%, yet the abstract supplies no information on the precise baseline FF methods, their absolute accuracies, the number of independent runs, hyperparameter search protocol, or statistical significance tests. These omissions render the magnitude and reliability of the improvements impossible to assess from the provided text.

    Authors: We agree that the abstract should be more self-contained. In the revised version we will explicitly name the baseline FF methods (original FF, FF with local contrastive loss, and the strongest prior variant), report their absolute accuracies alongside the gains, state that results are averaged over 5 independent runs with standard deviations, briefly note the hyperparameter protocol (grid search over learning rate and layer-wise goodness thresholds), and indicate that gains exceed run-to-run variance. Full tables, search details, and significance testing appear in Section 4; the abstract update will simply surface the key numbers without lengthening the text excessively.

    Revision: yes
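The reporting promised in the rebuttal (mean and standard deviation over 5 runs, with gains compared against run-to-run variance) can be sketched as follows. The accuracy values are placeholders invented for this example and are not taken from the paper; the helper names and the factor `k` are also assumptions, and the check is a rough heuristic rather than a formal significance test.

```python
from statistics import mean, stdev

def summarize(runs):
    """Mean and sample standard deviation of per-run accuracies."""
    return mean(runs), stdev(runs)

def gain_exceeds_variance(baseline_runs, method_runs, k=2.0):
    """Crude check: is the mean gain larger than k times the combined spread?"""
    b_mean, b_std = summarize(baseline_runs)
    m_mean, m_std = summarize(method_runs)
    spread = (b_std ** 2 + m_std ** 2) ** 0.5
    return (m_mean - b_mean) > k * spread

# Placeholder accuracies for illustration only, NOT results from the paper.
baseline = [70.1, 69.8, 70.4, 70.0, 69.7]
method = [75.2, 75.6, 75.0, 75.4, 75.3]

baseline_mean, baseline_std = summarize(baseline)
clear_gain = gain_exceeds_variance(baseline, method)
```

A revised abstract that states the baseline mean, the method mean, and both standard deviations would let a reader repeat this comparison directly.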

Circularity Check

0 steps flagged

No significant circularity detected


The paper proposes an algorithmic extension to the Forward-Forward method (hierarchical coarse-to-fine strategy plus supervised contrastive loss) and reports empirical accuracy on public benchmarks (CIFAR-10/100, Tiny-ImageNet). No equations, uniqueness theorems, or first-principles derivations are present that reduce reported results to quantities defined by the method's own fitted parameters or self-citations. All central claims are direct experimental measurements on external datasets, making the evaluation chain independent of internal construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so the ledger is necessarily incomplete. No explicit free parameters, axioms, or invented entities are named; the method implicitly relies on standard supervised contrastive loss formulations and hierarchical training schedules whose concrete values are not disclosed.

pith-pipeline@v0.9.1-grok · 5713 in / 1282 out tokens · 33821 ms · 2026-06-30T12:33:09.684065+00:00 · methodology


Reference graph

Works this paper leans on

68 extracted references · 16 canonical work pages · 4 internal anchors

  1. Hossein Aghagolzadeh and Mehdi Ezoji. Marginal contrastive loss: A step forward for forward-forward. In 2024 13th Iranian/3rd International Machine Vision and Image Processing Conference (MVIP), pages 1–6. IEEE, 2024.

  2. Md Atik Ahamed, Jin Chen, and Abdullah-Al-Zubaer Imran. Forward-forward contrastive learning. arXiv preprint arXiv:2305.02927, 2023.

  3. Mohamed Akrout, Collin Wilson, Peter Humphreys, Timothy Lillicrap, and Douglas B Tweed. Deep learning without weight transport. Advances in neural information processing systems, 32, 2019.

  4. Saleh Baghersalimi, Alireza Amirshahi, Tomas Teijeiro, Amir Aminifar, and David Atienza. Layer-wise learning framework for efficient dnn deployment in biomedical wearable systems. In 2023 IEEE 19th International Conference On Body Sensor Networks (BSN), pages 1–4. IEEE, 2023.

  5. Sergey Bartunov, Adam Santoro, Blake Richards, Luke Marris, Geoffrey E Hinton, and Timothy Lillicrap. Assessing the scalability of biologically-motivated deep learning algorithms and architectures. Advances in neural information processing systems, 31, 2018.

  6. Yoshua Bengio. How auto-encoders could provide credit assignment in deep networks via target propagation. arXiv preprint arXiv:1407.7906, 2014.

  7. Xing Chen, Dongshu Liu, Jérémie Laydevant, and Julie Grollier. Self-contrastive forward-forward algorithm. Nature Communications, 16(1):5978, 2025.

  8. Adam Coates, Andrew Ng, and Honglak Lee. An analysis of single-layer networks in unsupervised feature learning. In Proceedings of the fourteenth international conference on artificial intelligence and statistics, pages 215–223. JMLR Workshop and Conference Proceedings, 2011.

  9. Francis Crick. The recent excitement about neural networks. Nature, 337(6203):129–132, 1989.

  10. Giorgia Dellaferrera and Gabriel Kreiman. Error-driven input modulation: solving the credit assignment problem without a backward pass. In International Conference on Machine Learning, pages 4937–4955. PMLR, 2022.

  11. Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. IEEE, 2009.

  12. Thomas Dooms, Ing Jyh Tsang, and Jose Oramas. The trifecta: Three simple techniques for training deeper forward-forward networks. arXiv preprint arXiv:2311.18130, 2023.

  13. Maxence M Ernoult, Fabrice Normandin, Abhinav Moudgil, Sean Spinney, Eugene Belilovsky, Irina Rish, Blake Richards, and Yoshua Bengio. Towards scaling difference target propagation by learning backprop targets. In International Conference on Machine Learning, pages 5968–5987. PMLR, 2022.

  14. Charlotte Frenkel, Martin Lefebvre, and David Bol. Learning without feedback: Fixed random learning signals allow for feedforward training of deep neural networks. Frontiers in neuroscience, 15:629892, 2021.

  15. Fabio Giampaolo, Stefano Izzo, Edoardo Prezioso, and Francesco Piccialli. Investigating random variations of the forward-forward algorithm for training neural networks. In 2023 International Joint Conference on Neural Networks (IJCNN), pages 1–7. IEEE, 2023.

  16. Stephen Grossberg. Competitive learning: From interactive activation to adaptive resonance. Cognitive science, 11(1):23–63, 1987.

  17. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.

  18. Geoffrey Hinton. The forward-forward algorithm: Some preliminary investigations. arXiv preprint arXiv:2212.13345, 2(3):5, 2022.

  19. Max Jaderberg, Wojciech Marian Czarnecki, Simon Osindero, Oriol Vinyals, Alex Graves, David Silver, and Koray Kavukcuoglu. Decoupled neural interfaces using synthetic gradients. In International conference on machine learning, pages 1627–1635. PMLR, 2017.

  20. Adrien Journé, Hector Garcia Rodriguez, Qinghai Guo, and Timoleon Moraitis. Hebbian deep learning without feedback. arXiv preprint arXiv:2209.11883, 2022.

  21. Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. Supervised contrastive learning. Advances in neural information processing systems, 33:18661–18673,

  22. Adam Kohan, Edward A Rietman, and Hava T Siegelmann. Signal propagation: The framework for learning and inference in a forward pass. IEEE Transactions on Neural Networks and Learning Systems, 35(6):8585–8596, 2023.

  23. Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.

  24. Gabriele Lagani, Fabrizio Falchi, Claudio Gennaro, and Giuseppe Amato. Hebbian semi-supervised learning in a sample efficiency setting. Neural Networks, 143:719–731,

  25. Julien Launay, Iacopo Poli, and Florent Krzakala. Principled training of neural networks with direct feedback alignment. arXiv preprint arXiv:1906.04554, 2019.

  26. Ya Le and Xuan S. Yang. Tiny imagenet visual recognition challenge. 2015.

  27. Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 2002.

  28. Dong-Hyun Lee, Saizheng Zhang, Asja Fischer, and Yoshua Bengio. Difference target propagation. In Joint european conference on machine learning and knowledge discovery in databases, pages 498–515. Springer, 2015.

  29. Heung-Chang Lee and Jeonggeun Song. Symba: Symmetric backpropagation-free contrastive learning with forward-forward algorithm for optimizing convergence. arXiv preprint arXiv:2303.08418, 2023.

  30. Timothy P Lillicrap, Daniel Cownden, Douglas B Tweed, and Colin J Akerman. Random synaptic feedback weights support error backpropagation for deep learning. Nature communications, 7(1):13276, 2016.

  31. Timothy P Lillicrap, Adam Santoro, Luke Marris, Colin J Akerman, and Geoffrey Hinton. Backpropagation and the brain. Nature Reviews Neuroscience, 21(6):335–346, 2020.

  32. Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, and Le Song. Sphereface: Deep hypersphere embedding for face recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 212–220,

  33. Guy Lorberbom, Itai Gat, Yossi Adi, Alexander Schwing, and Tamir Hazan. Layer collaboration in the forward-forward algorithm. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 14141–14148, 2024.

  34. Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of machine learning research, 9(Nov):2579–2605, 2008.

  35. Thomas Miconi. Hebbian learning with gradients: Hebbian convolutional neural networks with modern deep learning frameworks. arXiv preprint arXiv:2107.01729, 2021.

  36. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.

  37. George A Miller. Wordnet: a lexical database for english. Communications of the ACM, 38(11):39–41, 1995.

  38. Timoleon Moraitis, Dmitry Toichkin, Adrien Journé, Yansong Chua, and Qinghai Guo. Softhebb: Bayesian inference in unsupervised hebbian soft winner-take-all networks. Neuromorphic Computing and Engineering, 2(4):044017, 2022.

  39. Arild Nøkland. Direct feedback alignment provides learning in deep neural networks. Advances in neural information processing systems, 29, 2016.

  40. Alexander Ororbia and Ankur Mali. The predictive forward-forward algorithm. arXiv preprint arXiv:2301.01452, 2023.

  41. Alexander G Ororbia. Brain-inspired machine intelligence: A survey of neurobiologically-plausible credit assignment. arXiv preprint arXiv:2312.09257, 2023.

  42. Alexander G Ororbia and Ankur Mali. Biologically motivated algorithms for propagating local target representations. In Proceedings of the AAAI conference on artificial intelligence, pages 4651–4658, 2019.

  43. Alexander G Ororbia, Ankur Mali, Daniel Kifer, and C Lee Giles. Backpropagation-free deep learning with recursive local representation alignment. In Proceedings of the AAAI conference on artificial intelligence, pages 9327–9335,

  44. Daniele Paliotta, Mathieu Alain, Bálint Máté, and François Fleuret. Graph neural networks go forward-forward. arXiv preprint arXiv:2302.05282, 2023.

  45. Andreas Papachristodoulou, Christos Kyrkou, Stelios Timotheou, and Theocharis Theocharides. Convolutional channel-wise competitive learning for the forward-forward algorithm. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 14536–14544, 2024.

  46. Danilo Pietro Pau and Fabrizio Maria Aymone. Suitability of forward-forward and pepita learning to mlcommons-tiny benchmarks. In 2023 IEEE International Conference on Omni-layer Intelligent Systems (COINS), pages 1–6. IEEE,

  47. Abel A Reyes-Angulo and Sidike Paheding. Forward-forward algorithm for hyperspectral image classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3153–3161, 2024.

  48. David E Rumelhart and David Zipser. Feature discovery by competitive learning. Cognitive science, 9(1):75–112, 1985.

  49. David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by back-propagating errors. Nature, 323(6088):533–536, 1986.

  50. Riccardo Scodellaro, Ajinkya Kulkarni, Frauke Alves, and Matthias Schröter. Training convolutional neural networks with the forward-forward algorithm. arXiv preprint arXiv:2312.14924, 2023.

  51. Sen Song, Kenneth D Miller, and Larry F Abbott. Competitive hebbian learning through spike-timing-dependent synaptic plasticity. Nature neuroscience, 3(9):919–926, 2000.

  52. Ravi Srinivasan, Francesca Mignacco, Martino Sorbaro, Maria Refinetti, Avi Cooper, Gabriel Kreiman, and Giorgia Dellaferrera. Forward learning with top-down feedback: Empirical and analytical characterization. arXiv preprint arXiv:2302.05440, 2023.

  53. Liang Sun, Yang Zhang, Jiajun Wen, Linlin Shen, Weicheng Xie, and Weizhao He. Deeperforward: Enhanced forward-forward training for deeper and better performance. In International Conference on Learning Representations, 2025.

  54. Alvin Wan, Lisa Dunlap, Daniel Ho, Jihan Yin, Scott Lee, Henry Jin, Suzanne Petryk, Sarah Adel Bargal, and Joseph E Gonzalez. Nbdt: Neural-backed decision trees. arXiv preprint arXiv:2004.00221, 2020.

  55. Feng Wang, Xiang Xiang, Jian Cheng, and Alan Loddon Yuille. Normface: L2 hypersphere embedding for face verification. In Proceedings of the 25th ACM international conference on Multimedia, pages 1041–1049, 2017.

  56. Hao Wang, Yitong Wang, Zheng Zhou, Xing Ji, Dihong Gong, Jingchao Zhou, Zhifeng Li, and Wei Liu. Cosface: Large margin cosine loss for deep face recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5265–5274, 2018.

  57. Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.

  58. Gongpei Zhao, Tao Wang, Yi Jin, Congyan Lang, Yidong Li, and Haibin Ling. The cascaded forward algorithm for neural network training. Pattern Recognition, 161:111292, 2025.

    Table 6. Architectural configurations and training hyperparameters for all datasets. Columns: CIFAR-10, CIFAR-100, MNIST, F-MNIST, Tiny-ImageNet. Channels per residual block: [100, 200, 400, 8...

  59. Implementation Details. For all experiments, we adopt a residual Forward-Forward (FF) architecture [53] consisting of four residual blocks. Table 6 summarizes the full architectural configurations and training hyperparameters. CIFAR-10 and CIFAR-100 use channel widths of [100, 200, 400, 800] across the four residual blocks; MNIST and F-MNIST use lighter widths ...

  60. Residual Shortcuts. Residual structures are commonly used in backpropagation-based (BP) networks to ease optimization by providing gradient shortcut pathways [17]. In the FF setting, where gradients are not propagated across layers, residual shortcuts instead serve to fuse information across depths. Following DeeperForward [53], we adopt parameter-f...

  61. Signal Integrating and Pruning Module. In this section, we elaborate on the Signal Integrating and Pruning (SIP) module introduced in DeeperForward [53]. Unlike BP-based models, which rely solely on the final layer's logits, FF networks produce a goodness score at every layer. Each layer-wise goodness vector g^(ℓ) already encodes partial class evidence ...

  62. Building Hierarchies. In this section, we describe how we construct the class hierarchy used for hierarchical supervision in detail and clarify the rationale behind this design. Following the Neural-Backed Decision Tree (NBDT) framework [54], we employ a data-driven procedure that derives a semantic hierarchy directly from a trained classifier. This a...

  63. Super-class Accuracy. To complement the per-layer analysis presented in Fig. 3 of the main paper, we further examine layer-wise accuracy at each hierarchy level, comparing against CwComp [45] and DeeperForward [53]. Figure 7 reports Level 1, Level 2, and Level 3 super-class accuracy, as well as overall fine-grained ...

  64. Model Efficiency. To evaluate parameter efficiency, we compare HCL-FF with DeeperForward, along with BP-trained ResNet-20 and the parameter-matched ResNet-20-wide baseline. Figure 8 reports CIFAR-10 accuracy as a function of total parameter count. We evaluate four model capacities for both HCL-FF and DeeperForward, where the channel widths of the four re...

  65. Learning Dynamics. To further compare the convergence behavior of HCL-FF, DeeperForward, and standard backpropagation, we examine their CIFAR-10 learning curves under a matched training schedule. All models are trained for 150 epochs for a fair comparison. Figure 9 reports test accuracy across training epochs. HCL-FF converges noticeably faster and t...

  66. Hierarchy Mapping. In addition to the hierarchy mapping method described in Eq. 10 of the main paper, we evaluate two alternative strategies for assigning hierarchy levels across layers.

      Table 9. Effect of alternative hierarchy mapping strategies.
      Dataset             Incremental  Decremental  Balanced
      CIFAR-100 Accuracy  70.30        68.40        70.76

      Table 10. Effect of batch size...

  67. Effect of Batch Size. Table 10 shows the effect of batch size on CIFAR-10 performance. HCL-FF demonstrates stable performance across a broad range, with accuracy varying by less than 0.5% between batch sizes of 32 and 512. This indicates that the contrastive component of HCL-FF does not rely on extremely large batches, in contrast to many contrastive l...

  68. Computational Cost. To demonstrate the computational aspect of HCL-FF, we compare HCL-FF against a BP-trained ResNet variant and DeeperForward under matched training conditions. All models have comparable numbers of parameters and are trained for 200 epochs with a batch size of 512, ensuring fair model size and equal numbers of weight updates. For HCL-FF, ...
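The hierarchy-building excerpt (entry 62) describes deriving a class hierarchy from a trained classifier in the NBDT style. A minimal sketch of one way this could work is given below, assuming that each class is represented by its classifier weight vector and that classes are merged greedily by cosine distance. The centroid-style merge rule, the function names, and the toy weight vectors are assumptions made for illustration; they are not the paper's procedure or data.

```python
import math

def cosine_distance(u, v):
    """1 minus cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def build_hierarchy(class_weights):
    """Greedy agglomerative clustering; returns a nested tuple tree of class names."""
    # Each cluster is (tree, mean_vector, size).
    clusters = [(name, list(vec), 1) for name, vec in class_weights.items()]
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: cosine_distance(clusters[ab[0]][1], clusters[ab[1]][1]),
        )
        (ti, vi, ni), (tj, vj, nj) = clusters[i], clusters[j]
        # Size-weighted mean keeps the merged vector a centroid of its members.
        merged_vec = [(x * ni + y * nj) / (ni + nj) for x, y in zip(vi, vj)]
        merged = ((ti, tj), merged_vec, ni + nj)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

# Toy class weight vectors (hypothetical, not from a trained model).
toy_weights = {
    "cat": [1.0, 0.1, 0.0],
    "dog": [0.9, 0.2, 0.0],
    "truck": [0.0, 0.1, 1.0],
    "ship": [0.1, 0.0, 0.9],
}
tree = build_hierarchy(toy_weights)
```

Each internal node of the resulting tree can serve as a super-class label, which is the kind of coarse target the hierarchical supervision needs at the shallower layers.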