Adaptive Multi-Scale Goodness Aggregation for Forward-Forward Learning

Salar Beigzad; Vansh Verma

arXiv: 2605.18804 · v1 · pith:BLUT7JGUnew · submitted 2026-05-11 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-20 22:25 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords forward-forward algorithm · local learning · goodness aggregation · adaptive thresholds · hard negative mining · MNIST · image classification · neural network stability

The pith

Adaptive multi-scale goodness aggregation improves Forward-Forward accuracy and stability on image tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Adaptive Multi-Scale Goodness Aggregation as an extension to the Forward-Forward algorithm for training neural networks locally. It adds multi-scale goodness measures across network layers, curriculum-guided selection of hard negative examples, thresholds that adapt per layer, and a warm-up cosine annealing schedule for the learning rate. These elements together target better stability and generalization while keeping the original method's low memory use and biological plausibility. Tests on MNIST and Fashion-MNIST show accuracy gains of up to 1.45 percent and 1.50 percent respectively, with no notable rise in computation. A reader would care because the work shows how local learning signals can be refined to narrow the performance difference with global backpropagation methods.

Core claim

The authors claim that combining multi-scale goodness aggregation across local, intermediate, and global representations with adaptive curriculum-guided hard negative mining, layer-dependent adaptive thresholds, and a warm-up cosine annealing schedule strengthens the Forward-Forward algorithm, yielding higher accuracy, greater stability, and better generalization on classification tasks without sacrificing its memory-efficient and locally updated nature.

What carries the argument

Adaptive Multi-Scale Goodness Aggregation (AMSGA), which pools goodness estimates from multiple scales of the network and pairs them with adaptive negative mining and per-layer thresholds to drive local updates.
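
For orientation, the baseline that AMSGA modifies can be sketched in a few lines. This is a minimal illustration, not the authors' code: it uses the sum-of-squares goodness and logistic objective described in the original Forward-Forward paper, and the names `goodness` and `ff_layer_loss`, along with the plain-list activations, are assumptions made for the example.

```python
import math

def goodness(activations):
    """Sum of squared activities of one layer (goodness as in the original FF paper)."""
    return sum(a * a for a in activations)

def ff_layer_loss(pos_acts, neg_acts, theta):
    """Layer-local logistic loss: positive goodness should exceed theta,
    negative goodness should fall below it. No signal from other layers is used."""
    def softplus(x):
        # Numerically stable log(1 + exp(x)).
        return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return softplus(theta - goodness(pos_acts)) + softplus(goodness(neg_acts) - theta)
```

Under this reading, AMSGA would change what is fed in as goodness and how `theta` is chosen per layer, while the update stays local to each layer.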

If this is right

Local-learning networks achieve higher test accuracy on MNIST and Fashion-MNIST than the original Forward-Forward baseline.
The added components introduce no significant computational overhead.
The method retains the memory efficiency and layer-local update rules of the baseline algorithm.
Training dynamics become more stable through the use of adaptive thresholds and curriculum-based negative selection.
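
The summary does not say how the curriculum-guided mining works, so the following is only one plausible reading, offered as a sketch. It assumes a "hard" negative is one the layer currently scores with high goodness, and that the share of hard negatives grows linearly with training progress. The function name `select_negatives`, the `progress` argument, and the linear curriculum are all hypothetical.

```python
def select_negatives(candidates, goodness_fn, k, progress):
    """Pick k negatives from candidates.

    Assumption: higher goodness means a harder negative. The fraction of hard
    negatives rises from 0 to 1 as progress goes from 0.0 to 1.0.
    """
    if k > len(candidates):
        raise ValueError("k must not exceed the number of candidates")
    progress = min(1.0, max(0.0, progress))
    ranked = sorted(candidates, key=goodness_fn, reverse=True)  # hardest first
    n_hard = int(k * progress)
    n_easy = k - n_hard
    hard = ranked[:n_hard]
    # Fill the remainder from the easy end of the ranking.
    easy = ranked[len(ranked) - n_easy:] if n_easy > 0 else []
    return hard + easy
```

Early in training (`progress` near 0) this returns only easy negatives; late in training it returns only the hardest ones.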

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same adaptive aggregation and threshold ideas could be tested on other local-learning algorithms to see if they produce similar gains outside the Forward-Forward setting.
If the multi-scale approach scales, it might reduce reliance on global error signals for training deeper networks on larger image datasets.
The curriculum-guided mining component suggests a general way to focus local updates on informative examples that could apply to unsupervised local learning variants.

Load-bearing premise

The observed accuracy and stability gains arise from the specific combination of multi-scale aggregation, adaptive mining, layer thresholds, and annealing schedule rather than from unstated hyperparameter choices or effects limited to the MNIST and Fashion-MNIST datasets.

What would settle it

An ablation experiment that adds only the warm-up cosine annealing schedule to the baseline Forward-Forward algorithm and measures whether the reported accuracy gains of up to 1.45 percent (MNIST) and 1.50 percent (Fashion-MNIST) still appear. If the schedule alone reproduces those gains, the other proposed components are not the source of the improvement; if it does not, the gains depend on those components.
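
The schedule-only control needs nothing more than a learning-rate function applied to the unmodified baseline. The sketch below shows a common form of warm-up cosine annealing (linear ramp, then cosine decay). The paper's actual coefficients are not reported in the text available here, so `warmup_steps`, `total_steps`, `base_lr`, and `min_lr` are placeholder parameters rather than the authors' settings.

```python
import math

def warmup_cosine_lr(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Learning rate at a 0-indexed step: linear warm-up, then cosine decay to min_lr."""
    if warmup_steps > 0 and step < warmup_steps:
        # Ramp linearly so the last warm-up step reaches base_lr.
        return base_lr * (step + 1) / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

Running the baseline with and without this function, holding everything else fixed, gives the comparison described above.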

Original abstract

We propose Adaptive Multi-Scale Goodness Aggregation (AMSGA), a novel extension of the Forward-Forward (FF) algorithm designed to improve stability, robustness, and generalization in local-learning neural networks. AMSGA addresses several limitations of the original FF framework by introducing multi-scale goodness aggregation across local, intermediate, and global representations; adaptive curriculum-guided hard negative mining; layer-dependent adaptive thresholds; and a warm-up cosine annealing learning-rate schedule for improved optimization stability. Together, these modifications strengthen the FF paradigm while preserving its biologically plausible and memory-efficient properties. Experiments on MNIST and Fashion-MNIST demonstrate consistent performance improvements over the baseline FF algorithm, achieving up to +1.45% improvement on MNIST and +1.50% improvement on Fashion-MNIST without significant computational overhead. Our results suggest that local learning methods can become substantially more competitive when goodness estimation and training dynamics are carefully designed.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

A private letter to a colleague.

Modest gains on MNIST and Fashion-MNIST from a bundle of tweaks to Forward-Forward, but no ablations yet to show which pieces actually drive the improvement.

The paper bundles several changes into the Forward-Forward algorithm: multi-scale goodness aggregation across local, intermediate, and global levels, curriculum-guided hard negative mining, layer-dependent adaptive thresholds, and a warm-up cosine annealing schedule. It reports lifts of up to 1.45% and 1.50% over the plain FF baseline on MNIST and Fashion-MNIST with little added compute. That combination is presented as new within the FF literature, and the authors keep the original memory-efficient and biologically plausible character intact. Those are the concrete moves that stand out on a first read.

The work is aimed at people already following local-learning alternatives to backpropagation, and the engineering choices look like reasonable attempts to stabilize goodness estimation and training dynamics. If the gains turn out to be robust, the specific mix could be a useful reference point for that subfield.

The soft spot is the missing evidence on what is actually responsible for the numbers. The abstract gives the full package but does not describe ablations that isolate the multi-scale aggregation or the mining step from the new learning-rate schedule alone. Without those controls, or at least a clear baseline that receives only the cosine annealing, it is difficult to credit the new components rather than generic hyperparameter improvement. The experimental protocol, exact baselines, and any statistical checks are also not laid out in the provided text, which leaves reproducibility questions open for now.

This is an incremental engineering paper rather than a foundational shift. Readers deep in Forward-Forward or local learning might want to see the full experiments and ablations, but it is unlikely to change practice outside that niche. I would send it for peer review so the authors can supply the missing controls and the referees can check whether the gains generalize or hold under tighter scrutiny.

Referee Report

2 major / 2 minor

Summary. The paper proposes Adaptive Multi-Scale Goodness Aggregation (AMSGA) as an extension to the Forward-Forward (FF) algorithm. It introduces multi-scale goodness aggregation across local, intermediate, and global representations; adaptive curriculum-guided hard negative mining; layer-dependent adaptive thresholds; and a warm-up cosine annealing learning-rate schedule. The central empirical claim is that these changes together improve stability, robustness, and generalization while preserving FF's biologically plausible and memory-efficient properties, yielding up to +1.45% accuracy on MNIST and +1.50% on Fashion-MNIST over baseline FF with negligible computational overhead.

Significance. If the reported gains are robustly attributable to the proposed AMSGA components rather than hyperparameter retuning, the work would meaningfully advance local-learning methods by addressing stability and generalization limitations in the original FF framework. The retention of memory efficiency and biological plausibility would be a notable strength for applications where backpropagation is undesirable.

major comments (2)

[§4] §4 (Experiments): The manuscript reports +1.45% and +1.50% improvements but supplies no ablation studies that isolate the contributions of multi-scale goodness aggregation, curriculum-guided hard-negative mining, and layer-dependent thresholds from the warm-up cosine annealing schedule alone. This is load-bearing for the central claim, because the skeptic's concern that the schedule may drive the gains cannot be ruled out without such controls.
[§4] §4 and abstract: No experimental protocol details (baseline FF implementation, number of runs, variance, or statistical tests) are provided to support the numerical improvements. Without these, the data-to-claim link remains unverifiable and the generalization claims rest on unstated hyperparameter choices.
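
The controls requested in the first major comment amount to a small grid over the four named components. As a sketch of how such a grid could be enumerated (the component keys and helper names are hypothetical labels, not identifiers from the paper):

```python
from itertools import product

# Hypothetical switch names for the four components listed in the abstract.
COMPONENTS = ("multi_scale", "hard_negative_mining", "adaptive_thresholds", "warmup_cosine")

def ablation_grid(components=COMPONENTS):
    """Return every on/off combination of the components as a list of dicts."""
    return [dict(zip(components, flags))
            for flags in product((False, True), repeat=len(components))]

def schedule_only(grid):
    """Configurations where the learning-rate schedule is the only enabled component."""
    return [cfg for cfg in grid if cfg["warmup_cosine"] and sum(cfg.values()) == 1]
```

The full grid has 16 configurations. The all-off entry is the plain FF baseline, and the single `schedule_only` entry is the control that would separate schedule effects from the other components.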

minor comments (2)

[§3] The multi-scale aggregation procedure would benefit from an explicit equation early in §3 defining how local, intermediate, and global goodness values are combined.
[Figures] Figure captions should explicitly state the number of independent trials and error bars used for the reported accuracy curves.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of empirical validation that will strengthen the presentation of AMSGA. We address each major comment below and commit to revisions that directly respond to the concerns raised.

Referee: [§4] §4 (Experiments): The manuscript reports +1.45% and +1.50% improvements but supplies no ablation studies that isolate the contributions of multi-scale goodness aggregation, curriculum-guided hard-negative mining, and layer-dependent thresholds from the warm-up cosine annealing schedule alone. This is load-bearing for the central claim, because the skeptic's concern that the schedule may drive the gains cannot be ruled out without such controls.

Authors: We agree that the absence of ablation studies leaves open the possibility that gains could be driven primarily by the warm-up cosine annealing schedule. The current manuscript does not contain such controls. In the revised version we will add a dedicated ablation subsection in §4 that systematically disables each AMSGA component (multi-scale aggregation, curriculum-guided hard negative mining, layer-dependent thresholds) while retaining the learning-rate schedule, and vice versa. These experiments will be run under identical conditions to the main results and will be accompanied by tables reporting accuracy deltas for each variant. revision: yes
Referee: [§4] §4 and abstract: No experimental protocol details (baseline FF implementation, number of runs, variance, or statistical tests) are provided to support the numerical improvements. Without these, the data-to-claim link remains unverifiable and the generalization claims rest on unstated hyperparameter choices.

Authors: We acknowledge that the manuscript currently provides insufficient detail on the experimental protocol. We will expand §4 (and add an appendix if space is limited) to specify: the exact baseline FF implementation (including layer sizes, goodness function, and negative-sample generation matching the original FF paper), the number of independent runs (five), mean accuracy with standard deviation, and the statistical test used to assess significance of the reported improvements. All hyperparameter choices, including those for the warm-up cosine schedule, will be listed explicitly. revision: yes
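
The promised reporting (mean and standard deviation over independent runs, plus a significance test) can be sketched with the standard library. The rebuttal does not name the test, so Welch's t-test is used here purely as an assumption, and the function names are illustrative.

```python
import math
from statistics import mean, stdev

def summarize(accuracies):
    """Mean and sample standard deviation of per-run accuracies."""
    return mean(accuracies), stdev(accuracies)

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples with possibly unequal variances."""
    va = stdev(a) ** 2 / len(a)
    vb = stdev(b) ** 2 / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Calling `welch_t` on two lists of per-seed accuracies (one for AMSGA, one for the baseline) gives the statistic and degrees of freedom; turning these into a p-value requires a t-distribution table or a statistics library.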

Circularity Check

0 steps flagged

No circularity: empirical method proposal with no derivation chain

The paper introduces AMSGA as an algorithmic extension to Forward-Forward learning, combining multi-scale aggregation, adaptive mining, layer-dependent thresholds, and a cosine annealing schedule, then reports empirical accuracy gains on MNIST and Fashion-MNIST. No first-principles derivation, uniqueness theorem, or predictive equation is claimed; performance numbers are presented strictly as experimental outcomes rather than quantities forced by the paper's own equations or by self-citation reduction. The central claims therefore remain independent of any definitional loop or fitted-input renaming.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

Abstract-only access limits visibility; the ledger reflects components explicitly named in the abstract. Actual numerical values for thresholds and schedule coefficients are not reported.

free parameters (2)

layer-dependent adaptive thresholds
Per-layer thresholds for goodness decisions introduced as part of the method.
warm-up cosine annealing schedule coefficients
Parameters controlling the learning-rate warm-up and cosine decay.

axioms (1)

domain assumption: The original Forward-Forward algorithm supplies a valid local learning baseline.
The paper positions AMSGA as an extension that strengthens this baseline.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

We compute goodness at three scales and combine them with weights that shift across depth... g(h,l)=w_local(l)·g_local + 0.35·g_inter + w_global(l)·g_global
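
To make the quoted aggregation rule concrete, here is a small sketch. Only the fixed 0.35 weight on the intermediate term comes from the quoted passage. The forms of `w_local(l)` and `w_global(l)` are not given, so the linear shift from local to global with depth, and the choice that the three weights sum to 1, are assumptions made for illustration.

```python
def scale_weights(layer, num_layers, inter_weight=0.35):
    """Hypothetical depth-dependent weights: early layers favour local goodness,
    late layers favour global goodness. layer is 0-indexed."""
    depth = layer / max(1, num_layers - 1)  # 0.0 at the first layer, 1.0 at the last
    remaining = 1.0 - inter_weight          # assumption: the three weights sum to 1
    w_global = remaining * depth
    w_local = remaining - w_global
    return w_local, w_global

def aggregate_goodness(g_local, g_inter, g_global, layer, num_layers, inter_weight=0.35):
    """g = w_local(l) * g_local + 0.35 * g_inter + w_global(l) * g_global."""
    w_local, w_global = scale_weights(layer, num_layers, inter_weight)
    return w_local * g_local + inter_weight * g_inter + w_global * g_global
```

With these assumed weights, the first layer ignores the global term and the last layer ignores the local term, while the intermediate term always contributes 35 percent.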
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean absolute_floor_iff_bare_distinguishability unclear

θ(l,p)=θ₀×(1+0.15 l/L)×(1+0.3p)
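
The threshold rule quoted above is simple enough to evaluate directly. In the sketch below, the coefficients 0.15 and 0.3 are taken from the quoted formula. Reading `l` as the layer index, `L` as the number of layers, and `p` as training progress in [0, 1] is an assumption, since the excerpt does not define the symbols.

```python
def adaptive_threshold(layer, num_layers, progress, theta0):
    """theta(l, p) = theta0 * (1 + 0.15 * l / L) * (1 + 0.3 * p).

    Assumed meanings: layer = l, num_layers = L, progress = p in [0, 1].
    """
    return theta0 * (1.0 + 0.15 * layer / num_layers) * (1.0 + 0.3 * progress)
```

Under that reading, the threshold starts at `theta0` for the first layer at the beginning of training and grows by at most a factor of 1.15 × 1.3 = 1.495 for the deepest layer at the end.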

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
