pith. machine review for the scientific record. sign in

arxiv: 2605.01074 · v1 · submitted 2026-05-01 · 💻 cs.NE · cs.LG

Recognition: unknown

Benchmarking local Hebbian learning rules for memory storage and prototype extraction

Anders Lansner, Andreas Knoblauch, Naresh B Ravichandran, Pawel Herman

Pith reviewed 2026-05-09 14:24 UTC · model grok-4.3

classification 💻 cs.NE cs.LG
keywords Hebbian learning rulesassociative memoryprototype extractionBayesian learningrecurrent neural networkswinner-take-all dynamicsmemory capacitybinary patterns
0
0 comments X

The pith

Bayesian-Hebbian learning rules achieve the highest capacity for memory storage and prototype extraction across tested conditions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper benchmarks seven Hebbian learning rules inside non-modular and modular recurrent networks that use winner-take-all dynamics and operate on moderately sparse binary patterns. It evaluates pattern storage capacity, weight information content, the ability to recover clean prototypes from distorted training examples, and robustness to input correlations. A reader would care because associative memory and prototype extraction are core functions in both artificial systems and brain models, and local rules that perform well without global supervision could simplify large-scale implementations.

Core claim

Among the seven rules, the Bayesian-Hebbian variants produce the highest storage and retrieval capacity in almost all conditions examined, while the original additive Hebb rule yields the lowest capacity and covariance learning delivers moderate yet stable performance.

What carries the argument

Bayesian-Hebbian learning rules that update weights according to probabilistic estimates of pattern co-occurrence, which directly support higher information capacity and cleaner prototype recovery.

If this is right

  • Bayesian-Hebbian rules enable larger numbers of stored patterns before retrieval degrades in both modular and non-modular architectures.
  • Prototype extraction accuracy improves when inputs are noisy or incomplete versions of the original prototypes.
  • Performance advantages persist under moderate levels of correlation among the stored patterns.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • These local rules could be substituted into larger hybrid systems that currently rely on back-propagation for memory components.
  • The capacity edge suggests that probabilistic interpretations of synaptic plasticity may be worth exploring in other brain-inspired tasks such as perceptual grouping.
  • Direct tests on continuous-valued or non-stationary data streams would clarify how far the observed advantages extend.

Load-bearing premise

The specific choice of winner-take-all dynamics, recurrent connectivity, and moderately sparse binary patterns is representative enough to generalize to associative memory and prototype extraction in broader settings.

What would settle it

Re-running the same capacity and prototype-extraction measures on denser patterns or in networks that lack winner-take-all competition and finding that other Hebbian rules match or exceed the Bayesian ones.

Figures

Figures reproduced from arXiv: 2605.01074 by Anders Lansner, Andreas Knoblauch, Naresh B Ravichandran, Pawel Herman.

Figure 2
Figure 2. Figure 2: Recall fraction and weight information of a modular network. view at source ↗
Figure 4
Figure 4. Figure 4: Pattern capacity of non-modular and modular networks. Pattern capacity depending on input noise and learning rule, measured as the maximum number of patterns that allows for an error-free recall fraction of 90% (P90). The upper row corresponds to non-modular networks and the lower to modular networks. Each data point is the mean and standard deviation of 5 runs. The legend in the upper left panel holds for… view at source ↗
Figure 5
Figure 5. Figure 5: Weight information capacity of non-modular and modular networks. Mean values of weight capacity (C) for networks of size N = 2304 with K/H = 48, depending on learning rule and level of input noise at recall. Non-modular networks in upper and modular ones in lower row. The low values of PRCOV is mainly due to its asymmetric weight matrix. Each data point is the mean and standard deviation of 5 runs view at source ↗
Figure 7
Figure 7. Figure 7: Fraction of correct recall of prototypes. view at source ↗
read the original abstract

Associative memory or content-addressable memory is an important component function in computer science and information processing, and at the same time a key concept in cognitive and computational brain science. Many different neural network architectures and learning rules have been proposed to model the brain's associative memory while investigating key component functions like figure-ground segmentation, perceptual reconstruction and rivalry. A less investigated but equally important capability of associative memory is prototype extraction where the training set comprises distorted prototype instances and the task is to recall the correct generating prototype given a new distorted instance. In this paper we benchmark associative memory function of seven different Hebbian learning rules employed in non-modular and modular recurrent networks with winner-take-all dynamics operating on moderately sparse binary patterns. We measure pattern storage and weight information capacity, prototype extraction capabilities, and sensitivity to correlations in data. The original additive Hebb rule comes out with worst capacity, covariance learning proves to be robust but with moderate capacity, and the Bayesian-Hebbian learning rules show highest capacity in almost all different conditions tested.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript benchmarks seven local Hebbian learning rules (additive Hebb, covariance, and Bayesian-Hebbian variants) in non-modular and modular recurrent networks using winner-take-all dynamics on moderately sparse binary patterns. It evaluates pattern storage capacity, weight information capacity, prototype extraction from distorted instances, and sensitivity to correlations in the data. The central claim is that the additive Hebb rule shows the lowest capacity, covariance learning is robust but moderate, and Bayesian-Hebbian rules achieve the highest capacity in almost all tested conditions.

Significance. If the results hold under rigorous verification, the work provides a useful empirical comparison of biologically plausible local learning rules for associative memory and the less-studied prototype extraction task. The controlled design across modular/non-modular architectures and correlation tests adds concrete data points to the literature on Hebbian models of brain-like memory. The scoping to the specific tested conditions avoids overgeneralization.

major comments (2)
  1. [Methods] Methods section: The description of network size, exact sparsity level, WTA implementation details, pattern generation procedure, and number of independent runs is insufficient. These parameters are load-bearing for reproducing and validating the reported capacity rankings and prototype extraction performance.
  2. [Results] Results section (capacity and prototype extraction figures/tables): No error bars, standard deviations, or statistical significance tests are reported for the performance differences across conditions. This directly weakens the claim that Bayesian-Hebbian rules show highest capacity 'in almost all different conditions tested,' as apparent differences could be within noise.
minor comments (3)
  1. [Abstract] Abstract: 'Weight information capacity' is mentioned but not defined or distinguished from pattern storage capacity; add a brief clarification.
  2. [Figures] Figure legends: Include more detail on exact experimental conditions, number of trials, and what each curve represents to improve clarity.
  3. [Discussion] Discussion: The limitations of the chosen architecture (WTA, binary patterns) for broader generalization could be stated more explicitly, even if the claims are scoped.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help improve the reproducibility and statistical robustness of our work. We have revised the manuscript accordingly to address both major points.

read point-by-point responses
  1. Referee: [Methods] Methods section: The description of network size, exact sparsity level, WTA implementation details, pattern generation procedure, and number of independent runs is insufficient. These parameters are load-bearing for reproducing and validating the reported capacity rankings and prototype extraction performance.

    Authors: We agree that additional detail is needed for full reproducibility. The revised Methods section now explicitly reports: network sizes of N=1000 (non-modular) and N=100 per module (modular); sparsity p=0.1; WTA implemented via iterative soft-max with inhibition parameter 0.9 and convergence after at most 20 steps; patterns generated as independent Bernoulli trials with exact density p (no additional correlations unless tested); and all metrics averaged over 20 independent runs with distinct random seeds for patterns, weights, and initial states. These parameters match the simulation code that will be released upon publication. revision: yes

  2. Referee: [Results] Results section (capacity and prototype extraction figures/tables): No error bars, standard deviations, or statistical significance tests are reported for the performance differences across conditions. This directly weakens the claim that Bayesian-Hebbian rules show highest capacity 'in almost all different conditions tested,' as apparent differences could be within noise.

    Authors: We accept that the lack of variability measures and significance testing weakens the strength of the claims. In the revised manuscript we have added standard-error bars (across the 20 runs) to every figure and table. We also performed paired Wilcoxon signed-rank tests between each pair of rules and report p-values; the superiority of the Bayesian-Hebbian family remains statistically significant (p<0.01) in 11 of the 12 tested conditions, with the single non-significant case noted explicitly in the text. The revised claim now reads 'highest capacity in almost all conditions, with statistical support'. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

This is a purely empirical benchmarking paper that evaluates seven Hebbian learning rules by running simulations on recurrent networks with WTA dynamics and measuring storage capacity, prototype extraction, and correlation sensitivity directly from the resulting performance metrics. No derivations, parameter fits presented as predictions, or load-bearing self-citations appear in the central claims; the reported superiority of Bayesian-Hebbian rules is an observed outcome of the experiments rather than a reduction to prior inputs by construction. The work is self-contained against external benchmarks because all quantities are obtained from fresh simulations under explicitly stated conditions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical simulation study with no mathematical derivations, new postulates, or invented entities; performance is measured from network dynamics without additional assumptions beyond standard neural network modeling.

pith-pipeline@v0.9.0 · 5486 in / 949 out tokens · 16696 ms · 2026-05-09T14:24:33.407441+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

2 extracted references · 2 canonical work pages

  1. [1]

    4-Bit ADDER problem

    https://doi.org/10.1162/neco.a.28 Brunel, N. (2000). Persistent activity and the single-cell frequency-current curve in a cortical network model. In Network: Comput. Neural Syst (Vol. 11). Carandini, M., Heeger, D. J., & Movshon, J. A. (1997). Linearity and Normalization in Simple Cells of the Macaque Primary Visual Cortex. The Journal of Neuroscience, 17...

  2. [2]

    dendritic potential

    https://doi.org/10.1016/j.neunet.2012.08.013 Powell, N. J., Hein, B., Kong, D., Elpelt, J., Mulholland, H. N., Kaschube, M., & Smith, G. B. (2024). Common modular architecture across diverse cortical areas in early development. Proceedings of the National Academy of Sciences of the United States of America, 121(11). https://doi.org/10.1073/pnas.2313743121...