
arxiv: 2605.19965 · v1 · pith:SLJ767SZnew · submitted 2026-05-19 · 💻 cs.LG · eess.SP

Normative Networks for Source Separation via Local Plasticity and Dendritic Computation

Pith reviewed 2026-05-20 07:40 UTC · model grok-4.3

classification 💻 cs.LG eess.SP
keywords: blind source separation · local plasticity · entropy maximization · dendritic computation · predictive coding · Hebbian learning · neural networks

The pith

Predictive Entropy Maximization derives a neural network for blind source separation that relies solely on local plasticity and dendritic error signals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Predictive Entropy Maximization to recover latent sources from sensory mixtures using only online, local weight updates. It approximates an entropy objective so that the resulting loss decomposes into interpretable terms that map directly onto feedforward error-driven synapses, Hebbian lateral inhibition, and simple output nonlinearities. A sympathetic reader cares because the approach relaxes the strong independence assumptions common in earlier biologically plausible BSS methods while remaining competitive with exact determinant- and correlative-information-based baselines. The derivation supplies explicit spectral bounds that quantify when the approximation holds. Empirical tests confirm robustness to both source correlation and observation noise.

Core claim

Minimizing the surrogate objective produces a predictive architecture in which feedforward connections follow an error-driven rule realizable by dendritic mechanisms, lateral connections are updated by local Hebbian plasticity, and domain constraints appear as output nonlinearities; spectral bounds on the approximation error characterize the regimes where this local procedure remains faithful to the original entropy measure.
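To make the three ingredients concrete, here is a minimal NumPy sketch of one online step for a generic network of this kind. It is not the paper's update rule: the function name `online_step`, the learning rates, the fixed number of settling iterations, the clipping range [-1, 1] (standing in for a bounded source domain), and the specific update forms are all illustrative assumptions. Only the structure follows the text: a dendritic prediction from feedforward weights, an error taken as somatic minus dendritic activity, a Hebbian update on lateral weights, and an output nonlinearity that enforces the domain.

```python
import numpy as np

def online_step(x, W, M, lr_w=1e-2, lr_m=1e-2, n_settle=20):
    """One illustrative online step; every update form here is an assumption."""
    pred = W @ x                                  # dendritic prediction of each output
    y = np.clip(pred, -1.0, 1.0)                  # initial output inside the assumed domain
    for _ in range(n_settle):                     # fixed number of settling iterations
        y = np.clip(pred - M @ y, -1.0, 1.0)      # lateral inhibition, then clipping
    err = y - pred                                # somatic minus dendritic activity
    W_new = W + lr_w * np.outer(err, x)           # error-driven: local error times input
    M_new = M + lr_m * (np.outer(y, y) - M)       # Hebbian: tracks output co-activity
    np.fill_diagonal(M_new, 0.0)                  # no self-inhibition
    return y, W_new, M_new

rng = np.random.default_rng(0)
n_sources, n_mixtures = 3, 5
W = 0.1 * rng.normal(size=(n_sources, n_mixtures))
M = np.zeros((n_sources, n_sources))
for _ in range(100):
    x = rng.normal(size=n_mixtures)               # stand-in for one mixture sample
    y, W, M = online_step(x, W, M)
```

Each weight change uses only quantities available at the synapse (the unit's own error or activity, and its input), which is the sense of "local" in the claim above.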

What carries the argument

The second-order surrogate for the entropy objective, whose minimization yields local Hebbian and error-driven plasticity rules together with output nonlinearities.
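One way to see where a second-order surrogate of this kind comes from is the log-determinant expansion excerpted from the paper's Appendix D (Eqs. D.1 to D.3). The sketch below restates it under one assumption: the explicit definition of B_ε is inferred from the entrywise formula in Eq. D.3 rather than quoted from the paper. Here λ_i denotes the eigenvalues of B_ε.

```latex
\begin{aligned}
D_\varepsilon &= \operatorname{diag}(C) + \varepsilon I, \qquad
B_\varepsilon = D_\varepsilon^{-1/2}\,\bigl(C - \operatorname{diag}(C)\bigr)\,D_\varepsilon^{-1/2}
\quad \text{(assumed definition)}, \\
C + \varepsilon I &= D_\varepsilon^{1/2}\,(I + B_\varepsilon)\,D_\varepsilon^{1/2}, \\
\log\det(C + \varepsilon I)
 &= \sum_{i=1}^{n} \log(C_{ii} + \varepsilon) + \sum_{i=1}^{n} \log(1 + \lambda_i) \\
 &= \sum_{i=1}^{n} \log(C_{ii} + \varepsilon)
    + \underbrace{\sum_{i=1}^{n} \lambda_i}_{=\,\operatorname{Tr}(B_\varepsilon)\,=\,0}
    - \frac{1}{2}\sum_{i=1}^{n} \lambda_i^2 + R_2, \\
R_2 &= \sum_{i=1}^{n} \Bigl[\log(1 + \lambda_i) - \lambda_i + \tfrac{1}{2}\lambda_i^2\Bigr], \qquad
\sum_{i=1}^{n} \lambda_i^2 = \sum_{i=1}^{n} \sum_{j \neq i} \frac{C_{ij}^2}{(C_{ii} + \varepsilon)(C_{jj} + \varepsilon)}.
\end{aligned}
```

The first-order term vanishes because B_ε has a zero diagonal, so dropping R_2 leaves only per-unit variance terms and pairwise squared-correlation terms.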

If this is right

  • Separation performance remains stable as source correlation and additive noise increase.
  • The network outperforms earlier local algorithms that impose stronger decorrelation or independence assumptions.
  • Domain constraints are enforced simply by choosing appropriate output nonlinearities.
  • Spectral error bounds predict the range of source statistics for which the approximation stays valid.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same local-objective construction could be applied to other unsupervised tasks that require separation of structured latent variables.
  • Biological circuits that already exhibit dendritic error computation and Hebbian lateral inhibition might be implementing an entropy-maximization objective without global communication.
  • Replacing the fixed output nonlinearities with learned ones would test whether the framework can discover domain constraints rather than receive them.

Load-bearing premise

The second-order approximation to the entropy measure stays sufficiently accurate across the source distributions the network is meant to separate.
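The size of what the surrogate discards can be checked numerically. The sketch below splits log det(C + εI) into a diagonal term, a second-order term, and the exact remainder R_2, following the decomposition excerpted from the paper's Appendix D (Eqs. D.1 and D.2). The function name `split_logdet`, the value of `eps`, the test covariance, and the definition of B_ε as the diagonally normalized off-diagonal part of C (inferred from Eq. D.3) are assumptions made for illustration.

```python
import numpy as np

def split_logdet(C, eps=1e-3):
    """Return the exact log det(C + eps*I) and the parts of its decomposition."""
    n = C.shape[0]
    d = np.diag(C) + eps                           # diagonal of D_eps
    off = C - np.diag(np.diag(C))                  # off-diagonal part of C
    B = off / np.sqrt(np.outer(d, d))              # assumed B_eps (zero diagonal)
    lam = np.linalg.eigvalsh(B)                    # eigenvalues lambda_i
    return {
        "exact": np.linalg.slogdet(C + eps * np.eye(n))[1],
        "diag": np.sum(np.log(d)),
        "second_order": 0.5 * np.sum(lam ** 2),
        "remainder": np.sum(np.log1p(lam) - lam + 0.5 * lam ** 2),
        "pairwise": 0.5 * np.sum(B ** 2),          # entrywise form of the second-order term
    }

rng = np.random.default_rng(0)
S = rng.uniform(-1.0, 1.0, size=(4, 5000))         # four bounded sources
S[1] = 0.7 * S[1] + 0.3 * S[0]                     # introduce some correlation
parts = split_logdet(np.cov(S))
surrogate = parts["diag"] - parts["second_order"]  # second-order surrogate of the log-determinant
print(parts["exact"], surrogate, parts["remainder"])
```

The gap between `exact` and `surrogate` equals `remainder` by construction, so `remainder` is a direct readout of how far the second-order approximation sits from the exact log-determinant for a given covariance.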

What would settle it

An experiment in which the surrogate objective produces a measurably different separation solution than exact entropy maximization on the same data set would falsify the claim that the local rules achieve the normative goal.

Figures

Figures reproduced from arXiv: 2605.19965 by Bariscan Bozkurt, Efe Ali Gorguner, Francesco Innocenti, Rafal Bogacz.

Figure 1: Representative Predictive Entropy Maximization architectures for two source domains. (a) Antisparse architecture. Mixture inputs are mapped through feedforward weights W to local prediction compartments, each paired with an output unit y_k. Prediction errors e_k are computed as the differences between somatic and dendritic activity. The output layer is coupled through adaptive recurrent inhibitory interact…
Figure 2: Performance comparisons across source domains. (a) Mean component SNR versus correlation ρ for nonnegative antisparse sources. Predictive Entropy Maximization (PEM) and its unnormalized variant (u-PEM) remain robust and outperform online baselines relying on stronger independence or decorrelation assumptions. (b) Mean component SNR versus input SNR for sparse sources. The PEM models stay close to the batch…
Figure 3: Receptive fields learned by the sparse Predictive Entropy Maximization model from natural image patches exhibit localized and oriented structure characteristic of sparse sensory representations.
Learning sparse receptive fields. We evaluate the sparse architecture illustrated in Figure 1b on prewhitened 12 × 12 natural image patches [4]. The network is trained on vectorized patches in R^144. The learne…
Figure 4: Taylor-surrogate diagnostics. (a) Exact Taylor remainder versus the theoretical bound from (Eq. 8), over recorded runs and iterations, color-coded by source correlation ρ. The solid line denotes y = x. (b) Mean Taylor remainder and corresponding bound (Eq. 8) as functions of ρ.
5 Conclusion. Summary. In this work, we introduced Predictive Entropy Maximization, an online determinant-based entropy maximizatio…
Figure 5: Representative auditory source-separation result. Temporal alignment between the ground-truth and recovered sources for one trial in the cocktail-party experiment described in Section 4.
E.2 Transform invariance in auditory source separation. In Section 4, we applied Predictive Entropy Maximization to audio mixtures in a sparse wavelet domain. The justification is straightforward. In the noise-free setting…
Figure 6: Additional performance comparisons. (a) Mean component SNR (mSNR) as a function of the source correlation level ρ for antisparse sources. (b) Mean component SNR as a function of the input SNR for nonnegative sparse sources. (c) Mean component SNR as a function of the input SNR for simplex sources. These complementary experiments confirm the same pattern observed in the main text: Predictive Entropy Maximiz…
Figure 7: Transient Taylor-surrogate diagnostics. Exact Taylor remainder and corresponding spectral upper bound over training iterations for two representative source-correlation levels.
… supplied, the feedforward matrix was initialized as W(0) = I + ξ_W (E.3), where I denotes the rectangular identity and ξ_W has i.i.d. Gaussian entries with standard deviation 0.01. Likewise, if not specified explicitly, the running m…
Figure 8: Ablation with respect to the distribution of the mixing-matrix entries. Box plots summarize the distribution of mSNR over 30 seeds in the uncorrelated setting (ρ = 0) at input SNR 30 dB. All candidate mixing laws are centered and scaled to unit variance: N(0, 1), U(−√3, √3), Laplace(0, 1/√2), Rad(±1), and √(3/5)·t_5. The performance of Predictive Entropy Maximization remains broadly stable across these…
Figure 9: Ablation with respect to the number of mixtures. Mean mSNR is plotted against the number of mixtures m ∈ {7, …, 13} while keeping the number of sources fixed at n = 5. The shaded envelopes show one standard error over 30 seeds. For each seed, a single Gaussian 13 × 5 mixing matrix is sampled and the experiment with m mixtures uses its first m rows, so the comparison isolates the effect of adding obser…
Figure 10: Evolution of the variance term in PEM during training. The evolution of the mean variance (averaged across the sources) is plotted across training samples for the PEM model. Solid lines and shaded bands indicate the mean and the 95% confidence interval over 30 seeds, respectively. For each seed, a single Gaussian 10 × 5 mixing matrix is sampled. These experiments demonstrate that the variance terms stabi…
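The mixing-matrix protocols described in the captions of Figures 8 and 9 can be sketched as follows. The five laws and their scalings are taken from the Figure 8 caption, and the nested-rows construction from the Figure 9 caption. The function name `sample_mixing`, the law labels, the sample size, and the seed are assumptions, and this is not the authors' code.

```python
import numpy as np

def sample_mixing(law, shape, rng):
    """Draw zero-mean, unit-variance mixing entries from one of five laws."""
    if law == "gaussian":
        return rng.normal(0.0, 1.0, size=shape)
    if law == "uniform":
        # variance of U(-a, a) is a^2 / 3, so a = sqrt(3) gives 1
        return rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=shape)
    if law == "laplace":
        # variance of Laplace(0, b) is 2 * b^2, so b = 1/sqrt(2) gives 1
        return rng.laplace(0.0, 1.0 / np.sqrt(2.0), size=shape)
    if law == "rademacher":
        return rng.choice([-1.0, 1.0], size=shape)
    if law == "student_t5":
        # variance of t_5 is 5/3, so scaling by sqrt(3/5) gives 1
        return np.sqrt(3.0 / 5.0) * rng.standard_t(5, size=shape)
    raise ValueError(f"unknown law: {law}")

LAWS = ["gaussian", "uniform", "laplace", "rademacher", "student_t5"]
rng = np.random.default_rng(0)

# Figure 8 style check: empirical variance of each law on a large sample
variances = {law: sample_mixing(law, (200_000,), rng).var() for law in LAWS}

# Figure 9 style protocol: one 13 x 5 Gaussian matrix, nested by taking its first m rows
A_full = sample_mixing("gaussian", (13, 5), rng)
nested = {m: A_full[:m] for m in range(7, 14)}
```

Reusing the rows of one matrix means that the experiment with m + 1 mixtures sees everything the experiment with m mixtures saw plus one extra observation, which is what lets the comparison isolate the effect of added observations.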
Abstract

Blind source separation (BSS) is a natural framework for studying how latent causes may be recovered from sensory mixtures, but deriving online and biologically plausible algorithms for structured (i.e., constrained to known domains) and potentially correlated sources remains challenging. Recent work has derived neural networks for BSS from maximization of an entropy measure, yet its online implementations involve complex and nonlocal recurrent dynamics. Motivated by this perspective, we propose Predictive Entropy Maximization, which achieves competitive performance in BSS, using only local weight updates. The method employs a close approximation of an entropy measure, yielding an objective function with easily interpretable components. Minimizing this objective leads to a predictive neural architecture in which feedforward synapses follow an error-driven rule (that can be realized through dendritic mechanisms), lateral inhibitory connections are learned with local Hebbian plasticity, and source-domain constraints are enforced through simple output nonlinearities. We derive explicit spectral bounds on the surrogate error, characterizing when the approximation is accurate. Empirically, Predictive Entropy Maximization remains robust under increasing source correlation and observation noise, outperforms biologically plausible algorithms that rely on stronger independence or decorrelation assumptions, and remains competitive with exact determinant- and correlative-information-based baselines. These results show how local plasticity and adaptive lateral inhibition can emerge from maximizing a regularized second-order entropy over structured source domains. Our implementation code is available at https://github.com/BariscanBozkurt/Predictive-Entropy-Maximization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Predictive Entropy Maximization (PEM) for blind source separation, deriving a neural network from a close surrogate approximation to entropy maximization. This yields an objective with interpretable terms that translate into local plasticity rules: error-driven feedforward updates (via dendritic mechanisms), Hebbian lateral inhibition, and output nonlinearities enforcing source-domain structure. Explicit spectral bounds on the surrogate approximation error are derived, and experiments demonstrate competitive BSS performance that is robust to increasing source correlation and observation noise, outperforming other local biologically plausible methods while remaining competitive with exact baselines. Code is provided.

Significance. If the spectral bounds ensure the surrogate objective preserves the key properties of entropy maximization (i.e., its local minima recover the sources) even under moderate correlation, the work provides a valuable normative bridge between information-theoretic objectives and biologically plausible local mechanisms. Strengths include the explicit error bounds, the emergence of dendritic-style computation and Hebbian rules from a single objective, and public code for reproducibility. This could inform models of sensory processing that rely on structured rather than fully independent sources.

major comments (2)
  1. [§4] §4 (Spectral Bounds on Surrogate Error): The derived bounds are expressed in terms of source-domain properties (e.g., eigenvalues of the covariance) that degrade exactly when source correlation rises; the manuscript does not show that the resulting error remains below the threshold at which the interpretable components (error-driven feedforward, Hebbian lateral) continue to enforce source recovery rather than relying primarily on the output nonlinearities.
  2. [§5.3] §5.3 (Robustness Experiments): While performance is reported as robust under increasing correlation, the actual pointwise or integrated surrogate error on the test sets is not quantified; without this, it is unclear whether the empirical success validates the normative derivation or occurs despite the approximation becoming loose.
minor comments (2)
  1. [§3] Notation for the dendritic error computation (around Eq. (12)) could be clarified by explicitly linking each term to the corresponding biological mechanism in a single summary table.
  2. [Figure 4] Figure 4 caption should state the number of independent runs and whether error bars represent standard deviation or standard error.
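Major comment 2 asks for the surrogate error to be quantified as correlation rises. A toy version of that measurement is sketched below for an equicorrelated covariance C = (1 − ρ)I + ρ11ᵀ. The remainder formula follows the Appendix D excerpt (Eq. D.2). The covariance model, the function name `remainder_and_bound`, and the bound itself are assumptions: the bound is the textbook estimate |log(1 + λ) − λ + λ²/2| ≤ |λ|³ / (3(1 − |λ|)) for |λ| < 1, which is not necessarily the paper's Eq. 8.

```python
import numpy as np

def remainder_and_bound(C, eps=1e-3):
    """Exact second-order Taylor remainder of log det(C + eps*I) and a textbook bound."""
    d = np.diag(C) + eps                               # diagonal of D_eps
    B = (C - np.diag(np.diag(C))) / np.sqrt(np.outer(d, d))
    lam = np.linalg.eigvalsh(B)                        # eigenvalues of the normalized off-diagonal part
    remainder = np.sum(np.log1p(lam) - lam + 0.5 * lam ** 2)
    r = np.abs(lam)
    # The geometric-series bound only applies when every |lambda_i| < 1
    bound = np.sum(r ** 3 / (3.0 * (1.0 - r))) if r.max() < 1.0 else np.inf
    return remainder, bound

n = 5
rhos = [0.0, 0.05, 0.10, 0.15, 0.20]
remainders, bounds = [], []
for rho in rhos:
    C = (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))   # equicorrelated sources
    rem, bnd = remainder_and_bound(C)
    remainders.append(rem)
    bounds.append(bnd)
    print(f"rho={rho:.2f}  remainder={rem:.5f}  bound={bnd:.5f}")
```

In this toy model the remainder grows with ρ, and the bound is only finite while the largest |λ_i| stays below 1 (roughly ρ < 1/(n − 1) here), which illustrates the concern in major comment 1 that such bounds loosen as correlation rises.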

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our manuscript. We address each major comment below and have incorporated revisions to clarify the role of the spectral bounds and to quantify the surrogate error in the experimental sections.

Point-by-point responses
  1. Referee: [§4] §4 (Spectral Bounds on Surrogate Error): The derived bounds are expressed in terms of source-domain properties (e.g., eigenvalues of the covariance) that degrade exactly when source correlation rises; the manuscript does not show that the resulting error remains below the threshold at which the interpretable components (error-driven feedforward, Hebbian lateral) continue to enforce source recovery rather than relying primarily on the output nonlinearities.

    Authors: We appreciate this observation regarding the dependence of the bounds on source covariance eigenvalues. The derived spectral bounds are intended as a theoretical characterization of the approximation error rather than a guarantee of a fixed error threshold; they explicitly delineate the regimes in which the surrogate remains close to the original entropy objective. In the revised manuscript we have added a discussion in §4 explaining how the local plasticity rules (error-driven feedforward and Hebbian lateral inhibition) retain their normative interpretation even as the bound loosens, supported by a new ablation experiment that isolates the contribution of these rules versus the output nonlinearities under increasing correlation.
    Revision: partial

  2. Referee: [§5.3] §5.3 (Robustness Experiments): While performance is reported as robust under increasing correlation, the actual pointwise or integrated surrogate error on the test sets is not quantified; without this, it is unclear whether the empirical success validates the normative derivation or occurs despite the approximation becoming loose.

    Authors: We agree that reporting the surrogate error directly on the test sets would strengthen the connection between the theoretical analysis and the empirical results. In the revised §5.3 we now include both pointwise and integrated surrogate error values across the correlation and noise levels used in the robustness experiments. These additional results indicate that the error remains within moderate bounds in the regimes where Predictive Entropy Maximization continues to outperform other local methods, thereby supporting that the normative components contribute to the observed performance.
    Revision: yes

Circularity Check

0 steps flagged

Derivation chain is self-contained with independent mathematical and empirical support

full rationale

The paper introduces Predictive Entropy Maximization as a close approximation to an entropy objective for blind source separation, derives spectral bounds on the resulting surrogate error directly from the approximation's properties, and demonstrates competitive performance via local plasticity rules through both the interpretable components of the objective and empirical tests across source correlations and noise levels. No step reduces by construction to its own inputs: the bounds are obtained via standard spectral analysis of the error term rather than being fitted or self-referential, the local update rules follow from minimizing the stated objective without tautological renaming, and performance claims rest on external validation rather than self-citation chains or ansatzes smuggled from prior author work. The derivation remains self-contained against the provided benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract, the method rests on the standard assumption that entropy maximization separates sources and on the validity of a close approximation whose error is bounded spectrally; no explicit free parameters or new invented entities are described.

axioms (1)
  • domain assumption: Maximizing (an approximation to) entropy recovers independent or structured latent sources from linear mixtures.
    Invoked as the normative starting point for deriving the local update rules.

pith-pipeline@v0.9.0 · 5803 in / 1249 out tokens · 34610 ms · 2026-05-20T07:40:32.054599+00:00 · methodology



Reference graph

Works this paper leans on

55 extracted references · 55 canonical work pages · 1 internal anchor

  1. [1] Pierre Comon and Christian Jutten. Handbook of Blind Source Separation: Independent component analysis and applications. Academic Press, 2010.

  2. [2] Andrzej Cichocki, Rafal Zdunek, Anh Huy Phan, and Shun-ichi Amari. Nonnegative matrix and tensor factorizations: applications to exploratory multi-way data analysis and blind source separation. John Wiley & Sons, 2009.

  3. [3] Anthony J. Bell and Terrence J. Sejnowski. An information-maximization approach to blind separation and blind deconvolution. Neural Comput., 7(6):1129–1159, November 1995. ISSN 0899-7667. doi: 10.1162/neco.1995.7.6.1129. URL https://doi.org/10.1162/neco.1995.7.6.1129

  4. [4] Bruno A. Olshausen and David J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583):607–609, 1996. doi: 10.1038/381607a0

  5. [5] Michael S. Lewicki. Efficient coding of natural sounds. Nature Neuroscience, 5(4):356–363. doi: 10.1038/nn831. URL https://doi.org/10.1038/nn831

  7. [7] E Colin Cherry. Some experiments on the recognition of speech, with one and with two ears. The Journal of the Acoustical Society of America, 25(5):975–979, 1953.

  8. [8] Adelbert W Bronkhorst. The cocktail party phenomenon: A review of research on speech intelligibility in multiple-talker conditions. Acta Acustica united with Acustica, 86(1):117–128, 2000.

  9. [9] Simon Haykin and Zhe Chen. The cocktail party problem. Neural Computation, 17(9):1875–1902, 2005. doi: 10.1162/0899766054322964

  10. [10] Aapo Hyvärinen and Erkki Oja. Independent component analysis: A tutorial, 1999. URL https://api.semanticscholar.org/CorpusID:118629

  11. [11] Aapo Hyvärinen and Erkki Oja. Independent component analysis: Algorithms and applications. Neural Networks, 13(4–5):411–430, 2000. doi: 10.1016/S0893-6080(00)00026-5

  12. [12] Chia-Hsiang Lin, Wing-Kin Ma, Wei-Chiang Li, Chong-Yung Chi, and Arulmurugan Ambikapathi. Identifiability of the simplex volume minimization criterion for blind hyperspectral unmixing: The no pure-pixel case. IEEE Transactions on Geoscience and Remote Sensing, 53, 06 2014. doi: 10.1109/TGRS.2015.2424719

  13. [13] Chia-Hsiang Lin, Ruiyuan Wu, Wing-Kin Ma, Chong-Yung Chi, and Yue Wang. Maximum volume inscribed ellipsoid: A new simplex-structured matrix factorization framework via facet enumeration and convex optimization. SIAM Journal on Imaging Sciences, 11(2):1651–1679. doi: 10.1137/17M114145X. URL https://doi.org/10.1137/17M114145X

  15. [15] Eren Babatas and Alper T Erdogan. An algorithmic framework for sparse bounded component analysis. IEEE Transactions on Signal Processing, 66(19):5194–5205, August 2018.

  16. [16] Alper T Erdogan. A class of bounded component analysis algorithms for the separation of both independent and dependent sources. IEEE Transactions on Signal Processing, 61(22):5730–5743, August 2013.

  17. [17] Pentti Paatero and Unto Tapper. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5(2):111–126, 1994. doi: https://doi.org/10.1002/env.3170050203. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/env.3170050203

  18. [18] Xiao Fu, Kejun Huang, Nicholas D. Sidiropoulos, and Wing-Kin Ma. Nonnegative matrix factorization for signal and data analytics: Identifiability, algorithms, and applications. IEEE Signal Processing Magazine, 36(2):59–80, 2019. doi: 10.1109/MSP.2018.2877582

  19. [19] Gokcan Tatli and Alper T. Erdogan. Polytopic matrix factorization: Determinant maximization based criterion and identifiability. IEEE Transactions on Signal Processing, 69:5431–5447. doi: 10.1109/TSP.2021.3112918

  21. [21] Gokcan Tatli and Alper T. Erdogan. Generalized polytopic matrix factorization. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3235–3239, 2021. doi: 10.1109/ICASSP39728.2021.9413709

  22. [23] Alper T. Erdogan. An information maximization based blind source separation approach for dependent and independent sources. In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4378–4382, 2022. doi: 10.1109/ICASSP43922.2022.9746099

  23. [24] Takuya Isomura and Taro Toyoizumi. A local learning rule for independent component analysis. Scientific Reports, 6(1):28073, Jun 2016. ISSN 2045-2322. doi: 10.1038/srep28073. URL https://doi.org/10.1038/srep28073

  24. [25] Takuya Isomura and Taro Toyoizumi. Error-gated hebbian rule: A local learning rule for principal and independent component analysis. Scientific Reports, 8(1):1835, Jan 2018. ISSN 2045-2322. doi: 10.1038/s41598-018-20082-0. URL https://doi.org/10.1038/s41598-018-20082-0

  25. [26] Yanis Bahroun, Dmitri Chklovskii, and Anirvan M. Sengupta. A normative and biologically plausible algorithm for independent component analysis. In A. Beygelzimer, Y. Dauphin, P. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, 2021. URL https://openreview.net/forum?id=fpvUKdqcPV

  26. [27] Cengiz Pehlevan, Sreyas Mohan, and Dmitri B. Chklovskii. Blind nonnegative source separation using biological neural networks. Neural Computation, 29(11):2925–2954, 11 2017. ISSN 0899-7667. doi: 10.1162/neco_a_01007. URL https://doi.org/10.1162/neco_a_01007

  27. [28] Alper Tunga Erdogan and Cengiz Pehlevan. Blind bounded source separation using neural networks with local learning rules. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3812–3816, 2020. URL https://api.semanticscholar.org/CorpusID:215745493

  28. [29] Berfin Simsek and Alper T. Erdogan. Online bounded component analysis: A simple recurrent neural network with local update rule for unsupervised separation of dependent and independent sources. In 2019 53rd Asilomar Conference on Signals, Systems, and Computers, pages 1639–1643, 2019. doi: 10.1109/IEEECONF44664.2019.9048916

  29. [30] Bariscan Bozkurt, Cengiz Pehlevan, and Alper Tunga Erdogan. Biologically-plausible determinant maximization neural networks for blind separation of correlated sources. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, editors, Advances in Neural Information Processing Systems, 2022. URL https://openreview.net/forum?id=espX_4CLr46

  30. [31] Bariscan Bozkurt, Ateş İsfendiyaroğlu, Cengiz Pehlevan, and Alper Tunga Erdogan. Correlative information maximization based biologically plausible neural networks for correlated source separation. In The Eleventh International Conference on Learning Representations. URL https://openreview.net/forum?id=8JsaP7j1cL0

  32. [33] Rafal Bogacz. A tutorial on the free-energy framework for modelling perception and learning. Journal of mathematical psychology, 76:198–211, 2017.

  33. [34] Rajesh PN Rao and Dana H Ballard. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nature neuroscience, 2(1):79–87. doi: 10.1038/4580

  35. [36] James CR Whittington and Rafal Bogacz. Theories of error back-propagation in the brain. Trends in cognitive sciences, 23(3):235–250, 2019.

  36. [37] Bariscan Bozkurt, Cengiz Pehlevan, and Alper Tunga Erdogan. Correlative information maximization: A biologically plausible approach to supervised deep neural networks without weight symmetry. In Thirty-seventh Conference on Neural Information Processing Systems. URL https://openreview.net/forum?id=TUGoUNkccV

  38. [39] Manu Srinath Halvagal and Friedemann Zenke. The combination of hebbian and predictive plasticity learns invariant object representations in deep sensory networks. Nature Neuroscience, 26(11):1906–1915, Nov 2023. ISSN 1546-1726. doi: 10.1038/s41593-023-01460-y. URL https://doi.org/10.1038/s41593-023-01460-y

  39. [40] Adrien Bardes, Jean Ponce, and Yann LeCun. VICReg: Variance-invariance-covariance regularization for self-supervised learning. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=xm6YD62D1Ub

  40. [41] Neal Parikh, Stephen Boyd, et al. Proximal algorithms. Foundations and trends® in Optimization, 1(3):127–239, 2014.

  41. [42] Robert Urbanczik and Walter Senn. Learning by the dendritic prediction of somatic spiking. Neuron, 81(3):521–528, 2014.

  42. [43] Gina Turrigiano. Homeostatic synaptic plasticity: Local and global mechanisms for stabilizing neuronal function. Cold Spring Harbor Perspectives in Biology, 4(1):a005736, 2012. doi: 10.1101/cshperspect.a005736

  43. [44] Brian McFee, Matt McVicar, Daniel Faronbi, Iran Roman, Matan Gover, Stefan Balke, Scott Seyfarth, Ayoub Malek, Colin Raffel, Vincent Lostanlen, Benjamin van Niekirk, Dana Lee, Frank Cwitkowitz, Frank Zalkow, Oriol Nieto, Dan Ellis, Jack Mason, Kyungyun Lee, Bea Steers, Emily Halvachs, Carl Thomé, Fabian Robert-Stöter, Rachel Bittner, Ziyao Wei, Adam Weiss...

  44. [45] Evan C Smith and Michael S Lewicki. Efficient auditory coding. Nature, 439(7079):978–982, 2006.

  45. [46] Ingrid Daubechies. Ten Lectures on Wavelets. SIAM, Philadelphia, PA, 1992. ISBN 978-0-89871-274-2

  46. [47] Serdar Ozsoy, Shadi Hamdan, Sercan O Arik, Deniz Yuret, and Alper T Erdogan. Self-supervised learning with an information maximization criterion. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, editors, Advances in Neural Information Processing Systems, 2022. URL https://openreview.net/forum?id=5MgZAu2NR7X

  47. [48] Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Guo, Mohammad Gheshlaghi Azar, et al. Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems, 33:21271–21284, 2020.

  48. [49] Jure Zbontar, Li Jing, Ishan Misra, Yann LeCun, and Stéphane Deny. Barlow twins: Self-supervised learning via redundancy reduction. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, Proceedings of Machine Learning Research, pages 12310–12320. PMLR, 202...

  49. [50] Yann LeCun et al. A path towards autonomous machine intelligence version 0.9.2, 2022-06-27. Open Review, 62(1):1–62, 2022.

  50. [51] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey E. Hinton. A simple framework for contrastive learning of visual representations. CoRR, abs/2002.05709, 2020. URL https://arxiv.org/abs/2002.05709

  51. [52] Horace B. Barlow. Possible principles underlying the transformations of sensory messages. In Walter A. Rosenblith, editor, Sensory Communication, pages 217–234. MIT Press, Cambridge, MA, 1961.

  52. [53] Katrina Drozdov, Ravid Shwartz-Ziv, and Yann LeCun. Video representation learning with joint-embedding predictive architectures, 2024. URL https://arxiv.org/abs/2412.10925

    Appendix. A Review of Correlative Information Maximization; A.1 Correlative entropy and mutual information; A.2 Batch CorInfoMax objecti...

  53. [54] Hence $D_\varepsilon = D + \varepsilon I$ is positive definite and $(D_\varepsilon)^{-1/2}$ is well-defined. … the regularized log-determinant admits the exact decomposition

    $$\log\det(C + \varepsilon I) = \sum_{i=1}^{n} \log(C_{ii} + \varepsilon) - \frac{1}{2}\sum_{i=1}^{n} \lambda_i^2 + R_2, \tag{D.1}$$

    where the remainder is given exactly by

    $$R_2 = \sum_{i=1}^{n} \Bigl[\log(1 + \lambda_i) - \lambda_i + \frac{1}{2}\lambda_i^2\Bigr]. \tag{D.2}$$

    Moreover, the second-order term can be written entrywise as

    $$\sum_{i=1}^{n} \lambda_i^2 = \operatorname{Tr}\bigl((B_\varepsilon)^2\bigr) = \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} \frac{C_{ij}^2}{(C_{ii} + \varepsilon)(C_{jj} + \varepsilon)}, \tag{D.3}$$

    and ther...

  54. [55] for every $Y \in \mathcal{Y}$,

    $$\bigl|J^{\mathrm{batch}}_{\mathrm{sur}}(Y) - J^{\mathrm{batch}}_{\mathrm{det}}(Y)\bigr| \le \bar{\varepsilon}_{\mathcal{Y}}; \tag{D.23}$$

  55. [56] if $Y_{\mathrm{sur}}$ is a global minimizer of $J^{\mathrm{batch}}_{\mathrm{sur}}$ over $\mathcal{Y}$, then

    $$J^{\mathrm{batch}}_{\mathrm{det}}(Y_{\mathrm{sur}}) \le \inf_{Y \in \mathcal{Y}} J^{\mathrm{batch}}_{\mathrm{det}}(Y) + 2\bar{\varepsilon}_{\mathcal{Y}}. \tag{D.24}$$

    Proof. By Equation D.4, applied with $C = \hat{C}_y(Y)$, the difference between the surrogate and exact batch objectives is exactly the Taylor remainder: $J^{\mathrm{batch}}_{\mathrm{sur}}(Y) - J^{\mathrm{batch}}_{\mathrm{det}}(Y) = R_2(Y)$. Applying Corollary D.3 with $C = \hat{C}_y(Y)$ yields $|R_2(Y)| \le \|B_\varepsilon(Y$...