Normative Networks for Source Separation via Local Plasticity and Dendritic Computation

Bariscan Bozkurt; Efe Ali Gorguner; Francesco Innocenti; Rafal Bogacz

arXiv: 2605.19965 · v2 · pith:SLJ767SZ · submitted 2026-05-19 · 💻 cs.LG · eess.SP


Pith reviewed 2026-05-22 09:17 UTC · model grok-4.3

classification 💻 cs.LG eess.SP

keywords blind source separation · local plasticity · entropy maximization · dendritic computation · predictive networks · Hebbian learning · neural networks


The pith

Blind source separation can be performed by a neural network whose feedforward synapses learn via local error-driven rules, whose lateral connections learn via Hebbian plasticity, and whose output nonlinearities enforce the source domains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors start from the goal of maximizing entropy over structured source domains to recover latent causes from mixtures. They replace the exact entropy with a close surrogate, which yields an objective with three interpretable parts. Minimizing that objective produces a predictive network with three features. Feedforward weights are updated by an error signal that dendrites can compute locally. Lateral inhibitory weights follow straightforward Hebbian updates. Output nonlinearities enforce the known source domains. The paper also supplies spectral bounds that indicate when the surrogate stays faithful to the original entropy. When the bounds hold, the surrogate tracks the exact entropy. Empirically, the resulting local rules recover sources even when the sources are correlated or the observations are noisy.

Core claim

Minimizing the Predictive Entropy Maximization objective produces a recurrent predictive architecture in which feedforward synapses obey an error-driven plasticity rule that can be realized through dendritic mechanisms, lateral inhibitory connections are learned with local Hebbian plasticity, and source-domain constraints are enforced through simple output nonlinearities, all while explicit spectral bounds characterize the accuracy of the entropy approximation.
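The abstract says only that source-domain constraints are enforced through "simple output nonlinearities." A minimal NumPy sketch follows. It assumes these nonlinearities behave like the standard Euclidean projections or proximal operators for the domains named in the figures (antisparse, nonnegative antisparse, sparse, simplex); the paper cites Parikh and Boyd's monograph on proximal algorithms. The function names are hypothetical. The paper's exact nonlinearities may differ, for instance in how a sparsity threshold is chosen.

```python
import numpy as np

def clip_antisparse(u):
    """Projection onto the l-infinity ball [-1, 1]^n (antisparse domain)."""
    return np.clip(u, -1.0, 1.0)

def clip_nonneg_antisparse(u):
    """Projection onto the unit box [0, 1]^n (nonnegative antisparse domain)."""
    return np.clip(u, 0.0, 1.0)

def soft_threshold(u, lam):
    """Proximal operator of lam * ||y||_1, a common sparsity-inducing nonlinearity."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

def project_simplex(u):
    """Euclidean projection onto the probability simplex {y >= 0, sum(y) = 1}."""
    v = np.sort(u)[::-1]                      # sort entries in descending order
    css = np.cumsum(v) - 1.0                  # cumulative sums minus the target sum
    k = np.arange(1, u.size + 1)
    rho = np.nonzero(v - css / k > 0)[0][-1]  # last index satisfying the threshold test
    theta = css[rho] / (rho + 1.0)            # shared shift applied to every entry
    return np.maximum(u - theta, 0.0)
```

Clipping and soft-thresholding act coordinate-wise, so each output unit can apply them to its own activity. The simplex projection, by contrast, needs one threshold shared across all units.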

What carries the argument

Predictive Entropy Maximization objective formed by a surrogate entropy approximation whose minimization directly supplies the local update rules for the predictive network.
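The paper's own description, quoted in the Lean section of this page, says the exact log-determinant is replaced by a second-order Taylor expansion around the diagonal part of the output covariance. This yields a variance-expansion term and a variance-normalized cross-covariance penalty. Write the covariance as $R = D^{1/2}(I + B)D^{1/2}$ with $D = \mathrm{diag}(R)$. Then $\log\det R = \sum_i \log R_{ii} + \log\det(I + B)$. Because $\mathrm{diag}(B) = 0$, we have $\log\det(I + B) \approx \mathrm{tr}\,B - \tfrac12 \mathrm{tr}\,B^2 = -\tfrac12 \|B\|_F^2$. The sketch below assumes a batch covariance. It omits the regularization ($\lambda$, $\varepsilon$) and the online estimation implied by the paper's $\hat B_{\lambda,\varepsilon}(t)$, and the function names are hypothetical.

```python
import numpy as np

def logdet_exact(R):
    """Exact log-determinant of a positive-definite covariance matrix."""
    sign, value = np.linalg.slogdet(R)
    if sign <= 0:
        raise ValueError("R must be positive definite")
    return value

def logdet_surrogate(R):
    """Second-order Taylor surrogate of log det(R) around diag(R) (sketch)."""
    d = np.diag(R)
    # B = D^{-1/2} R D^{-1/2} - I has a zero diagonal (up to rounding), so tr(B) = 0.
    B = R / np.sqrt(np.outer(d, d)) - np.eye(len(d))
    variance_term = np.sum(np.log(d))        # rewards large output variances
    cross_penalty = 0.5 * np.sum(B ** 2)     # penalizes normalized cross-covariances
    return variance_term - cross_penalty
```

Take a 2 × 2 covariance with unit variances and covariance 0.1. The exact value is log 0.99 ≈ -0.010050, while the surrogate gives -0.01. For a diagonal covariance the two coincide.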

If this is right

The architecture recovers sources that are correlated or observed with noise without requiring stronger independence or decorrelation assumptions.
Lateral inhibition emerges from local Hebbian updates rather than from explicit global coordination.
Performance stays competitive with exact determinant-based and correlative-information baselines while using only local computations.
Adaptive lateral inhibition and local error-driven feedforward learning arise together from maximizing a regularized second-order entropy over structured domains.
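The list above attributes lateral inhibition to local Hebbian updates. The following sketch shows why this is plausible; it rests on our own simplifying assumptions and is not the paper's exact rule. Suppose the output covariance is tracked by a running average. Then each entry $R_{ij}$ is updated only from the product $y_i y_j$ of the two units it connects. Next, differentiate the cross-covariance penalty $\tfrac12 \sum_{i \neq j} R_{ij}^2 / (R_{ii} R_{jj})$ with respect to $R_{ij}$ while holding the variances fixed. This gives $R_{ij} / (R_{ii} R_{jj})$. That quantity can therefore act as a lateral weight that inhibits unit $i$ in proportion to the correlated activity of unit $j$. The class name, learning rate `eta`, and identity initialization are hypothetical.

```python
import numpy as np

class RunningCovariance:
    """Hypothetical Hebbian tracker of the output covariance R."""

    def __init__(self, n, eta=0.01):
        self.R = np.eye(n)   # identity start keeps every variance positive
        self.eta = eta

    def update(self, y):
        # Entry (i, j) moves toward y[i] * y[j], a product of the two units'
        # own activities, so the update is local (Hebbian).
        self.R += self.eta * (np.outer(y, y) - self.R)

    def lateral_weights(self):
        # Variance-normalized cross-covariances R_ij / (R_ii R_jj), no self-connections.
        d = np.diag(self.R)
        M = self.R / np.outer(d, d)
        np.fill_diagonal(M, 0.0)
        return M
```

Feeding the tracker samples with a known covariance drives `R` toward that covariance. The derived weights are then positive (inhibitory) for positively correlated pairs.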

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Hardware implementations could use purely local circuits for both feedforward and lateral updates, avoiding the wiring cost of global error signals.
The same surrogate approach might be tested on other unsupervised objectives where exact entropy or determinant calculations are intractable.
Measuring the actual surrogate error in larger-scale simulations would indicate how far the spectral bounds can be pushed before performance degrades.

Load-bearing premise

The surrogate error of the entropy approximation must stay small enough under the derived spectral bounds that the local plasticity rules still achieve the intended entropy maximization.
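This premise can be checked numerically. Corollary D.4, quoted further down this page, bounds the remainder of the second-order surrogate by $\|B\|_F^2 \, \|B\|_2 / (1 + \lambda_{\min}(B))$. The sketch below compares the exact remainder with that bound for equicorrelated covariances with pairwise correlation ρ. It assumes an unregularized batch matrix $B = D^{-1/2} R D^{-1/2} - I$ in place of the paper's running, regularized $\hat B_{\lambda,\varepsilon}(t)$. The helper names are hypothetical.

```python
import numpy as np

def remainder_and_bound(R):
    """Exact Taylor remainder of log det(I + B) and the spectral bound (sketch)."""
    d = np.diag(R)
    B = R / np.sqrt(np.outer(d, d)) - np.eye(len(d))
    _, logdet_IB = np.linalg.slogdet(np.eye(len(d)) + B)
    fro2 = np.sum(B ** 2)
    # Remainder = log det(I + B) minus the second-order terms tr(B) - tr(B^2)/2.
    remainder = logdet_IB - np.trace(B) + 0.5 * fro2
    eig = np.linalg.eigvalsh(B)
    spectral_norm = np.max(np.abs(eig))
    bound = fro2 * spectral_norm / (1.0 + eig.min())
    return remainder, bound

def equicorrelated(n, rho):
    """Unit-variance covariance with every pairwise correlation equal to rho."""
    return (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))

for rho in [0.0, 0.2, 0.4, 0.6, 0.8]:
    rem, bnd = remainder_and_bound(equicorrelated(5, rho))
    print(f"rho={rho:.1f}  |remainder|={abs(rem):.4f}  bound={bnd:.4f}")
```

For this family, $1 + \lambda_{\min}(B) = 1 - \rho$, so the bound loosens sharply as ρ approaches 1. That is the regime the referee report below is concerned with.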

What would settle it

A demonstration that the network fails to separate sources at the level predicted by the exact entropy objective precisely when the spectral bounds on the surrogate error are violated would show the approximation does not preserve the original goal.

Figures

Figures reproduced from arXiv: 2605.19965 by Bariscan Bozkurt, Efe Ali Gorguner, Francesco Innocenti, Rafal Bogacz.

**Figure 1.** Representative Predictive Entropy Maximization architectures for two source domains. (a) Antisparse architecture. Mixture inputs are mapped through feedforward weights $W$ to local prediction compartments, each paired with an output unit $y_k$. Prediction errors $e_k$ are computed as the differences between somatic and dendritic activity. The output layer is coupled through adaptive recurrent inhibitory interact…
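The caption describes errors $e_k$ formed as somatic minus dendritic activity, and the abstract says such errors drive the feedforward updates. A minimal sketch assumes a delta-rule form in the style of Urbanczik and Senn's dendritic prediction of somatic activity (reference [42] in the list below). The dendritic prediction is $v = Wx$ and the error is $e = y - Wx$. Each synapse then needs only the postsynaptic error and the presynaptic input. The sign convention, learning rate, and function name are assumptions, not the paper's equations. In the paper, $y$ comes from the recurrent output dynamics; here it is supplied directly so the rule can be checked in isolation.

```python
import numpy as np

def dendritic_delta_step(W, x, y, lr):
    """One error-driven update of feedforward weights (hypothetical sketch).

    W : (n, m) feedforward weights, x : (m,) mixture input, y : (n,) somatic output.
    """
    v = W @ x                           # dendritic prediction of each output unit
    e = y - v                           # local error: somatic minus dendritic activity
    W_new = W + lr * np.outer(e, x)     # postsynaptic error times presynaptic input
    return W_new, e
```

With a noiseless linear target $y = Ax$, the rule reduces to least-mean-squares and $W$ converges to $A$.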

**Figure 2.** Performance comparisons across source domains. (a) Mean component SNR versus correlation ρ for nonnegative antisparse sources. Predictive Entropy Maximization (PEM) and its unnormalized variant (u-PEM) remain robust and outperform online baselines relying on stronger independence or decorrelation assumptions. (b) Mean component SNR versus input SNR for sparse sources. The PEM models stay close to the batch…
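Figure 2 reports mean component SNR, but this page does not give the paper's exact definition. A common BSS convention, assumed here, is to resolve the permutation and scale ambiguity of the recovered outputs and then average the per-source SNRs in dB. The brute-force permutation search below is adequate for the five-source settings in the figures. For larger n, `scipy.optimize.linear_sum_assignment` on a matrix of pairwise scores would be the usual replacement. The function name is hypothetical.

```python
import itertools
import numpy as np

def component_snrs(S, Y):
    """Per-source SNR in dB after matching outputs to sources (sketch).

    S, Y : arrays of shape (n, T) holding true sources and network outputs.
    """
    n = S.shape[0]
    best = None
    for perm in itertools.permutations(range(n)):
        snrs = np.empty(n)
        for k, p in enumerate(perm):
            s, y = S[k], Y[p]
            a = (y @ s) / (y @ y)         # least-squares scale; also absorbs sign flips
            err = s - a * y
            snrs[k] = 10.0 * np.log10((s @ s) / (err @ err))
        if best is None or snrs.mean() > best.mean():
            best = snrs
    return best
```

The mean of the returned array is the mSNR under this convention.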

**Figure 3.** Receptive fields learned by the sparse Predictive Entropy Maximization model from natural image patches exhibit localized and oriented structure characteristic of sparse sensory representations. Learning sparse receptive fields. We evaluate the sparse architecture illustrated in Figure 1b on prewhitened 12 × 12 natural image patches [4]. The network is trained on vectorized patches in $\mathbb{R}^{144}$. The learne…

**Figure 4.** Taylor-surrogate diagnostics. (a) Exact Taylor remainder versus the theoretical bound (Eq. 8), over recorded runs and iterations, color-coded by source correlation ρ. The solid line denotes $y = x$. (b) Mean Taylor remainder and corresponding bound (Eq. 8) as functions of ρ. Section 5, Conclusion. Summary. In this work, we introduced Predictive Entropy Maximization, an online determinant-based entropy maximizatio…

**Figure 5.** Representative auditory source-separation result. Temporal alignment between the ground-truth and recovered sources for one trial in the cocktail-party experiment described in Section 4. Appendix E.2, Transform invariance in auditory source separation. In Section 4, we applied Predictive Entropy Maximization to audio mixtures in a sparse wavelet domain. The justification is straightforward. In the noise-free setting…
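The appendix argument is truncated here, but the core of a noise-free transform-invariance argument is linearity. If $X = AS$ and a linear transform $T$ acts on each channel along time, then $T(X) = A\,T(S)$. The mixing matrix is therefore identical in both domains, and a separating matrix learned in the sparser wavelet domain also applies to the time-domain mixtures. The sketch checks this with a single-level orthonormal Haar transform. Haar stands in for whatever wavelet the authors used; they cite Daubechies' Ten Lectures on Wavelets. The function name is hypothetical.

```python
import numpy as np

def haar_level1(X):
    """Single-level orthonormal Haar transform along time; rows are channels.

    Assumes an even number of samples per channel.
    """
    even, odd = X[:, 0::2], X[:, 1::2]
    approx = (even + odd) / np.sqrt(2.0)   # low-pass (scaling) coefficients
    detail = (even - odd) / np.sqrt(2.0)   # high-pass (wavelet) coefficients
    return np.concatenate([approx, detail], axis=1)
```

Because the transform is orthonormal, it also preserves signal energy. SNRs computed on transform coefficients therefore match those in the time domain in the noise-free case.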

**Figure 6.** Additional performance comparisons. (a) Mean component SNR (mSNR) as a function of the source correlation level ρ for antisparse sources. (b) Mean component SNR as a function of the input SNR for nonnegative sparse sources. (c) Mean component SNR as a function of the input SNR for simplex sources. These complementary experiments confirm the same pattern observed in the main text: Predictive Entropy Maximiz…

**Figure 7.** Transient Taylor-surrogate diagnostics. Exact Taylor remainder and corresponding spectral upper bound over training iterations for two representative source-correlation levels. … supplied, the feedforward matrix was initialized as $W^{(0)} = I + \xi_W$ (E.3), where $I$ denotes the rectangular identity and $\xi_W$ has i.i.d. Gaussian entries with standard deviation 0.01. Likewise, if not specified explicitly, the running m…
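The initialization rule in the caption is $W^{(0)} = I + \xi_W$, with a rectangular identity and a Gaussian perturbation of standard deviation 0.01. It takes only a few lines of NumPy. The sketch assumes $W$ has shape (number of outputs, number of mixtures), since it maps mixtures to outputs. The function and parameter names are hypothetical.

```python
import numpy as np

def init_feedforward(n_outputs, n_mixtures, std=0.01, seed=None):
    """W(0) = I + xi_W with a rectangular identity and i.i.d. N(0, std^2) noise."""
    rng = np.random.default_rng(seed)
    identity = np.eye(n_outputs, n_mixtures)   # ones on the main diagonal only
    noise = std * rng.standard_normal((n_outputs, n_mixtures))
    return identity + noise
```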

**Figure 8.** Ablation with respect to the distribution of the mixing-matrix entries. Box plots summarize the distribution of mSNR over 30 seeds in the uncorrelated setting (ρ = 0) at input SNR 30 dB. All candidate mixing laws are centered and scaled to unit variance: $\mathcal{N}(0, 1)$, $\mathcal{U}(-\sqrt{3}, \sqrt{3})$, $\mathrm{Laplace}(0, 1/\sqrt{2})$, $\mathrm{Rad}(\pm 1)$, and $\sqrt{3/5}\, t_5$. The performance of Predictive Entropy Maximization remains broadly stable across these…
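Each of the five mixing laws in Figure 8 is centered with unit variance. The uniform law on $[-\sqrt{3}, \sqrt{3}]$ has variance $12/12 = 1$. The Laplace law with scale $1/\sqrt{2}$ has variance $2 \cdot \tfrac12 = 1$. A Student-t with 5 degrees of freedom has variance $5/3$, hence the $\sqrt{3/5}$ factor. Below is a sketch of how such mixing matrices could be drawn with NumPy's `Generator`. The law labels and function name are hypothetical and not taken from the authors' code.

```python
import numpy as np

def sample_mixing(law, shape, rng):
    """Draw a mixing matrix whose entries follow a centered, unit-variance law."""
    if law == "gaussian":
        return rng.standard_normal(shape)
    if law == "uniform":
        return rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=shape)
    if law == "laplace":
        return rng.laplace(0.0, 1.0 / np.sqrt(2.0), size=shape)
    if law == "rademacher":
        return rng.choice([-1.0, 1.0], size=shape)
    if law == "student_t5":
        # t_5 has variance 5/3, so rescale by sqrt(3/5) to reach unit variance.
        return np.sqrt(3.0 / 5.0) * rng.standard_t(5, size=shape)
    raise ValueError(f"unknown law: {law}")
```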

**Figure 9.** Ablation with respect to the number of mixtures. Mean mSNR is plotted against the number of mixtures $m \in \{7, \dots, 13\}$ while keeping the number of sources fixed at $n = 5$. The shaded envelopes show one standard error over 30 seeds. For each seed, a single Gaussian 13 × 5 mixing matrix is sampled and the experiment with $m$ mixtures uses its first $m$ rows, so the comparison isolates the effect of adding obser…
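In the Figure 9 protocol, one 13 × 5 Gaussian matrix is drawn per seed and each experiment uses its first $m$ rows. The smaller mixing matrices are therefore sub-matrices of the larger ones. A sketch with hypothetical names:

```python
import numpy as np

def nested_mixing_matrices(n_sources=5, m_values=range(7, 14), seed=0):
    """One Gaussian (max(m) x n) draw per seed; the m-mixture matrix is its first m rows."""
    rng = np.random.default_rng(seed)
    m_values = list(m_values)
    A_full = rng.standard_normal((max(m_values), n_sources))
    return {m: A_full[:m] for m in m_values}
```

For a given seed, every experiment shares the same leading rows. Differences in mSNR across $m$ then reflect the added observations rather than a fresh random mixing matrix.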

**Figure 10.** Evolution of the variance term in PEM during training. The evolution of the mean variance (averaged across the sources) is plotted across training samples for the PEM model. Solid lines and shaded bands indicate the mean and the 95% confidence interval over 30 seeds, respectively. For each seed, a single Gaussian 10 × 5 mixing matrix is sampled. These experiments demonstrate that the variance terms stabi…

Abstract

Blind source separation (BSS) is a natural framework for studying how latent causes may be recovered from sensory mixtures, but deriving online and biologically plausible algorithms for structured (i.e., constrained to known domains) and potentially correlated sources remains challenging. Recent work has derived neural networks for BSS from maximization of an entropy measure, yet its online implementations involve complex and nonlocal recurrent dynamics. Motivated by this perspective, we propose Predictive Entropy Maximization, which achieves competitive performance in BSS, using only local weight updates. The method employs a close approximation of an entropy measure, yielding an objective function with easily interpretable components. Minimizing this objective leads to a predictive neural architecture in which feedforward synapses follow an error-driven rule (that can be realized through dendritic mechanisms), lateral inhibitory connections are learned with local Hebbian plasticity, and source-domain constraints are enforced through simple output nonlinearities. We derive explicit spectral bounds on the surrogate error, characterizing when the approximation is accurate. Empirically, Predictive Entropy Maximization remains robust under increasing source correlation and observation noise, outperforms biologically plausible algorithms that rely on stronger independence or decorrelation assumptions, and remains competitive with exact determinant- and correlative-information-based baselines. These results show how local plasticity and adaptive lateral inhibition can emerge from maximizing a regularized second-order entropy over structured source domains. Our implementation code is available at https://github.com/BariscanBozkurt/Predictive-Entropy-Maximization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper derives local error-driven and Hebbian rules for blind source separation from a surrogate entropy objective, with spectral bounds and competitive empirical results, though the approximation's accuracy under rising source correlation needs tighter checking.


The main takeaway is that this work turns an entropy-maximization goal for structured blind source separation into a network with only local updates: error-driven feedforward weights that can use dendritic computation, Hebbian lateral inhibition, and simple output nonlinearities to enforce domain constraints. That combination is not in the earlier entropy-based BSS papers they cite, and the code is public, which helps.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes Predictive Entropy Maximization for blind source separation (BSS) of structured and potentially correlated sources. It approximates an entropy objective to derive a predictive neural architecture whose feedforward synapses follow an error-driven rule (realizable via dendritic mechanisms), lateral inhibitory connections use local Hebbian plasticity, and source-domain constraints are enforced by output nonlinearities. Explicit spectral bounds on the surrogate error are derived, and experiments demonstrate robustness to increasing source correlation and observation noise, competitive performance against exact determinant- and correlative-information baselines, and superiority over biologically plausible methods relying on stronger independence assumptions. Code is provided at https://github.com/BariscanBozkurt/Predictive-Entropy-Maximization.

Significance. If the approximation and bounds hold under the tested conditions, the work provides a valuable normative bridge between entropy maximization and local, biologically plausible plasticity rules for BSS, avoiding nonlocal recurrent dynamics. Explicit credit is due for the reproducible code, the derivation of spectral bounds on surrogate error, and the direct comparison to independent baselines that keeps circularity low. The approach could inform models of how structured source separation emerges in neural circuits.

major comments (1)

[spectral bounds derivation and robustness experiments] The derivation of spectral bounds (detailed after the abstract and in the methods): the bounds are stated under assumptions of bounded spectra and limited source correlation, yet the robustness experiments deliberately increase source correlation (and thus violate those conditions). This raises a load-bearing concern for the central claim that the derived local rules maximize the intended entropy objective, as the surrogate error may grow with correlation and break the normative link.

minor comments (1)

[Abstract and Introduction] The abstract and introduction could more explicitly state the precise form of the entropy approximation and the conditions under which the spectral bounds apply, to aid readers in assessing the scope of the normative derivation.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful review and for identifying this important point about the relationship between the derived bounds and the experimental conditions. We address the major comment below.


Referee: The derivation of spectral bounds (detailed after the abstract and in the methods): the bounds are stated under assumptions of bounded spectra and limited source correlation, yet the robustness experiments deliberately increase source correlation (and thus violate those conditions). This raises a load-bearing concern for the central claim that the derived local rules maximize the intended entropy objective, as the surrogate error may grow with correlation and break the normative link.

Authors: We appreciate the referee highlighting this tension. The spectral bounds are indeed derived under assumptions of bounded spectra and limited source correlation to guarantee that the surrogate error remains small relative to the true entropy objective. The robustness experiments intentionally probe higher correlation regimes that can violate these assumptions. We acknowledge that in such regimes the surrogate may deviate more substantially from the exact entropy measure, weakening the direct normative link in those specific conditions. The local rules themselves are obtained by minimizing the surrogate objective (via error-driven feedforward and local Hebbian lateral updates), so the derivation remains valid as an approximation even if the bound tightness is not guaranteed. Empirically, the architecture continues to yield competitive separation performance. In the revision we will add an explicit analysis of the realized surrogate error (computed via the released code) across the correlation sweep. We will pair it with a clarified discussion that distinguishes the sufficient conditions provided by the bounds from the broader empirical utility of the surrogate-derived rules. This will better delineate the theoretical guarantees.
revision: yes

Circularity Check

0 steps flagged

Derivation of local rules from surrogate entropy objective remains independent of fitted results

full rationale

The paper starts from an external entropy-maximization objective for blind source separation, introduces an explicit approximation whose error is bounded by derived spectral conditions, and then algebraically obtains the local plasticity rules (error-driven feedforward, Hebbian lateral) as the gradient of that surrogate. No step renames a fitted parameter as a prediction, no self-citation supplies a uniqueness theorem that forces the architecture, and the experimental comparisons use independent baselines rather than the same data used to tune the surrogate. The derivation chain is therefore self-contained against external benchmarks and does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the standard entropy-maximization framing for BSS plus a new approximation whose accuracy is bounded spectrally; no explicit free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption: The entropy measure admits a close surrogate whose minimization yields effective local plasticity rules for structured sources.
Invoked when deriving the objective and when claiming the local rules achieve competitive BSS performance.

pith-pipeline@v0.9.0 · 5803 in / 1217 out tokens · 34148 ms · 2026-05-22T09:17:47.085357+00:00 · methodology



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear


We replace the exact log-determinant with an online second-order surrogate obtained by a Taylor expansion around the diagonal part of the output covariance... variance-expansion term and a variance-normalized cross-covariance penalty
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear


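Corollary D.4... $|R_2(t)| \le \dfrac{\|\hat{B}_{\lambda,\varepsilon}(t)\|_F^2 \, \|\hat{B}_{\lambda,\varepsilon}(t)\|_2}{1 + \lambda_{\min}(\hat{B}_{\lambda,\varepsilon}(t))}$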

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

55 extracted references · 55 canonical work pages · 1 internal anchor

[1]

Academic press, 2010

Pierre Comon and Christian Jutten.Handbook of Blind Source Separation: Independent com- ponent analysis and applications. Academic press, 2010

work page 2010
[2]

John Wiley & Sons, 2009

Andrzej Cichocki, Rafal Zdunek, Anh Huy Phan, and Shun-ichi Amari.Nonnegative ma- trix and tensor factorizations: applications to exploratory multi-way data analysis and blind source separation. John Wiley & Sons, 2009

work page 2009
[3]

Bell and Terrence J

Anthony J. Bell and Terrence J. Sejnowski. An information-maximization approach to blind separation and blind deconvolution.Neural Comput., 7(6):1129–1159, November 1995. ISSN 0899-7667. doi: 10.1162/neco.1995.7.6.1129. URLhttps://doi.org/10.1162/neco. 1995.7.6.1129

work page doi:10.1162/neco.1995.7.6.1129 1995
[4]

Olshausen and David J

Bruno A. Olshausen and David J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images.Nature, 381(6583):607–609, 1996. doi: 10.1038/ 381607a0

work page 1996
[5]

Michael S. Lewicki. Efficient coding of natural sounds.Nature Neuroscience, 5(4):356–363,

work page
[6]

URLhttps://doi.org/10.1038/nn831

doi: 10.1038/nn831. URLhttps://doi.org/10.1038/nn831

work page doi:10.1038/nn831
[7]

Some experiments on the recognition of speech, with one and with two ears

E Colin Cherry. Some experiments on the recognition of speech, with one and with two ears. The Journal of the Acoustical Society of America, 25(5):975–979, 1953

work page 1953
[8]

The cocktail party phenomenon: A review of research on speech intelligibility in multiple-talker conditions.Acta Acustica united with Acustica, 86(1):117– 128, 2000

Adelbert W Bronkhorst. The cocktail party phenomenon: A review of research on speech intelligibility in multiple-talker conditions.Acta Acustica united with Acustica, 86(1):117– 128, 2000

work page 2000
[9]

The cocktail party problem.Neural Computation, 17(9):1875– 1902, 2005

Simon Haykin and Zhe Chen. The cocktail party problem.Neural Computation, 17(9):1875– 1902, 2005. doi: 10.1162/0899766054322964

work page doi:10.1162/0899766054322964 1902
[10]

Independent component analysis: A tutorial, 1999

Aapo Hyvärinen and Erkki Oja. Independent component analysis: A tutorial, 1999. URL https://api.semanticscholar.org/CorpusID:118629

work page 1999
[11]

Independent component analysis: Algorithms and applica- tions.Neural Networks, 13(4–5):411–430, 2000

Aapo Hyvärinen and Erkki Oja. Independent component analysis: Algorithms and applica- tions.Neural Networks, 13(4–5):411–430, 2000. doi: 10.1016/S0893-6080(00)00026-5

work page doi:10.1016/s0893-6080(00)00026-5 2000
[12]

Chia-Hsiang Lin, Wing-Kin Ma, Wei-Chiang Li, Chong-Yung Chi, and Arulmurugan Am- bikapathi. Identifiability of the simplex volume minimization criterion for blind hyperspectral unmixing: The no pure-pixel case.IEEE Transactions on Geoscience and Remote Sensing, 53, 06 2014. doi: 10.1109/TGRS.2015.2424719

work page doi:10.1109/tgrs.2015.2424719 2014
[13]

Chia-Hsiang Lin, Ruiyuan Wu, Wing-Kin Ma, Chong-Yung Chi, and Yue Wang. Maximum volume inscribed ellipsoid: A new simplex-structured matrix factorization framework via facet enumeration and convex optimization.SIAM Journal on Imaging Sciences, 11(2):1651–1679,

work page
[14]

URLhttps://doi.org/10.1137/17M114145X

doi: 10.1137/17M114145X. URLhttps://doi.org/10.1137/17M114145X

work page doi:10.1137/17m114145x
[15]

An algorithmic framework for sparse bounded component analysis.IEEE Transactions on Signal Processing, 66(19):5194–5205, August 2018

Eren Babatas and Alper T Erdogan. An algorithmic framework for sparse bounded component analysis.IEEE Transactions on Signal Processing, 66(19):5194–5205, August 2018

work page 2018
[16]

A class of bounded component analysis algorithms for the separation of both independent and dependent sources.IEEE Transactions on Signal Processing, 61(22): 5730–5743, August 2013

Alper T Erdogan. A class of bounded component analysis algorithms for the separation of both independent and dependent sources.IEEE Transactions on Signal Processing, 61(22): 5730–5743, August 2013

work page 2013
[17]

Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values.Environmetrics, 5(2):111–126, 1994

Pentti Paatero and Unto Tapper. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values.Environmetrics, 5(2):111–126, 1994. doi: https://doi.org/10.1002/env.3170050203. URLhttps://onlinelibrary.wiley.com/ doi/abs/10.1002/env.3170050203

work page doi:10.1002/env.3170050203 1994
[18]

Sidiropoulos, and Wing-Kin Ma

Xiao Fu, Kejun Huang, Nicholas D. Sidiropoulos, and Wing-Kin Ma. Nonnegative matrix factorization for signal and data analytics: Identifiability, algorithms, and applications.IEEE Signal Processing Magazine, 36(2):59–80, 2019. doi: 10.1109/MSP.2018.2877582. 11

work page doi:10.1109/msp.2018.2877582 2019
[19]

Gokcan Tatli and Alper T. Erdogan. Polytopic matrix factorization: Determinant maximization based criterion and identifiability.IEEE Transactions on Signal Processing, 69:5431–5447,

work page
[20]

doi: 10.1109/TSP.2021.3112918

work page doi:10.1109/tsp.2021.3112918 2021
[21]

Gokcan Tatli and Alper T. Erdogan. Generalized polytopic matrix factorization. InICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3235–3239, 2021. doi: 10.1109/ICASSP39728.2021.9413709

work page doi:10.1109/icassp39728.2021.9413709 2021
[23]

Alper T. Erdogan. An information maximization based blind source separation approach for dependent and independent sources. InICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4378–4382, 2022. doi: 10.1109/ ICASSP43922.2022.9746099

work page arXiv 2022
[24]

A local learning rule for independent component analy- sis.Scientific Reports, 6(1):28073, Jun 2016

Takuya Isomura and Taro Toyoizumi. A local learning rule for independent component analy- sis.Scientific Reports, 6(1):28073, Jun 2016. ISSN 2045-2322. doi: 10.1038/srep28073. URL https://doi.org/10.1038/srep28073

work page doi:10.1038/srep28073 2016
[25]

Error-gated hebbian rule: A local learning rule for principal and independent component analysis.Scientific Reports, 8(1):1835, Jan 2018

Takuya Isomura and Taro Toyoizumi. Error-gated hebbian rule: A local learning rule for principal and independent component analysis.Scientific Reports, 8(1):1835, Jan 2018. ISSN 2045-2322. doi: 10.1038/s41598-018-20082-0. URLhttps://doi.org/10.1038/ s41598-018-20082-0

work page doi:10.1038/s41598-018-20082-0 2018
[26]

Sengupta

Yanis Bahroun, Dmitri Chklovskii, and Anirvan M. Sengupta. A normative and biologically plausible algorithm for independent component analysis. In A. Beygelzimer, Y . Dauphin, P. Liang, and J. Wortman Vaughan, editors,Advances in Neural Information Processing Sys- tems, 2021. URLhttps://openreview.net/forum?id=fpvUKdqcPV

work page 2021
[27]

Chklovskii

Cengiz Pehlevan, Sreyas Mohan, and Dmitri B. Chklovskii. Blind nonnegative source sep- aration using biological neural networks.Neural Computation, 29(11):2925–2954, 11 2017. ISSN 0899-7667. doi: 10.1162/neco_a_01007. URLhttps://doi.org/10.1162/neco_a_ 01007

work page doi:10.1162/neco_a_01007 2017
[28]

Alper Tunga Erdogan and Cengiz Pehlevan. Blind bounded source separation using neural networks with local learning rules.ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3812–3816, 2020. URLhttps: //api.semanticscholar.org/CorpusID:215745493

work page 2020
[29]

Berfin Simsek and Alper T. Erdogan. Online bounded component analysis: A simple recurrent neural network with local update rule for unsupervised separation of dependent and indepen- dent sources. In2019 53rd Asilomar Conference on Signals, Systems, and Computers, pages 1639–1643, 2019. doi: 10.1109/IEEECONF44664.2019.9048916

work page doi:10.1109/ieeeconf44664.2019.9048916 2019
[30]

Biologically-plausible determi- nant maximization neural networks for blind separation of correlated sources

Bariscan Bozkurt, Cengiz Pehlevan, and Alper Tunga Erdogan. Biologically-plausible determi- nant maximization neural networks for blind separation of correlated sources. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, editors,Advances in Neural Informa- tion Processing Systems, 2022. URLhttps://openreview.net/forum?id=espX_4CLr46

work page 2022
[31]

Cor- relative information maximization based biologically plausible neural networks for correlated source separation

Bariscan Bozkurt, Ate¸ s ˙Isfendiyaro˘glu, Cengiz Pehlevan, and Alper Tunga Erdogan. Cor- relative information maximization based biologically plausible neural networks for correlated source separation. InThe Eleventh International Conference on Learning Representations,

work page
[32]

URLhttps://openreview.net/forum?id=8JsaP7j1cL0

work page
[33]

A tutorial on the free-energy framework for modelling perception and learning

Rafal Bogacz. A tutorial on the free-energy framework for modelling perception and learning. Journal of mathematical psychology, 76:198–211, 2017

work page 2017
[34]

Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects.Nature neuroscience, 2(1):79–87,

Rajesh PN Rao and Dana H Ballard. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects.Nature neuroscience, 2(1):79–87,

work page
[35]

doi: 10.1038/4580. 12

work page doi:10.1038/4580
[36]

Theories of error back-propagation in the brain

James CR Whittington and Rafal Bogacz. Theories of error back-propagation in the brain. Trends in cognitive sciences, 23(3):235–250, 2019

work page 2019
[37]

Correlative information max- imization: A biologically plausible approach to supervised deep neural networks without weight symmetry

Bariscan Bozkurt, Cengiz Pehlevan, and Alper Tunga Erdogan. Correlative information max- imization: A biologically plausible approach to supervised deep neural networks without weight symmetry. InThirty-seventh Conference on Neural Information Processing Systems,

work page
[38]

URLhttps://openreview.net/forum?id=TUGoUNkccV

work page
[39]

The combination of hebbian and predictive plasticity learns invariant object representations in deep sensory networks.Nature Neuro- science, 26(11):1906–1915, Nov 2023

Manu Srinath Halvagal and Friedemann Zenke. The combination of hebbian and predictive plasticity learns invariant object representations in deep sensory networks.Nature Neuro- science, 26(11):1906–1915, Nov 2023. ISSN 1546-1726. doi: 10.1038/s41593-023-01460-y. URLhttps://doi.org/10.1038/s41593-023-01460-y

work page doi:10.1038/s41593-023-01460-y 1906
[40]

VICReg: Variance-invariance-covariance regu- larization for self-supervised learning

Adrien Bardes, Jean Ponce, and Yann LeCun. VICReg: Variance-invariance-covariance regu- larization for self-supervised learning. InInternational Conference on Learning Representa- tions, 2022. URLhttps://openreview.net/forum?id=xm6YD62D1Ub

work page 2022
[41]

Proximal algorithms.Foundations and trends® in Optimiza- tion, 1(3):127–239, 2014

Neal Parikh, Stephen Boyd, et al. Proximal algorithms.Foundations and trends® in Optimiza- tion, 1(3):127–239, 2014

work page 2014
[42]

Learning by the dendritic prediction of somatic spiking

Robert Urbanczik and Walter Senn. Learning by the dendritic prediction of somatic spiking. Neuron, 81(3):521–528, 2014

work page 2014
[43]

Homeostatic synaptic plasticity: Local and global mechanisms for stabilizing neuronal function.Cold Spring Harbor Perspectives in Biology, 4(1):a005736, 2012

Gina Turrigiano. Homeostatic synaptic plasticity: Local and global mechanisms for stabilizing neuronal function.Cold Spring Harbor Perspectives in Biology, 4(1):a005736, 2012. doi: 10.1101/cshperspect.a005736

work page doi:10.1101/cshperspect.a005736 2012
[44]

Brian McFee, Matt McVicar, Daniel Faronbi, Iran Roman, Matan Gover, Stefan Balke, Scott Seyfarth, Ayoub Malek, Colin Raffel, Vincent Lostanlen, Benjamin van Niekirk, Dana Lee, Frank Cwitkowitz, Frank Zalkow, Oriol Nieto, Dan Ellis, Jack Mason, Kyungyun Lee, Bea Steers, Emily Halvachs, Carl Thomé, Fabian Robert-Stöter, Rachel Bittner, Ziyao Wei, Adam Weiss...

work page doi:10.5281/zenodo.15006942 2025
[45]

Efficient auditory coding.Nature, 439(7079):978–982, 2006

Evan C Smith and Michael S Lewicki. Efficient auditory coding.Nature, 439(7079):978–982, 2006

work page 2006
[46]

SIAM, Philadelphia, PA, 1992

Ingrid Daubechies.Ten Lectures on Wavelets. SIAM, Philadelphia, PA, 1992. ISBN 978-0- 89871-274-2

work page 1992
[47]

Self- supervised learning with an information maximization criterion

Serdar Ozsoy, Shadi Hamdan, Sercan O Arik, Deniz Yuret, and Alper T Erdogan. Self- supervised learning with an information maximization criterion. In Alice H. Oh, Alekh Agar- wal, Danielle Belgrave, and Kyunghyun Cho, editors,Advances in Neural Information Pro- cessing Systems, 2022. URLhttps://openreview.net/forum?id=5MgZAu2NR7X

[48]

Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Guo, Mohammad Gheshlaghi Azar, et al. Bootstrap your own latent: A new approach to self-supervised learning. Advances in Neural Information Processing Systems, 33:21271–21284, 2020.

[49]

Jure Zbontar, Li Jing, Ishan Misra, Yann LeCun, and Stéphane Deny. Barlow twins: Self-supervised learning via redundancy reduction. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, Proceedings of Machine Learning Research, pages 12310–12320. PMLR, 202...

[50]

Yann LeCun et al. A path towards autonomous machine intelligence version 0.9.2, 2022-06-27. Open Review, 62(1):1–62, 2022.

[51]

Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey E. Hinton. A simple framework for contrastive learning of visual representations. CoRR, abs/2002.05709, 2020. URL https://arxiv.org/abs/2002.05709

[52]

Horace B. Barlow. Possible principles underlying the transformations of sensory messages. In Walter A. Rosenblith, editor, Sensory Communication, pages 217–234. MIT Press, Cambridge, MA, 1961.

[53]

Katrina Drozdov, Ravid Shwartz-Ziv, and Yann LeCun. Video representation learning with joint-embedding predictive architectures, 2024. URL https://arxiv.org/abs/2412.10925

Appendix A Review of Correlative Information Maximization
A.1 Correlative entropy and mutual information
A.2 Batch CorInfoMax objecti...

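[54]

Hence $D_\varepsilon = D + \varepsilon I$ is positive definite and $(D_\varepsilon)^{-1/2}$ is well-defined.

the regularized log-determinant admits the exact decomposition
$$\log\det(C + \varepsilon I) = \sum_{i=1}^{n} \log(C_{ii} + \varepsilon) - \frac{1}{2}\sum_{i=1}^{n} \lambda_i^2 + R_2, \tag{D.1}$$
where the remainder is given exactly by
$$R_2 = \sum_{i=1}^{n} \left[\log(1 + \lambda_i) - \lambda_i + \frac{1}{2}\lambda_i^2\right]. \tag{D.2}$$
Moreover, the second-order term can be written entrywise as
$$\sum_{i=1}^{n} \lambda_i^2 = \operatorname{Tr}\big((B_\varepsilon)^2\big) = \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} \frac{C_{ij}^2}{(C_{ii} + \varepsilon)(C_{jj} + \varepsilon)}, \tag{D.3}$$
and ther...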
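The identity (D.1)–(D.3) is easy to check numerically. The NumPy sketch below is an illustration only, not the authors' code: the function name `logdet_decomposition`, the toy data and $\varepsilon = 10^{-3}$ are assumptions, and $B_\varepsilon$ is taken to be the normalized off-diagonal matrix $D_\varepsilon^{-1/2}(C - D)D_\varepsilon^{-1/2}$ with $D = \operatorname{diag}(C)$, which is the form that (D.3) implies. Under this assumption $B_\varepsilon$ has a zero diagonal, so $\sum_i \lambda_i = \operatorname{Tr}(B_\varepsilon) = 0$, which is why no first-order term appears in (D.1).

```python
import numpy as np

def logdet_decomposition(C, eps=1e-3):
    """Split log det(C + eps*I) into the terms of Eqs. (D.1)-(D.3).

    Assumes C is symmetric positive semidefinite (e.g. a sample covariance) and
    that B_eps = D_eps^{-1/2} (C - D) D_eps^{-1/2} with D = diag(C), the
    normalized off-diagonal matrix implied by Eq. (D.3). Hypothetical helper.
    """
    C = np.asarray(C, dtype=float)
    d = np.diag(C) + eps                              # diagonal of D_eps = D + eps*I
    scale = np.outer(1.0 / np.sqrt(d), 1.0 / np.sqrt(d))
    B = (C - np.diag(np.diag(C))) * scale             # zero diagonal, so Tr(B_eps) = 0
    lam = np.linalg.eigvalsh(B)                       # eigenvalues lambda_i of B_eps
    diag_term = np.sum(np.log(d))                     # sum_i log(C_ii + eps)
    second_order = np.sum(B**2)                       # entrywise sum_i lambda_i^2, Eq. (D.3)
    R2 = np.sum(np.log1p(lam) - lam + 0.5 * lam**2)   # exact remainder, Eq. (D.2)
    return diag_term, second_order, R2, lam

rng = np.random.default_rng(0)
Y = rng.standard_normal((3, 200))
Y[1] += 0.5 * Y[0]                                    # make two outputs correlated
C = np.cov(Y)                                         # 3 x 3 sample covariance
eps = 1e-3
diag_term, second_order, R2, lam = logdet_decomposition(C, eps)
exact = np.linalg.slogdet(C + eps * np.eye(3))[1]
print(exact, diag_term - 0.5 * second_order + R2)     # Eq. (D.1): equal up to rounding
print(np.isclose(second_order, np.sum(lam**2)))       # Eq. (D.3)
```

Dropping `R2` from the last line gives the surrogate log-determinant; the size of `R2` is exactly the approximation error that the remainder bounds control.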

[55]

for every $Y \in \mathcal{Y}$,
$$\left| J^{\text{batch}}_{\text{sur}}(Y) - J^{\text{batch}}_{\text{det}}(Y) \right| \le \bar{\varepsilon}_{\mathcal{Y}}; \tag{D.23}$$

[56]

if $Y_{\text{sur}}$ is a global minimizer of $J^{\text{batch}}_{\text{sur}}$ over $\mathcal{Y}$, then
$$J^{\text{batch}}_{\text{det}}(Y_{\text{sur}}) \le \inf_{Y \in \mathcal{Y}} J^{\text{batch}}_{\text{det}}(Y) + 2\bar{\varepsilon}_{\mathcal{Y}}. \tag{D.24}$$
Proof. By Equation D.4, applied with $C = \hat{C}_y(Y)$, the difference between the surrogate and exact batch objectives is exactly the Taylor remainder: $J^{\text{batch}}_{\text{sur}}(Y) - J^{\text{batch}}_{\text{det}}(Y) = R_2(Y)$. Applying Corollary D.3 with $C = \hat{C}_y(Y)$ yields $|R_2(Y)| \le \|B_\varepsilon(Y$...
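The near-optimality statement (D.24) follows from (D.23) by a sandwich argument: for any $Y^\ast \in \mathcal{Y}$, $J_{\text{det}}(Y_{\text{sur}}) \le J_{\text{sur}}(Y_{\text{sur}}) + \bar{\varepsilon}_{\mathcal{Y}} \le J_{\text{sur}}(Y^\ast) + \bar{\varepsilon}_{\mathcal{Y}} \le J_{\text{det}}(Y^\ast) + 2\bar{\varepsilon}_{\mathcal{Y}}$. The sketch below checks this on a small finite candidate set. It is a hypothetical illustration: the helper `batch_objectives`, the forms $J_{\text{det}} = -\log\det(\hat{C}_y + \varepsilon I)$ and $J_{\text{sur}} = -\big[\sum_i \log(C_{ii}+\varepsilon) - \tfrac{1}{2}\sum_{i \ne j} C_{ij}^2/((C_{ii}+\varepsilon)(C_{jj}+\varepsilon))\big]$, and the random candidate batches are assumptions chosen so that $J_{\text{sur}} - J_{\text{det}} = R_2$, as stated in the proof. The paper's batch objectives may contain further terms.

```python
import numpy as np

def batch_objectives(Y, eps=1e-3):
    """Hypothetical exact and surrogate batch objectives for outputs Y
    (rows = output units, columns = samples). Assumed forms, not the paper's:
      J_det = -log det(C + eps I)
      J_sur = -[sum_i log(C_ii + eps) - 0.5 * sum_{i != j} C_ij^2 / ((C_ii + eps)(C_jj + eps))]
    so that J_sur - J_det equals the Taylor remainder R_2."""
    C = np.cov(Y)                                    # sample output covariance \hat C_y(Y)
    d = np.diag(C) + eps
    off = C**2 / np.outer(d, d)
    np.fill_diagonal(off, 0.0)                       # keep only the j != i terms
    J_sur = -(np.sum(np.log(d)) - 0.5 * off.sum())
    J_det = -np.linalg.slogdet(C + eps * np.eye(C.shape[0]))[1]
    return J_sur, J_det

rng = np.random.default_rng(0)
S = rng.standard_normal((3, 500))                    # toy latent signals
candidates = [M @ S for M in rng.standard_normal((20, 3, 3))]  # finite candidate set
J = np.array([batch_objectives(Y) for Y in candidates])       # column 0: J_sur, column 1: J_det
eps_bar = np.max(np.abs(J[:, 0] - J[:, 1]))          # smallest constant satisfying (D.23) here
k_sur = np.argmin(J[:, 0])                           # global minimizer of the surrogate
bound = J[:, 1].min() + 2 * eps_bar                  # right-hand side of (D.24)
print(J[k_sur, 1] <= bound)
```

The finite set only exercises the logic of the inequality; for a continuous domain $\mathcal{Y}$, $\bar{\varepsilon}_{\mathcal{Y}}$ has to come from a uniform bound on $|R_2|$ instead of a maximum over samples.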
