pith. machine review for the scientific record.

arxiv: 2605.11870 · v1 · submitted 2026-05-12 · 💻 cs.LG · cs.IT · math.IT

Recognition: 2 theorem links


Information theoretic underpinning of self-supervised learning by clustering

Authors on Pith · no claims yet

Pith reviewed 2026-05-13 06:14 UTC · model grok-4.3

classification 💻 cs.LG · cs.IT · math.IT
keywords self-supervised learning · deep clustering · KL divergence · mode collapse · batch centering · information theory · distillation

The pith

Self-supervised learning by clustering emerges from KL-divergence minimization with a teacher-distribution constraint.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper casts self-supervised learning via clustering as an optimization of Kullback-Leibler divergence, directly analogous to the objective in supervised classification. To stop the student model from collapsing to trivial solutions, an explicit constraint is placed on the teacher distribution; this forces normalization by the inverse of the cluster priors. Jensen's inequality applied to the resulting expression then recovers the batch-centering step that practitioners already use. The derivation therefore supplies a principled account for two widespread heuristics—distillation and centering—rather than treating them as ad-hoc fixes.
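The objective described above can be sketched numerically. The following is a minimal numpy sketch of a constrained KL-style objective with inverse-prior teacher normalization, assuming a softmax teacher and batch-estimated priors; the variable names, the prior estimator, and the absence of temperatures are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Toy batch of teacher/student logits over K clusters.
B, K = 256, 8
teacher_logits = rng.normal(size=(B, K))
student_logits = rng.normal(size=(B, K))

# Teacher distribution normalized by inverse cluster priors:
# estimate the priors from the batch, reweight, then renormalize
# so each row is again a probability distribution.
t_raw = softmax(teacher_logits)
priors = t_raw.mean(axis=0)            # batch estimate of cluster priors p_k
t = t_raw / priors                     # inverse-prior reweighting
t = t / t.sum(axis=1, keepdims=True)   # per-sample renormalization

# Student distribution and the KL-style (cross-entropy) objective
# minimized with respect to the student.
s = softmax(student_logits)
loss = -(t * np.log(s + 1e-12)).sum(axis=1).mean()
```

The inverse-prior reweighting is the anti-collapse mechanism: clusters the batch already favors are down-weighted in the teacher targets, so the student cannot profit from assigning everything to one cluster.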

Core claim

By analogy to supervised learning, SSL is formulated as KL-divergence optimization. Mode collapse is prevented by imposing an optimization constraint on the teacher distribution, which leads to normalization by inverse cluster priors. Applying Jensen's inequality, this normalization simplifies to the popular batch-centering procedure. The theoretical model supports specific existing successful SSL methods and suggests directions for future investigations.

What carries the argument

KL-divergence minimization between student predictions and a teacher distribution whose normalization is fixed by inverse cluster priors.

If this is right

  • Distillation and centering shift from heuristics to consequences of the constrained KL objective.
  • Existing clustering-based SSL algorithms receive a common information-theoretic justification.
  • New SSL procedures can be obtained by varying the form of the teacher constraint while preserving the KL structure.
  • The same framework supplies a route to analyze why certain normalizations succeed or fail in practice.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The KL-plus-constraint view could be tested on contrastive or reconstruction-based SSL to see whether analogous teacher constraints emerge.
  • Relaxing the inverse-prior requirement might reveal whether centering remains necessary or can be replaced by other normalizers.
  • Information-theoretic bounds derived from the same objective could quantify how much supervision is implicitly provided by the clustering signal.

Load-bearing premise

That the required constraint on the teacher distribution takes precisely the form of inverse cluster priors, a form that both blocks collapse and lets Jensen's inequality recover batch centering.
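The pith does not show the reduction this premise leans on. One plausible reconstruction, under assumed notation (teacher logits g for B samples and K clusters, batch-estimated priors p), runs:

```latex
% Teacher reweighted by inverse cluster priors (sketch; notation assumed):
% g_{i,k} = teacher logit for sample i, cluster k;  p_k = cluster prior.
\[
t_k(x_i) \propto \frac{e^{g_{i,k}}}{p_k},
\qquad
p_k \approx \frac{1}{B}\sum_{i=1}^{B} e^{g_{i,k}}
\quad \text{(up to per-sample normalizers)}.
\]
% Jensen's inequality bounds the log-prior by the batch mean of the logits:
\[
\log p_k \;=\; \log \frac{1}{B}\sum_{i=1}^{B} e^{g_{i,k}}
\;\ge\; \frac{1}{B}\sum_{i=1}^{B} g_{i,k} \;=\; c_k .
\]
% Substituting the bound c_k for \log p_k in
% \log t_k = g_k - \log p_k + \text{const}
% yields t_k \propto \exp(g_k - c_k): batch centering of the teacher logits.
\]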

What would settle it

An explicit calculation or numerical check demonstrating that the constrained KL objective does not reduce to batch centering after applying Jensen's inequality, or an implementation in which the inverse-prior normalization fails to prevent collapse while centering still succeeds.
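A toy version of such a numerical check is easy to set up. The sketch below (not the paper's experiment; the biased-logit setup and all names are assumptions) shows batch centering counteracting a collapse-inducing shared bias in the teacher logits:

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def marginal_entropy(p):
    # Entropy of the batch-averaged cluster distribution;
    # near log K means balanced, near 0 means collapsed.
    m = p.mean(axis=0)
    return -(m * np.log(m)).sum()

B, K = 512, 8
# Teacher logits with a strong shared bias toward cluster 0 --
# the precursor of mode collapse in self-distillation.
bias = np.zeros(K)
bias[0] = 4.0
logits = rng.normal(size=(B, K)) + bias

plain = softmax(logits)                           # no anti-collapse mechanism
centered = softmax(logits - logits.mean(axis=0))  # batch centering

h_plain = marginal_entropy(plain)        # low: mass piles onto cluster 0
h_centered = marginal_entropy(centered)  # near log K: bias removed
```

Centering subtracts the shared bias, so the marginal over clusters returns close to uniform; running the analogous check with inverse-prior normalization in place of centering is the comparison the passage above calls for.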

read the original abstract

Self-supervised learning (SSL) is recognized as an essential tool for building foundation models for Artificial Intelligence applications. The advances in SSL have been made thanks to vigorous arguments about the principles of SSL and through extensive empirical research. The aim of this paper is to contribute to the development of the underpinning theory of SSL, focusing on the deep clustering approach. By analogy to supervised learning, we formulate SSL as K-L divergence optimization. The mode collapse is prevented by imposing an optimisation constraint on the teacher distribution. This leads to normalization using inverse cluster priors. We show that using Jensen inequality this normalization simplifies to the popular batch centering procedure. Distillation and centering are common heuristics-based practices in SSL, but our work underpins them theoretically. The theoretical model developed not only supports specific existing successful SSL methods, but also suggests directions for future investigations.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, a simulated author's rebuttal, a circularity check, and an axiom & free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript formulates self-supervised learning (SSL) by clustering as minimization of the Kullback-Leibler (KL) divergence between student and teacher distributions, by analogy to supervised learning. Mode collapse is prevented via an optimization constraint on the teacher distribution that normalizes using inverse cluster priors; Jensen's inequality is then applied to show that this normalization reduces to the standard batch-centering procedure. The work claims this supplies a theoretical underpinning for distillation and centering heuristics used in existing SSL methods.

Significance. If the constraint on the teacher distribution can be shown to arise necessarily from the KL objective and collapse-prevention requirement rather than being selected to recover centering, the result would provide a principled information-theoretic justification for widely used SSL practices and could guide the design of new algorithms. The paper correctly highlights the role of normalization in avoiding collapse and connects it to an existing heuristic, but the overall significance hinges on resolving the independence of the constraint derivation.

major comments (1)
  1. [Abstract and main derivation] The optimization constraint on the teacher distribution is introduced as normalization by inverse cluster priors without an independent derivation showing why this specific form is the minimal or natural choice that both prevents mode collapse and remains compatible with the KL objective. The subsequent application of Jensen's inequality then recovers batch centering, which raises the possibility that the constraint was chosen precisely because it produces the known result. This step is load-bearing for the central claim of providing a 'theoretical underpinning.'

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their insightful comments on our work. We address the major comment regarding the derivation of the teacher distribution constraint in detail below. We believe our response clarifies the motivation and we propose revisions to enhance the presentation.

read point-by-point responses
  1. Referee: [Abstract and main derivation] The optimization constraint on the teacher distribution is introduced as normalization by inverse cluster priors without an independent derivation showing why this specific form is the minimal or natural choice that both prevents mode collapse and remains compatible with the KL objective. The subsequent application of Jensen's inequality then recovers batch centering, which raises the possibility that the constraint was chosen precisely because it produces the known result. This step is load-bearing for the central claim of providing a 'theoretical underpinning.'

    Authors: We agree with the referee that a clearer independent motivation for the specific form of the constraint would strengthen the paper. In the revised manuscript, we will expand the derivation section to show that the constraint arises from requiring the teacher distribution to have uniform marginal probabilities to prevent mode collapse in the KL minimization. This leads naturally to normalization by the inverse of the cluster priors (estimated from the batch), as this ensures the expected value under the student is balanced. This is not chosen to recover centering but is the minimal constraint that maintains the probabilistic interpretation while avoiding trivial solutions. The subsequent use of Jensen's inequality demonstrates that this is equivalent to batch centering, thereby providing the theoretical link. We will also discuss potential alternative constraints and why this one is natural.

    revision: partial
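Taking the rebuttal's proposed derivation at face value, the uniform-marginal constraint might be formalized as follows (a sketch under assumed notation, not text from the revised manuscript):

```latex
% Anti-collapse constraint: uniform teacher marginals over K clusters.
\[
\mathbb{E}_{x}\!\left[t_k(x)\right] \;=\; \frac{1}{K},
\qquad k = 1,\dots,K .
\]
% If the unconstrained teacher q has cluster priors
% p_k = \mathbb{E}_x[q_k(x)], reweighting by the inverse priors,
\[
t_k(x) \;\propto\; \frac{q_k(x)}{p_k},
\qquad\text{gives}\qquad
\mathbb{E}_x\!\left[\frac{q_k(x)}{p_k}\right] \;=\; 1
\;\;\text{for every } k,
\]
% i.e. a uniform marginal after dividing by K, up to the per-sample
% renormalization that makes each t(x) a probability distribution.
```

Whether this renormalization step preserves exact uniformity of the marginals is precisely the gap the referee's comment identifies.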

Circularity Check

1 steps flagged

Teacher-distribution constraint introduced to recover batch centering via Jensen

specific steps
  1. self-definitional [Abstract]
    "The mode collapse is prevented by imposing an optimisation constraint on the teacher distribution. This leads to normalization using inverse cluster priors. We show that using Jensen inequality this normalization simplifies to the popular batch centering procedure."

    The constraint is defined such that its normalization form (inverse cluster priors) is the one that, under Jensen, yields batch centering. The reduction to the known heuristic therefore holds by the choice of constraint rather than as a necessary consequence of the KL formulation alone; a different anti-collapse constraint would not recover centering.

full rationale

The paper formulates SSL clustering as KL-divergence minimization between student and teacher distributions. It then states that mode collapse is prevented by imposing an optimization constraint on the teacher distribution, which directly leads to normalization by inverse cluster priors; Jensen's inequality is applied to show this equals batch centering. The specific constraint form is not derived as the unique or minimal anti-collapse requirement from the KL objective; instead it is presented because it produces the known centering heuristic. This makes the 'theoretical underpinning' reduce to a post-hoc choice whose output matches the target procedure by construction. No independent derivation or external validation of the constraint is supplied.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The claim rests on the domain assumption that SSL clustering is usefully analogous to supervised KL minimization and on the choice of teacher constraint; Jensen's inequality is a standard mathematical result. No new entities are postulated. Inverse cluster priors may be data-dependent and therefore implicitly fitted.

free parameters (1)
  • inverse cluster priors
    Normalization factor derived from cluster priors; likely estimated from batch statistics or data distribution to enforce the anti-collapse constraint.
axioms (1)
  • domain assumption SSL by clustering can be formulated as KL-divergence optimization by direct analogy to supervised learning
    Stated explicitly as the starting point of the derivation.

pith-pipeline@v0.9.0 · 5441 in / 1257 out tokens · 42440 ms · 2026-05-13T06:14:12.092891+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: the paper's claim is directly supported by a theorem in the formal canon.
  • supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: the paper appears to rely on the theorem as machinery.
  • contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

186 extracted references · 186 canonical work pages · 9 internal anchors
