Barlow Twins: Self-Supervised Learning via Redundancy Reduction
Self-supervised learning (SSL) is rapidly closing the gap with supervised methods on large computer vision benchmarks. A successful approach to SSL is to learn embeddings which are invariant to distortions of the input sample. However, a recurring issue with this approach is the existence of trivial constant solutions. Most current methods avoid such solutions by careful implementation details. We propose an objective function that naturally avoids collapse by measuring the cross-correlation matrix between the outputs of two identical networks fed with distorted versions of a sample, and making it as close to the identity matrix as possible. This causes the embedding vectors of distorted versions of a sample to be similar, while minimizing the redundancy between the components of these vectors. The method is called Barlow Twins, owing to neuroscientist H. Barlow's redundancy-reduction principle applied to a pair of identical networks. Barlow Twins requires neither large batches nor asymmetry between the network twins such as a predictor network, gradient stopping, or a moving average on the weight updates. Intriguingly, it benefits from very high-dimensional output vectors. Barlow Twins outperforms previous methods on ImageNet for semi-supervised classification in the low-data regime, and is on par with the current state of the art for ImageNet classification with a linear classifier head, and for transfer tasks of classification and object detection.
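The objective described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `barlow_twins_loss`, the weighting parameter `lambda_offdiag` and its default value, and the `eps` stabilizer are assumptions introduced here for illustration. The sketch standardizes each embedding dimension over the batch, forms the cross-correlation matrix between the two views, and penalizes its squared deviation from the identity matrix.

```python
import numpy as np


def barlow_twins_loss(z_a, z_b, lambda_offdiag=5e-3, eps=1e-12):
    """Sketch of a redundancy-reduction loss on two batches of embeddings.

    z_a, z_b: arrays of shape (batch, dim) holding the embeddings of two
    distorted versions of the same samples.
    lambda_offdiag: assumed trade-off weight for the off-diagonal term
    (illustrative value, not taken from the text above).
    Returns (loss, cross_correlation_matrix).
    """
    n = z_a.shape[0]

    # Standardize every embedding dimension over the batch.
    z_a_norm = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b_norm = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)

    # Cross-correlation matrix between the two views, shape (dim, dim).
    c = z_a_norm.T @ z_b_norm / n

    # Diagonal term: pull correlations of matching components toward 1,
    # which makes the embeddings invariant to the distortions.
    diag = np.diag(c)
    on_diag = np.sum((diag - 1.0) ** 2)

    # Off-diagonal term: push correlations between different components
    # toward 0, which reduces redundancy.
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)

    return on_diag + lambda_offdiag * off_diag, c


# Example: two noisy copies of the same random embeddings.
rng = np.random.default_rng(0)
z = rng.normal(size=(256, 8))
loss, c = barlow_twins_loss(z + 0.1 * rng.normal(size=z.shape),
                            z + 0.1 * rng.normal(size=z.shape))
print(c.shape, float(loss))
```

Under these assumptions, a collapsed constant embedding has zero variance in every dimension, so its standardized values and all correlations are zero and the diagonal term stays at its maximum. That is one way to see why driving the matrix toward the identity discourages the trivial solution.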
Forward citations
Cited by 15 Pith papers
-
Emerging Properties in Self-Supervised Vision Transformers
Self-supervised ViTs show emergent semantic segmentation and 78.3% k-NN accuracy on ImageNet; DINO reaches 80.1% linear evaluation with ViT-Base.
-
UR-JEPA: Uniform Rectifiability as a Regularizer for Joint-Embedding Predictive Architectures
UR-JEPA applies uniform rectifiability regularization via a smoothed Carleson square function to JEPA training, producing embeddings with 4-5 order PCA spectral drop at dimension 20-25 and lower seed variance than Gau...
-
VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning
VICReg prevents collapse in self-supervised image embeddings via explicit variance, invariance, and covariance regularization and matches state-of-the-art downstream performance.
-
Data Provenance for Image Auto-Regressive Generation
A post-hoc detection framework exploits generation-induced patterns in autoregressive image outputs to enable provenance tracing across multiple IAR models without altering the generation process.
-
Radial Basis Function Networks as Projection Heads in Self-Supervised Learning
RBFN projection heads serve as competitive replacements for MLP heads in SSL and enable SNS, a label-free metric from RBF parameters that correlates strongly with logistic regression evaluation.
-
Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models
Empirical benchmarks show distribution similarity between adaptation and pretraining data increases practical privacy leakage in DP-adapted LLMs at fixed theoretical guarantees, with LoRA providing strongest protectio...
-
Doing well with less! On Sampling Techniques for Empirical Pairwise Loss Estimation/Minimization
Sampling pairs directly with auxiliary information for higher inclusion probabilities on informative pairs yields near-full pairwise loss performance at reduced computational cost.
-
Velox: Learning Representations of 4D Geometry and Appearance
Velox compresses dynamic point clouds into latent tokens that support geometry via 4D surface modeling and appearance via 3D Gaussians, showing strong results on video-to-4D generation, tracking, and image-to-4D cloth...
-
MoDAl: Self-Supervised Neural Modality Discovery via Decorrelation for Speech Neuroprosthesis
A decorrelation-based training framework recovers complementary linguistic information from Broca's area signals, improving speech neuroprosthesis WER from 26.3% to 21.6% over prior end-to-end methods.
-
MoDAl: Self-Supervised Neural Modality Discovery via Decorrelation for Speech Neuroprosthesis
MoDAl discovers complementary neurolinguistic modalities via contrastive-decorrelation objectives, cutting brain-to-text word error rate from 26.3% to 21.6% by incorporating area 44 signals.
-
TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis
TESSERA learns robust label-efficient embeddings from irregular multi-modal EO time series via Barlow Twins plus global shuffling and mix-based regularizers, delivering SOTA accuracy on classification, segmentation an...
-
Revisiting Feature Prediction for Learning Visual Representations from Video
V-JEPA models trained only on feature prediction from 2 million public videos achieve 81.9% on Kinetics-400, 72.2% on Something-Something-v2, and 77.9% on ImageNet-1K using frozen ViT-H/16 backbones.
-
Representation Without Reward: A JEPA Audit for LLM Fine-Tuning
An empirical audit of 22 JEPA-style training auxiliaries on Llama-3.2-1B fine-tuning for regex generation finds no statistically significant task improvement after multiple-testing correction, even when auxiliaries vi...
-
Non-identifiability of Explanations from Model Behavior in Deep Networks of Image Authenticity Judgments
Models predicting human authenticity judgments produce inconsistent attribution maps across architectures, showing that explanations are non-identifiable.
-
3D Foundation Model for Generalizable Disease Detection in Head Computed Tomography
A 3D self-supervised foundation model trained on over 360k head CT scans improves downstream disease classification on limited-label internal and external datasets versus scratch-trained and prior models.