Barlow Twins: Self-Supervised Learning via Redundancy Reduction

Jure Zbontar; Li Jing; Ishan Misra; Yann LeCun; Stéphane Deny

arXiv: 2103.03230 · v3 · pith:MKDVKMPK · submitted 2021-03-04 · cs.CV · cs.AI · cs.LG · q-bio.NC

Abstract

Self-supervised learning (SSL) is rapidly closing the gap with supervised methods on large computer vision benchmarks. A successful approach to SSL is to learn embeddings which are invariant to distortions of the input sample. However, a recurring issue with this approach is the existence of trivial constant solutions. Most current methods avoid such solutions by careful implementation details. We propose an objective function that naturally avoids collapse by measuring the cross-correlation matrix between the outputs of two identical networks fed with distorted versions of a sample, and making it as close to the identity matrix as possible. This causes the embedding vectors of distorted versions of a sample to be similar, while minimizing the redundancy between the components of these vectors. The method is called Barlow Twins, owing to neuroscientist H. Barlow's redundancy-reduction principle applied to a pair of identical networks. Barlow Twins requires neither large batches nor asymmetry between the network twins, such as a predictor network, gradient stopping, or a moving average on the weight updates. Intriguingly, it benefits from very high-dimensional output vectors. Barlow Twins outperforms previous methods on ImageNet for semi-supervised classification in the low-data regime, and is on par with the current state of the art for ImageNet classification with a linear classifier head, and for transfer tasks of classification and object detection.
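
The objective described in the abstract can be sketched concretely. Below is a minimal NumPy sketch, assuming a squared-error reading of "as close to the identity matrix as possible": with C the cross-correlation matrix between the batch-standardized embeddings of the two distorted views, the loss is sum_i (1 - C_ii)^2 + lambda * sum_i sum_{j != i} C_ij^2. The first (invariance) term makes the two views agree; the second (redundancy-reduction) term decorrelates the embedding components. The function name `barlow_twins_loss`, the weight `lambd=5e-3`, and the use of plain per-dimension standardization in place of a batch-normalization layer are illustrative assumptions, not the authors' reference code.

```python
import numpy as np


def barlow_twins_loss(z_a: np.ndarray, z_b: np.ndarray,
                      lambd: float = 5e-3, eps: float = 1e-12) -> float:
    """Redundancy-reduction loss between two batches of embeddings (sketch).

    z_a, z_b: arrays of shape (batch_size, dim) holding the embeddings of two
    distorted views of the same batch, produced by identical networks.
    lambd: weight of the off-diagonal (redundancy) term; illustrative value (assumption).
    eps: guards against division by zero for constant (collapsed) dimensions.
    """
    n, _ = z_a.shape
    # Standardize each embedding dimension along the batch axis
    # (assumption: stands in for a batch-norm layer without affine parameters).
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # Cross-correlation matrix of shape (dim, dim); entries lie in [-1, 1].
    c = z_a.T @ z_b / n
    # Invariance term: push each diagonal entry toward 1.
    on_diag = np.sum((np.diag(c) - 1.0) ** 2)
    # Redundancy-reduction term: push every off-diagonal entry toward 0.
    off_diag = np.sum(c ** 2) - np.sum(np.diag(c) ** 2)
    return float(on_diag + lambd * off_diag)
```

As a quick check of the collapse argument: if both networks output the same constant vector for every sample, the standardized embeddings are all zero, so C = 0 and the loss equals the embedding dimension instead of zero. Identical non-constant views, by contrast, drive the diagonal term to nearly zero and leave only the small off-diagonal penalty.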

Forward citations

Cited by 15 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Emerging Properties in Self-Supervised Vision Transformers
cs.CV 2021-04 conditional novelty 8.0

Self-supervised ViTs show emergent semantic segmentation and 78.3% k-NN accuracy on ImageNet; DINO reaches 80.1% linear-evaluation accuracy with ViT-Base.
UR-JEPA: Uniform Rectifiability as a Regularizer for Joint-Embedding Predictive Architectures
cs.LG 2026-05 unverdicted novelty 7.0

UR-JEPA applies uniform rectifiability regularization via a smoothed Carleson square function to JEPA training, producing embeddings with 4-5 order PCA spectral drop at dimension 20-25 and lower seed variance than Gau...
VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning
cs.CV 2021-05 accept novelty 7.0

VICReg prevents collapse in self-supervised image embeddings via explicit variance, invariance, and covariance regularization and matches state-of-the-art downstream performance.
Data Provenance for Image Auto-Regressive Generation
cs.CV 2026-06 unverdicted novelty 6.0

A post-hoc detection framework exploits generation-induced patterns in autoregressive image outputs to enable provenance tracing across multiple IAR models without altering the generation process.
Radial Basis Function Networks as Projection Heads in Self-Supervised Learning
cs.CV 2026-06 unverdicted novelty 6.0

RBFN projection heads serve as competitive replacements for MLP heads in SSL and enable SNS, a label-free metric from RBF parameters that correlates strongly with logistic regression evaluation.
Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models
cs.LG 2026-06 unverdicted novelty 6.0

Empirical benchmarks show distribution similarity between adaptation and pretraining data increases practical privacy leakage in DP-adapted LLMs at fixed theoretical guarantees, with LoRA providing strongest protectio...
Doing well with less! On Sampling Techniques for Empirical Pairwise Loss Estimation/Minimization
stat.ML 2026-06 unverdicted novelty 6.0

Sampling pairs directly with auxiliary information for higher inclusion probabilities on informative pairs yields near-full pairwise loss performance at reduced computational cost.
Velox: Learning Representations of 4D Geometry and Appearance
cs.CV 2026-05 unverdicted novelty 6.0

Velox compresses dynamic point clouds into latent tokens that support geometry via 4D surface modeling and appearance via 3D Gaussians, showing strong results on video-to-4D generation, tracking, and image-to-4D cloth...
MoDAl: Self-Supervised Neural Modality Discovery via Decorrelation for Speech Neuroprosthesis
q-bio.NC 2026-04 conditional novelty 6.0

A decorrelation-based training framework recovers complementary linguistic information from Broca's area signals, improving speech neuroprosthesis WER from 26.3% to 21.6% over prior end-to-end methods.
MoDAl: Self-Supervised Neural Modality Discovery via Decorrelation for Speech Neuroprosthesis
q-bio.NC 2026-04 unverdicted novelty 6.0

MoDAl discovers complementary neurolinguistic modalities via contrastive-decorrelation objectives, cutting brain-to-text word error rate from 26.3% to 21.6% by incorporating area 44 signals.
TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis
cs.LG 2025-06 unverdicted novelty 6.0

TESSERA learns robust label-efficient embeddings from irregular multi-modal EO time series via Barlow Twins plus global shuffling and mix-based regularizers, delivering SOTA accuracy on classification, segmentation an...
Revisiting Feature Prediction for Learning Visual Representations from Video
cs.CV 2024-02 conditional novelty 6.0

V-JEPA models trained only on feature prediction from 2 million public videos achieve 81.9% on Kinetics-400, 72.2% on Something-Something-v2, and 77.9% on ImageNet-1K using frozen ViT-H/16 backbones.
Representation Without Reward: A JEPA Audit for LLM Fine-Tuning
cs.LG 2026-05 conditional novelty 5.0

An empirical audit of 22 JEPA-style training auxiliaries on Llama-3.2-1B fine-tuning for regex generation finds no statistically significant task improvement after multiple-testing correction, even when auxiliaries vi...
Non-identifiability of Explanations from Model Behavior in Deep Networks of Image Authenticity Judgments
cs.CV 2026-04 unverdicted novelty 5.0

Models predicting human authenticity judgments produce inconsistent attribution maps across architectures, showing that explanations are non-identifiable.
3D Foundation Model for Generalizable Disease Detection in Head Computed Tomography
cs.CV 2025-02 unverdicted novelty 5.0

A 3D self-supervised foundation model trained on over 360k head CT scans improves downstream disease classification on limited-label internal and external datasets versus scratch-trained and prior models.