pith. machine review for the scientific record

arxiv: 2605.01971 · v1 · submitted 2026-05-03 · 💻 cs.CV

Recognition: 2 theorem links · Lean Theorem

ProtoFair: Fair Self-Supervised Contrastive Learning via Pseudo-Counterfactual Pairs

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 19:34 UTC · model grok-4.3

classification 💻 cs.CV
keywords protofair · self-supervised learning · representations · sensitive attribute · contrastive · existing

The pith

ProtoFair introduces a fairness-aware contrastive loss that uses unsupervised prototype clustering to create pseudo-counterfactual pairs, encouraging representations invariant to sensitive attributes while integrating with standard SSL frameworks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Self-supervised learning trains AI to understand images by comparing different views of the same picture, but the resulting models often pick up unfair patterns tied to attributes like gender, age, or race from the training data. ProtoFair tackles this by first grouping images into clusters based on their visual content without using any labels. It then identifies pairs of images that land in the same cluster but come from different sensitive groups, treating these as pseudo-counterfactuals that should have similar representations. A new loss term pulls these pairs closer in the embedding space while the original SSL loss continues to operate unchanged. This setup requires only the sensitive attribute labels, not task-specific targets, and works alongside popular methods such as SimCLR and SupCon. On face datasets like CelebA and UTKFace, the approach reportedly improves fairness metrics without hurting overall accuracy much. The core idea is that forcing content-matched but demographically different samples to share embeddings pushes the model to ignore the sensitive attribute.
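The pipeline described above, cluster without labels, then pair same-cluster samples from different sensitive groups, can be sketched in a few lines. This is a hypothetical reconstruction from the abstract, not the authors' code: the function name, the use of plain K-Means (the paper uses momentum-updated prototypes initialized with K-Means), and the random choice of partner are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def mine_pseudo_counterfactual_pairs(embeddings, sensitive, n_clusters=10, seed=0):
    """Cluster embeddings without task labels, then pair each sample with a
    random same-cluster sample from a *different* sensitive group.
    Hypothetical reconstruction of the pair-mining step, not the paper's code."""
    clusters = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit_predict(embeddings)
    rng = np.random.default_rng(seed)
    pairs = []
    for i, (c, s) in enumerate(zip(clusters, sensitive)):
        # candidate partners: same cluster assignment, different sensitive group
        candidates = np.where((clusters == c) & (sensitive != s))[0]
        if candidates.size > 0:
            pairs.append((i, int(rng.choice(candidates))))
    return clusters, pairs
```

On two well-separated synthetic blobs with mixed sensitive groups, every sample finds a cross-group partner within its own blob, which is exactly the "content-matched but demographically different" pairing the method relies on.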

Core claim

By pulling these content-matched, cross-group samples together in the embedding space, ProtoFair encourages the encoder to learn representations that are invariant to the sensitive attribute.
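The "pull together" mechanism can be made concrete as an InfoNCE-style term in which each sample's pseudo-counterfactual partner serves as the positive. The abstract does not give the exact loss, so the form below, including the temperature and the choice of the whole batch as negatives, is an illustrative assumption:

```python
import numpy as np

def protofair_pull_loss(z, pairs, temperature=0.1):
    """Illustrative InfoNCE-style pull term over pseudo-counterfactual pairs.
    The paper's exact loss form, temperature, and weighting are assumptions here."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity space
    sim = z @ z.T / temperature
    losses = []
    for i, j in pairs:
        logits = sim[i].copy()
        logits[i] = -np.inf                # anchor cannot be its own negative
        logits -= logits.max()             # numerical stability
        log_softmax = logits - np.log(np.exp(logits).sum())
        losses.append(-log_softmax[j])     # partner j is the positive
    return float(np.mean(losses))
```

When partners already share an embedding direction the loss is near zero; when they are orthogonal it is large, so gradient descent on this term pulls the cross-group pairs together as the claim describes.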

Load-bearing premise

That unsupervised prototype clustering reliably identifies samples sharing the same content but belonging to different sensitive groups, making the cluster assignments independent of the sensitive attribute.
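This premise is empirically checkable. A minimal probe, assuming cluster assignments and sensitive labels are available as integer arrays, is the normalized mutual information between the two (the same metric the simulated rebuttal proposes): values near 0 support independence, values near 1 mean the clusters are proxies for the attribute.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def cluster_attribute_dependence(clusters, sensitive):
    """NMI between cluster assignments and the sensitive attribute.
    Near 0: assignments carry little demographic information (premise holds);
    near 1: clusters are proxies for the sensitive attribute (premise fails)."""
    return normalized_mutual_info_score(clusters, sensitive)
```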

Figures

Figures reproduced from arXiv: 2605.01971 by Marah Halawa, Olaf Hellwich.

Figure 1: Illustration of the key steps involved in the ProtoFair loss. (a) A shared encoder fθ produces representations that are projected by two separate heads: a contrastive head gϕ (green) for the base SSL loss and a cluster head hψ (purple) for computing cluster assignments via momentum-updated prototypes (stars). Prototypes are initialized with K-Means and tracked between re-initializations using exponential … view at source ↗
Figure 2: t-SNE visualizations colored by the sensitive attribute (Male vs. Not Male). In each subfigure, the baseline SupCon is shown on the left and SupCon + Fair Loss on the right. The fairness-regularized model produces representations with substantially greater overlap between the two sensitive groups across both target tasks. view at source ↗
read the original abstract

Self-supervised learning methods learn high-quality visual representations, yet recent studies show that these representations often capture demographic biases present in the training data. Existing fairness-aware methods address this by redesigning the self-supervised objective itself, limiting portability across the rapidly evolving landscape of self-supervised learning (SSL) frameworks. We propose ProtoFair, a fairness-aware contrastive loss designed to work alongside existing SSL objectives without modifying them. ProtoFair leverages unsupervised prototype clustering to identify pseudo-counterfactual pairs: samples sharing the same cluster assignment but belonging to different sensitive groups. By pulling these content-matched, cross-group samples together in the embedding space, ProtoFair encourages the encoder to learn representations that are invariant to the sensitive attribute. The method requires only sensitive attribute annotations, no target labels, and integrates seamlessly with both SimCLR and SupCon. Experiments on CelebA and UTKFace demonstrate consistent fairness improvements while maintaining competitive accuracy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes ProtoFair, a plug-in fairness-aware contrastive loss for self-supervised learning frameworks. It uses unsupervised prototype clustering to form pseudo-counterfactual pairs (same cluster assignment, different sensitive groups) and adds a loss term that pulls their embeddings together, with the goal of learning representations invariant to the sensitive attribute. The approach requires only sensitive-attribute labels (no target labels), integrates with SimCLR and SupCon without modifying their objectives, and is evaluated on CelebA and UTKFace where it reportedly yields consistent fairness gains alongside competitive accuracy.

Significance. If the pseudo-counterfactual pairs are verifiably content-matched and cross-group, the method would provide a portable fairness module that avoids redesigning core SSL objectives, a practical advantage given the rapid evolution of contrastive and other self-supervised techniques. This could facilitate bias mitigation in vision tasks with demographic data while preserving the benefits of existing SSL pipelines.

major comments (2)
  1. [§3] §3 (Method): The central claim that unsupervised prototype clustering reliably produces assignments independent of the sensitive attribute is not supported by any analysis or regularization in the manuscript. In face datasets such as CelebA and UTKFace, where demographic cues are visually prominent, the initial SSL representations commonly encode sensitive attributes; without explicit measures (e.g., adversarial decorrelation or post-clustering checks), clusters can align with sensitive groups rather than content, rendering the added loss term ineffective or counterproductive for invariance.
  2. [§4] §4 (Experiments): The reported fairness improvements on CelebA and UTKFace are presented without ablation on the clustering component, without quantification of cluster-sensitive-attribute correlation, and without statistical significance tests or multiple random seeds for the prototype assignments. This leaves the load-bearing assumption untested and the quantitative claims only weakly grounded.
minor comments (2)
  1. The abstract and method description would benefit from explicit notation for the prototype update schedule and the precise form of the ProtoFair loss term (e.g., temperature scaling, weighting relative to the base SSL loss).
  2. Table or figure captions should clarify the exact fairness metrics used (e.g., demographic parity gap, equal opportunity) and the baseline SSL models against which gains are measured.
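For reference, the two fairness metrics named in the minor comment have standard definitions (equal opportunity follows Hardt et al., 2016). A minimal sketch, assuming binary predictions and a binary sensitive attribute; these are the textbook formulas, not necessarily the exact variants the paper reports:

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """|P(yhat=1 | g=0) - P(yhat=1 | g=1)|: gap in positive-prediction rates."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def equal_opportunity_gap(y_pred, y_true, group):
    """|TPR(g=0) - TPR(g=1)|: gap in true-positive rates across groups."""
    tpr = lambda g: y_pred[(y_true == 1) & (group == g)].mean()
    return abs(tpr(0) - tpr(1))
```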

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which identify key assumptions in our method and gaps in experimental validation. We address each point below and will make substantial revisions to strengthen the manuscript, including new analyses and ablations.

read point-by-point responses
  1. Referee: [§3] §3 (Method): The central claim that unsupervised prototype clustering reliably produces assignments independent of the sensitive attribute is not supported by any analysis or regularization in the manuscript. In face datasets such as CelebA and UTKFace, where demographic cues are visually prominent, the initial SSL representations commonly encode sensitive attributes; without explicit measures (e.g., adversarial decorrelation or post-clustering checks), clusters can align with sensitive groups rather than content, rendering the added loss term ineffective or counterproductive for invariance.

    Authors: We acknowledge that the manuscript provides no explicit analysis or regularization to ensure cluster assignments are independent of the sensitive attribute, and that this is a load-bearing assumption for the pseudo-counterfactual pairs. While ProtoFair does not claim the clustering step itself enforces independence (it relies on the contrastive loss to promote invariance), we agree that without verification, clusters may capture demographic cues in face data. In the revised version, we will add a post-clustering analysis quantifying the correlation between assignments and sensitive attributes (e.g., via normalized mutual information and per-cluster demographic distributions). We will also discuss this as a limitation and explore adding a lightweight decorrelation regularizer if warranted by the results. revision: yes

  2. Referee: [§4] §4 (Experiments): The reported fairness improvements on CelebA and UTKFace are presented without ablation on the clustering component, without quantification of cluster-sensitive-attribute correlation, and without statistical significance tests or multiple random seeds for the prototype assignments. This leaves the load-bearing assumption untested and the quantitative claims only weakly grounded.

    Authors: We agree that the current experiments lack these critical elements, leaving the clustering assumptions insufficiently tested. In the revision, we will incorporate: (i) ablations on the number of prototypes, clustering algorithm variants, and their impact on fairness/accuracy; (ii) explicit quantification of cluster-sensitive attribute correlation using the metrics noted above; and (iii) results over at least five random seeds for prototype initialization, with statistical significance testing (e.g., paired t-tests and standard deviations) on the fairness metrics. These changes will provide stronger empirical grounding. revision: yes
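The promised protocol, several seeds per configuration plus a paired test, is standard practice; a sketch with invented numbers (illustrative only, not results from the paper or its revision):

```python
import numpy as np
from scipy import stats

# Hypothetical fairness-gap values (lower is better) over five prototype
# initialization seeds -- invented numbers, NOT results from the paper.
baseline_gap = np.array([0.21, 0.19, 0.23, 0.20, 0.22])
protofair_gap = np.array([0.12, 0.14, 0.11, 0.13, 0.12])

# Paired t-test: the same seed underlies each (baseline, ProtoFair) pair,
# so the per-seed differences are the quantity being tested.
t_stat, p_value = stats.ttest_rel(baseline_gap, protofair_gap)
mean_drop = baseline_gap.mean() - protofair_gap.mean()
```

Reporting the per-seed standard deviation alongside `p_value` would address the referee's point that single-run numbers leave the prototype-assignment variance unquantified.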

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the assumption that prototype clusters capture content independently of sensitive attributes and that pulling cross-group pairs within clusters produces invariance; no free parameters are explicitly named in the abstract, and no new physical entities are postulated.

axioms (1)
  • domain assumption Unsupervised prototype clustering produces groups whose assignments are independent of the sensitive attribute
    This is required for the identified pairs to function as content-matched counterfactuals rather than sensitive-attribute proxies.
invented entities (1)
  • pseudo-counterfactual pairs no independent evidence
    purpose: To supply content-matched samples from different sensitive groups for the fairness contrastive term
    These pairs are constructed via clustering and have no independent falsifiable handle outside the method itself.

pith-pipeline@v0.9.0 · 5451 in / 1395 out tokens · 78723 ms · 2026-05-08T19:34:05.426504+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

28 extracted references · 11 canonical work pages

  1. Caron, M., Bojanowski, P., Joulin, A., Douze, M.: Deep clustering for unsupervised learning of visual features. In: European Conference on Computer Vision (ECCV) (2018)
  2. Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., Joulin, A.: Unsupervised learning of visual features by contrasting cluster assignments. In: Advances in Neural Information Processing Systems (NeurIPS) (2020)
  3. Chai, J., Wang, X.: Self-supervised fair representation learning without demographics. Advances in Neural Information Processing Systems 35, 27100–27113 (2022)
  4. Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: Proceedings of the 37th International Conference on Machine Learning. ICML'20, JMLR.org (2020)
  5. Chiu, C.H., Chung, H.W., Chen, Y.J., Shi, Y., Ho, T.Y.: Fair multi-exit framework for facial attribute classification. CoRR abs/2301.02989 (2023), https://arxiv.org/abs/2301.02989
  6. D'Incà, M., Tzelepis, C., Patras, I., Sebe, N.: Improving fairness using vision-language driven image augmentation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). pp. 4695–4704 (2024)
  7. Grill, J.B., Strub, F., Altché, F., Tallec, C., Richemond, P.H., Buchatskaya, E., Doersch, C., Pires, B.A., Guo, Z.D., Azar, M.G., Piot, B., Kavukcuoglu, K., Munos, R., Valko, M.: Bootstrap your own latent: a new approach to self-supervised learning. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS '20 (2020)
  8. Hardt, M., Price, E., Srebro, N.: Equality of opportunity in supervised learning. In: Advances in Neural Information Processing Systems. vol. 29, pp. 3315–3323 (2016)
  9. He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020), https://arxiv.org/abs/1911.05722
  10. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 770–778 (2016), https://api.semanticscholar.org/CorpusID:206594692
  11. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 770–778. IEEE (2016). https://doi.org/10.1109/CVPR.2016.90
  12. Jung, S., Lee, D., Park, T., Moon, T.: Fair feature distillation for visual recognition. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 12110–12119. IEEE Computer Society (2021). https://doi.org/10.1109/CVPR46437.2021.01194
  13. Karkkainen, K., Joo, J.: FairFace: Face attribute dataset for balanced race, gender, and age for bias measurement and mitigation. In: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2021)
  14. Khosla, P., Teterwak, P., Wang, C., Sarna, A., Tian, Y., Isola, P., Maschinot, A., Liu, C., Krishnan, D.: Supervised contrastive learning. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H. (eds.) Advances in Neural Information Processing Systems. vol. 33, pp. 18661–18673. Curran Associates, Inc. (2020)
  15. Kim, B., Kim, H., Kim, K., Kim, S., Kim, J.: Learning not to learn: Training deep neural networks with biased data. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 9004–9012. IEEE Computer Society (2019). https://doi.org/10.1109/CVPR.2019.00922
  16. Kusner, M.J., Loftus, J., Russell, C., Silva, R.: Counterfactual fairness. In: Advances in Neural Information Processing Systems (NeurIPS) (2017)
  17. Li, J., Zhou, P., Xiong, C., Hoi, S.: Prototypical contrastive learning of unsupervised representations. In: International Conference on Learning Representations (ICLR) (2021)
  18. Liu, Z., Luo, P., Wang, X., Tang, X.: Deep learning face attributes in the wild. In: Proceedings of International Conference on Computer Vision (ICCV) (December 2015)
  19. van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. Journal of Machine Learning Research 9, 2579–2605 (2008)
  20. Noroozi, M., Favaro, P.: Unsupervised learning of visual representations by solving jigsaw puzzles. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision – ECCV 2016. Lecture Notes in Computer Science, vol. 9910, pp. 69–84. Springer (2016)
  21. Park, S., Hwang, S., Kim, D., Byun, H.: Learning disentangled representation for fair facial attribute classification via fairness-aware information alignment. Proceedings of the AAAI Conference on Artificial Intelligence 35(3), 2403–2411 (2021). https://doi.org/10.1609/aaai.v35i3.16341
  22. Park, S., Lee, J., Lee, P., Hwang, S., Kim, D., Byun, H.: Fair contrastive learning for facial attribute classification. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 10379–10388 (2022). https://doi.org/10.1109/CVPR52688.2022.01014
  23. Radford, A., Narasimhan, K.: Improving language understanding by generative pre-training (2018), https://api.semanticscholar.org/CorpusID:49313245
  24. Raff, E., Sylvester, J.: Gradient reversal against discrimination: A fair neural network learning approach. In: 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA). pp. 189–198. IEEE (2018). https://doi.org/10.1109/DSAA.2018.00029
  25. Sirotkin, K., Carballeira, P., Escudero-Viñolo, M.: A study on the distribution of social biases in self-supervised learning visual models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 10442–10451 (2022). https://doi.org/10.1109/CVPR52688.2022.01019
  26. Steed, R., Caliskan, A.: Image representations learned with unsupervised pre-training contain human-like biases. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. pp. 701–713. FAccT '21, Association for Computing Machinery, New York, NY, USA (2021). https://doi.org/10.1145/3442188.3445932
  27. Zhang, F., Kuang, K., Chen, L., Liu, Y., Wu, C., Xiao, J.: Fairness-aware contrastive learning with partially annotated sensitive attributes. In: Proceedings of the 11th International Conference on Learning Representations (ICLR) (2023), https://openreview.net/forum?id=woa783QMul
  28. Zhang, Z., Song, Y., Qi, H.: Age progression/regression by conditional adversarial autoencoder. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 4352–4360 (2017)