arxiv: 2605.04904 · v1 · submitted 2026-05-06 · 💻 cs.CV

Recognition: unknown

Exploring Clustering Capability of Inpainting Model Embeddings for Pattern-based Individual Identification

Jens van Bijsterveld , Daniele Avitabile , Fons J. Verbeek , Rita Pucci

Authors on Pith no claims yet

Pith reviewed 2026-05-08 16:21 UTC · model grok-4.3

classification 💻 cs.CV

keywords animal individual identificationskin pattern recognitionimage inpaintingdeep learning embeddingszebrafishembedding clusteringGradCAM

0 comments

The pith

Training inpainting of task-specific masks as an auxiliary task makes deep learning encoders focus on animal skin patterns for individual identification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether adding image inpainting of task-specific masks during training helps machine learning models extract embeddings that capture individual skin patterns in animals instead of background details or body shape. This matters for biodiversity monitoring because reliable non-invasive identification supports tracking population changes and social interactions. The authors test the approach on zebrafish by comparing four encoder backbones and measuring results through classification accuracy, embedding clustering metrics, and GradCAM attention maps.

Core claim

Image inpainting of task-specific masks serves as an auxiliary task that trains the encoder to produce visual embeddings more responsive to skin pattern structure, which in turn improves clustering and classification performance for individual zebrafish identification compared to standard training.

What carries the argument

Image inpainting of task-specific masks used as an auxiliary task alongside the individual identification objective to steer the encoder backbone.

If this is right

Classification accuracy for identifying individual zebrafish will increase when the inpainting task is included.
Embeddings will form tighter clusters corresponding to individual identities based on pattern features.
GradCAM visualizations will show greater attention to skin patterns and less to non-specific regions.
One or more of the four tested encoder backbones will demonstrate measurable gains in all evaluation metrics from the auxiliary task.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same auxiliary inpainting strategy could extend to other species with stable, individual-specific coat or skin markings.
Models trained this way may maintain performance even when body shape changes with growth or injury.
Pairing inpainting with additional self-supervised tasks might further isolate pattern information without extra labels.

Load-bearing premise

Inpainting task-specific masks will shift model attention to skin pattern structure rather than background details or body shape.

What would settle it

If models trained with the inpainting auxiliary task show no gain in embedding clustering metrics or if GradCAM maps continue to highlight background and body shape instead of skin patterns, the central claim would be falsified.

Figures

Figures reproduced from arXiv: 2605.04904 by Daniele Avitabile, Fons J. Verbeek, Jens van Bijsterveld, Rita Pucci.

**Figure 1.** Figure 1: Graphical overview of the proposed experiments. Architectures are pre-trained on the inpainting task, and view at source ↗

**Figure 2.** Figure 2: Hand-drawn masks for the background, fish, and pattern areas as used for the ablation study. The background view at source ↗

**Figure 3.** Figure 3: Example of an image-mask pair in the classification dataset. This pair is annotated with the label view at source ↗

**Figure 4.** Figure 4: Ground truth, mask, and input image for one of the images used for the inpainting task view at source ↗

**Figure 5.** Figure 5: Output of inpainting using all four selected inpainting models view at source ↗

**Figure 6.** Figure 6: Differences in number of embedding features and encoder parameters between AOT-GAN, DeepFillV2, view at source ↗

**Figure 7.** Figure 7: Classification quality metrics using the inpainting pre-trained encoder and a classification head. Shallow view at source ↗

**Figure 8.** Figure 8: GradCAM-overlaid zebrafish image results after 15 epochs of shallow backpropagation classification fine view at source ↗

**Figure 9.** Figure 9: GradCAM-overlaid zebrafish image results during deep backpropagation classification fine-tuning for view at source ↗

**Figure 10.** Figure 10: UMAP visualizations of the embedding space before (fig. 10a) and after (fig. 10b) classification fine-tuning. view at source ↗

read the original abstract

In this paper, we explore deep learning techniques for individual identification of animals based on their skin patterns. Individual identification is crucial in biodiversity monitoring, since it enables analysis of decline or growth of populations, or intra-species interactions within populations. Models trained for the task of individual identification often do not focus on the skin pattern of animals, but on background details or body shape details. These characteristics are not individually specific, or can change drastically through time. We focus on techniques that will make machine learning models more responsive to skin pattern structure when extracting individual visual embeddings from images. For this, we explore image inpainting of task-specific masks as an auxiliary task to enhance ML-based individual identification from animal skin patterns. We propose a comparative analysis among four models as an encoder backbone for the individual identification task. We focus on the case study of zebrafish, which is a widely recognized biological model organism, and which exhibits individually identifying skin patterns. To evaluate encoder backbone performance, we present standard metrics for classification accuracy, embedding clustering metrics, and GradCAM visualizations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper explores image inpainting of task-specific masks as an auxiliary task to improve deep learning embeddings for individual animal identification from skin patterns, using zebrafish as the case study. It compares four encoder backbones for the identification task and evaluates performance via classification accuracy, embedding clustering metrics, and GradCAM visualizations to assess whether models prioritize skin pattern structure over background or body shape.

Significance. If the empirical results hold under proper controls, the work could offer a practical multi-task strategy for making visual embeddings more robust to non-specific cues in pattern-based re-identification, which is relevant for biodiversity monitoring applications. The comparative backbone analysis and GradCAM interpretability are positive elements, but the absence of targeted ablations means the claimed mechanism remains unisolated from generic regularization effects.

major comments (2)

[Experiments and Results] The central claim that task-specific inpainting masks cause the encoder to prioritize individually unique skin patterns (rather than background or shape) is load-bearing yet unsupported by ablation experiments. No comparison is presented between task-specific masks, random masks, or a pure classification baseline, so any reported gains in clustering metrics or accuracy could arise from multi-task regularization alone. (Experiments and Results sections)
[Abstract and Results] Quantitative results for the claimed improvements are not summarized with specific numbers, standard deviations, or statistical significance tests in the abstract or early sections. Without these, it is impossible to assess whether the inpainting auxiliary task produces practically meaningful gains in classification accuracy or clustering quality (e.g., NMI, ARI, silhouette score) over the four backbones.

minor comments (2)

[Method] The description of how task-specific masks are generated and applied during training should include a concrete example or pseudocode for reproducibility.
[Figures] Figure captions for GradCAM visualizations should explicitly state which backbone and training condition (with/without inpainting) each panel corresponds to.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for strengthening the experimental design and result presentation, which we will address in the revision.

read point-by-point responses

Referee: [Experiments and Results] The central claim that task-specific inpainting masks cause the encoder to prioritize individually unique skin patterns (rather than background or shape) is load-bearing yet unsupported by ablation experiments. No comparison is presented between task-specific masks, random masks, or a pure classification baseline, so any reported gains in clustering metrics or accuracy could arise from multi-task regularization alone. (Experiments and Results sections)

Authors: We agree that the current experiments do not fully isolate the contribution of task-specific inpainting masks from generic multi-task regularization effects. Our manuscript compares four encoder backbones trained with the inpainting auxiliary task and uses GradCAM to demonstrate attention to skin patterns rather than background or shape. However, we did not include ablations with random masks or a pure classification (no-auxiliary-task) baseline. We will add these targeted ablation experiments to the revised manuscript to better substantiate the mechanism. revision: yes
Referee: [Abstract and Results] Quantitative results for the claimed improvements are not summarized with specific numbers, standard deviations, or statistical significance tests in the abstract or early sections. Without these, it is impossible to assess whether the inpainting auxiliary task produces practically meaningful gains in classification accuracy or clustering quality (e.g., NMI, ARI, silhouette score) over the four backbones.

Authors: We agree that the abstract and early sections would be clearer with explicit quantitative summaries. The results section reports classification accuracy, NMI, ARI, and silhouette scores for the four backbones, but these are not highlighted with specific values, standard deviations, or significance tests in the abstract. We will revise the abstract and introduction to include key numerical results with variability measures and note any statistical comparisons performed. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical comparison with no derivations or self-referential reductions

full rationale

The paper conducts an empirical study comparing four encoder backbones for zebrafish individual identification, using inpainting of task-specific masks as an auxiliary task. Evaluation relies on standard classification accuracy, embedding clustering metrics (e.g., silhouette scores), and GradCAM visualizations. No equations, derivations, fitted parameters presented as predictions, or uniqueness theorems appear in the provided text. The central claim is supported by experimental results rather than reducing to self-definition or self-citation chains. This is a self-contained experimental exploration against external benchmarks (zebrafish datasets and standard ML metrics), warranting a score of 0.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Only abstract available, so ledger reflects implied assumptions rather than explicit content. No free parameters, invented entities, or non-standard axioms are stated.

axioms (2)

domain assumption Deep learning encoders can extract individually discriminative embeddings from animal images when trained with auxiliary tasks.
Core premise of the proposed method, standard in computer vision but unverified here.
ad hoc to paper Task-specific inpainting masks will steer embeddings toward skin patterns rather than background or shape.
Central hypothesis of the work, presented without supporting derivation or prior evidence in the abstract.

pith-pipeline@v0.9.0 · 5492 in / 1199 out tokens · 48328 ms · 2026-05-08T16:21:10.499163+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

78 extracted references · 16 canonical work pages · 5 internal anchors

[1]

Brondízio, Hien T

Sandra Díaz, Josef Settele, Eduardo S. Brondízio, Hien T. Ngo, John Agard, Almut Arneth, Patricia Balvanera, Kate A. Brauman, Stuart H. M. Butchart, Kai M. A. Chan, Lucas A. Garibaldi, Kazuhito Ichii, Jianguo Liu, Suneetha M. Subramanian, Guy F. Midgley, Patricia Miloslavich, Zsolt Molnár, David Obura, Alexander Pfaff, Stephen Polasky, Andy Purvis, Jona R...

2019
[2]

Aichi Biodiversity Targets, September 2020

Biosafety Unit. Aichi Biodiversity Targets, September 2020

2020
[3]

THE 17 GOALS | Sustainable Development
[4]

Helena Freitas and António C. Gouveia. Biodiversity futures: digital approaches to knowledge and conservation of biological diversity.Web Ecology, 25(1):29–37, February 2025

2025
[5]

Pelagic Publishing Ltd, June 2016

Francesco Rovero and Fridolin Zimmermann.Camera Trapping for Wildlife Research. Pelagic Publishing Ltd, June 2016. Google-Books-ID: UaJyDAAAQBAJ

2016
[6]

Wich and Lian Pin Koh.Conservation Drones: Mapping and Monitoring Biodiversity

Serge A. Wich and Lian Pin Koh.Conservation Drones: Mapping and Monitoring Biodiversity. Oxford University Press, 2018. Google-Books-ID: 5P5cDwAAQBAJ

2018
[7]

Terrestrial Passive Acoustic Monitoring: Review and Perspectives.BioScience, 69(1):15–25, January 2019

Larissa Sayuri Moreira Sugai, Thiago Sanna Freire Silva, José Wagner Ribeiro, Jr, and Diego Llusia. Terrestrial Passive Acoustic Monitoring: Review and Perspectives.BioScience, 69(1):15–25, January 2019

2019
[8]

Bik, Elvira Mächler, Mathew Seymour, Anaïs Lacoursière-Roussel, Florian Altermatt, Simon Creer, Iliana Bista, David M

Kristy Deiner, Holly M. Bik, Elvira Mächler, Mathew Seymour, Anaïs Lacoursière-Roussel, Florian Altermatt, Simon Creer, Iliana Bista, David M. Lodge, Natasha de Vere, Michael E. Pfrender, and Louis Bernatchez. Environmental DNA metabarcoding: Transforming how we survey animal and plant communities.Molecular Ecology, 26(21):5872–5895, 2017. _eprint: https:...

work page doi:10.1111/mec.14350 2017
[9]

Mark Chandler, Linda See, Kyle Copas, Astrid M. Z. Bonde, Bernat Claramunt López, Finn Danielsen, Jan Kristof- fer Legind, Siro Masinde, Abraham J. Miller-Rushing, Greg Newman, Alyssa Rosemartin, and Eren Turak. Con- tribution of citizen science towards international biodiversity monitoring.Biological Conservation, 213:280–294, September 2017

2017
[10]

Christian Molls. The Obs-Services and their potentials for biodiversity data assessments with a test of the current reliability of photo-identification of Coleoptera in the field.Tijdschrift voor Entomologie, 164(1-3):143–153, December 2021

2021
[11]

YOLOv11: An Overview of the Key Architectural Enhancements

Rahima Khanam and Muhammad Hussain. YOLOv11: An Overview of the Key Architectural Enhancements, October 2024. arXiv:2410.17725 [cs]

work page internal anchor Pith review arXiv 2024
[12]

Object Tracking Using Computer Vision: A Review.Computers, 13(6), May 2024

Pushkar Kadam, Gu Fang, and Ju Jia Zou. Object Tracking Using Computer Vision: A Review.Computers, 13(6), May 2024

2024
[13]

Chen Li, Marcel Polling, Lu Cao, Barbara Gravendeel, and Fons J. Verbeek. Analysis of automatic image classification methods for Urticaceae pollen classification.Neurocomputing, 522:181–193, February 2023

2023
[14]

Alexander Gomez Villa, Augusto Salazar, and Francisco Vargas. Towards automatic wild animal monitoring: Identification of animal species in camera-trap images using very deep convolutional neural networks.Ecological Informatics, 41:24–32, September 2017

2017
[15]

Animal species classification using deep neural networks with noise labels.Ecological Informatics, 57:101063, May 2020

Ahmed Ahmed, Hayder Yousif, Roland Kays, and Zhihai He. Animal species classification using deep neural networks with noise labels.Ecological Informatics, 57:101063, May 2020

2020
[16]

Plant Species Identification Using Computer Vision Techniques: A Systematic Literature Review.Archives of Computational Methods in Engineering, 25(2):507–543, April 2018

Jana Wäldchen and Patrick Mäder. Plant Species Identification Using Computer Vision Techniques: A Systematic Literature Review.Archives of Computational Methods in Engineering, 25(2):507–543, April 2018

2018
[17]

Machine learning for image based species identification.Methods in Ecology and Evolution, 9(11):2216–2225, 2018

Jana Wäldchen and Patrick Mäder. Machine learning for image based species identification.Methods in Ecology and Evolution, 9(11):2216–2225, 2018. _eprint: https://besjournals.onlinelibrary.wiley.com/doi/pdf/10.1111/2041- 210X.13075

work page doi:10.1111/2041- 2018
[18]

Taylor, and Stefan C

Stefan Schneider, Graham W. Taylor, and Stefan C. Kremer. Similarity learning networks for animal individual re-identification: an ecological perspective.Mammalian Biology, 102(3):899–914, June 2022

2022
[19]

Krebs.Ecological Methodology

Charles J. Krebs.Ecological Methodology. Harper & Row, New York, 1989. Technical report

1989
[20]

Kühl and Tilo Burghardt

Hjalmar S. Kühl and Tilo Burghardt. Animal biometrics: quantifying and detecting phenotypic appearance.Trends in Ecology & Evolution, 28(7):432–441, July 2013

2013
[21]

Allen and James P

William L. Allen and James P. Higham. Assessing the potential information content of multicomponent visual sig- nals: a machine learning approach.Proceedings of the Royal Society B: Biological Sciences, 282(1802):20142284, March 2015. 13 Inpainting Model Embeddings for Pattern-based Individual IdentificationA PREPRINT

2015
[22]

Identification and Recognition of Animals from Biometric Markers Using Computer Vision Approaches: A Review.Kafkas Universitesi Veteriner Fakultesi Dergisi, 29(6):581–593, 2023

Pinar Cihan, Ahmet Saygili, Nihat Eren Ozmen, and Muhammed Akyuzlu. Identification and Recognition of Animals from Biometric Markers Using Computer Vision Approaches: A Review.Kafkas Universitesi Veteriner Fakultesi Dergisi, 29(6):581–593, 2023

2023
[23]

R. T. Liu, S. S. Liaw, and P. K. Maini. Two-stage Turing model for generating pigment patterns on the leopard and the jaguar.Physical Review E, 74(1):011914, July 2006

2006
[24]

S. S. Liaw, C. C. Yang, R. T. Liu, and J. T. Hong. Turing model for the patterns of lady beetles.Physical Review E, 64(4):041909, September 2001

2001
[25]

Bullara and Y

D. Bullara and Y . De Decker. Pigment cell movement is not required for generation of Turing patterns in zebrafish skin.Nature Communications, 6(1):6971, May 2015

2015
[26]

Reeves, Cyrill B

Gregory T. Reeves, Cyrill B. Muratov, Trudi Schüpbach, and Stanislav Y . Shvartsman. Quantitative Models of Developmental Pattern Formation.Developmental Cell, 11(3):289–300, September 2006

2006
[27]

Perspectives on Individual Animal Identification from Biology and Computer Vision.Integrative and Comparative Biology, 61(3):900–916, September 2021

Maxime Vidal, Nathan Wolf, Beth Rosenberg, Bradley P Harris, and Alexander Mathis. Perspectives on Individual Animal Identification from Biology and Computer Vision.Integrative and Comparative Biology, 61(3):900–916, September 2021

2021
[28]

Samba Kumar, Arjun M

Lex Hiby, Phil Lovell, Narendra Patil, N. Samba Kumar, Arjun M. Gopalaswamy, and K. Ullas Karanth. A tiger cannot change its stripes: using a three-dimensional model to match images of living tigers and tiger skins. Biology Letters, 5(3):383–386, March 2009

2009
[29]

Computer vision based individual fish identification using skin dot pattern.Scientific Reports, 11(1):16904, August 2021

Petr Cisar, Dinara Bekkozhayeva, Oleksandr Movchan, Mohammadmehdi Saberioon, and Rudolf Schraml. Computer vision based individual fish identification using skin dot pattern.Scientific Reports, 11(1):16904, August 2021

2021
[30]

Moeslund

Malte Pedersen, Marianne Nyegaard, and Thomas B. Moeslund. Finding Nemo’s Giant Cousin: Keypoint Matching for Robust Re-Identification of Giant Sunfish.Journal of Marine Science and Engineering, 11(5), April 2023

2023
[31]

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan and Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition, April 2015. arXiv:1409.1556 [cs]

work page internal anchor Pith review arXiv 2015
[32]

Ekaterina Nepovinnykh, Ilia Chelak, Tuomas Eerola, Veikka Immonen, Heikki Kälviäinen, Maksim Kholi- avchenko, and Charles V . Stewart. Species-Agnostic Patterned Animal Re-identification by Aggregating Deep Local Features.International Journal of Computer Vision, 132(9):4003–4018, September 2024

2024
[33]

Zebrafish identification with deep CNN and ViT architectures using a rolling training window.Scientific Reports, 15(1):8580, March 2025

Jason Puchalla, Aaron Serianni, and Bo Deng. Zebrafish identification with deep CNN and ViT architectures using a rolling training window.Scientific Reports, 15(1):8580, March 2025

2025
[34]

Studies of Turing pattern formation in zebrafish skin

Shigeru Kondo, Masakatsu Watanabe, and Seita Miyazawa. Studies of Turing pattern formation in zebrafish skin. Philosophical Transactions of the Royal Society A, 379(2213), December 2021

2021
[35]

McElligott and Donald M

Melissa B. McElligott and Donald M. O’Malley. Prey Tracking by Larval Zebrafish: Axial Kinematics and Visual Control.Brain Behavior and Evolution, 66(3):177–196, September 2005

2005
[36]

Regression Based Multi-View Zebrafish Tracking

Mathias Gudiksen. Regression Based Multi-View Zebrafish Tracking. 2021

2021
[37]

Tracking Multiple Zebrafish Larvae Using YOLOv5 and DeepSORT

Guoning Si, Fuhuan Zhou, Zhuo Zhang, and Xuping Zhang. Tracking Multiple Zebrafish Larvae Using YOLOv5 and DeepSORT. In2022 8th International Conference on Automation, Robotics and Applications (ICARA), pages 228–232, February 2022. ISSN: 2767-7745

2022
[38]

Roth, and Horst Bischof

Martin Köstinger, Martin Hirzer, Paul Wohlhart, Peter M. Roth, and Horst Bischof. Large scale metric learning from equivalence constraints. In2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 2288–2295, June 2012. ISSN: 1063-6919

2012
[39]

Large Scale Similarity Learning Using Similar Pairs for Person Verification.Proceedings of the AAAI Conference on Artificial Intelligence, 30(1), March 2016

Yang Yang, Shengcai Liao, Zhen Lei, and Stan Li. Large Scale Similarity Learning Using Similar Pairs for Person Verification.Proceedings of the AAAI Conference on Artificial Intelligence, 30(1), March 2016

2016
[40]

Shengcai Liao, Yang Hu, Xiangyu Zhu, and Stan Z. Li. Person re-identification by Local Maximal Occurrence representation and metric learning. In2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2197–2206, Boston, MA, USA, June 2015. IEEE

2015
[41]

Learning a Discriminative Null Space for Person Re-Identification

Li Zhang, Tao Xiang, and Shaogang Gong. Learning a Discriminative Null Space for Person Re-Identification. pages 1239–1248, 2016

2016
[42]

Moeslund

Joakim Bruslund Haurum, Anastasija Karpova, Malte Pedersen, Stefan Hein Bengtson, and Thomas B. Moeslund. Re-Identification of Zebrafish using Metric Learning. In2020 IEEE Winter Applications of Computer Vision Workshops (WACVW), pages 1–11, Snowmass Village, CO, USA, March 2020. IEEE. 14 Inpainting Model Embeddings for Pattern-based Individual Identifica...

2020
[43]

Rethinking the Inception Architecture for Computer Vision

Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. Rethinking the Inception Architecture for Computer Vision, December 2015. arXiv:1512.00567 [cs]

work page Pith review arXiv 2015
[44]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, June 2021. arXiv:2010.11929 [cs]

work page internal anchor Pith review arXiv 2021
[45]

Longitudinal Identification of Zebrafish Individuals by Deep Learning

Danying Cao, Cheng Guo, Yingyin Cheng, Wanting Zhang, and Mijuan Shi. Longitudinal Identification of Zebrafish Individuals by Deep Learning
[46]

Generative Adversarial Nets.NIPS proceedings, 2014

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative Adversarial Nets.NIPS proceedings, 2014

2014
[47]

Auto-Encoding Variational Bayes

Diederik P. Kingma and Max Welling. Auto-Encoding Variational Bayes, December 2013. arXiv:1312.6114 [stat]

work page internal anchor Pith review arXiv 2013
[48]

Masked Autoencoders Are Scalable Vision Learners

Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollar, and Ross Girshick. Masked Autoencoders Are Scalable Vision Learners. In2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15979–15988, New Orleans, LA, USA, June 2022. IEEE

2022
[49]

Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, and Alexei A. Efros. Context Encoders: Feature Learning by Inpainting. In2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2536–2544, Las Vegas, NV , USA, June 2016. IEEE

2016
[50]

BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping

Srikumar Sastry, Subash Khanal, Aayush Dhakal, Di Huang, and Nathan Jacobs. BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping. In2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 7121–7130, Waikoloa, HI, USA, January 2024. IEEE

2024
[51]

Masked autoencoder-based self-supervised learning for forest plant classification.Cybernetics and Physics, (V olume 13, 2024, Number 1):32–41, June 2024

Luu Van Huy, Nguyen Huy Tuong, Le Hoang Ngoc Han, and Nguyen Van Hieu. Masked autoencoder-based self-supervised learning for forest plant classification.Cybernetics and Physics, (V olume 13, 2024, Number 1):32–41, June 2024

2024
[52]

Aggregated Contextual Transformations for High-Resolution Image Inpainting, April 2021

Yanhong Zeng, Jianlong Fu, Hongyang Chao, and Baining Guo. Aggregated Contextual Transformations for High-Resolution Image Inpainting, April 2021. arXiv:2104.01431 [cs]

work page arXiv 2021
[53]

Free-Form Image Inpainting with Gated Convolution, October 2019

Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, and Thomas Huang. Free-Form Image Inpainting with Gated Convolution, October 2019. arXiv:1806.03589 [cs]

work page arXiv 2019
[54]

Qureshi, and Mehran Ebrahimi

Kamyar Nazeri, Eric Ng, Tony Joseph, Faisal Z. Qureshi, and Mehran Ebrahimi. EdgeConnect: Generative Image Inpainting with Adversarial Edge Learning, January 2019. arXiv:1901.00212 [cs]

work page arXiv 2019
[55]

Resolution-robust Large Mask Inpainting with Fourier Convolutions

Roman Suvorov, Elizaveta Logacheva, Anton Mashikhin, Anastasia Remizova, Arsenii Ashukha, Aleksei Sil- vestrov, Naejin Kong, Harshith Goka, Kiwoong Park, and Victor Lempitsky. Resolution-robust Large Mask Inpainting with Fourier Convolutions. In2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 3172–3182, Waikoloa, HI, USA, J...

2022
[56]

Fast Non-Local Neural Networks with Spectral Residual Learning

Lu Chi, Guiyu Tian, Yadong Mu, Lingxi Xie, and Qi Tian. Fast Non-Local Neural Networks with Spectral Residual Learning. InProceedings of the 27th ACM International Conference on Multimedia, pages 2142–2151, Nice France, October 2019. ACM

2019
[57]

Bovik, H.R

Zhou Wang, A.C. Bovik, H.R. Sheikh, and E.P. Simoncelli. Image quality assessment: from error visibility to structural similarity.IEEE Transactions on Image Processing, 13(4):600–612, April 2004

2004
[58]

GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium

Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. InAdvances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017

2017
[59]

Places: A 10 Million Image Database for Scene Recognition.IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6):1452– 1464, June 2018

Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. Places: A 10 Million Image Database for Scene Recognition.IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6):1452– 1464, June 2018

2018
[60]

original-date: 2021-03-28T06:10:19Z

researchmm/AOT-GAN-for-Inpainting, February 2026. original-date: 2021-03-28T06:10:19Z

2026
[61]

knazeri/edge-connect, February 2026

Kamyar Nazeri. knazeri/edge-connect, February 2026. original-date: 2018-12-16T02:14:19Z

2026
[62]

original-date: 2021-08-30T18:27:52Z

advimman/lama, February 2026. original-date: 2021-08-30T18:27:52Z

2026
[63]

nipponjo/deepfillv2-pytorch, February 2026

nipponjo. nipponjo/deepfillv2-pytorch, February 2026. original-date: 2021-11-29T15:19:40Z

2026
[64]

Geometric GAN

Jae Hyun Lim and Jong Chul Ye. Geometric GAN, May 2017. arXiv:1705.02894 [stat]

work page arXiv 2017
[65]

Perceptual Losses for Real-Time Style Transfer and Super- Resolution

Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual Losses for Real-Time Style Transfer and Super- Resolution. In Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling, editors,Computer Vision - ECCV 2016, pages 694–711, Cham, 2016. Springer International Publishing. 15 Inpainting Model Embeddings for Pattern-based Individual IdentificationA PREPRINT

2016
[66]

Mehdi S. M. Sajjadi, Bernhard Scholkopf, and Michael Hirsch. EnhanceNet: Single Image Super-Resolution Through Automated Texture Synthesis. pages 4491–4500, 2017

2017
[67]

Selvaraju and Abhishek Das and Ramakrishna Vedantam and Michael Cogswell and Devi Parikh and Dhruv Batra , title =

Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization.International Journal of Computer Vision, 128(2):336–359, February 2020. arXiv:1610.02391 [cs]

work page arXiv 2020
[68]

jacobgil/pytorch-grad-cam, February 2026

Jacob Gildenblat. jacobgil/pytorch-grad-cam, February 2026. original-date: 2017-05-31T19:55:15Z

2026
[69]

Principal components analysis (PCA).Computers & Geosciences, 19(3):303–342, March 1993

Andrzej Ma´ckiewicz and Waldemar Ratajczak. Principal components analysis (PCA).Computers & Geosciences, 19(3):303–342, March 1993

1993
[70]

Visualizing Data using t-SNE.Journal of Machine Learning Research, 9(86):2579–2605, 2008

Laurens van der Maaten and Geoffrey Hinton. Visualizing Data using t-SNE.Journal of Machine Learning Research, 9(86):2579–2605, 2008

2008
[71]

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

Leland McInnes, John Healy, and James Melville. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction, September 2020. arXiv:1802.03426 [stat]

work page internal anchor Pith review arXiv 2020
[72]

S. Lloyd. Least squares quantization in PCM.IEEE Transactions on Information Theory, 28(2):129–137, March 1982

1982
[73]

Rousseeuw

Peter J. Rousseeuw. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis.Journal of Computational and Applied Mathematics, 20:53–65, November 1987

1987
[74]

Davies and Donald W

David L. Davies and Donald W. Bouldin. A Cluster Separation Measure.IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-1(2):224–227, April 1979

1979
[75]

Communications in Statistics 3(1), 1-27 (1974)

T. Cali´nski and J Harabasz. A dendrite method for cluster analysis.Communications in Statistics, 3(1):1–27, January 1974. _eprint: https://www.tandfonline.com/doi/pdf/10.1080/03610927408827101

work page doi:10.1080/03610927408827101 1974
[76]

William M. Rand. Objective Criteria for the Evaluation of Clustering Methods.Jour- nal of the American Statistical Association, 66(336):846–850, December 1971. _eprint: https://www.tandfonline.com/doi/pdf/10.1080/01621459.1971.10482356

work page doi:10.1080/01621459.1971.10482356 1971
[77]

C. E. Shannon. A mathematical theory of communication.The Bell System Technical Journal, 27(3):379–423, July 1948

1948
[78]

Mingxing Tan and Quoc V . Le. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, September 2020. arXiv:1905.11946 [cs]. 16

work page arXiv 2020