pith. machine review for the scientific record. sign in

arxiv: 2605.04904 · v1 · submitted 2026-05-06 · 💻 cs.CV

Recognition: unknown

Exploring Clustering Capability of Inpainting Model Embeddings for Pattern-based Individual Identification

Authors on Pith no claims yet

Pith reviewed 2026-05-08 16:21 UTC · model grok-4.3

classification 💻 cs.CV
keywords animal individual identificationskin pattern recognitionimage inpaintingdeep learning embeddingszebrafishembedding clusteringGradCAM
0
0 comments X

The pith

Training inpainting of task-specific masks as an auxiliary task makes deep learning encoders focus on animal skin patterns for individual identification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether adding image inpainting of task-specific masks during training helps machine learning models extract embeddings that capture individual skin patterns in animals instead of background details or body shape. This matters for biodiversity monitoring because reliable non-invasive identification supports tracking population changes and social interactions. The authors test the approach on zebrafish by comparing four encoder backbones and measuring results through classification accuracy, embedding clustering metrics, and GradCAM attention maps.

Core claim

Image inpainting of task-specific masks serves as an auxiliary task that trains the encoder to produce visual embeddings more responsive to skin pattern structure, which in turn improves clustering and classification performance for individual zebrafish identification compared to standard training.

What carries the argument

Image inpainting of task-specific masks used as an auxiliary task alongside the individual identification objective to steer the encoder backbone.

If this is right

  • Classification accuracy for identifying individual zebrafish will increase when the inpainting task is included.
  • Embeddings will form tighter clusters corresponding to individual identities based on pattern features.
  • GradCAM visualizations will show greater attention to skin patterns and less to non-specific regions.
  • One or more of the four tested encoder backbones will demonstrate measurable gains in all evaluation metrics from the auxiliary task.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same auxiliary inpainting strategy could extend to other species with stable, individual-specific coat or skin markings.
  • Models trained this way may maintain performance even when body shape changes with growth or injury.
  • Pairing inpainting with additional self-supervised tasks might further isolate pattern information without extra labels.

Load-bearing premise

Inpainting task-specific masks will shift model attention to skin pattern structure rather than background details or body shape.

What would settle it

If models trained with the inpainting auxiliary task show no gain in embedding clustering metrics or if GradCAM maps continue to highlight background and body shape instead of skin patterns, the central claim would be falsified.

Figures

Figures reproduced from arXiv: 2605.04904 by Daniele Avitabile, Fons J. Verbeek, Jens van Bijsterveld, Rita Pucci.

Figure 1
Figure 1. Figure 1: Graphical overview of the proposed experiments. Architectures are pre-trained on the inpainting task, and view at source ↗
Figure 2
Figure 2. Figure 2: Hand-drawn masks for the background, fish, and pattern areas as used for the ablation study. The background view at source ↗
Figure 3
Figure 3. Figure 3: Example of an image-mask pair in the classification dataset. This pair is annotated with the label view at source ↗
Figure 4
Figure 4. Figure 4: Ground truth, mask, and input image for one of the images used for the inpainting task view at source ↗
Figure 5
Figure 5. Figure 5: Output of inpainting using all four selected inpainting models view at source ↗
Figure 6
Figure 6. Figure 6: Differences in number of embedding features and encoder parameters between AOT-GAN, DeepFillV2, view at source ↗
Figure 7
Figure 7. Figure 7: Classification quality metrics using the inpainting pre-trained encoder and a classification head. Shallow view at source ↗
Figure 8
Figure 8. Figure 8: GradCAM-overlaid zebrafish image results after 15 epochs of shallow backpropagation classification fine view at source ↗
Figure 9
Figure 9. Figure 9: GradCAM-overlaid zebrafish image results during deep backpropagation classification fine-tuning for view at source ↗
Figure 10
Figure 10. Figure 10: UMAP visualizations of the embedding space before (fig. 10a) and after (fig. 10b) classification fine-tuning. view at source ↗
read the original abstract

In this paper, we explore deep learning techniques for individual identification of animals based on their skin patterns. Individual identification is crucial in biodiversity monitoring, since it enables analysis of decline or growth of populations, or intra-species interactions within populations. Models trained for the task of individual identification often do not focus on the skin pattern of animals, but on background details or body shape details. These characteristics are not individually specific, or can change drastically through time. We focus on techniques that will make machine learning models more responsive to skin pattern structure when extracting individual visual embeddings from images. For this, we explore image inpainting of task-specific masks as an auxiliary task to enhance ML-based individual identification from animal skin patterns. We propose a comparative analysis among four models as an encoder backbone for the individual identification task. We focus on the case study of zebrafish, which is a widely recognized biological model organism, and which exhibits individually identifying skin patterns. To evaluate encoder backbone performance, we present standard metrics for classification accuracy, embedding clustering metrics, and GradCAM visualizations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper explores image inpainting of task-specific masks as an auxiliary task to improve deep learning embeddings for individual animal identification from skin patterns, using zebrafish as the case study. It compares four encoder backbones for the identification task and evaluates performance via classification accuracy, embedding clustering metrics, and GradCAM visualizations to assess whether models prioritize skin pattern structure over background or body shape.

Significance. If the empirical results hold under proper controls, the work could offer a practical multi-task strategy for making visual embeddings more robust to non-specific cues in pattern-based re-identification, which is relevant for biodiversity monitoring applications. The comparative backbone analysis and GradCAM interpretability are positive elements, but the absence of targeted ablations means the claimed mechanism remains unisolated from generic regularization effects.

major comments (2)
  1. [Experiments and Results] The central claim that task-specific inpainting masks cause the encoder to prioritize individually unique skin patterns (rather than background or shape) is load-bearing yet unsupported by ablation experiments. No comparison is presented between task-specific masks, random masks, or a pure classification baseline, so any reported gains in clustering metrics or accuracy could arise from multi-task regularization alone. (Experiments and Results sections)
  2. [Abstract and Results] Quantitative results for the claimed improvements are not summarized with specific numbers, standard deviations, or statistical significance tests in the abstract or early sections. Without these, it is impossible to assess whether the inpainting auxiliary task produces practically meaningful gains in classification accuracy or clustering quality (e.g., NMI, ARI, silhouette score) over the four backbones.
minor comments (2)
  1. [Method] The description of how task-specific masks are generated and applied during training should include a concrete example or pseudocode for reproducibility.
  2. [Figures] Figure captions for GradCAM visualizations should explicitly state which backbone and training condition (with/without inpainting) each panel corresponds to.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for strengthening the experimental design and result presentation, which we will address in the revision.

read point-by-point responses
  1. Referee: [Experiments and Results] The central claim that task-specific inpainting masks cause the encoder to prioritize individually unique skin patterns (rather than background or shape) is load-bearing yet unsupported by ablation experiments. No comparison is presented between task-specific masks, random masks, or a pure classification baseline, so any reported gains in clustering metrics or accuracy could arise from multi-task regularization alone. (Experiments and Results sections)

    Authors: We agree that the current experiments do not fully isolate the contribution of task-specific inpainting masks from generic multi-task regularization effects. Our manuscript compares four encoder backbones trained with the inpainting auxiliary task and uses GradCAM to demonstrate attention to skin patterns rather than background or shape. However, we did not include ablations with random masks or a pure classification (no-auxiliary-task) baseline. We will add these targeted ablation experiments to the revised manuscript to better substantiate the mechanism. revision: yes

  2. Referee: [Abstract and Results] Quantitative results for the claimed improvements are not summarized with specific numbers, standard deviations, or statistical significance tests in the abstract or early sections. Without these, it is impossible to assess whether the inpainting auxiliary task produces practically meaningful gains in classification accuracy or clustering quality (e.g., NMI, ARI, silhouette score) over the four backbones.

    Authors: We agree that the abstract and early sections would be clearer with explicit quantitative summaries. The results section reports classification accuracy, NMI, ARI, and silhouette scores for the four backbones, but these are not highlighted with specific values, standard deviations, or significance tests in the abstract. We will revise the abstract and introduction to include key numerical results with variability measures and note any statistical comparisons performed. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical comparison with no derivations or self-referential reductions

full rationale

The paper conducts an empirical study comparing four encoder backbones for zebrafish individual identification, using inpainting of task-specific masks as an auxiliary task. Evaluation relies on standard classification accuracy, embedding clustering metrics (e.g., silhouette scores), and GradCAM visualizations. No equations, derivations, fitted parameters presented as predictions, or uniqueness theorems appear in the provided text. The central claim is supported by experimental results rather than reducing to self-definition or self-citation chains. This is a self-contained experimental exploration against external benchmarks (zebrafish datasets and standard ML metrics), warranting a score of 0.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Only abstract available, so ledger reflects implied assumptions rather than explicit content. No free parameters, invented entities, or non-standard axioms are stated.

axioms (2)
  • domain assumption Deep learning encoders can extract individually discriminative embeddings from animal images when trained with auxiliary tasks.
    Core premise of the proposed method, standard in computer vision but unverified here.
  • ad hoc to paper Task-specific inpainting masks will steer embeddings toward skin patterns rather than background or shape.
    Central hypothesis of the work, presented without supporting derivation or prior evidence in the abstract.

pith-pipeline@v0.9.0 · 5492 in / 1199 out tokens · 48328 ms · 2026-05-08T16:21:10.499163+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

78 extracted references · 16 canonical work pages · 5 internal anchors

  1. [1]

    Brondízio, Hien T

    Sandra Díaz, Josef Settele, Eduardo S. Brondízio, Hien T. Ngo, John Agard, Almut Arneth, Patricia Balvanera, Kate A. Brauman, Stuart H. M. Butchart, Kai M. A. Chan, Lucas A. Garibaldi, Kazuhito Ichii, Jianguo Liu, Suneetha M. Subramanian, Guy F. Midgley, Patricia Miloslavich, Zsolt Molnár, David Obura, Alexander Pfaff, Stephen Polasky, Andy Purvis, Jona R...

  2. [2]

    Aichi Biodiversity Targets, September 2020

    Biosafety Unit. Aichi Biodiversity Targets, September 2020

  3. [3]

    THE 17 GOALS | Sustainable Development

  4. [4]

    Helena Freitas and António C. Gouveia. Biodiversity futures: digital approaches to knowledge and conservation of biological diversity.Web Ecology, 25(1):29–37, February 2025

  5. [5]

    Pelagic Publishing Ltd, June 2016

    Francesco Rovero and Fridolin Zimmermann.Camera Trapping for Wildlife Research. Pelagic Publishing Ltd, June 2016. Google-Books-ID: UaJyDAAAQBAJ

  6. [6]

    Wich and Lian Pin Koh.Conservation Drones: Mapping and Monitoring Biodiversity

    Serge A. Wich and Lian Pin Koh.Conservation Drones: Mapping and Monitoring Biodiversity. Oxford University Press, 2018. Google-Books-ID: 5P5cDwAAQBAJ

  7. [7]

    Terrestrial Passive Acoustic Monitoring: Review and Perspectives.BioScience, 69(1):15–25, January 2019

    Larissa Sayuri Moreira Sugai, Thiago Sanna Freire Silva, José Wagner Ribeiro, Jr, and Diego Llusia. Terrestrial Passive Acoustic Monitoring: Review and Perspectives.BioScience, 69(1):15–25, January 2019

  8. [8]

    Bik, Elvira Mächler, Mathew Seymour, Anaïs Lacoursière-Roussel, Florian Altermatt, Simon Creer, Iliana Bista, David M

    Kristy Deiner, Holly M. Bik, Elvira Mächler, Mathew Seymour, Anaïs Lacoursière-Roussel, Florian Altermatt, Simon Creer, Iliana Bista, David M. Lodge, Natasha de Vere, Michael E. Pfrender, and Louis Bernatchez. Environmental DNA metabarcoding: Transforming how we survey animal and plant communities.Molecular Ecology, 26(21):5872–5895, 2017. _eprint: https:...

  9. [9]

    Mark Chandler, Linda See, Kyle Copas, Astrid M. Z. Bonde, Bernat Claramunt López, Finn Danielsen, Jan Kristof- fer Legind, Siro Masinde, Abraham J. Miller-Rushing, Greg Newman, Alyssa Rosemartin, and Eren Turak. Con- tribution of citizen science towards international biodiversity monitoring.Biological Conservation, 213:280–294, September 2017

  10. [10]

    Christian Molls. The Obs-Services and their potentials for biodiversity data assessments with a test of the current reliability of photo-identification of Coleoptera in the field.Tijdschrift voor Entomologie, 164(1-3):143–153, December 2021

  11. [11]

    YOLOv11: An Overview of the Key Architectural Enhancements

    Rahima Khanam and Muhammad Hussain. YOLOv11: An Overview of the Key Architectural Enhancements, October 2024. arXiv:2410.17725 [cs]

  12. [12]

    Object Tracking Using Computer Vision: A Review.Computers, 13(6), May 2024

    Pushkar Kadam, Gu Fang, and Ju Jia Zou. Object Tracking Using Computer Vision: A Review.Computers, 13(6), May 2024

  13. [13]

    Chen Li, Marcel Polling, Lu Cao, Barbara Gravendeel, and Fons J. Verbeek. Analysis of automatic image classification methods for Urticaceae pollen classification.Neurocomputing, 522:181–193, February 2023

  14. [14]

    Alexander Gomez Villa, Augusto Salazar, and Francisco Vargas. Towards automatic wild animal monitoring: Identification of animal species in camera-trap images using very deep convolutional neural networks.Ecological Informatics, 41:24–32, September 2017

  15. [15]

    Animal species classification using deep neural networks with noise labels.Ecological Informatics, 57:101063, May 2020

    Ahmed Ahmed, Hayder Yousif, Roland Kays, and Zhihai He. Animal species classification using deep neural networks with noise labels.Ecological Informatics, 57:101063, May 2020

  16. [16]

    Plant Species Identification Using Computer Vision Techniques: A Systematic Literature Review.Archives of Computational Methods in Engineering, 25(2):507–543, April 2018

    Jana Wäldchen and Patrick Mäder. Plant Species Identification Using Computer Vision Techniques: A Systematic Literature Review.Archives of Computational Methods in Engineering, 25(2):507–543, April 2018

  17. [17]

    Machine learning for image based species identification.Methods in Ecology and Evolution, 9(11):2216–2225, 2018

    Jana Wäldchen and Patrick Mäder. Machine learning for image based species identification.Methods in Ecology and Evolution, 9(11):2216–2225, 2018. _eprint: https://besjournals.onlinelibrary.wiley.com/doi/pdf/10.1111/2041- 210X.13075

  18. [18]

    Taylor, and Stefan C

    Stefan Schneider, Graham W. Taylor, and Stefan C. Kremer. Similarity learning networks for animal individual re-identification: an ecological perspective.Mammalian Biology, 102(3):899–914, June 2022

  19. [19]

    Krebs.Ecological Methodology

    Charles J. Krebs.Ecological Methodology. Harper & Row, New York, 1989. Technical report

  20. [20]

    Kühl and Tilo Burghardt

    Hjalmar S. Kühl and Tilo Burghardt. Animal biometrics: quantifying and detecting phenotypic appearance.Trends in Ecology & Evolution, 28(7):432–441, July 2013

  21. [21]

    Allen and James P

    William L. Allen and James P. Higham. Assessing the potential information content of multicomponent visual sig- nals: a machine learning approach.Proceedings of the Royal Society B: Biological Sciences, 282(1802):20142284, March 2015. 13 Inpainting Model Embeddings for Pattern-based Individual IdentificationA PREPRINT

  22. [22]

    Identification and Recognition of Animals from Biometric Markers Using Computer Vision Approaches: A Review.Kafkas Universitesi Veteriner Fakultesi Dergisi, 29(6):581–593, 2023

    Pinar Cihan, Ahmet Saygili, Nihat Eren Ozmen, and Muhammed Akyuzlu. Identification and Recognition of Animals from Biometric Markers Using Computer Vision Approaches: A Review.Kafkas Universitesi Veteriner Fakultesi Dergisi, 29(6):581–593, 2023

  23. [23]

    R. T. Liu, S. S. Liaw, and P. K. Maini. Two-stage Turing model for generating pigment patterns on the leopard and the jaguar.Physical Review E, 74(1):011914, July 2006

  24. [24]

    S. S. Liaw, C. C. Yang, R. T. Liu, and J. T. Hong. Turing model for the patterns of lady beetles.Physical Review E, 64(4):041909, September 2001

  25. [25]

    Bullara and Y

    D. Bullara and Y . De Decker. Pigment cell movement is not required for generation of Turing patterns in zebrafish skin.Nature Communications, 6(1):6971, May 2015

  26. [26]

    Reeves, Cyrill B

    Gregory T. Reeves, Cyrill B. Muratov, Trudi Schüpbach, and Stanislav Y . Shvartsman. Quantitative Models of Developmental Pattern Formation.Developmental Cell, 11(3):289–300, September 2006

  27. [27]

    Perspectives on Individual Animal Identification from Biology and Computer Vision.Integrative and Comparative Biology, 61(3):900–916, September 2021

    Maxime Vidal, Nathan Wolf, Beth Rosenberg, Bradley P Harris, and Alexander Mathis. Perspectives on Individual Animal Identification from Biology and Computer Vision.Integrative and Comparative Biology, 61(3):900–916, September 2021

  28. [28]

    Samba Kumar, Arjun M

    Lex Hiby, Phil Lovell, Narendra Patil, N. Samba Kumar, Arjun M. Gopalaswamy, and K. Ullas Karanth. A tiger cannot change its stripes: using a three-dimensional model to match images of living tigers and tiger skins. Biology Letters, 5(3):383–386, March 2009

  29. [29]

    Computer vision based individual fish identification using skin dot pattern.Scientific Reports, 11(1):16904, August 2021

    Petr Cisar, Dinara Bekkozhayeva, Oleksandr Movchan, Mohammadmehdi Saberioon, and Rudolf Schraml. Computer vision based individual fish identification using skin dot pattern.Scientific Reports, 11(1):16904, August 2021

  30. [30]

    Moeslund

    Malte Pedersen, Marianne Nyegaard, and Thomas B. Moeslund. Finding Nemo’s Giant Cousin: Keypoint Matching for Robust Re-Identification of Giant Sunfish.Journal of Marine Science and Engineering, 11(5), April 2023

  31. [31]

    Very Deep Convolutional Networks for Large-Scale Image Recognition

    Karen Simonyan and Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition, April 2015. arXiv:1409.1556 [cs]

  32. [32]

    Ekaterina Nepovinnykh, Ilia Chelak, Tuomas Eerola, Veikka Immonen, Heikki Kälviäinen, Maksim Kholi- avchenko, and Charles V . Stewart. Species-Agnostic Patterned Animal Re-identification by Aggregating Deep Local Features.International Journal of Computer Vision, 132(9):4003–4018, September 2024

  33. [33]

    Zebrafish identification with deep CNN and ViT architectures using a rolling training window.Scientific Reports, 15(1):8580, March 2025

    Jason Puchalla, Aaron Serianni, and Bo Deng. Zebrafish identification with deep CNN and ViT architectures using a rolling training window.Scientific Reports, 15(1):8580, March 2025

  34. [34]

    Studies of Turing pattern formation in zebrafish skin

    Shigeru Kondo, Masakatsu Watanabe, and Seita Miyazawa. Studies of Turing pattern formation in zebrafish skin. Philosophical Transactions of the Royal Society A, 379(2213), December 2021

  35. [35]

    McElligott and Donald M

    Melissa B. McElligott and Donald M. O’Malley. Prey Tracking by Larval Zebrafish: Axial Kinematics and Visual Control.Brain Behavior and Evolution, 66(3):177–196, September 2005

  36. [36]

    Regression Based Multi-View Zebrafish Tracking

    Mathias Gudiksen. Regression Based Multi-View Zebrafish Tracking. 2021

  37. [37]

    Tracking Multiple Zebrafish Larvae Using YOLOv5 and DeepSORT

    Guoning Si, Fuhuan Zhou, Zhuo Zhang, and Xuping Zhang. Tracking Multiple Zebrafish Larvae Using YOLOv5 and DeepSORT. In2022 8th International Conference on Automation, Robotics and Applications (ICARA), pages 228–232, February 2022. ISSN: 2767-7745

  38. [38]

    Roth, and Horst Bischof

    Martin Köstinger, Martin Hirzer, Paul Wohlhart, Peter M. Roth, and Horst Bischof. Large scale metric learning from equivalence constraints. In2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 2288–2295, June 2012. ISSN: 1063-6919

  39. [39]

    Large Scale Similarity Learning Using Similar Pairs for Person Verification.Proceedings of the AAAI Conference on Artificial Intelligence, 30(1), March 2016

    Yang Yang, Shengcai Liao, Zhen Lei, and Stan Li. Large Scale Similarity Learning Using Similar Pairs for Person Verification.Proceedings of the AAAI Conference on Artificial Intelligence, 30(1), March 2016

  40. [40]

    Shengcai Liao, Yang Hu, Xiangyu Zhu, and Stan Z. Li. Person re-identification by Local Maximal Occurrence representation and metric learning. In2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2197–2206, Boston, MA, USA, June 2015. IEEE

  41. [41]

    Learning a Discriminative Null Space for Person Re-Identification

    Li Zhang, Tao Xiang, and Shaogang Gong. Learning a Discriminative Null Space for Person Re-Identification. pages 1239–1248, 2016

  42. [42]

    Moeslund

    Joakim Bruslund Haurum, Anastasija Karpova, Malte Pedersen, Stefan Hein Bengtson, and Thomas B. Moeslund. Re-Identification of Zebrafish using Metric Learning. In2020 IEEE Winter Applications of Computer Vision Workshops (WACVW), pages 1–11, Snowmass Village, CO, USA, March 2020. IEEE. 14 Inpainting Model Embeddings for Pattern-based Individual Identifica...

  43. [43]

    Rethinking the Inception Architecture for Computer Vision

    Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. Rethinking the Inception Architecture for Computer Vision, December 2015. arXiv:1512.00567 [cs]

  44. [44]

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, June 2021. arXiv:2010.11929 [cs]

  45. [45]

    Longitudinal Identification of Zebrafish Individuals by Deep Learning

    Danying Cao, Cheng Guo, Yingyin Cheng, Wanting Zhang, and Mijuan Shi. Longitudinal Identification of Zebrafish Individuals by Deep Learning

  46. [46]

    Generative Adversarial Nets.NIPS proceedings, 2014

    Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative Adversarial Nets.NIPS proceedings, 2014

  47. [47]

    Auto-Encoding Variational Bayes

    Diederik P. Kingma and Max Welling. Auto-Encoding Variational Bayes, December 2013. arXiv:1312.6114 [stat]

  48. [48]

    Masked Autoencoders Are Scalable Vision Learners

    Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollar, and Ross Girshick. Masked Autoencoders Are Scalable Vision Learners. In2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15979–15988, New Orleans, LA, USA, June 2022. IEEE

  49. [49]

    Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, and Alexei A. Efros. Context Encoders: Feature Learning by Inpainting. In2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2536–2544, Las Vegas, NV , USA, June 2016. IEEE

  50. [50]

    BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping

    Srikumar Sastry, Subash Khanal, Aayush Dhakal, Di Huang, and Nathan Jacobs. BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping. In2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 7121–7130, Waikoloa, HI, USA, January 2024. IEEE

  51. [51]

    Masked autoencoder-based self-supervised learning for forest plant classification.Cybernetics and Physics, (V olume 13, 2024, Number 1):32–41, June 2024

    Luu Van Huy, Nguyen Huy Tuong, Le Hoang Ngoc Han, and Nguyen Van Hieu. Masked autoencoder-based self-supervised learning for forest plant classification.Cybernetics and Physics, (V olume 13, 2024, Number 1):32–41, June 2024

  52. [52]

    Aggregated Contextual Transformations for High-Resolution Image Inpainting, April 2021

    Yanhong Zeng, Jianlong Fu, Hongyang Chao, and Baining Guo. Aggregated Contextual Transformations for High-Resolution Image Inpainting, April 2021. arXiv:2104.01431 [cs]

  53. [53]

    Free-Form Image Inpainting with Gated Convolution, October 2019

    Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, and Thomas Huang. Free-Form Image Inpainting with Gated Convolution, October 2019. arXiv:1806.03589 [cs]

  54. [54]

    Qureshi, and Mehran Ebrahimi

    Kamyar Nazeri, Eric Ng, Tony Joseph, Faisal Z. Qureshi, and Mehran Ebrahimi. EdgeConnect: Generative Image Inpainting with Adversarial Edge Learning, January 2019. arXiv:1901.00212 [cs]

  55. [55]

    Resolution-robust Large Mask Inpainting with Fourier Convolutions

    Roman Suvorov, Elizaveta Logacheva, Anton Mashikhin, Anastasia Remizova, Arsenii Ashukha, Aleksei Sil- vestrov, Naejin Kong, Harshith Goka, Kiwoong Park, and Victor Lempitsky. Resolution-robust Large Mask Inpainting with Fourier Convolutions. In2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 3172–3182, Waikoloa, HI, USA, J...

  56. [56]

    Fast Non-Local Neural Networks with Spectral Residual Learning

    Lu Chi, Guiyu Tian, Yadong Mu, Lingxi Xie, and Qi Tian. Fast Non-Local Neural Networks with Spectral Residual Learning. InProceedings of the 27th ACM International Conference on Multimedia, pages 2142–2151, Nice France, October 2019. ACM

  57. [57]

    Bovik, H.R

    Zhou Wang, A.C. Bovik, H.R. Sheikh, and E.P. Simoncelli. Image quality assessment: from error visibility to structural similarity.IEEE Transactions on Image Processing, 13(4):600–612, April 2004

  58. [58]

    GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium

    Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. InAdvances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017

  59. [59]

    Places: A 10 Million Image Database for Scene Recognition.IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6):1452– 1464, June 2018

    Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. Places: A 10 Million Image Database for Scene Recognition.IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6):1452– 1464, June 2018

  60. [60]

    original-date: 2021-03-28T06:10:19Z

    researchmm/AOT-GAN-for-Inpainting, February 2026. original-date: 2021-03-28T06:10:19Z

  61. [61]

    knazeri/edge-connect, February 2026

    Kamyar Nazeri. knazeri/edge-connect, February 2026. original-date: 2018-12-16T02:14:19Z

  62. [62]

    original-date: 2021-08-30T18:27:52Z

    advimman/lama, February 2026. original-date: 2021-08-30T18:27:52Z

  63. [63]

    nipponjo/deepfillv2-pytorch, February 2026

    nipponjo. nipponjo/deepfillv2-pytorch, February 2026. original-date: 2021-11-29T15:19:40Z

  64. [64]

    Geometric GAN

    Jae Hyun Lim and Jong Chul Ye. Geometric GAN, May 2017. arXiv:1705.02894 [stat]

  65. [65]

    Perceptual Losses for Real-Time Style Transfer and Super- Resolution

    Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual Losses for Real-Time Style Transfer and Super- Resolution. In Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling, editors,Computer Vision - ECCV 2016, pages 694–711, Cham, 2016. Springer International Publishing. 15 Inpainting Model Embeddings for Pattern-based Individual IdentificationA PREPRINT

  66. [66]

    Mehdi S. M. Sajjadi, Bernhard Scholkopf, and Michael Hirsch. EnhanceNet: Single Image Super-Resolution Through Automated Texture Synthesis. pages 4491–4500, 2017

  67. [67]

    Selvaraju and Abhishek Das and Ramakrishna Vedantam and Michael Cogswell and Devi Parikh and Dhruv Batra , title =

    Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization.International Journal of Computer Vision, 128(2):336–359, February 2020. arXiv:1610.02391 [cs]

  68. [68]

    jacobgil/pytorch-grad-cam, February 2026

    Jacob Gildenblat. jacobgil/pytorch-grad-cam, February 2026. original-date: 2017-05-31T19:55:15Z

  69. [69]

    Principal components analysis (PCA).Computers & Geosciences, 19(3):303–342, March 1993

    Andrzej Ma´ckiewicz and Waldemar Ratajczak. Principal components analysis (PCA).Computers & Geosciences, 19(3):303–342, March 1993

  70. [70]

    Visualizing Data using t-SNE.Journal of Machine Learning Research, 9(86):2579–2605, 2008

    Laurens van der Maaten and Geoffrey Hinton. Visualizing Data using t-SNE.Journal of Machine Learning Research, 9(86):2579–2605, 2008

  71. [71]

    UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

    Leland McInnes, John Healy, and James Melville. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction, September 2020. arXiv:1802.03426 [stat]

  72. [72]

    S. Lloyd. Least squares quantization in PCM.IEEE Transactions on Information Theory, 28(2):129–137, March 1982

  73. [73]

    Rousseeuw

    Peter J. Rousseeuw. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis.Journal of Computational and Applied Mathematics, 20:53–65, November 1987

  74. [74]

    Davies and Donald W

    David L. Davies and Donald W. Bouldin. A Cluster Separation Measure.IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-1(2):224–227, April 1979

  75. [75]

    Communications in Statistics 3(1), 1-27 (1974)

    T. Cali´nski and J Harabasz. A dendrite method for cluster analysis.Communications in Statistics, 3(1):1–27, January 1974. _eprint: https://www.tandfonline.com/doi/pdf/10.1080/03610927408827101

  76. [76]

    William M. Rand. Objective Criteria for the Evaluation of Clustering Methods.Jour- nal of the American Statistical Association, 66(336):846–850, December 1971. _eprint: https://www.tandfonline.com/doi/pdf/10.1080/01621459.1971.10482356

  77. [77]

    C. E. Shannon. A mathematical theory of communication.The Bell System Technical Journal, 27(3):379–423, July 1948

  78. [78]

    Mingxing Tan and Quoc V . Le. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, September 2020. arXiv:1905.11946 [cs]. 16