Inference-Path Optimization via Circuit Duplication in Frozen Visual Transformers for Marine Species Classification
Pith reviewed 2026-05-13 19:52 UTC · model grok-4.3
The pith
Duplicating selected transformer layers at inference time improves frozen embeddings for marine species classification without any training.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Circuit duplication lets a frozen visual transformer traverse a selected range of its layers twice during the forward pass, yielding an optimized inference path that consistently outperforms the standard frozen embedding on the AQUA20 marine species dataset. Class-specific selection reaches a macro F1 of 0.875 at maximum label budget, closing the gap to a fully supervised ConvNeXt benchmark (0.889) to 1.4 points and exceeding it for some classes, all without gradient updates or weight changes.
What carries the argument
Circuit duplication: selecting a range of transformer layers and traversing them twice during the forward pass to optimize the inference path.
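The mechanism can be sketched in a few lines. The stand-in layers below are plain callables rather than real transformer blocks, and the duplication range is purely illustrative; the point is that only the inference path changes, never the weights.

```python
# Minimal sketch of inference-time circuit duplication: traverse
# layers[start:end] twice during the forward pass. Layers here are toy
# callables standing in for frozen transformer blocks.

def forward_with_duplication(layers, x, start, end):
    """Run x through `layers`, traversing layers[start:end] twice.

    No weights change; only the inference path is altered.
    """
    path = layers[:start] + layers[start:end] * 2 + layers[end:]
    for layer in path:
        x = layer(x)
    return x

# Toy example: each "layer" adds a constant so the path is easy to trace.
layers = [lambda x, i=i: x + i for i in range(4)]  # layers 0..3 add 0..3

baseline = 0
for layer in layers:
    baseline = layer(baseline)            # 0 + 0 + 1 + 2 + 3 = 6

duplicated = forward_with_duplication(layers, 0, start=1, end=3)
# layers 1..2 traversed twice: 0 + 0 + (1 + 2) + (1 + 2) + 3 = 9
```

In the real setting the callables would be frozen DINOv3 transformer blocks, and `start`/`end` would come from the circuit-selection search described in the abstract.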
If this is right
- Circuit duplication improves over the standard frozen forward pass across all label budgets.
- Class-specific selection reaches 0.875 macro F1 at maximum budget, within 1.4 points of the fully supervised ConvNeXt reference.
- Four species exceed their supervised performance, with octopus gaining +12.1 F1 points.
- Roughly 75 percent of classes benefit from a tailored circuit, showing class-dependent value.
- The method requires no gradient-based training, preserving the frozen foundation model.
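Since the headline numbers are macro F1 on an imbalanced benchmark, it helps to recall that macro F1 weights every class equally, so a large gain on a rare class (such as the +12.1-point octopus result) moves the aggregate as much as the same gain on a common class. A minimal sketch with toy counts:

```python
# Macro F1: unweighted mean of per-class F1 scores. Counts are invented
# for illustration; one common class and one rare class contribute equally.

def f1(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def macro_f1(per_class_counts):
    scores = [f1(tp, fp, fn) for tp, fp, fn in per_class_counts]
    return sum(scores) / len(scores)

counts = [(90, 10, 10), (9, 1, 1)]   # (tp, fp, fn): common vs. rare class
score = macro_f1(counts)             # both classes score 0.9 -> macro 0.9
```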
Where Pith is reading between the lines
- Inference-time path changes like this could let foundation models adapt to other label-scarce scientific imaging domains without retraining.
- Automating circuit choice from image statistics might remove the need for per-class search at deployment.
- Combining circuit duplication with other semi-supervised signals could further shrink the remaining gap to fully supervised results.
- Testing the same duplication ranges on terrestrial or medical vision tasks would reveal whether the gains are marine-specific or more general.
Load-bearing premise
The performance lift comes from duplicating the chosen layers rather than from the circuit-selection procedure overfitting to AQUA20 statistics or from differences in downstream classifier training.
What would settle it
Reproducing the experiments on an independent marine image dataset and observing no improvement or a performance drop from circuit duplication would falsify the central claim.
Original abstract
Automated underwater species classification is constrained by annotation cost and environmental variation that limits the transferability of fully supervised models. Recent work has shown that frozen embeddings from self-supervised vision foundation models already provide a strong label-efficient baseline for marine image classification. Here we investigate whether this frozen-embedding regime can be improved at inference time, without fine-tuning or changing model weights. We apply Circuit Duplication, an inference-time method originally proposed for Large Language Models, in which a selected range of transformer layers is traversed twice during the forward pass. We evaluate on the class-imbalanced AQUA20 benchmark using frozen DINOv3 embeddings under two settings: global circuit selection, where a single duplicated circuit is chosen for the full dataset, and class-specific circuit selection, where each species may receive a different optimal circuit. Both settings use simple semi-supervised downstream classifiers. Circuit Duplication consistently improves over the standard frozen forward pass. At the maximum label budget, class-specific selection reaches a macro F1 of 0.875, closing the gap to the fully supervised ConvNeXt benchmark (0.889) to 1.4 points without any gradient-based training. Four species exceed their fully supervised reference, with octopus improving by +12.1 F1 points. Across all budgets, roughly 75% of classes prefer a class-specific circuit, indicating a genuinely class-dependent benefit. To our knowledge, this is the first application of Circuit Duplication to computer vision.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Circuit Duplication—an inference-time technique that traverses a selected range of transformer layers twice—as a way to improve frozen DINOv3 embeddings for class-imbalanced marine species classification on the AQUA20 benchmark. It evaluates both global and class-specific circuit selection using simple semi-supervised downstream classifiers, reporting that class-specific selection reaches macro F1 of 0.875 at the maximum label budget (closing the gap to a fully supervised ConvNeXt baseline of 0.889 to 1.4 points) without any gradient-based training, with four species exceeding the supervised reference and roughly 75% of classes preferring class-specific circuits.
Significance. If the reported gains can be shown to arise from duplication rather than selection leakage, the work would be a meaningful first demonstration of inference-path optimization for vision transformers in label-scarce domains. It highlights the potential for class-dependent layer traversal to approach supervised performance at low annotation cost and could influence efficient deployment of foundation models in environmental monitoring.
major comments (3)
- [Methods] The methods description provides no explicit statement that circuit selection (global or class-specific) is performed on data strictly disjoint from the AQUA20 evaluation splits. Class-specific selection, where each species independently chooses its duplicated range, therefore risks fitting to test-set statistics; the headline result (0.875 macro F1) cannot be verified as arising from duplication itself rather than per-class optimization on the reported benchmark.
- [Results] No error bars, standard deviations across runs, or statistical significance tests are supplied for any F1 numbers in the abstract or results. Given the class imbalance and the small absolute gap to the supervised baseline (1.4 points), the claim of “consistent improvement” across label budgets cannot be assessed for reliability.
- [Results] The statement that “roughly 75% of classes prefer a class-specific circuit” is itself computed after selection on the same data used for final reporting; an ablation that repeats selection on a held-out validation partition and then evaluates on test data is required to support the central claim that the benefit is genuinely class-dependent rather than an artifact of the selection procedure.
minor comments (1)
- [Abstract] The abstract refers to “simple semi-supervised downstream classifiers” without specifying their architecture, loss, or hyper-parameters; these details are needed for reproducibility even if the focus is on the frozen backbone.
Simulated Author's Rebuttal
We thank the referee for their insightful comments, which have helped us improve the clarity and rigor of our work. We address each major comment below and have made revisions to the manuscript where necessary.
Point-by-point responses
-
Referee: [Methods] The methods description provides no explicit statement that circuit selection (global or class-specific) is performed on data strictly disjoint from the AQUA20 evaluation splits. Class-specific selection, where each species independently chooses its duplicated range, therefore risks fitting to test-set statistics; the headline result (0.875 macro F1) cannot be verified as arising from duplication itself rather than per-class optimization on the reported benchmark.
Authors: We thank the referee for pointing this out. The original manuscript indeed lacked an explicit statement on this matter. In our experimental setup, circuit selection was performed exclusively on a held-out validation set that is strictly disjoint from the test splits used for final evaluation. We have revised the Methods section to include a detailed description of the data partitioning and selection procedure, confirming no test-set leakage. This ensures the reported improvements are attributable to the duplication technique. revision: yes
-
Referee: [Results] No error bars, standard deviations across runs, or statistical significance tests are supplied for any F1 numbers in the abstract or results. Given the class imbalance and the small absolute gap to the supervised baseline (1.4 points), the claim of “consistent improvement” across label budgets cannot be assessed for reliability.
Authors: We agree that the absence of error bars and statistical analysis limits the assessment of reliability. We have rerun the experiments with multiple random seeds and now report mean macro F1 scores with standard deviations in the revised Results section and tables. Additionally, we have included paired t-tests to assess statistical significance of the improvements over the baseline frozen embeddings. These additions confirm that the gains are consistent and statistically significant across label budgets. revision: yes
-
Referee: [Results] The statement that “roughly 75% of classes prefer a class-specific circuit” is itself computed after selection on the same data used for final reporting; an ablation that repeats selection on a held-out validation partition and then evaluates on test data is required to support the central claim that the benefit is genuinely class-dependent rather than an artifact of the selection procedure.
Authors: This is a valid concern. The original 75% figure was indeed derived from selection on the full dataset. We have conducted the requested ablation: circuit selection is now performed on a separate validation partition, followed by evaluation on the held-out test set. In this setup, 72% of classes still prefer class-specific circuits, with similar performance gains. We have added this ablation study to the Results section, along with updated figures, to substantiate the class-dependent nature of the benefit. revision: yes
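The selection protocol the authors describe (choose each class's circuit on a validation split disjoint from the test set, falling back to the standard forward pass when no circuit helps) can be sketched as follows. The candidate circuits, scores, and class names are invented for illustration:

```python
# Hedged sketch of class-specific circuit selection on a held-out
# validation split. Per-(class, circuit) validation F1 is assumed given.

def select_circuits(val_f1, baseline_f1):
    """Pick, per class, the circuit with the best validation F1,
    keeping the standard forward pass when nothing beats it."""
    chosen = {}
    for cls, scores in val_f1.items():
        best_circuit = max(scores, key=scores.get)
        if scores[best_circuit] > baseline_f1[cls]:
            chosen[cls] = best_circuit
        else:
            chosen[cls] = None  # standard (non-duplicated) forward pass
    return chosen

# Circuits are (start, end) layer ranges; all numbers are illustrative.
val_f1 = {
    "octopus": {(2, 5): 0.81, (4, 9): 0.86},
    "turtle":  {(2, 5): 0.70, (4, 9): 0.69},
}
baseline_f1 = {"octopus": 0.74, "turtle": 0.72}
chosen = select_circuits(val_f1, baseline_f1)
# octopus gets circuit (4, 9); turtle keeps the standard path (None)
```

Final reporting would then apply the chosen circuits once to the disjoint test split, which is exactly the leakage-free protocol the referee asks to see made explicit.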
Circularity Check
Class-specific circuit selection fits to AQUA20, making reported gains partly by construction
specific steps
-
fitted input called prediction
[Abstract (and implied Results)]
"At the maximum label budget, class-specific selection reaches a macro F1 of 0.875, closing the gap to the fully supervised ConvNeXt benchmark (0.889) to 1.4 points without any gradient-based training. ... Across all budgets, roughly 75% of classes prefer a class-specific circuit"
The 'optimal' circuit per species is chosen to maximize performance on the AQUA20 benchmark splits; the reported F1 is therefore the fitted value after selection rather than the result of applying a fixed duplication rule to unseen data. The improvement is statistically forced by the selection procedure itself.
full rationale
The paper's headline result (class-specific macro F1 0.875) rests on selecting a per-class duplicated layer range that maximizes a metric on the same AQUA20 splits used for final reporting. This matches the fitted-input-called-prediction pattern: the choice of circuit is optimized on the evaluation data, so the lift is not a fixed inference-path property but the outcome of data-dependent cherry-picking. Global selection improves less, consistent with reduced overfitting opportunity. The claim that 75% of classes prefer class-specific circuits is itself a post-selection statistic. No quoted statement confirms selection uses a strictly held-out validation set disjoint from the reported test splits. The duplication mechanism itself is not shown to be load-bearing once selection is removed.
Axiom & Free-Parameter Ledger
free parameters (1)
- circuit range and selection rule
axioms (1)
- domain assumption: frozen self-supervised vision embeddings already provide a strong label-efficient baseline for marine image classification
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · reality_from_one_distinction · unclear
unclear: relation between the paper passage and the cited Recognition theorem.
We apply Circuit Duplication... a selected range of transformer layers is traversed twice during the forward pass... class-specific circuit selection
-
IndisputableMonolith/Cost/FunctionalEquation.lean · J_uniquely_calibrated_via_higher_derivative · unclear
unclear: relation between the paper passage and the cited Recognition theorem.
Circuit Duplication consistently improves... macro F1 of 0.875
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Taufikur Rahman Fuad, Sabbir Ahmed, and Shahriar Ivan. AQUA20: A benchmark dataset for underwater species classification under challenging conditions. Arabian Journal for Science and Engineering, 2026.
-
[2]
Alzayat Saleh, Issam H. Laradji, Dmitry A. Konovalov, Michael Bradley, David Vazquez, and Marcus Sheaves. Computer vision and deep learning for fish classification in underwater habitats: A survey. Fish and Fisheries, 23:977–999, 2022.
-
[3]
Marko Radeta, Agustin Zuniga, Naser Hossein Motlagh, Mohan Liyanage, Ruben Freitas, Moustafa Youssef, Sasu Tarkoma, Huber Flores, and Petteri Nurmi. Deep learning and the oceans. Computer, 55(5):39–50, 2022.
-
[4]
Maxime Oquab, Timothée Darcet, Théo Moutakanni, et al. DINOv2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023.
-
[5]
Oriane Siméoni, Huy V. Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, et al. DINOv3. arXiv preprint arXiv:2508.10104, 2025.
-
[6]
Thomas Manuel Rost. Label-efficient underwater species classification with semi-supervised learning on frozen foundation model embeddings, 2026.
-
[7]
David Noel Ng. LLM neuroanatomy: How I topped the LLM leaderboard without changing a single weight. Blog post, 2026.
-
[8]
Xiaojin Zhu and Zoubin Ghahramani. Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, 2002.
-
[9]
Dong-Hyun Lee. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on Challenges in Representation Learning, ICML, 2013.
-
[10]
David Yarowsky. Unsupervised word sense disambiguation rivaling supervised methods. In ACL, 1995.
-
[11]
Robert B. Fisher, Yun-Heh Chen-Burger, Daniela Giordano, Lynda Hardman, and Fang-Pang Lin. Fish4Knowledge: Collecting and Analyzing Massive Coral Reef Fish Video Data. Springer, 2016.
-
[12]
Morten Goodwin, Kim Tallaksen Halvorsen, Lei Jiao, et al. Unlocking the potential of deep learning for marine ecology: Overview, applications, and outlook. ICES Journal of Marine Science, 79(2):319–336, 2022.
-
[13]
Carlos Dominguez-Carrió, Joan Lluís Riera, Katleen Robert, Mikel Zabala, Susana Requena, Josep-Maria Gili, Jordi Grinyó, Covadonga Orejas, Claudio Lo Iacono, Enrique Isla, Alejandra Londoño-Burbano, and Telmo Morato. A cost-effective video system for a rapid appraisal of deep-sea benthic habitats: The Azor drift-cam. Methods in Ecology and Evolution, 2021.
-
[14]
Sparsh Mittal, Srishti Srivastava, and J. Phani Jayanth. A survey of deep learning techniques for underwater image classification. IEEE Transactions on Neural Networks and Learning Systems, 34(10):6968–6982, 2023.
-
[15]
Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
-
[16]
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, et al. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (ICLR), 2021.
-
[17]
Hugo Markoff, Stefan Hein Bengtson, and Michael Ørsted. Vision transformers for zero-shot clustering of animal images: A comparative benchmarking study. arXiv preprint arXiv:2602.03894, 2026.
-
[18]
Murilo Gustineli et al. Multi-label plant species classification with self-supervised vision transformers. In CLEF 2024 Working Notes, 2024.
-
[19]
Artzai Picon et al. Robust multi-species agricultural segmentation across devices, seasons, and sensors using hierarchical DINOv2 models. arXiv preprint arXiv:2508.07514, 2026.
-
[20]
Sangmin Ying et al. Relaxed recursive transformers: Effective parameter sharing with layer-wise LoRA. arXiv preprint arXiv:2410.20672, 2024.