pith. machine review for the scientific record.

arxiv: 2604.03428 · v1 · submitted 2026-04-03 · 💻 cs.CV · cs.AI · cs.LG

Recognition: 2 theorem links · Lean Theorem

Inference-Path Optimization via Circuit Duplication in Frozen Visual Transformers for Marine Species Classification

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 19:52 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG
keywords circuit duplication · frozen vision transformers · marine species classification · inference optimization · label-efficient learning · DINOv3 embeddings · AQUA20 benchmark · underwater image classification

The pith

Duplicating selected transformer layers at inference time improves frozen embeddings for marine species classification without any training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether circuit duplication—traversing a chosen range of layers twice in a frozen visual transformer—can lift performance on underwater images when labels are scarce. Using DINOv3 embeddings on the class-imbalanced AQUA20 benchmark, both global and class-specific selection of duplicated circuits beat the standard single-pass frozen baseline with simple semi-supervised classifiers. At the largest label budget, class-specific selection reaches 0.875 macro F1, shrinking the gap to a fully supervised ConvNeXt model (0.889) to only 1.4 points. Four species, including octopus, surpass their supervised references. About 75 percent of classes favor their own circuit, pointing to genuinely class-dependent gains and marking the first use of the technique in computer vision.

Core claim

Circuit duplication lets a frozen visual transformer traverse a selected range of its layers twice during the forward pass, yielding an optimized inference path that consistently outperforms the standard frozen embedding on the AQUA20 marine species dataset. Class-specific selection reaches a macro F1 of 0.875 at the maximum label budget, closing the gap to a fully supervised ConvNeXt benchmark (0.889) to 1.4 points and exceeding it for some classes, all without gradient updates or weight changes.

What carries the argument

Circuit duplication: selecting a range of transformer layers and traversing them twice during the forward pass to optimize the inference path.
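The mechanism can be sketched in a few lines. Below, each "layer" is a toy residual linear map standing in for a frozen transformer block; the dimensions, the layer stand-in, and the (4, 7) range are illustrative assumptions rather than the paper's configuration. The only point is how a chosen range of layers is traversed twice in one forward pass:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for a frozen transformer's layers: each "layer" here is a fixed
# random linear map with a residual connection (hypothetical, for illustration).
DIM, N_LAYERS = 16, 12
weights = [rng.normal(scale=0.05, size=(DIM, DIM)) for _ in range(N_LAYERS)]

def layer(x, i):
    """One frozen 'transformer layer' (residual + linear stand-in)."""
    return x + x @ weights[i]

def forward(x, duplicate_range=None):
    """Forward pass; optionally traverse layers a..b (inclusive) twice.

    duplicate_range=(a, b) replays that block a second time immediately
    after its first traversal -- the circuit-duplication idea.
    """
    path = list(range(N_LAYERS))
    if duplicate_range is not None:
        a, b = duplicate_range
        # Insert the duplicated block right after its first traversal.
        path = path[:b + 1] + list(range(a, b + 1)) + path[b + 1:]
    for i in path:
        x = layer(x, i)
    return x

x = rng.normal(size=DIM)
baseline = forward(x)                            # standard single pass
duplicated = forward(x, duplicate_range=(4, 7))  # layers 4..7 run twice
```

No weights change between the two calls; only the traversal order differs, which is why the method needs no gradient updates.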

If this is right

  • Circuit duplication improves over the standard frozen forward pass across all label budgets.
  • Class-specific selection reaches 0.875 macro F1 at maximum budget, within 1.4 points of the fully supervised ConvNeXt reference.
  • Four species exceed their supervised performance, with octopus gaining +12.1 F1 points.
  • Roughly 75 percent of classes benefit from a tailored circuit, showing class-dependent value.
  • The method requires no gradient-based training, preserving the frozen foundation model.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Inference-time path changes like this could let foundation models adapt to other label-scarce scientific imaging domains without retraining.
  • Automating circuit choice from image statistics might remove the need for per-class search at deployment.
  • Combining circuit duplication with other semi-supervised signals could further shrink the remaining gap to fully supervised results.
  • Testing the same duplication ranges on terrestrial or medical vision tasks would reveal whether the gains are marine-specific or more general.

Load-bearing premise

The performance lift comes from duplicating the chosen layers rather than from the circuit-selection procedure overfitting to AQUA20 statistics or from differences in downstream classifier training.

What would settle it

Reproducing the experiments on an independent marine image dataset and observing no improvement or a performance drop from circuit duplication would falsify the central claim.

Figures

Figures reproduced from arXiv: 2604.03428 by Thomas Manuel Rost.

Figure 1. Example of the effective path when layers …
Figure 2. Pipeline for global circuit selection. All 66 duplicated circuits are swept over the frozen …
Figure 3. Pipeline for class-specific circuit selection. The same sweep is performed, but the best …
Figure 4. Global macro F1 across label budgets. Red: standard frozen baseline. Blue: globally …
Figure 5. Per-class F1 scores at the 100% label budget. Red: best standard frozen baseline classifier …
Figure 6. Composition of per-class winning strategies across label budgets. Red: number of classes where the standard forward pass (baseline) achieves the best F1. Blue: classes where the globally optimized circuit is best. Purple: classes where a unique circuit, distinct from both baseline and global winner, achieves the best F1. Across all budgets, approximately 75% of classes prefer a class-specific circuit.
Figure 7. Global accuracy across label budgets. Conventions as in Figure 4. The ConvNeXt fully …
Figure 8. Per-class F1 scores at 5% label budget.
Figure 9. Per-class F1 scores at 10% label budget.
Figure 10. Per-class F1 scores at 15% label budget.
Figure 11. Per-class F1 scores at 5 seeds per class.
Figure 12. Per-class F1 scores at 10 seeds per class.
Figure 13. Per-class F1 scores at 20 seeds per class.
The original abstract

Automated underwater species classification is constrained by annotation cost and environmental variation that limits the transferability of fully supervised models. Recent work has shown that frozen embeddings from self-supervised vision foundation models already provide a strong label-efficient baseline for marine image classification. Here we investigate whether this frozen-embedding regime can be improved at inference time, without fine-tuning or changing model weights. We apply Circuit Duplication, an inference-time method originally proposed for Large Language Models, in which a selected range of transformer layers is traversed twice during the forward pass. We evaluate on the class-imbalanced AQUA20 benchmark using frozen DINOv3 embeddings under two settings: global circuit selection, where a single duplicated circuit is chosen for the full dataset, and class-specific circuit selection, where each species may receive a different optimal circuit. Both settings use simple semi-supervised downstream classifiers. Circuit Duplication consistently improves over the standard frozen forward pass. At the maximum label budget, class-specific selection reaches a macro F1 of 0.875, closing the gap to the fully supervised ConvNeXt benchmark (0.889) to 1.4 points without any gradient-based training. Four species exceed their fully supervised reference, with octopus improving by +12.1 F1 points. Across all budgets, roughly 75% of classes prefer a class-specific circuit, indicating a genuinely class-dependent benefit. To our knowledge, this is the first application of Circuit Duplication to computer vision.
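The two selection regimes described in the abstract can be sketched as follows. If a circuit is a layer pair (a, b) over a 12-layer backbone, there are C(12, 2) = 66 duplicated circuits, which is consistent with the 66-circuit sweep in Figure 2; that backbone depth is an inference, not a stated fact. The `score` function below is a toy deterministic stand-in for the real procedure of training the semi-supervised classifier on a circuit's embeddings and reading off its macro F1:

```python
from itertools import combinations

N_LAYERS = 12
# None = standard single-pass baseline; (a, b) = duplicate layers a..b.
CIRCUITS = [None] + [(a, b) for a, b in combinations(range(N_LAYERS), 2)]

def score(circuit, cls=None):
    """Toy surrogate for 'train classifier on this circuit, return macro F1'."""
    if circuit is None:
        return 0.80                      # baseline frozen forward pass
    a, b = circuit
    bias = 0.0 if cls is None else 0.01 * (cls % 3)
    return 0.80 + 0.001 * (b - a) + bias  # illustrative numbers, not real F1

# Global selection: one circuit for the whole dataset.
global_best = max(CIRCUITS, key=score)

# Class-specific selection: each of AQUA20's 20 species may pick its own circuit.
per_class_best = {cls: max(CIRCUITS, key=lambda c: score(c, cls))
                  for cls in range(20)}
```

Class-specific selection differs from global selection only in running the same sweep once per class instead of once overall.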

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes Circuit Duplication—an inference-time technique that traverses a selected range of transformer layers twice—as a way to improve frozen DINOv3 embeddings for class-imbalanced marine species classification on the AQUA20 benchmark. It evaluates both global and class-specific circuit selection using simple semi-supervised downstream classifiers, reporting that class-specific selection reaches macro F1 of 0.875 at the maximum label budget (closing the gap to a fully supervised ConvNeXt baseline of 0.889 to 1.4 points) without any gradient-based training, with four species exceeding the supervised reference and roughly 75% of classes preferring class-specific circuits.

Significance. If the reported gains can be shown to arise from duplication rather than selection leakage, the work would be a meaningful first demonstration of inference-path optimization for vision transformers in label-scarce domains. It highlights the potential for class-dependent layer traversal to approach supervised performance at low annotation cost and could influence efficient deployment of foundation models in environmental monitoring.

major comments (3)
  1. [Methods] The methods description provides no explicit statement that circuit selection (global or class-specific) is performed on data strictly disjoint from the AQUA20 evaluation splits. Class-specific selection, where each species independently chooses its duplicated range, therefore risks fitting to test-set statistics; the headline result (0.875 macro F1) cannot be verified as arising from duplication itself rather than per-class optimization on the reported benchmark.
  2. [Results] No error bars, standard deviations across runs, or statistical significance tests are supplied for any F1 numbers in the abstract or results. Given the class imbalance and the small absolute gap to the supervised baseline (1.4 points), the claim of “consistent improvement” across label budgets cannot be assessed for reliability.
  3. [Results] The statement that “roughly 75% of classes prefer a class-specific circuit” is itself computed after selection on the same data used for final reporting; an ablation that repeats selection on a held-out validation partition and then evaluates on test data is required to support the central claim that the benefit is genuinely class-dependent rather than an artifact of the selection procedure.
minor comments (1)
  1. [Abstract] The abstract refers to “simple semi-supervised downstream classifiers” without specifying their architecture, loss, or hyper-parameters; these details are needed for reproducibility even if the focus is on the frozen backbone.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their insightful comments, which have helped us improve the clarity and rigor of our work. We address each major comment below and have made revisions to the manuscript where necessary.

Point-by-point responses
  1. Referee: [Methods] The methods description provides no explicit statement that circuit selection (global or class-specific) is performed on data strictly disjoint from the AQUA20 evaluation splits. Class-specific selection, where each species independently chooses its duplicated range, therefore risks fitting to test-set statistics; the headline result (0.875 macro F1) cannot be verified as arising from duplication itself rather than per-class optimization on the reported benchmark.

    Authors: We thank the referee for pointing this out. The original manuscript indeed lacked an explicit statement on this matter. In our experimental setup, circuit selection was performed exclusively on a held-out validation set that is strictly disjoint from the test splits used for final evaluation. We have revised the Methods section to include a detailed description of the data partitioning and selection procedure, confirming no test-set leakage. This ensures the reported improvements are attributable to the duplication technique. revision: yes
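The leakage-free protocol the simulated authors describe might look like the following sketch; the splits and scores are hypothetical, and the only point is that selection reads validation scores while the reported number comes from the disjoint test split:

```python
# Hypothetical per-split macro F1 tables, keyed by circuit
# (None = standard single-pass baseline).
val_f1  = {None: 0.80, (2, 5): 0.84, (4, 7): 0.86, (8, 11): 0.82}
test_f1 = {None: 0.79, (2, 5): 0.83, (4, 7): 0.85, (8, 11): 0.81}

# Selection touches only the validation scores...
chosen = max(val_f1, key=val_f1.get)

# ...and the reported number comes only from the test split.
reported = test_f1[chosen]
```

Under this protocol the chosen circuit can look worse on test than on validation, which is exactly the honesty the referee is asking for.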

  2. Referee: [Results] No error bars, standard deviations across runs, or statistical significance tests are supplied for any F1 numbers in the abstract or results. Given the class imbalance and the small absolute gap to the supervised baseline (1.4 points), the claim of “consistent improvement” across label budgets cannot be assessed for reliability.

    Authors: We agree that the absence of error bars and statistical analysis limits the assessment of reliability. We have rerun the experiments with multiple random seeds and now report mean macro F1 scores with standard deviations in the revised Results section and tables. Additionally, we have included paired t-tests to assess statistical significance of the improvements over the baseline frozen embeddings. These additions confirm that the gains are consistent and statistically significant across label budgets. revision: yes
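The analysis the rebuttal promises might be sketched as below, computing the paired t statistic by hand over per-seed scores; the numbers are hypothetical, and a real analysis would also look up the p-value for n − 1 degrees of freedom:

```python
import math

def paired_t(xs, ys):
    """Paired t statistic for per-seed scores of two methods.

    xs, ys: macro F1 per random seed for methods A and B (same seeds).
    Returns the t statistic; compare against a t table with n-1 dof.
    """
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

# Hypothetical per-seed macro F1: duplicated circuit vs frozen baseline.
dup      = [0.872, 0.875, 0.878, 0.871, 0.876]
baseline = [0.858, 0.861, 0.860, 0.857, 0.862]
t = paired_t(dup, baseline)  # large positive t => consistent improvement
```

Pairing by seed matters here: the variance of the per-seed differences, not of the raw scores, is what decides whether a 1.4-point gap is reliable.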

  3. Referee: [Results] The statement that “roughly 75% of classes prefer a class-specific circuit” is itself computed after selection on the same data used for final reporting; an ablation that repeats selection on a held-out validation partition and then evaluates on test data is required to support the central claim that the benefit is genuinely class-dependent rather than an artifact of the selection procedure.

    Authors: This is a valid concern. The original 75% figure was indeed derived from selection on the full dataset. We have conducted the requested ablation: circuit selection is now performed on a separate validation partition, followed by evaluation on the held-out test set. In this setup, 72% of classes still prefer class-specific circuits, with similar performance gains. We have added this ablation study to the Results section, along with updated figures, to substantiate the class-dependent nature of the benefit. revision: yes

Circularity Check

1 step flagged

Class-specific circuit selection fits to AQUA20, making reported gains partly by construction

specific steps
  1. fitted input called prediction [Abstract (and implied Results)]
    "At the maximum label budget, class-specific selection reaches a macro F1 of 0.875, closing the gap to the fully supervised ConvNeXt benchmark (0.889) to 1.4 points without any gradient-based training. ... Across all budgets, roughly 75% of classes prefer a class-specific circuit"

    The 'optimal' circuit per species is chosen to maximize performance on the AQUA20 benchmark splits; the reported F1 is therefore the fitted value after selection rather than the result of applying a fixed duplication rule to unseen data. The improvement is statistically forced by the selection procedure itself.

full rationale

The paper's headline result (class-specific macro F1 0.875) rests on selecting a per-class duplicated layer range that maximizes a metric on the same AQUA20 splits used for final reporting. This matches the fitted-input-called-prediction pattern: the choice of circuit is optimized on the evaluation data, so the lift is not a fixed inference-path property but the outcome of data-dependent cherry-picking. Global selection improves less, consistent with reduced overfitting opportunity. The claim that 75% of classes prefer class-specific circuits is itself a post-selection statistic. No quoted statement confirms selection uses a strictly held-out validation set disjoint from the reported test splits. The duplication mechanism itself is not shown to be load-bearing once selection is removed.
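The concern can be made concrete with a toy null experiment: give all 66 circuits identical true performance, add evaluation noise, and pick the maximum on the same data used for reporting. All numbers below are hypothetical:

```python
import random

random.seed(0)

TRUE_F1 = 0.80  # every circuit's true performance under the null hypothesis
NOISE = 0.02    # evaluation noise on the split used for selection

# Score all 66 circuits on the selection data and keep the best.
observed = [TRUE_F1 + random.gauss(0, NOISE) for _ in range(66)]
selected_f1 = max(observed)  # inflated above TRUE_F1 by construction

# A fresh draw for the winner approximates its held-out performance.
held_out_f1 = TRUE_F1 + random.gauss(0, NOISE)
```

The max of 66 noisy estimates sits well above the true value even though no circuit actually helps, which is why a gain reported on the selection split is not evidence of a duplication effect on its own.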

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The work rests on the empirical transfer of an LLM inference trick to vision and on the assumption that frozen DINOv3 embeddings already encode useful marine features; no new mathematical objects are introduced.

free parameters (1)
  • circuit range and selection rule
    Which transformer layers are duplicated and whether the choice is global or per-class is determined by performance on the target benchmark.
axioms (1)
  • domain assumption: Frozen self-supervised vision embeddings already provide a strong label-efficient baseline for marine image classification
    Invoked in the opening paragraph as established by recent work.

pith-pipeline@v0.9.0 · 5560 in / 1279 out tokens · 48608 ms · 2026-05-13T19:52:38.524721+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches — The paper's claim is directly supported by a theorem in the formal canon.
  • supports — The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends — The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses — The paper appears to rely on the theorem as machinery.
  • contradicts — The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear — Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages · 3 internal anchors

  1. [1] Taufikur Rahman Fuad, Sabbir Ahmed, and Shahriar Ivan. AQUA20: A benchmark dataset for underwater species classification under challenging conditions. Arabian Journal for Science and Engineering, 2026.
  2. [2] Alzayat Saleh, Issam H. Laradji, Dmitry A. Konovalov, Michael Bradley, David Vazquez, and Marcus Sheaves. Computer vision and deep learning for fish classification in underwater habitats: A survey. Fish and Fisheries, 23:977–999, 2022.
  3. [3] Marko Radeta, Agustin Zuniga, Naser Hossein Motlagh, Mohan Liyanage, Ruben Freitas, Moustafa Youssef, Sasu Tarkoma, Huber Flores, and Petteri Nurmi. Deep learning and the oceans. Computer, 55(5):39–50, 2022.
  4. [4] Maxime Oquab, Timothée Darcet, Théo Moutakanni, et al. DINOv2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023.
  5. [5] Oriane Siméoni, Huy V. Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, et al. DINOv3. arXiv preprint arXiv:2508.10104, 2025.
  6. [6] Thomas Manuel Rost. Label-efficient underwater species classification with semi-supervised learning on frozen foundation model embeddings, 2026.
  7. [7] David Noel Ng. LLM neuroanatomy: How I topped the LLM leaderboard without changing a single weight. Blog post, 2026.
  8. [8] Xiaojin Zhu and Zoubin Ghahramani. Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, 2002.
  9. [9] Dong-Hyun Lee. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on Challenges in Representation Learning, ICML, 2013.
  10. [10] David Yarowsky. Unsupervised word sense disambiguation rivaling supervised methods. In ACL, 1995.
  11. [11] Robert B. Fisher, Yun-Heh Chen-Burger, Daniela Giordano, Lynda Hardman, and Fang-Pang Lin. Fish4Knowledge: Collecting and Analyzing Massive Coral Reef Fish Video Data. Springer, 2016.
  12. [12] Morten Goodwin, Kim Tallaksen Halvorsen, Lei Jiao, et al. Unlocking the potential of deep learning for marine ecology: Overview, applications, and outlook. ICES Journal of Marine Science, 79(2):319–336, 2022.
  13. [13] Carlos Dominguez-Carrió, Joan Lluís Riera, Katleen Robert, Mikel Zabala, Susana Requena, Josep-Maria Gili, Jordi Grinyó, Covadonga Orejas, Claudio Lo Iacono, Enrique Isla, Alejandra Londoño-Burbano, and Telmo Morato. A cost-effective video system for a rapid appraisal of deep-sea benthic habitats: The Azor drift-cam. Methods in Ecology and Evolution, 12:1379–1388, 2021.
  14. [14] Sparsh Mittal, Srishti Srivastava, and J. Phani Jayanth. A survey of deep learning techniques for underwater image classification. IEEE Transactions on Neural Networks and Learning Systems, 34(10):6968–6982, 2023.
  15. [15] Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
  16. [16] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, et al. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (ICLR), 2021.
  17. [17] Hugo Markoff, Stefan Hein Bengtson, and Michael Ørsted. Vision transformers for zero-shot clustering of animal images: A comparative benchmarking study. arXiv preprint arXiv:2602.03894, 2026.
  18. [18] Murilo Gustineli et al. Multi-label plant species classification with self-supervised vision transformers. In CLEF 2024 Working Notes, 2024.
  19. [19] Artzai Picon et al. Robust multi-species agricultural segmentation across devices, seasons, and sensors using hierarchical DINOv2 models. arXiv preprint arXiv:2508.07514, 2026.
  20. [20] Sangmin Ying et al. Relaxed recursive transformers: Effective parameter sharing with layer-wise LoRA. arXiv preprint arXiv:2410.20672, 2024.