Recognition: no theorem link
Task Relevance Is Not Local Replaceability: A Two-Axis View of Channel Information
Pith reviewed 2026-05-11 01:22 UTC · model grok-4.3
The pith
Task relevance does not equal local replaceability for channels in vision networks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes three things: the local and target axes remain distinct after training; local replaceability refines removability predictions beyond what input capture and task relevance alone provide; and, under the fixed FLOPs-matched pruning protocol, local-axis metrics outperform target-axis metrics at predicting channel removability across ResNet-18, VGG-16, and MobileNetV2 on CIFAR-100, with the same direction holding in stress tests on other datasets.
What carries the argument
The two-axis view that separates the local axis (input capture and peer overlap) from the target axis (task information and target-excess information) to distinguish relevance from replaceability.
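To make the local axis concrete, here is a minimal sketch (ours, not the paper's estimator) of peer overlap as the variance of one channel's activation explained by a linear fit from its same-layer peers, in the spirit of the Gaussian linear proxies the paper uses:

```python
import numpy as np

def peer_overlap(acts, i):
    """R^2 of channel i's activations predicted linearly from its peers.

    acts: (n_samples, n_channels) array of same-layer channel activations.
    High R^2 means peers can reproduce the channel's response, making the
    channel a candidate for local replaceability.
    """
    y = acts[:, i]
    peers = np.delete(acts, i, axis=1)
    X = np.column_stack([peers, np.ones(len(peers))])  # add intercept
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(0)
base = rng.normal(size=(500, 4))
# Channel 4 is a noisy mix of channels 0 and 1, hence highly replaceable.
mix = base[:, 0] + base[:, 1] + 0.1 * rng.normal(size=500)
acts = np.column_stack([base, mix])
print(peer_overlap(acts, 4))  # near 1: peers can stand in
print(peer_overlap(acts, 3))  # near 0: no peer can stand in
```

The target-axis quantities (task information, target-excess information) would be computed against labels instead of against peers, which is exactly the separation the two-axis view formalizes.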
If this is right
- Local-axis metrics are more reliable predictors of removability than target-axis metrics.
- The axes induce different channel groupings and separate rapidly during training despite strong coupling at random initialization.
- Peer support refines removability beyond input capture and task relevance alone.
- Norm-based baselines remain competitive in architectures such as VGG-16.
Where Pith is reading between the lines
- Pruning methods could be redesigned to measure peer overlap directly instead of depending only on gradient or activation relevance scores.
- The axis distinction may apply to understanding redundancy in layers or networks beyond the tested vision backbones.
- Single-score importance rankings may systematically retain replaceable channels and discard irreplaceable ones.
Load-bearing premise
That the lesion-plus-peer-replacement experiments isolate local replaceability without confounding effects from the specific pruning protocol or network initialization.
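The distinction this premise relies on can be sketched with a toy linear "network" (all names and numbers here are hypothetical, not the paper's protocol): a plain lesion zeroes a channel and measures output error, while peer replacement refits the remaining channels to compensate. The gap between the two is the peer-support effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n, c = 1000, 6
acts = rng.normal(size=(n, c))
# Channel 5 is redundant: a noisy combination of channels 0 and 1.
acts[:, 5] = acts[:, 0] - acts[:, 1] + 0.05 * rng.normal(size=n)
w = rng.normal(size=c)   # downstream linear readout (toy stand-in)
out = acts @ w           # reference output

def lesion_gap(i):
    """Output MSE when channel i is simply zeroed (plain lesion)."""
    a = acts.copy()
    a[:, i] = 0.0
    return float(np.mean((a @ w - out) ** 2))

def peer_replaced_gap(i):
    """Output MSE after the remaining channels are linearly refit to
    compensate for the missing channel (lesion plus peer replacement)."""
    a = np.delete(acts, i, axis=1)
    coef, *_ = np.linalg.lstsq(a, out, rcond=None)
    return float(np.mean((a @ coef - out) ** 2))

# The redundant channel hurts under plain lesion but not once peers compensate;
# an independent channel hurts either way.
print(lesion_gap(5), peer_replaced_gap(5))
print(lesion_gap(2), peer_replaced_gap(2))
```

A confound-free version of this comparison is what the premise asserts the paper's experiments achieve.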
What would settle it
Experiments on new architectures or datasets in which target-axis metrics predict channel removability better than local-axis metrics under the identical fixed FLOPs-matched pruning protocol.
Original abstract
Channel importance in vision networks is usually summarized by a single score. That summary hides two different questions: how much a channel is related to the task, and whether its function can be supplied by same-layer peers when the channel is removed. We call the second property local replaceability. We introduce a two-axis view that separates these questions. The local axis measures input capture and peer overlap, while the target axis measures task information and target-excess information. Across ResNet-18, VGG-16, and MobileNetV2 trained on CIFAR-100, the two axes are weakly aligned, induce different channel groupings, and separate rapidly during training despite being strongly coupled at random initialization. A Gaussian linear analysis accounts for how this separation can arise through residualized gradient directions, and lesion plus peer-replacement experiments show that peer support refines removability beyond input capture and task relevance alone. Under the fixed FLOPs-matched pruning protocol, local-axis metrics are more reliable predictors of removability than target-axis metrics across the three CIFAR-100 backbones, with the same direction preserved in stress tests on CIFAR-10, Tiny-ImageNet, ImageNet-100, and a ConvNeXt-T/ImageNet-100 pilot. These findings identify an axis-level distinction rather than a universal ranking of pruning scores: local replaceability is a more reliable guide to removability than target relevance, while norm-based baselines remain competitive in architectures such as VGG-16. Relevance-based scores ask what a channel says about the task; pruning asks whether the network still needs that channel when its peers remain available.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that channel importance in convolutional vision networks is not captured by a single score but separates into two weakly aligned axes: a local axis (input capture and peer overlap within the layer) and a target axis (task information and target-excess information). These axes diverge rapidly during training despite strong coupling at random initialization. Lesion studies combined with peer-replacement tests show that local replaceability refines removability predictions beyond input capture or task relevance alone. Under a fixed FLOPs-matched pruning protocol, local-axis metrics outperform target-axis metrics as predictors of channel removability across ResNet-18, VGG-16, and MobileNetV2 on CIFAR-100, with the same directional pattern preserved in stress tests on CIFAR-10, Tiny-ImageNet, ImageNet-100, and a ConvNeXt-T pilot. A linear Gaussian analysis is offered to explain the separation via residualized gradients.
Significance. If the two-axis separation and the predictive superiority of local metrics hold after addressing potential confounds, the work would meaningfully advance pruning and compression research by shifting emphasis from pure task relevance to local replaceability. Strengths include testing across three primary backbones plus stress-test datasets/architectures and the provision of an explanatory linear model. These elements support a falsifiable distinction rather than a universal ranking of pruning scores, which could inform more robust channel selection methods.
major comments (2)
- [pruning experiments and lesion-plus-peer-replacement tests] The central claim that local-axis metrics are more reliable predictors of removability than target-axis metrics under the fixed FLOPs-matched pruning protocol (stated in the abstract and supported by lesion-plus-peer-replacement experiments) may be confounded by interactions with the pruning rule itself. The protocol could preferentially retain channels with high peer overlap, rendering the observed superiority an artifact of the selection criterion rather than evidence of separable axes; an ablation of the pruning rule or re-initialization from varied seeds while holding architecture fixed is required to isolate the effect.
- [abstract and results sections] The abstract and experimental results provide no quantitative details on statistical significance, effect sizes, confidence intervals, or sensitivity to hyper-parameters for the cross-backbone superiority of local metrics. This weakens support for the claim that the direction is preserved across CIFAR-100 backbones and stress tests, as the linear Gaussian model is presented as explanatory rather than as the source of the empirical result.
minor comments (2)
- [methodology] The distinction between 'input capture' and 'peer overlap' on the local axis, and between 'task information' and 'target-excess information' on the target axis, would benefit from an explicit summary table or diagram to clarify how each metric is computed.
- [related work] The manuscript should include additional references situating the two-axis view against prior work on channel redundancy, mutual information-based pruning, and gradient-based importance scores.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the recommendation for major revision. We address the concerns about potential confounding in the pruning protocol and the absence of statistical details. We defend the core experimental design on substantive grounds while agreeing to add controls and quantitative analyses in the revision.
Point-by-point responses
Referee: [pruning experiments and lesion-plus-peer-replacement tests] The central claim that local-axis metrics are more reliable predictors of removability than target-axis metrics under the fixed FLOPs-matched pruning protocol (stated in the abstract and supported by lesion-plus-peer-replacement experiments) may be confounded by interactions with the pruning rule itself. The protocol could preferentially retain channels with high peer overlap, rendering the observed superiority an artifact of the selection criterion rather than evidence of separable axes; an ablation of the pruning rule or re-initialization from varied seeds while holding architecture fixed is required to isolate the effect.
Authors: We thank the referee for this observation. The fixed FLOPs-matched protocol removes the same number of channels (or equivalent compute) for every metric, with the ranking supplied by the metric under test; post-pruning accuracy then measures how well that metric identified removable channels. Because the protocol is identical across metrics, differences in outcome directly compare predictive reliability rather than being driven by unequal removal budgets. The lesion-plus-peer-replacement tests are performed independently of any pruning rule and already isolate the contribution of local replaceability. Nevertheless, to rule out seed-specific artifacts we will add, in the revision, re-initialization experiments from multiple random seeds while holding architecture and dataset fixed. We therefore treat the request as addressable by partial revision rather than requiring a full change to the central claim. revision: partial
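The protocol as the authors describe it (identical removal budget for every metric, ranking supplied by the metric under test) can be sketched as follows; the scores, per-channel FLOPs, and budget below are hypothetical:

```python
def flops_matched_prune(scores, flops, budget):
    """Remove lowest-scoring channels until at least `budget` FLOPs are freed.

    Every metric under test gets the identical budget, so differences in
    post-pruning accuracy reflect only the quality of the ranking.
    """
    removed, freed = [], 0
    for i in sorted(range(len(scores)), key=lambda i: scores[i]):
        removed.append(i)
        freed += flops[i]
        if freed >= budget:
            break
    return removed

flops = [2, 2, 2, 2, 2]                      # per-channel cost (hypothetical)
local_scores = [0.9, 0.1, 0.8, 0.2, 0.7]     # low = locally replaceable
target_scores = [0.9, 0.7, 0.1, 0.8, 0.2]    # low = weakly task-relevant
print(flops_matched_prune(local_scores, flops, budget=4))   # [1, 3]
print(flops_matched_prune(target_scores, flops, budget=4))  # [2, 4]
```

Both rankings free the same compute but select different channels, which is why the comparison isolates predictive reliability rather than removal budget.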
Referee: [abstract and results sections] The abstract and experimental results provide no quantitative details on statistical significance, effect sizes, confidence intervals, or sensitivity to hyper-parameters for the cross-backbone superiority of local metrics. This weakens support for the claim that the direction is preserved across CIFAR-100 backbones and stress tests, as the linear Gaussian model is presented as explanatory rather than as the source of the empirical result.
Authors: We agree that the current presentation would be strengthened by explicit statistical reporting. In the revised manuscript we will augment both the abstract and the results sections with (i) statistical significance tests (or p-values) for the accuracy differences between local- and target-axis metrics, (ii) effect sizes together with standard deviations across the three primary backbones, (iii) confidence intervals on the reported deltas, and (iv) a brief sensitivity table showing that the directional superiority is stable under modest changes in pruning ratio and training seed. These additions will make the empirical support for cross-backbone consistency fully quantitative while leaving the linear Gaussian analysis in its explanatory role. revision: yes
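One way to supply the promised confidence intervals is a percentile bootstrap over per-seed accuracy deltas; the deltas below are hypothetical placeholders, not numbers from the paper:

```python
import random

def bootstrap_ci(deltas, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean per-seed accuracy delta
    (local-axis minus target-axis metric, in percentage points)."""
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choice(deltas) for _ in deltas) / len(deltas)
        for _ in range(n_boot)
    )
    lo = means[int(n_boot * alpha / 2)]
    hi = means[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

deltas = [1.2, 0.8, 1.5, 0.9, 1.1]  # hypothetical per-seed deltas
lo, hi = bootstrap_ci(deltas)
print(lo, hi)  # an interval excluding 0 supports a consistent direction
```

With only a handful of seeds, a percentile bootstrap is a reasonable default; a paired sign test over seeds would be a natural complement.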
Circularity Check
No circularity; empirical claims rest on independent lesion and peer-replacement measurements
Full rationale
The paper's central results are obtained from direct lesion experiments and peer-replacement tests that measure removability on trained networks under a fixed pruning protocol. Local-axis and target-axis metrics are computed from input capture, peer overlap, task information, and target-excess information, none of which are defined in terms of the other or fitted to the target removability outcome. The Gaussian linear analysis is presented only as a post-hoc account of how axis separation can arise, not as the source or definition of the empirical findings. No equations reduce a claimed prediction to its inputs by construction, no uniqueness theorems are imported via self-citation, and no ansatz is smuggled in. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Local replaceability can be isolated by measuring peer overlap and input capture independently of task labels.
- domain assumption The Gaussian linear model accurately captures how residualized gradients cause the two axes to separate during training.
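One plausible formalization of the second axiom's "residualized gradients" (our notation, an assumption about the paper's construction, not a quoted equation): with $a_i$ the activation of channel $i$, $A_{S_i}$ the matrix of same-layer peer activations over a batch, and $g_i = \nabla_{a_i}\mathcal{L}$ the loss gradient with respect to channel $i$, the residualized direction removes the component peers already span:

```latex
% Assumed notation: a_i channel activation, A_{S_i} peer activations,
% g_i = \nabla_{a_i} \mathcal{L} the loss gradient w.r.t. channel i.
\tilde{g}_i = g_i - P_{S_i}\, g_i,
\qquad
P_{S_i} = A_{S_i}\left(A_{S_i}^{\top} A_{S_i}\right)^{-1} A_{S_i}^{\top} .
```

Under this reading, a channel whose gradient lies almost entirely in the peer span ($\|\tilde{g}_i\| \approx 0$) can be task relevant yet locally replaceable, which is one mechanism by which the two axes could decouple during training.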
invented entities (2)
- Local axis: no independent evidence
- Target axis: no independent evidence
Reference graph
Works this paper leans on
- [1] Nils Bertschinger, Johannes Rauh, Eckehard Olbrich, Jürgen Jost, and Nihat Ay. Quantifying unique information. Entropy, 16(4):2161–2183, 2014. doi: 10.3390/e16042161.
- [2] Trenton Bricken, Adly Templeton, Joshua Batson, et al. Towards monosemanticity: Decomposing language models with dictionary learning. Transformer Circuits Thread, 2023. URL https://transformer-circuits.pub/2023/monosemantic-features/.
- [3] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 248–255, 2009.
- [4] Ziv Goldfeld, Ewout van den Berg, Kristjan Greenewald, Igor Melnyk, Nam Nguyen, Brian Kingsbury, and Yury Polyanskiy. Estimating information flow in deep neural networks. In International Conference on Machine Learning (ICML), 2019.
- [5] Alexander Kraskov, Harald Stögbauer, and Peter Grassberger. Estimating mutual information. Physical Review E, 69(6):066138, 2004. doi: 10.1103/PhysRevE.69.066138.
- [6] Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009. URL https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf.
- [7] Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, and Hans Peter Graf. Pruning filters for efficient convnets. In International Conference on Learning Representations (ICLR), 2017.
- [8] Adam Paszke, Sam Gross, Francisco Massa, et al. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
- [9] PyTorch Contributors. TorchVision: PyTorch's computer vision library. https://github.com/pytorch/vision.
- [10] Ravid Shwartz-Ziv and Naftali Tishby. Opening the black box of deep neural networks via information. arXiv preprint arXiv:1703.00810, 2017.
- [11] Naftali Tishby, Fernando C. Pereira, and William Bialek. The information bottleneck method. arXiv preprint physics/0004057, 2000.
- [12] Charles Westphal, Stephen Hailes, and Mirco Musolesi. Mutual information preserving neural network pruning. arXiv preprint arXiv:2411.00147, 2024. doi: 10.48550/arXiv.2411.00147.
- [13] Charles Westphal, Stephen Hailes, and Mirco Musolesi. Partial information decomposition for data interpretability and feature selection. In Proceedings of the 28th International Conference on Artificial Intelligence and Statistics (AISTATS), volume 258 of Proceedings of Machine Learning Research, 2025.
- [15] Paul L. Williams and Randall D. Beer. Nonnegative decomposition of multivariate information. arXiv preprint arXiv:1004.2515, 2010.