The Confusion is Real: GRAPHIC -- A Network Science Approach to Confusion Matrices in Deep Learning

Bastian Heinlein; Hans Rosenberger; Jan U. Claar; Johanna S. Fröhlich; Ralf R. Müller; Vasileios Belagiannis

arXiv: 2602.19770 · v2 · submitted 2026-02-23 · cs.LG · cs.AI


Pith reviewed 2026-05-15 20:15 UTC · model grok-4.3

keywords: confusion matrix, network science, deep learning, explainable AI, class relationships, learning dynamics, graph analysis, intermediate layers

The pith

Confusion matrices from intermediate layers can be turned into directed graphs to track how neural networks learn class relationships over training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GRAPHIC, an approach that builds directed graphs from confusion matrices obtained by training linear classifiers on the activations of intermediate layers. Network science metrics applied to these graphs quantify and visualize how class confusions evolve across epochs and layers. This yields concrete observations about linear separability, dataset labeling problems, and model-specific behaviors, including an unexpected similarity between flatfish and man and labeling ambiguities that were validated in a human study. A sympathetic reader would value the method because it supplies a systematic, architecture-agnostic lens on what the network is actually learning beyond final accuracy numbers.

Core claim

GRAPHIC converts confusion matrices derived from linear classifiers on intermediate-layer activations into adjacency matrices of directed graphs and then applies network-science tools to measure and display the evolution of class relationships throughout training, thereby exposing patterns of linear separability, dataset ambiguities, and architectural differences.

What carries the argument

Directed graphs whose edges are weighted by confusion-matrix entries from linear probes on successive layers, with standard network metrics used to track changes across training.
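
As a minimal sketch of this construction (not the authors' implementation), the snippet below turns a small confusion matrix into weighted directed edges and sums the outgoing weight per class. The class names and counts are made up for illustration, and both dropping the diagonal and row-normalizing are assumptions; the paper may weight edges differently.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]

def confusion_to_edges(cm: List[List[int]], labels: List[str]) -> Dict[Edge, float]:
    """Turn a confusion matrix into weighted directed edges.

    cm[i][j] counts samples of true class i that were predicted as class j.
    Assumption: correct predictions (the diagonal) are dropped, and each row
    is divided by the number of samples of the true class.
    """
    edges: Dict[Edge, float] = {}
    for i, row in enumerate(cm):
        total = sum(row)
        if total == 0:
            continue  # no samples of this class, so no outgoing edges
        for j, count in enumerate(row):
            if i != j and count > 0:
                edges[(labels[i], labels[j])] = count / total
    return edges

def out_strength(edges: Dict[Edge, float], labels: List[str]) -> Dict[str, float]:
    """Sum of outgoing edge weights per class (a weighted out-degree)."""
    strength = {name: 0.0 for name in labels}
    for (source, _target), weight in edges.items():
        strength[source] += weight
    return strength

# Hypothetical 3-class confusion matrix (rows: true class, columns: prediction).
labels = ["boy", "girl", "man"]
cm = [
    [6, 3, 1],
    [2, 8, 0],
    [1, 0, 9],
]
edges = confusion_to_edges(cm, labels)
strength = out_strength(edges, labels)
for (source, target), weight in sorted(edges.items()):
    print(f"{source} -> {target}: {weight:.2f}")
```

With this normalization, a class's outgoing weight equals its error rate under the probe, which is one plausible reading of "difficult" classes in a confusion graph.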

If this is right

Linear separability of classes can be monitored layer by layer and epoch by epoch.
Labeling ambiguities in a dataset become visible as persistent high-confusion edges in the graph.
Specific inter-class similarities that hinder performance, such as flatfish and man, are identified automatically.
Different network architectures produce distinguishable patterns in the evolution of their class graphs.
Human studies can be used to validate whether the detected confusions reflect genuine data issues.
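
The idea of persistent high-confusion edges can be sketched as follows. This is an illustrative reading, not the paper's procedure: the threshold `min_rate`, the helper name `persistent_edges`, and the per-epoch counts are all hypothetical.

```python
from typing import List, Set, Tuple

def persistent_edges(
    cms_by_epoch: List[List[List[int]]],
    labels: List[str],
    min_rate: float = 0.25,
) -> Set[Tuple[str, str]]:
    """Return edges whose row-normalized confusion rate stays at or above
    min_rate in every epoch's confusion matrix."""
    surviving = None
    for cm in cms_by_epoch:
        strong = set()
        for i, row in enumerate(cm):
            total = sum(row)
            for j, count in enumerate(row):
                if i != j and total > 0 and count / total >= min_rate:
                    strong.add((labels[i], labels[j]))
        # Keep only edges that were also strong in all earlier epochs.
        surviving = strong if surviving is None else surviving & strong
    return surviving or set()

# Made-up confusion matrices for three epochs (rows: true class).
labels = ["oak", "maple", "pine"]
cms_by_epoch = [
    [[5, 4, 1], [4, 5, 1], [3, 2, 5]],   # early: confusion everywhere
    [[7, 3, 0], [3, 7, 0], [1, 0, 9]],   # middle: pine separates
    [[7, 3, 0], [3, 7, 0], [0, 0, 10]],  # late: oak/maple confusion remains
]
stubborn = persistent_edges(cms_by_epoch, labels)
print(sorted(stubborn))
```

Edges that survive every epoch are candidates for a closer look at the underlying images, since training alone did not resolve them.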

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same graph-construction step could be applied to sequence models to expose confusion patterns among tokens or actions.
Early identification of high-confusion subgraphs might guide data-augmentation or curriculum strategies.
Comparing class graphs across datasets could quantify how data distribution shapes what a model treats as similar.
The method offers a possible bridge between model-internal representations and human perceptual categories when the human study matches the graph findings.

Load-bearing premise

Confusion matrices produced by linear classifiers on intermediate activations capture the class relationships that matter for the full nonlinear model's decisions.
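
To make the premise concrete, here is a rough sketch of a linear probe: a softmax regression is fitted on stand-in "activations" and its predictions on held-out vectors are tallied into a confusion matrix. Everything here is an assumption for illustration: the 2-D Gaussian clusters replace real layer activations, and plain cross-entropy training replaces whatever loss and optimizer the paper's LC training uses.

```python
import math
import random

def scores(weights, biases, x):
    """Linear class scores w_c . x + b_c for one feature vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]

def train_linear_probe(features, targets, num_classes, epochs=100, lr=0.1):
    """Fit softmax regression with per-sample gradient descent."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(features, targets):
            logits = scores(weights, biases, x)
            peak = max(logits)  # subtract the max for numerical stability
            exps = [math.exp(z - peak) for z in logits]
            norm = sum(exps)
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. logit c: p_c - 1[c == y].
                grad = exps[c] / norm - (1.0 if c == y else 0.0)
                biases[c] -= lr * grad
                for d in range(dim):
                    weights[c][d] -= lr * grad * x[d]
    return weights, biases

def probe_confusion_matrix(weights, biases, features, targets, num_classes):
    """cm[i][j] counts held-out samples of true class i predicted as class j."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for x, y in zip(features, targets):
        s = scores(weights, biases, x)
        cm[y][s.index(max(s))] += 1
    return cm

def toy_activations(rng, per_class):
    """Hypothetical stand-in for layer activations: three 2-D clusters."""
    centers = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
    xs, ys = [], []
    for _ in range(per_class):
        for label, (cx, cy) in enumerate(centers):
            xs.append([rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)])
            ys.append(label)
    return xs, ys

rng = random.Random(0)
train_x, train_y = toy_activations(rng, per_class=30)
test_x, test_y = toy_activations(rng, per_class=10)
weights, biases = train_linear_probe(train_x, train_y, num_classes=3)
cm = probe_confusion_matrix(weights, biases, test_x, test_y, num_classes=3)
print(cm)
```

The probe only sees one layer's output, so its confusion matrix describes what is linearly decodable there, which is exactly why the premise above is load-bearing.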

What would settle it

If the graph-derived metrics fail to correlate with observed misclassification rates or with human judgments of class similarity on an independent dataset, the approach would not deliver the claimed insights.
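
One way such a check could be run is a rank correlation between graph edge weights and human confusion rates for the same class pairs. The sketch below implements Spearman's coefficient in plain Python; the choice of Spearman is an assumption, and the numbers are invented rather than taken from the paper.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Invented values for four class pairs, listed in the same order.
graph_edge_weight = [0.30, 0.10, 0.05, 0.20]
human_confusion_rate = [0.25, 0.12, 0.02, 0.15]
rho = spearman(graph_edge_weight, human_confusion_rate)
print(f"Spearman rho = {rho:.2f}")
```

A coefficient near zero on an independent dataset would be the kind of evidence against the approach that the paragraph above describes.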

Figures

Figures reproduced from arXiv: 2602.19770 by Bastian Heinlein, Hans Rosenberger, Jan U. Claar, Johanna S. Fröhlich, Ralf R. Müller, Vasileios Belagiannis.

**Figure 1.** Proposed analysis workflow. Linear classifiers (LCs) are trained using feature vectors from hidden layers. The trained LCs are then used to generate confusion matrices (CMs) on previously unseen feature vectors. Subsequently, these matrices are used to generate graphs that can be analyzed using methods from network science.

**Figure 2.** Confusion evolution of ResNet-50 using the training set. Visualization of CCs for layer 4 at early (left), intermediate (right), and final (bottom) epochs using our graph representation for the training set.

**Figure 3.** Layer-wise assortativity over training epochs. Assortativity computed by superclasses (solid lines) and by natural vs. man-made grouping (dashed lines), for layers 1 through 4 of ResNet-50.

**Figure 4.** Effect of leaf color on classification. Example image of a maple tree before (left) and after (right) color manipulation.

**Figure 5.** Human CC created from human and NN predictions. Visualization of the confusion graph of the human labeling (left) and of the NN predictions (right). Main takeaway from the accompanying text: NNs rely on seasonal leaf color for distinguishing oak and maple trees, indicating a dataset bias that could be mitigated through more diverse, seasonally balanced data.

**Figure 6.** Confusion graph of EffVit for Tiny ImageNet using the validation set. Visualization of CCs for decoder 12 at the final epoch using our graph representation for the validation set.

**Figure 7.** Linear separability trends in EffVit. Accuracy of CMs generated by LCs trained on true labels for decoders 1, 3, 6, 9, and 12 of EffVit, shown over the training epochs.

**Figure 8.** Linear separability trends in EffVit with 8 decoders. Accuracy of CMs generated by LCs trained on true labels for decoders 1, 2, 4, 6, and 8 of EffVit, shown over the training epochs.

**Figure 9.** Linear separability trends in EffVit with 4 decoders. Accuracy of CMs generated by LCs trained on true labels for decoders 1, 2, 3, and 4 of EffVit, shown over the training epochs.

**Figure 10.** Linear separability trends in EffVit for Tiny ImageNet. Accuracy of CMs generated by LCs trained on true labels for decoders 1, 3, 6, 9, and 12 of EffVit, shown over the training epochs.

**Figure 11.** Linear separability trends in ResNet-50. Accuracy of CMs generated by LCs trained on true labels for layers 1 to 4, shown over the training epochs. The graph also includes the true accuracy of ResNet-50 as a baseline.

**Figure 12.** Linear separability trends in ResNet-50. Accuracy of CMs generated by LCs trained on predicted labels for layers 1 to 4, shown over the training epochs. The graph also includes the true accuracy of ResNet-50 as a baseline.

**Figure 13.** Linear separability trends in ResNet-50 across several λ values. Accuracy of CMs generated by LCs trained on several λ values for layer 4, shown over the training epochs for the validation set.

**Figure 14.** Layer-wise modularity trends in ResNet-50 across several λ values. Modularity of CMs generated by LCs trained on several λ values for layer 4, shown over the training epochs for the validation set.

**Figure 15.** Loss curves for training LCs with and without regularization. Training (dashed) and validation (solid) loss of LCs trained on predicted labels for layer 4 of ResNet-50 at epoch 1, shown over the training epochs.

**Figure 16.** Loss curves for training LCs with and without regularization. Training (dashed) and validation (solid) loss of LCs trained on predicted labels for layer 4 of ResNet-50 at epoch 71, shown over the training epochs.

**Figure 17.** Layer-wise modularity over training epochs for ResNet-50 with and without regularization. Modularity of CCs generated from CMs for the LCs trained on predicted labels for layers 1 to 4 over the training epochs for the validation set.

**Figure 18.** Linear separability trends in ResNet-50 with and without regularization. Accuracy of CMs generated by LCs trained on predicted labels for layers 1 to 4, shown over the training epochs for the validation set.

**Figure 19.** Sparse confusion graphs of EffVit for Tiny ImageNet. Visualization of CCs for decoder 12 at the final epoch for the validation set (left), the graph with 20% of the edges removed (right), and the graph with 40% of the edges removed (bottom).

**Figure 20.** Separate CC graphs of EffVit for Tiny ImageNet. Visualization of the CCs of the animals (left) and creepy-crawlies (right) for decoder 12 at the final epoch for the validation set.

**Figure 21.** Mean layer-wise modularity over training epochs for ResNet-50. Mean modularity with three standard deviations of CCs generated from CMs for the LCs trained on true labels for the training set for layers 1 to 4 over the training epochs. Results are averaged over five seeds.

**Figure 22.** Loss curves for training LCs across several batch sizes. Training (dashed) and validation (solid) loss of LCs trained on predicted labels for layer 4 of ResNet-50 at epoch 1, shown over the training epochs.

**Figure 23.** Loss curves for training LCs across several batch sizes. Training (dashed) and validation (solid) loss of LCs trained on predicted labels for layer 4 of ResNet-50 at epoch 71, shown over the training epochs.

**Figure 24.** Loss curves for training LCs across several learning rates. Training (dashed) and validation (solid) loss of LCs trained on predicted labels for layer 4 of ResNet-50 at epoch 1, shown over the training epochs.

**Figure 25.** Loss curves for training LCs across several learning rates. Training (dashed) and validation (solid) loss of LCs trained on predicted labels for layer 4 of ResNet-50 at epoch 71, shown over the training epochs.

**Figure 26.** Linear separability trends in ResNet-50 across several batch sizes. Accuracy of CMs generated by LCs trained on predicted labels for layer 4, shown over the training epochs for the training (dashed) and the validation (solid) set.

**Figure 27.** Linear separability trends in ResNet-50 across several learning rates. Accuracy of CMs generated by LCs trained on predicted labels for layer 4, shown over the training epochs for the training (dashed) and the validation (solid) set.

**Figure 28.** Layer-wise training time of LCs over epochs. Runtime of training the LCs on the predicted labels for layers 1 to 4, shown over the training epochs.

**Figure 29.** Modularity-accuracy trends in ResNet-50. Modularity of CMs generated by LCs trained on true labels for layers 1 to 4, shown over the accuracy of ResNet-50.

**Figure 30.** Layer-wise modularity over training epochs for ResNet-50. Modularity of CCs generated from CMs for the LCs trained on predicted labels for layers 1 to 4 over the training epochs.

**Figure 31.** Layer-wise modularity over training epochs for ResNet-50. Modularity of CCs generated from CMs for the LCs trained on true labels for layers 1 to 4 over the training epochs.

**Figure 32.** Layer-wise modularity over training epochs for EffVit. Modularity of CCs generated from CMs for the LCs trained on predicted labels for decoders 1, 3, 6, 9 and 12 over the training epochs.

**Figure 33.** Layer-wise modularity over training epochs for EffVit. Modularity of CCs generated from CMs for the LCs trained on true labels for decoders 1, 3, 6, 9 and 12 over the training epochs.

**Figure 34.** Layer-wise modularity over training epochs for EffVit on Tiny ImageNet. Modularity of CCs generated from CMs for the LCs trained on true labels for decoders 1, 3, 6, 9 and 12 over the training epochs.

**Figure 35.** Layer-wise sparsity over training epochs for ResNet-50. Fraction of zero entries of CMs generated by LCs trained on true labels for layers 1 to 4, shown over the training epochs.

**Figure 36.** Layer-wise sparsity over training epochs for ResNet-50. Fraction of zero entries of CMs generated by LCs trained on predicted labels for layers 1 to 4, shown over the training epochs.

**Figure 37.** Confusion evolution of ResNet-50 using the validation set. Visualization of CCs for layer 4 at early (left), intermediate (right), and final (bottom) epochs using our graph representation for the validation set.

**Figure 38.** Confusion evolution of EffVit using the training set. Visualization of CCs for decoder 12 at early (left), intermediate (right), and final (bottom) epochs using our graph representation for the training set.

**Figure 39.** Confusion evolution of EffVit using the validation set. Visualization of CCs for decoder 12 at early (left), intermediate (right), and final (bottom) epochs using our graph representation for the validation set.

**Figure 40.** Effect of dataset order on graph structure. Graph constructed from an LC trained on the predicted labels for the training set at epoch 1 for layer 4 after reversing the dataset order.

**Figure 41.** CC of humans including flatfish. Visualization of the human CC of layer 2 of ResNet-50 created using the training set including the class flatfish.

**Figure 42.** Evolution of class difficulty in ResNet-50. Out-degree of the five most difficult (solid lines) and five easiest (dashed lines) classes identified in epoch 71 for layer 4, shown over the training epochs.

**Figure 43.** Evolution of class difficulty in EffVit. Out-degree of the five most difficult (solid lines) and five easiest (dashed lines) classes identified in epoch 1000 for decoder 12, shown over the training epochs.

**Figure 44.** Layer-wise assortativity over training epochs. Assortativity computed by superclasses (solid lines), intuitive groups (dotted lines), and natural vs. man-made grouping (dashed lines), for layers 1 through 4 of ResNet-50.

**Figure 45.** Layer-wise assortativity over training epochs for random groups. Assortativity computed for random groups with group sizes matching those of superclasses (solid lines), intuitive groups (dotted lines), and natural and man-made things (dashed lines), for layers 1 through 4 of ResNet-50.

**Figure 46.** Effect of leaf color on classification for maple and oak trees. Example images of maple and oak trees before and after color manipulation.

**Figure 47.** Images frequently misclassified by humans. These examples are images from CIFAR-100 that were often confused by human participants during labeling. The true labels are shown below the images.

**Figure 48.** Ambiguity in image labeling. One image shows a boy (left) and was labeled as boy, girl, or woman; the other image shows a girl (right) and was labeled as boy, girl, and baby by humans. For the boy, 71% of participants changed their label; for the duplicate image of the girl, 48%.

Original abstract

Explainable artificial intelligence has emerged as a promising field of research to address reliability concerns in artificial intelligence. Despite significant progress in explainable artificial intelligence, few methods provide a systematic way to visualize and understand how classes are confused and how their relationships evolve as training progresses. In this work, we present GRAPHIC, an architecture-agnostic approach that analyzes neural networks on a class level. It leverages confusion matrices derived from intermediate layers using linear classifiers. We interpret these as adjacency matrices of directed graphs, allowing tools from network science to visualize and quantify learning dynamics across training epochs and intermediate layers. GRAPHIC provides insights into linear class separability, dataset issues, and architectural behavior, revealing, for example, similarities between flatfish and man and labeling ambiguities validated in a human study. In summary, by uncovering real confusions, GRAPHIC offers new perspectives on how neural networks learn. The code is available at https://github.com/Johanna-S-Froehlich/GRAPHIC.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

GRAPHIC applies network science to probe-based confusion matrices, which is novel but hinges on an unverified proxy for model behavior.

The key takeaway is that GRAPHIC treats confusion matrices from linear probes on intermediate layers as directed graphs and uses network science metrics to analyze how class confusions evolve during training. This combination looks new based on the abstract.

What works is the systematic visualization layer it adds for class-level dynamics. The examples of uncovered similarities, such as the one between flatfish and man, along with the human study confirming labeling ambiguities, demonstrate that it can point to dataset issues and model behaviors in a way that standard confusion matrices don't. Applying graph tools like centrality or community detection across epochs and layers gives a quantitative handle on learning progress. Making the code available is helpful for others to try it.

The main soft spot is the assumption that these probe-derived matrices reflect the actual confusions of the full model. Since the network applies nonlinear transformations after the probed layer, the linear decision boundaries might miss or misrepresent the real class relationships. Without comparisons to the model's output confusion matrix or ablations showing robustness, the claims about revealing real confusions and new perspectives on learning feel preliminary. The lack of quantitative metrics or statistical controls in the described results adds to that.

This is for explainable AI folks and model developers who want better tools for debugging class confusions and auditing data. A reader focused on visualization techniques or network applications in ML would get practical value from the pipeline. It deserves a serious referee to evaluate the probe validity and suggest ways to strengthen the evidence.

Referee Report

2 major / 1 minor

Summary. The paper introduces GRAPHIC, an architecture-agnostic method that derives confusion matrices by training linear classifiers on intermediate-layer activations of a neural network, interprets these matrices as adjacency matrices of directed graphs, and applies standard network-science metrics (centrality, community detection, dynamics across epochs) to analyze class relationships and learning progress. It presents qualitative examples of discovered class similarities (e.g., flatfish and man) and reports a human validation study confirming labeling ambiguities.

Significance. If the linear-probe matrices can be shown to faithfully reflect the full model's class confusions, the approach would supply a systematic, visualizable way to track how class separability evolves during training and to flag dataset or architectural issues. The human-study component adds modest credibility to the labeling-ambiguity claim. At present the significance remains limited because the core proxy has not been validated against the model's actual output confusion matrix, leaving open whether the network metrics yield insights beyond conventional confusion-matrix inspection.

major comments (2)

[Methods (linear-probe construction)] The construction of confusion matrices via linear probes on intermediate activations (described in the Methods section) is not accompanied by any direct quantitative comparison to the confusion matrix produced by the full end-to-end model. Because later layers apply non-linear transformations, the linear decision boundaries at layer l need not reproduce the model's actual output confusions; all downstream network-science claims therefore rest on an unverified proxy.
[Experiments and Results] The experimental evaluation (qualitative examples plus human study) provides no quantitative metrics, ablation studies, or statistical controls that would demonstrate the robustness or added value of the network-science metrics over standard confusion-matrix analysis. This absence makes it impossible to assess whether the reported insights are reproducible or merely post-hoc interpretations.

minor comments (1)

[Abstract / Introduction] The abstract and introduction repeatedly use the phrase 'real confusions' without a precise definition; a short clarifying sentence would help readers distinguish the probe-derived matrices from the model's output matrix.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which identify key areas where additional validation and quantification can strengthen the manuscript. We address each major comment below and will incorporate revisions to provide the requested comparisons and metrics.

Referee: [Methods (linear-probe construction)] The construction of confusion matrices via linear probes on intermediate activations (described in the Methods section) is not accompanied by any direct quantitative comparison to the confusion matrix produced by the full end-to-end model. Because later layers apply non-linear transformations, the linear decision boundaries at layer l need not reproduce the model's actual output confusions; all downstream network-science claims therefore rest on an unverified proxy.

Authors: We agree that a direct quantitative validation of the linear-probe proxy against the full model's output confusion matrix is a valuable addition. Although the probes are designed to isolate linear class separability at each layer (a complementary perspective to the end-to-end non-linear boundaries), we will add a new subsection in the Methods and Results that computes the final-layer probe confusion matrix and compares it to the model's actual test-set confusion matrix. The comparison will report agreement percentage, normalized Frobenius distance, and element-wise Pearson correlation to quantify fidelity. This will be included in the revised manuscript. revision: yes
Referee: [Experiments and Results] The experimental evaluation (qualitative examples plus human study) provides no quantitative metrics, ablation studies, or statistical controls that would demonstrate the robustness or added value of the network-science metrics over standard confusion-matrix analysis. This absence makes it impossible to assess whether the reported insights are reproducible or merely post-hoc interpretations.

Authors: We acknowledge the absence of quantitative evaluation and agree that it limits assessment of added value. In the revision we will augment the Experiments section with: (i) quantitative tracking of network metrics (e.g., betweenness centrality, modularity) across epochs with statistical significance tests; (ii) an ablation comparing class-similarity rankings derived from GRAPHIC versus direct confusion-matrix inspection; and (iii) inter-rater agreement statistics (Fleiss' kappa) and confidence intervals for the human validation study. These additions will demonstrate reproducibility and the incremental insight provided by the network-science tools. revision: yes
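
The three fidelity measures named in the first response could be computed roughly as below. This is a sketch under stated assumptions, not the authors' planned analysis: "agreement percentage" is read here as the share of samples on which probe and model predict the same class, the Frobenius distance is normalized by the norm of the model's matrix, and all inputs are invented.

```python
import math

def agreement(probe_preds, model_preds):
    """Share of samples on which probe and full model predict the same class."""
    same = sum(p == m for p, m in zip(probe_preds, model_preds))
    return same / len(model_preds)

def normalized_frobenius(probe_cm, model_cm):
    """||probe - model||_F / ||model||_F (the normalization is an assumption)."""
    diff = sum((a - b) ** 2 for ra, rb in zip(probe_cm, model_cm) for a, b in zip(ra, rb))
    ref = sum(b ** 2 for rb in model_cm for b in rb)
    return math.sqrt(diff / ref)

def elementwise_pearson(probe_cm, model_cm):
    """Pearson correlation over corresponding confusion-matrix entries."""
    xs = [a for row in probe_cm for a in row]
    ys = [b for row in model_cm for b in row]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented two-class example: probe on the last layer vs. full model output.
probe_cm = [[8, 2], [3, 7]]
model_cm = [[9, 1], [2, 8]]
probe_preds = [0, 0, 1, 1, 0]
model_preds = [0, 1, 1, 1, 0]

agree = agreement(probe_preds, model_preds)
distance = normalized_frobenius(probe_cm, model_cm)
correlation = elementwise_pearson(probe_cm, model_cm)
print(f"agreement={agree:.2f} distance={distance:.3f} pearson={correlation:.3f}")
```

Because the diagonal dominates most confusion matrices, the element-wise correlation can look high even when off-diagonal confusions differ; computing it on off-diagonal entries only would be a stricter variant.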

Circularity Check

0 steps flagged

No circularity: purely descriptive application of standard network metrics to derived confusion matrices

The paper constructs confusion matrices via linear probes on intermediate-layer activations, interprets them as directed graphs, and applies off-the-shelf network-science metrics (centrality, community detection, dynamics across epochs). No parameters are fitted to a subset and then re-used as a 'prediction'; no equations reduce by construction to the inputs; no load-bearing self-citations or uniqueness theorems are invoked. The method is self-contained as an analysis pipeline whose outputs are direct computations on the proxy matrices, not tautological re-labelings or re-derivations of the same quantities.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on two domain assumptions about linear separability and graph interpretability of confusion counts. No free parameters are introduced and no new entities are postulated.

axioms (2)

domain assumption: Confusion matrices derived from linear classifiers on intermediate activations represent meaningful class relationships inside the network.
This is the central modeling step that allows the graph construction.
domain assumption: Network-science metrics applied to these graphs yield interpretable and actionable insights into learning dynamics.
This justifies the use of graph tools rather than direct matrix inspection.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

Relation between the paper passage and the cited Recognition theorem.

We employ the method introduced by Dugué & Perez (2015) to identify community structures... modularity Q

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · 2 internal anchors

[1]

Understanding intermediate layers using linear classifier probes

Guillaume Alain and Yoshua Bengio. Understanding intermediate layers using linear classifier probes. arXiv preprint arXiv:1610.01644,
[2]

Demystifying the accuracy-interpretability trade-off: A case study of inferring ratings from reviews

Pranjal Atrey, Michael P. Brundage, Min Wu, and Sanghamitra Dutta. Demystifying the accuracy-interpretability trade-off: A case study of inferring ratings from reviews. arXiv preprint arXiv:2503.07914,
[3]

Unsupervised visualization of image datasets using contrastive learning

Jan Niklas Böhm, Philipp Berens, and Dmitry Kobak. Unsupervised visualization of image datasets using contrastive learning. In International Conference on Learning Representations
[4]

t-SNE-CUDA: GPU-accelerated t-SNE and its applications to modern data

URL https://dictionary.cambridge.org/de/worterbuch/englisch/tabby. 13 Published in Transactions on Machine Learning Research (01/2026) David M Chan, Roshan Rao, Forrest Huang, and John F Canny. t-SNE-CUDA: GPU-accelerated t-SNE and its applications to modern data. In Proceedings of the 30th International Symposium on Computer Architecture and High Performa...
[5]

Distilling the Knowledge in a Neural Network

Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531,
[6]

Confusion graph: Detecting confusion communities in large scale image classification

Ruochun Jin, Yong Dou, Yueqing Wang, and Xin Niu. Confusion graph: Detecting confusion communities in large scale image classification. In Proceedings of the International Joint Conference on Artificial Intelligence, pp. 1980–1986,
[7]

Characterizing the decision boundary of deep neural networks

Hamid Karimi, Tyler Derr, and Jiliang Tang. Characterizing the decision boundary of deep neural networks. arXiv preprint arXiv:1912.11460,
[8]

Concept bottleneck models

Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Hal Daumé III and Aarti Singh (eds.), International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pp. 5338–5348. PMLR...
[9]

Explainable artificial intelligence (XAI): From inherent explainability to large language models

Fuseini Mumuni and Alhassan Mumuni. Explainable artificial intelligence (XAI): From inherent explainability to large language models. arXiv preprint arXiv:2501.09967,
[10]

Do vision transformers see like convolutional neural networks?

URL https://docs.pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.ReduceLROnPlateau.html. Maithra Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, and Alexey Dosovitskiy. Do vision transformers see like convolutional neural networks? Advances in Neural Information Proc...
[11]

A survey on convolutional neural networks and their performance limitations in image recognition tasks

Gabriela Rangel, Juan C. Cuevas-Tello, Jose Nunez-Varela, Cesar Puente, and Alejandra G. Silva-Trujillo. A survey on convolutional neural networks and their performance limitations in image recognition tasks. Journal of Sensors, 2024(1):2797320,
[12]

Training a vision transformer from scratch in less than 24 hours with 1 GPU

Saghar Irandoust, Thibaut Durand, Yunduz Rakhmangulova, Wenjie Zi, and Hossein Hajimirsadeghi. Training a vision transformer from scratch in less than 24 hours with 1 GPU. In Has it Trained Yet? NeurIPS 2022 Workshop,
[13]

Rethinking interpretability in the era of large language models

Chandan Singh, Jeevana Priya Inala, Michel Galley, Rich Caruana, and Jianfeng Gao. Rethinking interpretability in the era of large language models. arXiv preprint arXiv:2402.01761,
[14]

Post-hoc concept bottleneck models

Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc concept bottleneck models. In International Conference on Learning Representations,
[15]

are intrinsically interpretable architectures that explicitly decompose prediction into two stages: predicting human-defined concepts and then using these concepts to predict the final class. This requires a dataset in which concept annotations are available during training. Post-hoc concept bottleneck models have also been proposed (Yuksekgonul et al.,...
[16]

visualize representations by first computing pairwise similarities between samples and then mapping them into a low-dimensional space. When these similarities are computed in the input space, however, it is not guaranteed that distance measures such as the Euclidean distance in high-dim...
[17]

It shows the accuracy of the different layers of ResNet-50 determined by LCs trained on the true labels in comparison to the true accuracy of ResNet-50. Here we see an increase ending in a stagnation, rather than a drop. [Figure: accuracy over epochs, by decoder; legend truncated in source] ...
[18]

As discussed in Appendix A.6, our results are robust to the LC training, and this is confirmed here again. [Plot axes: epochs, modularity; legend: Baseline, Regularization.] Figure 17: Layer-wise modularity over training epochs for ResNet-50 with and without regularization. M...
[19]

animals” and “creepy-crawlies

Even though the number of nodes remains unchanged, the reduced edge set leads to fewer overlapping structures. This can be especially helpful in the early stages of training as there are more confusions overall. To address larger numbers of classes, a second approach is to inspect individual CCs instead of the full graph. The strongest confusions typicall...
[20]

show an interesting trend. The modularity for layer 4 on the training set tracks the accuracy (cf. Figure 11), with similar steps. The steps stem from the ReduceLROnPlateau scheduler (PyTorch Foundation, 2025), which reduces the learning rate if the loss stagnates. For both the predicted labels and the true labels (cf. Figure 31), the grouping streng...
[21]

Because EffVit is trained for only 24 hours rather than to full convergence, the increase in modularity is small as the number of confusions is still relatively high. [Plot axes: epochs, modularity; legend: Training Set, Validation Set.] Figure 31: Layer-wise modularity over t...
[22]

It depicts the human CC of layer 2 of the converged ResNet-50 model. [Figure node labels: Apple, Aquarium fish, Bicycle, Chair, Computer keyboard, Orange, Pear, Poppy, Sea, Snake, Sweet pepper, Telephone, Crocodile, Elephant, Possum, Shark, Baby, Turtle, Bear, Beaver, Kangaroo, Skyscraper, Bed, Bee, Mushroom, Beetle, Cup, Bottle, Bowl, B...]
[23]

The assortativity is similar across all layers, at around 0.3. It increases slightly for layers three and four in the later training epochs. This still suggests a group structure, though it is much less distinct than with the correct group assignments. Since we are working with only two classes, many individual categories – whether labeled as man-made or nat...
[24]

For the image of the boy, the accuracy for the first image is 42%, and the duplicate is identified with 61% accuracy. For the image of the girl, the initial labeling accuracy is 39%, and the duplicate is labeled correctly with an accuracy of 23%. 71% of the participants changed their mind about their initial ...
