A Model of Understanding in Deep Learning Systems
Recognition: 3 Lean theorem links
Pith reviewed 2026-05-13 16:46 UTC · model grok-4.3
The pith
Deep learning systems achieve understanding through internal models that track real regularities, but they fall short of scientific understanding.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that contemporary deep learning systems often achieve this kind of systematic understanding. However, they generally fall short of the ideal of scientific understanding: their understanding is symbolically misaligned with the target system, not explicitly reductive, and only weakly unifying. The paper labels this the Fractured Understanding Hypothesis.
What carries the argument
The model of systematic understanding: an agent understands a target property when it contains an adequate internal model that tracks real regularities, is coupled to the target by stable bridge principles, and supports reliable prediction.
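To make the structure of the definition concrete, here is a minimal sketch assuming nothing beyond the three conditions themselves; the class and field names are illustrative, not the paper's:

```python
# A minimal sketch (illustrative, not from the paper): the three conditions
# of systematic understanding combined as a single conjunction.
from dataclasses import dataclass

@dataclass
class UnderstandingReport:
    tracks_real_regularities: bool  # internal model mirrors the target's structure
    stable_bridge_principles: bool  # coupling to the target survives perturbation
    reliable_prediction: bool       # predictions meet an accuracy threshold

    def systematic_understanding(self) -> bool:
        # Each condition is individually necessary; none alone suffices.
        return (self.tracks_real_regularities
                and self.stable_bridge_principles
                and self.reliable_prediction)

# Example: reliable prediction alone does not count as understanding.
print(UnderstandingReport(False, False, True).systematic_understanding())  # False
```

The point the sketch encodes is that the conjunction does the work: a system satisfying only the third condition (reliable prediction) does not qualify.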
If this is right
- Deep learning systems can be credited with achieving genuine understanding of properties in target systems.
- Interpretability work should identify and strengthen the internal models and bridge principles in neural networks.
- Reliable prediction is necessary but not sufficient: it must be accompanied by the tracking of real regularities and by stable coupling.
- The model distinguishes practical understanding from the stronger features of scientific understanding.
- This account can evaluate understanding across different machine learning architectures and tasks.
Where Pith is reading between the lines
- The model suggests a path for engineering AI with greater symbolic alignment and unification.
- It links machine learning evaluation to philosophy of science by applying bridge principles to artificial systems.
- A testable extension is to check whether altering internal representations breaks the coupling and reduces prediction reliability (see the sketch after this list).
- If correct, fractured understanding may explain brittle performance in novel situations despite strong benchmark results.
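One way to run the testable extension above is as an ablation experiment: damage part of the learned representation and measure how much prediction reliability degrades. The sketch below is hypothetical, assuming a small scikit-learn MLP and zeroing of first-layer units as the "alteration"; it illustrates the shape of the test, not the paper's procedure.

```python
# Hypothetical ablation test: zero out a growing fraction of hidden units
# and track the drop in prediction reliability. Illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                    random_state=0).fit(X, y)

def accuracy_with_ablation(net, X, y, zero_frac):
    """Recompute the forward pass with a fraction of first-layer units zeroed."""
    W0, b0 = net.coefs_[0].copy(), net.intercepts_[0].copy()
    k = int(zero_frac * W0.shape[1])
    W0[:, :k], b0[:k] = 0.0, 0.0                 # 'break the coupling'
    hidden = np.maximum(X @ W0 + b0, 0.0)        # ReLU hidden layer
    logits = hidden @ net.coefs_[1] + net.intercepts_[1]
    preds = (logits.ravel() > 0).astype(int)     # sigmoid(z) > 0.5 iff z > 0
    return float((preds == y).mean())

for frac in (0.0, 0.25, 0.5, 0.75):
    print(f"ablated {frac:.0%}: accuracy {accuracy_with_ablation(net, X, y, frac):.3f}")
```

If accuracy stays flat while large parts of the representation are destroyed, the internal-model story is in trouble; if it degrades in step with the ablation, the coupling claim gains some support.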
Load-bearing premise
That deep learning systems contain internal models that track real regularities in the target system and are coupled to it by stable bridge principles in the sense defined by the proposed model.
What would settle it
A demonstration that deep learning systems make reliable predictions without their internal representations tracking the actual regularities in the target system or without stable coupling to it via bridge principles.
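The settling demonstration can be sketched in miniature: construct data where a spurious shortcut predicts the labels while the real regularity is destroyed, and show that a classifier still predicts reliably in-distribution. Everything below is illustrative (synthetic data, a hypothetical shortcut feature), not an experiment from the paper.

```python
# Hypothetical miniature of the settling experiment: reliable in-distribution
# prediction without tracking the real regularity.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
real = rng.normal(size=(2000, 5))                  # features carrying the real regularity
labels = (real[:, 0] > 0).astype(int)
shortcut = labels[:, None] + rng.normal(scale=0.1, size=(2000, 1))  # leaky spurious cue
X = np.hstack([rng.permutation(real), shortcut])   # real features shuffled across samples

clf = LogisticRegression(max_iter=1000).fit(X[:1500], labels[:1500])
print("held-out accuracy:", clf.score(X[1500:], labels[1500:]))
# High accuracy here, despite the destroyed real features, is prediction
# without tracking -- the counterexample that would tell against the model.
```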
Original abstract
I propose a model of systematic understanding, suitable for machine learning systems. On this account, an agent understands a property of a target system when it contains an adequate internal model that tracks real regularities, is coupled to the target by stable bridge principles, and supports reliable prediction. I argue that contemporary deep learning systems often can and do achieve such understanding. However, they generally fall short of the ideal of scientific understanding: the understanding is symbolically misaligned with the target system, not explicitly reductive, and only weakly unifying. I label this the Fractured Understanding Hypothesis.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a model of systematic understanding for machine learning systems: an agent understands a target property when it contains an adequate internal model tracking real regularities, coupled to the target via stable bridge principles, and supporting reliable prediction. It claims contemporary deep learning systems often achieve this form of understanding but fall short of ideal scientific understanding because their understanding is symbolically misaligned, not explicitly reductive, and only weakly unifying; this is labeled the Fractured Understanding Hypothesis.
Significance. If the definitional framework can be made precise and operational, the work could supply a useful conceptual tool for distinguishing levels of understanding in AI systems and for clarifying how deep learning representations relate to (or diverge from) scientific understanding. It engages directly with ongoing debates in philosophy of science and AI interpretability.
Major comments (3)
- [Proposed model (abstract and main definition)] The central definitions of 'adequate internal model,' 'stable bridge principles,' and 'reliable prediction' are presented at a high conceptual level with no formal conditions, operational criteria, or identification procedure for locating such elements inside trained neural networks (whose representations are distributed and learned end-to-end). This renders the application to deep learning systems underdetermined and prevents concrete evaluation of whether any given system satisfies the model.
- [Application to deep learning and Fractured Understanding Hypothesis] The claims that 'contemporary deep learning systems often can and do achieve such understanding' and that they fall short in the three specified respects rest on general assertions rather than empirical data, formal derivations, or detailed case studies. No concrete analyses of particular networks, datasets, or tasks are supplied to ground the Fractured Understanding Hypothesis.
- [Fractured Understanding Hypothesis] The load-bearing distinction between the proposed 'systematic understanding' and 'ideal scientific understanding' depends on the precise meanings of 'symbolically misaligned,' 'explicitly reductive,' and 'weakly unifying,' none of which receive operational elaboration sufficient for falsifiability or cross-system comparison.
Minor comments (1)
- The manuscript would benefit from explicit engagement with related literature in philosophy of science (e.g., on bridge principles and reductive explanation) and AI interpretability (e.g., work on mechanistic interpretability and representation alignment) to situate the proposal.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below, indicating where revisions have been made to the manuscript.
Point-by-point responses
- Referee: The central definitions of 'adequate internal model,' 'stable bridge principles,' and 'reliable prediction' are presented at a high conceptual level with no formal conditions, operational criteria, or identification procedure for locating such elements inside trained neural networks (whose representations are distributed and learned end-to-end). This renders the application to deep learning systems underdetermined and prevents concrete evaluation of whether any given system satisfies the model.
  Authors: The definitions are intentionally conceptual: the paper proposes a philosophical model rather than a computational procedure. In revision we have added Section 4, which outlines initial operational heuristics drawing on existing mechanistic interpretability techniques (e.g., representation probing and causal interventions; see the sketch after these responses) for identifying candidate internal models and bridge principles. These heuristics are presented as starting points, not as a complete identification algorithm. (Revision: yes)
- Referee: The claims that 'contemporary deep learning systems often can and do achieve such understanding' and that they fall short in the three specified respects rest on general assertions rather than empirical data, formal derivations, or detailed case studies. No concrete analyses of particular networks, datasets, or tasks are supplied to ground the Fractured Understanding Hypothesis.
  Authors: The paper is a conceptual contribution that relies on established results from the interpretability literature showing that deep networks learn representations tracking statistical regularities. We have added two short illustrative examples (one from image classification and one from language modeling) that connect the model to concrete network behaviors. A full empirical test of the hypothesis lies outside the scope of this theoretical work. (Revision: partial)
- Referee: The load-bearing distinction between the proposed 'systematic understanding' and 'ideal scientific understanding' depends on the precise meanings of 'symbolically misaligned,' 'explicitly reductive,' and 'weakly unifying,' none of which receive operational elaboration sufficient for falsifiability or cross-system comparison.
  Authors: We have expanded the characterizations of these three terms with additional definitions and concrete links to architectural features of deep networks. The revisions improve qualitative distinguishability, but we note that full operational criteria enabling strict falsifiability would require further empirical and formal work, which we flag as future research. (Revision: yes)
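The rebuttal's first response points to representation probing as an operational heuristic. A minimal sketch of such a probe, on synthetic activations rather than a real network, looks like this (all names and data are illustrative):

```python
# Hypothetical linear probe: decode a target property from hidden
# activations. Above-chance probe accuracy is weak evidence that an
# internal model tracks the property. Synthetic data for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden = rng.normal(size=(1000, 64))            # stand-in hidden activations
prop = (hidden[:, :4].sum(axis=1) > 0)          # property linearly encoded in 4 units

H_tr, H_te, y_tr, y_te = train_test_split(hidden, prop, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(H_tr, y_tr)
print("probe accuracy:", probe.score(H_te, y_te))   # well above 0.5 => decodable
```

Probing of this kind only establishes decodability; the causal-intervention step (as in the ablation sketch earlier) is what tests whether the decoded structure is actually load-bearing.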
Circularity Check
Verdict: the proposed model is definitional, with no load-bearing reductions to fitted inputs and no self-citations.
Full rationale
The paper proposes a model of understanding by definition—an agent understands a property when it contains an adequate internal model that tracks real regularities, is coupled to the target by stable bridge principles, and supports reliable prediction—without deriving this from any prior fitted quantities, equations, or self-citations. Claims that contemporary deep learning systems achieve such understanding (but fall short of scientific understanding due to symbolic misalignment, lack of explicit reduction, and weak unification) rest on external observations of DL systems rather than reducing to parameters or assumptions defined inside the paper. No self-definitional loops, fitted-input predictions, uniqueness theorems imported from the authors' prior work, or ansatzes smuggled via citation appear in the derivation chain. The account is self-contained against external benchmarks and introduces new conceptual machinery without circular collapse.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: An agent understands a property of a target system when it contains an adequate internal model that tracks real regularities, is coupled to the target by stable bridge principles, and supports reliable prediction.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability echoes: "An agent understands a property p of a target system T insofar as ... contains a subsystem M ... that functions as an adequate model ... systematically tracks the property p ... without memorizing it ... appropriate bridge principles ... can use M to (approximately) derive certain properties of p"
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel echoes: "a pattern in a body of data is real only if it admits a description that is shorter than brute-force enumeration, thereby supporting reliable prediction ... MDL ... L(H) + L(D|H) < L(D)" (see the worked sketch after this list)
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_injective echoes: "structure-preserving mapping between relevant parts of M and T ... homomorphism ... structural understanding"
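The MDL inequality echoed above, L(H) + L(D|H) < L(D), can be made concrete with a biased-coin toy computation. The sketch below is illustrative and not drawn from the paper or the Lean file; it assumes an 8-bit encoding for the fitted parameter and one bit per outcome for brute-force enumeration.

```python
# Worked MDL sketch: a biased-coin sample compresses below brute force.
import math

def two_part_code_bits(data, precision_bits=8):
    n, k = len(data), sum(data)
    p = min(max(k / n, 1e-9), 1 - 1e-9)   # fitted bias, clamped away from 0/1
    l_h = precision_bits                   # L(H): stating p to fixed precision
    l_d_given_h = -sum(math.log2(p if x else 1 - p) for x in data)  # L(D|H)
    return l_h + l_d_given_h

data = [1] * 90 + [0] * 10                 # strongly biased sample
l_model = two_part_code_bits(data)         # ~54.9 bits
l_brute = len(data)                        # L(D): 100 bits, one per outcome
print(f"L(H)+L(D|H) = {l_model:.1f} bits  vs  L(D) = {l_brute} bits")
```

On this criterion the bias is a "real" pattern precisely because the two-part code beats enumeration; for a fair coin (p near 0.5) the data would not compress and the inequality would fail.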
Reference graph
Works this paper leans on
- [1] Achille, A., Paolini, G., and Soatto, S. (2020). Where is the information in a deep neural network?
- [2] Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
- [3] Floridi, L. (2023). AI as agency without intelligence: On ChatGPT, large language models, and other generative models. Philosophy & Technology, 36(1):15.
- [4] Li, K. et al. (2023). Do large language models have a world model? arXiv preprint arXiv:2303.15447.
- [5] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
- [6] Seeley, T. D. (1995). The Wisdom of the Hive: The Social Physiology of Honey Bee Colonies. Harvard University Press, Cambridge, MA.
- [7] Thilak, V., Saremi, O., Littwin, E., Paiss, R., Zhai, S., and Susskind, J. (2022). The slingshot mechanism: An empirical study of adaptive optimizers and the grokking phenomenon. arXiv preprint arXiv:2206.04817.
- [8] Voelkel, J. R. (2001). The Composition of Kepler's Astronomia nova. Princeton University Press, Princeton, NJ.
- [9] Zhang, C., Bengio, S., Hardt, M., Recht, B., and Vinyals, O. (2017). Understanding deep learning requires rethinking generalization. In 5th International Conference on Learning Representations, ICLR.
- [10] Zhang, Y. (2024). Causal abstraction in model interpretability: A compact survey.