A Model of Understanding in Deep Learning Systems
Recognition: 3 Lean theorem links
Pith reviewed 2026-05-13 16:46 UTC · model grok-4.3
The pith
Deep learning systems achieve understanding through internal models that track real regularities, but they fall short of scientific understanding.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that contemporary deep learning systems often achieve this kind of systematic understanding. However, they generally fall short of the ideal of scientific understanding: their understanding is symbolically misaligned with the target system, not explicitly reductive, and only weakly unifying. The paper labels this the Fractured Understanding Hypothesis.
What carries the argument
The model of systematic understanding: an agent understands a target property when it contains an adequate internal model that tracks real regularities, is coupled to the target by stable bridge principles, and supports reliable prediction.
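To make the structure of the definition concrete, here is a minimal sketch assuming nothing beyond the three conditions themselves; the class and field names are illustrative, not the paper's:

```python
# A minimal sketch (illustrative, not from the paper): the three conditions
# of systematic understanding combined as a single conjunction.
from dataclasses import dataclass

@dataclass
class UnderstandingReport:
    tracks_real_regularities: bool  # internal model mirrors the target's structure
    stable_bridge_principles: bool  # coupling to the target survives perturbation
    reliable_prediction: bool       # predictions meet an accuracy threshold

    def systematic_understanding(self) -> bool:
        # Each condition is individually necessary; none alone suffices.
        return (self.tracks_real_regularities
                and self.stable_bridge_principles
                and self.reliable_prediction)

# Example: reliable prediction alone does not count as understanding.
print(UnderstandingReport(False, False, True).systematic_understanding())  # False
```

The point the sketch encodes is that the conjunction does the work: a system satisfying only the third condition (reliable prediction) does not qualify.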
If this is right
- Deep learning systems can be credited with achieving genuine understanding of properties in target systems.
- Interpretability work should identify and strengthen the internal models and bridge principles in neural networks.
- Reliable prediction is necessary but not sufficient: it must be accompanied by the tracking of real regularities and by stable coupling.
- The model distinguishes practical understanding from the stronger features of scientific understanding.
- This account can evaluate understanding across different machine learning architectures and tasks.
Where Pith is reading between the lines
- The model suggests a path for engineering AI with greater symbolic alignment and unification.
- It links machine learning evaluation to philosophy of science by applying bridge principles to artificial systems.
- A testable extension is to check whether altering internal representations breaks the coupling and reduces prediction reliability (see the sketch after this list).
- If correct, fractured understanding may explain brittle performance in novel situations despite strong benchmark results.
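One way to run the testable extension above is as an ablation experiment: damage part of the learned representation and measure how much prediction reliability degrades. The sketch below is hypothetical, assuming a small scikit-learn MLP and zeroing of first-layer units as the "alteration"; it illustrates the shape of the test, not the paper's procedure.

```python
# Hypothetical ablation test: zero out a growing fraction of hidden units
# and track the drop in prediction reliability. Illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                    random_state=0).fit(X, y)

def accuracy_with_ablation(net, X, y, zero_frac):
    """Recompute the forward pass with a fraction of first-layer units zeroed."""
    W0, b0 = net.coefs_[0].copy(), net.intercepts_[0].copy()
    k = int(zero_frac * W0.shape[1])
    W0[:, :k], b0[:k] = 0.0, 0.0                 # 'break the coupling'
    hidden = np.maximum(X @ W0 + b0, 0.0)        # ReLU hidden layer
    logits = hidden @ net.coefs_[1] + net.intercepts_[1]
    preds = (logits.ravel() > 0).astype(int)     # sigmoid(z) > 0.5 iff z > 0
    return float((preds == y).mean())

for frac in (0.0, 0.25, 0.5, 0.75):
    print(f"ablated {frac:.0%}: accuracy {accuracy_with_ablation(net, X, y, frac):.3f}")
```

If accuracy stays flat while large parts of the representation are destroyed, the internal-model story is in trouble; if it degrades in step with the ablation, the coupling claim gains some support.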
Load-bearing premise
That deep learning systems contain internal models that track real regularities in the target system and are coupled to it by stable bridge principles in the sense defined by the proposed model.
What would settle it
A demonstration that deep learning systems make reliable predictions without their internal representations tracking the actual regularities in the target system or without stable coupling to it via bridge principles.
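The settling demonstration can be sketched in miniature: construct data where a spurious shortcut predicts the labels while the real regularity is destroyed, and show that a classifier still predicts reliably in-distribution. Everything below is illustrative (synthetic data, a hypothetical shortcut feature), not an experiment from the paper.

```python
# Hypothetical miniature of the settling experiment: reliable in-distribution
# prediction without tracking the real regularity.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
real = rng.normal(size=(2000, 5))                  # features carrying the real regularity
labels = (real[:, 0] > 0).astype(int)
shortcut = labels[:, None] + rng.normal(scale=0.1, size=(2000, 1))  # leaky spurious cue
X = np.hstack([rng.permutation(real), shortcut])   # real features shuffled across samples

clf = LogisticRegression(max_iter=1000).fit(X[:1500], labels[:1500])
print("held-out accuracy:", clf.score(X[1500:], labels[1500:]))
# High accuracy here, despite the destroyed real features, is prediction
# without tracking -- the counterexample that would tell against the model.
```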
Original abstract
I propose a model of systematic understanding, suitable for machine learning systems. On this account, an agent understands a property of a target system when it contains an adequate internal model that tracks real regularities, is coupled to the target by stable bridge principles, and supports reliable prediction. I argue that contemporary deep learning systems often can and do achieve such understanding. However, they generally fall short of the ideal of scientific understanding: the understanding is symbolically misaligned with the target system, not explicitly reductive, and only weakly unifying. I label this the Fractured Understanding Hypothesis.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a model of systematic understanding for machine learning systems: an agent understands a target property when it contains an adequate internal model tracking real regularities, coupled to the target via stable bridge principles, and supporting reliable prediction. It claims contemporary deep learning systems often achieve this form of understanding but fall short of ideal scientific understanding because their understanding is symbolically misaligned, not explicitly reductive, and only weakly unifying; this is labeled the Fractured Understanding Hypothesis.
Significance. If the definitional framework can be made precise and operational, the work could supply a useful conceptual tool for distinguishing levels of understanding in AI systems and for clarifying how deep learning representations relate to (or diverge from) scientific understanding. It engages directly with ongoing debates in philosophy of science and AI interpretability.
Major comments (3)
- [Proposed model (abstract and main definition)] The central definitions of 'adequate internal model,' 'stable bridge principles,' and 'reliable prediction' are presented at a high conceptual level with no formal conditions, operational criteria, or identification procedure for locating such elements inside trained neural networks (whose representations are distributed and learned end-to-end). This renders the application to deep learning systems underdetermined and prevents concrete evaluation of whether any given system satisfies the model.
- [Application to deep learning and Fractured Understanding Hypothesis] The claims that 'contemporary deep learning systems often can and do achieve such understanding' and that they fall short in the three specified respects rest on general assertions rather than empirical data, formal derivations, or detailed case studies. No concrete analyses of particular networks, datasets, or tasks are supplied to ground the Fractured Understanding Hypothesis.
- [Fractured Understanding Hypothesis] The load-bearing distinction between the proposed 'systematic understanding' and 'ideal scientific understanding' depends on the precise meanings of 'symbolically misaligned,' 'explicitly reductive,' and 'weakly unifying,' none of which receive operational elaboration sufficient for falsifiability or cross-system comparison.
Minor comments (1)
- The manuscript would benefit from explicit engagement with related literature in philosophy of science (e.g., on bridge principles and reductive explanation) and AI interpretability (e.g., work on mechanistic interpretability and representation alignment) to situate the proposal.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below, indicating where revisions have been made to the manuscript.
Point-by-point responses
- Referee: The central definitions of 'adequate internal model,' 'stable bridge principles,' and 'reliable prediction' are presented at a high conceptual level with no formal conditions, operational criteria, or identification procedure for locating such elements inside trained neural networks (whose representations are distributed and learned end-to-end). This renders the application to deep learning systems underdetermined and prevents concrete evaluation of whether any given system satisfies the model.
  Authors: The definitions are intentionally conceptual: the paper proposes a philosophical model rather than a computational procedure. In revision we have added Section 4, which outlines initial operational heuristics drawing on existing mechanistic interpretability techniques (e.g., representation probing and causal interventions; see the sketch after these responses) for identifying candidate internal models and bridge principles. These heuristics are presented as starting points, not as a complete identification algorithm. (Revision: yes)
- Referee: The claims that 'contemporary deep learning systems often can and do achieve such understanding' and that they fall short in the three specified respects rest on general assertions rather than empirical data, formal derivations, or detailed case studies. No concrete analyses of particular networks, datasets, or tasks are supplied to ground the Fractured Understanding Hypothesis.
  Authors: The paper is a conceptual contribution that relies on established results from the interpretability literature showing that deep networks learn representations tracking statistical regularities. We have added two short illustrative examples (one from image classification and one from language modeling) that connect the model to concrete network behaviors. A full empirical test of the hypothesis lies outside the scope of this theoretical work. (Revision: partial)
- Referee: The load-bearing distinction between the proposed 'systematic understanding' and 'ideal scientific understanding' depends on the precise meanings of 'symbolically misaligned,' 'explicitly reductive,' and 'weakly unifying,' none of which receive operational elaboration sufficient for falsifiability or cross-system comparison.
  Authors: We have expanded the characterizations of these three terms with additional definitions and concrete links to architectural features of deep networks. The revisions improve qualitative distinguishability, but we note that full operational criteria enabling strict falsifiability would require further empirical and formal work, which we flag as future research. (Revision: yes)
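The rebuttal's first response points to representation probing as an operational heuristic. A minimal sketch of such a probe, on synthetic activations rather than a real network, looks like this (all names and data are illustrative):

```python
# Hypothetical linear probe: decode a target property from hidden
# activations. Above-chance probe accuracy is weak evidence that an
# internal model tracks the property. Synthetic data for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden = rng.normal(size=(1000, 64))            # stand-in hidden activations
prop = (hidden[:, :4].sum(axis=1) > 0)          # property linearly encoded in 4 units

H_tr, H_te, y_tr, y_te = train_test_split(hidden, prop, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(H_tr, y_tr)
print("probe accuracy:", probe.score(H_te, y_te))   # well above 0.5 => decodable
```

Probing of this kind only establishes decodability; the causal-intervention step (as in the ablation sketch earlier) is what tests whether the decoded structure is actually load-bearing.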
Circularity Check
Verdict: the proposed model is definitional, with no load-bearing reductions to fitted inputs and no self-citations.
Full rationale
The paper proposes a model of understanding by definition—an agent understands a property when it contains an adequate internal model that tracks real regularities, is coupled to the target by stable bridge principles, and supports reliable prediction—without deriving this from any prior fitted quantities, equations, or self-citations. Claims that contemporary deep learning systems achieve such understanding (but fall short of scientific understanding due to symbolic misalignment, lack of explicit reduction, and weak unification) rest on external observations of DL systems rather than reducing to parameters or assumptions defined inside the paper. No self-definitional loops, fitted-input predictions, uniqueness theorems imported from the authors' prior work, or ansatzes smuggled via citation appear in the derivation chain. The account is self-contained against external benchmarks and introduces new conceptual machinery without circular collapse.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: An agent understands a property of a target system when it contains an adequate internal model that tracks real regularities, is coupled to the target by stable bridge principles, and supports reliable prediction.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability echoes: "An agent understands a property p of a target system T insofar as ... contains a subsystem M ... that functions as an adequate model ... systematically tracks the property p ... without memorizing it ... appropriate bridge principles ... can use M to (approximately) derive certain properties of p"
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel echoes: "a pattern in a body of data is real only if it admits a description that is shorter than brute-force enumeration, thereby supporting reliable prediction ... MDL ... L(H) + L(D|H) < L(D)" (see the worked sketch after this list)
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_injective echoes: "structure-preserving mapping between relevant parts of M and T ... homomorphism ... structural understanding"
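The MDL inequality echoed above, L(H) + L(D|H) < L(D), can be made concrete with a biased-coin toy computation. The sketch below is illustrative and not drawn from the paper or the Lean file; it assumes an 8-bit encoding for the fitted parameter and one bit per outcome for brute-force enumeration.

```python
# Worked MDL sketch: a biased-coin sample compresses below brute force.
import math

def two_part_code_bits(data, precision_bits=8):
    n, k = len(data), sum(data)
    p = min(max(k / n, 1e-9), 1 - 1e-9)   # fitted bias, clamped away from 0/1
    l_h = precision_bits                   # L(H): stating p to fixed precision
    l_d_given_h = -sum(math.log2(p if x else 1 - p) for x in data)  # L(D|H)
    return l_h + l_d_given_h

data = [1] * 90 + [0] * 10                 # strongly biased sample
l_model = two_part_code_bits(data)         # ~54.9 bits
l_brute = len(data)                        # L(D): 100 bits, one per outcome
print(f"L(H)+L(D|H) = {l_model:.1f} bits  vs  L(D) = {l_brute} bits")
```

On this criterion the bias is a "real" pattern precisely because the two-part code beats enumeration; for a fair coin (p near 0.5) the data would not compress and the inequality would fail.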
Reference graph
Works this paper leans on
- [1] Achille, A., Paolini, G., and Soatto, S. (2020). Where is the information in a deep neural network?
- [2] Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
- [3] Floridi, L. (2023). AI as agency without intelligence: On ChatGPT, large language models, and other generative models. Philosophy & Technology, 36(1):15.
- [4] Li, K. et al. (2023). Do large language models have a world model? arXiv preprint arXiv:2303.15447.
- [5] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
- [6] Seeley, T. D. (1995). The Wisdom of the Hive: The Social Physiology of Honey Bee Colonies. Harvard University Press, Cambridge, MA.
- [7] Thilak, V., Saremi, O., Littwin, E., Paiss, R., Zhai, S., and Susskind, J. (2022). The slingshot mechanism: An empirical study of adaptive optimizers and the grokking phenomenon. arXiv preprint arXiv:2206.04817.
- [8] Voelkel, J. R. (2001). The Composition of Kepler's Astronomia nova. Princeton University Press, Princeton, NJ.
- [9] Zhang, C., Bengio, S., Hardt, M., Recht, B., and Vinyals, O. (2017). Understanding deep learning requires rethinking generalization. In 5th International Conference on Learning Representations, ICLR.
- [10] Zhang, Y. (2024). Causal abstraction in model interpretability: A compact survey.