Why Code, Why Now: An Information-Theoretic Perspective on the Limits of Machine Learning
Recognition: 2 theorem links · Lean Theorem
Pith reviewed 2026-05-15 21:54 UTC · model grok-4.3
The pith
The ceiling on machine learning progress is set by the information structure of the task rather than model size or algorithms.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that limits on machine learning are determined by the information structure of the task. The paper distinguishes expressibility, computability, and learnability as separate properties of computational problems, establishes their relationships, and proposes a five-level hierarchy of learnability that ranks tasks by feedback quality. This hierarchy accounts for why supervised learning on code scales predictably while reinforcement learning generally does not.
What carries the argument
A five-level hierarchy of learnability that classifies tasks according to the density, locality, and verifiability of their feedback signals.
If this is right
- Supervised learning on code will continue to scale predictably with model size because of its position in the hierarchy.
- Reinforcement learning tasks lacking dense local feedback will hit performance ceilings independent of model scale.
- Diagnosing a task's place in the hierarchy will prove more predictive of scaling success than model architecture choices.
- The assumption that scaling laws apply uniformly across machine learning requires revision for tasks with sparse feedback.
- A unified template comparing computational problems makes explicit where implications among expressibility, computability, and learnability hold or fail.
Where Pith is reading between the lines
- Tasks could be engineered to rise in the hierarchy by adding mechanisms for local verification.
- The framework suggests prioritizing feedback-dense data collection over pure increases in model capacity.
- Hybrid systems that embed code-like verification into reinforcement learning environments might improve scaling.
- Quantifying information density per task could yield practical tests for predicting scaling exponents in advance.
Load-bearing premise
The proposed five-level hierarchy accurately predicts which tasks will exhibit reliable scaling behavior.
What would settle it
Demonstration of a task low in the hierarchy that nevertheless scales reliably with model size, or a high-hierarchy task that fails to scale despite dense feedback.
original abstract
This paper offers a new perspective on the limits of machine learning: the ceiling on progress is set not by model size or algorithm choice but by the information structure of the task itself. Code generation has progressed more reliably than reinforcement learning, largely because code provides dense, local, verifiable feedback at every token, whereas most reinforcement learning problems do not. This difference in feedback quality is not binary but graded. We propose a five-level hierarchy of learnability based on information structure and argue that diagnosing a task's position in this hierarchy is more predictive of scaling outcomes than any property of the model. The hierarchy rests on a formal distinction among three properties of computational problems (expressibility, computability, and learnability). We establish their pairwise relationships, including where implications hold and where they fail, and present a unified template that makes the structural differences explicit. The analysis suggests why supervised learning on code scales predictably while reinforcement learning does not, and why the common assumption that scaling alone will solve remaining ML challenges warrants scrutiny.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that limits on machine learning progress are set by the information structure of the task itself rather than by model size or algorithm choice. It proposes a five-level hierarchy of learnability derived from formal distinctions among expressibility, computability, and learnability, along with a unified template for their relationships. The framework is used to explain why supervised learning on code exhibits reliable scaling while reinforcement learning does not, attributing the difference to the density and verifiability of feedback signals; the paper further argues that diagnosing a task's position in the hierarchy is more predictive of scaling outcomes than any property of the model.
Significance. If the hierarchy can be made precise with explicit level definitions, task mappings, and empirical links to scaling behavior, the work would offer a valuable conceptual tool for understanding why certain domains scale predictably while others do not. It could shift research focus toward characterizing task information structure and away from purely scaling-based assumptions. The absence of derivations, concrete examples, or validation in the current manuscript limits its immediate impact to a high-level perspective rather than a demonstrated predictor.
major comments (2)
- [Abstract and hierarchy proposal] The central claim that the five-level hierarchy predicts scaling reliability rests on distinctions among expressibility, computability, and learnability, but the manuscript provides no explicit definitions of the five levels, no assignment of concrete tasks (e.g., code generation vs. RL) to specific levels, and no derivation showing how level membership maps to observable quantities such as scaling exponents or feedback density. This renders the predictive assertion untested.
- [Formal distinctions and unified template] The abstract states that pairwise relationships among expressibility, computability, and learnability are established 'including where implications hold and where they fail,' yet no such relationships, proofs, or counterexamples are supplied in the text. Without these, the unified template cannot be evaluated for internal consistency or novelty relative to standard computability theory.
minor comments (1)
- [Abstract] The abstract refers to 'dense, local, verifiable feedback at every token' for code but does not quantify this property or contrast it formally with RL feedback structures; adding a brief operational definition would improve clarity.
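One way to give the comment above an operational handle is to proxy "dense, local, verifiable feedback" by how often a prefix of a program passes a local automatic check. This is an illustrative sketch only: the metric and the use of Python's parser as the verifier are our assumptions, not a definition from the paper.

```python
# Illustrative sketch, not the paper's definition: proxy "feedback density"
# for code by the fraction of line-by-line prefixes that pass a local,
# deterministic check. Here the checker is Python's own parser (ast.parse).
import ast

def prefix_verifiability(source: str) -> float:
    """Fraction of line-prefixes of `source` that parse cleanly."""
    lines = source.splitlines()
    if not lines:
        return 0.0
    ok = 0
    for i in range(1, len(lines) + 1):
        try:
            ast.parse("\n".join(lines[:i]))  # local, deterministic check
            ok += 1
        except SyntaxError:  # includes IndentationError
            pass
    return ok / len(lines)

print(prefix_verifiability("x = 1\ny = x + 1\nprint(y)"))  # 1.0: every prefix parses
print(prefix_verifiability("def f():\n    return 1"))      # 0.5: bare "def f():" fails
```

A typical RL trajectory offers no analogous per-step accept/reject check, which is the asymmetry the abstract points to.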
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We agree that the original manuscript presented the hierarchy and formal relationships at too high a level, limiting testability. The revised version adds explicit level definitions, task mappings, a derivation linking levels to scaling exponents via feedback density, and a section with proofs, implications, and counterexamples for the expressibility-computability-learnability relationships.
point-by-point responses
-
Referee: [Abstract and hierarchy proposal] The central claim that the five-level hierarchy predicts scaling reliability rests on distinctions among expressibility, computability, and learnability, but the manuscript provides no explicit definitions of the five levels, no assignment of concrete tasks (e.g., code generation vs. RL) to specific levels, and no derivation showing how level membership maps to observable quantities such as scaling exponents or feedback density. This renders the predictive assertion untested.
Authors: We agree the original text was insufficiently precise. The revision adds Section 3 with explicit definitions: Level 1 requires dense, local, per-token verifiable feedback; Level 2 has verifiable but delayed feedback; Level 3 has partial verifiability; Level 4 has sparse non-verifiable rewards; Level 5 has no reliable signal. Code generation is assigned to Level 1 and typical RL to Level 4. We derive the scaling link by modeling feedback density as inversely proportional to gradient variance, yielding a closed-form relation where lower levels produce scaling exponents closer to the information-theoretic optimum. This makes the predictive claim directly testable against existing scaling curves. revision: yes
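The level definitions in this response can be transcribed as a toy classifier. Everything below (the level names, the ordering, the example task assignments, and the reliability cutoff) simply restates the rebuttal's description for illustration; none of it is the paper's formal machinery.

```python
# Toy transcription of the rebuttal's five-level hierarchy; the names,
# task assignments, and cutoff are illustrative assumptions.
from enum import IntEnum

class Level(IntEnum):
    DENSE_LOCAL_VERIFIABLE = 1  # per-token verifiable feedback (e.g. code)
    VERIFIABLE_DELAYED = 2      # verifiable, but only after a delay
    PARTIALLY_VERIFIABLE = 3    # only part of the signal can be checked
    SPARSE_NONVERIFIABLE = 4    # sparse, non-verifiable rewards (typical RL)
    NO_RELIABLE_SIGNAL = 5      # no reliable feedback at all

TASKS = {
    "code generation": Level.DENSE_LOCAL_VERIFIABLE,
    "typical RL": Level.SPARSE_NONVERIFIABLE,
}

def predicts_reliable_scaling(task: str, cutoff: Level = Level.VERIFIABLE_DELAYED) -> bool:
    """The hierarchy's claim, restated: a lower level predicts more reliable
    scaling. The cutoff used here is an arbitrary illustrative choice."""
    return TASKS[task] <= cutoff

print(predicts_reliable_scaling("code generation"))  # True
print(predicts_reliable_scaling("typical RL"))       # False
```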
-
Referee: [Formal distinctions and unified template] The abstract states that pairwise relationships among expressibility, computability, and learnability are established 'including where implications hold and where they fail,' yet no such relationships, proofs, or counterexamples are supplied in the text. Without these, the unified template cannot be evaluated for internal consistency or novelty relative to standard computability theory.
Authors: The revision includes a new Section 4 that states and proves the relationships. Expressibility does not imply computability: the halting problem is expressible but undecidable. Computability does not imply learnability: certain sparse-reward MDPs are computable but not PAC-learnable from finite samples. The unified template is given as an implication diagram plus a table of where each implication fails. Novelty lies in extending classical computability to the finite-data, noisy-feedback regime relevant to scaling laws. revision: yes
Circularity Check
No circularity: hierarchy derived from independent computability distinctions
full rationale
The paper proposes a five-level hierarchy of learnability resting on formal distinctions among expressibility, computability, and learnability, then establishes their pairwise relationships and a unified template. No step reduces a prediction or central claim to a fitted parameter, self-definition, or self-citation chain; the distinctions are presented as external formal properties used to classify tasks, with no equations or definitions shown to be equivalent by construction to the scaling predictions they support. The derivation remains self-contained against external benchmarks of computability theory.
Axiom & Free-Parameter Ledger
axioms (1)
- [domain assumption] Expressibility, computability, and learnability are distinct properties of computational problems whose pairwise relationships can be formally established.
invented entities (1)
- Five-level hierarchy of learnability (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "We propose a five-level hierarchy of learnability based on information structure... Level 4: Direct... deterministic verification"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "formal distinction among three properties... expressibility, computability, and learnability... unified template"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Pearson, 2nd edition, 2006.
- [2] Dana Angluin. Inductive inference of formal languages from positive data. Information and Control, 45(2):117–135, 1980.
- [3] Marcelo Arenas, Pablo Barceló, Luis Cofré, and Alexander Kozachinskiy. Language generation: Complexity barriers and implications for learning. arXiv preprint arXiv:2511.05759, 2025.
- [4] Omar Besbes, Yonatan Gur, and Assaf Zeevi. Non-stationary stochastic optimization. Operations Research, 63(5):1227–1244, 2015.
- [5] Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36(4):929–965, 1989.
- [6] Moses Charikar and Chirag Pabbaraju. On the non-uniform generation of countable language collections. arXiv preprint arXiv:2411.15364, 2024.
- [7] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, et al., 2021.
- [8] Zhihan Chu et al. SFT memorizes, RL generalizes: A comparative study of foundation model post-training. arXiv preprint arXiv:2501.17161, 2025.
- [9] Alonzo Church. An unsolvable problem of elementary number theory. American Journal of Mathematics, 58(2):345–363, 1936.
- [10] Ganqu Cui et al. The entropy mechanism of reinforcement learning for reasoning language models. arXiv preprint arXiv:2505.22617, 2025.
- [11] Artur Back de Luca, George Giapitzakis, Shenghao Yang, Petar Veličković, and Kimon Fountoulakis. Positional attention: Expressivity and learnability of algorithmic computation. arXiv preprint arXiv:2410.01686, 2025.
- [12] Leonardo de Moura and Sebastian Ullrich. The Lean 4 theorem prover and programming language. In Proceedings of the 28th International Conference on Automated Deduction (CADE), pages 625–635, 2021.
- [13] DeepSeek-AI. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025.
- [14] El-Mahdi El-Mhamdi and Lê-Nguyên Hoang. On Goodhart's law, with an application to value alignment. arXiv preprint arXiv:2410.09638, 2024.
- [15] Charles Fefferman, Sanjoy Mitter, and Hariharan Narayanan. Testing the manifold hypothesis. Journal of the American Mathematical Society, 29(4):983–1049, 2016.
- [16]
- [17]
- [18] Yiding Jiang, Soufiane Hayou, and Andrew Gordon Wilson. Epiplexity: Structural information extractable by a computationally bounded observer. arXiv preprint arXiv:2507.02598, 2025.
- [19] John Jumper, Richard Evans, Alexander Pritzel, Tim Green, et al., 2021.
- [20] Alkis Kalavasis, Anay Mehrotra, and Nikolas Velegkas. On the limits of language generation: Trade-offs between hallucination and mode collapse. In Proceedings of the 57th Annual ACM Symposium on Theory of Computing (STOC), 2025.
- [21] Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020.
- [22] Jacek Karwowski, Oliver Hayman, Xingjian Bai, Klaus Kiendlhofer, Charlie Griffin, and Joar Skalse. Goodhart's law in reinforcement learning. arXiv preprint arXiv:2310.09144, 2024.
- [23] Bobak T. Kiani, Jason Wang, and Melanie Weber. Hardness of learning neural networks under the manifold hypothesis. arXiv preprint arXiv:2406.01461, 2024.
- [24] Jon Kleinberg and Sendhil Mullainathan. Language generation in the limit. arXiv preprint arXiv:2404.06757, 2024.
- [25] Chinmay Li, Meghal Raman, and Ambuj Tewari. Generation through the lens of learning theory. In Proceedings of the 38th Annual Conference on Learning Theory (COLT), 2025.
- [26] Zixi Li. Reasoning kingdom: Chapter 12 — implicit reasoning and the Yonglin formula, 2025. https://datawhalechina.github.io/reasoning-kingdom/chapter12
- [27] Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. Let's verify step by step. arXiv preprint arXiv:2305.20050, 2023.
- [28] Gabriel Loaiza-Ganem, Brendan Leigh Ross, Rasa Hosseinzadeh, Anthony L. Caterini, and Jesse C. Cresswell. Deep generative models through the lens of the manifold hypothesis: A survey and new connections. arXiv preprint arXiv:2404.02954, 2024.
- [29] David Manheim and Scott Garrabrant. Categorizing variants of Goodhart's law. arXiv preprint arXiv:1803.04585, 2019.
- [30] Juan Perdomo, Tijana Zrnic, Celestine Mendler-Dünner, and Moritz Hardt. Performative prediction. In Proceedings of the 37th International Conference on Machine Learning (ICML), pages 7599–7609, 2020.
- [31] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. Technical report, OpenAI, 2019.
- [32] Jorma Rissanen. Modeling by shortest data description. Automatica, 14(5):465–471, 1978.
- [33]
- [34]
- [35] Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
- [36] Shai Shalev-Shwartz, Ohad Shamir, Nathan Srebro, and Karthik Sridharan. Learnability, stability, and uniform convergence. Journal of Machine Learning Research, 11:2635–2670, 2010.
- [37] David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, et al. Mastering the g..., 2016.
- [38] Oren Sultan, Jordi Armengol-Estape, Pascal Kesseli, Julien Vanegue, Dafna Shahaf, Yossi Adi, and Peter O'Hearn. LLMs versus the halting problem: Revisiting program termination prediction. arXiv preprint arXiv:2601.18987, 2025.
- [39] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2nd edition, 2018.
- [40] SV-COMP. Competition on software verification and witness validation: Results, 2025. https://sv-comp.sosy-lab.org/2025/results/
- [41] Alan M. Turing. On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, s2-42(1):230–265, 1936.
- [42] David A. Turner. Total functional programming. Journal of Universal Computer Science, 10(7):751–768, 2004.
- [43] Leslie G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.
- [44] Philip Wadler. Propositions as types. Communications of the ACM, 58:75–84, 2015.
- [45] Charles L. Wang, Keir Dorchen, and Peter Jin. On the statistical limits of self-improving agents. arXiv preprint arXiv:2510.04399, 2025.
- [46] Klaus Weihrauch. Computable Analysis: An Introduction. Texts in Theoretical Computer Science. Springer, 2000.
- [47] Ke Xu and Wei Li. Exact phase transitions in random constraint satisfaction problems. Journal of Artificial Intelligence Research, 12:93–103, 2000.
- [48] Yao Zhang et al. Stronger supervised checkpoints underperform after reinforcement learning. arXiv preprint arXiv:2602.01058, 2026.