On the Measure of Intelligence
Recognition: 3 theorem links · Lean Theorem
Pith reviewed 2026-05-12 12:59 UTC · model grok-4.3
The pith
Intelligence is the efficiency of acquiring skills from limited experience, not performance on fixed tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Intelligence is formalized as skill-acquisition efficiency: the rate at which a system develops new capabilities given a defined scope of tasks, a level of generalization difficulty, and a quantity of experience, while incorporating its priors. This formulation, rooted in algorithmic information theory, treats skill at any single task as an insufficient proxy because priors and experience heavily modulate observed performance. The definition therefore directs evaluation toward how economically a system converts limited experience into broad competence.
What carries the argument
The formal definition of intelligence as skill-acquisition efficiency from algorithmic information theory, which isolates generalization power by controlling for priors and experience across tasks of varying difficulty.
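To make the quantities named above concrete, here is a minimal toy sketch in Python. It is an illustration under stated assumptions, not the paper's formula: the ratio form (generalization difficulty divided by priors plus experience), the `TaskRecord` fields, and all numbers are hypothetical choices made for this example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class TaskRecord:
    """Hypothetical per-task measurements, all in the same arbitrary units."""
    name: str
    generalization_difficulty: float  # how far the task sits from what was seen
    priors: float                     # built-in knowledge relevant to the task
    experience: float                 # training data or practice consumed
    reached_threshold: bool           # did the system reach the required skill?


def task_efficiency(rec: TaskRecord) -> float:
    """Toy ratio (an assumption): difficulty overcome per unit of priors + experience."""
    cost = rec.priors + rec.experience
    if cost <= 0:
        raise ValueError("priors + experience must be positive")
    if not rec.reached_threshold:
        return 0.0  # no credit when the skill threshold was not reached
    return rec.generalization_difficulty / cost


def scope_efficiency(records: list[TaskRecord]) -> float:
    """Average the per-task ratio over every task in the scope."""
    return mean(task_efficiency(r) for r in records)


# Two hypothetical systems that reach the same skill on the same two tasks.
frugal = [
    TaskRecord("t1", 4.0, 1.0, 1.0, True),
    TaskRecord("t2", 6.0, 1.0, 2.0, True),
]
data_hungry = [
    TaskRecord("t1", 4.0, 1.0, 7.0, True),
    TaskRecord("t2", 6.0, 1.0, 11.0, True),
]

print(scope_efficiency(frugal))       # 2.0
print(scope_efficiency(data_hungry))  # 0.5
```

Both systems solve the same tasks, so a skill-only benchmark would rate them equally. The toy ratio separates them because the second system consumed far more experience to get there.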
If this is right
- Benchmarks must limit the priors and experience supplied to systems so that measured performance reflects acquisition efficiency rather than external resources.
- Comparisons between AI and humans become possible once both operate under comparable innate priors on the same task scope.
- AI progress can be tracked by improvements in skill-acquisition efficiency rather than by gains on any single fixed task.
- The Abstraction and Reasoning Corpus provides one concrete realization of these guidelines for measuring fluid, human-like intelligence.
Where Pith is reading between the lines
- Systems optimized under this measure may generalize more readily to open-ended real-world problems than those trained on narrow, data-heavy tasks.
- The definition could be applied to non-benchmark settings by defining new task scopes and measuring acquisition rates under controlled priors.
- If the approach holds, large-scale pretraining on fixed datasets would be revealed as a limited path to general intelligence.
Load-bearing premise
The explicit priors chosen for the Abstraction and Reasoning Corpus are sufficiently close to innate human priors that performance differences on the benchmark reflect genuine differences in generalization power.
What would settle it
An AI system that reaches high scores on the Abstraction and Reasoning Corpus yet fails to acquire skills efficiently when tested on a fresh set of tasks with matched priors and experience would show that the benchmark does not isolate the intended form of intelligence.
Original abstract
To make deliberate progress towards more intelligent and more human-like artificial systems, we need to be following an appropriate feedback signal: we need to be able to define and evaluate intelligence in a way that enables comparisons between two systems, as well as comparisons with humans. Over the past hundred years, there has been an abundance of attempts to define and measure intelligence, across both the fields of psychology and AI. We summarize and critically assess these definitions and evaluation approaches, while making apparent the two historical conceptions of intelligence that have implicitly guided them. We note that in practice, the contemporary AI community still gravitates towards benchmarking intelligence by comparing the skill exhibited by AIs and humans at specific tasks such as board games and video games. We argue that solely measuring skill at any given task falls short of measuring intelligence, because skill is heavily modulated by prior knowledge and experience: unlimited priors or unlimited training data allow experimenters to "buy" arbitrary levels of skills for a system, in a way that masks the system's own generalization power. We then articulate a new formal definition of intelligence based on Algorithmic Information Theory, describing intelligence as skill-acquisition efficiency and highlighting the concepts of scope, generalization difficulty, priors, and experience. Using this definition, we propose a set of guidelines for what a general AI benchmark should look like. Finally, we present a benchmark closely following these guidelines, the Abstraction and Reasoning Corpus (ARC), built upon an explicit set of priors designed to be as close as possible to innate human priors. We argue that ARC can be used to measure a human-like form of general fluid intelligence and that it enables fair general intelligence comparisons between AI systems and humans.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper summarizes and critically assesses historical definitions of intelligence from psychology and AI and identifies two implicit conceptions guiding them. It argues that task-specific skill benchmarks fail to measure intelligence because skill depends on priors and experience. It then articulates a new formal definition of intelligence, grounded in Algorithmic Information Theory, as skill-acquisition efficiency (incorporating scope, generalization difficulty, priors, and experience) and proposes guidelines for general AI benchmarks. Finally, it introduces the Abstraction and Reasoning Corpus (ARC), built on an explicit set of priors designed to approximate innate human priors, for measuring human-like fluid intelligence and enabling fair AI-human comparisons.
Significance. If the definition is sound and the ARC priors sufficiently match human innate priors, the work could meaningfully shift AI evaluation toward measuring generalization efficiency rather than acquired skill, providing a principled alternative to current task-specific benchmarks and influencing the design of future intelligence tests.
major comments (2)
- [Section on the new formal definition] The section articulating the new formal definition: intelligence is defined as skill-acquisition efficiency drawing on AIT, but the manuscript provides only a conceptual description without a precise mathematical formulation, derivation from core AIT quantities (such as Kolmogorov complexity), or operationalization that would allow quantitative computation or direct falsification of the definition.
- [ARC benchmark description] The ARC benchmark description and guidelines section: the central claim that ARC enables fair human-AI comparisons rests on the assumption that its enumerated priors (objectness, basic geometry, counting, etc.) are close enough to innate human priors that performance differences isolate generalization efficiency; however, no derivation, empirical calibration against human data, or sensitivity analysis is supplied to show that modifying any listed prior would not materially alter relative scores.
minor comments (2)
- [Abstract and introduction] The abstract and introduction reference 'two historical conceptions of intelligence' without naming or briefly characterizing them, which reduces clarity for readers unfamiliar with the cited psychology and AI literature.
- [Conclusions] The manuscript would benefit from an explicit statement of the scope of the proposed definition (e.g., whether it applies only to fluid intelligence or extends to other forms) to avoid overgeneralization in the conclusions.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments on the manuscript. We address each major comment below, indicating where revisions will be incorporated.
Point-by-point responses
-
Referee: [Section on the new formal definition] The section articulating the new formal definition: intelligence is defined as skill-acquisition efficiency drawing on AIT, but the manuscript provides only a conceptual description without a precise mathematical formulation, derivation from core AIT quantities (such as Kolmogorov complexity), or operationalization that would allow quantitative computation or direct falsification of the definition.
Authors: We acknowledge that the definition is presented conceptually, drawing on AIT to frame intelligence as skill-acquisition efficiency without supplying a closed-form mathematical expression or explicit derivation from Kolmogorov complexity. This choice was made to emphasize the definition's implications for evaluation and to keep it accessible across psychology and AI. In revision, we will expand the section with a more explicit mapping to AIT notions, such as relating efficiency to the incremental reduction in description length for novel tasks, and add discussion of possible operationalizations along with their limitations for direct falsification.
Revision: partial
-
Referee: [ARC benchmark description] The ARC benchmark description and guidelines section: the central claim that ARC enables fair human-AI comparisons rests on the assumption that its enumerated priors (objectness, basic geometry, counting, etc.) are close enough to innate human priors that performance differences isolate generalization efficiency; however, no derivation, empirical calibration against human data, or sensitivity analysis is supplied to show that modifying any listed prior would not materially alter relative scores.
Authors: The referee is correct that the fairness claim for human-AI comparisons depends on the priors approximating innate human ones, and that the manuscript supplies neither empirical calibration nor sensitivity analysis. We will revise the relevant section to elaborate the rationale for each prior with additional citations from cognitive science literature on core knowledge systems. We will also add an explicit limitations paragraph acknowledging the absence of sensitivity analysis and noting that full empirical calibration is an important avenue for subsequent work.
Revision: partial
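The "incremental reduction in description length" mentioned in the first response can be illustrated with a crude, computable proxy. Kolmogorov complexity itself is uncomputable, so the sketch below substitutes zlib-compressed size, which is an assumption of this example and not something the paper or the rebuttal proposes. The function names and the sample task descriptions are hypothetical.

```python
import zlib


def description_length(data: bytes) -> int:
    """Compressed size in bytes: a rough stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def conditional_length(target: bytes, context: bytes) -> int:
    """Approximate cost of describing `target` once `context` is known,
    estimated as C(context + target) - C(context)."""
    return description_length(context + target) - description_length(context)


def description_length_reduction(target: bytes, context: bytes) -> int:
    """Bytes saved on `target` by knowing `context`; larger means more reuse."""
    return description_length(target) - conditional_length(target, context)


# Hypothetical task descriptions, used only to exercise the functions.
novel_task = (
    b"recolor every object that touches the border, "
    b"then mirror the grid left to right"
)
related_experience = (
    b"mirror the grid left to right, "
    b"then recolor every object that touches the border"
)
unrelated_experience = (
    b"count the blue cells and output that many red cells in a single row"
)

related_saving = description_length_reduction(novel_task, related_experience)
unrelated_saving = description_length_reduction(novel_task, unrelated_experience)
print(related_saving, unrelated_saving)
```

Experience that shares structure with the novel task should yield the larger saving. A general-purpose compressor only captures literal substring reuse, so this proxy says nothing about the abstract regularities the definition is meant to cover, which is one of the operationalization limits the authors refer to.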
Circularity Check
No circularity: formal AIT-based definition and benchmark guidelines are independent of fitted inputs or self-referential loops.
full rationale
The paper articulates a definition of intelligence as skill-acquisition efficiency drawing directly from established Algorithmic Information Theory concepts (scope, generalization difficulty, priors, experience) without any equations or derivations that reduce back to the paper's own data or assumptions by construction. Guidelines for benchmarks follow from this definition. ARC is presented as one implementation using an explicitly enumerated prior set chosen by the authors; while the claim of closeness to human priors is an unvalidated assumption rather than a derived result, it does not create a self-definitional loop, fitted-parameter prediction, or load-bearing self-citation chain. No steps match the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Intelligence can be formalized as skill-acquisition efficiency using concepts from Algorithmic Information Theory
- domain assumption: The priors in ARC are close to innate human priors
Lean theorems connected to this paper
-
Foundation/LawOfExistence: law_of_existence (echoes)
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
We then articulate a new formal definition of intelligence based on Algorithmic Information Theory, describing intelligence as skill-acquisition efficiency and highlighting the concepts of scope, generalization difficulty, priors, and experience.
-
Foundation/HierarchyEmergence: hierarchy_emergence_forces_phi (echoes)
built upon an explicit set of priors designed to be as close as possible to innate human priors
-
Foundation/DiscretenessForcing: discreteness_forcing_principle (echoes)
skill is heavily modulated by prior knowledge and experience: unlimited priors or unlimited training data allow experimenters to buy arbitrary levels of skills
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 40 Pith papers
-
Gradient-Based Program Synthesis with Neurally Interpreted Languages
NLI autonomously discovers a vocabulary of primitive operations and interprets variable-length programs via a neural executor, allowing end-to-end training and gradient-based test-time adaptation that outperforms prio...
-
Are Flat Minima an Illusion?
Flat minima are illusory; generalization is driven by weakness, a reparameterization-invariant measure of compatible completions that predicts performance better than sharpness on MNIST and Fashion-MNIST.
-
Test-Time Learning with an Evolving Library
EvoLib enables LLMs to accumulate, reuse, and evolve knowledge abstractions from inference trajectories at test time, yielding substantial gains on math reasoning, code generation, and agentic benchmarks without param...
-
Assessing the Creativity of Large Language Models: Testing, Limits, and New Frontiers
The Divergent Remote Association Test (DRAT) is the first creativity test that significantly predicts LLMs' scientific ideation ability, unlike prior tests such as DAT or RAT.
-
Prospective Compression in Human Abstraction Learning
Humans exhibit abstraction learning consistent with prospective compression of future tasks in non-stationary domains, unlike retrospective compression algorithms or LLM-based approaches.
-
When to Re-Commit: Temporal Abstraction Discovery for Long-Horizon Vision-Language Reasoning
State-conditioned commitment depth in a vision-language policy Pareto-dominates fixed-depth baselines on Sliding Puzzle and Sokoban, raising solve rates by up to 12.5 points while using 25% fewer actions and beating l...
-
Lattice Deduction Transformers
An 800K-parameter Lattice Deduction Transformer reaches 100% accuracy on Sudoku-Extreme and Snowflake Sudoku and 99.9% on Maze-Hard by using lattice projections and abstract-interpretation supervision, while frontier ...
-
Intervention Complexity as a Canonical Reward and a Measure of Intelligence
Intervention complexity provides a family of canonical rewards indexed by resource bias that completes the Legg-Hutter framework and enables a two-dimensional view of intelligence as competence plus learning efficiency.
-
AI scientists produce results without reasoning scientifically
LLM agents execute scientific tasks but fail to follow core scientific reasoning norms such as evidence consideration and belief revision based on refutations.
-
Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning
CoT-PoT ensembling achieves self-consistency accuracy in LLMs with only two samples for 78.6% of tasks, reducing computation by 9.3x compared to standard methods.
-
Yanasse: Finding New Proofs from Deep Vision's Analogies, Part 1
A domain-independent analogy engine transfers Lean tactic patterns from probability to representation theory, producing four new machine-verified proofs.
-
Wiring the 'Why': A Unified Taxonomy and Survey of Abductive Reasoning in LLMs
The paper delivers the first survey of abductive reasoning in LLMs, a unified two-stage taxonomy, a compact benchmark, and an analysis of gaps relative to deductive and inductive reasoning.
-
Stress-Testing the Reasoning Competence of LLMs With Proofs Under Minimal Formalism
ProofGrid is a new benchmark for LLM reasoning that uses machine-checkable proofs in minimal formal notation, revealing progress on basic tasks but major gaps in complex combinatorial and synthesis reasoning.
-
Factorization Regret mediates compositional generalization in latent space
Factorization Regret measures how latent variable interactions affect performance, and RCCs enable learning them to achieve compositional generalization in partially observable tasks.
-
Less is More: Recursive Reasoning with Tiny Networks
TRM with 7M parameters achieves 45% accuracy on ARC-AGI-1 and 8% on ARC-AGI-2, surpassing most LLMs with under 0.01% of their parameters.
-
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
A recurrent-depth architecture enables language models to improve reasoning performance by iterating computation in latent space, achieving gains equivalent to much larger models on benchmarks.
-
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
LLMs display high variance and major accuracy drops on GSM-Symbolic variants of grade-school math problems, indicating they replicate training patterns rather than execute logical reasoning.
-
The Evaluation Trap: Benchmark Design as Theoretical Commitment
AI benchmarks trap progress by operationalizing assumptions that redefine capabilities around the benchmarks themselves, and Epistematics provides an audit procedure to detect when evaluations cannot discriminate clai...
-
The Generalized Turing Test: A Foundation for Comparing Intelligence
The Generalized Turing Test defines relative intelligence as the inability of one agent to distinguish an imitator from the original through interaction.
-
Forge: Quality-Aware Reinforcement Learning for NP-Hard Optimization in LLMs
OPT-BENCH trains LLMs on NP-hard optimization via quality-aware RLVR, achieving 93.1% success rate and 46.6% quality ratio on Qwen2.5-7B while outperforming GPT-4o and transferring gains to other domains.
-
Continuous Latent Diffusion Language Model
Cola DLM proposes a hierarchical latent diffusion model that learns a text-to-latent mapping, fits a global semantic prior in continuous space with a block-causal DiT, and performs conditional decoding, establishing l...
-
Intervention Complexity as a Canonical Reward and a Measure of Intelligence
Intervention complexity provides a family of environment-derived universal rewards indexed by resource bias that completes the Legg-Hutter framework without external normative input.
-
One Step Forward and K Steps Back: Better Reasoning with Denoising Recursion Models
Denoising Recursion Models train multi-step noise reversal in looped transformers and outperform the prior Tiny Recursion Model on ARC-AGI.
-
Back into Plato's Cave: Examining Cross-modal Representational Convergence at Scale
Evidence for cross-modal representational convergence weakens substantially at scale and in realistic many-to-many settings, indicating models learn rich but distinct representations.
-
Representation-Guided Parameter-Efficient LLM Unlearning
REGLU guides LoRA-based unlearning via representation subspaces and orthogonal regularization to outperform prior methods on forget-retain trade-off in LLM benchmarks.
-
C-voting: Confidence-Based Test-Time Voting without Explicit Energy Functions
C-voting improves recurrent reasoning models by selecting among multiple latent trajectories the one with highest average top-1 probability, achieving 4.9% better Sudoku-hard accuracy than energy-based voting and outp...
-
ARC-AGI-3: A New Challenge for Frontier Agentic Intelligence
ARC-AGI-3 is a benchmark where humans solve 100% of tasks but frontier AI systems score below 1% as of March 2026, using efficiency-based scoring grounded in human baselines.
-
Video models are zero-shot learners and reasoners
Generative video models exhibit emergent zero-shot capabilities across perception, manipulation, and basic reasoning tasks.
-
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
High-entropy minority tokens drive RLVR gains, so restricting gradients to the top 20% maintains or improves performance over full updates on Qwen3 models, especially larger ones.
-
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Repeated sampling scales problem coverage log-linearly with sample count, improving SWE-bench Lite performance from 15.9% to 56% using 250 samples.
-
Deep Vision: A Formal Proof of Wolstenholmes Theorem in Lean 4
Wolstenholme's theorem is formally verified in Lean 4 via expansion of a shifted factorial product and vanishing power sums modulo p.
-
The Rise and Fall of $G$ in AGI
PCA on AI model benchmarks reveals a general intelligence factor that rises then falls as specialized reasoning models appear, inverting the expected move toward parsimonious mechanisms.
-
Kuramoto Oscillatory Phase Encoding: Neuro-inspired Synchronization for Improved Learning Efficiency
KoPE adds Kuramoto-based oscillatory phase states and synchronization to Vision Transformers, improving training, parameter, and data efficiency on structured vision tasks.
-
From Pixels to Digital Agents: An Empirical Study on the Taxonomy and Technological Trends of Reinforcement Learning Environments
An empirical literature analysis reveals a bifurcation in RL environments into Semantic Prior (LLM-dominated) and Domain-Specific Generalization ecosystems with distinct cognitive fingerprints.
-
Hierarchical Reasoning Model
HRM is a recurrent architecture with high-level planning and low-level execution modules that reaches near-perfect accuracy on complex Sudoku, maze navigation, and ARC benchmarks using 27M parameters and 1000 samples ...
-
Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models
The paper unifies perspectives on Long CoT in reasoning LLMs by introducing a taxonomy, detailing characteristics of deep reasoning and reflection, and discussing emergence phenomena and future directions.
-
The Agent Use of Agent Beings: Agent Cybernetics Is the Missing Science of Foundation Agents
Agent Cybernetics reframes foundation agent design by adapting classical cybernetics laws into three engineering desiderata for reliable, long-running, self-improving agents.
-
Measuring AI Reasoning: A Guide for Researchers
Reasoning in language models should be measured by the faithfulness and validity of their multi-step search processes and intermediate traces, not final-answer accuracy.
-
A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence
The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.
-
Auto-Relational Reasoning
A system using auto-relational reasoning solves IQ test problems at 98.03% rate without any prior knowledge, reaching top 1% human performance.
Reference graph
Works this paper leans on
-
[1]
I-athlon: Towards a mul- tidimensional turing test
Sam S Adams, Guruduth Banavar, and Murray Campbell. I-athlon: Towards a mul- tidimensional turing test. AI Magazine, (1):78–84, 2016
work page 2016
-
[2]
Anderson and Christian Lebiere
John R. Anderson and Christian Lebiere. The newell test for a theory of cognition. Behavioral and Brain Sciences, pages 587–601, 2003
work page 2003
- [3]
-
[4]
Minoru Asada et al. Cognitive developmental robotics: A survey.IEEE Transactions on Autonomous Mental Development, pages 12–34, 2009
work page 2009
-
[5]
ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst
Mayank Bansal, Alex Krizhevsky, and Abhijit Ogale. Chauffeurnet: Learn- ing to drive by imitating the best and synthesizing the worst. arXiv preprint arXiv:1812.03079, 2018
work page Pith review arXiv 2018
-
[6]
Bellemare, Yavar Naddaf, Joel Veness, and Michael Bowling
Marc G. Bellemare, Yavar Naddaf, Joel Veness, and Michael Bowling. The arcade learning environment: An evaluation platform for general agents. J. Artif. Int. Res., (1):253–279, May 2013
work page 2013
-
[7]
The animal-ai environment: Training and testing animal- like artificial cognition, 2019
Benjamin Beyret, Jos Hernndez-Orallo, Lucy Cheke, Marta Halina, Murray Shana- han, and Matthew Crosby. The animal-ai environment: Training and testing animal- like artificial cognition, 2019
work page 2019
-
[8]
Mthodes nouvelles pour le diagnostic du niveau intellectuel des anormaux
Alfred Binet and Thodore Simon. Mthodes nouvelles pour le diagnostic du niveau intellectuel des anormaux. L’anne psychologique, pages 191–244, 1904
work page 1904
-
[9]
What is artificial intelligence? psycho- metric ai as an answer
Selmer Bringsjord and Bettina Schimanski. What is artificial intelligence? psycho- metric ai as an answer. In Proceedings of the 18th International Joint Conference on Artificial Intelligence, IJCAI’03, pages 887–893, San Francisco, CA, USA, 2003. Morgan Kaufmann Publishers Inc
work page 2003
-
[10]
Sample-efficient reinforcement learning with stochastic ensemble value expansion, 2018
Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, and Honglak Lee. Sample-efficient reinforcement learning with stochastic ensemble value expansion, 2018
work page 2018
-
[11]
The 2005 DARPA Grand Chal- lenge: The Great Robot Race
Martin Buehler, Karl Iagnemma, and Sanjiv Singh. The 2005 DARPA Grand Chal- lenge: The Great Robot Race . Springer Publishing Company, Incorporated, 1st edition, 2007
work page 2005
-
[12]
Joseph Hoane, Jr., and Feng-hsiung Hsu
Murray Campbell, A. Joseph Hoane, Jr., and Feng-hsiung Hsu. Deep blue. Artif. Intell., (1-2):57–83, 2002
work page 2002
-
[13]
Raymond B. Cattell. Abilities: Their structure, growth, and action. 1971
work page 1971
-
[14]
G. Chaitin. Algorithmic Information Theory. Cambridge University Press, 1987. 58
work page 1987
-
[15]
A theory of program size formally identical to information theory
Gregory J Chaitin. A theory of program size formally identical to information theory. Journal of the ACM (JACM), (3):329–340, 1975
work page 1975
-
[16]
Francois Chollet. Deep Learning with Python. Manning Publications, 2017
work page 2017
-
[17]
Quantifying generalization in reinforcement learning
Karl Cobbe, Oleg Klimov, Christopher Hesse, Taehoon Kim, and John Schulman. Quantifying generalization in reinforcement learning. CoRR, 2018
work page 2018
-
[18]
Cultural perceptions of human intelligence
Ebinepre A Cocodia. Cultural perceptions of human intelligence. Journal of Intelli- gence, 2(4):180–196, 2014
work page 2014
-
[19]
L. Cosmides and J. Tooby. Origins of domain specificity: the evolution of functional organization. page 85116, 1994
work page 1994
-
[20]
Introduction to classical and modern test theory
Linda Crocker and James Algina. Introduction to classical and modern test theory. ERIC, 1986
work page 1986
- [21]
-
[22]
J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large- Scale Hierarchical Image Database. In CVPR09, 2009
work page 2009
-
[23]
D. K. Detterman. A challenge to watson. Intelligence, page 7778, 2011
work page 2011
-
[24]
T.G. Evans. A program for the solution of a class of geometric-analogy intelligence- test questions. pages 271–353, 1968
work page 1968
-
[25]
What is intelligence?: Beyond the Flynn effect
James R Flynn. What is intelligence?: Beyond the Flynn effect. Cambridge Univer- sity Press, 2007
work page 2007
-
[26]
Richard M Friedberg. A learning machine: Part i. IBM Journal of Research and Development, 2(1):2–13, 1958
work page 1958
-
[27]
Beyond the Turing Test (workshop), 2014
Manuela Veloso Gary Marcus, Francesca Rossi. Beyond the Turing Test (workshop), 2014
work page 2014
-
[28]
B. Goertzel and C. Pennachin, editors. Artificial general intelligence. Springer, New York, 2007
work page 2007
-
[29]
Intelligence and computer simulation
Bert F Green Jr. Intelligence and computer simulation. Transactions of the New York Academy of Sciences, 1964
work page 1964
-
[30]
Peter D. Gr ¨unwald and Paul M. B. Vit´anyi. Algorithmic information theory. 2008
work page 2008
-
[31]
Inductive programming meets the real world
Sumit Gulwani, Jos ´e Hern´andez-Orallo, Emanuel Kitzelmann, Stephen H Muggle- ton, Ute Schmid, and Benjamin Zorn. Inductive programming meets the real world. Communications of the ACM, 58(11):90–99, 2015
work page 2015
-
[32]
Sumit Gulwani, Alex Polozov, and Rishabh Singh. Program Synthesis. 2017
work page 2017
-
[33]
William H. Guss, Cayden Codel, Katja Hofmann, Brandon Houghton, Noburu Kuno, Stephanie Milani, Sharada Prasanna Mohanty, Diego Perez Liebana, Rus- lan Salakhutdinov, Nicholay Topin, Manuela Veloso, and Phillip Wang. The minerl competition on sample efficient reinforcement learning using human priors. CoRR, 2019. 59
work page 2019
-
[34]
R. Hambleton, H. Swaminathan, and H. Rogers. Fundamentals of Item Response Theory. Sage Publications, Inc., 1991
work page 1991
- [35]
-
[36]
Evaluation in artificial intelligence: from task-oriented to ability-oriented measurement
Jos ´e Hern ´andez-Orallo. Evaluation in artificial intelligence: from task-oriented to ability-oriented measurement. Artificial Intelligence Review, pages 397–447, 2017
work page 2017
-
[37]
The Measure of All Minds: Evaluating Natural and Artificial Intelligence
Jos ´e Hern´andez-Orallo. The Measure of All Minds: Evaluating Natural and Artificial Intelligence. Cambridge University Press, 2017
work page 2017
-
[38]
Jos ´e Hern´andez-Orallo and David L Dowe. Measuring universal intelligence: To- wards an anytime intelligence test.Artificial Intelligence, 174(18):1508–1539, 2010
work page 2010
-
[39]
Dowe, and M.Victoria Hern ´andez-Lloreda
Jos ´e Hern´andez-Orallo, David L. Dowe, and M.Victoria Hern ´andez-Lloreda. Uni- versal psychometrics. Cogn. Syst. Res., (C):50–74, March 2014
work page 2014
-
[40]
A formal definition of intelli- gence based on an intensional variant of algorithmic complexity
Jos ´e Hern ´andez-Orallo and Neus Minaya-Collado. A formal definition of intelli- gence based on an intensional variant of algorithmic complexity. 1998
work page 1998
-
[41]
G.E. Hinton. How neural networks learn from experience. Mind and brain: Read- ings from the Scientific American magazine, page 113124, 1993
work page 1993
-
[42]
Human Nature: or The fundamental Elements of Policie
Thomas Hobbes. Human Nature: or The fundamental Elements of Policie. 1650
-
[43]
Universal artificial intelligence: Sequential decisions based on al- gorithmic probability
Marcus Hutter. Universal artificial intelligence: Sequential decisions based on al- gorithmic probability. Springer Science & Business Media, 2004
work page 2004
-
[44]
D.L. Dowe J. Hernndez-Orallo. Iq tests are not for machines, yet. Intelligence, page 7781, 2012
work page 2012
-
[45]
Predicting the generalization gap in deep networks with margin distributions
Yiding Jiang, Dilip Krishnan, Hossein Mobahi, and Samy Bengio. Predicting the generalization gap in deep networks with margin distributions. ArXiv, 2018
work page 2018
-
[46]
Measuring the tendency of cnns to learn surface sta- tistical regularities
Jason Jo and Yoshua Bengio. Measuring the tendency of cnns to learn surface sta- tistical regularities. ArXiv, 2017
work page 2017
-
[47]
Raven J. John. Raven Progressive Matrices. Springer, Boston, MA, 2003
work page 2003
-
[48]
Wendy Johnson and Thomas J.Bouchard Jr. The structure of human intelligence: It is verbal, perceptual, and image rotation (vpr), not fluid and crystallized. Intelligence, pages 393–416, 2005
work page 2005
-
[49]
Arthur Juliani, Ahmed Khalifa, Vincent-Pierre Berges, Jonathan Harper, Ervin Teng, Hunter Henry, Adam Crespi, Julian Togelius, and Danny Lange. Obstacle tower: A generalization challenge in vision, control, and planning. Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, Aug 2019
-
[50]
Niels Justesen, Ruben Rodriguez Torrado, Philip Bontrager, Ahmed Khalifa, Julian Togelius, and Sebastian Risi. Illuminating generalization in deep reinforcement learning through procedural level generation. arXiv preprint arXiv:1806.10729, 2018.
-
[51]
Brenden M. Lake, Tomer D. Ullman, Joshua B. Tenenbaum, and Samuel J. Gershman. Building machines that learn and think like people. CoRR, 2016
-
[52]
Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436, 2015
-
[53]
Shane Legg and Marcus Hutter. A collection of definitions of intelligence. 2007
-
[54]
Shane Legg and Marcus Hutter. Universal intelligence: A definition of machine intelligence. Minds and Machines, 17(4):391–444, 2007
-
[55]
Ming Li, Paul Vitányi, et al. An introduction to Kolmogorov complexity and its applications, volume 3. Springer
-
[56]
John Locke. An Essay Concerning Human Understanding. 1689
-
[57]
James Macgregor and Yun Chu. Human performance on the traveling salesman and related problems: A review. The Journal of Problem Solving, 3, 02 2011
-
[58]
James Macgregor and Thomas Ormerod. Human performance on the traveling salesman problem. Perception & Psychophysics, 58:527–39, 06 1996
-
[59]
Gary Marcus. Deep learning: A critical appraisal. arXiv preprint arXiv:1801.00631, 2018
-
[60]
John McCarthy. Generality in artificial intelligence. Communications of the ACM, 30(12):1030–1035, 1987
-
[61]
Pamela McCorduck. Machines Who Think: A Personal Inquiry into the History and Prospects of Artificial Intelligence. AK Peters Ltd, 2004
-
[62]
Kevin McGrew. The Cattell-Horn-Carroll theory of cognitive abilities: Past, present, and future. Contemporary Intellectual Assessment: Theories, Tests, and Issues, 01 2005
- [63]
-
[64]
May-Britt Moser, David C Rowland, and Edvard I Moser. Place cells, grid cells, and memory. Cold Spring Harbor Perspectives in Biology, 7(2):a021808, 2015
-
[65]
Shane Mueller, Matt Jones, Brandon Minnery, Ph Julia, and M Hiland. The BICA cognitive decathlon: A test suite for biologically-inspired cognitive agents. Proceedings of the 16th Conference on Behavior Representation in Modeling and Simulation, 2007
-
[66]
A. Newell. You can't play 20 questions with nature and win: Projective comments on the papers of this symposium. 1973
-
[67]
Behnam Neyshabur, Srinadh Bhojanapalli, David McAllester, and Nati Srebro. Exploring generalization in deep learning. In Advances in Neural Information Processing Systems, pages 5947–5956, 2017
-
[68]
Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvári, Satinder Singh, et al. Behaviour suite for reinforcement learning. arXiv preprint arXiv:1908.03568, 2019.
-
[69]
P. R. Cohen and A. E. Howe. How evaluation guides AI research: The message still counts more than the medium. AI Magazine, page 35, 1988
-
[70]
Charles Packer, Katelyn Gao, Jernej Kos, Philipp Krähenbühl, Vladlen Koltun, and Dawn Xiaodong Song. Assessing generalization in deep reinforcement learning. ArXiv, 2018
-
[71]
Diego Perez-Liebana, Katja Hofmann, Sharada Prasanna Mohanty, Noboru Sean Kuno, Andre Kramer, Sam Devlin, Raluca D. Gaina, and Daniel Ionita. The multi-agent reinforcement learning in MalmÖ (MARLÖ) competition. Technical report, 2019
-
[72]
Diego Perez-Liebana, Jialin Liu, Ahmed Khalifa, Raluca D Gaina, Julian Togelius, and Simon M Lucas. General video game AI: A multi-track framework for evaluating agents, games and content generation algorithms. arXiv preprint arXiv:1802.10363, 2018
-
[73]
Joelle Pineau. Reproducible, Reusable, and Robust Reinforcement Learning, 2018. Neural Information Processing Systems
-
[74]
S. Pinker. The blank slate: The modern denial of human nature. Viking, New York, 2002
-
[75]
David M. W. Powers. The total Turing test and the Loebner Prize. In New Methods in Language Processing and Computational Natural Language Learning, 1998
- [76]
- [77]
- [78]
-
[79]
D. E. Rumelhart and J. L. McClelland. Distributed memory and the representation of general and specific information. Journal of Experimental Psychology, pages 159–188, 1985
-
[80]
P. Sanghi and D. L. Dowe. A computer program capable of passing IQ tests. pages 570–575, 2003