Path-dependent program induction under resource constraints explains human sequence learning

Charley M. Wu; David G. Nagy; Hanqi Zhou; Peter Dayan

arxiv: 2606.20623 · v1 · pith:3HZI44ERnew · submitted 2026-05-26 · cs.AI · cs.CL · cs.LG · cs.PL

Pith reviewed 2026-06-29 17:42 UTC · model grok-4.3

classification cs.AI · cs.CL · cs.LG · cs.PL

keywords program induction · sequence learning · adaptor grammar · rate-distortion theory · resource constraints · human cognition · melodic sequences · hierarchical libraries

The pith

Hierarchical libraries under memory and computation constraints explain human sequence learning as path-dependent program induction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper integrates rate-distortion theory with program induction to model how people build reusable knowledge from sequential experience despite limited cognitive resources. It proposes a hierarchical Adaptor Grammar that maintains separate local and global libraries, allowing abstractions to depend on the order of encountered tasks. Simulations show this approach yields superior rate-distortion trade-offs and generalization compared to fixed grammars or simple chunking. In a human experiment with melodic sequences, recall errors aligned with model-inferred simplifications and reaction times rose at program boundaries. Trial-by-trial model fits indicated that the hierarchical structure best captured individual variations in both recall performance and choices for continuing sequences.

Core claim

A hierarchical Adaptor Grammar with local and global libraries, jointly constrained by memory and computation, formalizes path-dependent program induction and accounts for human sequence learning, as evidenced by matching recall errors to systematic simplifications, reaction times to inferred boundaries, and superior fits to individual differences over alternative models.

What carries the argument

Hierarchical Adaptor Grammar (HAG) with distinct local (within-task) and global (across-task) libraries governed jointly by constraints on memory and computation; it makes some future structures cheaper to encode based on the order of prior experience.
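
To make the path-dependence idea concrete, here is a minimal toy sketch. It is not the paper's HAG: it assumes a single flat library with a fixed capacity (a stand-in for the memory constraint), a greedy left-to-right encoder, and a learner that stores the most frequent repeated bigram of each task while room remains. The names `encode`, `learn`, and `capacity`, and the letter sequences, are all hypothetical.

```python
from collections import Counter

def encode(seq, library):
    """Greedy left-to-right encoding; each library chunk costs one symbol."""
    out, i = [], 0
    chunks = sorted(library, key=len, reverse=True)  # try longer chunks first
    while i < len(seq):
        for chunk in chunks:
            if seq[i:i + len(chunk)] == chunk:
                out.append(chunk)
                i += len(chunk)
                break
        else:
            # No chunk matched at position i: emit the raw symbol.
            out.append(seq[i])
            i += 1
    return out

def learn(tasks, capacity):
    """Visit tasks in order; store one repeated bigram per task while room remains."""
    library = []
    for seq in tasks:
        if len(library) < capacity and len(seq) > 1:
            bigrams = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
            best, count = bigrams.most_common(1)[0]
            if count > 1 and best not in library:
                library.append(best)
    return library

# Hypothetical tasks and probe; capacity=1 stands in for a tight memory budget.
lib_ab_first = learn(["abababab", "cdcdcdcd"], capacity=1)
lib_cd_first = learn(["cdcdcdcd", "abababab"], capacity=1)
probe = "abababcd"
cost_ab_first = len(encode(probe, lib_ab_first))
cost_cd_first = len(encode(probe, lib_cd_first))
```

With a capacity of one, whichever task comes first fills the library, so the same probe costs 5 symbols under one ordering and 7 under the other. That is the sense in which earlier experience makes some later structures cheaper to encode.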

If this is right

Simulations demonstrate better rate-distortion trade-offs and stronger generalization than fixed grammars or shallow chunking methods.
Recall errors in the melodic task reflect systematic simplifications predicted by the model.
Reaction times increase at the boundaries of programs inferred by the hierarchical grammar.
Hierarchical libraries best explain individual differences in both recall accuracy and out-of-sample continuation choices.
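
The boundary prediction above can be illustrated with a simple descriptive contrast. The Figure 4 caption indicates that the paper uses mixed-effects regression on log reaction time; the sketch below swaps in a much simpler statistic, the difference in mean log RT between notes at model-inferred boundaries and notes inside a program. The function name `boundary_effect`, the reaction times, and the boundary labels are made up for illustration.

```python
import math
from statistics import mean

def boundary_effect(rts_ms, is_boundary):
    """Mean log RT at boundary notes minus mean log RT at within-program notes."""
    log_rts = [math.log(rt) for rt in rts_ms]
    at_boundary = [x for x, b in zip(log_rts, is_boundary) if b]
    within = [x for x, b in zip(log_rts, is_boundary) if not b]
    return mean(at_boundary) - mean(within)

# Hypothetical note-level reaction times (ms) and inferred boundary flags.
rts_ms = [620, 410, 395, 640, 420, 400, 655, 430]
is_boundary = [True, False, False, True, False, False, True, False]
effect = boundary_effect(rts_ms, is_boundary)
```

A positive `effect` means responses are slower at inferred boundaries. A full analysis would also need to control for note position and participant-level variation, which this contrast ignores.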

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the account holds, deliberately ordering the sequence of learning experiences could improve abstraction formation when memory and computation are limited.
The same path-dependent mechanism could apply to other sequential domains such as acquiring syntax or motor skills.
The model predicts specific performance breakdowns when the joint memory-computation constraint is exceeded, offering a way to test resource limits directly.

Load-bearing premise

Participants' recall errors and reaction times directly reflect the program boundaries and simplifications inferred by the specific HAG parameterization under local versus global libraries.

What would settle it

A new melodic sequence experiment in which human recall error patterns and reaction time increases fail to align with the specific program boundaries or simplifications predicted by the hierarchical model, or where non-hierarchical models fit the trial-by-trial data as well or better.

Figures

Figures reproduced from arXiv: 2606.20623 by Charley M. Wu, David G. Nagy, Hanqi Zhou, Peter Dayan.

**Figure 1.** Overview. (a) Hierarchical Adaptor Grammar (HAG) model describes sequence learning as inference over programs under dual resource constraints. An observed sequence X (left) is encoded as a generative program π. A memory constraint sets the representational budget (i.e., rate) for the final program, with β controlling how much shorter programs are preferred over minimizing distortion. A computational constr…

**Figure 2.** Simulation results for the melodic sequence learning task. We study how memory (rate), computation (search and backtracking), and accuracy (distortion) interact over the course of learning. (a) Loss objectives (rate + distortion) for training sequences and one-shot generalization to held-out sequences using learned libraries. (b) Evolution of loss and subsequence length per subprogram as training data incr…

**Figure 3.** (a) Schematic illustration of the four error categories: vertical shifts, temporal shifts, intrusions from earlier notes (RLE) or segments (Chunking), and simplification bias (i.e., corresponds to a simpler program). (b) The histogram of error rates across participants in Phase 2. Participants on average (Pop.) perform well above chance (Rand.) and first-order Markovian model (Stat.). (c) Frequency of dist…

**Figure 4.** Program-based predictors explain human recall and reaction-time behavior. (a) Left, mean logRT across relative note position within each segment. Right, distribution of participant-level average chunk size inferred by the reaction time statistics. (b) Program-boundary effects on note-level RT. Left, mixed-effects regression coefficients predicting logRT from model-predicted boundary probability, controllin…

**Figure 5.** Model fitting, comparison, and parameter–behavior relationships. (a) Model comparison accounting for complexity, shown as relative ∆BIC values with respect to the best model for each participant. (b) Model comparison using protected exceedance probability (PXP), indicating the likelihood that each model is the most frequent generative model in the population. (c) Model-based predictions in the Phase 3. Par…
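
The Figure 5 caption refers to relative ∆BIC values with respect to the best model for each participant. As a rough illustration of that quantity, the sketch below uses the standard definition BIC = k·ln(n) − 2·ln(L̂) and subtracts the lowest BIC among the candidates. The log-likelihoods, the trial count, and the parameter counts of the alternative models are hypothetical; only the two free parameters for HAG come from the text on this page.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

def delta_bic(fits, n_obs):
    """BIC of each model relative to the best (lowest-BIC) model."""
    scores = {name: bic(ll, k, n_obs) for name, (ll, k) in fits.items()}
    best = min(scores.values())
    return {name: score - best for name, score in scores.items()}

# Hypothetical fits for one participant: name -> (log-likelihood, free parameters).
fits = {
    "HAG": (-120.0, 2),
    "chunking": (-128.0, 1),
    "fixed grammar": (-135.0, 1),
}
relative = delta_bic(fits, n_obs=100)
```

The best model gets a ∆BIC of zero and every other model a positive value. Because the penalty grows with the number of free parameters, a model with more parameters has to earn its place through a higher likelihood.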

Original abstract

How do people build abstract, reusable knowledge from sequential experience under bounded cognitive resources? To answer this question, we integrate rate-distortion theory with recent advances in program induction to describe how prior knowledge shapes which future structures are cheap to encode and easy to discover. We formalize this in a hierarchical Adaptor Grammar (HAG) with distinct local (within-task) and global (across-task) libraries, governed jointly by constraints on memory and computation. In simulations, HAG achieves better rate-distortion trade-offs and stronger generalization than fixed grammars or shallow chunking methods. In an online melodic sequence-learning experiment, participants' recall errors reflected systematic simplifications and reaction times increased at inferred program boundaries. Trial-by-trial fits further showed that hierarchical libraries best explained individual differences in both recall and out-of-sample continuation choices, outperforming all alternative models. These findings cast structured learning as bounded program induction in which the order of experience shapes future abstractions a learner builds.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper combines rate-distortion constraints with a hierarchical adaptor grammar that keeps separate local and global libraries to model path-dependent sequence learning, and the fits to human recall plus continuation data look internally consistent.

The main point is that this work shows how memory and computation limits can drive the growth of reusable abstractions in sequences through a path-dependent program induction process. The model uses a hierarchical adaptor grammar with distinct local and global libraries, and it produces better rate-distortion trade-offs than fixed grammars or shallow chunking in simulations.

What the paper does well is tie the formal setup to measurable behavior. In the melodic sequence task, recall errors line up with the simplifications the model infers, reaction times rise at the program boundaries it identifies, and trial-by-trial fits capture individual differences in both recall and out-of-sample continuation choices. The parameter count stays low at two constraint values, which keeps the account parsimonious.

The soft spots are limited. The mapping from model states to error patterns and reaction times rests on the assumption that participants' mistakes directly reflect the program's boundaries and compressions; that link could be sensitive to grammar parameterization choices. The out-of-sample continuation claim is presented as a strength, but the exact separation between fitting and prediction steps would benefit from explicit checks in review. No internal contradiction or hidden circularity appears in the reported results.

This paper is aimed at cognitive scientists and AI researchers who work on bounded rationality and program induction. Readers already thinking about rate-distortion or adaptor grammars will see a concrete extension to path dependence and individual differences.

It deserves peer review. The core construction is coherent, the empirical targets are clear, and the model comparison is falsifiable enough to warrant referee time.

Referee Report

2 major / 1 minor

Summary. The paper claims that human sequence learning arises from path-dependent program induction under joint memory and computation constraints, formalized as a hierarchical adaptor grammar (HAG) with distinct local and global libraries. Simulations demonstrate superior rate-distortion trade-offs relative to fixed grammars or shallow chunking; a melodic sequence-learning experiment shows that participants' recall errors reflect systematic simplifications, reaction times rise at inferred program boundaries, and trial-by-trial fits indicate that hierarchical libraries best account for individual differences in both recall and out-of-sample continuation choices.

Significance. If the model-comparison results hold after fuller methodological disclosure, the work supplies a unified computational account that links rate-distortion theory, program induction, and resource bounds to explain how sequential experience shapes reusable abstractions, offering a concrete alternative to purely statistical or chunk-based models of structured learning.

major comments (2)

[Abstract; human-experiment results section] The abstract and the description of the trial-by-trial fits provide no information on how the two free parameters (memory and computation constraints) are estimated, what data-exclusion criteria were applied, or whether error bars reflect participant or model variability; without these details the claim that HAG outperforms alternatives cannot be evaluated for robustness.
[Trial-by-trial fits paragraph] Model selection and explanatory claims both rely on fits to the same recall and continuation data; although continuation is described as out-of-sample, the absence of an explicit held-out partition or pre-registered cross-validation procedure leaves open the possibility that the reported superiority of hierarchical libraries partly reflects the fitting process itself rather than independent predictive success.

minor comments (1)

All model-comparison figures should include participant-level variability or bootstrap intervals and should label the exact number of free parameters used for each alternative model.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on methodological transparency. We address each point below and will revise the manuscript to improve clarity on parameter estimation, data handling, and the out-of-sample evaluation.

Referee: [Abstract; human-experiment results section] The abstract and the description of the trial-by-trial fits provide no information on how the two free parameters (memory and computation constraints) are estimated, what data-exclusion criteria were applied, or whether error bars reflect participant or model variability; without these details the claim that HAG outperforms alternatives cannot be evaluated for robustness.

Authors: We agree that these details are necessary to evaluate robustness. In the revised manuscript we will add a methods subsection that describes how the memory and computation constraints were estimated (grid search over a discrete range of values, with the combination maximizing recall likelihood selected per participant), lists the data-exclusion criteria (participants removed if they failed attention checks or completed fewer than 80% of trials), and states that all reported error bars are standard errors across participants. These additions will allow direct assessment of the model-comparison results.
Revision: yes

Referee: [Trial-by-trial fits paragraph] Model selection and explanatory claims both rely on fits to the same recall and continuation data; although continuation is described as out-of-sample, the absence of an explicit held-out partition or pre-registered cross-validation procedure leaves open the possibility that the reported superiority of hierarchical libraries partly reflects the fitting process itself rather than independent predictive success.

Authors: The continuation choices were held out from parameter estimation; models were fit only to recall data and then evaluated on continuation without refitting. We acknowledge, however, that the manuscript does not explicitly describe the partition or any validation steps. We will revise the text to state the exact held-out procedure (last block of trials per participant reserved for continuation) and note that the split was determined by experimental design rather than pre-registration. If the data permit, we will also report a supplementary cross-validation check to further substantiate the predictive claims.
Revision: partial
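
The procedure described in the two responses (per-participant grid search on recall data, then evaluation on held-out continuation data without refitting) can be sketched as follows. This is only an illustration under stated assumptions: `toy_predict` is a made-up response model, the grids and trials are invented, and nothing here reproduces the paper's actual likelihood.

```python
import itertools
import math

def toy_predict(params, trial):
    """Hypothetical response model: probability of the observed outcome."""
    memory, compute = params
    difficulty, correct = trial
    p_correct = 1.0 / (1.0 + math.exp(-(memory - difficulty / compute)))
    return p_correct if correct else 1.0 - p_correct

def log_likelihood(params, trials):
    """Sum of log probabilities of the observed outcomes."""
    return sum(math.log(toy_predict(params, t)) for t in trials)

def fit_by_grid(recall_trials, memory_grid, compute_grid):
    """Pick the grid point that maximizes the recall log-likelihood."""
    grid = itertools.product(memory_grid, compute_grid)
    return max(grid, key=lambda p: log_likelihood(p, recall_trials))

# Hypothetical grids and trials: (difficulty, answered correctly).
memory_grid = [0.5, 1.0, 2.0, 4.0]
compute_grid = [0.5, 1.0, 2.0]
recall_trials = [(1.0, True), (2.0, True), (3.0, False), (2.5, True), (3.5, False)]
continuation_trials = [(1.5, True), (3.0, False)]

# Fit on recall only, then score continuation with the parameters frozen.
best_params = fit_by_grid(recall_trials, memory_grid, compute_grid)
heldout_ll = log_likelihood(best_params, continuation_trials)
```

The key design point is that `continuation_trials` never enters `fit_by_grid`, so `heldout_ll` measures prediction rather than fit. That separation is what the referee asks the authors to document.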

Circularity Check

0 steps flagged

No significant circularity identified

The abstract and described claims rest on rate-distortion integration into HAG, simulations demonstrating superior trade-offs, and empirical model comparison with out-of-sample continuation choices. No quoted equations or steps reduce by construction to inputs, self-citations, or fitted parameters renamed as predictions. Trial-by-trial fits against alternatives with explicit out-of-sample elements constitute standard validation rather than circular reduction. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

Based on the abstract only. The central claim rests on the modeling choice that human sequence learning is bounded program induction whose libraries are shaped by rate-distortion trade-offs. The abstract does not enumerate free parameters, axioms, or invented entities explicitly, so the entries below are inferred from it.

free parameters (2)

memory constraint parameter
Sets the representational budget (rate) for the final program, i.e., how strongly shorter programs are preferred over lower distortion (β in the Figure 1 caption); value not stated in abstract.
computation constraint parameter
Limits the search and backtracking available during program induction; value not stated in abstract.
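
As an illustration of how a rate-distortion trade-off weight can change which program is selected, here is a minimal sketch. It assumes a loss of the form distortion + β·rate, chosen so that a larger β favors shorter programs, which matches the description of β in the Figure 1 caption. The exact objective used in the paper is not given on this page, and the candidate programs and their numbers are hypothetical.

```python
def select_program(candidates, beta):
    """Return the candidate minimizing distortion + beta * rate (assumed loss)."""
    return min(candidates, key=lambda c: c["distortion"] + beta * c["rate"])

# Hypothetical candidate encodings of one melody:
# rate = description length, distortion = reconstruction error.
candidates = [
    {"name": "verbatim", "rate": 16.0, "distortion": 0.0},
    {"name": "repeat-motif", "rate": 6.0, "distortion": 2.0},
    {"name": "single-note", "rate": 2.0, "distortion": 9.0},
]

chosen = {beta: select_program(candidates, beta)["name"] for beta in (0.1, 0.5, 2.0)}
```

With these made-up numbers the choice moves from the verbatim encoding (β = 0.1) to the motif-based program (β = 0.5) to the crudest summary (β = 2.0). A tighter budget therefore produces systematically simpler reconstructions, in line with the simplification bias reported for recall errors.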

axioms (2)

domain assumption: Human sequence learning proceeds via program induction that can be formalized with adaptor grammars
Invoked to justify the HAG architecture as a model of cognition.
domain assumption: Rate-distortion theory provides the correct formalization of bounded cognitive resources for abstraction
Used to jointly constrain local and global libraries.


Rule, J. S.et al.Symbolic metaprogram search improves learning efficiency and explains rule learning in humans. Nat. Commun.15, 6847 (2024)

2024

[20] [20]

A.The Language of Thought(Harvard Univer- sity Press, 1975)

Fodor, J. A.The Language of Thought(Harvard Univer- sity Press, 1975)

1975

[21] [21]

& Griffiths, T

Lieder, F. & Griffiths, T. L. Resource-rational analysis: Understanding human cognition as the optimal use of limited computational resources.Behav. brain sciences 43, e1 (2020)

2020

[22] [22]

Simon, H. A. A behavioral model of rational choice.The quarterly journal economics99–118 (1955)

1955

[23] [23]

Shannon, C. E. Coding theorems for a discrete source with a fidelity criterion.IRE Nat. Conv. Rec4, 1 (1959)

1959

[24] [24]

Berger, T.Rate-distortion theory: A mathematical basis for data compression(Prentice-Hall, 1971)

1971

[25] [25]

Gershman, S. J. The rational analysis of memory. In Oxford Handbook of Human Memory(Oxford University Press, Oxford, UK, 2021)

2021

[26] [26]

Sims, C. R. Rate–distortion theory and human perception. Cognition152, 181–198 (2016)

2016

[27] [27]

G., Török, B

Nagy, D. G., Török, B. & Orbán, G. Optimal forget- ting: Semantic compression of episodic memories.PLoS Comput. Biol.16, e1008367 (2020)

2020

[28] [28]

Bates, C. J. & Jacobs, R. A. Efficient data compression in perception and perceptual memory.Psychol. Rev.127, 891 (2020)

2020

[29] [29]

& Gershman, S

Bhui, R. & Gershman, S. J. Decision by sampling im- plements efficient coding of psychoeconomic functions. Psychol. Rev.125, 985 (2018)

2018

[30] [30]

& Gershman, S

Lai, L. & Gershman, S. J. Policy compression: An in- formation bottleneck in action selection. InPsychology of Learning and Motivation, vol. 74, 195–232 (Elsevier, 2021)

2021

[31] [31]

B., Otto, F

Dekker, R. B., Otto, F. & Summerfield, C. Curriculum learning for human compositional generalization.Proc. Natl. Acad. Sci.119, e2205582119 (2022)

2022

[32] [32]

G., Orbán, G

Nagy, D. G., Orbán, G. & Wu, C. M. Adaptive compres- sion as a unifying framework for episodic and semantic memory.Nat. Rev. Psychol.1–15 (2025)

2025

[33] [33]

Fränken, J.-P., Theodoropoulos, N. C. & Bramley, N. R. Algorithms of adaptation in inductive inference.Cogn. Psychol.137, 101506 (2022)

2022

[34] [34]

& Lalinde-Pulido, J

Garcia-Valencia, S., Betancourt, A. & Lalinde-Pulido, J. G. Sequence generation using deep recurrent networks and embeddings: A study case in music.ArXiv e-prints abs/2012.0(2020). 2012.01231

work page arXiv 2012

[35] [35]

Run-length encodings (corresp.).IEEE trans- actions on information theory12, 399–401 (1966)

Golomb, S. Run-length encodings (corresp.).IEEE trans- actions on information theory12, 399–401 (1966)

1966

[36] [36]

Miller, G. A. The magical number seven, plus or minus two: Some limits on our capacity for processing informa- tion.Psychol. review63, 81 (1956)

1956

[37] [37]

& Rabinovich, M

Fonollosa, J., Neftci, E. & Rabinovich, M. Learning of chunking sequences in cognition and behavior.PLoS computational biology11, e1004592 (2015)

2015

[38] [38]

& Schulz, E

Wu, S., Éltet˝o, N., Dasgupta, I. & Schulz, E. Chunking as a rational solution to the speed–accuracy trade-off in a serial reaction time task.Sci. reports13, 7680 (2023)

2023

[39] [39]

J.Productivity and reuse in language: A theory of linguistic computation and storage(MIT Press, 2015)

O’Donnell, T. J.Productivity and reuse in language: A theory of linguistic computation and storage(MIT Press, 2015)

2015

[40] [40]

Liang, P., Jordan, M. I. & Klein, D. Learning programs: A hierarchical bayesian approach. InProceedings of the 27th international conference on machine learning (icml-10), 639–646 (2010)

2010

[41] [41]

Shannon, C. E. & Weaver, W.The Mathematical Theory of Communication(University of Illinois Press, 1949)

1949

[42] [42]

Zhou, H., Nagy, D. G. & Wu, C. M. Harmonizing program induction with rate-distortion theory (2024). 2405.05294

work page arXiv 2024

[43] [43]

A.et al.Universality and diversity in human song.Science366, eaax0868 (2019)

Mehr, S. A.et al.Universality and diversity in human song.Science366, eaax0868 (2019)

2019

[44] [44]

& Wheatley, T

Sievers, B., Polansky, L., Casey, M. & Wheatley, T. Music and movement share a dynamic structure that supports universal expressions of emotion.Proc. national academy sciences110, 70–75 (2013)

2013

[45] [45]

Fitch, W. T. & Martins, M. D. Hierarchical processing in music, language, and action: Lashley revisited.Annals New York Acad. Sci.1316, 87–104 (2014)

2014

[46] [46]

E., Friston, K

Rigoux, L., Stephan, K. E., Friston, K. J. & Daunizeau, J. Bayesian model selection for group studies—revisited. Neuroimage84, 971–985 (2014)

2014

[47] [47]

& Tenenbaum, J

Kemp, C. & Tenenbaum, J. B. The discovery of structural form.Proc. Natl. Acad. Sci.105, 10687–10692 (2008)

2008

[48] [48]

S., Schulz, E., Piantadosi, S

Rule, J. S., Schulz, E., Piantadosi, S. T. & Tenenbaum, J. B. Learning list concepts through program induction. BioRxiv321505 (2018)

2018

[49] [49]

InPro- ceedings of the 42nd ACM SIGPLAN Conference on Pro- gramming Language Design and Implementation, 835– 850 (2021)

Ellis, K.et al.Dreamcoder: Bootstrapping inductive pro- gram synthesis with wake-sleep library learning. InPro- ceedings of the 42nd ACM SIGPLAN Conference on Pro- gramming Language Design and Implementation, 835– 850 (2021)

2021

[50] [50]

F., Konkle, T

Brady, T. F., Konkle, T. & Alvarez, G. A. Compression in visual working memory: using statistical regularities to form more efficient memory representations.J. Exp. Psychol. Gen.138, 487 (2009). 20/32

2009

[51] [51]

& Griffiths, T

Reali, F. & Griffiths, T. L. The evolution of frequency distributions: Relating regularization to inductive bi- ases through iterated learning.Cognition111, 317–328 (2009)

2009

[52] [52]

& Legendre, G

Culbertson, J., Smolensky, P. & Legendre, G. Learning biases predict a word order universal.Cognition122, 306–329 (2012)

2012

[53] [53]

& Steyvers, M

Hemmer, P. & Steyvers, M. A bayesian account of re- constructive memory.Top. cognitive science1, 189–202 (2009)

2009

[54] [54]

Lashley, K. S.et al. The problem of serial order in behavior, vol. 21 (Bobbs-Merrill Oxford, 1951)

1951

[55] [55]

A., Cohen, R

Rosenbaum, D. A., Cohen, R. G., Jax, S. A., Weiss, D. J. & Van Der Wel, R. The problem of serial order in be- havior: Lashley’s legacy.Hum. movement science26, 525–554 (2007)

2007

[56] [56]

& Hikosaka, O

Sakai, K., Kitaguchi, K. & Hikosaka, O. Chunking dur- ing human visuomotor sequence learning.Exp. brain research152, 229–242 (2003)

2003

[57] [57]

Verwey, W. B. & Dronkert, Y . Practicing a structured continuous key-pressing task: Motor chunking or rhythm consolidation?J. motor behavior28, 71–79 (1996)

1996

[58] [58]

& Collins, A

Xia, L. & Collins, A. G. Temporal and state abstractions for efficient learning, transfer, and composition in humans. Psychol. review128, 643 (2021)

2021

[59] [59]

E., Nerb, J., Lehtinen, E

Ritter, F. E., Nerb, J., Lehtinen, E. & O’Shea, T. M.In order to learn: How the sequence of topics influences learning(Oxford University Press, 2007)

2007

[60] [60]

Bruner, J. S. Organization of early skilled action.Child development1–11 (1973)

1973

[61] [61]

& Viding, E

Dayan, P., Roiser, J. & Viding, E. The first steps on long marches: The costs of active observation. InPsychiatry Reborn: Biopsychosocial psychiatry in modern medicine, 213–228 (Oxford University Press, 2020)

2020

[62] [62]

Lerdahl, F.Tonal pitch space(Oxford University Press, 2001)

2001

[63] [63]

London, J.Hearing in time: Psychological aspects of musical meter(Oxford University Press, 2012)

2012

[64] [64]

Music performance.Annu

Palmer, C. Music performance.Annu. review psychology 48, 115–138 (1997)

1997

[65] [65]

Repp, B. H. Quantitative effects of global tempo on ex- pressive timing in music performance: Some perceptual evidence.Music. perception13, 39–57 (1995)

1995

[66] [66]

Pearce, M. T. Statistical learning and probabilistic pre- diction in music cognition: mechanisms of stylistic en- culturation.Annals new York Acad. Sci.1423, 378–395 (2018)

2018

[67] [67]

Zhou, H., Bamler, R., Wu, C. M. & Tejero-Cantero, Á. Predictive, scalable and interpretable knowledge tracing on structured domains. InThe Twelfth Interna- tional Conference on Learning Representations, DOI: 10.48550/arXiv.2403.13179 (2024)

work page doi:10.48550/arxiv.2403.13179 2024

[68] [68]

Bowers, M.et al.Top-down synthesis for library learning. Proc. ACM on Program. Lang.7, 1182–1213 (2023)

2023

[69] [69]

Rubino, V ., Dayan, P. & Wu, C. M. Simplicity guides the discovery and use of compositionality.PsyArXivDOI: 10.31234/osf.io/25pha_v1 (2026)

work page doi:10.31234/osf.io/25pha_v1 2026

[70] [70]

& Goldwater, S

Johnson, M., Griffiths, T. & Goldwater, S. Adaptor gram- mars: A framework for specifying compositional non- parametric bayesian models.Adv. Neural Inf. Process. Syst.19(2006)

2006

[71] [71]

Teh, Y . W. A hierarchical bayesian language model based on pitman-yor processes. InProceedings of the 21st Inter- national Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computa- tional Linguistics, 985–992 (2006)

2006

[72] [72]

D., Tenenbaum, J

Goodman, N. D., Tenenbaum, J. B., Feldman, J. & Grif- fiths, T. L. A rational analysis of rule-based concept learning.Cogn. Sci.32, 108–154 (2008)

2008

[73] [73]

T., Tenenbaum, J

Piantadosi, S. T., Tenenbaum, J. B. & Goodman, N. D. Bootstrapping in a language of thought: A formal model of numerical concept learning.Cognition123, 199–217 (2012)

2012

[74] [74]

G.et al.Exploring the hierarchical structure of human plans via program generation.arXiv preprint arXiv:2311.18644(2023)

Correa, C. G.et al.Exploring the hierarchical structure of human plans via program generation.arXiv preprint arXiv:2311.18644(2023)

work page arXiv 2023

[75] [75]

Über die Bausteine der mathematis- chen Logik.Math

Schönfinkel, M. Über die Bausteine der mathematis- chen Logik.Math. Annalen92, 305–316, DOI: 10.1007/ BF01448013 (1924)

1924

[76] [76]

D., Vitányi, P.et al.Algorithmic informa- tion theory (2008)

Grünwald, P. D., Vitányi, P.et al.Algorithmic informa- tion theory (2008)

2008

[77] [77]

D.The Minimum Description Length Prin- ciple(MIT press, 2007)

Grünwald, P. D.The Minimum Description Length Prin- ciple(MIT press, 2007)

2007

[78] [78]

New efficient algorithms for multiple change-point detection with kernels

Celisse, A., Marot, G., Pierre-Jean, M. & Rigaill, G. New efficient algorithms for multiple change-point detection with kernels.arXiv preprint arXiv:1710.04556(2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017

[79] [79]

& Blei, D

Kucukelbir, A., Tran, D., Ranganath, R., Gelman, A. & Blei, D. M. Automatic differentiation variational infer- ence.J. machine learning research18, 1–45 (2017)

2017

[80] [80]

The magical mystery four: How is working memory capacity limited, and why?Curr

Cowan, N. The magical mystery four: How is working memory capacity limited, and why?Curr. directions psychological science19, 51–57 (2010)

2010