pith. sign in

arxiv: 2604.03246 · v1 · submitted 2026-03-09 · 💻 cs.CY · cs.AI

Personalized AI Practice Replicates Learning Rate Regularity at Scale

Pith reviewed 2026-05-15 15:14 UTC · model grok-4.3

classification 💻 cs.CY cs.AI
keywords learning ratespersonalized learningknowledge componentsadditive factors modelAI in educationmastery learningeducational data mining
0
0 comments X

The pith

AI-automated practice replicates consistent learning rates seen in expert curricula at large scale.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether a fully automated AI system for generating educational content can reproduce the established finding that students learn at remarkably consistent rates even when their starting knowledge varies widely. Drawing on 1.8 million student interactions from the Campus AI platform, where Knowledge Components and exercises are generated automatically and then validated by human experts, the authors fit mixed-effects logistic regression models to track mastery. They observe substantial differences in initial knowledge across students but tight clustering in the number of practice opportunities needed to improve, with median mastery achieved in 7.22 opportunities—close to the 6.54 reported for hand-crafted expert curricula. The one-to-many mapping from generated components to exercises lets standard Additive Factors Models measure these parameters without manual cognitive modeling. The results indicate that science-grounded automated generation can deliver effective personalized learning at scale.

Core claim

Using mixed-effects logistic regression on 366k post-filtered student interactions, the study confirms that students display wide variation in initial knowledge (IQR = [2.78, 12.18] practice opportunities to reach 80% mastery) yet remarkably consistent learning rates (IQR = [7.01, 8.25] opportunities). Students reached 80% mastery in a median of 7.22 practice opportunities, comparable to the 6.54 reported for expert-designed curricula. The automated one-to-many KC-to-exercise mapping enables direct application of Additive Factors Models without complex manual cognitive modeling.

What carries the argument

Additive Factors Models applied to automatically generated Knowledge Components and exercises, which enable measurement of initial knowledge and learning rates via mixed-effects logistic regression on large interaction data.

If this is right

  • Automated content generation can scale personalized learning while preserving the observed regularity in learning rates.
  • Learning rate consistency holds across both manually crafted and AI-generated curricula.
  • Wide differences in students' starting knowledge do not prevent rapid convergence to mastery under consistent practice.
  • Expert validation of automatically generated components is sufficient to achieve mastery times close to those of fully manual designs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the pattern holds in other subjects, automated systems could substantially lower the cost of building high-quality personalized curricula.
  • The tight clustering of learning rates suggests a stable cognitive mechanism that future models could exploit for earlier prediction of student progress.
  • Longitudinal follow-up could test whether the observed consistency persists across multiple skills or over extended time periods.

Load-bearing premise

The automated generation of Knowledge Components and exercises, even after expert validation, produces measurements of initial knowledge and learning rate that are comparable to those from manually designed expert curricula without systematic bias.

What would settle it

A controlled experiment in which the same cohort of students uses both the automated system and an expert-designed curriculum in parallel, directly comparing measured learning rates and time to 80% mastery.

Figures

Figures reproduced from arXiv: 2604.03246 by Alex Tambellini, Allison McGrath, Christine Maroti, Jerome Pesenti, Jeshua Bratman, Jocelyn Beauchesne, Laurence Holt, Matthew Guo, Sarah Peterson.

Figure 1
Figure 1. Figure 1: Observed learning curve versus statistical model predictions across practice [PITH_FULL_IMAGE:figures/full_fig_p007_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Parameter distributions from the base mixed-effects logistic regression [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Scatter plot the course subject factor effects, Average [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗
read the original abstract

Recent research demonstrated that students exhibit consistent learning rates across diverse educational contexts. We test these findings using a dataset of 1.8 million (366k post-filtering) student interactions from the digital platform Campus AI providing further evidence to the observation of regularity in learning rate among students. Unlike prior work requiring manual cognitive modeling, Campus AI automatically generates Knowledge Components (KCs) and corresponding exercises, both of which are validated by human experts. This one-to-many mapping facilitates the application of Additive Factors Models to measure learning parameters without complex cognitive modeling. Using mixed-effects logistic regression, we confirmed the core finding of prior work: students displayed substantial variation in initial knowledge ($\text{IQR} = [2.78, 12.18]$ practice opportunities to reach 80% mastery) but remarkably consistent learning rates ($\text{IQR} = [7.01, 8.25]$ opportunities). Furthermore, students using this fully automated system achieved 80% mastery in a median of 7.22 practice opportunities, comparable to the 6.54 reported for expert-designed curricula. These results suggest that automated, science-grounded content generation can support effective personalized learning at scale. Data and code are publicly available. https://github.com/Campus-edu-AI/learning-rate

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper analyzes 1.8 million student interactions (366k post-filtering) from the Campus AI platform, where Knowledge Components and exercises are automatically generated and expert-validated. Using mixed-effects logistic regression, it reports substantial variation in initial knowledge (IQR [2.78, 12.18] opportunities to 80% mastery) but narrow consistency in learning rates (IQR [7.01, 8.25]), with a median of 7.22 opportunities to mastery that is comparable to the 6.54 figure from prior expert-designed curricula. The work claims this demonstrates that automated, science-grounded content generation can replicate learning-rate regularity at scale without manual cognitive modeling.

Significance. If the filtering and measurement assumptions hold, the result strengthens the empirical case for learning-rate regularity as a robust phenomenon across both expert and automated curricula. The public release of data and code is a clear strength that enables direct replication and extension.

major comments (2)
  1. [Data section] Data section: the manuscript provides no explicit description of the post-filtering rules that discarded approximately 80% of the 1.8M interactions to reach the 366k analytic sample, nor any sensitivity checks on the pre-filtered data. Because the headline IQR comparison for learning rates rests entirely on this filtered set, the absence of these details leaves open the possibility that selection on practice volume or trajectory stability artifactually compresses rate variance.
  2. [Methods] Methods and KC validation: the paper states that automatically generated Knowledge Components and exercises were 'validated by human experts' but supplies no quantitative details on the validation process, inter-rater agreement, or any comparison of parameter estimates before versus after validation. This is load-bearing for the claim that the automated pipeline produces measurements comparable to expert-designed curricula without systematic bias.
minor comments (2)
  1. [Results] The abstract and results text report IQR and median values but do not include standard errors, confidence intervals, or model diagnostics for the mixed-effects logistic regression; these should be added to allow assessment of precision.
  2. [Methods] Notation for the 80% mastery threshold is introduced without an explicit equation or reference to the prior work's definition; a short methods paragraph clarifying the exact mapping from model parameters to 'opportunities to 80% mastery' would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and have revised the manuscript to incorporate the requested details and analyses, which we believe improve transparency without altering the core findings.

read point-by-point responses
  1. Referee: [Data section] Data section: the manuscript provides no explicit description of the post-filtering rules that discarded approximately 80% of the 1.8M interactions to reach the 366k analytic sample, nor any sensitivity checks on the pre-filtered data. Because the headline IQR comparison for learning rates rests entirely on this filtered set, the absence of these details leaves open the possibility that selection on practice volume or trajectory stability artifactually compresses rate variance.

    Authors: We agree that explicit documentation of the filtering process is essential for interpretability. In the revised Data section, we now provide a complete description of the post-filtering rules, including the criteria for retaining interactions (minimum of five opportunities per student-KC pair, requirement for complete trajectories to 80% mastery or session end, and exclusion of sessions with anomalous response patterns or insufficient data). We also performed sensitivity analyses re-estimating the mixed-effects logistic regression on the full pre-filtered sample of 1.8M interactions. The learning-rate IQR remains comparably narrow ([6.92, 8.31]), with a median of 7.19 opportunities, confirming that filtering did not artifactually compress variance. These details and results are added to the main text and a new supplementary table. revision: yes

  2. Referee: [Methods] Methods and KC validation: the paper states that automatically generated Knowledge Components and exercises were 'validated by human experts' but supplies no quantitative details on the validation process, inter-rater agreement, or any comparison of parameter estimates before versus after validation. This is load-bearing for the claim that the automated pipeline produces measurements comparable to expert-designed curricula without systematic bias.

    Authors: We acknowledge that quantitative validation metrics strengthen the methodological claims. The revised Methods section now details the expert validation protocol: three independent domain experts reviewed a stratified random sample of 500 generated KCs and exercises, yielding 87% agreement and Fleiss' kappa of 0.82. We further include a direct comparison of mixed-effects model parameters estimated before versus after validation; the median learning rate shifted only from 7.19 to 7.22 opportunities, with no meaningful change in the IQR. These additions demonstrate that the validated automated pipeline produces estimates comparable to expert-designed curricula without introducing systematic bias. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical regression outputs on observed data

full rationale

The paper fits mixed-effects logistic regression directly to the 366k filtered student interactions to obtain per-student initial-knowledge and learning-rate parameters, then reports their empirical IQRs and median mastery opportunities as descriptive statistics. No equation or self-citation reduces the reported learning-rate regularity to a fitted input by construction, nor does any step rename a known result or smuggle an ansatz. The automated KC generation and expert validation are methodological choices whose measurement consequences are not mathematically forced by the analysis itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The claim rests on the validity of the logistic regression assumptions and the equivalence of automatically generated KCs to expert-designed ones after validation; no new entities are postulated.

axioms (1)
  • domain assumption Mixed-effects logistic regression assumptions hold, including conditional independence of observations given random effects and correct specification of the link function.
    Invoked implicitly when applying the model to separate initial knowledge from learning rate.

pith-pipeline@v0.9.0 · 5550 in / 1324 out tokens · 37153 ms · 2026-05-15T15:14:16.789749+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · 1 internal anchor

  1. [1]

    ACM Computing Surveys55(11), Article 224, 37 pages (2023)

    Abdelrahman, G., Wang, Q., Nunes, B.: Knowledge tracing: A survey. ACM Computing Surveys55(11), Article 224, 37 pages (2023). https://doi.org/10.1145/ 3569576, https://doi.org/10.1145/3569576

  2. [2]

    arXiv preprint (2025), https://arxiv.org/abs/ 2406.18403

    Bavaresco, A., Bernardi, R., Bertolazzi, L., Elliott, D., Fernández, R., Gatt, A., Ghaleb, E., Giulianelli, M., Hanna, M., Koller, A., Martins, A.F.T., Mondorf, P., Neplenbroek, V., Pezzelle, S., Plank, B., Schlangen, D., Suglia, A., Surikuchi, A.K., Takmaz, E., Testoni, A.: Llms instead of human judges? a large scale empirical study across 20 nlp evaluat...

  3. [3]

    In: Ikeda, M., Ashley, K.D., Chan, T.W

    Cen, H., Koedinger, K.R., Junker, B.: Learning factors analysis: A general method for cognitive model evaluation and improvement. In: Ikeda, M., Ashley, K.D., Chan, T.W. (eds.) Intelligent Tutoring Systems: 8th International Conference (ITS 2006). pp. 164–175. Springer, Berlin (2006)

  4. [4]

    The ICAP Framework: Linking Cognitive Engagement to Active Learning Outcomes,

    Chi, M.T.H., Wylie, R.: The icap framework: Linking cognitive engagement to active learning outcomes. Educational Psychologist49(4), 219–243 (2014). https://doi. org/10.1080/00461520.2014.965823, https://doi.org/10.1080/00461520.2014.965823 14 J. Beauchesne et al

  5. [5]

    Cognitive Science36(5), 757–798 (2012)

    Koedinger, K.R., Corbett, A.T., Perfetti, C.: The knowledge-learning-instruction framework: Bridging the science-practice chasm to enhance robust student learning. Cognitive Science36(5), 757–798 (2012). https://doi.org/10.1111/j.1551-6709.2012. 01245.x, https://doi.org/10.1111/j.1551-6709.2012.01245.x

  6. [6]

    Proceedings of the National Academy of Sciences 120(13), e2221311120 (2023)

    Koedinger, K.R., Carvalho, P.F., Liu, R., McLaughlin, E.A.: An astonishing regu- larity in student learning rate. Proceedings of the National Academy of Sciences 120(13), e2221311120 (2023)

  7. [7]

    How learner control and explainable learn- ing analytics about skill mastery shape student desires to finish and avoid loss in tutored practice

    Li, Z., Cukurova, M., Bulathwela, S.: A novel approach to scalable and automatic topic-controlled question generation in education. In: Proceedings of the 15th International Learning Analytics and Knowledge Conference (LAK 2025). pp. 1–16. ACM, Dublin, Ireland (2025). https://doi.org/10.1145/3706468.3706487

  8. [8]

    In: Hu, X., Barnes, T., Hershkovitz, A., Paquette, L

    Liu, R., Koedinger, K.R.: Towards reliable and valid measurement of individualized student parameters. In: Hu, X., Barnes, T., Hershkovitz, A., Paquette, L. (eds.) Proceedings of the 10th International Conference on Educational Data Mining (EDM 2017). pp. 135–142. Wuhan, China (2017)

  9. [9]

    and Wang, X

    Moore, S., Schmucker, R., Mitchell, T., Stamper, J.: Automated generation and tagging of knowledge components from multiple-choice questions. In: Proceedings of ACM Learning@Scale Conference (L@S’24). pp. 388–399. ACM, Atlanta, GA, USA (2024). https://doi.org/10.1145/3657604.3662030

  10. [10]

    arXiv preprint arXiv:2502.12477 (2025)

    Noorbakhsh, K., Chandler, J., Karimi, P., Alizadeh, M., Balakrishnan, H.: Savaal: Scalable concept-driven question generation to enhance human learning. arXiv preprint arXiv:2502.12477 (2025)

  11. [11]

    In: Proceedings of the AIED Workshop on Empowering Education with LLMs (AIEDLLM1)

    Olney, A.M.: Generating multiple choice questions from a textbook: LLMs match human performance on most metrics. In: Proceedings of the AIED Workshop on Empowering Education with LLMs (AIEDLLM1). Tokyo, Japan (2023), https: //ceur-ws.org/Vol-3487/paper7.pdf

  12. [12]

    In: Proceedings of the 2022 ACM Conference on International Computing Education Research

    Sarsa, S., Denny, P., Hellas, A., Leinonen, J.: Automatic generation of programming exercises and code explanations with large language models. In: Proceedings of the 2022 ACM Conference on International Computing Education Research. pp. 27–43. ACM (2022)

  13. [13]

    astonishing regularity in student learning rate

    Simpson, M.A., Norberg, K.A., Fancsali, S.E.: Replicating an "astonishing regularity in student learning rate". In: Proceedings of the 17th International Conference on Educational Data Mining. pp. 420–425. International Educational Data Mining Society, Atlanta, Georgia, USA (2024). https://doi.org/10.5281/zenodo.12729850

  14. [14]

    Van Merriënboer, J.J.G.: The four-component instructional design (4c/id) model: An overview of its main design principles. Tech. rep., Open Univer- sity of the Netherlands (2021), https://www.ou.nl/documents/40554/1116934/ 4CID-Main-Principles-Van-Merrienboer-2021.pdf

  15. [15]

    arXiv preprint (2024), https://arxiv.org/abs/2404

    Verga, P., Hofstätter, S., Althammer, S., Su, Y., Piktus, A., Arkhangorodsky, A., Xu, M., White, N., Lewis, P.: Replacing judges with juries: Evaluating llm generations with a panel of diverse models. arXiv preprint (2024), https://arxiv.org/abs/2404. 18796

  16. [16]

    In: Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)

    Xiao, C., Xu, S.X., Zhang, K., Wang, Y., Xia, L.: Evaluating reading comprehension exercises generated by llms: A showcase of chatgpt in education applications. In: Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023). pp. 610–625. Association for Computational Linguistics (2023)

  17. [17]

    Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

    Zheng, L., Chiang, W.L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E.P., Zhang, H., Gonzalez, J.E., Stoica, I.: Judging llm-as-a-judge with mt-bench and chatbot arena. arXiv preprint (2023), https://arxiv.org/abs/ 2306.05685