From Heuristics to Analytics: Forecasting Effort and Progress in Online Learning
Pith reviewed 2026-05-14 20:27 UTC · model grok-4.3
The pith
Feature-based models forecast weekly student effort and progress in tutoring systems with 22-33 percent lower error than heuristic rules.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using interaction logs from 425 middle-school students across a school year, feature-based predictors reduce mean absolute error by 22-33 percent compared with percentile-based heuristic baselines when forecasting weekly minutes practiced and new skills mastered. The models track individual practice trajectories more closely than fixed rules, with effort forecasts driven chiefly by recent activity features and progress forecasts depending more on learner-state and content difficulty signals. In a case study, eight college tutors reasoned about effort versus progress goals in ways that aligned with these target-specific feature patterns.
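The headline figure is a relative reduction in mean absolute error, which can be made concrete with a minimal sketch. The function names and the weekly values below are illustrative assumptions, not the paper's code or data.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and forecast weekly values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def relative_mae_reduction(mae_baseline, mae_model):
    """Fraction by which the model lowers MAE relative to the baseline."""
    return (mae_baseline - mae_model) / mae_baseline


# Made-up weekly minutes for one student; not taken from the paper.
actual_minutes = [30, 45, 0, 60]
heuristic_forecast = [50, 50, 50, 50]  # a fixed rule that tends to overpredict
model_forecast = [35, 40, 10, 50]      # a forecast that follows the trajectory

mae_heuristic = mean_absolute_error(actual_minutes, heuristic_forecast)
mae_model = mean_absolute_error(actual_minutes, model_forecast)
reduction = relative_mae_reduction(mae_heuristic, mae_model)
```

A reported reduction of 22-33 percent corresponds to `reduction` values between 0.22 and 0.33 under this definition.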
What carries the argument
Supervised machine learning models that use interaction-log features to predict two weekly targets, benchmarked against fixed-percentile heuristic rules adapted from prior behavioral work.
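For orientation, a fixed-percentile heuristic of the kind used as a baseline might look like the sketch below. The nearest-rank definition, the 60th-percentile default, and the function name are assumptions made for illustration; the summary does not state which percentile or history window the paper uses.

```python
import math


def percentile_forecast(past_weeks, percentile=60):
    """Forecast next week as a fixed percentile of a student's past weekly values.

    Uses the nearest-rank percentile. The default of 60 is an illustrative
    assumption, not a value reported in the paper.
    """
    if not past_weeks:
        raise ValueError("need at least one past week")
    ordered = sorted(past_weeks)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical weekly minutes for one student.
history = [10, 40, 20, 30, 50]
forecast = percentile_forecast(history, percentile=60)
```

Because the rule ignores everything except the distribution of past values, it cannot react to a recent drop in activity, which is consistent with the overprediction the abstract reports for percentile heuristics.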
Load-bearing premise
The 425-student log dataset and selected features capture patterns representative enough for the models to generalize to new students and new weeks without large distribution shifts.
What would settle it
A fresh cohort of student logs in which the feature-based models show no reduction in mean absolute error, or a reduction below 15 percent, relative to the same percentile heuristics.
Original abstract
Sustained effort is essential for realizing the benefits of intelligent tutoring systems (ITS), yet many learners disengage or underuse available practice time. We introduce engagement forecasting as a supervised prediction task based on ITS logs, targeting two outcomes central to effort and learning progress: minutes practiced per week and new skills mastered per week. Using interaction log data from 425 middle-school students over a school year, we benchmark fifteen predictors including regressions, decision trees, and neural networks. We show that these feature-based models reduce mean absolute error (MAE) by 22-33% relative to heuristic baselines, including fixed-percentile rules adapted from prior work in other behavioral domains. We find that percentile heuristics systematically overpredict, whereas feature-based models better track student practice trajectories across weeks. To support explainability, we analyze feature importance and ablations, revealing target-specific patterns: effort forecasting is driven mainly by recent activity features, while progress forecasting depends more on learner-state and content difficulty signals. Finally, in a semi-structured user interview case study with eight college tutors, we examine how tutors reasoned about system-generated predictive features when setting goals with students. We find that tutors reasoned differently about effort versus progress goals in ways that mirror our pattern analysis. Together, these results establish a reproducible benchmark for forecasting weekly effort and learning progress in ITS. By making patterns of sustained effort and progress visible at a weekly timescale, engagement forecasting offers a foundation for supporting tutor-learner goal setting and timely instructional decisions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces engagement forecasting as a supervised prediction task on ITS interaction logs from 425 middle-school students, targeting weekly minutes practiced and new skills mastered. It benchmarks fifteen feature-based models (regressions, decision trees, neural networks) against heuristic baselines such as fixed-percentile rules, reporting 22-33% MAE reductions, provides feature-importance and ablation analyses showing recent activity driving effort forecasts while learner-state and content difficulty drive progress forecasts, and includes a semi-structured interview case study with eight tutors on using the predictions for goal setting.
Significance. If the central MAE reductions hold under proper out-of-sample validation, the work supplies a reproducible benchmark for weekly effort and progress forecasting in intelligent tutoring systems, moving beyond ad-hoc heuristics toward analytics-driven support for tutor-learner goal setting. The target-specific feature patterns and the tutor interview results add explanatory and practical value; the public benchmark framing is a clear strength.
major comments (2)
- [Benchmarking / Experimental Setup] The train-test partitioning procedure is not stated to be student-stratified (or week-stratified). Because the data consist of repeated weekly observations per student, any split that allows the same student to appear in both training and test sets risks temporal autocorrelation leakage, which would inflate the reported 22-33% MAE gains relative to the percentile heuristics and undermine the out-of-sample forecasting claim.
- [Results] No statistical significance tests, confidence intervals, or cross-validation variance estimates are provided for the MAE differences across the fifteen models and two targets. Without these, it is impossible to determine whether the observed improvements are reliable or could be explained by sampling variability in the 425-student corpus.
minor comments (2)
- [Methods] The abstract states that fifteen predictors were benchmarked, yet the methods section would benefit from an explicit enumerated list of all models together with their hyperparameter ranges or selection procedure.
- [Feature Engineering] Feature definitions (especially the 'recent activity' and 'learner-state' groups) are described at a high level; a table listing each feature, its computation, and any normalization would improve reproducibility.
Simulated Author's Rebuttal
We appreciate the referee's insightful comments on the experimental validation and statistical reporting. We have revised the manuscript to clarify the data partitioning procedure and to include statistical significance tests and confidence intervals for the reported MAE improvements. Our responses to the major comments are detailed below.
Point-by-point responses
- Referee: [Benchmarking / Experimental Setup] The train-test partitioning procedure is not stated to be student-stratified (or week-stratified). Because the data consist of repeated weekly observations per student, any split that allows the same student to appear in both training and test sets risks temporal autocorrelation leakage, which would inflate the reported 22-33% MAE gains relative to the percentile heuristics and undermine the out-of-sample forecasting claim.
Authors: We thank the referee for highlighting this critical aspect of the experimental design. Upon review, the train-test split in our study was indeed performed in a student-stratified manner, with all weekly observations for a given student assigned entirely to either the training or test set (70/30 split). This prevents any leakage from temporal autocorrelation within students. We have updated the manuscript's Experimental Setup section to explicitly state this partitioning strategy and its rationale for ensuring valid out-of-sample forecasting.
Revision: yes
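The student-level 70/30 partition described in this response can be sketched as follows. The row layout, the `student_id` key, and the helper name are hypothetical; the sketch shows only the general idea of assigning every week of a student to one side of the split.

```python
import random


def split_by_student(rows, test_fraction=0.3, seed=0):
    """Assign all weekly rows of a student to either train or test, never both."""
    student_ids = sorted({row["student_id"] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(student_ids)
    n_test = round(len(student_ids) * test_fraction)
    test_ids = set(student_ids[:n_test])
    train = [row for row in rows if row["student_id"] not in test_ids]
    test = [row for row in rows if row["student_id"] in test_ids]
    return train, test


# Hypothetical log table: 10 students, 4 weekly rows each.
rows = [
    {"student_id": s, "week": w, "minutes": 10 * w}
    for s in range(10)
    for w in range(1, 5)
]
train_rows, test_rows = split_by_student(rows)
train_students = {row["student_id"] for row in train_rows}
test_students = {row["student_id"] for row in test_rows}
```

Splitting on student identifiers rather than on individual rows is what keeps a student's correlated weeks from appearing on both sides of the partition.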
- Referee: [Results] No statistical significance tests, confidence intervals, or cross-validation variance estimates are provided for the MAE differences across the fifteen models and two targets. Without these, it is impossible to determine whether the observed improvements are reliable or could be explained by sampling variability in the 425-student corpus.
Authors: We agree that providing measures of statistical reliability strengthens the results. In the revised manuscript, we have added bootstrap-derived 95% confidence intervals for all MAE values and conducted paired statistical tests (Wilcoxon signed-rank tests due to non-normality of errors) comparing each model's per-student MAE against the heuristic baselines. The improvements remain significant (p < 0.001) across targets, with the confidence intervals confirming the 22-33% reductions are not attributable to sampling variability alone. These additions are incorporated into the Results section and a new supplementary table.
Revision: yes
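One way to obtain a bootstrap interval of the kind mentioned in this response is a percentile bootstrap over per-student MAE differences, sketched below. The per-student values are invented for illustration, the resampling settings are assumptions, and the Wilcoxon signed-rank test is not shown.

```python
import random
from statistics import mean


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample students with replacement and record the mean of each resample.
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical per-student MAE values (minutes); not results from the paper.
baseline_mae = [22.0, 18.5, 30.0, 25.0, 19.0, 27.5, 21.0, 24.0]
model_mae = [15.0, 14.0, 21.0, 18.5, 15.5, 19.0, 16.0, 17.0]

# Paired difference per student: positive means the model has lower error.
paired_gain = [b - m for b, m in zip(baseline_mae, model_mae)]
ci_low, ci_high = bootstrap_ci(paired_gain)
```

An interval that stays above zero would indicate that the improvement is unlikely to be explained by which students happened to be sampled.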
Circularity Check
No significant circularity: supervised forecasting trained on historical logs to predict future weeks
Full rationale
The paper trains feature-based models (regressions, trees, neural nets) on 425-student ITS logs to forecast weekly minutes practiced and skills mastered. Heuristic baselines are percentile rules adapted from external prior work in other domains. No equation or fitting step reduces the target variable to a parameter of itself by construction; the reported 22-33% MAE reduction is an empirical out-of-sample comparison on future weeks. The derivation chain is self-contained against external benchmarks and does not rely on self-citation for its central claim.
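The non-circularity argument rests on features being computed only from weeks that precede the target week. A minimal sketch of that setup is given below; the lag count, the field names, and the `recent_mean` feature are illustrative assumptions rather than the paper's feature set.

```python
def make_forecast_rows(weekly_minutes, n_lags=2):
    """Build (features, target) rows where features use only earlier weeks."""
    rows = []
    for t in range(n_lags, len(weekly_minutes)):
        lags = weekly_minutes[t - n_lags:t]  # weeks strictly before week t
        rows.append({
            "target_week": t,
            "lag_features": lags,
            "recent_mean": sum(lags) / n_lags,
            "target": weekly_minutes[t],     # the value being forecast
        })
    return rows


# Hypothetical weekly minutes for one student.
minutes = [30, 0, 45, 60, 15]
forecast_rows = make_forecast_rows(minutes)
```

Since the target for week `t` never appears among its own features, a model fitted on such rows cannot recover the target by construction.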