Computational Thinking Development in AI Agent Creation: A Mixed-Methods Study
Pith reviewed 2026-05-15 02:33 UTC · model grok-4.3
The pith
Students with moderate initial computational thinking levels show the largest gains from AI agent creation workshops.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In a mixed-methods study of 93 students using the CocoFlow platform, pre-post assessments showed significant gains in abstract thinking (d = 0.71) and algorithmic thinking (d = 0.70). Hierarchical regression revealed iterative testing as a predictor of self-efficacy gains. Students with moderate initial CT levels exhibited substantially greater improvements than high-CT or low-CT peers, with an Optimal Development Zone effect (eta squared = 0.55). Qualitative data indicated that moderate-CT students displayed adaptive expertise, high-CT students tended toward over-engineering, and low-CT students faced challenges in task decomposition.
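For orientation, the reported effect sizes follow the standard definitions (assuming the conventional pooled-standard-deviation form of Cohen's d and classical eta squared; the paper's exact computational formulas are not reproduced here):

```latex
d = \frac{\bar{x}_{\text{post}} - \bar{x}_{\text{pre}}}{s_{\text{pooled}}},
\qquad
\eta^2 = \frac{SS_{\text{between}}}{SS_{\text{total}}}
```

On common benchmarks, d ≈ 0.70 is a medium-to-large pre-post effect, while eta squared = 0.55 would mean the initial-CT grouping accounts for over half the variance in gains, far above the conventional 0.14 threshold for a large effect.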
What carries the argument
The Optimal Development Zone effect, in which students with moderate initial computational thinking levels make larger gains in abstract thinking, algorithmic thinking, and self-efficacy during AI agent creation than students at either extreme.
Load-bearing premise
The pre-post assessments and behavioral logs validly isolate computational thinking development from workshop-specific factors such as platform novelty, instructor effects, or student motivation.
What would settle it
A follow-up experiment that holds workshop length, platform, and instructor constant while randomly assigning students to initial-CT-matched groups or alternative tasks. If the moderate-CT group no longer shows larger gains under those controls, the Optimal Development Zone claim fails.
Original abstract
This mixed-methods study examined computational thinking (CT) development among 93 pre-high school students in a five-day AI agent creation workshop using CocoFlow, a no-code platform. Integrating pre-post assessments, behavioral logs, and interviews, we investigated CT development and how initial CT levels shape learning trajectories. Results revealed significant improvements in abstract thinking (effect size d = 0.71) and algorithmic thinking (effect size d = 0.70). Hierarchical regression identified iterative testing engagement as a predictor of self-efficacy gains (beta = 0.20, p = 0.05). Notably, students with moderate initial CT levels demonstrated substantially greater gains than both high-CT and low-CT peers, revealing an Optimal Development Zone effect (eta squared = 0.55). Qualitative analysis showed moderate-CT students exhibited adaptive expertise, while high-CT students risked over-engineering and low-CT students struggled with task decomposition. These findings challenge linear learning assumptions and provide evidence for differentiated scaffolding in CT education.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This mixed-methods study with 93 pre-high school students in a five-day AI agent creation workshop using the CocoFlow no-code platform reports significant pre-post improvements in abstract thinking (d = 0.71) and algorithmic thinking (d = 0.70). Hierarchical regression analysis identifies iterative testing engagement as a predictor of self-efficacy gains (beta = 0.20, p = 0.05). Students with moderate initial CT levels showed the largest gains, supporting an 'Optimal Development Zone' effect (eta squared = 0.55), corroborated by qualitative interviews showing adaptive expertise in moderate-CT students versus over-engineering in high-CT and decomposition struggles in low-CT students.
Significance. If the central claims hold after addressing measurement validity, the work would offer meaningful contributions to computational thinking education by challenging linear progression models and providing empirical support for differentiated scaffolding in AI workshops. The mixed-methods integration of behavioral logs with pre-post and interview data is a strength, as is the focus on initial skill levels as a moderator.
major comments (3)
- [Abstract] Abstract: The headline Optimal Development Zone claim (eta squared = 0.55) and pre-post effect sizes (d = 0.71, 0.70) rest on CT assessments whose validity is not established; the manuscript provides no external validation (e.g., correlation with Bebras or CTt instruments), no control group, and no checks for test-retest reliability or social-desirability bias in self-efficacy items.
- [Results] Results (hierarchical regression section): The regression identifying iterative testing as a predictor of self-efficacy gains (beta = 0.20) does not report missing-data handling, multicollinearity diagnostics, or controls for workshop-specific confounds such as motivation or instructor effects, which are required to isolate CT development from platform novelty.
- [Methods] Methods (participant grouping): Partitioning students into low/moderate/high initial-CT groups on the basis of the same pre-test whose validity remains unproven creates circularity risk for the reported interaction effect; the Optimal Development Zone is derived from pre-post differences on this unvalidated measure.
minor comments (1)
- [Title] The title contains an underscore ('AI Agent Creation_A Mixed-Methods Study') that should be replaced with a space or colon for standard formatting.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. We address each major comment point by point below, indicating where revisions will be incorporated.
Point-by-point responses
Referee: [Abstract] Abstract: The headline Optimal Development Zone claim (eta squared = 0.55) and pre-post effect sizes (d = 0.71, 0.70) rest on CT assessments whose validity is not established; the manuscript provides no external validation (e.g., correlation with Bebras or CTt instruments), no control group, and no checks for test-retest reliability or social-desirability bias in self-efficacy items.
Authors: We appreciate the emphasis on measurement validity. The CT assessments were adapted from established frameworks (Brennan & Resnick, 2012), with items aligned to abstract and algorithmic thinking constructs; qualitative interviews provide independent triangulation supporting the differential gains. We will revise the Methods section to detail instrument development and report sample-specific reliability (e.g., internal consistency). The Limitations section will be expanded to explicitly discuss the absence of external validation, control group, test-retest data, and potential social-desirability bias in self-efficacy measures. We cannot retroactively collect new validation data or add a control group. revision: partial
Referee: [Results] Results (hierarchical regression section): The regression identifying iterative testing as a predictor of self-efficacy gains (beta = 0.20) does not report missing-data handling, multicollinearity diagnostics, or controls for workshop-specific confounds such as motivation or instructor effects, which are required to isolate CT development from platform novelty.
Authors: Thank you for noting these reporting gaps. Missing data were under 5% and addressed via listwise deletion; VIF values were below 1.5 with no multicollinearity. Baseline self-efficacy was controlled. We will add these diagnostics explicitly to the Results section. The single-workshop design precludes full controls for motivation or instructor effects, but behavioral logs help address platform novelty; we will discuss this limitation and its implications for isolating effects. revision: yes
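The diagnostics described above can be sketched on synthetic data; every variable name and value below is illustrative, not taken from the study. The sketch runs a two-step hierarchical OLS (control block first, then the focal predictor) and computes variance inflation factors from auxiliary regressions, assuming NumPy only.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 93  # sample size matching the study; the data itself is simulated

# Illustrative variables (not the study's data):
baseline_se = rng.normal(size=n)        # baseline self-efficacy (control block)
iterative_testing = rng.normal(size=n)  # log-derived engagement predictor
se_gain = 0.3 * baseline_se + 0.2 * iterative_testing + rng.normal(scale=0.5, size=n)

def ols_r2(X, y):
    """Fit OLS with an intercept; return (R^2, coefficients)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var(), beta

# Step 1: control block only; Step 2: add the focal predictor.
r2_step1, _ = ols_r2(baseline_se[:, None], se_gain)
r2_step2, coefs = ols_r2(np.column_stack([baseline_se, iterative_testing]), se_gain)
delta_r2 = r2_step2 - r2_step1  # incremental variance explained by the new block

def vifs(X):
    """VIF_j = 1 / (1 - R^2_j), regressing predictor j on the remaining predictors."""
    out = []
    for j in range(X.shape[1]):
        r2_j, _ = ols_r2(np.delete(X, j, axis=1), X[:, j])
        out.append(1.0 / (1.0 - r2_j))
    return out

vif_values = vifs(np.column_stack([baseline_se, iterative_testing]))
print(f"delta R^2 = {delta_r2:.3f}, VIFs = {[round(v, 2) for v in vif_values]}")
```

Reporting the step-wise delta R^2 alongside the VIFs is the minimal transparency the referee asks for; it does not, of course, substitute for a control condition.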
Referee: [Methods] Methods (participant grouping): Partitioning students into low/moderate/high initial-CT groups on the basis of the same pre-test whose validity remains unproven creates circularity risk for the reported interaction effect; the Optimal Development Zone is derived from pre-post differences on this unvalidated measure.
Authors: We recognize the circularity risk in using pre-test scores for both grouping (tertiles) and gain calculation. This is a standard approach in aptitude-treatment interaction research, and the Optimal Development Zone finding is independently supported by qualitative evidence of adaptive expertise patterns. We will revise the Methods to clarify the grouping procedure and add an explicit limitation note in the Discussion, including consideration of sensitivity checks where feasible. revision: partial
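The grouping-and-gain computation at issue can be made concrete with a short sketch; all data here is simulated, and the tertile split and classical eta-squared formula are standard choices assumed for illustration, not taken from the manuscript.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 93
pre = rng.normal(size=n)                              # pre-test CT score (illustrative)
post = pre + rng.normal(loc=0.6, scale=0.8, size=n)   # simulated post-test
gain = post - pre

# Tertile split on the same pre-test used to compute gains (the circularity risk).
cuts = np.quantile(pre, [1 / 3, 2 / 3])
group = np.digitize(pre, cuts)  # 0 = low, 1 = moderate, 2 = high

# Classical eta^2 for the group effect on gains: SS_between / SS_total.
grand = gain.mean()
ss_total = ((gain - grand) ** 2).sum()
ss_between = sum((group == g).sum() * (gain[group == g].mean() - grand) ** 2
                 for g in range(3))
eta_sq = ss_between / ss_total
```

Because the same pre-test scores define both the groups and the baseline of the gain scores, measurement error in the pre-test propagates into both quantities at once, which is exactly the circularity the referee flags.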
unresolved (1)
- Absence of a control group and external validation (e.g., correlation with Bebras or CTt instruments) for the CT assessments, which cannot be addressed without new data collection.
Circularity Check
No circularity: empirical pre-post gains and regression results are independent of fitted inputs
Full rationale
The paper's central claims rest on direct pre-post difference scores, group comparisons by initial CT level, and a hierarchical regression with behavioral logs as predictors. None of these reduce by construction to the same fitted quantities (e.g., no parameter fitted on one subset is then relabeled as a prediction of a closely related quantity). No self-citation chains, uniqueness theorems, or ansatzes are invoked to justify the Optimal Development Zone or the regression coefficients. The evidential chain therefore does not depend on circular reuse of fitted quantities and can be checked against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- [standard math] Standard assumptions of hierarchical linear regression (linearity, independence, homoscedasticity) hold for the self-efficacy model.
invented entities (1)
- Optimal Development Zone (no independent evidence)
Reference graph
Works this paper leans on
- [1] Wing, J.: Research notebook: Computational thinking—What and why? The Link Magazine, Spring. Carnegie Mellon University, https://link.cs.cmu.edu/article.php?a=600, last accessed 2026/03/26
- [2] Wing, J.M.: Computational thinking. Communications of the ACM 49(3), 33–35 (2006). https://doi.org/10.1145/1118178.1118215
- [3] Aho, A.V.: Computation and computational thinking. The Computer Journal 55(7), 832–835 (2012). https://doi.org/10.1093/comjnl/bxs074
- [4] Shute, V.J., Sun, C., Asbell-Clarke, J.: Demystifying computational thinking. Educational Research Review 22, 142–158 (2017). https://doi.org/10.1016/j.edurev.2017.09.003
- [5] Grover, S., Pea, R.: Computational thinking in K–12: A review of the state of the field. Educational Researcher 42(1), 38–43 (2013). https://doi.org/10.3102/0013189X12463051
- [6] Weng, X., Ye, H., Dai, Y., Ng, O.L.: Integrating artificial intelligence and computational thinking in educational contexts: A systematic review of instructional design and student learning outcomes. Journal of Educational Computing Research 62(6), 1420–1450 (2024). https://doi.org/10.1177/07356331241248686
- [7] Yilmaz, R., Yilmaz, F.G.K.: The effect of generative artificial intelligence (AI)-based tool use on students' computational thinking skills, programming self-efficacy and motivation. Computers and Education: Artificial Intelligence 4, 100147 (2023). https://doi.org/10.1016/j.caeai.2023.100147
- [8] Vygotsky, L.S.: Mind in Society: The Development of Higher Psychological Processes. Harvard University Press, Cambridge (1978)
- [9] Kalyuga, S.: Expertise reversal effect and its implications for learner-tailored instruction. Educational Psychology Review 19(4), 509–539 (2007). https://doi.org/10.1007/s10648-007-9054-3
- [10] Franchitti, J.-C.: Introduction to Computer Science. OpenStax, https://openstax.org/books/introduction-computer-science/pages/1-introduction, last accessed 2026/03/26
- [11] Sweller, J.: Cognitive load during problem solving: Effects on learning. Cognitive Science 12(2), 257–285 (1988)
- [12] Hatano, G., Inagaki, K.: Two courses of expertise. Nyuyoji Hattatsu Rinsho Senta Nenpo [Annual Report of the Center for Developmental Clinical Psychology] 6, 27–36 (1984)
- [13] Bandura, A.: Self-efficacy mechanism in human agency. American Psychologist 37(2), 122–147 (1982)