pith. machine review for the scientific record.

arxiv: 2605.14330 · v1 · submitted 2026-05-14 · 💻 cs.CY

Recognition: no theorem link

Computational Thinking Development in AI Agent Creation: A Mixed-Methods Study

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 02:33 UTC · model grok-4.3

classification 💻 cs.CY
keywords computational thinking · AI agent creation · optimal development zone · mixed-methods · no-code platform · iterative testing · self-efficacy · pre-high school students

The pith

Students with moderate initial computational thinking levels show the largest gains from AI agent creation workshops.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines computational thinking development in 93 pre-high school students during a five-day AI agent creation workshop on the no-code CocoFlow platform. Pre-post assessments and behavioral logs revealed significant gains in abstract thinking and algorithmic thinking, while iterative testing predicted increases in self-efficacy. The central finding is that moderate initial CT students improved substantially more than high-CT or low-CT peers, indicating an Optimal Development Zone rather than steady linear progress. This challenges uniform approaches to CT education and suggests the value of adjusting support to match starting skill levels in such hands-on AI activities.

Core claim

In a mixed-methods study of 93 students using the CocoFlow platform, pre-post assessments showed significant gains in abstract thinking (d = 0.71) and algorithmic thinking (d = 0.70). Hierarchical regression revealed iterative testing as a predictor of self-efficacy gains. Students with moderate initial CT levels exhibited substantially greater improvements than high-CT or low-CT peers, with an Optimal Development Zone effect (eta squared = 0.55). Qualitative data indicated that moderate-CT students displayed adaptive expertise, high-CT students tended toward over-engineering, and low-CT students faced challenges in task decomposition.
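The headline statistics here are standard quantities: a paired-design Cohen's d (mean gain over the standard deviation of gain scores, the d_z variant) and eta squared as the between-group share of total variance. As a minimal sketch of how such numbers are computed — using invented toy scores, not the paper's data:

```python
import numpy as np

def cohens_d(pre, post):
    """Paired Cohen's d (d_z variant): mean gain divided by the
    standard deviation of the individual gain scores."""
    diff = np.asarray(post, dtype=float) - np.asarray(pre, dtype=float)
    return diff.mean() / diff.std(ddof=1)

def eta_squared(groups):
    """Eta squared for a one-way design: SS_between / SS_total,
    i.e. the proportion of total variance explained by group membership."""
    all_vals = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    grand = all_vals.mean()
    ss_between = sum(len(g) * (np.mean(g) - grand) ** 2 for g in groups)
    ss_total = ((all_vals - grand) ** 2).sum()
    return ss_between / ss_total
```

With gain scores grouped by low/moderate/high initial CT, `eta_squared` applied to the three gain vectors would yield the kind of eta squared = 0.55 figure the paper reports; the eta squared = 1.0 extreme occurs only when all variance lies between groups.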

What carries the argument

The Optimal Development Zone effect, in which students with moderate initial computational thinking levels make larger gains in abstract thinking, algorithmic thinking, and self-efficacy during AI agent creation than students at either extreme.

Load-bearing premise

The pre-post assessments and behavioral logs validly isolate computational thinking development from workshop-specific factors such as platform novelty, instructor effects, or student motivation.

What would settle it

A follow-up experiment that holds workshop length, platform, and instructor constant, with random assignment across initial-CT-matched groups or alternative tasks. If the moderate-CT group no longer shows the largest gains under those controls, the Optimal Development Zone effect would be attributable to design artifacts rather than to initial skill level.

read the original abstract

This mixed-methods study examined computational thinking (CT) development among 93 pre-high school students in a five-day AI agent creation workshop using CocoFlow, a no-code platform. Integrating pre-post assessments, behavioral logs, and interviews, we investigated CT development and how initial CT levels shape learning trajectories. Results revealed significant improvements in abstract thinking (effect size d = 0.71) and algorithmic thinking (effect size d = 0.70). Hierarchical regression identified iterative testing engagement as a predictor of self-efficacy gains (beta = 0.20, p = 0.05). Notably, students with moderate initial CT levels demonstrated substantially greater gains than both high-CT and low-CT peers, revealing an Optimal Development Zone effect (eta squared = 0.55). Qualitative analysis showed moderate-CT students exhibited adaptive expertise, while high-CT students risked over-engineering and low-CT students struggled with task decomposition. These findings challenge linear learning assumptions and provide evidence for differentiated scaffolding in CT education.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. This mixed-methods study with 93 pre-high school students in a five-day AI agent creation workshop using the CocoFlow no-code platform reports significant pre-post improvements in abstract thinking (d = 0.71) and algorithmic thinking (d = 0.70). Hierarchical regression analysis identifies iterative testing engagement as a predictor of self-efficacy gains (beta = 0.20, p = 0.05). Students with moderate initial CT levels showed the largest gains, supporting an 'Optimal Development Zone' effect (eta squared = 0.55), corroborated by qualitative interviews showing adaptive expertise in moderate-CT students versus over-engineering in high-CT and decomposition struggles in low-CT students.

Significance. If the central claims hold after addressing measurement validity, the work would offer meaningful contributions to computational thinking education by challenging linear progression models and providing empirical support for differentiated scaffolding in AI workshops. The mixed-methods integration of behavioral logs with pre-post and interview data is a strength, as is the focus on initial skill levels as a moderator.

major comments (3)
  1. [Abstract] Abstract: The headline Optimal Development Zone claim (eta squared = 0.55) and pre-post effect sizes (d = 0.71, 0.70) rest on CT assessments whose validity is not established; the manuscript provides no external validation (e.g., correlation with Bebras or CTt instruments), no control group, and no checks for test-retest reliability or social-desirability bias in self-efficacy items.
  2. [Results] Results (hierarchical regression section): The regression identifying iterative testing as a predictor of self-efficacy gains (beta = 0.20) does not report missing-data handling, multicollinearity diagnostics, or controls for workshop-specific confounds such as motivation or instructor effects, which are required to isolate CT development from platform novelty.
  3. [Methods] Methods (participant grouping): Partitioning students into low/moderate/high initial-CT groups on the basis of the same pre-test whose validity remains unproven creates circularity risk for the reported interaction effect; the Optimal Development Zone is derived from pre-post differences on this unvalidated measure.
minor comments (1)
  1. [Title] The title contains an underscore ('AI Agent Creation_A Mixed-Methods Study') that should be replaced with a space or colon for standard formatting.
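Major comment 2 asks for multicollinearity diagnostics. The usual check is the variance inflation factor, VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on the remaining predictors. A self-contained numpy sketch of that diagnostic (the column layout and the conventional VIF < 5 concern threshold are assumptions, not taken from the paper):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of a predictor
    matrix X (columns are predictors; no intercept column).
    VIF_j = 1 / (1 - R^2_j), with R^2_j from an OLS regression of
    column j on the other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])   # intercept + other predictors
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        out.append(1.0 / (1.0 - r2))
    return np.array(out)
```

Independent predictors give VIF near 1; the authors' rebuttal reports all VIFs below 1.5, which this kind of table would substantiate if added to the Results.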

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the detailed and constructive review. We address each major comment point by point below, indicating where revisions will be incorporated.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The headline Optimal Development Zone claim (eta squared = 0.55) and pre-post effect sizes (d = 0.71, 0.70) rest on CT assessments whose validity is not established; the manuscript provides no external validation (e.g., correlation with Bebras or CTt instruments), no control group, and no checks for test-retest reliability or social-desirability bias in self-efficacy items.

    Authors: We appreciate the emphasis on measurement validity. The CT assessments were adapted from established frameworks (Brennan & Resnick, 2012), with items aligned to abstract and algorithmic thinking constructs; qualitative interviews provide independent triangulation supporting the differential gains. We will revise the Methods section to detail instrument development and report sample-specific reliability (e.g., internal consistency). The Limitations section will be expanded to explicitly discuss the absence of external validation, control group, test-retest data, and potential social-desirability bias in self-efficacy measures. We cannot retroactively collect new validation data or add a control group. revision: partial

  2. Referee: [Results] Results (hierarchical regression section): The regression identifying iterative testing as a predictor of self-efficacy gains (beta = 0.20) does not report missing-data handling, multicollinearity diagnostics, or controls for workshop-specific confounds such as motivation or instructor effects, which are required to isolate CT development from platform novelty.

    Authors: Thank you for noting these reporting gaps. Missing data were under 5% and addressed via listwise deletion; VIF values were below 1.5 with no multicollinearity. Baseline self-efficacy was controlled. We will add these diagnostics explicitly to the Results section. The single-workshop design precludes full controls for motivation or instructor effects, but behavioral logs help address platform novelty; we will discuss this limitation and its implications for isolating effects. revision: yes

  3. Referee: [Methods] Methods (participant grouping): Partitioning students into low/moderate/high initial-CT groups on the basis of the same pre-test whose validity remains unproven creates circularity risk for the reported interaction effect; the Optimal Development Zone is derived from pre-post differences on this unvalidated measure.

    Authors: We recognize the circularity risk in using pre-test scores for both grouping (tertiles) and gain calculation. This is a standard approach in aptitude-treatment interaction research, and the Optimal Development Zone finding is independently supported by qualitative evidence of adaptive expertise patterns. We will revise the Methods to clarify the grouping procedure and add an explicit limitation note in the Discussion, including consideration of sensitivity checks where feasible. revision: partial
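The circularity risk in point 3 has a concrete statistical face: regression to the mean. When groups are formed from the same noisy pre-test used to compute gains, extreme groups show spurious gains or losses even with no true learning effect. A small simulation illustrating the artifact (all parameters — ability and error spreads, sample size — are invented for illustration, not estimated from the study):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 9300                                   # large n so group means are stable
true = rng.normal(50, 10, n)               # latent CT ability, unchanged pre -> post
pre = true + rng.normal(0, 5, n)           # pre-test with measurement error
post = true + rng.normal(0, 5, n)          # post-test: NO true gain anywhere
gain = post - pre

# Tertile split on the same noisy pre-test, as in the study's grouping.
lo, hi = np.quantile(pre, [1 / 3, 2 / 3])
low_gain = gain[pre < lo].mean()
mid_gain = gain[(pre >= lo) & (pre <= hi)].mean()
high_gain = gain[pre > hi].mean()
# Low group appears to "gain" and high group to "lose", purely from
# regression to the mean; only the middle tertile is roughly unbiased.
```

This artifact alone would inflate the low group and depress the high group rather than favor the middle, so it does not by itself manufacture an Optimal Development Zone; but it is why the sensitivity checks the authors promise (e.g., grouping on an independent measure) are needed before the interaction effect can be trusted.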

standing simulated objections not resolved
  • Absence of a control group and external validation (e.g., correlation with Bebras or CTt) for the CT assessments, which cannot be addressed without new data collection.

Circularity Check

0 steps flagged

No circularity: empirical pre-post gains and regression results are independent of fitted inputs

full rationale

The paper's central claims rest on direct pre-post difference scores, group comparisons by initial CT level, and a hierarchical regression with behavioral logs as predictors. None of these reduce by construction to the same fitted quantities (e.g., no parameter fitted on one subset is then relabeled as a prediction of a closely related quantity). No self-citation chains, uniqueness theorems, or ansatzes are invoked to justify the Optimal Development Zone or regression coefficients. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The study rests on standard statistical assumptions for regression and effect-size calculations plus the interpretive construct of an Optimal Development Zone; no free parameters are fitted to derive the central claim beyond ordinary data analysis.

axioms (1)
  • standard math Standard assumptions of hierarchical linear regression (linearity, independence, homoscedasticity) hold for the self-efficacy model.
    Invoked when reporting beta = 0.20, p = 0.05 for iterative testing engagement.
invented entities (1)
  • Optimal Development Zone no independent evidence
    purpose: To label the non-linear pattern in which moderate initial CT produces larger gains than high or low initial CT.
    Introduced to interpret the eta squared = 0.55 result; no external falsifiable prediction or independent evidence is supplied in the abstract.

pith-pipeline@v0.9.0 · 5481 in / 1475 out tokens · 46064 ms · 2026-05-15T02:33:08.034431+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages

  1. [1]

Carnegie Mellon University, https://link.cs.cmu.edu/article.php?a=600, last accessed 2026/03/26

Wing, J.: Research notebook: Computational thinking—What and why? The Link Magazine, Spring. Carnegie Mellon University, https://link.cs.cmu.edu/article.php?a=600, last accessed 2026/03/26

  2. [2]

    Communications of the ACM 49(3), 33–35 (2006)

Wing, J.M.: Computational thinking. Communications of the ACM 49(3), 33–35 (2006). https://doi.org/10.1145/1118178.1118215

  3. [3]

    The Computer Journal 55(7), 832–835 (2012)

    Aho, A.V.: Computation and computational thinking. The Computer Journal 55(7), 832–835 (2012). https://doi.org/10.1093/comjnl/bxs074

  4. [4]

    Educational Research Review 22, 142–158 (2017)

    Shute, V.J., Sun, C., Asbell-Clarke, J.: Demystifying computational thinking. Educational Research Review 22, 142–158 (2017). https://doi.org/10.1016/j.edurev.2017.09.003

  5. [5]

    Educational Researcher 42(1), 38–43 (2013)

    Grover, S., Pea, R.: Computational thinking in K–12: A review of the state of the field. Educational Researcher 42(1), 38–43 (2013). https://doi.org/10.3102/0013189X12463051

  6. [6]

    Journal of Educational Computing Research 62(6), 1420–1450 (2024)

    Weng, X., Ye, H., Dai, Y., Ng, O.L.: Integrating artificial intelligence and computational thinking in educational contexts: A systematic review of instructional design and student learning outcomes. Journal of Educational Computing Research 62(6), 1420–1450 (2024). https://doi.org/10.1177/07356331241248686

  7. [7]

    Computers and Education: Artificial Intelligence 4, 100147 (2023)

    Yilmaz, R., Yilmaz, F.G.K.: The effect of generative artificial intelligence (AI)-based tool use on students' computational thinking skills, programming self-efficacy and motivation. Computers and Education: Artificial Intelligence 4, 100147 (2023). https://doi.org/10.1016/j.caeai.2023.100147

  8. [8]

    Harvard University Press, Cambridge (1978)

    Vygotsky, L.S.: Mind in Society: The Development of Higher Psychological Processes. Harvard University Press, Cambridge (1978)

  9. [9]

    Educational Psychology Review 19(4), 509–539 (2007)

    Kalyuga, S.: Expertise reversal effect and its implications for learner-tailored instruction. Educational Psychology Review 19(4), 509–539 (2007). https://doi.org/10.1007/s10648-007-9054-3

  10. [10]

OpenStax, https://openstax.org/books/introduction-computer-science/pages/1-introduction, last accessed 2026/03/26

Franchitti, J.-C.: Introduction to Computer Science. OpenStax, https://openstax.org/books/introduction-computer-science/pages/1-introduction, last accessed 2026/03/26

  11. [11]

    Cognitive Science 12(2), 257–285 (1988)

    Sweller, J.: Cognitive load during problem solving: Effects on learning. Cognitive Science 12(2), 257–285 (1988)

  12. [12]

    Nyuyoji Hattatsu Rinsho Senta Nenpo [Annual Report of the Center for Developmental Clinical Psychology] 6, 27–36 (1984)

    Hatano, G., Inagaki, K.: Two courses of expertise. Nyuyoji Hattatsu Rinsho Senta Nenpo [Annual Report of the Center for Developmental Clinical Psychology] 6, 27–36 (1984)

  13. [13]

    American Psychologist 37(2), 122–147 (1982)

    Bandura, A.: Self-efficacy mechanism in human agency. American Psychologist 37(2), 122–147 (1982)