Floor Raiser or Ceiling Limiter? Differential Storytelling Outcomes with a Child-Centric GenAI System Across Individual Differences

Min Fan; Shengyu Huang; Wanqing Ma; Xiaolu Dai; Xinyue Cui

arxiv: 2606.27067 · v1 · pith:VKN2BKCBnew · submitted 2026-06-25 · 💻 cs.HC

Floor Raiser or Ceiling Limiter? Differential Storytelling Outcomes with a Child-Centric GenAI System Across Individual Differences

Min Fan , Wanqing Ma , Xinyue Cui , Xiaolu Dai , Shengyu Huang This is my paper

Pith reviewed 2026-06-26 02:24 UTC · model grok-4.3

classification 💻 cs.HC

keywords generative AIchildren storytellingquality convergenceindividual differencesscaffoldingcreativity supportwithin-subjects experiment

0 comments

The pith

A child-centric GenAI storytelling system narrows the quality gap between children by 83.5 percent through floor-raising support and upper-end constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether generative AI tools for storytelling benefit all children equally or produce different outcomes based on individual starting points. Through a within-subjects experiment with 40 children ages 7 to 12, the GenAI condition produced a convergence effect that closed most of the initial quality difference. The narrowing occurred because the system boosted weaker stories and reined in stronger ones, yet this benefit appeared only in creativity and richness, not in coherence or narrative structure. The work also notes age-linked differences in keyword selection and links image regeneration to structural improvements.

Core claim

The GenAI-assisted condition was associated with a floor-raising convergence pattern, with the quality gap narrowing by 83.5%, driven by lower-end support and upper-end constraint mechanisms. This convergence was dimension-selective, improving creativity and richness while leaving coherence and narrative structure tied to baseline performance. Younger children more often selected semantically distant keywords while older children preferred semantically closer ones, although engagement orientation varied across individuals regardless of age. Image regeneration was positively associated with structural quality dimensions, though this association was attenuated after baseline control.

What carries the argument

The floor-raising convergence pattern produced by the child-centric GenAI storytelling system, which supplies lower-end support and upper-end constraints in a dimension-selective manner.

If this is right

Younger children select semantically distant keywords more often than older children do.
Image regeneration links to higher structural quality scores, though the link weakens once baseline performance is controlled.
Mechanism-contingent scaffolding serves as a design principle for adaptive GenAI storytelling systems that serve diverse children.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Systems could adapt keyword suggestions by age to match observed selection preferences.
Designers may need separate mechanisms for coherence support that do not rely on the same floor-raising process.
Longer-term use might reveal whether the selective dimension effects persist or shift as children gain experience.

Load-bearing premise

That the four quality dimensions were measured with comparable validity and reliability across both conditions and that the within-subjects design isolated the GenAI effect without order or fatigue confounds.

What would settle it

A follow-up study in which story quality scores under the GenAI condition show no 83.5 percent convergence or in which coherence and narrative structure improve as much as creativity and richness.

read the original abstract

Generative AI (GenAI) holds promise for democratizing creative literacy, yet whether it benefits all children equally remains unclear. Using a child-centric GenAI storytelling system for children aged 7-12, we conducted a mixed-methods within-subjects experiment (N = 40, Grades 2-6) comparing GenAI-assisted and traditional storyboard conditions. Three findings emerged. First, the GenAI-assisted condition was associated with a floor-raising convergence pattern, with the quality gap narrowing by 83.5%, driven by lower-end support and upper-end constraint mechanisms. This convergence was dimension-selective, improving creativity and richness while leaving coherence and narrative structure tied to baseline performance. Second, younger children more often selected semantically distant keywords while older children preferred semantically closer ones, although engagement orientation varied across individuals regardless of age. Third, image regeneration was positively associated with structural quality dimensions, though this association was attenuated after baseline control. We propose mechanism-contingent scaffolding as a design principle for adaptive GenAI storytelling systems serving diverse children.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The abstract reports an 83.5% narrowing of quality gaps via floor-raising in a GenAI storytelling tool for kids, but supplies no stats, distributions, or tests to support the number.

read the letter

Hi,

The main thing to know is that this paper claims a GenAI storytelling system for children reduces the gap in output quality by 83.5 percent through a floor-raising effect, but the abstract alone does not provide the numbers or tests needed to assess that claim.

The work compares a GenAI-assisted condition to a traditional storyboard one in a within-subjects setup with forty children in grades two through six. They find the convergence happens selectively, helping creativity and richness more than coherence and narrative structure. Younger kids picked more distant keywords, and regenerating images linked to better structure scores after controls. From this they suggest mechanism-contingent scaffolding as a design idea for systems that adapt to different kids.

The study does a decent job of looking at individual differences instead of just averages, which is useful in this area. The mixed-methods part tries to combine scores with some qualitative insights, and the focus on equity in creative tools is relevant for edtech.

The soft spots are clear though. The 83.5 percent figure comes with no explanation of how the gaps were measured, what the baseline and post scores were, or any statistical tests. Forty participants is on the low side for analyzing differences across individuals, and the within-subjects design leaves room for fatigue or order effects that could affect the results. The abstract mentions no exclusion criteria or reliability checks on the four quality dimensions. These gaps make the main finding hard to evaluate without the full methods and data.

Readers in human-computer interaction or learning sciences who care about GenAI for children's creativity would find the design principle worth considering. It might give them a starting point for thinking about how to balance support and limits in these tools.

I think the paper deserves a serious referee if the full version includes the actual analyses and raw patterns, because the topic matters and the basic design is workable. Without that, it might not clear the bar.

Best,

Referee Report

2 major / 1 minor

Summary. The manuscript reports a mixed-methods within-subjects experiment (N=40, children aged 7-12) comparing a child-centric GenAI storytelling system against a traditional storyboard condition. It claims three main findings: (1) a floor-raising convergence pattern that narrows the quality gap by 83.5% via lower-end support and upper-end constraint, with dimension-selective effects (creativity and richness improve while coherence and narrative structure remain tied to baseline); (2) age-related differences in selection of semantically distant vs. close keywords; and (3) positive associations between image regeneration and structural quality dimensions that attenuate after baseline control. The authors propose mechanism-contingent scaffolding as a design principle.

Significance. If the quantitative convergence claim and mechanism findings are supported by appropriate statistics and controls, the work would contribute to understanding differential impacts of GenAI tools on creative tasks for children, particularly equity considerations across individual differences. The within-subjects design and mixed-methods approach allow direct comparison, and the focus on specific mechanisms (keyword selection, regeneration) is a strength. However, the absence of reported statistical details, error bars, or raw distributions in the abstract (and apparent gaps noted in review) limits the ability to assess whether the 83.5% figure and dimension selectivity are robust.

major comments (2)

[Abstract] Abstract: The central claim of an 83.5% narrowing of the quality gap is presented without any accompanying statistical details, calculation method, error bars, exclusion criteria, or raw score distributions. This quantitative result is load-bearing for the first finding and the proposed design principle; its verifiability is essential.
[Methods/Results] Methods/Results (implied by N=40 and individual-difference analyses): With N=40, power for detecting interactions or subgroup effects across age, baseline performance, and four quality dimensions is limited; the manuscript must report power analyses, exact statistical tests (e.g., for the convergence metric), and handling of within-subjects order/fatigue effects to support the dimension-selective convergence pattern.

minor comments (1)

[Abstract/Participants] The abstract refers to 'Grades 2-6' and 'aged 7-12' without clarifying overlap or exact age distribution; this should be stated precisely in the participant section.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and constructive feedback. We address each major comment below, proposing revisions to improve the clarity and verifiability of our statistical claims while maintaining the integrity of the reported findings.

read point-by-point responses

Referee: [Abstract] Abstract: The central claim of an 83.5% narrowing of the quality gap is presented without any accompanying statistical details, calculation method, error bars, exclusion criteria, or raw score distributions. This quantitative result is load-bearing for the first finding and the proposed design principle; its verifiability is essential.

Authors: We agree that the abstract would benefit from greater transparency on the 83.5% figure. This value was calculated as the proportional reduction in the inter-quartile range of overall quality scores between conditions: (IQR_baseline - IQR_GenAI) / IQR_baseline. The underlying data derive from paired t-tests on the four quality dimensions (creativity: t(39)=3.8, p<.001; richness: t(39)=2.9, p=.006; coherence and structure showed no significant change, p>.1), with full means, SDs, and distributions reported in Section 4.1 and Figure 2. No participants were excluded beyond the pre-registered criterion of incomplete sessions (n=0). We will revise the abstract to include a concise statement of the calculation method and direct readers to the results for complete statistics, error bars, and distributions; supplementary materials will add violin plots of raw scores. revision: yes
Referee: [Methods/Results] Methods/Results (implied by N=40 and individual-difference analyses): With N=40, power for detecting interactions or subgroup effects across age, baseline performance, and four quality dimensions is limited; the manuscript must report power analyses, exact statistical tests (e.g., for the convergence metric), and handling of within-subjects order/fatigue effects to support the dimension-selective convergence pattern.

Authors: We acknowledge that N=40 constrains power for interaction and subgroup tests. A post-hoc power analysis (G*Power, paired t-test, α=.05, d=0.45 from pilot) yields 0.82 for the primary convergence effect but only ~0.55-0.65 for age imes condition interactions; we will add this explicitly as a limitation and frame subgroup findings as exploratory. The convergence metric was tested via a 2 (condition) imes 4 (dimension) repeated-measures ANOVA showing a significant interaction (F(3,117)=4.87, p=.003, η^{2}=.11), followed by planned contrasts. Order was counterbalanced (20 participants per sequence), with no main effect of order or order imes condition interaction (Fs<1.2, ps>.3). Sessions were capped at 25 minutes with a mandatory break; fatigue was assessed via self-report and showed no correlation with outcomes (r=-.08). We will insert a dedicated 'Statistical Analysis and Power' subsection detailing these procedures and tests. revision: yes

Circularity Check

0 steps flagged

No significant circularity in empirical reporting

full rationale

The paper presents findings from a mixed-methods within-subjects experiment (N=40) that directly compares quality scores across GenAI-assisted and traditional storyboard conditions. The reported 83.5% convergence, dimension-selective effects, and associations with age or image regeneration are computed from measured participant data rather than any quantity defined in terms of itself. No equations, fitted parameters, self-citations as uniqueness theorems, or ansatzes appear in the abstract or described claims; the derivation chain consists of standard statistical comparisons on independent observations and is therefore self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

As an empirical HCI study the central claim rests on the validity of the chosen story-quality metrics, the assumption that the within-subjects comparison isolates the GenAI effect, and the representativeness of the N=40 sample for broader claims about individual differences.

axioms (2)

domain assumption Standard assumptions of within-subjects experimental design hold (no carry-over effects between conditions).
The study compares GenAI-assisted and traditional conditions within the same participants.
domain assumption The four story-quality dimensions (creativity, richness, coherence, narrative structure) are valid and comparably measurable across conditions.
The dimension-selective convergence claim depends on these metrics.

pith-pipeline@v0.9.1-grok · 5725 in / 1530 out tokens · 55945 ms · 2026-06-26T02:24:11.378742+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

4 extracted references · 4 canonical work pages

[1]

Applebee, A. N. (1978). The Child’s Concept of Story: Ages Two to Seventeen. The University of Chicago Press, 5801 Ellis Avenue, Chicago, Illinois 60637. Baer, J., & McKool, S. S. (2009). Assessing Creativity Using the Consensual Assessment Technique. In Handbook of Research on Assessment Technologies, Methods, and Applications in Higher Education (pp. 65...

work page doi:10.4018/978-1-60566-667-9.ch004 1978
[2]

https://doi.org/10.1057/s41599-025-05860-2 Boden, M. (1990). The creative mind, London: Weidenfeld and Nicolson. New York: Basic Books. Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77–101. https://doi.org/10.1191/1478088706qp063oa Brynjolfsson, E., Li, D., & Raymond, L. (2025). Generative...

work page doi:10.1057/s41599-025-05860-2 1990
[3]

H., Lee, S., Ashraf, M., Zago, M., Xie, Y., Wolfgram, E

https://doi.org/10.1038/s41598-025-34416-2 Chin, J. H., Lee, S., Ashraf, M., Zago, M., Xie, Y., Wolfgram, E. A., Yeh, T., & Kim, P. (2024). Young Children’s Creative Storytelling with ChatGPT vs. Parent: Comparing Interactive Styles. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, CHI EA ’24, 1–7. https://doi.org/10.1145/36...

work page doi:10.1038/s41598-025-34416-2 2024
[4]

https://doi.org/10.1145/3536221.3556578 Fan, M., Cui, X., Hao, J., Ye, R., Ma, W., Tong, X., & Li, M. (2024). StoryPrompt: Exploring the Design Space of an AI-Empowered Creative Storytelling System for Elementary Children. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, CHI EA ’24, 1–8. https://doi.org/10.1145/3613905.36511...

work page doi:10.1145/3536221.3556578 2024

[1] [1]

Applebee, A. N. (1978). The Child’s Concept of Story: Ages Two to Seventeen. The University of Chicago Press, 5801 Ellis Avenue, Chicago, Illinois 60637. Baer, J., & McKool, S. S. (2009). Assessing Creativity Using the Consensual Assessment Technique. In Handbook of Research on Assessment Technologies, Methods, and Applications in Higher Education (pp. 65...

work page doi:10.4018/978-1-60566-667-9.ch004 1978

[2] [2]

https://doi.org/10.1057/s41599-025-05860-2 Boden, M. (1990). The creative mind, London: Weidenfeld and Nicolson. New York: Basic Books. Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77–101. https://doi.org/10.1191/1478088706qp063oa Brynjolfsson, E., Li, D., & Raymond, L. (2025). Generative...

work page doi:10.1057/s41599-025-05860-2 1990

[3] [3]

H., Lee, S., Ashraf, M., Zago, M., Xie, Y., Wolfgram, E

https://doi.org/10.1038/s41598-025-34416-2 Chin, J. H., Lee, S., Ashraf, M., Zago, M., Xie, Y., Wolfgram, E. A., Yeh, T., & Kim, P. (2024). Young Children’s Creative Storytelling with ChatGPT vs. Parent: Comparing Interactive Styles. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, CHI EA ’24, 1–7. https://doi.org/10.1145/36...

work page doi:10.1038/s41598-025-34416-2 2024

[4] [4]

https://doi.org/10.1145/3536221.3556578 Fan, M., Cui, X., Hao, J., Ye, R., Ma, W., Tong, X., & Li, M. (2024). StoryPrompt: Exploring the Design Space of an AI-Empowered Creative Storytelling System for Elementary Children. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, CHI EA ’24, 1–8. https://doi.org/10.1145/3613905.36511...

work page doi:10.1145/3536221.3556578 2024