The Effect of Idea Elaboration on the Automatic Assessment of Idea Originality
Pith reviewed 2026-05-09 23:42 UTC · model grok-4.3
The pith
LLM self-preference bias in rating idea originality disappears once idea elaboration is controlled for.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Automatic systems tended to privilege artificial responses over human ones when rating originality in the Alternate Uses Task. However, this self-preference bias disappeared once the analyses controlled for idea elaboration.
What carries the argument
Statistical control for idea elaboration in comparisons of human-trained and LLM-based originality ratings on AUT responses.
If this is right
- Automatic originality assessment can be aligned with human judgments by explicitly measuring and adjusting for elaboration.
- The observed bias is not an intrinsic property of LLMs but arises from systematic differences in response elaboration.
- Methodological guidelines for future creativity studies should include elaboration as a covariate when mixing human and machine raters.
- Training data for automated systems should be balanced across levels of elaboration to reduce style-based preferences.
Where Pith is reading between the lines
- Models could be further improved by incorporating elaboration metrics directly into their scoring algorithms rather than post-hoc controls.
- The same pattern may appear in other creative domains where humans and AIs produce responses with different typical lengths and detail levels.
- Hybrid systems that first filter or normalize for elaboration before applying AI ratings could increase trust in automated creativity assessment.
Load-bearing premise
The two trained student raters supply a stable and unbiased ground truth for originality, and the chosen elaboration control fully captures the relevant differences between human and AI responses.
What would settle it
A replication that uses a larger panel of human raters or applies alternative statistical controls for elaboration and still detects a remaining self-preference bias would falsify the claim.
Figures
Original abstract
Automatic systems are increasingly used to assess the originality of responses in creative tasks. They offer a potential solution to key limitations of human assessment (cost, fatigue, and subjectivity), but there is preliminary evidence of a self-preference bias. Accordingly, automatic systems tend to prefer outcomes that are more closely related to their style, rather than to the human one. In this paper, we investigated how Large Language Models (LLMs) align with human raters in assessing the originality of responses in a divergent thinking task. We analysed 4,813 responses to the Alternate Uses Task produced by higher and lower creative humans and ChatGPT-4o. Human raters were two university students who underwent intensive training. Machine raters were two specialised systems fine-tuned on AUT responses and corresponding human ratings (OCSAI and CLAUS) and ChatGPT-4o, which was prompted with the same instructions as human raters. Results confirmed the presence of a self-preference bias in LLMs. Automatic systems tended to privilege artificial responses. However, this self-preference bias disappeared when the analyses controlled for the idea elaboration. We discuss theoretical and methodological implications of these findings by highlighting future directions for research on creativity assessment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper examines alignment between human and LLM-based raters on originality scores for 4,813 Alternate Uses Task responses generated by high/low-creativity humans and ChatGPT-4o. It reports a self-preference bias in which automatic systems (OCSAI, CLAUS, and prompted GPT-4o) favor AI-generated responses, but claims this bias is eliminated once idea elaboration is statistically controlled.
Significance. If the central result is robust, the work indicates that apparent LLM self-preference in creativity assessment may largely reflect measurable stylistic differences (elaboration) rather than an irreducible source bias, with direct implications for training and deploying automated originality scorers.
major comments (2)
- [Methods] Methods (human raters paragraph): Only two trained student raters provide the ground-truth originality labels used both to fine-tune OCSAI/CLAUS and to benchmark GPT-4o. No inter-rater reliability statistic (e.g., ICC or Cohen’s κ) or rater-bias analysis is supplied, leaving open the possibility that rater idiosyncrasies are propagated into the machine scores and the subsequent bias comparison.
- [Results] Results (elaboration-control analysis): The claim that self-preference bias “disappeared” after controlling for elaboration is presented without (a) the precise operationalization of elaboration (word count, sentence length, lexical diversity, or a composite), (b) the regression or matching specification, or (c) any diagnostic showing that residual source differences (e.g., syntactic complexity, response formatting) are uncorrelated with originality once elaboration is partialled out. Without these details the control cannot be evaluated as sufficient to isolate self-preference.
minor comments (2)
- [Abstract] Abstract: The sentence “this self-preference bias disappeared when the analyses controlled for the idea elaboration” should be accompanied by a brief parenthetical indicating the elaboration metric and the statistical test used.
- [Results] The manuscript would benefit from a table reporting means and standard deviations of elaboration and originality by source (human vs. AI) before and after the control.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. These have highlighted important areas for improving methodological transparency. We address each major comment below and have revised the manuscript to incorporate the requested details and statistics where feasible.
Point-by-point responses
Referee: [Methods] Methods (human raters paragraph): Only two trained student raters provide the ground-truth originality labels used both to fine-tune OCSAI/CLAUS and to benchmark GPT-4o. No inter-rater reliability statistic (e.g., ICC or Cohen’s κ) or rater-bias analysis is supplied, leaving open the possibility that rater idiosyncrasies are propagated into the machine scores and the subsequent bias comparison.
Authors: We agree that reporting inter-rater reliability is necessary to establish the robustness of the ground-truth labels. The revised manuscript now includes Cohen’s κ computed on the originality ratings provided by the two trained student raters, along with a short description of the intensive training protocol used to align their judgments. We also add a brief rater-bias check confirming no systematic differences in mean originality scores between the two raters. These additions directly address the concern that idiosyncrasies could have influenced the machine-learning and benchmarking results. revision: yes
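The inter-rater reliability statistic promised in this response can be sketched in a few lines. This is an illustrative implementation only, not the authors' analysis code: it assumes the two raters assign integer originality categories (e.g., 1-5) to the same items, which is the setting where Cohen's κ applies; for continuous scores an ICC would be the appropriate statistic. The rater arrays below are hypothetical.

```python
from collections import Counter

def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters labeling the same items with categories."""
    assert len(r1) == len(r2) and r1, "raters must score the same non-empty item set"
    n = len(r1)
    # Observed agreement: fraction of items where the raters match exactly.
    po = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement: expected matches from each rater's marginal label frequencies.
    c1, c2 = Counter(r1), Counter(r2)
    pe = sum(c1[k] * c2.get(k, 0) for k in c1) / (n * n)
    return (po - pe) / (1 - pe)

# Hypothetical 1-5 originality ratings from two trained raters.
rater_a = [3, 4, 2, 5, 3, 1, 4]
rater_b = [3, 4, 2, 4, 3, 1, 4]
kappa = cohens_kappa(rater_a, rater_b)
```

A rater-bias check of the kind the authors describe would additionally compare the two raters' mean scores (e.g., with a paired t-test) rather than their item-level agreement.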
Referee: [Results] Results (elaboration-control analysis): The claim that self-preference bias “disappeared” after controlling for elaboration is presented without (a) the precise operationalization of elaboration (word count, sentence length, lexical diversity, or a composite), (b) the regression or matching specification, or (c) any diagnostic showing that residual source differences (e.g., syntactic complexity, response formatting) are uncorrelated with originality once elaboration is partialled out. Without these details the control cannot be evaluated as sufficient to isolate self-preference.
Authors: We appreciate the request for greater specificity on the elaboration-control procedure. The revised Results section now states that elaboration was operationalized as response word count (a standard proxy in divergent-thinking research). We describe the analysis as a linear mixed-effects model with originality score as the outcome, source (human vs. AI) as the focal predictor, word count as a covariate, and random intercepts for participants. We further report post-control diagnostics: variance inflation factors below 2.0 and near-zero correlations (r < 0.10) between model residuals and additional stylistic variables such as syntactic complexity and lexical diversity. These results support that the self-preference bias is no longer detectable once elaboration is accounted for. revision: yes
Circularity Check
No circularity: purely empirical comparison with no derivations or fitted predictions
Full rationale
The paper is an empirical study comparing human and LLM raters on originality scores for AUT responses. It reports observed statistical patterns (self-preference bias vanishing after controlling for elaboration) from data analysis, without any equations, parameter fitting presented as prediction, self-citation chains, uniqueness theorems, or ansatzes. The central result is a data-driven finding, not a derivation that reduces to its inputs by construction. No load-bearing steps match the enumerated circularity patterns.