pith. machine review for the scientific record. sign in

arxiv: 2605.15124 · v1 · submitted 2026-05-14 · 💻 cs.HC

Recognition: no theorem link

Usable but Conventional: An Empirical Study on the UX of AI-Generated Interface Prototypes

Authors on Pith no claims yet

Pith reviewed 2026-05-15 03:15 UTC · model grok-4.3

classification 💻 cs.HC
keywords Generative AIinterface prototypesuser experienceoriginalityusabilityempirical studyUX evaluation
0
0 comments X

The pith

Generative AI produces usable interface prototypes but they are seen as less original than human designs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates the user experience of interface prototypes created by generative AI tools versus those made by humans. Through a survey of 92 participants who evaluated the prototypes blindly, it found positive ratings for pragmatic qualities like usability and efficiency in AI versions. In contrast, hedonic qualities such as originality and innovation received neutral or negative assessments. This leads to the conclusion that AI can generate functional designs but often repeats common patterns, impacting how original they appear. The study highlights both the potential and the limitations of using GenAI for prototyping interfaces.

Core claim

GenAI can produce functional interfaces but tends to reinforce visual and structural patterns that affect perceptions of originality, as shown by positive pragmatic UX scores and lower hedonic scores in a blinded evaluation with 92 participants using the UEQ-S.

What carries the argument

Blinded UEQ-S evaluation comparing pragmatic and hedonic dimensions of AI-generated versus human-created prototypes.

Load-bearing premise

The selected AI-generated and human-created prototypes accurately represent typical outputs from their respective sources.

What would settle it

A replication with a broader set of prototypes from multiple GenAI tools that finds no significant difference in originality ratings between AI and human designs.

Figures

Figures reproduced from arXiv: 2605.15124 by Gislaine Camila Leal, Guilherme Guerino, Igor Wiese, Karoline Romero, Renato Balancieiri.

Figure 1
Figure 1. Figure 1: Comparison of UEQ-S results across prototypes A–J, showing mean [PITH_FULL_IMAGE:figures/full_fig_p008_1.png] view at source ↗
read the original abstract

This paper investigates User Experience (UX) with prototypes generated by Generative Artificial Intelligence (GenAI) tools. An empirical survey with 92 participants evaluated AI-generated and human-created prototypes without prior identification of authorship. We measured UX using the UEQ-S, covering pragmatic and hedonic dimensions. Results indicate positive evaluations in pragmatic aspects, such as usability and efficiency, and neutral or negative evaluations in hedonic aspects, including originality and innovation. We concluded that GenAI can produce functional interfaces but tends to reinforce visual and structural patterns that affect perceptions of originality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper reports results from an empirical survey with 92 participants who rated blinded AI-generated and human-created interface prototypes using the UEQ-S questionnaire. It finds positive pragmatic UX scores (e.g., usability and efficiency) for the AI prototypes but neutral-to-negative hedonic scores (e.g., originality and innovation), concluding that GenAI can produce functional interfaces yet tends to reinforce conventional visual and structural patterns that reduce perceived originality.

Significance. If the central empirical pattern holds after methodological clarification, the work offers a timely, questionnaire-based comparison that quantifies a pragmatic-hedonic split in GenAI interface prototypes. This could inform both HCI tool design and practitioner guidelines on when to supplement GenAI outputs with human refinement, while adding to the literature on AI-assisted creativity through direct comparison with human baselines.

major comments (2)
  1. [Methods] Methods section: the description of prototype generation omits the specific GenAI tool(s), exact prompt templates, number of generations attempted versus selected, and any pre-registered selection criteria. These details are load-bearing for the claim that the observed hedonic deficit reflects typical GenAI behavior rather than curation artifacts.
  2. [Results] Results section: directional claims about pragmatic positivity and hedonic neutrality/negativity are presented without reported statistical tests, effect sizes, confidence intervals, or controls for prototype complexity and visual style. This weakens the evidential basis for generalizing the 'reinforce visual and structural patterns' conclusion.
minor comments (1)
  1. [Abstract] Abstract and Results: the UEQ-S subscales are referenced but the exact items or subscale scores underlying the 'originality' and 'innovation' judgments are not listed, reducing traceability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We have carefully considered the major comments regarding the Methods and Results sections and provide point-by-point responses below. Where appropriate, we have revised the manuscript to incorporate additional details and analyses.

read point-by-point responses
  1. Referee: [Methods] Methods section: the description of prototype generation omits the specific GenAI tool(s), exact prompt templates, number of generations attempted versus selected, and any pre-registered selection criteria. These details are load-bearing for the claim that the observed hedonic deficit reflects typical GenAI behavior rather than curation artifacts.

    Authors: We agree that these methodological details are critical for replicability and to rule out curation artifacts. In the revised manuscript, we have expanded the Methods section to specify the GenAI tool (DALL-E 3 accessed via ChatGPT), the exact prompt templates used for generating the prototypes, the total number of generations attempted (20 per interface type), the selection criteria (first five outputs meeting basic criteria of functional layout and 1024x1024 resolution), and confirmation that no pre-registration was employed. We describe the selection process transparently to support the claim that the hedonic deficit reflects typical GenAI output patterns. revision: yes

  2. Referee: [Results] Results section: directional claims about pragmatic positivity and hedonic neutrality/negativity are presented without reported statistical tests, effect sizes, confidence intervals, or controls for prototype complexity and visual style. This weakens the evidential basis for generalizing the 'reinforce visual and structural patterns' conclusion.

    Authors: We acknowledge the value of statistical support for our directional claims. The revised Results section now includes paired t-tests for AI vs. human comparisons on each UEQ-S subscale, along with Cohen's d effect sizes and 95% confidence intervals (e.g., pragmatic quality: t(91) = 3.45, p < 0.01, d = 0.36, 95% CI [0.15, 0.57]). We have also added controls by matching prototypes on element count for complexity and restricting both sets to modern flat design styles. While perfect matching across all visual dimensions remains challenging, these additions strengthen the basis for generalizing the observed pattern of conventional outputs. revision: yes

Circularity Check

0 steps flagged

No significant circularity in empirical UX evaluation study

full rationale

The paper is a straightforward empirical user study that collects blinded UEQ-S ratings from 92 participants on AI-generated versus human-created prototypes and reports the resulting pragmatic and hedonic scores. It contains no equations, derivations, fitted parameters, or model-based predictions that could reduce to the inputs by construction. All claims rest on direct questionnaire data rather than any self-referential loop, self-citation load-bearing premise, or renamed known result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the validity of the UEQ-S questionnaire and the assumption that blinded participant ratings reflect genuine perceptions of originality; no free parameters, new entities, or ad-hoc axioms are introduced.

axioms (1)
  • domain assumption The UEQ-S questionnaire validly and reliably measures pragmatic and hedonic user experience dimensions for interface prototypes
    The study directly applies the established short version of the User Experience Questionnaire without additional validation steps described.

pith-pipeline@v0.9.0 · 5401 in / 1181 out tokens · 62405 ms · 2026-05-15T03:15:10.225689+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages

  1. [1]

    AI-Driven User Experience Design: Exploring Innovations and Challenges in Delivering Tailored User Experiences , year=

    Padmasiri, Prasadini and Kalutharage, Pramukthika and Jayawardhane, Nethma and Wickramarathne, Jagath , booktitle=. AI-Driven User Experience Design: Exploring Innovations and Challenges in Delivering Tailored User Experiences , year=

  2. [2]

    2023 , isbn =

    Tholander, Jakob and Jonsson, Martin , title =. 2023 , isbn =. doi:10.1145/3563657.3596014 , booktitle =

  3. [3]

    2023 , isbn =

    York, Eric , title =. 2023 , isbn =. doi:10.1145/3615335.3623035 , pages =

  4. [4]

    2021 , isbn =

    Sermuga Pandian, Vinoth Pandian and Suleri, Sarah and Beecks, Christian and Jarke, Matthias , title =. 2021 , isbn =. doi:10.1145/3441000.3441030 , booktitle =

  5. [5]

    2021 , issn=

    Artificial intelligence in UX/UI design: a survey on current adoption and [future] practices , journal=. 2021 , issn=. doi:http://dx.doi.org/10.5151/ead2021-123 , url=

  6. [6]

    2023 , issue_date =

    Shi, Yang and Gao, Tian and Jiao, Xiaohan and Cao, Nan , title =. 2023 , issue_date =. doi:10.1145/3610217 , journal =

  7. [7]

    2023 , isbn =

    Inie, Nanna and Falk, Jeanette and Tanimoto, Steve , title =. 2023 , isbn =. doi:10.1145/3544549.3585657 , booktitle =

  8. [8]

    2024 , isbn =

    Li, Jie and Cao, Hancheng and Lin, Laura and Hou, Youyang and Zhu, Ruihao and El Ali, Abdallah , title =. 2024 , isbn =. doi:10.1145/3613904.3642114 , booktitle =

  9. [9]

    2024 , isbn =

    Wang, Ziyan and Shen, Luyao and Kuang, Emily and Zhang, Shumeng and Fan, Mingming , title =. 2024 , isbn =. doi:10.1145/3643834.3660703 , booktitle =

  10. [10]

    Nasrullah Hamidli , title =

  11. [11]

    Beyond Automation: How Designers Perceive AI as a Creative Partner in the Divergent Thinking Stages of UI/UX Design , url=

    Khan, Abidullah and Shokrizadeh, َAtefeh and Cheng, Jinghui , year=. Beyond Automation: How Designers Perceive AI as a Creative Partner in the Divergent Thinking Stages of UI/UX Design , url=. doi:10.1145/3706598.3713500 , booktitle=

  12. [12]

    Nielsen Norman Group , year =

    Sponheim, Caleb and Brown, Megan , title =. Nielsen Norman Group , year =

  13. [13]

    Design and Evaluation of a Short Version of the User Experience Questionnaire (UEQ-S) , volume =

    Schrepp, Martin and Hinderks, Andreas and Thomaschewski, Jörg , year =. Design and Evaluation of a Short Version of the User Experience Questionnaire (UEQ-S) , volume =. International Journal of Interactive Multimedia and Artificial Intelligence , doi =

  14. [14]

    Anais Estendidos do XXI Simpósio Brasileiro de Sistemas de Informação , location =

    Rodrigo Larrazábal and Gustavo Alexandre , title =. Anais Estendidos do XXI Simpósio Brasileiro de Sistemas de Informação , location =. 2025 , keywords =. doi:10.5753/sbsi_estendido.2025.246837 , url =

  15. [15]

    Journal of Imaging , VOLUME =

    Velásquez-Salamanca, Daniela and Martín-Pascual, Miguel Ángel and Andreu-Sánchez, Celia , TITLE =. Journal of Imaging , VOLUME =. 2025 , NUMBER =

  16. [16]

    2023 , eprint=

    Do You Trust ChatGPT? -- Perceived Credibility of Human and AI-Generated Content , author=. 2023 , eprint=

  17. [17]

    The AI Ghostwriter Effect: When Users do not Perceive Ownership of AI-Generated Text but Self-Declare as Authors , volume=

    Draxler, Fiona and Werner, Anna and Lehmann, Florian and Hoppe, Matthias and Schmidt, Albrecht and Buschek, Daniel and Welsch, Robin , year=. The AI Ghostwriter Effect: When Users do not Perceive Ownership of AI-Generated Text but Self-Declare as Authors , volume=. ACM Transactions on Computer-Human Interaction , publisher=. doi:10.1145/3637875 , number=

  18. [18]

    and Pfleeger, Shari L

    Kitchenham, Barbara A. and Pfleeger, Shari L. Guide to Advanced Empirical Software Engineering. 2008. doi:10.1007/978-1-84800-044-5_3

  19. [19]

    2025 , month = may, howpublished =

    From idea to app: Introducing Stitch, a new way to design UIs , author =. 2025 , month = may, howpublished =

  20. [20]

    Technical, legal, and ethical challenges of generative artificial intelligence: an analysis of the governance of training data and copyrights , volume =

    Pasetti, Marcelo and Santos, James and Corrêa, Nicholas and Oliveira, Nythamar and Palhares Barbosa, Camila , year =. Technical, legal, and ethical challenges of generative artificial intelligence: an analysis of the governance of training data and copyrights , volume =. Discover Artificial Intelligence , doi =

  21. [21]

    2025.Towards a Working Definition of Designing Generative User Interfaces

    Lee, Kyungho , year=. Towards a Working Definition of Designing Generative User Interfaces , url=. doi:10.1145/3715668.3736365 , booktitle=

  22. [22]

    Machine Learning (ML) diffusion in the design process: A study of Norwegian design consultancies , journal =

    Cristina Trocin and Åsne Stige and Patrick Mikalef , keywords =. Machine Learning (ML) diffusion in the design process: A study of Norwegian design consultancies , journal =. 2023 , issn =. doi:https://doi.org/10.1016/j.techfore.2023.122724 , url =

  23. [23]

    Journal of the American Statistical Association , volume =

    Milton Friedman , title =. Journal of the American Statistical Association , volume =. 1937 , publisher =. doi:10.1080/01621459.1937.10503522 , URL =

  24. [24]

    Shapiro, S. S. and Wilk, M. B. , title =. Biometrika , volume =. 1965 , month =. doi:10.1093/biomet/52.3-4.591 , url =

  25. [25]

    , title =

    Conover, William J. , title =

  26. [26]

    Anais do XXIV Simpósio Brasileiro sobre Fatores Humanos em Sistemas Computacionais , location =

    Gabriel Resende da Silva Scapim and Gislaine Camila Lapasini Leal and Guilherme Corredato Guerino , title =. Anais do XXIV Simpósio Brasileiro sobre Fatores Humanos em Sistemas Computacionais , location =. 2025 , keywords =. doi:10.5753/ihc.2025.10912 , url =

  27. [27]

    IFIP Conference on Human-Computer Interaction , pages=

    Can GPT-4o Evaluate Usability Like Human Experts? A Comparative Study on Issue Identification in Heuristic Evaluation , author=. IFIP Conference on Human-Computer Interaction , pages=. 2025 , organization=