pith. machine review for the scientific record.

arxiv: 2604.16393 · v3 · submitted 2026-03-28 · 💻 cs.SE · cs.HC

Recognition: 2 Lean theorem links

How Do Developers Interact with AI? An Exploratory Study on Modeling Developer Programming Behavior

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 22:10 UTC · model grok-4.3

classification 💻 cs.SE cs.HC
keywords AI-assisted programming · developer behavior model · intention and emotion · user study · programming workflows · AI tool interaction · emotional impact

The pith

Developers using AI during programming tasks focus more on creating and verifying code while maintaining steadier emotions than those coding without AI.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports results from a mixed-methods study involving 76 developers who completed programming tasks in either Python or Java, divided into AI-assisted and non-AI groups. Participants retrospectively labeled their intentions, actions, tools, and emotions while reviewing screen recordings of their work, with additional data from surveys and interviews. This led to the S-IASE model, which frames any development state through four dimensions: the developer's intention, the concrete action taken, the supporting tool in use, and the accompanying emotion. Analysis showed AI users spent more time actively generating and evaluating code with fewer emotional swings, while some reported guilt about depending on AI. A sympathetic reader would care because these hidden dimensions suggest AI tools could be designed to better match or support the full experience of coding rather than just speeding up output.

Core claim

The central claim is that developer programming behavior, especially in the presence of AI, can be described using the S-IASE model with four dimensions—intention, action, supporting tool, and emotion—for any given development state. The study data revealed distinct aggregated patterns: AI-assisted developers engaged more in active code creation, evaluation, and verification, and displayed emotionally stable flows, unlike the fluctuating emotions seen in the non-AI group. Interviews added that reliance on AI sometimes produced impostor-like feelings of guilt or self-doubt.

What carries the argument

The S-IASE model, a four-dimensional framework that characterizes each development state by the developer's intention, the action performed, the tool used, and the emotion experienced.
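The four dimensions can be pictured as a simple record type. The sketch below is our own illustration in Python: the field names and label values are invented for clarity, and the paper's replication package may encode development states quite differently.

```python
from dataclasses import dataclass
from enum import Enum

class Emotion(Enum):
    # Illustrative labels only; the paper's emotion taxonomy may differ.
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"

@dataclass
class DevelopmentState:
    """One labeled development state in the S-IASE framing.

    Field names are a paraphrase of the four dimensions, not the
    authors' schema.
    """
    intention: str        # e.g. "create code", "verify result"
    action: str           # e.g. "prompt AI", "run tests"
    supporting_tool: str  # e.g. "ChatGPT", "IDE debugger", "none"
    emotion: Emotion

# A session is then just an ordered sequence of labeled states:
session = [
    DevelopmentState("create code", "prompt AI", "ChatGPT", Emotion.NEUTRAL),
    DevelopmentState("verify result", "run tests", "IDE", Emotion.POSITIVE),
]
```

Framing each state as one record makes the paper's sequential analysis natural: aggregated patterns are counts over these records, and sequential patterns are transitions between consecutive ones.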

If this is right

  • AI assistance shifts developer effort toward actively creating code and verifying AI-generated results rather than other activities.
  • Developers experience fewer emotional fluctuations when using AI tools compared to traditional non-AI workflows.
  • Some developers report guilt or self-doubt tied to relying on AI, even when performance improves.
  • Sequential patterns in the four dimensions distinguish AI-assisted programming from non-AI programming.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The model could support real-time AI features that detect shifting intentions and adjust suggestions accordingly.
  • Emotional stability observed with AI might lower burnout risk during extended sessions, though this link remains untested.
  • Future AI tools could be tuned to preserve some non-AI emotional rhythms when developers prefer them.

Load-bearing premise

Developers can accurately recall and label their true intentions and emotions after the tasks by watching recordings of their own screens.

What would settle it

A new experiment that collects real-time emotion data via heart-rate monitors or concurrent verbal reports during identical tasks and checks whether those measures match the retrospective labels used to build the S-IASE model.
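Such a validation could be scored, hypothetically, by aligning the retrospective labels and the concurrent reports to the same time windows and computing chance-corrected agreement. A self-contained sketch using Cohen's kappa, with invented label sequences (the paper reports no such comparison):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equal-length label sequences."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Raw fraction of windows where the two sources agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from the marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b.get(k, 0) for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)

# Retrospective labels vs. hypothetical concurrent self-reports,
# aligned to the same one-minute windows:
retro = ["neutral", "neutral", "frustrated", "happy", "neutral"]
live  = ["neutral", "frustrated", "frustrated", "happy", "neutral"]
kappa = cohens_kappa(retro, live)  # 0.6875 for these made-up sequences
```

High kappa across participants would support the load-bearing premise; low kappa would suggest the retrospective labels reconstruct rather than recover the in-the-moment states.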

Figures

Figures reproduced from arXiv: 2604.16393 by Bowen Xu, Kathryn Thomasset Stolee, Yinan Wu, Ze Shi Li.

Figure 1. Overview of our study steps. Participants in the AI-assisted group were explicitly informed that they could freely install and use their preferred AI assistants (e.g., ChatGPT or GitHub Copilot) without restriction, whereas participants in the non-AI group were instructed not to use any AI assistants during the tasks.
Figure 2. Our annotation tool's GUI.
Figure 3. Mean number of intention occurrences per participant across the four groups.
Figure 4. Mean number of action occurrences per participant across the four groups.
Figure 5. Mean supporting tool and emotion occurrences per participant across the four groups.
read the original abstract

Artificial Intelligence (AI) is reshaping how developers adopt software engineering practices, yet the multi-dimensional nature of developer-AI interaction remains under-explored. Prior studies have primarily examined dimensions observable from developer activities such as "Prompt Crafting" and "Code Editing," overlooking how hidden intentions and emotional dimensions intertwine with concrete actions during AI-assisted programming. To understand this phenomenon, we conducted a mixed-methods study with 76 developers split into AI-assisted and non-AI groups. Each performed programming tasks (Python with API management or Java with SQL). Developers retrospectively labeled their self-reported intentions, tool-supported actions, and emotions from screen recordings, supplemented by surveys and interviews. Our user study resulted in a novel model named S-IASE with four dimensions to describe programming behavior: intention, action, supporting tool, and emotion for a given development state. Our analysis reveals aggregated and sequential behavioral patterns. For example, using AI assistants often makes developers more focused on actively creating code, evaluating, and verifying generated results. AI-assisted participants showed emotionally stable development flow, as opposed to non-AI-assisted participants who experienced more fluctuating emotions. Interviews revealed further nuance: some developers reported impostor-like feelings, expressing guilt or self-doubt about relying on AI. Our work bridges an important gap in understanding the complexities of developer-AI interaction in programming context.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper reports results from a mixed-methods study with 76 developers split into AI-assisted and non-AI groups who completed Python (API) or Java (SQL) tasks. Participants viewed their own screen recordings after the fact to retrospectively label intentions, actions, supporting tools, and emotions for each development state; these labels were aggregated with survey and interview data. The central contribution is the derivation of the S-IASE model, a four-dimensional taxonomy (intention, action, supporting tool, emotion) claimed to capture programming behavior. Analysis of the labeled sequences yields patterns such as greater focus on creation/verification and emotionally stable flow among AI users, contrasted with fluctuating emotions in the non-AI group, plus interview reports of impostor-like guilt.

Significance. If the labeling procedure can be shown to recover contemporaneous states with acceptable fidelity, the S-IASE model supplies a useful integrative lens that moves beyond the observable-action focus of prior AI-assistance studies. The 76-participant sample and mixed-methods design are strengths for an exploratory study, generating concrete behavioral sequences and emotional contrasts that could guide tool design and training interventions.

major comments (1)
  1. [§4] Data Collection and Labeling Procedure: The S-IASE dimensions and all reported patterns are extracted directly from participants' retrospective self-labels of intentions and emotions. No inter-rater reliability statistics, concurrent think-aloud validation, or comparison against real-time measures are reported; the method therefore rests on the untested assumption that post-hoc reconstruction accurately recovers hidden states rather than introducing recall bias or social-desirability distortion. This assumption is load-bearing for the central claim that the four-dimensional model describes actual programming behavior.
minor comments (2)
  1. [Abstract, §5] The abstract states that AI-assisted participants showed 'emotionally stable development flow' while non-AI participants experienced 'more fluctuating emotions.' Provide the exact operationalization (e.g., variance of labeled emotion scores per minute, or transition counts) and the statistical test used to support this contrast.
  2. [§6] Interview Analysis: The impostor-feeling theme is presented qualitatively. Indicate how many participants expressed it and whether it differed systematically between the AI and non-AI groups.
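For the operationalization question in the first minor comment, one candidate metric is the count of emotion-label changes between consecutive development states. The sketch below uses invented label sequences; the paper's actual measure of "fluctuation" is not specified in the text above.

```python
def emotion_transitions(labels):
    """Count label changes between consecutive states -- one simple
    operationalization of 'emotional fluctuation'."""
    return sum(a != b for a, b in zip(labels, labels[1:]))

# Hypothetical per-participant emotion sequences:
ai_group = ["neutral", "neutral", "neutral", "happy", "happy"]
non_ai   = ["neutral", "frustrated", "neutral", "happy", "frustrated"]

ai_group_changes = emotion_transitions(ai_group)  # 1 change
non_ai_changes = emotion_transitions(non_ai)      # 4 changes
```

Reporting a per-participant statistic like this, plus the test comparing groups, would make the "emotionally stable flow" contrast checkable rather than impressionistic.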

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive and detailed feedback on our exploratory study. We address the single major comment point-by-point below, agreeing that the retrospective labeling method requires explicit discussion of its limitations. We will revise the manuscript accordingly to improve transparency and strengthen the presentation of our contributions.

read point-by-point responses
  1. Referee: [§4] Data Collection and Labeling Procedure: The S-IASE dimensions and all reported patterns are extracted directly from participants' retrospective self-labels of intentions and emotions. No inter-rater reliability statistics, concurrent think-aloud validation, or comparison against real-time measures are reported; the method therefore rests on the untested assumption that post-hoc reconstruction accurately recovers hidden states rather than introducing recall bias or social-desirability distortion. This assumption is load-bearing for the central claim that the four-dimensional model describes actual programming behavior.

    Authors: We agree that the retrospective self-labeling procedure is foundational to deriving the S-IASE model and that the lack of concurrent validation or reliability metrics represents a methodological limitation. Our design choice to use post-task labeling assisted by screen recordings was deliberate: concurrent methods such as think-aloud protocols risk altering natural developer behavior, cognitive flow, and emotional states during the programming tasks. The recordings were provided specifically to support accurate recall of intentions, actions, tools, and emotions. We triangulated the labels with survey and interview data to mitigate bias. Nevertheless, we acknowledge that recall bias and social-desirability effects cannot be fully ruled out without additional validation. In the revised manuscript we will (1) expand the Limitations section with an explicit discussion of these threats and (2) add a paragraph outlining future work that could include concurrent think-aloud or physiological measures to test the fidelity of the retrospective approach. revision: yes

Circularity Check

0 steps flagged

No circularity: S-IASE model derived directly from empirical participant data

full rationale

The paper is a mixed-methods empirical study that collects screen recordings, retrospective self-labels for intentions/emotions/actions/tools, surveys, and interviews from 76 developers. The S-IASE four-dimensional model is presented as emerging from pattern analysis of these labels (aggregated and sequential behaviors). No equations, fitted parameters, predictions, or mathematical derivations appear; the taxonomy is a descriptive organization of observed data rather than a reduction to prior inputs by construction. Self-citations are absent from the provided text and not load-bearing for any derivation. This is standard exploratory qualitative work with no circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The model rests on the assumption that retrospective labeling captures real-time states; no free parameters or invented physical entities, only the proposed model itself.

axioms (1)
  • domain assumption: Retrospective self-labeling from screen recordings accurately reflects real-time intentions, actions, and emotions. Central to the data-collection method described in the abstract.
invented entities (1)
  • S-IASE model (no independent evidence)
    purpose: framework to describe programming behavior across intention, action, tool, and emotion dimensions; newly proposed based on the study observations.

pith-pipeline@v0.9.0 · 5544 in / 1229 out tokens · 42824 ms · 2026-05-14T22:10:14.852544+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

75 extracted references · 75 canonical work pages · 2 internal anchors
