pith. machine review for the scientific record.

arxiv: 2605.10702 · v1 · submitted 2026-05-11 · 💻 cs.SE


ChatGPT: Friend or Foe When Comprehending and Changing Unfamiliar Code

André van der Hoek, Anthony Estey, Guilherme Vaz Pereira, Margaret-Anne Storey, Norman Anderson, Rafael Prikladnicki, Tarek Alakmeh, Thomas Fritz, Umit Akirmak, Victoria Jackson


Pith reviewed 2026-05-12 04:21 UTC · model grok-4.3

classification 💻 cs.SE
keywords AI-assisted programming · problem-solving behaviors · code comprehension · developer stuck states · LLM use in software engineering · empirical lab study · think-aloud analysis

The pith

AI assistance leaves all detailed coding problem-solving steps intact while changing how developers get stuck and recover.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how AI tools affect the cognitive steps developers take when comprehending and extending unfamiliar code. In a lab study, ten advanced students, split into AI and non-AI groups, performed the same non-trivial extension task while their actions were recorded through think-aloud protocols, code edits, web searches, and prompts. The analysis applied Polya's problem-solving phases and twenty-five behavior codes to compare the two groups. All observed behaviors appeared in both groups even though the AI group used the tool to offload parts of the work. Nine of the ten participants became stuck, and the study identified seven distinct causes along with cases where AI helped or hindered recovery.

Core claim

Developers in the AI group repeatedly turned to the tool to offload aspects of the process, yet every one of the twenty-five detailed problem-solving behaviors appeared in both the AI and non-AI groups. Nine of the ten participants became stuck during the task, but the patterns of how they became stuck and how they became unstuck differed by group. The authors catalog seven distinct causes for getting stuck and note specific instances in which AI support either aided or impeded progress toward unstuck states.

What carries the argument

Polya's four problem-solving phases combined with twenty-five inductively generated behavior codes, applied to triangulated data from think-aloud sessions, code changes, web searches, and LLM prompts.
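The group-level check behind the core claim can be sketched as a small set computation: take the union of behavior codes observed per group and ask which codes, if any, are missing. Everything below is illustrative — the code labels, participant IDs, and observations are hypothetical, not the authors' data or analysis scripts.

```python
# Illustrative sketch (not the authors' analysis code): given hypothetical
# per-participant behavior-code assignments from a qualitative coding pass,
# check whether every code appears in both the AI and non-AI groups.

CODES = {f"B{i:02d}" for i in range(1, 26)}  # 25 hypothetical behavior codes

# Hypothetical observations: participant -> (group, set of codes observed)
observations = {
    "P1": ("AI", {"B01", "B02", "B05"}),
    "P2": ("AI", CODES - {"B03"}),
    "P6": ("non-AI", {"B03", "B04"}),
    "P7": ("non-AI", CODES - {"B01"}),
}

def codes_by_group(obs):
    """Union of behavior codes observed by any participant in each group."""
    seen = {}
    for group, codes in obs.values():
        seen.setdefault(group, set()).update(codes)
    return seen

seen = codes_by_group(observations)
missing = {g: sorted(CODES - s) for g, s in seen.items()}
# The paper's finding corresponds to `missing` being empty for both groups.
all_present = all(not m for m in missing.values())
```

With this toy data the check fails (the AI group never exhibits B03, the non-AI group never exhibits B01); the paper's reported result is the case where no code is missing from either group.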

If this is right

  • All twenty-five detailed problem-solving behaviors remain present even when developers have access to AI for offloading work.
  • Developers encounter seven distinct causes of becoming stuck while extending unfamiliar code.
  • AI can either facilitate or obstruct recovery from stuck states depending on the cause.
  • Stuck and unstuck patterns differ between developers who use AI and those who do not.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Tool designers could add features that support recovery from each of the seven stuck causes without introducing new ones.
  • Training for developers might emphasize strategies for using AI that avoid the hindering stuck patterns observed here.
  • The same comparison of behaviors could be repeated on larger, multi-file changes to test whether the preservation of all steps continues.
  • Teams adopting AI might track which stuck causes appear most often in their workflow and adjust prompts or processes accordingly.

Load-bearing premise

That the problem-solving behaviors and stuck patterns seen in advanced students on one lab task will hold for professional developers on real projects.

What would settle it

A study that records professional developers working on actual codebases and checks whether the same twenty-five behaviors appear in both AI and non-AI conditions along with the same seven stuck causes and AI effects on recovery.

Figures

Figures reproduced from arXiv: 2605.10702 by André van der Hoek, Anthony Estey, Guilherme Vaz Pereira, Margaret-Anne Storey, Norman Anderson, Rafael Prikladnicki, Tarek Alakmeh, Thomas Fritz, Umit Akirmak, Victoria Jackson.

Figure 1. Component sketches from P2 (left) and P3 (right).
Figure 2. Timeline summarizing the time each participant spent in and across the four Polya stages throughout their task.
original abstract

A rapidly growing body of research is examining how LLMs influence developers when they code. To date, this research has tended to focus on productivity and code quality outcomes, rather than the underlying cognitive processes involved in programming. To address this gap, we report on the results of an exploratory laboratory study of ten advanced student developers (five with support from AI and five without) who had to make a non-trivial extension to a sizable software system. Leveraging Polya's four problem-solving phases and 25 inductively-generated codes detailing distinct problem-solving behaviors as the primary lenses, we examined: (1) how AI impacted the problem-solving approach the developers used to solve the programming task, and (2) how AI impacted their progress when they became stuck. For the analysis, we triangulated data across multiple sources (e.g., think-aloud, code changes, web searches, and LLM prompts). Unexpectedly, while developers in the AI group repeatedly turned to the AI tool to offload certain aspects of the process, all detailed problem-solving behaviors appeared in both groups. We also found that nine out of ten participants found themselves stuck in their work, but with key differences in how they became stuck and unstuck. We highlight seven distinct causes for being stuck and highlight how AI in some cases helped and in other cases hindered becoming unstuck.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper reports an exploratory laboratory study with ten advanced student developers (five with ChatGPT support and five without) performing a non-trivial extension task on a sizable software system. Using Polya's four problem-solving phases and 25 inductively generated codes for distinct behaviors, the authors triangulate think-aloud protocols, code changes, web searches, and LLM prompts to examine (1) AI's impact on problem-solving approaches and (2) how AI affects progress when stuck. Key claims are that all 25 behaviors appeared in both groups (with AI used to offload aspects of the process) and that nine of ten participants became stuck, with differences in the seven identified causes and in how they became unstuck.

Significance. If the patterns hold, the work fills a gap by focusing on cognitive processes rather than productivity or code quality metrics alone. Strengths include data triangulation across multiple sources and the inductive generation of a behavior inventory grounded in Polya's framework. This could inform the design of AI coding tools by suggesting they supplement rather than replace core problem-solving behaviors. The exploratory design and small sample, however, limit the strength of claims about invariance of the behavioral repertoire.

major comments (3)
  1. [Methods and Results] The central claim that all 25 problem-solving behaviors appeared in both the AI and non-AI groups rests on a sample of only five participants per arm. With this n, the absence of observed differences may reflect limited opportunity to surface variations rather than true equivalence of the behavioral repertoires.
  2. [Discussion] The interpretation that AI merely offloads work without altering underlying problem-solving behaviors is load-bearing for the 'friend or foe' framing, yet the design uses advanced students on one controlled lab task. This does not secure generalizability to professional developers working on real, unfamiliar codebases, as a different task or participant pool could yield different distributions of behaviors or stuck triggers.
  3. [Analysis] Full details on the inductive generation of the 25 codes, their application, and any inter-rater reliability metrics are not provided. This is load-bearing for the validity of the behavior inventory that underpins the equivalence finding across groups.
minor comments (2)
  1. [Abstract] Specify whether the single participant who did not become stuck was in the AI or non-AI group, as this detail could illuminate the reported differences in stuck/unstuck patterns.
  2. The manuscript would benefit from a dedicated limitations subsection that explicitly addresses sample size, participant expertise, and task specificity in relation to the generalizability of the seven stuck causes.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments and for recognizing the value of our exploratory focus on cognitive processes and data triangulation. We respond point-by-point to the major comments below, agreeing where the critique is valid and outlining specific revisions to address each concern while preserving the integrity of our findings.

point-by-point responses
  1. Referee: [Methods and Results] The central claim that all 25 problem-solving behaviors appeared in both the AI and non-AI groups rests on a sample of only five participants per arm. With this n, the absence of observed differences may reflect limited opportunity to surface variations rather than true equivalence of the behavioral repertoires.

    Authors: We agree that the small sample (n=5 per group) means we cannot claim equivalence of behavioral repertoires; the absence of group-specific behaviors in this study may simply reflect limited opportunity to observe variations. Our original phrasing reported an observation from the data rather than a general claim, but we will revise the Methods and Results sections (and update the abstract) to explicitly frame this as an exploratory finding, noting that larger samples could surface differences not seen here. This change will be made without altering the reported observations. revision: yes

  2. Referee: [Discussion] The interpretation that AI merely offloads work without altering underlying problem-solving behaviors is load-bearing for the 'friend or foe' framing, yet the design uses advanced students on one controlled lab task. This does not secure generalizability to professional developers working on real, unfamiliar codebases, as a different task or participant pool could yield different distributions of behaviors or stuck triggers.

    Authors: We acknowledge the limitation in generalizability. The study involved advanced students on a single controlled extension task, so patterns of offloading, stuck points, and recovery may differ for professionals on real, large-scale codebases. We will revise the Discussion to add an explicit limitations subsection that discusses the participant pool, task constraints, and the need for future work with industry developers. We will also temper the 'friend or foe' framing to present the results as contextual insights that can still inform AI tool design, rather than broad generalizations. revision: yes

  3. Referee: [Analysis] Full details on the inductive generation of the 25 codes, their application, and any inter-rater reliability metrics are not provided. This is load-bearing for the validity of the behavior inventory that underpins the equivalence finding across groups.

    Authors: We will expand the Analysis section to include full details on the inductive process. The 25 codes were developed through iterative open coding of think-aloud protocols, code changes, web searches, and LLM prompts, using Polya's phases as an organizing lens; two authors independently coded initial data and met to refine the codebook through discussion until consensus. We did not compute formal inter-rater reliability metrics given the exploratory qualitative design, but we will describe the multi-author review and consensus process in detail and add a supplementary table with code definitions and examples to support transparency and allow assessment of the inventory. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical qualitative study grounded in participant data

full rationale

The paper is an exploratory laboratory study that collects think-aloud, code-change, search, and prompt logs from ten participants, then applies Polya's phases and 25 inductively generated codes to describe observed behaviors. It contains no equations, fitted parameters, derivations, or self-referential definitions. All claims (e.g., all 25 behaviors appearing in both groups, differences in stuck causes) are direct summaries of the collected data rather than reductions to prior inputs or self-citations. Standard qualitative triangulation does not match any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The analysis rests on established cognitive frameworks and data-driven coding without introducing new mathematical entities or free parameters.

axioms (2)
  • domain assumption Polya's four problem-solving phases apply to software development tasks involving unfamiliar code
    Used as the primary lens for examining how AI impacted the problem-solving approach.
  • domain assumption Inductively generated codes from think-aloud protocols can reliably distinguish distinct problem-solving behaviors
    Basis for the 25 codes used to compare groups and analyze stuck states.

pith-pipeline@v0.9.0 · 5576 in / 1369 out tokens · 79396 ms · 2026-05-12T04:21:46.721350+00:00 · methodology

