To Tab or Not to Tab: Measuring Critical Engagement in AI Code Completion Tools Using Behavioral Signals and Attention Checks

Antonio Lazaro; Ian Tyler Applebaum; Jessica Hutchison; Kenneth Angelikas; Kush Rakesh Patel; Nicholas Rucinski; Phuoc Nguyen; Rahad Arman Nabid; Stephen MacNeil

arxiv: 2606.30549 · v1 · pith:UA7TRYQPnew · submitted 2026-06-29 · 💻 cs.HC · cs.AI · cs.SE

Pith reviewed 2026-06-30 04:45 UTC · model grok-4.3

classification 💻 cs.HC · cs.AI · cs.SE

keywords: AI code completion, critical engagement, attention checks, behavioral metrics, programming education, tab accept, dwell time, reflective practice

The pith

Higher tab-accept rates in AI code tools link to weaker performance on attention checks measuring critical evaluation of suggestions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Clover, a logging tool that records how students interact with AI code suggestions and inserts attention checks to test whether they are reflecting on those suggestions. It defines a set of behavioral metrics drawn from existing literature, including how often users press tab to accept a suggestion and how long they dwell on a suggestion before acting. Analysis of student sessions shows that frequent tab accepts correlate with lower scores on the attention checks, while longer dwell times correlate with higher scores. The work argues that these patterns can indicate whether students are critically reviewing AI output rather than accepting it at face value during programming tasks.

Core claim

Higher rates of tab accepts were associated with lower attention check performance, while increased dwell time was associated with higher attention check performance, suggesting that behavioral interaction data can serve as signals of reflective engagement with AI code suggestions.

What carries the argument

Clover, a code completion tool that logs interactions with suggestions and deploys attention checks, paired with a taxonomy of behavioral metrics such as tab-accept rate and dwell time.
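
To make the two named metrics concrete, here is a minimal sketch of how tab-accept rate and dwell time could be derived from an interaction log. The `SuggestionEvent` schema, its field names, and the action labels are hypothetical; the source does not describe Clover's actual log format or its exact operational definitions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class SuggestionEvent:
    """One resolved suggestion (hypothetical schema, not Clover's real log format)."""
    shown_at: float     # seconds since session start when the suggestion appeared
    resolved_at: float  # seconds since session start when the user acted on it
    action: str         # "tab_accept", "reject", or "ignore" (illustrative labels)


def tab_accept_rate(events):
    """Fraction of shown suggestions that were accepted with Tab."""
    if not events:
        return 0.0
    accepted = sum(1 for e in events if e.action == "tab_accept")
    return accepted / len(events)


def mean_dwell_seconds(events):
    """Average time between a suggestion appearing and the user acting on it."""
    if not events:
        return 0.0
    return mean(e.resolved_at - e.shown_at for e in events)


# A made-up session with four suggestions.
session = [
    SuggestionEvent(10.0, 10.5, "tab_accept"),
    SuggestionEvent(25.0, 29.0, "reject"),
    SuggestionEvent(40.0, 41.5, "tab_accept"),
    SuggestionEvent(60.0, 62.0, "tab_accept"),
]

print(tab_accept_rate(session))     # 0.75
print(mean_dwell_seconds(session))  # 2.0
```

Other definitions are equally plausible, for example counting dwell time only for accepted suggestions, so any replication would need the paper's own definitions.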

If this is right

Interaction logs from AI coding tools can be used to identify students who may not be critically evaluating suggestions.
Attention checks embedded in the coding workflow can surface moments of low engagement during live programming sessions.
Designers of AI coding assistants could surface dwell-time or accept-rate feedback to encourage more deliberate review of suggestions.
Programming educators could incorporate process data alongside final code to assess how students are using AI assistance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the correlations hold across settings, real-time dashboards could alert instructors when a student's accept rate spikes without corresponding dwell time.
The same logging approach might be adapted to non-code AI tools, such as document editors, to track whether users pause to review generated text.
A follow-up could test whether prompting students with their own behavioral metrics changes how often they accept suggestions without review.
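
The dashboard idea in the first extension above could be prototyped with a simple rule: flag any time window where the accept rate is high and the dwell time is short. The sketch below assumes per-window metrics have already been computed; the function name and both thresholds are hypothetical and are not taken from the paper.

```python
def flag_low_engagement(windows, accept_threshold=0.8, dwell_threshold=1.0):
    """Return indices of windows with a high accept rate and a short dwell time.

    Each window is a (tab_accept_rate, mean_dwell_seconds) pair. Both thresholds
    are hypothetical defaults chosen for illustration only.
    """
    return [
        i
        for i, (rate, dwell) in enumerate(windows)
        if rate >= accept_threshold and dwell < dwell_threshold
    ]


# Made-up per-window metrics for one student session.
windows = [(0.5, 3.0), (0.9, 0.6), (0.85, 2.5), (1.0, 0.4)]
print(flag_low_engagement(windows))  # [1, 3]
```

The third window is not flagged even though its accept rate is high, because the student dwelt on suggestions for longer than the threshold before accepting them.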

Load-bearing premise

Attention checks provide a valid measure of reflective engagement with code suggestions rather than reflecting task difficulty, prior knowledge, or other unrelated factors.

What would settle it

An experiment that directly measures whether students can explain or correctly modify an accepted suggestion and checks whether that measure aligns with the reported correlations between tab accepts, dwell time, and attention-check scores.

Original abstract

AI code completion tools, such as GitHub Copilot, provide students with code suggestions to help them write programs. However, recent qualitative studies suggest that students fail to critically evaluate these suggestions. We present Clover, a code completion tool that logs students' interactions with code suggestions and additionally offers attention checks to probe reflective engagement during programming tasks. We also develop a taxonomy of behavioral interaction metrics for AI-assisted programming, informed by literature. We analyzed relationships between interaction patterns, engagement with attention checks, and task performance. We observed that higher rates of tab accept were associated with lower attention check performance, while increased dwell time was associated with higher attention check performance. We conclude by discussing how programming process data and attention checks might support reflective engagement in AI-assisted programming.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives us Clover plus a taxonomy of interaction metrics and some observed links to attention checks, but the evidence is too thin on methods to judge the claims.

Here's the quick take on this one: the paper brings a new tool called Clover that logs student interactions with AI code completions and pairs it with attention checks, plus a taxonomy of metrics, and finds links between tab-accept rates, dwell time, and how well students do on those checks.

The new pieces are the tool itself and the taxonomy, which pulls from existing literature to categorize things like acceptance rates and time spent. That part is solid for giving researchers a structured way to look at behavioral data in this setting, moving beyond just asking students what they think.

It does a good job identifying the issue of uncritical use of suggestions, which aligns with prior qualitative findings. The observed patterns make intuitive sense at first glance.

Where it gets soft is the evidence base. The abstract mentions directional associations but skips sample sizes, statistical tests, exclusion rules, and any controls for things like student experience level or task complexity. Without those, it's difficult to rule out that the results simply reflect skill differences rather than engagement levels. The concern about the load-bearing premise holds up here: the attention checks might be measuring whether students grasped the problem overall rather than whether they critically reviewed the AI output specifically. The paper doesn't show evidence that the checks were isolated from prior knowledge.

This work is mainly for CS education folks studying AI assistance in programming classes. A reader interested in measurement methods for engagement could get some ideas from the taxonomy and tool description.

I'd send it for peer review. The core idea has enough promise that referees could help tighten the methods and clarify what the metrics actually capture.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces Clover, a code completion tool that logs student interactions with AI suggestions (e.g., tab accepts) and administers attention checks during programming tasks. It develops a literature-informed taxonomy of behavioral interaction metrics and analyzes relationships among these metrics, attention-check performance, and task outcomes. Key observations are that higher tab-accept rates correlate with lower attention-check performance while longer dwell times correlate with higher attention-check performance. The authors conclude that such process data and checks can help measure and support reflective engagement with AI-assisted code completion.

Significance. If the reported associations hold after appropriate controls, the work supplies an empirical basis and practical instrumentation for studying critical evaluation of AI suggestions in CS education, an area of growing importance. The taxonomy of behavioral metrics, grounded in prior literature, is a reusable contribution that could improve comparability across studies. The paper does not ship machine-checked proofs or parameter-free derivations, but the logging tool and attention-check approach represent a concrete methodological step forward.

major comments (2)

[Abstract] The central claim treats attention-check performance as a proxy for reflective engagement with the specific AI code suggestions. No details are supplied on how the checks were constructed to be independent of task difficulty, problem-statement comprehension, or participants' prior domain knowledge. This is load-bearing because the negative association with tab-accept rate could be driven by expertise differences rather than differences in scrutiny of the AI suggestions.
[Abstract, results paragraph] The directional associations are reported without reference to sample size, statistical tests, controls for expertise or task difficulty, or exclusion criteria. These omissions prevent evaluation of whether the data actually support the claimed relationships between behavioral signals and attention-check performance.

minor comments (1)

The taxonomy section would benefit from explicit operational definitions and example log entries for each metric to allow replication.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on the abstract. We agree that the abstract requires expansion to include methodological details and statistical information, and we will revise it accordingly in the next version. We address each major comment below.

Referee: [Abstract] The central claim treats attention-check performance as a proxy for reflective engagement with the specific AI code suggestions. No details are supplied on how the checks were constructed to be independent of task difficulty, problem-statement comprehension, or participants' prior domain knowledge. This is load-bearing because the negative association with tab-accept rate could be driven by expertise differences rather than differences in scrutiny of the AI suggestions.

Authors: We acknowledge the abstract lacks these details. The Methods section describes attention checks as embedded multiple-choice items on problem requirements and suggestion relevance, piloted for independence from syntax knowledge and administered at fixed points. Analyses include controls for self-reported prior experience and task difficulty via regression. We will revise the abstract to summarize check construction and note the controls, addressing the potential expertise confound.

Revision: yes

Referee: [Abstract, results paragraph] The directional associations are reported without reference to sample size, statistical tests, controls for expertise or task difficulty, or exclusion criteria. These omissions prevent evaluation of whether the data actually support the claimed relationships between behavioral signals and attention-check performance.

Authors: The abstract is a concise summary; the Results section reports sample size (N=48), Pearson correlations and regressions with expertise/task controls, and exclusion criteria (e.g., incomplete tasks). We will expand the abstract to include sample size, mention of statistical tests and controls, and a note on exclusion criteria to support evaluation of the associations.

Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical correlations from observed data

The paper reports an empirical study using a custom tool (Clover) to log interactions and attention checks during programming tasks. The central findings are observed associations (higher tab-accept rates linked to lower attention-check scores; higher dwell time linked to higher attention-check scores) derived from participant data analysis. No equations, fitted parameters, derivations, or predictions are described that reduce to inputs by construction. The taxonomy is informed by literature (standard practice), and no self-citation chains or uniqueness claims underpin the results. The derivation chain is self-contained against external benchmarks as straightforward behavioral data analysis.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no identifiable free parameters, axioms, or invented entities; all elements appear drawn from standard HCI methods and prior literature on code completion.

[1] [1]

James, and Nadia Polikarpova

Shraddha Barke, Michael B. James, and Nadia Polikarpova. 2023. Grounded Copilot: How Programmers Interact with Code-Generating Models.Proc. ACM Program. Lang.7, OOPSLA1, Article 78 (April 2023), 27 pages. https://doi.org/10. 1145/3586030

2023

[2] [2]

Seth Bernstein, Ashfin Rahman, Nadia Sharifi, Ariunjargal Terbish, and Stephen MacNeil. 2025. Beyond the Benefits: A Systematic Review of the Harms and Consequences of Generative AI in Computing Education. InProceedings of the 25th Koli Calling International Conference on Computing Education Research. https://doi.org/10.1145/3769994.3770036

work page doi:10.1145/3769994.3770036 2025

[3] [3]

Brown and Amjad Altadmri

Neil C.C. Brown and Amjad Altadmri. 2014. Investigating novice programming mistakes: educator beliefs vs. student data. InProceedings of the Tenth Annual Conference on International Computing Education Research. 43–50. https://doi. org/10.1145/2632320.2632343

work page doi:10.1145/2632320.2632343 2014

[4] [4]

Carter, Christopher D

Adam S. Carter, Christopher D. Hundhausen, and Olusola Adesope. 2015. The Normalized Programming State Model: Predicting Student Performance in Com- puting Courses Based on Programming Behavior. InProceedings of the Eleventh Annual International Conference on International Computing Education Research To Tab or Not to Tab (Omaha, Nebraska, USA)(ICER ’15)....

work page doi:10.1145/2787622.2787710 2015

[5] [5]

John Edwards, Arto Hellas, and Juho Leinonen. 2025. On the Opportunities of Large Language Models for Programming Process Data. InProceedings of the 27th Australasian Computing Education Conference (ACE 2025). 105–113. https://doi.org/10.1145/3716640.3716652

work page doi:10.1145/3716640.3716652 2025

[6] [6]

Jonathan St. B. T. Evans. 2008. Dual-processing accounts of reasoning, judgment, and social cognition.Annual Review of Psychology59 (2008), 255–278. https: //doi.org/10.1146/annurev.psych.59.103006.093629

work page doi:10.1146/annurev.psych.59.103006.093629 2008

[7] [7]

Ge Gao, Samiha Marwan, and Thomas W Price. 2021. Early performance predic- tion using interpretable patterns in programming process data. InProceedings of the 52nd ACM technical symposium on computer science education. 342–348. https://doi.org/10.1145/3408877.3432439

work page doi:10.1145/3408877.3432439 2021

[8] [8]

Lydia Harbarth, Eva Gößwein, Daniel Bodemer, and Lenka Schnaubert. 2025. (Over) trusting AI recommendations: How system and person variables affect di- mensions of complacency.International Journal of Human–Computer Interaction 41, 1 (2025), 391–410. https://doi.org/10.1080/10447318.2023.2301250

work page doi:10.1080/10447318.2023.2301250 2025

[9] [9]

Irene Hou, Sophia Mettille, Owen Man, Zhuo Li, Cynthia Zastudil, and Stephen MacNeil. 2024. The Effects of Generative AI on Computing Students’ Help- Seeking Preferences. InProceedings of the 26th Australasian Computing Education Conference (ACE ’24). ACM, 39–48. https://doi.org/10.1145/3636243.3636248

work page doi:10.1145/3636243.3636248 2024

[10] [10]

Irene Hou, Hannah Vy Nguyen, Owen Man, and Stephen MacNeil. 2025. The Evolving Usage of GenAI by Computing Students. InProceedings of the 56th ACM Technical Symposium on Computer Science Education V. 2. 1481––1482. https://doi.org/10.1145/3641555.3705266

work page doi:10.1145/3641555.3705266 2025

[11] [11]

Dhanya Jayagopal, Justin Lubin, and Sarah E. Chasins. 2022. Exploring the Learnability of Program Synthesizers by Novice Programmers. InProceedings of the 35th Annual ACM Symposium on User Interface Software and Technology. https://doi.org/10.1145/3526113.3545659

work page doi:10.1145/3526113.3545659 2022

[12] [12]

2011.Thinking, Fast and Slow

Daniel Kahneman. 2011.Thinking, Fast and Slow. Farrar, Straus and Giroux. https://doi.org/10.1007/s00362-013-0533-y

work page doi:10.1007/s00362-013-0533-y 2011

[13] [13]

Daniil Karol, Elizaveta Artser, Ilya Vlasov, Yaroslav Golubev, Hieke Keuning, and Anastasiia Birillo. 2025. KOALA: A Configurable Tool for Collecting IDE Data When Solving Programming Tasks. InProceedings of the ACM Global on Comput- ing Education Conference 2025 Vol 1 (CompEd 2025). Association for Computing Machinery, 183–189. https://doi.org/10.1145/37...

work page doi:10.1145/3736181.3747129 2025

[14] [14]

Majeed Kazemitabaar, Xinying Hou, Austin Henley, Barbara Jane Ericson, David Weintrop, and Tovi Grossman. 2024. How Novices Use LLM-based Code Gen- erators to Solve CS1 Coding Tasks in a Self-Paced Learning Environment. In Proceedings of the 23rd Koli Calling International Conference on Computing Edu- cation Research. 1–12. https://doi.org/10.1145/3631802.3631806

work page doi:10.1145/3631802.3631806 2024

[15] [15]

Hieke Keuning, Isaac Alpizar-Chacon, Ioanna Lykourentzou, Lauren Beehler, Christian Köppe, Imke de Jong, and Sergey Sosnovsky. 2024. Students’ Percep- tions and Use of Generative AI Tools for Programming Across Different Comput- ing Courses. InProceedings of the 24th Koli Calling International Conference on Computing Education Research. 1–12. https://doi....

work page doi:10.1145/3699538.3699546 2024

[16] [16]

Shao-Heng Ko and Kristin Stephens-Martinez. 2025. Rethinking Computing Students’ Help Resource Utilization through Sequentiality.ACM Trans. Comput. Educ.25 (2025), 1–34. https://doi.org/10.1145/3716860

work page doi:10.1145/3716860 2025

[17] [17]

Juho Leinonen et al. 2019. Keystroke data in programming courses.Department of Computer Science, Series of Publications A(2019)

2019

[18] [18]

Reeves, Juho Leinonen, and Rachel Louise Rossetti

Stephen MacNeil, James Prather, Rahad Arman Nabid, Sebastian Gutierrez, Silas Carvalho, Saimon Shrestha, Paul Denny, Brent N. Reeves, Juho Leinonen, and Rachel Louise Rossetti. 2025. Fostering Responsible AI Use Through Negative Expertise: A Contextualized Autocompletion Quiz. InProceedings of the 30th ACM Conference on Innovation and Technology in Comput...

work page doi:10.1145/3724363.3729067 2025

[19]

Stephen MacNeil, Magdalena Rogalska, Juho Leinonen, Paul Denny, Arto Hellas, and Xandria Crosland. 2024. Synthetic Students: A Comparative Study of Bug Distribution Between Large Language Models and Computing Students. In Proceedings of the 2024 on ACM Virtual Global Computing Education Conference V. 1 (SIGCSE Virtual 2024). Association for Computing M...

doi:10.1145/3649165.3690100

[20]

Lauren E. Margulieux, James Prather, Brent N. Reeves, Brett A. Becker, Gozde Cetin Uzun, et al. 2024. Self-Regulation, Self-Efficacy, and Fear of Failure Interactions with How Novices Use LLMs to Solve Programming Problems. In Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1 (ITiCSE 2024). ACM, 276–282. https://...

doi:10.1145/3649217.3653621

[21]

Pratibha Menon. 2023. Exploring GitHub Copilot assistance for working with classes in a programming course. Issues in Information Systems 24, 4 (2023)

[22]

Marvin Minsky. 1997. Negative expertise. (1997)

[23]

Seong Min Park, Marco Ho, Michael Pin-Chuan Lin, and Jeeho Ryoo. 2025. Evaluating the Impact of Assistive AI Tools on Learning Outcomes and Ethical Considerations in Programming Education. In 2025 IEEE Global Engineering Education Conference (EDUCON). 1–10. https://doi.org/10.1109/EDUCON62633.2025.11016517

[24]

Leo Porter and Daniel Zingaro. 2024. Learn AI-Assisted Python Programming: With Github Copilot and ChatGPT. Simon and Schuster

[25]

James Prather, Paul Denny, Juho Leinonen, Brett A. Becker, Ibrahim Albluwi, et al. 2023. The Robots Are Here: Navigating the Generative AI Revolution in Computing Education. In Proceedings of the 2023 Working Group Reports on Innovation and Technology in Computer Science Education (ITiCSE-WGR ’23). Association for Computing Machinery. https://doi.org/10.11...

doi:10.1145/3623762.3633499

[26]

James Prather, Juho Leinonen, Natalie Kiesler, Jamie Gorson Benario, et al. 2025. Beyond the Hype: A Comprehensive Review of Current Trends in Generative AI Research, Teaching Practices, and Tools. In 2024 Working Group Reports on Innovation and Technology in Computer Science Education (Milan, Italy) (ITiCSE 2024). Association for Computing Machinery, New Yo...

doi:10.1145/3689187.3709614

[27]

James Prather, Brent N. Reeves, Paul Denny, Brett A. Becker, Juho Leinonen, Andrew Luxton-Reilly, Garrett Powell, James Finnie-Ansley, and Eddie Antonio Santos. 2023. “It’s Weird That it Knows What I Want”: Usability and Interactions with Copilot for Novice Programmers. ACM Trans. Comput.-Hum. Interact. 31, 1, Article 4 (Nov. 2023), 31 pages. https://doi.or...

doi:10.1145/3617367

[28]

James Prather, Brent N Reeves, Juho Leinonen, Stephen MacNeil, Arisoa S Randrianasolo, Brett A. Becker, Bailey Kimmel, Jared Wright, and Ben Briggs. 2024. The Widening Gap: The Benefits and Harms of Generative AI for Novice Programmers. In Proceedings of the 2024 ACM Conference on International Computing Education Research - Volume 1 (ICER ’24). Associ...

doi:10.1145/3632620.3671116

[29]

Thomas W. Price, David Hovemeyer, Kelly Rivers, Ge Gao, et al. 2020. ProgSnap2: A Flexible Format for Programming Process Data. In Proceedings of the 2020 ACM Conference on Innovation and Technology in Computer Science Education (ITiCSE ’20). ACM, 356–362. https://doi.org/10.1145/3341525.3387373

[30]

Anshul Shah, Anya Chernova, Elena Tomson, Leo Porter, William G. Griswold, and Adalbert Gerald Soosai Raj. 2025. Students’ Use of GitHub Copilot for Working with Large Code Bases. In Proceedings of the 56th ACM Technical Symposium on Computer Science Education V. 1 (SIGCSETS 2025). Association for Computing Machinery, 1050–1056. https://doi.org/10.1145/3...

doi:10.1145/3641554.3701800

[31]

Md Istiak Hossain Shihab, Christopher Hundhausen, Ahsun Tariq, Summit Haque, Yunhan Qiao, and Brian Wise Mulanda. 2025. The Effects of GitHub Copilot on Computing Students’ Programming Effectiveness, Efficiency, and Processes in Brownfield Coding Tasks. In Proceedings of the 2025 ACM Conference on International Computing Education Research V.1 (ICER ’25)...

doi:10.1145/3702652.3744219

[32]

C. Estelle Smith, Kylee Shiekh, Hayden Cooreman, Sharfi Rahman, et al. 2024. Early Adoption of Generative Artificial Intelligence in Computing Education: Emergent Student Use Cases and Perspectives in 2023. In Proceedings of the 2024 Innovation and Technology in Computer Science Education V.1 (ITiCSE 2024). ACM, 3–9. https://doi.org/10.1145/3649217.3653575

[33]

Annapurna Vadaparty, Daniel Zingaro, David H. Smith IV, Mounika Padala, Christine Alvarado, Jamie Gorson Benario, and Leo Porter. 2024. CS1-LLM: Integrating LLMs into CS1 Instruction. In Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1 (ITiCSE 2024). ACM, 297–303. https://doi.org/10.1145/3649217.3653584

[34]

Priyan Vaithilingam, Tianyi Zhang, and Elena L. Glassman. 2022. Expectation vs. Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models. In Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems (CHI EA ’22). Association for Computing Machinery, Article 332, 7 pages. https://doi.org/10.114...

doi:10.1145/3491101.3519665

[35]

Michel Wermelinger. 2023. Using GitHub Copilot to Solve Simple Programming Problems. In Proceedings of the 54th ACM Technical Symposium on Computer Science Education V. 1 (SIGCSE 2023). Association for Computing Machinery, New York, NY, USA, 172–178. https://doi.org/10.1145/3545945.3569830

[36]

Cynthia Zastudil, Magdalena Rogalska, Christine Kapp, Jennifer Vaughn, and Stephen MacNeil. 2023. Generative AI in Computing Education: Perspectives of Students and Instructors. In 2023 IEEE Frontiers in Education Conference (FIE). 1–9. https://doi.org/10.1109/FIE58773.2023.10343467
