pith. sign in

arxiv: 2606.30860 · v1 · pith:JULA6CSBnew · submitted 2026-06-29 · 💻 cs.CY

Less Deliberate in Teams: Student LLM Use Across Individual and Collaborative Work

Pith reviewed 2026-07-01 01:16 UTC · model grok-4.3

classification 💻 cs.CY
keywords LLM usage in educationcollaborative learningstudent AI engagementcomputing educationteam projectsAI prompting habits
0
0 comments X

The pith

Students engage less deliberately with LLMs when shifting from individual to team assignments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

A semester-long study tracked 96 undergraduates across six computing assignments that alternated between solo homework and team milestones. LLM usage fell 42.7 percentage points at the first team task, with fewer and simpler prompts plus less output verification, and the drop only partly recovered later. A within-student comparison showed 18.9 percent of consistent individual users stopped using LLMs entirely in teams. The pattern points to collaborative context itself reducing careful AI engagement beyond task differences alone.

Core claim

Collaborative context is associated with reduced deliberate LLM engagement beyond what task type alone can explain, as shown by the sharp drop in usage, prompting complexity, and verification when students form teams.

What carries the argument

The within-student tracking of LLM usage rates, prompt characteristics, and output verification across alternating individual and team assignment contexts.

If this is right

  • Course designs must address the transition to team work as a distinct point where deliberate LLM practices decline.
  • Support for LLM use in collaborative settings cannot rely on the same approaches used for individual assignments.
  • The share of students who test AI-generated code remains lower across team milestones and does not return to individual levels.
  • Task-type differences alone do not account for the observed reduction in prompting strategies and output checks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Interventions aimed at preserving careful LLM use could be tested specifically at the moment teams are formed.
  • The pattern may extend to other collaborative educational settings where social context influences tool engagement.
  • Objective usage logs rather than self-reports would allow stronger isolation of the team-formation effect from other factors.

Load-bearing premise

The tracking of LLM usage, prompting, and verification accurately reflects actual student behavior without substantial self-report bias or observation effects.

What would settle it

A replication with direct logging of LLM interactions that finds no drop in usage or verification upon team formation would falsify the claimed association.

Figures

Figures reproduced from arXiv: 2606.30860 by Khyati Goyal, Saad Nizamani, Sehrish Basir Nizamani, Zannah Ziew.

Figure 1
Figure 1. Figure 1: Self-reported LLM usage rates across all six assign [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Within-student transitions in dominant LLM role [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
read the original abstract

As large language models (LLMs) become common in computing courses, we need to understand how the social setting shapes how students use them. This paper reports findings from a semester-long study of 96 undergraduate students who completed six assignments, alternating between individual homework and team project milestones. We tracked LLM usage, prompting habits, and how students verified AI-generated output across all six assignments. LLM usage dropped by 42.7 percentage points when students moved from individual work to their first team milestone, then partly recovered in later team tasks. Students also wrote fewer and simpler prompts, used fewer intentional prompting strategies, and checked LLM output less carefully. The share of students who ran tests on AI-generated code fell by 19.4 percentage points during team assignments and never fully rebounded. A within-student analysis found that 18.9% of students who consistently used LLMs on their own stopped using them entirely in teams, while only 3.2% went the other direction. These results suggest that collaborative context is associated with reduced deliberate LLM engagement beyond what task type alone can explain. The moment students form teams appears to be a critical and currently unsupported turning point in computing course design.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper reports findings from a semester-long study of 96 undergraduate students tracking LLM usage, prompting habits, and output verification across six assignments alternating between individual homework and team project milestones. It finds a 42.7 percentage point drop in LLM usage at the first team milestone, with partial recovery later, fewer and simpler prompts, less verification, and a 19.4 pp drop in running tests on AI code. Within-student analysis shows 18.9% of consistent individual LLM users stopped in teams, vs 3.2% the reverse. The authors suggest collaborative context reduces deliberate LLM engagement beyond task type.

Significance. The concrete within-student transitions and behavioral changes provide useful descriptive data on how social setting may influence LLM use in education. The alternating design offers a natural experiment-like observation. However, the central claim regarding the role of collaborative context specifically is weakened by the confounding of team status with task type, limiting the significance until the isolation is addressed. Strengths include the longitudinal tracking and specific percentages reported.

major comments (1)
  1. [Abstract] The design confounds team/individual status with homework/project-milestone task type, as assignments alternate between the two. The 42.7 pp drop occurs exactly at the first switch from individual homework to team project milestone. No regression, matching, or within-task-type contrast holding assignment category fixed is described. This makes the claim that the reduction is 'beyond what task type alone can explain' unsupported by the presented evidence.
minor comments (2)
  1. Additional details on the methods for tracking LLM usage (e.g., self-report, logs, or observation) and any measures taken to mitigate self-report bias or observation effects would improve the interpretability of the reported percentages.
  2. The manuscript should clarify if any statistical controls or tests were used in the within-student analysis beyond the raw transition percentages.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful review and constructive comment. We agree that the alternating assignment structure confounds collaboration status with task type and that our current evidence does not isolate the effect of teams from task type. We will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [Abstract] The design confounds team/individual status with homework/project-milestone task type, as assignments alternate between the two. The 42.7 pp drop occurs exactly at the first switch from individual homework to team project milestone. No regression, matching, or within-task-type contrast holding assignment category fixed is described. This makes the claim that the reduction is 'beyond what task type alone can explain' unsupported by the presented evidence.

    Authors: We acknowledge the validity of this observation. The study alternates individual homework with team project milestones, so the first observed drop coincides exactly with the transition to both team work and project-milestone tasks. No regression models, matching procedures, or within-task-type contrasts that hold assignment category fixed are reported in the manuscript. The partial recovery across later team milestones offers suggestive but not conclusive evidence. We will revise the abstract, results, and discussion sections to remove the claim that the reduction occurs 'beyond what task type alone can explain.' The findings will instead be framed as descriptive changes associated with the shift to collaborative work under the observed alternating design. We will also add an explicit limitations paragraph describing this confound and the absence of isolating analyses. revision: yes

Circularity Check

0 steps flagged

No circularity: purely observational empirical report

full rationale

This paper presents direct measurements of LLM usage, prompting habits, and verification behaviors across assignments in a semester-long study of 96 students. No equations, fitted parameters, predictions derived from models, or derivations are present. The reported drops (e.g., 42.7 pp usage drop, 19.4 pp test-running drop) and within-student transitions (18.9% stopping use in teams) are raw observational counts and percentages, not outputs of any self-referential fitting or ansatz. The central claim is an association observed in the data; it does not reduce to a definition or prior self-citation by construction. The design confound noted by the skeptic is a validity concern, not a circularity issue.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical observational study with no mathematical derivations, free parameters, or postulated entities; relies on standard assumptions of educational research such as accurate behavioral tracking.

pith-pipeline@v0.9.1-grok · 5754 in / 1175 out tokens · 33743 ms · 2026-07-01T01:16:45.634918+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

19 extracted references · 16 canonical work pages

  1. [1]

    Rudaiba Adnin, Atharva Pandkar, Bingsheng Yao, Dakuo Wang, and Maitraye Das. 2025. Examining Student and Teacher Perspectives on Undisclosed Use of Generative AI in Academic Work. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, Article 1071. doi:10.1145/3706598.3713393

  2. [2]

    Doga Cambaz and Xiaoling Zhang. 2024. Use of AI-Driven Code Generation Models in Teaching and Learning Programming: A Systematic Literature Review. InProceedings of the 55th ACM Technical Symposium on Computer Science Educa- tion V. 1. Association for Computing Machinery, New York, NY, USA, 172–178. doi:10.1145/3626252.3630958

  3. [3]

    Yalong Du, Chaozheng Wang, and Huaijin Wang. 2025. Programming Language Techniques for Bridging LLM Code Generation Semantic Gaps. InProceedings of the 1st ACM SIGPLAN International Workshop on Language Models and Program- ming Languages. Association for Computing Machinery, New York, NY, USA, 40–45. doi:10.1145/3759425.3763383

  4. [4]

    Irene Hou, Owen Man, Kate Hamilton, Srishty Muthusekaran, Jeffin Johnykutty, Leili Zadeh, and Stephen MacNeil. 2025. ‘All Roads Lead to ChatGPT’: How Generative AI is Eroding Social Interactions and Student Learning Communities. InProceedings of the 30th ACM Conference on Innovation and Technology in Com- puter Science Education V. 1. Association for Comp...

  5. [5]

    Korchak, G

    A. Korchak, G. Al Murshidi, A. Getman, N. Raouf, M. Arshe, N. Al Meheiri, and J. Costley. 2025. The Role of Social Influence in Generative Artificial Intelligence ChatGPT Adoption Intentions Among Undergraduate and Graduate Students. Innovations in Education and Teaching International62, 5 (2025), 1559–1573. doi:10. 1080/14703297.2025.2496942

  6. [6]

    Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. 2023. Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation. InProceedings of the 37th International Conference on Neural Information Processing Systems. Curran Associates Inc., Red Hook, NY, USA, Article 943

  7. [7]

    We Need Structured Output

    Michael Xieyang Liu, Frederick Liu, Alexander J. Fiannaca, Terry Koo, Lucas Dixon, Michael Terry, and Carrie J. Cai. 2024. “We Need Structured Output”: Towards User-Centered Constraints on Large Language Model Output. InEx- tended Abstracts of the CHI Conference on Human Factors in Computing Sys- tems. Association for Computing Machinery, New York, NY, US...

  8. [8]

    Mahmoudi, D

    H. Mahmoudi, D. Chang, H. Lee, N. Ghaffarzadegan, and M. S. Jalali. 2025. Crit- ical Assessment of Large Language Models’ (ChatGPT) Performance in Data Extraction for Systematic Reviews: Exploratory Study.JMIR AI4 (2025), e68097. doi:10.2196/68097

  9. [9]

    Lennart Meincke, Karan Girotra, Gideon Nave, Christian Terwiesch, and Karl T. Ulrich. 2024. Using Large Language Models for Idea Generation in Innovation. SSRN Working Paper(2024). doi:10.2139/ssrn.4526071

  10. [10]

    Jacob Penney, Pawan Acharya, Peter Hilbert, Priyanka Parekh, Anita Sarma, Igor Steinmacher, and Marco Aurelio Gerosa. 2025. Outcomes, Perceptions, and Inter- action Strategies of Novice Programmers Studying with ChatGPT. InProceedings of the 7th ACM Conference on Conversational User Interfaces. Association for Com- puting Machinery, New York, NY, USA, Art...

  11. [11]

    A. F. Pereira and R. F. Mello. 2025. A Systematic Literature Review on Large Language Models Applications in Computer Programming Teaching Evaluation Process.IEEE Access13 (2025), 113449–113460. doi:10.1109/ACCESS.2025.3584060

  12. [12]

    Eric Poitras, Brent Crane, and Angela Siegel. 2024. Generative AI in Introductory Programming Instruction: Examining the Assistance Dilemma with LLM-Based Code Generators. InProceedings of the 2024 ACM Virtual Global Computing Education Conference V. 1. Association for Computing Machinery, New York, NY, USA, 186–192. doi:10.1145/3649165.3690111

  13. [13]

    Qu and J

    Y. Qu and J. Wang. 2026. To Disclose or Not to Disclose: Peer Influence and Psychological Factors in Students’ Use of Generative Artificial Intelligence.British Journal of Educational Psychology(2026). doi:10.1111/bjep.70086

  14. [14]

    Santos, and Marcos Zampieri

    Nishat Raihan, Mohammed Latif Siddiq, Joanna C.S. Santos, and Marcos Zampieri

  15. [15]

    InProceedings of the 56th ACM Technical Symposium on Com- puter Science Education V

    Large Language Models in Computer Science Education: A Systematic Literature Review. InProceedings of the 56th ACM Technical Symposium on Com- puter Science Education V. 1. Association for Computing Machinery, New York, NY, USA, 938–944. doi:10.1145/3641554.3701863

  16. [16]

    2006.Group Cognition: Computer Support for Building Collaborative Knowledge

    Gerry Stahl. 2006.Group Cognition: Computer Support for Building Collaborative Knowledge. The MIT Press. doi:10.7551/mitpress/3372.001.0001

  17. [17]

    Weixi Tong and Tianyi Zhang. 2024. CodeJudge: Evaluating Code Generation with Large Language Models. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

  18. [18]

    Yu, and Qingsong Wen

    Shen Wang, Tianlong Xu, Hang Li, Chaoli Zhang, Joleen Liang, Jiliang Tang, Philip S. Yu, and Qingsong Wen. 2025. Large Language Models for Education: A Survey and Outlook.IEEE Signal Processing Magazine42, 6 (2025), 51–63. doi:10.1109/MSP.2025.3594309

  19. [19]

    Zhang, S

    Y. Zhang, S. A. Khan, A. Mahmud, and H. Yang. 2025. Exploring the Role of Large Language Models in the Scientific Method: From Hypothesis to Discovery.npj Artificial Intelligence1 (2025), 14. doi:10.1038/s44387-025-00019-5 A Survey Instrument The following questions were included in each post-task survey. Students responded immediately after completing ea...