pith. sign in

arxiv: 2606.18548 · v1 · pith:W362LL2Xnew · submitted 2026-06-16 · 💻 cs.CY · cs.AI

Engagement Intensity as a Learner-Modeling Signal for Adaptive AI Ethics Instruction

Pith reviewed 2026-06-26 21:54 UTC · model grok-4.3

classification 💻 cs.CY cs.AI
keywords AI ethics instructionlearner modelingLLM usage frequencyadaptive educationperception outcomesbioscience graduate traineesintake surveyself-reported familiarity
0
0 comments X

The pith

Self-reported LLM usage frequency associates with all five baseline AI perception outcomes in bioscience trainees, while prior AI education associates with none.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests three simple intake measures in 93 bioscience graduate and postdoctoral trainees to see which best predicts pre-course views on AI ethics topics. Usage frequency linked to every outcome after statistical correction, self-rated familiarity linked to three, and prior coursework or workshops linked to none. A sharp threshold effect at low usage appeared for interest in training and trust in AI accuracy rather than a smooth increase across all measures. If these patterns hold, short behavioral questions could replace longer background checks when building adaptive versions of required ethics training. The work focuses on making intake profiling lightweight so that instruction can adjust early without heavy data collection.

Core claim

In a required research ethics course, self-reported frequency of LLM use showed Holm-corrected associations with all five measured perception outcomes, self-rated familiarity showed associations with three outcomes, and prior AI education showed associations with none. The pattern was not a uniform gradient; a threshold-like drop at the low end of the usage scale stood out most clearly for training interest and accuracy trust.

What carries the argument

Statistical comparison of three candidate intake features (usage frequency, self-rated familiarity, prior AI education) against five perception outcomes using Holm-corrected associations in a cross-sectional survey.

If this is right

  • Usage frequency can function as a primary signal for grouping learners into adaptive AI ethics pathways.
  • Prior coursework or workshop attendance is unlikely to serve as a reliable proxy for initial perception differences.
  • Self-rated familiarity can act as a secondary indicator when usage data are unavailable.
  • Instructional adjustments may need to target low-usage subgroups more than high-usage subgroups for outcomes like training interest.
  • Lightweight intake surveys focused on behavior can replace longer educational-history questions in ethics-course design.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same usage signal might inform adaptive modules in non-bioscience fields if the five perception items behave similarly across disciplines.
  • Systems could route low-usage learners first to modules on accuracy trust and training interest rather than applying uniform content.
  • Longitudinal follow-up could test whether usage-based grouping improves post-instruction shifts in the same perception items.
  • If usage frequency remains stable over time, it could support repeated lightweight profiling without re-administering full surveys.

Load-bearing premise

The five perception items are stable and meaningful targets for adaptive instruction, and self-reported usage frequency will keep serving as a useful grouping signal when course content or trainee populations change.

What would settle it

A replication study in a new trainee cohort or with revised course content finds no Holm-corrected associations between reported usage frequency and the same five perception items.

Figures

Figures reproduced from arXiv: 2606.18548 by Alex Bui, Lynn Talton, YongKyung Oh.

Figure 1
Figure 1. Figure 1: summarizes the overall response distributions for the five focal outcomes. The profile shows high endorsement of critical-thinking risk and training interest, moderate confidence in distinguishing capability, and substantially more cautious trust for complex-task use than for general scientific accuracy. 0% 20% 40% 60% 80% 100% Share of respondents (%) Accuracy trust Distinguishing capability Complex-task … view at source ↗
Figure 2
Figure 2. Figure 2: Ordered-median ladders for accuracy trust, distinguishing capability, complex-task trust, critical￾thinking risk, and training interest across the candidate intake features. Panels (c) and (d) show prior AI education under the three-group (formal / informal / none) and two-group (any prior / none) codings. B. Alternative Interpretations The study design does not establish directionality. Engagement may inf… view at source ↗
read the original abstract

Adaptive AI ethics instruction in graduate research training benefits from intake measures that reflect differences in prior LLM experience. Prior coursework or workshop attendance is an obvious candidate, but it is not clear whether it is associated with pre-instruction ratings on key AI perception items. We compare three candidate intake features, self-reported usage frequency, self-rated LLM familiarity, and prior AI education, across five baseline perception outcomes in 93 bioscience graduate and postdoctoral trainees enrolled in a required research ethics course. Usage frequency shows Holm-corrected associations with all five outcomes, self-rated familiarity with three, and prior AI education with none. A threshold-like pattern at the lower end of the scale is most visible for training interest and accuracy trust rather than appearing as a uniform gradient across all five outcomes. In a short intake survey, reported LLM use is more consistently associated with these perceptions than prior coursework or workshops, with self-rated familiarity serving as a secondary indicator. These results suggest that simple pre-instruction behavioral signals can inform lightweight intake profiling for adaptive AI ethics education.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript reports findings from a cross-sectional survey of 93 bioscience graduate and postdoctoral trainees in a required research ethics course. It compares three candidate intake features—self-reported LLM usage frequency, self-rated familiarity, and prior AI education—against five baseline perception outcomes. The central result is that usage frequency shows Holm-corrected associations with all five outcomes, self-rated familiarity with three, and prior AI education with none. A threshold-like pattern is noted at the lower end of the usage scale for some outcomes. The authors conclude that simple behavioral signals from an intake survey can inform lightweight profiling for adaptive AI ethics instruction.

Significance. If the reported associations replicate, the work supplies an empirical basis for using low-burden intake questions to differentiate prior engagement levels that correlate with AI perceptions in ethics training. The explicit application of Holm correction is a methodological strength that supports the differential pattern (5/5 vs. 3/5 vs. 0/5). The single-sample, cross-sectional design appropriately limits the claims to observed associations rather than causal or stable effects across populations or course variants.

major comments (2)
  1. [Methods] Methods: the manuscript states that Holm correction was applied but provides no information on the underlying statistical tests (e.g., correlation, chi-squared, or regression), degrees of freedom, handling of missing data, exact wording of the five perception items, or whether any post-hoc thresholds were chosen after inspecting the data. These details are required to evaluate the reported association counts.
  2. [Results] Results: the threshold-like pattern at the lower end of the usage-frequency scale is described for training interest and accuracy trust, yet no effect sizes, confidence intervals, or per-outcome test statistics are supplied to quantify how distinct this pattern is from the other three outcomes.
minor comments (2)
  1. [Abstract] Abstract: adding the sample size (n=93) and the cross-sectional design would give readers immediate context for the strength of the claims.
  2. [Discussion] Discussion: the scoping statement that usage frequency 'can inform' profiling is appropriate, but a short clause clarifying that no stability claims are made across different course content or trainee populations would prevent over-interpretation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the recommendation for minor revision. We address each major comment point by point below.

read point-by-point responses
  1. Referee: [Methods] Methods: the manuscript states that Holm correction was applied but provides no information on the underlying statistical tests (e.g., correlation, chi-squared, or regression), degrees of freedom, handling of missing data, exact wording of the five perception items, or whether any post-hoc thresholds were chosen after inspecting the data. These details are required to evaluate the reported association counts.

    Authors: We agree that these details are necessary for a complete evaluation. We will revise the Methods section to include the specific statistical tests used, the degrees of freedom, how missing data were handled, the exact wording of the five perception items, and confirmation that no post-hoc thresholds were chosen after data inspection. revision: yes

  2. Referee: [Results] Results: the threshold-like pattern at the lower end of the usage-frequency scale is described for training interest and accuracy trust, yet no effect sizes, confidence intervals, or per-outcome test statistics are supplied to quantify how distinct this pattern is from the other three outcomes.

    Authors: The threshold-like pattern is described qualitatively in the manuscript. We will update the Results section to include effect sizes, confidence intervals, and per-outcome test statistics to better quantify the patterns and their distinctiveness. revision: yes

Circularity Check

0 steps flagged

No circularity: direct statistical associations from survey data

full rationale

The paper reports observed Holm-adjusted associations between three intake variables (usage frequency, familiarity, prior education) and five perception outcomes in a single cross-sectional sample. No equations, fitted parameters, or derivations are present that reduce reported results to quantities defined by the study's own inputs. The central claims consist of direct statistical patterns computed from the data, with no self-citation chains, ansatzes, or renamings that create circularity. The analysis is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The claim rests on the validity of self-report measures and on standard assumptions of multiple-comparison correction; no free parameters, new entities, or ad-hoc axioms are introduced in the abstract.

axioms (2)
  • standard math Holm correction is the appropriate method for controlling family-wise error across the five outcomes
    Abstract states associations were Holm-corrected.
  • domain assumption Self-reported usage frequency is a reliable proxy for engagement intensity relevant to AI ethics learning
    Title and abstract treat reported usage as the primary intake signal.

pith-pipeline@v0.9.1-grok · 5708 in / 1282 out tokens · 25974 ms · 2026-06-26T21:54:06.108916+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

37 extracted references · 27 canonical work pages

  1. [1]

    McGrane, and Sabina Leonelli

    S. Milano, J. A. McGrane, S. Leonelli, Large language models challenge the future of higher education, Nature Machine Intelligence 5 (2023) 333–334. doi:10.1038/s42256-023-00644-2

  2. [2]

    Roberts, M

    J. Roberts, M. Baker, J. Andrew, Artificial intelligence and qualitative research: The promise and perils of large language model (LLM) ‘assistance’, Critical Perspectives on Accounting 99 (2024) 102722. doi:10. 1016/j.cpa.2024.102722

  3. [3]

    J. S. Barrot, Using ChatGPT for second language writing: Pitfalls and potentials, Assessing Writing 57 (2023) 100745. doi:10.1016/j.asw.2023.100745

  4. [4]

    M. Bond, H. Khosravi, M. De Laat, N. Bergdahl, V. Negrea, E. Oxley, P. Pham, S. W. Chong, G. Siemens, A meta systematic review of artificial intelligence in higher education: a call for increased ethics, collaboration, and rigour, International Journal of Educational Technology in Higher Education 21 (2024) 4. doi:10.1186/ s41239-023-00436-z

  5. [5]

    McDonald, A

    N. McDonald, A. Johri, A. Ali, A. H. Collier, Generative artificial intelligence in higher education: Evidence from an analysis of institutional policies and guidelines, Computers in Human Behavior: Artificial Humans 3 (2025) 100121. doi:10.1016/j.chbah.2025.100121

  6. [6]

    Usher, M

    M. Usher, M. Barak, Unpacking the role of AI ethics online education for science and engineering students, International Journal of STEM Education 11 (2024) 35. doi:10.1186/s40594-024-00493-4

  7. [7]

    D. Long, B. Magerko, What is AI Literacy? Competencies and Design Considerations, in: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, CHI ’20, Association for Computing Machinery, New York, NY, USA, 2020, pp. 1–16. doi:10.1145/3313831.3376727

  8. [8]

    W. Xing, N. Nixon, S. Crossley, P. Denny, A. Lan, J. Stamper, Z. Yu, The Use of Large Language Models in Education, International Journal of Artificial Intelligence in Education 35 (2025) 439–443. doi: 10.1007/ s40593-025-00457-x

  9. [9]

    D. B. Resnik, M. Hosseini, The ethics of using artificial intelligence in scientific research: new guidance needed for a new tool, AI and Ethics 5 (2025) 1499–1521. doi:10.1007/s43681-024-00493-8

  10. [10]

    Hosseini, D

    M. Hosseini, D. B. Resnik, K. Holmes, The ethics of disclosing the use of artificial intelligence tools in writing scholarly manuscripts, Research Ethics 19 (2023) 449–465. doi:10.1177/17470161231180449

  11. [11]

    Borenstein, A

    J. Borenstein, A. Howard, Emerging challenges in AI and the need for AI ethics education, AI and Ethics 1 (2021) 61–65. doi:10.1007/s43681-020-00002-7

  12. [12]

    URL https://doi.org/10.1016/ j.lindif.2023.102274

    E. Kasneci, K. Sessler, S. Küchemann, M. Bannert, D. Dementieva, F. Fischer, U. Gasser, G. Groh, S. Günne- mann, E. Hüllermeier, et al., ChatGPT for good? On opportunities and challenges of large language models for education, Learning and Individual Differences 103 (2023) 102274. doi:10.1016/j.lindif.2023.102274

  13. [13]

    M. Zou, L. Huang, To use or not to use? Understanding doctoral students’ acceptance of ChatGPT in writing through technology acceptance model, Frontiers in Psychology Volume 14 - 2023 (2023). doi: 10.3389/ fpsyg.2023.1259531

  14. [14]

    Suh, Generative AI integration in higher education shifts students’ attitudes from tool use to innovation, Discover Education 4 (2025) 508

    W. Suh, Generative AI integration in higher education shifts students’ attitudes from tool use to innovation, Discover Education 4 (2025) 508. doi:10.1007/s44217-025-00914-8

  15. [15]

    Ravšelj, D

    D. Ravšelj, D. Keržič, N. Tomaževič, L. Umek, N. Brezovar, N. A. Iahad, A. A. Abdulla, A. Akopyan, M. W. Aldana Segura, J. AlHumaid, et al., Higher education students’ perceptions of ChatGPT: A global study of early reactions, PLOS ONE 20 (2025) e0315011. doi:10.1371/journal.pone.0315011

  16. [16]

    Nemt-allah, W

    M. Nemt-allah, W. Khalifa, M. Badawy, Y. Elbably, A. Ibrahim, Validating the ChatGPT Usage Scale: psychometric properties and factor structures among postgraduate students, BMC Psychology 12 (2024)

  17. [17]

    doi:10.1186/s40359-024-01983-4

  18. [18]

    W. Lyu, S. Zhang, T. Chung, Y. Sun, Y. Zhang, Understanding the practices, perceptions, and (dis)trust of generative AI among instructors: A mixed-methods study in the U.S. higher education, Computers and Education: Artificial Intelligence 8 (2025) 100383. doi:10.1016/j.caeai.2025.100383

  19. [19]

    Kelly, S.-A

    S. Kelly, S.-A. Kaye, O. Oviedo-Trespalacios, What factors contribute to the acceptance of artificial intelli- gence? A systematic review, Telematics and Informatics 77 (2023) 101925. doi: 10.1016/j.tele.2022. 101925

  20. [20]

    Viberg, M

    O. Viberg, M. Cukurova, Y. Feldman-Maggor, G. Alexandron, S. Shirai, S. Kanemune, B. Wasson, C. Tømte, D. Spikol, M. Milrad, et al., What Explains Teachers’ Trust in AI in Education Across Six Coun- tries?, International Journal of Artificial Intelligence in Education 35 (2025) 1288–1316. doi: 10.1007/ s40593-024-00433-x

  21. [21]

    Nazaretsky, P

    T. Nazaretsky, P. Mejia-Domenzain, V. Swamy, J. Frej, T. Käser, The critical role of trust in adopting AI- powered educational technology for learning: An instrument for measuring student perceptions, Computers and Education: Artificial Intelligence 8 (2025) 100368. doi:10.1016/j.caeai.2025.100368

  22. [22]

    J. D. Lee, K. A. See, Trust in Automation: Designing for Appropriate Reliance, Human Factors 46 (2004) 50–80. doi:10.1518/hfes.46.1.50_30392

  23. [23]

    Glikson, A

    E. Glikson, A. W. Woolley, Human Trust in Artificial Intelligence: Review of Empirical Research, Academy of Management Annals 14 (2020) 627–660. doi:10.5465/annals.2018.0057

  24. [24]

    Mehrotra, C

    S. Mehrotra, C. Degachi, O. Vereschak, C. M. Jonker, M. L. Tielman, A Systematic Review on Fostering Appropriate Trust in Human-AI Interaction: Trends, Opportunities and Challenges, ACM Journal on Responsible Computing 1 (2024) 26:1–26:45. doi:10.1145/3696449

  25. [25]

    G. M. Alarcon, A. Capiola, Explicating the trust process for effective human interaction with artificial intelligence and machine learning systems, Frontiers in Computer Science Volume 7 - 2025 (2025). doi:10. 3389/fcomp.2025.1662185

  26. [26]

    Educational Psychologist , volume =

    K. VanLEHN, The Relative Effectiveness of Human Tutoring, Intelligent Tutoring Systems, and Other Tutoring Systems, Educational Psychologist 46 (2011) 197–221. doi:10.1080/00461520.2011.611369

  27. [27]

    Venkatesh, M

    V. Venkatesh, M. G. Morris, G. B. Davis, F. D. Davis, User Acceptance of Information Technology: Toward A Unified View, Management Information Systems Quarterly 27 (2003) 425–478. doi:10.2307/30036540

  28. [28]

    N. H. Steneck, R. E. Bulger, The History, Purpose, and Future of Instruction in the Responsible Conduct of Research, Academic Medicine 82 (2007) 829–834. doi:10.1097/ACM.0b013e31812f7d4d

  29. [29]

    M. S. Anderson, A. S. Horn, K. R. Risbey, E. A. Ronning, R. De Vries, B. C. Martinson, What Do Mentoring and Training in the Responsible Conduct of Research Have To Do with Scientists’ Misbehavior? Findings from a National Survey of NIH-Funded Scientists, Academic Medicine 82 (2007) 853–860. doi: 10.1097/ ACM.0b013e31812f764c

  30. [30]

    C. Zhai, S. Wibowo, L. D. Li, The effects of over-reliance on AI dialogue systems on students’ cognitive abili- ties: a systematic review, Smart Learning Environments 11 (2024) 28. doi:10.1186/s40561-024-00316-7

  31. [31]

    McCoy, N

    L. McCoy, N. Ganesan, V. Rajagopalan, D. McKell, D. F. Niño, M. C. Swaim, A Training Needs Analysis for AI and Generative AI in Medical Education: Perspectives of Faculty and Students, Journal of Medical Education and Curricular Development 12 (2025) 23821205251339226. doi:10.1177/23821205251339226

  32. [32]

    Available: https://onlinelibrary.wiley.com/doi/10.1111/j

    S. Jamieson, Likert scales: how to (ab)use them, Medical Education 38 (2004) 1217–1218. doi: 10.1111/j. 1365-2929.2004.02012.x

  33. [33]

    Spearman, The Proof and Measurement of Association between Two Things, The American Journal of Psychology 15 (1904) 72

    C. Spearman, The Proof and Measurement of Association between Two Things, The American Journal of Psychology 15 (1904) 72. doi:10.2307/1412159

  34. [34]

    W. H. Kruskal, W. A. Wallis, Use of Ranks in One-Criterion Variance Analysis, Journal of the American Statistical Association 47 (1952) 583–621. doi:10.1080/01621459.1952.10483441

  35. [35]

    Holm, A Simple Sequentially Rejective Multiple Test Procedure, Scandinavian Journal of Statistics 6 (1979) 65–70

    S. Holm, A Simple Sequentially Rejective Multiple Test Procedure, Scandinavian Journal of Statistics 6 (1979) 65–70

  36. [36]

    L. Zhou, W. Schellaert, F. Martínez-Plumed, Y. Moros-Daval, C. Ferri, J. Hernández-Orallo, Larger and more instructable language models become less reliable, Nature 634 (2024) 61–68. doi: 10.1038/ s41586-024-07930-y

  37. [37]

    Not at all familiar

    P. M. Podsakoff, S. B. MacKenzie, J.-Y. Lee, N. P. Podsakoff, Common method biases in behavioral research: A critical review of the literature and recommended remedies., Journal of Applied Psychology 88 (2003) 879–903. doi:10.1037/0021-9010.88.5.879. A. Sample Composition and Supporting Visualizations Table 4 reports the baseline composition. The sample i...