Tutor, Not Solver: Designing a Guardrailed AI Assistant for Learning in Higher Education: A Design Case of PeteChat

Belle Li; Colby Ben Acton; Lily Tan; Qiang Qiu; Wei Zakharov

arxiv: 2606.09845 · v1 · pith:E5LTNRGLnew · submitted 2026-04-27 · 💻 cs.HC · cs.ET

Pith reviewed 2026-07-01 08:54 UTC · model grok-4.3

keywords: AI tutors, higher education, academic integrity, design principles, guardrails, design-based research, self-regulated learning, retrieval-augmented generation

The pith

PeteChat shows how to design AI tutors with eight principles that scaffold learning while blocking direct answers on assessments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper documents the design of an AI assistant meant to act as a tutor rather than solve problems for students in higher education courses. Through design-based research that includes baseline analysis of student interactions and formative evaluation with teaching staff, the work extracts eight principles covering guardrails on homework, debugging help, self-regulated learning features, and instructor customization options. These principles are presented as transferable guidance for building responsible AI systems grounded in course materials. A reader would care because widely available generative AI risks replacing student effort unless deliberately constrained to preserve integrity and promote actual learning. The design case emphasizes situated decision-making over experimental results to make the principles actionable for similar deployments.

Core claim

The paper claims that a course-aligned AI tutor built with retrieval-augmented generation on a locally hosted model, refined through pre-deployment baseline analysis and expert evaluation, yields eight transferable design principles for assessment-aware AI tutors. These principles span homework guardrails that avoid providing direct answers, debugging scaffolds that guide without solving, supports for self-regulated learning, and tools allowing instructors to customize the system to their course.
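To make the homework-guardrail idea concrete, here is a minimal sketch of an assessment-aware check. It is an illustration under stated assumptions, not PeteChat's implementation: the names `decide_mode` and `Decision`, the token-overlap matcher, the 0.5 threshold, and the sample assignment text are all hypothetical, and a deployed system would likely use a stronger matcher.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional, Set


def tokens(text: str) -> Set[str]:
    """Lowercase word tokens; a crude stand-in for a real text matcher."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


@dataclass
class Decision:
    mode: str                    # "hint" (guardrailed) or "explain" (normal tutoring)
    matched_item: Optional[str]  # id of the assessment item that triggered the guardrail


def decide_mode(query: str,
                assessment_items: Dict[str, str],
                threshold: float = 0.5) -> Decision:
    """Switch to hint-only mode when the query closely matches a graded item.

    `assessment_items` maps a hypothetical item id to the item's text,
    as an instructor might register it. The threshold is an assumption.
    """
    q = tokens(query)
    best_id, best_score = None, 0.0
    for item_id, text in assessment_items.items():
        score = jaccard(q, tokens(text))
        if score > best_score:
            best_id, best_score = item_id, score
    if best_id is not None and best_score >= threshold:
        return Decision("hint", best_id)
    return Decision("explain", None)


# Hypothetical graded item and two student queries.
items = {"hw3_q2": "Write a function that reverses a singly linked list in place"}
pasted = decide_mode(
    "Write a function that reverses a singly linked list in place please", items)
conceptual = decide_mode("What is a linked list?", items)
print(pasted.mode, pasted.matched_item)   # hint hw3_q2
print(conceptual.mode)                    # explain
```

The design choice sketched here is that the guardrail changes the response mode rather than refusing outright, which matches the paper's framing of guidance without direct answers.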

What carries the argument

The PeteChat system itself, developed iteratively with course-specific materials, functions as the central mechanism for surfacing the eight design principles through repeated cycles of deployment, observation, and refinement.

If this is right

AI tutors can incorporate homework guardrails that prevent direct provision of assessment answers while still offering useful guidance.
Debugging scaffolds can be structured to walk students through problem-solving steps without delivering complete solutions.
Features supporting self-regulated learning can be added to help students plan, monitor, and reflect on their own progress.
Instructor-facing customization tools allow tailoring of the AI to fit specific course requirements and policies.
Such systems can be built on local models with course materials to maintain alignment and control.
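One way instructor-facing customization could be wired up is a per-course policy object that is compiled into tutor instructions. The sketch below is an assumption for illustration only: `CoursePolicy`, `build_system_prompt`, the flag names, and the course and assignment labels are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CoursePolicy:
    """Hypothetical instructor-editable settings for one course."""
    course_name: str
    allow_full_solutions: bool = False   # homework guardrail on by default
    debugging_help: bool = True          # scaffolded debugging support
    srl_prompts: bool = True             # self-regulated learning nudges
    restricted_assignments: List[str] = field(default_factory=list)


def build_system_prompt(policy: CoursePolicy) -> str:
    """Turn the policy flags into plain-language instructions for the model."""
    lines = [f"You are a tutor for {policy.course_name}."]
    if not policy.allow_full_solutions:
        lines.append("Do not give complete answers to graded work; "
                     "offer hints and guiding questions.")
    if policy.debugging_help:
        lines.append("When debugging, point to where the error likely is and "
                     "ask the student to try a fix; do not rewrite their code.")
    if policy.srl_prompts:
        lines.append("Encourage the student to set a goal, check progress, "
                     "and reflect afterwards.")
    if policy.restricted_assignments:
        lines.append("Restricted assignments: "
                     + ", ".join(policy.restricted_assignments) + ".")
    return "\n".join(lines)


# Hypothetical course with one restricted assignment.
prompt = build_system_prompt(CoursePolicy("CS 101", restricted_assignments=["HW3"]))
print(prompt)
```

Keeping the policy as data rather than hand-written prompt text would let an instructor change behavior per course without touching the tutor's code.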

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The principles could be tested for effectiveness by comparing student outcomes in courses using guarded versus unguarded AI tools.
Similar guardrail patterns might extend to AI assistants in professional training or online certification programs.
The design approach could inform institutional policies on acceptable AI use by providing concrete examples of integrity-preserving features.
Integration with existing course platforms might increase the likelihood that instructors adopt and maintain the customized tools.

Load-bearing premise

Design principles drawn from one deployment and evaluation will transfer to other courses, institutions, and higher-education settings.

What would settle it

A follow-up deployment at another institution that applies the eight principles yet observes either increased academic integrity violations or no measurable improvement in student learning processes.

Figures

Figures reproduced from arXiv: 2606.09845 by Belle Li, Colby Ben Acton, Lily Tan, Qiang Qiu, Wei Zakharov.

**Figure 1.** From generic answer bots to a guardrailed, course-aligned tutor: motivating problems, core design tensions, and the conceptual positioning of PeteChat.

**Figure 2.** Design-Based Research (DBR) cycle guiding the development of PeteChat. Four phases — needs analysis, design, evaluation, and develop — form ...

**Figure 3.** Cross-context patterns observed in the pre-deployment baseline corpus. Panel (a) shows student SRL-related behaviors (Category A) across exam ...

**Figure 4.** Overview of the PeteChat architecture. A mixture-of-experts-inspired router dispatches each student query to the most appropriate operating mode: ...

**Figure 5.** Low-fidelity interface mockups of two PeteChat chat interface designs.

**Figure 6.** Assessment-aware interaction flow. When a student query matches an ...

**Figure 7.** Turn-level pedagogical decision flow. Each incoming turn is tested ...

**Figure 8.** The eight design principles derived from the DBR cycle. Each principle is illustrated by a minimal before/after fragment showing the concrete ...

**Figure 9.** Second-prototype interface states showing how the design principles were instantiated in PeteChat: (a) onboarding prompts, (b) a goal-setting modal, ...

**Figure 10.** Additional second-prototype interface states. Shown here are (a) debugging support, (b) logistics answers with freshness disclaimers, (c) a study ...

Original abstract

Generative AI tutors hold significant promise for higher education, yet designing systems that scaffold learning without undermining academic integrity remains an open design challenge. This paper presents PeteChat, a course-aligned AI tutor developed and deployed at Purdue University, documented through the lens of design-based research (DBR). Drawing on literature-informed design inputs, a pre-deployment baseline analysis of authentic student-system interactions, and formative expert evaluation with teaching assistants and UX/developer stakeholders, we report eight transferable design principles for assessment-aware AI tutors: from homework guardrails and debugging scaffolds to self-regulated learning support and instructor-facing customization tools. The system is built on a locally hosted Llama-3 model enhanced with retrieval-augmented generation (RAG) grounded in course-specific materials. Rather than reporting controlled experimental outcomes, this design case foregrounds the situated design reasoning, iterative refinement, and principled decision-making that shaped PeteChat across multiple development phases. The resulting principles and methodological approach offer actionable guidance for institutions seeking to deploy responsible, integrity-preserving AI tutors at scale.
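The retrieval-augmented generation step mentioned in the abstract can be pictured with a toy retriever. This is a sketch under loud assumptions: the abstract does not describe PeteChat's retriever, so the bag-of-words cosine similarity, the function names (`retrieve`, `build_prompt`), and the sample course snippets below are all hypothetical stand-ins, and the call to the locally hosted model is deliberately left out.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def bow(text: str) -> Counter:
    """Bag-of-words term counts (a stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, snippets: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return up to k course snippets most similar to the query, best first."""
    q = bow(query)
    scored = [(cosine(q, bow(s)), s) for s in snippets]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [pair for pair in scored[:k] if pair[0] > 0]


def build_prompt(query: str, snippets: List[str], k: int = 2) -> str:
    """Ground the student question in retrieved course material."""
    hits = retrieve(query, snippets, k)
    context = "\n".join(f"- {s}" for _, s in hits) or "- (no matching course material)"
    return f"Course material:\n{context}\n\nStudent question: {query}"


# Hypothetical course snippets.
course = [
    "Recursion: a function that calls itself with a smaller input.",
    "Office hours are held on Tuesdays.",
    "A linked list stores nodes connected by pointers.",
]
top = retrieve("how does recursion work in a function", course, k=1)
print(top[0][1])
print(build_prompt("zzz", course))
```

The resulting prompt string would then be passed to the model; grounding answers in course material is what keeps the tutor aligned with a specific course rather than generic web knowledge.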

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

PeteChat is a clear design case on building a guarded AI tutor, but the eight principles rest on one Purdue deployment with no cross-context checks.

This paper walks through the build and refinement of PeteChat, a locally hosted Llama-3 tutor with RAG on course materials at Purdue. The core output is eight design principles aimed at preventing the system from solving homework while still giving useful scaffolds.

The authors drew on existing literature, ran a baseline analysis of student interactions, then iterated with input from TAs and UX/developer stakeholders. The principles cover homework guardrails, debugging help that stops short of answers, self-regulated learning prompts, and instructor customization options. The write-up is upfront about using design-based research rather than controlled trials.

The paper does a reasonable job documenting the actual decisions and trade-offs in one real deployment. That kind of situated detail can be helpful for teams trying to ship something similar.

The main limitation is the transferability claim. Everything comes from a single deployment at one institution with one model and a formative expert review whose criteria are not spelled out. There are no outcome measures on learning gains or integrity violations, and no second site to test whether the principles survive different students, courses, or platforms. Readers will have to treat the list as informed suggestions rather than validated guidance.

No quantitative data or equations appear, so the usual worries about fitting or circular claims do not apply. The citations look relevant to edtech and AI tutoring work.

This is mainly for practitioners and researchers who are actively building or evaluating AI tutors in higher ed. Someone in that position can extract concrete ideas on guardrails and customization.

It should go to peer review. The implementation details and principle list give practical value even if later work will need to test how far they travel.

Referee Report

2 major / 0 minor

Summary. The paper presents PeteChat, a guardrailed AI tutor built on a locally hosted Llama-3 model with RAG over course materials, developed and deployed at Purdue University. Through design-based research incorporating literature inputs, pre-deployment baseline analysis of student interactions, and formative expert evaluation with TAs and UX stakeholders, the work extracts and reports eight transferable design principles for assessment-aware AI tutors. These address areas such as homework guardrails, debugging scaffolds, self-regulated learning support, and instructor customization tools. The manuscript emphasizes situated design reasoning and iterative refinement rather than controlled experiments or quantitative outcome measures.

Significance. If the reported principles demonstrate transferability, the work provides timely, actionable guidance for higher-education institutions seeking to deploy integrity-preserving AI tutors at scale. The design case foregrounds concrete decisions around guardrails and scaffolds that balance learning support with academic integrity, using practical elements like local hosting and course-specific RAG. This contributes situated knowledge to the HCI and learning-sciences literature on responsible generative AI in education.

major comments (2)

[Abstract] The central claim that the eight design principles are 'transferable' rests on a single PeteChat deployment and formative evaluation at one institution; the design-case format supplies rich situated reasoning but no cross-context cases, replication, or outcome measures that would test whether the principles survive changes in course, model, or institutional setting.

[Formative expert evaluation] The scope, participant criteria, and evaluation protocol for the TA and stakeholder feedback are not detailed, making it difficult to assess how the eight principles were systematically derived or to judge their robustness independent of the specific Purdue context.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our design case. We address the two major comments point by point below, clarifying the scope and intent of the work while indicating revisions where appropriate to improve transparency and precision.

Referee: [Abstract] The central claim that the eight design principles are 'transferable' rests on a single PeteChat deployment and formative evaluation at one institution; the design-case format supplies rich situated reasoning but supplies no cross-context cases, replication, or outcome measures that would test whether the principles survive changes in course, model, or institutional setting.

Authors: We agree that the manuscript is a single-institution design case and does not include cross-context replication or outcome measures. In design-based research, principles are derived from iterative, situated practice and offered as transferable guidance rather than validated universals; the eight principles integrate literature synthesis, baseline interaction analysis, and expert input to support adaptation in comparable higher-education settings. We will revise the abstract to replace the unqualified claim of transferability with language that more precisely describes the principles as derived from this deployment and positioned for adaptation by other institutions, while retaining the design-case framing.

Revision: partial

Referee: [Formative expert evaluation] The scope, participant criteria, and evaluation protocol for the TA and stakeholder feedback are not detailed, making it difficult to assess how the eight principles were systematically derived or to judge their robustness independent of the specific Purdue context.

Authors: We concur that greater detail on the formative expert evaluation is needed. The revision will expand the relevant section to specify the number and roles of participants (TAs and UX/developer stakeholders), recruitment criteria, session structure, feedback prompts, and the process by which comments were mapped to the eight principles. This addition will clarify the derivation pathway without altering the design-case methodology.

Revision: yes

Circularity Check

0 steps flagged

No circularity; design principles derived from iterative deployment without reduction to inputs.

The paper presents a design case of PeteChat, extracting eight design principles via design-based research from a single Purdue deployment, pre-deployment analysis, and formative evaluation. No equations, fitted parameters, predictions, or self-citations are described that would reduce the reported principles to their own inputs by construction. The derivation chain consists of literature-informed inputs, observed interactions, and stakeholder feedback leading to situated principles; this process is self-contained and does not exhibit self-definitional, fitted-input, or uniqueness-imported patterns. The transferability claim is an assertion of generalizability, not a tautological step.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper relies on the standard assumption in human-computer interaction that design-based research can yield transferable principles from situated development and expert feedback.

axioms (1)

Domain assumption: Design-based research methodology is suitable for deriving transferable design principles for AI tutors in higher education.
Invoked through the choice of DBR as the lens for documenting the PeteChat development phases.

pith-pipeline@v0.9.1-grok · 5727 in / 1281 out tokens · 38179 ms · 2026-07-01T08:54:53.224351+00:00 · methodology
