Tutor, Not Solver: Designing a Guardrailed AI Assistant for Learning in Higher Education: A Design Case of PeteChat
Pith reviewed 2026-07-01 08:54 UTC · model grok-4.3
The pith
PeteChat shows how to design AI tutors with eight principles that scaffold learning while blocking direct answers on assessments.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that a course-aligned AI tutor built with retrieval-augmented generation on a locally hosted model, refined through pre-deployment baseline analysis and expert evaluation, yields eight transferable design principles for assessment-aware AI tutors. These principles span homework guardrails that avoid providing direct answers, debugging scaffolds that guide without solving, supports for self-regulated learning, and tools allowing instructors to customize the system to their course.
What carries the argument
The PeteChat system itself, developed iteratively with course-specific materials, functions as the central mechanism for surfacing the eight design principles through repeated cycles of deployment, observation, and refinement.
If this is right
- AI tutors can incorporate homework guardrails that prevent direct provision of assessment answers while still offering useful guidance.
- Debugging scaffolds can be structured to walk students through problem-solving steps without delivering complete solutions.
- Features supporting self-regulated learning can be added to help students plan, monitor, and reflect on their own progress.
- Instructor-facing customization tools allow tailoring of the AI to fit specific course requirements and policies.
- Such systems can be built on local models with course materials to maintain alignment and control.
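The guardrail and customization ideas in the list above can be illustrated with a small sketch. This is not PeteChat's implementation, which the paper summary does not describe at code level. It is a minimal illustration that assumes a keyword heuristic and an instructor-supplied list of protected assignments. The names `CoursePolicy`, `DIRECT_ANSWER_CUES`, and `choose_response_mode` are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical cue phrases that suggest the student wants a finished answer.
DIRECT_ANSWER_CUES = (
    "give me the answer",
    "solve this for me",
    "write the code for",
    "just tell me",
)


@dataclass
class CoursePolicy:
    """Instructor-facing settings (illustrative assumption, not the paper's schema)."""
    protected_assignments: set[str] = field(default_factory=set)
    allow_worked_examples: bool = True


def choose_response_mode(message: str, policy: CoursePolicy) -> str:
    """Pick how the tutor should respond: 'scaffold', 'hint', or 'explain'."""
    text = message.lower()
    mentions_protected = any(
        name.lower() in text for name in policy.protected_assignments
    )
    asks_for_answer = any(cue in text for cue in DIRECT_ANSWER_CUES)

    if mentions_protected and asks_for_answer:
        # Graded work plus an answer request: guide step by step, never solve.
        return "scaffold"
    if mentions_protected:
        # Graded work, no explicit answer request: offer hints only.
        return "hint"
    # Ungraded topic: full explanations only if the instructor allows them.
    return "explain" if policy.allow_worked_examples else "hint"


policy = CoursePolicy(protected_assignments={"Homework 3"})
print(choose_response_mode("Just tell me the answer to Homework 3 question 2", policy))
print(choose_response_mode("Can you explain recursion?", policy))
```

A deployed system would likely need something sturdier than substring matching, since students can rephrase requests. The sketch only shows where an instructor-controlled policy could sit between the student message and the model.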
Where Pith is reading between the lines
- The principles could be tested for effectiveness by comparing student outcomes in courses using guarded versus unguarded AI tools.
- Similar guardrail patterns might extend to AI assistants in professional training or online certification programs.
- The design approach could inform institutional policies on acceptable AI use by providing concrete examples of integrity-preserving features.
- Integration with existing course platforms might increase the likelihood that instructors adopt and maintain the customized tools.
Load-bearing premise
Design principles drawn from one deployment and evaluation will transfer to other courses, institutions, and higher-education settings.
What would settle it
A follow-up deployment at another institution that applies the eight principles yet observes either increased academic integrity violations or no measurable improvement in student learning processes.
Original abstract
Generative AI tutors hold significant promise for higher education, yet designing systems that scaffold learning without undermining academic integrity remains an open design challenge. This paper presents PeteChat, a course-aligned AI tutor developed and deployed at Purdue University, documented through the lens of design-based research (DBR). Drawing on literature-informed design inputs, a pre-deployment baseline analysis of authentic student-system interactions, and formative expert evaluation with teaching assistants and UX/developer stakeholders, we report eight transferable design principles for assessment-aware AI tutors: from homework guardrails and debugging scaffolds to self-regulated learning support and instructor-facing customization tools. The system is built on a locally hosted Llama-3 model enhanced with retrieval-augmented generation (RAG) grounded in course-specific materials. Rather than reporting controlled experimental outcomes, this design case foregrounds the situated design reasoning, iterative refinement, and principled decision-making that shaped PeteChat across multiple development phases. The resulting principles and methodological approach offer actionable guidance for institutions seeking to deploy responsible, integrity-preserving AI tutors at scale.
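To make the retrieval-augmented generation step in the abstract concrete, here is a minimal sketch of grounding a tutor prompt in course-specific passages. It assumes a simple bag-of-words cosine similarity in place of whatever retriever the authors used, and it stops before the model call, since the summary gives no detail on how the Llama-3 model is served. The function names and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k course passages most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, passages: list[str], k: int = 2) -> str:
    """Assemble a tutor prompt grounded in the retrieved course material."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return (
        "You are a course tutor. Guide the student with hints and questions; "
        "do not give final answers to graded work.\n"
        f"Course material:\n{context}\n"
        f"Student question: {question}\n"
    )


course_passages = [
    "A for loop repeats a block of code once per item.",
    "Office hours are held on Tuesdays.",
    "Recursion is when a function calls itself on a smaller input.",
]
print(build_prompt("How does recursion work in a function?", course_passages, k=1))
```

The resulting string would be passed to the locally hosted model. Keeping retrieval and prompt assembly separate from the model call is one way to let instructors swap course materials without touching the serving layer.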
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents PeteChat, a guardrailed AI tutor built on a locally hosted Llama-3 model with RAG over course materials, developed and deployed at Purdue University. Through design-based research incorporating literature inputs, pre-deployment baseline analysis of student interactions, and formative expert evaluation with TAs and UX stakeholders, the work extracts and reports eight transferable design principles for assessment-aware AI tutors. These address areas such as homework guardrails, debugging scaffolds, self-regulated learning support, and instructor customization tools. The manuscript emphasizes situated design reasoning and iterative refinement rather than controlled experiments or quantitative outcome measures.
Significance. If the reported principles demonstrate transferability, the work provides timely, actionable guidance for higher-education institutions seeking to deploy integrity-preserving AI tutors at scale. The design case foregrounds concrete decisions around guardrails and scaffolds that balance learning support with academic integrity, using practical elements like local hosting and course-specific RAG. This contributes situated knowledge to the HCI and learning-sciences literature on responsible generative AI in education.
major comments (2)
- [Abstract] The central claim that the eight design principles are 'transferable' rests on a single PeteChat deployment and formative evaluation at one institution. The design-case format supplies rich situated reasoning but offers no cross-context cases, replication, or outcome measures that would test whether the principles survive changes in course, model, or institutional setting.
- [Formative expert evaluation] The scope, participant criteria, and evaluation protocol for the TA and stakeholder feedback are not detailed, making it difficult to assess how the eight principles were systematically derived or to judge their robustness independent of the specific Purdue context.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our design case. We address the two major comments point by point below, clarifying the scope and intent of the work while indicating revisions where appropriate to improve transparency and precision.
- Referee: [Abstract] The central claim that the eight design principles are 'transferable' rests on a single PeteChat deployment and formative evaluation at one institution; the design-case format supplies rich situated reasoning but supplies no cross-context cases, replication, or outcome measures that would test whether the principles survive changes in course, model, or institutional setting.
Authors: We agree that the manuscript is a single-institution design case and does not include cross-context replication or outcome measures. In design-based research, principles are derived from iterative, situated practice and offered as transferable guidance rather than validated universals; the eight principles integrate literature synthesis, baseline interaction analysis, and expert input to support adaptation in comparable higher-education settings. We will revise the abstract to replace the unqualified claim of transferability with language that more precisely describes the principles as derived from this deployment and positioned for adaptation by other institutions, while retaining the design-case framing.
Revision: partial
- Referee: [Formative expert evaluation] The scope, participant criteria, and evaluation protocol for the TA and stakeholder feedback are not detailed, making it difficult to assess how the eight principles were systematically derived or to judge their robustness independent of the specific Purdue context.
Authors: We concur that greater detail on the formative expert evaluation is needed. The revision will expand the relevant section to specify the number and roles of participants (TAs and UX/developer stakeholders), recruitment criteria, session structure, feedback prompts, and the process by which comments were mapped to the eight principles. This addition will clarify the derivation pathway without altering the design-case methodology.
Revision: yes
Circularity Check
No circularity; design principles derived from iterative deployment without reduction to inputs.
The paper presents a design case of PeteChat, extracting eight design principles via design-based research from a single Purdue deployment, pre-deployment analysis, and formative evaluation. No equations, fitted parameters, predictions, or self-citations are described that would reduce the reported principles to their own inputs by construction. The derivation chain consists of literature-informed inputs, observed interactions, and stakeholder feedback leading to situated principles; this process is self-contained and does not exhibit self-definitional, fitted-input, or uniqueness-imported patterns. The transferability claim is an assertion of generalizability, not a tautological step.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Design-based research methodology is suitable for deriving transferable design principles for AI tutors in higher education.
Appendix A: Protocol Used (Thinking-Aloud + Interview)
A. Session Structure (45–60 min)
1) Welcome & consent → 2) Warm-up → 3) Five think-aloud tasks on the live prototype → 4) Low-fi walkthrough of future features → 5) Semi-structured interview → 6) Debrief.
B. Moderator Prompts During Tasks
- “What are you expecting here?” / “What would you do next?” / “Say what’s confusing (if anything).”
C. Interview Guide (Post-Task)
- Usability & clarity: discoverabi...