From Explanation to Diagnosis: Next Generation Interactive Video Coach with Misstep Awareness

Ashok K. Goel; Rahul K. Dass; Xiao Jin

arxiv: 2606.02970 · v1 · pith:VDF433Z5new · submitted 2026-06-02 · 💻 cs.HC

Xiao Jin, Rahul K. Dass, Ashok K. Goel

Pith reviewed 2026-06-28 09:02 UTC · model grok-4.3

classification 💻 cs.HC

keywords: intelligent tutoring systems · misstep awareness · pedagogical model · neurosymbolic AI · learner diagnosis · adaptive learning · conceptual change · AI education

The pith

Ivy AI coach adds a pedagogical model that encodes instructor diagnostic knowledge to classify learner errors and generate targeted scaffolding.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to move an AI tutoring system from generating explanations to diagnosing why a learner is wrong. It augments the existing Task-Method-Knowledge model with a new Pedagogical Model that turns the instructor's Q&A key into explicit encodings for each incorrect quiz answer. These encodings capture the learner's underlying belief, the source of the misunderstanding in the TMK structure, the misconception type, and the right scaffolding to address it. A proof-of-concept pipeline using real course quiz questions demonstrates detection and classification of errors, producing feedback that aims at conceptual change rather than simple knowledge retrieval.

Core claim

By making the instructor's diagnostic knowledge machine-readable in a Pedagogical Model, the Ivy coach can detect learner errors on quiz questions, classify them by underlying belief and misconception type, locate the TMK locus, and generate diagnosis-grounded scaffolding that supports conceptual change.

What carries the argument

The Pedagogical Model (PM), which augments the TMK model by encoding for each incorrect response the learner's underlying belief, TMK locus, misconception type, and targeted scaffolding derived from the instructor's Q&A key.
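As a rough illustration of what one such encoding might look like, here is a minimal Python sketch. The class name `PMEntry`, the field names, and the JSON serialization are assumptions made for this page; the paper's actual schema and misconception taxonomy are not given in the abstract, so every value below is a placeholder.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class PMEntry:
    """One hypothetical Pedagogical Model record for a single incorrect answer."""
    question_id: str         # hypothetical identifier for the quiz question
    wrong_choice: str        # the incorrect answer choice this record covers
    underlying_belief: str   # brief statement of the incorrect idea or missing knowledge
    tmk_locus: str           # where in the TMK structure the misunderstanding originates
    misconception_type: str  # category label; the paper's taxonomy is not shown here
    scaffolding: str         # targeted feedback derived from the instructor's Q&A key


# Placeholder values only; nothing here is taken from the course or the paper.
entry = PMEntry(
    question_id="Q1",
    wrong_choice="B",
    underlying_belief="<incorrect idea behind choice B>",
    tmk_locus="<TMK element>",
    misconception_type="<type label>",
    scaffolding="<targeted hint>",
)

# Machine-readable form, e.g. for storage alongside the TMK model.
print(json.dumps(asdict(entry), indent=2))
```

Keying each record to a (question, wrong choice) pair mirrors the abstract's phrase "for each quiz question and incorrect response"; whether the real PM is stored this way is not stated.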

If this is right

Feedback becomes more precise and actionable by addressing the specific source of misunderstanding.
The coach supports conceptual change instead of only retrieving or explaining correct knowledge.
Adaptive learning systems gain a concrete mechanism for misstep-aware coaching in AI education.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same encoding approach could be tested in non-AI courses if instructors provide structured Q&A keys.
Combining the PM with video interaction data might allow real-time misstep detection during coaching sessions.
Scaling the pipeline to thousands of responses would test whether the classification remains reliable across varied learner populations.

Load-bearing premise

The instructor's Q&A key can be translated into accurate, machine-readable encodings of the learner's underlying belief, TMK locus, misconception type, and targeted scaffolding for each incorrect response.

What would settle it

Run the pipeline on a set of learner responses and compare the system's diagnoses and scaffolding against independent expert instructor judgments of the same errors; mismatch on a majority of cases would falsify the claim that the encodings produce accurate diagnosis.
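The comparison described above can be sketched in a few lines of Python. This is only an illustration: the function names, the dictionary layout (response ID mapped to a diagnosis label), and the sample labels are assumptions, and the 0.5 threshold simply restates "mismatch on a majority of cases".

```python
def mismatch_rate(system, expert):
    """Fraction of shared response IDs where system and expert labels differ."""
    shared = sorted(set(system) & set(expert))
    if not shared:
        raise ValueError("no overlapping responses to compare")
    mismatches = sum(1 for rid in shared if system[rid] != expert[rid])
    return mismatches / len(shared)


def claim_falsified(system, expert):
    """True when the system disagrees with the expert on a majority of cases."""
    return mismatch_rate(system, expert) > 0.5


# Hypothetical labels for four learner responses (not real data).
system_labels = {"r1": "type_a", "r2": "type_b", "r3": "type_a", "r4": "type_c"}
expert_labels = {"r1": "type_a", "r2": "type_c", "r3": "type_a", "r4": "type_c"}

print(mismatch_rate(system_labels, expert_labels))    # 0.25
print(claim_falsified(system_labels, expert_labels))  # False
```

Raw mismatch treats every disagreement equally; a fuller study would likely score belief, TMK locus, misconception type, and scaffolding separately.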

Figures

Figures reproduced from arXiv: 2606.02970 by Ashok K. Goel, Rahul K. Dass, Xiao Jin.

**Figure 1.** Ivy Pedagogical Model Integration Pipeline.

Adjacent text from the paper (4 Proof-of-Concept Instantiation, 4.1 Source Material): PMs were constructed from quiz questions in the Spring 2026 offering of CS 7637: Knowledge-Based AI (KBAI) at Georgia Institute of Technology. The source material comprised: (1) the full text of each question and its answer choices; (2) the expert-authored Q&A key specifying the correct answer and, for each inco…

Original abstract

Intelligent tutoring systems excel at generating explanations but rarely provide principled diagnosis of where and why a learner is wrong. We introduce a misstep-aware coaching capability for Ivy, a neurosymbolic AI coach, built on a two-model architecture that augments a Task-Method-Knowledge (TMK) model with a new Pedagogical Model (PM) in the context of an online graduate AI course at Georgia Tech. The PM makes instructor diagnostic knowledge explicit and machine-readable by encoding, for each quiz question and incorrect response, the learner's underlying belief (a brief statement of the incorrect idea or missing knowledge), a TMK locus (the source of the misunderstanding), a misconception type, and targeted scaffolding derived from the instructor's Q&A key. Using quiz questions from the course, we demonstrate a proof-of-concept pipeline that detects and classifies learner errors and generates diagnosis-grounded scaffolding, moving Ivy beyond knowledge retrieval toward diagnostic misstep awareness, and enabling more precise, actionable feedback that supports conceptual change and advances adaptive learning systems in AI in education and the learning sciences.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper adds a structured Pedagogical Model to Ivy for explicit error diagnosis but supplies no validation or results, so the central claim rests on untested manual encodings.

The new element here is the Pedagogical Model that sits on top of the authors' earlier TMK work. For each incorrect quiz response it records the learner's underlying belief, the TMK locus, misconception type, and targeted scaffolding, all pulled from the instructor's Q&A key and made machine-readable. That two-model setup is presented as the step that moves Ivy from explanation to diagnosis.

What the paper does is make the diagnostic knowledge explicit rather than leaving it implicit in the tutor. The architecture itself is a straightforward extension and could be useful to people building neurosymbolic tutors.

The soft spot is exactly the one the stress-test flags. The demonstration depends on turning instructor Q&A keys into accurate PM entries, yet the abstract gives no process, no reliability checks, and no evidence that those entries match actual learner thinking instead of instructor interpretation. Without that, the pipeline reduces to lookup against potentially noisy labels. There are also no error rates, no baseline comparisons, and no user data, so we cannot tell whether the diagnosis-grounded scaffolding improves anything.
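To make the "lookup" reading concrete, here is a minimal Python sketch of a pipeline that checks an answer against the key and then retrieves a hand-built PM record. Everything in it is hypothetical: the names `ANSWER_KEY`, `PM`, and `diagnose`, the record fields, and the placeholder values are assumptions for illustration. Ivy's actual pipeline is not described in the abstract and may involve more than a table lookup.

```python
from typing import Optional

# Hypothetical answer key: question ID -> correct choice.
ANSWER_KEY = {"Q1": "A"}

# Hypothetical PM table: (question ID, wrong choice) -> diagnostic record.
PM = {
    ("Q1", "B"): {
        "underlying_belief": "<incorrect idea behind choice B>",
        "tmk_locus": "<TMK element>",
        "misconception_type": "<type label>",
        "scaffolding": "<targeted hint>",
    },
}


def diagnose(question_id: str, choice: str) -> Optional[dict]:
    """Return None for a correct answer, else the PM record for the misstep."""
    if choice == ANSWER_KEY[question_id]:
        return None  # nothing to diagnose
    record = PM.get((question_id, choice))
    if record is None:
        # Error detected, but no instructor encoding exists for this choice.
        return {
            "underlying_belief": None,
            "tmk_locus": None,
            "misconception_type": None,
            "scaffolding": "No diagnosis encoded for this response.",
        }
    return record


print(diagnose("Q1", "A"))  # None
print(diagnose("Q1", "B")["scaffolding"])  # <targeted hint>
```

In this form the quality of the output is bounded by the quality of the table, which is why the accuracy of the encodings matters so much.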

This is for researchers working on AI in education and conceptual-change tutoring systems. A reader who wants to see how diagnostic knowledge can be encoded in a TMK-style system might get an idea from it. The paper deserves peer review because the direction is clear and the extension is motivated, even though the current version needs the missing validation work filled in.

Referee Report

2 major / 1 minor

Summary. The paper claims to extend the Ivy neurosymbolic AI coach with misstep awareness by augmenting its Task-Method-Knowledge (TMK) model with a new Pedagogical Model (PM). The PM encodes, for each quiz question and incorrect response in a Georgia Tech graduate AI course, the learner's underlying belief, TMK locus, misconception type, and targeted scaffolding derived from the instructor's Q&A key. Using course quiz questions, it demonstrates a proof-of-concept pipeline that detects and classifies learner errors and generates diagnosis-grounded scaffolding, advancing Ivy from knowledge retrieval toward diagnostic feedback that supports conceptual change in AI education and adaptive learning systems.

Significance. If the result holds, the work has moderate significance for AI in education and the learning sciences by making instructor diagnostic knowledge explicit and machine-readable, potentially enabling more precise, actionable scaffolding beyond standard explanations. The two-model architecture and explicit encoding of misconception types represent a clear conceptual advance over prior TMK-based systems. However, as presented, the manuscript supplies only a high-level description of the pipeline without implementation details, performance metrics, or validation, limiting demonstrated impact.

major comments (2)

[Abstract] The central claim that the pipeline 'detects and classifies learner errors' and provides 'diagnosis-grounded scaffolding' depends entirely on the accuracy of the PM encodings of underlying belief, TMK locus, misconception type, and scaffolding. The manuscript provides no description of the translation process from the instructor's Q&A key, no inter-rater reliability checks, and no evidence that the resulting encodings reflect actual learner misconceptions rather than post-hoc interpretation. This encoding step is load-bearing for the misstep-awareness contribution.
[Abstract / demonstration section] No evaluation data, error rates, success metrics on the quiz questions, or comparison to baselines are reported. The proof-of-concept is asserted via 'using quiz questions from the course' but supplies no results, undermining the claim of advancing adaptive learning systems.
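One common way to run the inter-rater reliability check mentioned in the first comment is Cohen's kappa, which corrects raw agreement between two raters for agreement expected by chance. The sketch below is a plain-Python illustration under stated assumptions: two raters each assign one misconception-type label per incorrect response, and the label names are placeholders. The paper reports no such analysis.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two encoders for four incorrect responses.
encoder_1 = ["x", "x", "y", "y"]
encoder_2 = ["x", "y", "y", "y"]

print(cohens_kappa(encoder_1, encoder_2))  # 0.5
```

Here raw agreement is 0.75 and chance agreement is 0.5, giving kappa = 0.5. With a single expert source, as the rebuttal notes, a second independent encoder would be needed before this statistic could be computed at all.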

minor comments (1)

[Abstract] The abstract introduces the PM without a brief inline definition of its components before using them in the pipeline description.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments highlighting the need for greater transparency on the PM encoding process and for explicit results in the demonstration. We address each point below and will revise the manuscript accordingly to strengthen the presentation of this proof-of-concept work.

Referee: [Abstract] The central claim that the pipeline 'detects and classifies learner errors' and provides 'diagnosis-grounded scaffolding' depends entirely on the accuracy of the PM encodings of underlying belief, TMK locus, misconception type, and scaffolding. The manuscript provides no description of the translation process from the instructor's Q&A key, no inter-rater reliability checks, and no evidence that the resulting encodings reflect actual learner misconceptions rather than post-hoc interpretation. This encoding step is load-bearing for the misstep-awareness contribution.

Authors: We agree the encoding process is central to the contribution and will add a dedicated subsection describing the translation. The PM entries were constructed by directly mapping each incorrect response in the instructor-provided Q&A key to the corresponding underlying belief, TMK locus, misconception type, and scaffolding suggestion; the authors (including the course instructor) performed this mapping using the key as the authoritative source. Because the encodings originate from the instructor's own diagnostic annotations rather than independent interpretation, they represent the intended expert knowledge for that course. Inter-rater reliability checks were not performed, as the source material came from a single expert. We will include example mappings and the explicit statement that the PM captures instructor diagnostic intent.

Revision: yes

Referee: [Abstract / demonstration section] No evaluation data, error rates, success metrics on the quiz questions, or comparison to baselines are reported. The proof-of-concept is asserted via 'using quiz questions from the course' but supplies no results, undermining the claim of advancing adaptive learning systems.

Authors: We acknowledge that the current manuscript presents only a high-level demonstration without quantitative metrics. As a proof-of-concept paper focused on the two-model architecture and the explicit PM, the section illustrates the end-to-end pipeline on selected quiz questions by showing input missteps, PM-derived classifications, and generated scaffolding. We will revise to include concrete example outputs (e.g., specific quiz items, detected TMK loci, and scaffolding text) and will add an explicit statement that systematic evaluation with learner performance data and baseline comparisons is reserved for future work. This framing positions the contribution as the architectural and representational advance rather than an empirical validation study.

Revision: partial

Circularity Check

0 steps flagged

No circularity; proof-of-concept demonstration is self-contained.

The manuscript describes a proof-of-concept pipeline that augments an existing TMK model with a new Pedagogical Model (PM) whose entries are manually derived from instructor Q&A keys. No equations, parameter fits, derivations, or predictive claims appear anywhere in the text. The demonstration consists of applying the resulting PM encodings to course quiz data to produce scaffolding outputs; these outputs are direct consequences of the supplied encodings rather than quantities derived from them by any formal step. Prior TMK work is referenced as background architecture but is not invoked via a uniqueness theorem or load-bearing self-citation that would force the new results. The central claim therefore rests on the independent construction and application of the PM rather than on any reduction to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Ledger constructed from abstract only; full paper would likely reveal additional modeling assumptions.

axioms (1)

Domain assumption: Instructor diagnostic knowledge for each incorrect response can be accurately captured by encoding the learner's underlying belief, TMK locus, misconception type, and targeted scaffolding.
Why it matters: The Pedagogical Model depends on this encoding being both feasible and faithful to the instructor's intent.

invented entities (1)

Pedagogical Model (PM): no independent evidence
Purpose: To make instructor diagnostic knowledge explicit and machine-readable for misstep awareness.
Note: New component introduced to augment the existing TMK model.

pith-pipeline@v0.9.1-grok · 5715 in / 1437 out tokens · 33385 ms · 2026-06-28T09:02:07.893127+00:00 · methodology

