ArguMath: AI-Simulated Environment for Pre-Service Teacher Training in Orchestrating Classroom Mathematics Argumentation
Pith reviewed 2026-05-08 10:57 UTC · model grok-4.3
The pith
An AI-simulated classroom helps pre-service math teachers practice orchestrating student arguments.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ArguMath is an AI-simulated classroom environment for pre-service mathematics teachers to practice orchestrating mathematical argumentation. It includes customization of classroom settings, simulation of discussions with AI students grounded in authentic transcripts and augmented with real-time instructional suggestions, and structured reflection through discourse annotation and overall feedback. An exploratory user study with seven pre-service teachers and interviews with four experienced teachers suggest that ArguMath has the potential to support classroom orchestration skills, particularly theory-aligned questioning strategies.
What carries the argument
The simulation of classroom discussions using AI-based students grounded in authentic transcripts and augmented with real-time instructional suggestions.
If this is right
- Pre-service teachers gain repeated access to orchestration practice without depending on live student groups.
- Real-time suggestions during simulations encourage alignment with established argumentation theory.
- Discourse annotation and feedback support deliberate analysis of questioning moves.
- The overall design lowers barriers to developing classroom orchestration skills where authentic practice opportunities are scarce.
Where Pith is reading between the lines
- Incorporating input from practicing teachers during design may increase the practical relevance of the resulting training tool.
- The transcript-grounded approach could make simulated interactions feel more credible than fully synthetic ones.
- Similar AI environments might eventually reduce the total real-classroom hours required for initial teacher preparation.
Load-bearing premise
That practice inside the AI environment based on authentic transcripts will be realistic enough for skills to transfer to actual classrooms with real students.
What would settle it
A controlled follow-up study of transfer: if pre-service teachers who used ArguMath show no measurable improvement in real-classroom questioning or argumentation facilitation compared with a control group, the load-bearing premise fails; a clear improvement would support it.
Original abstract
Facilitating productive mathematical argumentation, especially asking rational questions, is essential yet remains challenging for pre-service mathematics teachers (PMTs), who often have limited opportunities to apply abstract theoretical knowledge in authentic practice. At the same time, recent advances in large language models (LLMs) have expanded the potential for simulating students in educational settings, enabling low-risk environments for instructional practice. To inform the design of a system that supports PMTs in orchestrating classroom argumentation, we conducted a formative study with eight experienced mathematics teachers to identify key design requirements, including personalization, realistic simulations, structured reflection, and ease of use. Building on these requirements, we developed ArguMath, an AI-simulated classroom environment that supports PMTs in practicing the orchestration of mathematical argumentation. ArguMath comprises three core components: (1) customization of classroom settings; (2) simulation of classroom discussions with AI-based students grounded in authentic transcripts and augmented with real-time instructional suggestions; and (3) structured reflection through discourse annotation and overall feedback. Results from an exploratory user study with seven PMTs, complemented by interviews with four experienced teachers, indicate that ArguMath has the potential to support PMTs' classroom orchestration skills, particularly theory-aligned questioning strategies.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces ArguMath, an AI-simulated classroom environment for training pre-service mathematics teachers (PMTs) in orchestrating mathematical argumentation. It details a formative study with eight experienced teachers to identify design requirements (personalization, realistic simulations, structured reflection, ease of use), followed by system development with three components: classroom customization, LLM-based student simulations grounded in authentic transcripts plus real-time instructional suggestions, and discourse annotation for reflection. An exploratory user study with seven PMTs, supplemented by interviews with four experienced teachers, indicates potential for supporting orchestration skills, especially theory-aligned questioning strategies.
Significance. If substantiated, this work could meaningfully advance HCI applications in teacher education by offering scalable, low-risk practice for complex pedagogical skills that are difficult to rehearse in live classrooms. The grounding of simulations in authentic transcripts and the integration of annotation-based reflection represent thoughtful, theory-informed design choices that align with established principles in educational technology. The formative-to-exploratory workflow is appropriate for an HCI design paper and provides a useful template for future LLM-supported professional development tools.
Major comments (3)
- [User Study / Results] User Study section: The central claim that ArguMath 'has the potential to support PMTs' classroom orchestration skills, particularly theory-aligned questioning strategies' rests entirely on post-session qualitative self-reports, annotations, and interviews from seven PMTs (plus four teacher interviews). No pre/post quantitative rubric scores, control condition, objective performance metrics, or transfer test to live classrooms are described, leaving the evidence unable to distinguish tool effects from novelty, social desirability, or general reflection practice.
- [System Design] System Design / Simulation Component: The claim that interactions with 'AI students grounded in authentic transcripts' produce sufficiently realistic practice for developing transferable orchestration skills is load-bearing but unsupported by any validation data. The manuscript provides no details on transcript selection criteria, processing methods, fidelity checks against real student argumentation patterns, or empirical testing of simulation realism.
- [Discussion] Discussion / Evaluation: The four teacher interviews are presented as complementing the PMT data, yet the manuscript does not report specific findings from these interviews, how they triangulate with PMT feedback, or any resulting design changes. This weakens the evidential basis for the 'potential' conclusion.
Minor comments (4)
- [Abstract / Methods] Clarify participant counts and roles for consistency: the abstract and text alternate between seven PMTs for the exploratory study and eight teachers for the formative study; ensure all numbers and selection criteria are stated uniformly.
- [System Implementation] Add reproducibility details for the LLM component: specify the model (e.g., GPT-4 version), prompting strategy, temperature, and any fine-tuning or few-shot examples used to generate student responses and instructional suggestions.
- [Figures / System Description] Improve figure and interface descriptions: include example screenshots of the customization panel, real-time suggestion overlay, and annotation interface with captions that explicitly link visual elements to the three core components.
- [Abstract / Results] Minor phrasing: the phrase 'complemented by interviews with four experienced teachers' in the abstract and results could be expanded to indicate whether these teachers were distinct from the formative-study cohort.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our work's significance and for the constructive major comments. We address each point below, providing clarifications and indicating where revisions will be made to strengthen the manuscript.
Point-by-point responses
Referee: [User Study / Results] User Study section: The central claim that ArguMath 'has the potential to support PMTs' classroom orchestration skills, particularly theory-aligned questioning strategies' rests entirely on post-session qualitative self-reports, annotations, and interviews from seven PMTs (plus four teacher interviews). No pre/post quantitative rubric scores, control condition, objective performance metrics, or transfer test to live classrooms are described, leaving the evidence unable to distinguish tool effects from novelty, social desirability, or general reflection practice.
Authors: We agree that the study is exploratory and relies on qualitative data, which is standard for initial HCI design papers introducing novel systems. The goal was to gather preliminary user insights into potential benefits rather than to establish causal effects or generalizability. We will revise the abstract, results, and discussion sections to more explicitly frame the claims as indicative of potential, highlight the exploratory nature, and discuss limitations including the absence of controls or objective metrics. We will also expand the future work section to outline plans for subsequent studies with quantitative measures and transfer tests. revision: partial
Referee: [System Design] System Design / Simulation Component: The claim that interactions with 'AI students grounded in authentic transcripts' produce sufficiently realistic practice for developing transferable orchestration skills is load-bearing but unsupported by any validation data. The manuscript provides no details on transcript selection criteria, processing methods, fidelity checks against real student argumentation patterns, or empirical testing of simulation realism.
Authors: We acknowledge that additional details on the simulation grounding would improve transparency. The AI students are prompted using segments from published classroom transcripts in mathematics education literature focused on argumentation. In the revised manuscript, we will expand the System Design section to specify transcript sources, selection criteria (e.g., episodes demonstrating student reasoning and argumentation), processing methods (anonymization, segmentation into turns, and adaptation for LLM input), and how fidelity to real patterns was considered during design. While a dedicated empirical validation study of simulation realism was outside the scope of this work, the approach was shaped by input from the formative study. We will add explicit discussion of this as a limitation. revision: yes
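The processing pipeline the authors describe (segmenting published transcripts into speaker turns and adapting them as grounding for LLM-simulated students) could be sketched roughly as follows. This is a minimal illustration under stated assumptions: the `Speaker: utterance` line format, the function names, and the prompt template are hypothetical, not the authors' actual implementation.

```python
import re
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str  # e.g. "Teacher" or an anonymized student label
    text: str

def segment_transcript(raw: str) -> list[Turn]:
    """Split a raw classroom transcript into speaker turns.

    Assumes lines of the form "Speaker: utterance"; real transcripts
    would need anonymization and more robust parsing.
    """
    turns = []
    for line in raw.strip().splitlines():
        match = re.match(r"^\s*([^:]+):\s*(.+)$", line)
        if match:
            turns.append(Turn(match.group(1).strip(), match.group(2).strip()))
    return turns

def build_student_prompt(turns: list[Turn], student: str) -> str:
    """Assemble a system prompt grounding a simulated student in an
    authentic transcript excerpt (a few-shot style adaptation)."""
    excerpt = "\n".join(f"{t.speaker}: {t.text}" for t in turns)
    return (
        f"You are {student}, a student in a mathematics classroom discussion.\n"
        "Respond in the style of the student turns in this authentic excerpt:\n\n"
        f"{excerpt}\n\n"
        "Stay in character; offer claims, warrants, or challenges as a real student would."
    )

raw = """Teacher: Why do you think the sum of two odd numbers is even?
Student A: Because each odd number is one more than an even number.
Student B: So the two extra ones make another pair."""

turns = segment_transcript(raw)
prompt = build_student_prompt(turns, "Student A")
```

The resulting prompt would then seed whichever LLM drives the simulated student; fidelity checks against real argumentation patterns, as the referee notes, would still be needed on top of any such pipeline.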
Referee: [Discussion] Discussion / Evaluation: The four teacher interviews are presented as complementing the PMT data, yet the manuscript does not report specific findings from these interviews, how they triangulate with PMT feedback, or any resulting design changes. This weakens the evidential basis for the 'potential' conclusion.
Authors: We agree that the reporting of the teacher interviews was incomplete and will address this. These interviews served to validate the initial design requirements and offer expert perspectives on the system's features. In the revision, we will add a dedicated subsection summarizing the main themes from the interviews (such as support for realistic AI student responses and the utility of discourse annotation), describe triangulation with PMT feedback (showing alignment on benefits for questioning strategies), and note any resulting design adjustments incorporated into ArguMath. revision: yes
Circularity Check
No significant circularity; evaluation is external and non-derivational
Full rationale
The paper describes a formative study with eight teachers to elicit design requirements, followed by system implementation and an exploratory user study with seven PMTs plus four teacher interviews. No equations, fitted parameters, predictions, or derivations appear in the provided text. The central claim rests on qualitative external feedback rather than any self-referential reduction of outputs to inputs by construction. No self-citation chains, uniqueness theorems, or ansatzes are invoked as load-bearing steps. This is a standard iterative design-and-evaluation workflow with no mathematical or predictive circularity.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: AI language models can produce student-like responses in classroom math discussions that are realistic enough for skill practice when seeded with authentic transcripts.
- Domain assumption: structured reflection via discourse annotation and feedback improves pre-service teachers' ability to orchestrate argumentation.
Reference graph
Works this paper leans on
- [1] Arseven, I.: The use of qualitative case studies as an experiential teaching method in the training of pre-service teachers. International Journal of Higher Education 7(1), 111–125 (2018)
- [2] Brown, R.: Using collective argumentation to engage students in a primary mathematics classroom. Mathematics Education Research Journal 29(2), 183–199 (2017)
- [3] Chun, J., Zhang, G., Xia, M.: ConflictLens: LLM-based conflict resolution training in romantic relationship. In: Adjunct Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology, pp. 1–3 (2025)
- [4] Conner, A., Singletary, L.M., Smith, R.C., Wagner, P.A., Francisco, R.T.: Teacher support for collective argumentation: A framework for examining how teachers support students' engagement in mathematical activities. Educational Studies in Mathematics 86(3), 401–429 (2014)
- [5] Grossman, P., Compton, C., Igra, D., Ronfeldt, M., Shahan, E., Williamson, P.W.: Teaching practice: A cross-professional perspective. Teachers College Record 111(9), 2055–2100 (2009)
- [6] Jin, H., Yoo, M., Park, J., Lee, Y., Wang, X., Kim, J.: TeachTune: Reviewing pedagogical agents against diverse student profiles with simulated students. In: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, pp. 1–28 (2025)
- [7] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.t., Rocktäschel, T., et al.: Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems 33, 9459–9474 (2020)
- [8] Pan, S., Schmucker, R., Garcia Bulle Bueno, B., Llanes, S.A., Albo Alarcón, F., Zhu, H., Teo, A., Xia, M.: TutorUp: What if your students were simulated? Training tutors to address engagement challenges in online learning. In: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, pp. 1–18 (2025)
- [9] Toulmin, S.E.: The Uses of Argument. Cambridge University Press (2003)
- [10] Wagner, P.A., Smith, R.C., Conner, A., Francisco, R.T., Singletary, L.: Using Toulmin's model to develop prospective teachers' conceptions of collective argumentation. North American Chapter of the International Group for the Psychology of Mathematics Education (2013)
- [11] Wagner, P.A., Smith, R.C., Conner, A., Singletary, L.M., Francisco, R.T.: Using Toulmin's model to develop prospective secondary mathematics teachers' conceptions of collective argumentation. Mathematics Teacher Educator 3(1), 8–26 (2014)
- [12] Wess, R., Priemer, B.: Pre-service teachers' professional vision of argumentation opportunities in their analysis of videotaped science classroom episodes. Journal of Science Teacher Education 36(3), 397–423 (2025)
- [13] Wu, T., Chen, J., Lin, W., Li, M., Zhu, Y., Li, A., Kuang, K., Wu, F.: Embracing imperfection: Simulating students with diverse cognitive levels using LLM-based agents. arXiv preprint arXiv:2505.19997 (2025)
- [14] Xu, S., Wen, H.N., Pan, H., Dominguez, D., Hu, D., Zhang, X.: Classroom Simulacra: Building contextual student generative agents in online education for learning behavioral simulation. In: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, pp. 1–26 (2025)
- [15] Xu, S., Zhang, X.: Leveraging generative artificial intelligence to simulate student learning behavior. arXiv preprint arXiv:2310.19206 (2023)
- [16] Zhuang, Y., Conner, A.: Teachers' use of rational questioning strategies to promote student participation in collective argumentation. Educational Studies in Mathematics 111(2), 345–365 (2022)