ArguMath: AI-Simulated Environment for Pre-Service Teacher Training in Orchestrating Classroom Mathematics Argumentation
Pith reviewed 2026-05-08 10:57 UTC · model grok-4.3
The pith
An AI-simulated classroom helps pre-service math teachers practice orchestrating student arguments.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ArguMath is an AI-simulated classroom environment for pre-service mathematics teachers to practice orchestrating mathematical argumentation. It includes customization of classroom settings, simulation of discussions with AI students grounded in authentic transcripts and augmented with real-time instructional suggestions, and structured reflection through discourse annotation and overall feedback. An exploratory user study with seven pre-service teachers and interviews with four experienced teachers suggest that ArguMath has the potential to support classroom orchestration skills, particularly theory-aligned questioning strategies.
What carries the argument
The simulation of classroom discussions using AI-based students grounded in authentic transcripts and augmented with real-time instructional suggestions.
If this is right
- Pre-service teachers gain repeated access to orchestration practice without depending on live student groups.
- Real-time suggestions during simulations encourage alignment with established argumentation theory.
- Discourse annotation and feedback support deliberate analysis of questioning moves.
- The overall design lowers barriers to developing classroom orchestration skills where authentic practice opportunities are scarce.
Where Pith is reading between the lines
- Incorporating input from practicing teachers during design may increase the practical relevance of the resulting training tool.
- The transcript-grounded approach could make simulated interactions feel more credible than fully synthetic ones.
- Similar AI environments might eventually reduce the total real-classroom hours required for initial teacher preparation.
Load-bearing premise
That practice inside the AI environment based on authentic transcripts will be realistic enough for skills to transfer to actual classrooms with real students.
What would settle it
A controlled follow-up study of transfer: if pre-service teachers who used ArguMath show no measurable improvement in real-classroom questioning or argumentation facilitation compared with a control group, the load-bearing premise fails; a clear improvement would support it.
Original abstract
Facilitating productive mathematical argumentation, especially asking rational questions, is essential yet remains challenging for pre-service mathematics teachers (PMTs), who often have limited opportunities to apply abstract theoretical knowledge in authentic practice. At the same time, recent advances in large language models (LLMs) have expanded the potential for simulating students in educational settings, enabling low-risk environments for instructional practice. To inform the design of a system that supports PMTs in orchestrating classroom argumentation, we conducted a formative study with eight experienced mathematics teachers to identify key design requirements, including personalization, realistic simulations, structured reflection, and ease of use. Building on these requirements, we developed ArguMath, an AI-simulated classroom environment that supports PMTs in practicing the orchestration of mathematical argumentation. ArguMath comprises three core components: (1) customization of classroom settings; (2) simulation of classroom discussions with AI-based students grounded in authentic transcripts and augmented with real-time instructional suggestions; and (3) structured reflection through discourse annotation and overall feedback. Results from an exploratory user study with seven PMTs, complemented by interviews with four experienced teachers, indicate that ArguMath has the potential to support PMTs' classroom orchestration skills, particularly theory-aligned questioning strategies.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces ArguMath, an AI-simulated classroom environment for training pre-service mathematics teachers (PMTs) in orchestrating mathematical argumentation. It details a formative study with eight experienced teachers to identify design requirements (personalization, realistic simulations, structured reflection, ease of use), followed by system development with three components: classroom customization, LLM-based student simulations grounded in authentic transcripts plus real-time instructional suggestions, and discourse annotation for reflection. An exploratory user study with seven PMTs, supplemented by interviews with four experienced teachers, indicates potential for supporting orchestration skills, especially theory-aligned questioning strategies.
Significance. If substantiated, this work could meaningfully advance HCI applications in teacher education by offering scalable, low-risk practice for complex pedagogical skills that are difficult to rehearse in live classrooms. The grounding of simulations in authentic transcripts and the integration of annotation-based reflection represent thoughtful, theory-informed design choices that align with established principles in educational technology. The formative-to-exploratory workflow is appropriate for an HCI design paper and provides a useful template for future LLM-supported professional development tools.
Major comments (3)
- [User Study / Results] User Study section: The central claim that ArguMath 'has the potential to support PMTs' classroom orchestration skills, particularly theory-aligned questioning strategies' rests entirely on post-session qualitative self-reports, annotations, and interviews from seven PMTs (plus four teacher interviews). No pre/post quantitative rubric scores, control condition, objective performance metrics, or transfer test to live classrooms are described, leaving the evidence unable to distinguish tool effects from novelty, social desirability, or general reflection practice.
- [System Design] System Design / Simulation Component: The claim that interactions with 'AI students grounded in authentic transcripts' produce sufficiently realistic practice for developing transferable orchestration skills is load-bearing but unsupported by any validation data. The manuscript provides no details on transcript selection criteria, processing methods, fidelity checks against real student argumentation patterns, or empirical testing of simulation realism.
- [Discussion] Discussion / Evaluation: The four teacher interviews are presented as complementing the PMT data, yet the manuscript does not report specific findings from these interviews, how they triangulate with PMT feedback, or any resulting design changes. This weakens the evidential basis for the 'potential' conclusion.
Minor comments (4)
- [Abstract / Methods] Clarify participant counts and roles for consistency: the abstract and text alternate between seven PMTs for the exploratory study and eight teachers for the formative study; ensure all numbers and selection criteria are stated uniformly.
- [System Implementation] Add reproducibility details for the LLM component: specify the model (e.g., GPT-4 version), prompting strategy, temperature, and any fine-tuning or few-shot examples used to generate student responses and instructional suggestions.
- [Figures / System Description] Improve figure and interface descriptions: include example screenshots of the customization panel, real-time suggestion overlay, and annotation interface with captions that explicitly link visual elements to the three core components.
- [Abstract / Results] Minor phrasing: the phrase 'complemented by interviews with four experienced teachers' in the abstract and results could be expanded to indicate whether these teachers were distinct from the formative-study cohort.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our work's significance and for the constructive major comments. We address each point below, providing clarifications and indicating where revisions will be made to strengthen the manuscript.
Point-by-point responses
Referee: [User Study / Results] User Study section: The central claim that ArguMath 'has the potential to support PMTs' classroom orchestration skills, particularly theory-aligned questioning strategies' rests entirely on post-session qualitative self-reports, annotations, and interviews from seven PMTs (plus four teacher interviews). No pre/post quantitative rubric scores, control condition, objective performance metrics, or transfer test to live classrooms are described, leaving the evidence unable to distinguish tool effects from novelty, social desirability, or general reflection practice.
Authors: We agree that the study is exploratory and relies on qualitative data, which is standard for initial HCI design papers introducing novel systems. The goal was to gather preliminary user insights into potential benefits rather than to establish causal effects or generalizability. We will revise the abstract, results, and discussion sections to more explicitly frame the claims as indicative of potential, highlight the exploratory nature, and discuss limitations including the absence of controls or objective metrics. We will also expand the future work section to outline plans for subsequent studies with quantitative measures and transfer tests. revision: partial
Referee: [System Design] System Design / Simulation Component: The claim that interactions with 'AI students grounded in authentic transcripts' produce sufficiently realistic practice for developing transferable orchestration skills is load-bearing but unsupported by any validation data. The manuscript provides no details on transcript selection criteria, processing methods, fidelity checks against real student argumentation patterns, or empirical testing of simulation realism.
Authors: We acknowledge that additional details on the simulation grounding would improve transparency. The AI students are prompted using segments from published classroom transcripts in mathematics education literature focused on argumentation. In the revised manuscript, we will expand the System Design section to specify transcript sources, selection criteria (e.g., episodes demonstrating student reasoning and argumentation), processing methods (anonymization, segmentation into turns, and adaptation for LLM input), and how fidelity to real patterns was considered during design. While a dedicated empirical validation study of simulation realism was outside the scope of this work, the approach was shaped by input from the formative study. We will add explicit discussion of this as a limitation. revision: yes
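The processing pipeline the authors describe (segmenting published transcripts into speaker turns and adapting them as grounding for LLM-simulated students) could be sketched roughly as follows. This is a minimal illustration under stated assumptions: the `Speaker: utterance` line format, the function names, and the prompt template are hypothetical, not the authors' actual implementation.

```python
import re
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str  # e.g. "Teacher" or an anonymized student label
    text: str

def segment_transcript(raw: str) -> list[Turn]:
    """Split a raw classroom transcript into speaker turns.

    Assumes lines of the form "Speaker: utterance"; real transcripts
    would need anonymization and more robust parsing.
    """
    turns = []
    for line in raw.strip().splitlines():
        match = re.match(r"^\s*([^:]+):\s*(.+)$", line)
        if match:
            turns.append(Turn(match.group(1).strip(), match.group(2).strip()))
    return turns

def build_student_prompt(turns: list[Turn], student: str) -> str:
    """Assemble a system prompt grounding a simulated student in an
    authentic transcript excerpt (a few-shot style adaptation)."""
    excerpt = "\n".join(f"{t.speaker}: {t.text}" for t in turns)
    return (
        f"You are {student}, a student in a mathematics classroom discussion.\n"
        "Respond in the style of the student turns in this authentic excerpt:\n\n"
        f"{excerpt}\n\n"
        "Stay in character; offer claims, warrants, or challenges as a real student would."
    )

raw = """Teacher: Why do you think the sum of two odd numbers is even?
Student A: Because each odd number is one more than an even number.
Student B: So the two extra ones make another pair."""

turns = segment_transcript(raw)
prompt = build_student_prompt(turns, "Student A")
```

The resulting prompt would then seed whichever LLM drives the simulated student; fidelity checks against real argumentation patterns, as the referee notes, would still be needed on top of any such pipeline.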
Referee: [Discussion] Discussion / Evaluation: The four teacher interviews are presented as complementing the PMT data, yet the manuscript does not report specific findings from these interviews, how they triangulate with PMT feedback, or any resulting design changes. This weakens the evidential basis for the 'potential' conclusion.
Authors: We agree that the reporting of the teacher interviews was incomplete and will address this. These interviews served to validate the initial design requirements and offer expert perspectives on the system's features. In the revision, we will add a dedicated subsection summarizing the main themes from the interviews (such as support for realistic AI student responses and the utility of discourse annotation), describe triangulation with PMT feedback (showing alignment on benefits for questioning strategies), and note any resulting design adjustments incorporated into ArguMath. revision: yes
Circularity Check
No significant circularity; evaluation is external and non-derivational
Full rationale
The paper describes a formative study with eight teachers to elicit design requirements, followed by system implementation and an exploratory user study with seven PMTs plus four teacher interviews. No equations, fitted parameters, predictions, or derivations appear in the provided text. The central claim rests on qualitative external feedback rather than any self-referential reduction of outputs to inputs by construction. No self-citation chains, uniqueness theorems, or ansatzes are invoked as load-bearing steps. This is a standard iterative design-and-evaluation workflow with no mathematical or predictive circularity.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: AI language models can produce student-like responses in classroom math discussions that are realistic enough for skill practice when seeded with authentic transcripts.
- Domain assumption: structured reflection via discourse annotation and feedback improves pre-service teachers' ability to orchestrate argumentation.
Reference graph
Works this paper leans on
- [1] Arseven, I.: The use of qualitative case studies as an experiential teaching method in the training of pre-service teachers. International Journal of Higher Education 7(1), 111–125 (2018)
- [2] Brown, R.: Using collective argumentation to engage students in a primary mathematics classroom. Mathematics Education Research Journal 29(2), 183–199 (2017)
- [3] Chun, J., Zhang, G., Xia, M.: ConflictLens: LLM-based conflict resolution training in romantic relationship. In: Adjunct Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology, pp. 1–3 (2025)
- [4] Conner, A., Singletary, L.M., Smith, R.C., Wagner, P.A., Francisco, R.T.: Teacher support for collective argumentation: A framework for examining how teachers support students' engagement in mathematical activities. Educational Studies in Mathematics 86(3), 401–429 (2014)
- [5] Grossman, P., Compton, C., Igra, D., Ronfeldt, M., Shahan, E., Williamson, P.W.: Teaching practice: A cross-professional perspective. Teachers College Record 111(9), 2055–2100 (2009)
- [6] Jin, H., Yoo, M., Park, J., Lee, Y., Wang, X., Kim, J.: TeachTune: Reviewing pedagogical agents against diverse student profiles with simulated students. In: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, pp. 1–28 (2025)
- [7] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.t., Rocktäschel, T., et al.: Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems 33, 9459–9474 (2020)
- [8] Pan, S., Schmucker, R., Garcia Bulle Bueno, B., Llanes, S.A., Albo Alarcón, F., Zhu, H., Teo, A., Xia, M.: TutorUp: What if your students were simulated? Training tutors to address engagement challenges in online learning. In: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, pp. 1–18 (2025)
- [9] Toulmin, S.E.: The Uses of Argument. Cambridge University Press (2003)
- [10] Wagner, P.A., Smith, R.C., Conner, A., Francisco, R.T., Singletary, L.: Using Toulmin's model to develop prospective teachers' conceptions of collective argumentation. North American Chapter of the International Group for the Psychology of Mathematics Education (2013)
- [11] Wagner, P.A., Smith, R.C., Conner, A., Singletary, L.M., Francisco, R.T.: Using Toulmin's model to develop prospective secondary mathematics teachers' conceptions of collective argumentation. Mathematics Teacher Educator 3(1), 8–26 (2014)
- [12] Wess, R., Priemer, B.: Pre-service teachers' professional vision of argumentation opportunities in their analysis of videotaped science classroom episodes. Journal of Science Teacher Education 36(3), 397–423 (2025)
- [13] Wu, T., Chen, J., Lin, W., Li, M., Zhu, Y., Li, A., Kuang, K., Wu, F.: Embracing imperfection: Simulating students with diverse cognitive levels using LLM-based agents. arXiv preprint arXiv:2505.19997 (2025)
- [14] Xu, S., Wen, H.N., Pan, H., Dominguez, D., Hu, D., Zhang, X.: Classroom Simulacra: Building contextual student generative agents in online education for learning behavioral simulation. In: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, pp. 1–26 (2025)
- [15] Xu, S., Zhang, X.: Leveraging generative artificial intelligence to simulate student learning behavior. arXiv preprint arXiv:2310.19206 (2023)
- [16] Zhuang, Y., Conner, A.: Teachers' use of rational questioning strategies to promote student participation in collective argumentation. Educational Studies in Mathematics 111(2), 345–365 (2022)