
arxiv: 2606.12419 · v1 · pith:3CGBT3TPnew · submitted 2026-05-08 · 💻 cs.CY · cs.AI

GeoDial: A Multimodal Conversational Tutoring Dataset for Geometry Problem-Solving with Visual Tutor Turns

Pith reviewed 2026-06-30 23:17 UTC · model grok-4.3

classification 💻 cs.CY cs.AI
keywords multimodal tutoring dataset · geometry problem-solving · visual grounding · diagram highlights · dialog acts · vision-language models · educational AI · conversational tutoring

The pith

GeoDial supplies over 1.3K geometry tutoring dialogs in which teachers' instructional turns are explicitly grounded in diagram highlights.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents GeoDial, a multimodal dataset of over 1,300 teacher-student conversations in geometry collected from experienced math teachers, with instructional turns explicitly grounded in diagram highlights. Most prior tutoring datasets contain only text, which prevents models from learning the visual pointing and highlighting that human teachers use. The authors supply an annotation protocol that records dialog acts, highlight regions, and feedback together. Fine-tuning vision-language models on the data raises the quality of generated tutoring language yet leaves diagram-highlight accuracy low, which suggests that current methods do not yet integrate visual reasoning with pedagogical language.

Core claim

GeoDial is a dataset of more than 1,300 teacher-student dialogs in geometry in which instructional turns are grounded in diagram highlights; a scalable annotation protocol records dialog acts, visual highlights, and feedback together, and supervised fine-tuning of vision-language models improves generated utterances but not the accuracy of the highlights.

What carries the argument

The annotation protocol that jointly labels dialog acts, diagram highlight regions, and feedback to supervise both language and visual tutoring actions.
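To make the joint labeling concrete, here is a minimal sketch of what one annotated tutor turn might look like as a record. The schema is an assumption for illustration only, not the released GeoDial format: the class names, field names, and the `validate` helper are hypothetical. The highlight kinds (line, angle, label, arc) and the act/subact names are taken from the figure captions listed further down this page.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Highlight kinds named in the figure captions; the real dataset may define more.
ALLOWED_KINDS = {"line", "angle", "label", "arc"}


@dataclass
class Highlight:
    kind: str        # one of ALLOWED_KINDS
    element_id: str  # hypothetical identifier, e.g. "AB" for a segment


@dataclass
class TutorTurn:
    utterance: str
    act: str     # dialog act, e.g. "Generic"
    subact: str  # sub-act, e.g. "Introduce"
    # An empty list means the teacher chose not to highlight anything.
    highlights: List[Highlight] = field(default_factory=list)
    feedback: Optional[str] = None  # hypothetical free-text feedback field


def validate(turn: TutorTurn) -> List[str]:
    """Return a list of human-readable problems; empty means the turn looks well-formed."""
    problems = []
    if not turn.utterance.strip():
        problems.append("empty utterance")
    for h in turn.highlights:
        if h.kind not in ALLOWED_KINDS:
            problems.append(f"unknown highlight kind: {h.kind}")
    return problems


turn = TutorTurn(
    utterance="Let's look at segment AB first.",
    act="Generic",
    subact="Introduce",
    highlights=[Highlight(kind="line", element_id="AB")],
)
print(validate(turn))
```

Keeping language labels and highlights in a single record is what would let one training example supervise both outputs; whether the actual release is structured this way is not stated on this page.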

If this is right

  • Supervised fine-tuning on GeoDial raises the quality of generated tutoring utterances.
  • The same fine-tuned models still produce inaccurate diagram highlights.
  • Current vision-language methods do not yet integrate visual reasoning with pedagogical interaction at the level needed for tutoring.
  • New techniques that couple visual grounding more tightly with dialog generation are required.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The dataset could support tutors that generate live diagram annotations during explanations rather than after-the-fact text.
  • The same annotation style could be applied to tutoring in other diagram-rich subjects such as mechanics or organic chemistry.
  • Separate pre-training on visual grounding tasks before dialog fine-tuning might close the observed highlight gap.
  • Controlled classroom trials could test whether students using models trained on GeoDial solve geometry problems faster than those using text-only tutors.

Load-bearing premise

Dialogs collected from experienced math teachers with this annotation protocol capture effective visual tutoring strategies that transfer to training AI tutors.

What would settle it

If vision-language models fine-tuned on GeoDial produce no measurable gain in highlight accuracy or student understanding on a new set of geometry problems compared with text-only baselines, the dataset's claimed training value would be refuted.
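One way such a comparison could be scored is sketched below. This is an assumed set-overlap metric, not the evaluation the paper reports: it treats a highlight as a set of diagram element identifiers and computes precision, recall, and F1 against the teacher's highlight, counting a correct abstention (both sets empty) as a perfect match. The function name and the element identifiers are hypothetical.

```python
from typing import Iterable, Tuple


def highlight_prf(predicted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall and F1 between predicted and gold highlighted elements."""
    pred, ref = set(predicted), set(gold)
    if not pred and not ref:
        # Model abstained and the teacher highlighted nothing: count as correct.
        return 1.0, 1.0, 1.0
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    if true_pos == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# The model highlights AB and BC; the teacher highlighted only AB.
print(highlight_prf(["AB", "BC"], ["AB"]))
# The model abstains although the teacher highlighted AB.
print(highlight_prf([], ["AB"]))
```

Scoring abstention explicitly matters here because, per the Figure 1 text below, fine-tuned models learn to abstain more often while still struggling to pick the correct elements.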

Figures

Figures reproduced from arXiv: 2606.12419 by April Yi Wang, Donya Rooein, Junling Wang, Mrinmaya Sachan, Sankalan Pal Chowdhury.

Figure 1: Flowchart showing our setup to collect dialogs. Surrounding infoboxes give examples of the corresponding step in the flowchart with the same color. Our experiments show that standard training on GeoDial improves pedagogical tutor-turn generation, but does not produce teacher-like diagram highlights: models learn to abstain more often, yet still struggle to select the correct visual elements. Further ana…
Figure 2: Tutor strategies (acts and subacts) in GeoDial.
Figure 3: Dialog Act Statistics. All 3 subacts of Generic sit among the top-5 subacts with 11.3%, 11.2% and 9.2% respectively for Farewell, Continue, and Introduce. Introduce and Farewell account for 78% of first utterances and 90% of final utterances respectively. Rounding up the top-5 subacts, we have Calculate and GetRelation at 9.9% and 7.5%. A full pie chart of subacts is presented in …
Figure 14 (text captured from Section 4.2.2, Annotator Interviews): We interviewed 5 of our 11 main contributors to get a high level view of how realistic the AI students and their confusions felt, how it compared to their real life students, and most importantly, if the student fidelity was good enough such that the collected conversations would reflect their real teaching strategies. All of these teachers had at least three years of teaching e…
Figure 4: Example of a line highlight on a diagram. The Tutor is possibly trying to get the student to …
Figure 5: In the left image, the length marker, which is not part of the diagram is highlighted. On the …
Figure 6: Example of the temporary numeric overlay on a diagram used for node-label matching.
Figure 7: Example of automatic line highlighting.
Figure 8: Example of automatic angle highlighting.
Figure 9: Example of automatic label highlighting.
Figure 10: Example of automatic arc highlighting.
Figure 11: Histogram showing the distribution of conversation lengths in GeoDial. Only teacher …
Figure 12: Histogram showing distribution of number of highlighted diagrams per conversation. All …
Figure 13: Percentage of each subact in GeoDial. Subacts belonging to the same act use similar …
Figure 14: Full distribution of answers to the debrief questions
Figure 15: Our annotation interface. Highlighted elements indicate 1. The diagram which can be …
Figure 16: Quiz questions for filtering out annotators who did not do the onboarding properly. Selected …
Original abstract

Several educational domains rely heavily on diagrams and visual cues, yet most existing tutoring datasets are limited to text-only interactions. This limits the development of AI tutors that can teach in visually grounded ways used by human instructors. Thus, we introduce GeoDial, a multimodal tutoring dataset of over 1.3K teacher-student dialogs in the domain of geometry collected from experienced math teachers, where instructional turns are explicitly grounded in diagram highlights. We propose a scalable annotation protocol that integrates dialog acts, visual highlighting, and feedback, enabling fine-grained supervision of both language and visual tutoring behavior. To illustrate the challenges posed by this setting, we fine-tune several vision-language models on GeoDial and evaluate their ability to generate tutoring utterances and diagram highlights. While supervised fine-tuning substantially improves the quality of generated dialog, it struggles to produce accurate diagram highlights, revealing a key limitation of current methods and highlighting the need for approaches that more effectively integrate visual reasoning with pedagogical interaction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 1 minor

Summary. The paper introduces GeoDial, a multimodal tutoring dataset of over 1.3K teacher-student dialogs in geometry collected from experienced math teachers, with instructional turns explicitly grounded in diagram highlights. It proposes a scalable annotation protocol integrating dialog acts, visual highlighting, and feedback. Experiments fine-tune several vision-language models on the dataset and report that supervised fine-tuning substantially improves generated dialog quality but struggles to produce accurate diagram highlights.

Significance. If the collected dialogs and annotation protocol prove reliable, the dataset could meaningfully advance research on visually grounded AI tutors by addressing the gap in text-only educational datasets and providing explicit supervision for both language and visual actions. The release of such grounded multimodal data is a clear strength for the field.

minor comments (1)
  1. [Abstract] The abstract gives no quantitative details on inter-annotator agreement, dataset statistics beyond the total count, evaluation metrics, or baseline comparisons. Including them would let readers assess the robustness of the claimed VLM limitation.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of GeoDial, the accurate summary of our contributions, and the recommendation for minor revision. We are pleased that the potential impact of releasing this visually grounded tutoring dataset is recognized.

Circularity Check

0 steps flagged

No significant circularity


The paper introduces a multimodal dataset (GeoDial) collected from teachers, proposes an annotation protocol integrating dialog acts/visual highlights/feedback, and reports illustrative SFT experiments on vision-language models. No mathematical derivations, equations, fitted parameters renamed as predictions, or self-citation chains appear in the abstract or described content. The central claims rest on the external collection process and model evaluations rather than any internal reduction to inputs by construction. This is a standard dataset paper whose contribution is self-contained and falsifiable via the released data and replication of the reported metrics.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central contribution rests on the representativeness of teacher-collected dialogs and the fidelity of the annotation protocol; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption: Dialogs collected from experienced math teachers constitute high-quality examples of visually grounded tutoring suitable for training AI systems.
    The dataset construction explicitly relies on this source of data without further validation described in the abstract.

pith-pipeline@v0.9.1-grok · 5711 in / 1270 out tokens · 30474 ms · 2026-06-30T23:17:06.222854+00:00 · methodology

