GeoDial: A Multimodal Conversational Tutoring Dataset for Geometry Problem-Solving with Visual Tutor Turns

April Yi Wang; Donya Rooein; Junling Wang; Mrinmaya Sachan; Sankalan Pal Chowdhury

arxiv: 2606.12419 · v1 · pith:3CGBT3TPnew · submitted 2026-05-08 · 💻 cs.CY · cs.AI

Sankalan Pal Chowdhury, Junling Wang, Donya Rooein, April Yi Wang, Mrinmaya Sachan

Pith reviewed 2026-06-30 23:17 UTC · model grok-4.3

keywords: multimodal tutoring dataset, geometry problem-solving, visual grounding, diagram highlights, dialog acts, vision-language models, educational AI, conversational tutoring

The pith

GeoDial supplies over 1.3K geometry tutoring dialogs in which teacher turns are grounded in explicit diagram highlights.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents GeoDial, a multimodal dataset of over 1,300 teacher-student dialogs in geometry collected from experienced math teachers, in which instructional turns are grounded in diagram highlights. Most prior tutoring datasets contain only text, which prevents models from learning the visual pointing and highlighting that human teachers use. The authors supply an annotation protocol that records dialog acts, highlight regions, and feedback together. Fine-tuning vision-language models on the data raises the quality of generated tutoring language yet leaves diagram-highlight accuracy low, which suggests that current methods still treat language and visual reasoning separately.

Core claim

GeoDial is a dataset of more than 1,300 teacher-student dialogs in geometry in which instructional turns are grounded in diagram highlights; a scalable annotation protocol records dialog acts, visual highlights, and feedback together, and supervised fine-tuning of vision-language models improves generated utterances but not the accuracy of the highlights.

What carries the argument

The annotation protocol that jointly labels dialog acts, diagram highlight regions, and feedback to supervise both language and visual tutoring actions.
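To make the joint labeling concrete, here is a minimal sketch of what one annotated tutor turn might look like as a record. The field names and the `highlighted_turn_ratio` helper are hypothetical and are not taken from the GeoDial release; only the subact names (`Introduce`, `GetRelation`), the act name `Generic`, and the highlight kinds (line, angle, label, arc) appear in the figure captions of the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Highlight:
    kind: str     # one of "line", "angle", "label", "arc" (kinds named in the figure captions)
    element: str  # hypothetical diagram element id, e.g. "AB"


@dataclass
class TutorTurn:
    utterance: str
    subact: str                     # e.g. "Introduce", "GetRelation"
    act: Optional[str] = None       # parent act when known, e.g. "Generic"
    highlights: List[Highlight] = field(default_factory=list)
    feedback: Optional[str] = None  # assumed free-form; the real format may differ


def highlighted_turn_ratio(turns: List[TutorTurn]) -> float:
    """Share of tutor turns that carry at least one diagram highlight."""
    if not turns:
        return 0.0
    return sum(1 for t in turns if t.highlights) / len(turns)


# Illustrative data only, not drawn from the dataset.
example = [
    TutorTurn("Hi! Let's look at this triangle together.", "Introduce", act="Generic"),
    TutorTurn(
        "How are sides AB and BC related?",
        "GetRelation",
        highlights=[Highlight("line", "AB"), Highlight("line", "BC")],
    ),
]
print(highlighted_turn_ratio(example))
```

Keeping the language label, the visual action, and the feedback in one record is what would let a single training example supervise both the utterance and the highlight.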

If this is right

Supervised fine-tuning on GeoDial raises the quality of generated tutoring utterances.
The same fine-tuned models still produce inaccurate diagram highlights.
Current vision-language methods do not yet integrate visual reasoning with pedagogical interaction at the level needed for tutoring.
New techniques that couple visual grounding more tightly with dialog generation are required.
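One plausible way to quantify "inaccurate diagram highlights" is set overlap between the elements a model highlights and the elements the teacher highlighted. The sketch below is an assumed metric, not the one the paper reports; the function name `highlight_prf` and the element ids are hypothetical. It treats an empty prediction against an empty gold set as a correct abstention.

```python
from typing import Iterable, Tuple


def highlight_prf(predicted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over sets of highlighted element ids."""
    pred, ref = set(predicted), set(gold)
    if not pred and not ref:
        # Both the model and the teacher highlighted nothing: correct abstention.
        return 1.0, 1.0, 1.0
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if true_pos else 0.0
    return precision, recall, f1


# The model over-highlights: it picks the right line plus one extra.
print(highlight_prf(["AB", "BC"], ["AB"]))
# The model abstains although the teacher highlighted something.
print(highlight_prf([], ["AB"]))
```

Under a metric of this shape, a model that learns to abstain more often can score well on turns with no teacher highlight while still scoring poorly wherever a specific element must be selected.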

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The dataset could support tutors that generate live diagram annotations during explanations rather than after-the-fact text.
The same annotation style could be applied to tutoring in other diagram-rich subjects such as mechanics or organic chemistry.
Separate pre-training on visual grounding tasks before dialog fine-tuning might close the observed highlight gap.
Controlled classroom trials could test whether students using models trained on GeoDial solve geometry problems faster than those using text-only tutors.

Load-bearing premise

Dialogs collected from experienced math teachers with this annotation protocol capture effective visual tutoring strategies that transfer to training AI tutors.

What would settle it

If vision-language models fine-tuned on GeoDial produce no measurable gain in highlight accuracy or student understanding on a new set of geometry problems compared with text-only baselines, the dataset's claimed training value would be refuted.
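A comparison like this needs a decision rule for "no measurable gain". One common choice is a paired bootstrap over per-problem scores, sketched below. Whether the paper uses this test for its own results is not stated here, so treat the procedure, the function name `paired_bootstrap_win_rate`, and the score lists as assumptions for illustration.

```python
import random
from typing import Sequence


def paired_bootstrap_win_rate(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    n_resamples: int = 2000,
    seed: int = 0,
) -> float:
    """Fraction of resampled problem sets on which system A's total beats system B's."""
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired per problem")
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        # Resample problem indices with replacement, keeping the A/B pairing intact.
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / n_resamples


# Hypothetical per-problem highlight scores for a fine-tuned model and a text-only baseline.
fine_tuned = [0.9, 0.8, 0.7]
baseline = [0.1, 0.2, 0.3]
print(paired_bootstrap_win_rate(fine_tuned, baseline))
```

A win rate close to 0.5 would be consistent with no measurable difference between the two systems, while a rate near 1.0 would indicate a consistent gain for the first.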

Figures

Figures reproduced from arXiv: 2606.12419 by April Yi Wang, Donya Rooein, Junling Wang, Mrinmaya Sachan, Sankalan Pal Chowdhury.

**Figure 1.** Flowchart showing our setup to collect dialogs. Surrounding infoboxes give examples of the corresponding step in the flowchart with the same color. Our experiments show that standard training on GeoDial improves pedagogical tutor-turn generation, but does not produce teacher-like diagram highlights: models learn to abstain more often, yet still struggle to select the correct visual elements. Further ana…

**Figure 2.** Tutor strategies (acts and subacts) in GeoDial.

**Figure 3.** Dialog Act Statistics. All 3 subacts of Generic sit among the top-5 subacts with 11.3%, 11.2% and 9.2% respectively for Farewell, Continue, and Introduce. Introduce and Farewell account for 78% of first utterances and 90% of final utterances respectively. Rounding up the top-5 subacts, we have Calculate and GetRelation at 9.9% and 7.5%. A full pie chart of subacts is presented in …

**Section 4.2.2, Annotator Interviews (excerpt).** We interviewed 5 of our 11 main contributors to get a high-level view of how realistic the AI students and their confusions felt, how they compared to their real-life students, and most importantly, whether the student fidelity was good enough that the collected conversations would reflect their real teaching strategies. All of these teachers had at least three years of teaching e…

**Figure 4.** Example of a line highlight on a diagram. The Tutor is possibly trying to get the student to …

**Figure 5.** In the left image, the length marker, which is not part of the diagram, is highlighted. On the …

**Figure 6.** Example of the temporary numeric overlay on a diagram used for node-label matching.

**Figure 7.** Example of automatic line highlighting.

**Figure 8.** Example of automatic angle highlighting.

**Figure 9.** Example of automatic label highlighting.

**Figure 10.** Example of automatic arc highlighting.

**Figure 11.** Histogram showing the distribution of conversation lengths in GeoDial. Only teacher …

**Figure 12.** Histogram showing distribution of number of highlighted diagrams per conversation. All …

**Figure 13.** Percentage of each subact in GeoDial. Subacts belonging to the same act use similar …

**Figure 14.** Full distribution of answers to the debrief questions

**Figure 15.** Our annotation interface. Highlighted elements indicate 1. The diagram which can be …

**Figure 16.** Quiz questions for filtering out annotators who did not do the onboarding properly. Selected …

Original abstract

Several educational domains rely heavily on diagrams and visual cues, yet most existing tutoring datasets are limited to text-only interactions. This limits the development of AI tutors that can teach in visually grounded ways used by human instructors. Thus, we introduce GeoDial, a multimodal tutoring dataset of over 1.3K teacher-student dialogs in the domain of geometry collected from experienced math teachers, where instructional turns are explicitly grounded in diagram highlights. We propose a scalable annotation protocol that integrates dialog acts, visual highlighting, and feedback, enabling fine-grained supervision of both language and visual tutoring behavior. To illustrate the challenges posed by this setting, we fine-tune several vision-language models on GeoDial and evaluate their ability to generate tutoring utterances and diagram highlights. While supervised fine-tuning substantially improves the quality of generated dialog, it struggles to produce accurate diagram highlights, revealing a key limitation of current methods and highlighting the need for approaches that more effectively integrate visual reasoning with pedagogical interaction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

GeoDial is a straightforward dataset release for multimodal geometry tutoring with diagram highlights; the experiments flag a VLM weakness on visuals but stay illustrative.

GeoDial is a dataset release for multimodal geometry tutoring dialogs that include diagram highlights, and the key finding from their experiments is that fine-tuning VLMs improves the language but not the visual grounding part.

The paper collects 1.3K dialogs from experienced math teachers and proposes an annotation protocol that ties together dialog acts, visual highlights on diagrams, and feedback labels. This is new relative to text-only tutoring datasets. They show some supervised fine-tuning results where the models get better at generating utterances but still fail at accurate highlights. That points to a genuine challenge in integrating visual reasoning with tutoring.

What the paper does well is identifying a clear gap in existing data and providing a concrete way to annotate visual tutoring behavior. Geometry is a solid choice because it relies on diagrams, and grounding the turns in highlights makes the data more useful for training AI tutors that can point to things.

The soft spots are in the evaluation details. The abstract mentions struggles with highlights but doesn't give numbers on inter-annotator agreement or dataset statistics beyond the total count. If the full paper has those, it would strengthen the case; otherwise the representativeness of the collected dialogs as effective tutoring remains an assumption. The experiments seem more like a proof of concept than a thorough benchmark, so the limitation claim needs the actual results to land solidly.

This paper is for people building multimodal educational AI or working on dataset creation in tutoring. A reader interested in visual language models for education would get value from the data and the annotation approach. It deserves a serious referee because the contribution is the dataset and protocol, which can be evaluated on their own terms even if the modeling experiments are light.

I'd recommend sending it out for review.

Referee Report

0 major / 1 minor

Summary. The paper introduces GeoDial, a multimodal tutoring dataset of over 1.3K teacher-student dialogs in geometry collected from experienced math teachers, with instructional turns explicitly grounded in diagram highlights. It proposes a scalable annotation protocol integrating dialog acts, visual highlighting, and feedback. Experiments fine-tune several vision-language models on the dataset and report that supervised fine-tuning substantially improves generated dialog quality but struggles to produce accurate diagram highlights.

Significance. If the collected dialogs and annotation protocol prove reliable, the dataset could meaningfully advance research on visually grounded AI tutors by addressing the gap in text-only educational datasets and providing explicit supervision for both language and visual actions. The release of such grounded multimodal data is a clear strength for the field.

minor comments (1)

[Abstract] The abstract gives no quantitative details on inter-annotator agreement, dataset statistics beyond the total count, evaluation metrics, or baseline comparisons. Including them would let readers assess the robustness of the VLM limitation claims.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of GeoDial, the accurate summary of our contributions, and the recommendation for minor revision. We are pleased that the potential impact of releasing this visually grounded tutoring dataset is recognized.

Circularity Check

0 steps flagged

No significant circularity

The paper introduces a multimodal dataset (GeoDial) collected from teachers, proposes an annotation protocol integrating dialog acts/visual highlights/feedback, and reports illustrative SFT experiments on vision-language models. No mathematical derivations, equations, fitted parameters renamed as predictions, or self-citation chains appear in the abstract or described content. The central claims rest on the external collection process and model evaluations rather than any internal reduction to inputs by construction. This is a standard dataset paper whose contribution is self-contained and falsifiable via the released data and replication of the reported metrics.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central contribution rests on the representativeness of teacher-collected dialogs and the fidelity of the annotation protocol; no free parameters or invented entities are introduced.

axioms (1)

Domain assumption: Dialogs collected from experienced math teachers constitute high-quality examples of visually grounded tutoring suitable for training AI systems.
The dataset construction explicitly relies on this source of data without further validation described in the abstract.

pith-pipeline@v0.9.1-grok · 5711 in / 1270 out tokens · 30474 ms · 2026-06-30T23:17:06.222854+00:00 · methodology

Reference graph

Works this paper leans on

75 extracted references · 24 canonical work pages · 2 internal anchors

[1]

1969 , publisher=

Audiovisual methods in teaching , author=. 1969 , publisher=

1969
[2]

Monographs on statistics and applied probability , volume=

An introduction to the bootstrap , author=. Monographs on statistics and applied probability , volume=
[3]

Statistical Significance Tests for Machine Translation Evaluation

Koehn, Philipp. Statistical Significance Tests for Machine Translation Evaluation. Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing. 2004

2004
[4]

I nter- GPS : Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning

Lu, Pan and Gong, Ran and Jiang, Shibiao and Qiu, Liang and Huang, Siyuan and Liang, Xiaodan and Zhu, Song-Chun. I nter- GPS : Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Lan...

work page doi:10.18653/v1/2021.acl-long.528 2021
[5]

2020 , eprint=

Scaling Laws for Neural Language Models , author=. 2020 , eprint=

2020
[6]

2025 , eprint=

LearnLM: Improving Gemini for Learning , author=. 2025 , eprint=

2025
[7]

BLEU : a method for automatic evaluation of machine translation

Papineni, Kishore and Roukos, Salim and Ward, Todd and Zhu, Wei-Jing , title =. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics , pages =. 2002 , publisher =. doi:10.3115/1073083.1073135 , abstract =

work page doi:10.3115/1073083.1073135 2002
[8]

Learning and instruction , volume=

Teacher emotions are linked with teaching quality: Cross-sectional and longitudinal evidence from two field studies , author=. Learning and instruction , volume=. 2023 , publisher=

2023
[9]

Memory , volume=

How much is remembered as a function of presentation modality? , author=. Memory , volume=. 2019 , publisher=

2019
[10]

History of Education Quarterly , volume=

An officer and a scholar: Nineteenth-century West Point and the invention of the blackboard , author=. History of Education Quarterly , volume=. 2015 , publisher=

2015
[11]

Psychonomic bulletin & review , volume=

Cognitive tutor: Applied research in mathematics education , author=. Psychonomic bulletin & review , volume=. 2007 , publisher=

2007
[12]

, author=

Explanation feedback is better than correct answer feedback for promoting transfer of learning. , author=. Journal of Educational Psychology , volume=. 2013 , publisher=

2013
[13]

Reiser , title =

Brian J. Reiser , title =. Journal of the Learning Sciences , volume =. 2004 , publisher =. doi:10.1207/s15327809jls1303\_2 , URL =

work page doi:10.1207/s15327809jls1303 2004
[14]

and Sumner, Tamara

Suresh, Abhijit and Jacobs, Jennifer and Harty, Charis and Perkoff, Margaret and Martin, James H. and Sumner, Tamara. The T alk M oves Dataset: K-12 Mathematics Lesson Transcripts Annotated for Teacher and Student Discursive Moves. Proceedings of the Thirteenth Language Resources and Evaluation Conference. 2022

2022
[15]

and Aleven, Vincent and Heffernan, Neil and McLaren, Bruce and Hockenberry, Matthew , editor=

Koedinger, Kenneth R. and Aleven, Vincent and Heffernan, Neil and McLaren, Bruce and Hockenberry, Matthew , editor=. Opening the Door to Non-programmers:. Intelligent Tutoring Systems , year=
[16]

Can LLM s Effectively Simulate Human Learners? Teachers' Insights from Tutoring LLM Students

Martynova, Daria and Macina, Jakub and Daheim, Nico and Yalcin, Nilay and Zhang, Xiaoyu and Sachan, Mrinmaya. Can LLM s Effectively Simulate Human Learners? Teachers' Insights from Tutoring LLM Students. Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025). 2025. doi:10.18653/v1/2025.bea-1.8

work page doi:10.18653/v1/2025.bea-1.8 2025
[17]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Swift: a scalable lightweight infrastructure for fine-tuning , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=
[18]

and Demszky, Dorottya and Koedinger, Kenneth R

Thomas, Danielle R. and Demszky, Dorottya and Koedinger, Kenneth R. and Marland, Joshua and Pietrzak, Doug and Reich, Justin and Slama, Rachel and Toutziaridi, Amalia and Kizilcec, Ren\'. Advancing the Science of Teaching with Tutoring Data: A Collaborative Workshop with the National Tutoring Observatory , year =. Proceedings of the Twelfth ACM Conference...

work page doi:10.1145/3698205.3733961
[19]

Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistakes

Wang, Rose and Zhang, Qingyang and Robinson, Carly and Loeb, Susanna and Demszky, Dorottya. Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistakes. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long P...

work page doi:10.18653/v1/2024.naacl-long.120 2024
[20]

2020 , eprint=

BERTScore: Evaluating Text Generation with BERT , author=. 2020 , eprint=

2020
[21]

2024 , eprint=

The Llama 3 Herd of Models , author=. 2024 , eprint=

2024
[22]

Proceedings of the AAAI Conference on Artificial Intelligence , author=

Diagram Understanding in Geometry Questions , volume=. Proceedings of the AAAI Conference on Artificial Intelligence , author=. 2014 , month=. doi:10.1609/aaai.v28i1.9146 , abstractNote=

work page doi:10.1609/aaai.v28i1.9146 2014
[23]

Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications , year=

Fine-tuning transformers with additional context to classify discursive moves in mathematics classrooms , author=. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications , year=
[24]

Proceedings of the 9th Workshop on NLP for Computer Assisted Language Learning , pages=

The teacher-student chatroom corpus , author=. Proceedings of the 9th Workshop on NLP for Computer Assisted Language Learning , pages=
[25]

M ath D ial: A Dialogue Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems

Macina, Jakub and Daheim, Nico and Chowdhury, Sankalan and Sinha, Tanmay and Kapur, Manu and Gurevych, Iryna and Sachan, Mrinmaya. M ath D ial: A Dialogue Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems. Findings of the Association for Computational Linguistics: EMNLP 2023. 2023. doi:10.18653/v1/2023.findings-emnlp.372

work page doi:10.18653/v1/2023.findings-emnlp.372 2023
[26]

CIMA : A Large Open Access Dialogue Dataset for Tutoring

Stasaski, Katherine and Kao, Kimberly and Hearst, Marti A. CIMA : A Large Open Access Dialogue Dataset for Tutoring. Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications. 2020. doi:10.18653/v1/2020.bea-1.5

work page doi:10.18653/v1/2020.bea-1.5 2020
[27]

2025 , month =

Claude 4 System Card: Claude Opus 4 & Claude Sonnet 4 , author =. 2025 , month =

2025
[28]

2024 , eprint=

GPT-4o System Card , author=. 2024 , eprint=

2024
[29]

2025 , eprint=

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities , author=. 2025 , eprint=

2025
[30]

Educational Practices Series; 5 , year=

Tutoring , author=. Educational Practices Series; 5 , year=
[31]

Handbook of research on educational communications and technology , pages=

Multimedia instruction , author=. Handbook of research on educational communications and technology , pages=. 2013 , publisher=

2013
[32]

, author=

A meta-analysis of the efficacy of teaching mathematics with concrete manipulatives. , author=. Journal of educational psychology , volume=. 2013 , publisher=

2013
[33]

Psychological science , volume=

From action to abstraction: Using the hands to learn math , author=. Psychological science , volume=. 2014 , publisher=

2014
[34]

Behavior Research Methods, Instruments, & Computers , volume=

AutoTutor: A tutor with dialogue in natural language , author=. Behavior Research Methods, Instruments, & Computers , volume=. 2004 , publisher=

2004
[35]

Educational researcher , volume=

The 2 sigma problem: The search for methods of group instruction as effective as one-to-one tutoring , author=. Educational researcher , volume=. 1984 , publisher=

1984
[36]

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Internvl3. 5: Advancing open-source multimodal models in versatility, reasoning, and efficiency , author=. arXiv preprint arXiv:2508.18265 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[37]

2025 , eprint=

Qwen3-VL Technical Report , author=. 2025 , eprint=

2025
[38]

Proceedings of the ACL-08: HLT Student Research Workshop , pages=

The role of positive feedback in intelligent tutoring systems , author=. Proceedings of the ACL-08: HLT Student Research Workshop , pages=
[39]

Edward J Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Lu Wang and Weizhu Chen , booktitle=. Lo. 2022 , url=

2022
[40]

nature , volume=

Mastering the game of go without human knowledge , author=. nature , volume=. 2017 , publisher=

2017
[41]

Learning and Motivation , volume=

Positive feedback enhances motivation and skill learning in adolescents , author=. Learning and Motivation , volume=. 2024 , publisher=

2024
[42]

G eo QA : A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning

Chen, Jiaqi and Tang, Jianheng and Qin, Jinghui and Liang, Xiaodan and Liu, Lingbo and Xing, Eric and Lin, Liang. G eo QA : A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. 2021. doi:10.18653/v1/2021.findings-acl.46

work page doi:10.18653/v1/2021.findings-acl.46 2021
[43]

G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model , url =

Gao, Jiahui and Pi, Renjie and Zhang, Jipeng and Ye, Jiacheng and Zhong, Wanjun and Wang, Yufei and HONG, Lanqing and Han, Jianhua and Xu, Hang and Li, Zhenguo and Kong, Lingpeng , booktitle =. G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model , url =
[44]

arXiv preprint arXiv:2504.12597 , year=

Geosense: Evaluating identification and application of geometric principles in multimodal reasoning , author=. arXiv preprint arXiv:2504.12597 , year=

work page arXiv
[45]

U ni G eo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression

Chen, Jiaqi and Li, Tong and Qin, Jinghui and Lu, Pan and Lin, Liang and Chen, Chongyu and Liang, Xiaodan. U ni G eo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 2022. doi:10.18653/v1/2022.emnlp-main.218

work page doi:10.18653/v1/2022.emnlp-main.218 2022
[46]

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence , articleno =

Zhang, Ming-Liang and Yin, Fei and Liu, Cheng-Lin , title =. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence , articleno =. 2023 , isbn =. doi:10.24963/ijcai.2023/376 , abstract =

work page doi:10.24963/ijcai.2023/376 2023
[47]

Educational Psychology Review , volume=

The impact of visual displays on learning across the disciplines: A systematic review , author=. Educational Psychology Review , volume=. 2020 , publisher=

2020
[48]

STEM education in the junior secondary: The state of play , pages=

The importance of diagrams, graphics and other visual representations in STEM teaching , author=. STEM education in the junior secondary: The state of play , pages=. 2017 , publisher=

2017
[49]

DeepSeek-OCR: Contexts Optical Compression

DeepSeek-OCR: Contexts Optical Compression , author=. arXiv preprint arXiv:2510.18234 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[50]

2025 , note =

Google , title =. 2025 , note =

2025
[51]

Solving Geometry Problems: Combining Text and Diagram Interpretation

Seo, Minjoon and Hajishirzi, Hannaneh and Farhadi, Ali and Etzioni, Oren and Malcolm, Clint. Solving Geometry Problems: Combining Text and Diagram Interpretation. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 2015. doi:10.18653/v1/D15-1171

work page doi:10.18653/v1/d15-1171 2015
[52]

Proceedings of the Computer Vision and Pattern Recognition Conference , pages=

Scaling Inference Time Compute for Diffusion Models , author=. Proceedings of the Computer Vision and Pattern Recognition Conference , pages=
[53]

, author=

Developing Mathematical Problem-Solving Skills in Primary School by Using Visual Representations on Heuristics. , author=. LUMAT: International Journal on Math, Science and Technology Education , volume=. 2022 , publisher=

2022
[54]

International journal of Stem education , volume=

The role of visual representations in scientific practices: from conceptual understanding and knowledge generation to ‘seeing’how science works , author=. International journal of Stem education , volume=. 2015 , publisher=

2015
[55]

Educational studies in mathematics , volume=

The role of visual representations in the learning of mathematics , author=. Educational studies in mathematics , volume=. 2003 , publisher=

2003
[56]

Applied Cognitive Psychology , volume=

Who benefits from diagrams and illustrations in math problems? Ability and attitudes matter , author=. Applied Cognitive Psychology , volume=. 2018 , publisher=

2018
[57]

Mayer , abstract =

Richard E. Mayer , abstract =. Multimedia learning , series =. 2002 , issn =. doi:https://doi.org/10.1016/S0079-7421(02)80005-6 , url =

work page doi:10.1016/s0079-7421(02)80005-6 2002
[58]

2025 , publisher=

Eyes on math: A visual approach to teaching math concepts , author=. 2025 , publisher=

2025
[59]

, author=

It's Not a Math Lesson--We're Learning to Draw! Teachers' Use of Visual Representations in Instructing Word Problem Solving in Sixth Grade of Elementary School. , author=. Frontline Learning Research , volume=. 2016 , publisher=

2016
[60]

Proceedings of the British Society for research into Learning Mathematics , volume=

Diagrams in the teaching and learning of geometry: some results and ideas for future research , author=. Proceedings of the British Society for research into Learning Mathematics , volume=
[61]

Strohmaier and Stanislaw Schukajlow , keywords =

Johanna Schoenherr and Anselm R. Strohmaier and Stanislaw Schukajlow , keywords =. Learning with visualizations helps: A meta-analysis of visualization interventions in mathematics education , journal =. 2024 , issn =. doi:https://doi.org/10.1016/j.edurev.2024.100639 , url =

work page doi:10.1016/j.edurev.2024.100639 2024
[62]

B ook2 D ial: Generating Teacher Student Interactions from Textbooks for Cost-Effective Development of Educational Chatbots

Wang, Junling and Macina, Jakub and Daheim, Nico and Pal Chowdhury, Sankalan and Sachan, Mrinmaya. B ook2 D ial: Generating Teacher Student Interactions from Textbooks for Cost-Effective Development of Educational Chatbots. Findings of the Association for Computational Linguistics: ACL 2024. 2024. doi:10.18653/v1/2024.findings-acl.578

work page doi:10.18653/v1/2024.findings-acl.578 2024
[63]

Soviet physics-doklady , volume=

Binary coors capable or ‘correcting deletions, insertions, and reversals , author=. Soviet physics-doklady , volume=
[64]

and Kolter, J

Aithal, Sumukh K and Maini, Pratyush and Lipton, Zachary C. and Kolter, J. Zico , title =. Proceedings of the 38th International Conference on Neural Information Processing Systems , articleno =. 2024 , isbn =

2024
[65]

Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models

Wang, Junling and Rutkiewicz, Anna and Wang, April and Sachan, Mrinmaya. Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models. Findings of the Association for Computational Linguistics: ACL 2025. 2025. doi:10.18653/v1/2025.findings-acl.586

work page doi:10.18653/v1/2025.findings-acl.586 2025
[66]

The NCTE Transcripts: A Dataset of Elementary Math Classroom Transcripts

Demszky, Dorottya and Hill, Heather. The NCTE Transcripts: A Dataset of Elementary Math Classroom Transcripts. Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023). 2023. doi:10.18653/v1/2023.bea-1.44

work page doi:10.18653/v1/2023.bea-1.44 2023
[67]

Educators' Perceptions of Large Language Models as Tutors: Comparing Human and AI Tutors in a Blind Text-only Setting

Pal Chowdhury, Sankalan and Zhang, Terry Jingchen and Rooein, Donya and Hovy, Dirk and K. Educators' Perceptions of Large Language Models as Tutors: Comparing Human and AI Tutors in a Blind Text-only Setting. Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025). 2025. doi:10.18653/v1/2025.bea-1.28

work page doi:10.18653/v1/2025.bea-1.28 2025
[68]

arXiv preprint arXiv:2505.04736 , year=

The Promise and Limits of LLMs in Constructing Proofs and Hints for Logic Problems in Intelligent Tutoring Systems , author=. arXiv preprint arXiv:2505.04736 , year=

work page arXiv
[69]

Aho and Jeffrey D

Alfred V. Aho and Jeffrey D. Ullman , title =. 1972

1972
[70]

Publications Manual , year = "1983", publisher =

1983
[71]

Chandra and Dexter C

Ashok K. Chandra and Dexter C. Kozen and Larry J. Stockmeyer , year = "1981", title =. doi:10.1145/322234.322243

work page doi:10.1145/322234.322243 1981
[72]

Scalable training of

Andrew, Galen and Gao, Jianfeng. Scalable training of
[73]

Dan Gusfield. 1997

1997
[74]

Mohammad Sadegh Rasooli and Joel R. Tetreault. Computing Research Repository. 2015

2015
[75]

A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data

Ando, Rie Kubota and Zhang, Tong. A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data. Journal of Machine Learning Research, December
