Million Tutoring Moves (MTM): An Open Multimodal Dataset for the Science of Tutoring

René Kizilcec, Kirk Vanacore, Zhuqian Zhou, Doug Pietrzak, Jorge Dias, Haocheng Zhang, Bakhtawar Ahtisham, Joshua Marland, Rachel Slama, Justin Reich, Kenneth Koedinger

Pith reviewed 2026-05-13 18:04 UTC · model grok-4.3

classification cs.CY

keywords tutoring dataset · open data · educational interactions · math tutoring · AI in education · instructional processes · multimodal repository

The pith

MTM v1 releases 4,654 math tutoring transcripts to make authentic interactions observable at scale.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Million Tutoring Moves project to build an open dataset of tutoring interactions that researchers can reuse. MTM v1 supplies 4,654 transcripts drawn from real math tutoring sessions on a U.S. nonprofit platform. The release is presented as the first concrete step toward a larger repository that remains safe, open, and multimodal. A sympathetic reader would care because the data could let researchers examine instructional moves directly, refine tutoring methods, and train AI systems on genuine exchanges rather than simulated ones.

Core claim

MTM v1 consists of 4,654 math tutoring transcripts collected from a U.S.-based nonprofit online platform and is released as the initial component of the National Tutoring Observatory to advance the science of tutoring through large-scale, reusable interaction data.

What carries the argument

The MTM v1 dataset of tutoring transcripts, which renders previously private instructional exchanges systematically observable and analyzable.

If this is right

Researchers gain a reusable resource for studying instructional processes at scale.
Tutoring organizations can examine concrete session data to refine their methods.
Developers of AI educational tools obtain real interaction examples to ground model training.
Findings from the data can be translated into practical recommendations for educators.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Expansion to video or audio would allow study of timing and nonverbal elements in tutoring exchanges.
Linkage with other open education datasets could enable cross-context comparisons of tutoring styles.
Long-term growth of the repository might support longitudinal tracking of tutoring effectiveness over multiple sessions.

Load-bearing premise

The collected transcripts accurately represent authentic tutoring interactions and their open release will yield actionable research insights without significant privacy or bias problems.

What would settle it

A controlled comparison showing that patterns derived from the released transcripts fail to predict tutoring outcomes observed in independent, non-released sessions would falsify the utility claim.

Figures

Figures reproduced from arXiv: 2605.08092 by Bakhtawar Ahtisham, Doug Pietrzak, Haocheng Zhang, Jorge Dias, Joshua Marland, Justin Reich, Kenneth Koedinger, Kirk Vanacore, Rachel Slama, René Kizilcec, Zhuqian Zhou.

Original abstract

We introduce the Million Tutoring Moves (MTM) project, an open dataset initiative aimed at advancing the science of tutoring through large-scale, reusable, and multimodal interaction data. MTM is developed within the National Tutoring Observatory (NTO), a research infrastructure designed to study authentic tutoring interactions and translate them into actionable insights for research, practice, and AI-powered educational technology development. In this paper, we present the vision behind MTM and describe MTM v1, an initial release consisting of 4,654 math tutoring transcripts from a U.S.-based nonprofit online tutoring platform. MTM v1 serves as a first step toward a broader repository that is safe, open, large-scale, broad-coverage, and multimodal. By making tutoring interactions systematically observable and analyzable, MTM aims to support research on instructional processes, improve tutoring practice, and enable the development of AI systems grounded in real educational interactions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces the Million Tutoring Moves (MTM) project under the National Tutoring Observatory and describes its initial release, MTM v1, consisting of 4,654 math tutoring transcripts collected from a U.S.-based nonprofit online tutoring platform. It frames MTM v1 as a first step toward a broader repository that is safe, open, large-scale, broad-coverage, and multimodal, with the goal of enabling research on instructional processes, improving tutoring practice, and supporting AI development in education.

Significance. If the data collection and documentation issues are addressed, MTM v1 could provide a valuable open resource for the science of tutoring, where large-scale authentic interaction data remains scarce. The open release supports reproducibility and community-driven analysis, which aligns with the paper's stated aims for actionable insights in research and educational technology.

major comments (2)

[MTM v1 description] The description of MTM v1 states that the transcripts come from a U.S. nonprofit online platform and positions the release as 'safe' and representative of 'authentic tutoring interactions,' but supplies no information on IRB approval, participant consent, de-identification pipeline, quality checks, or the sampling method that yielded the 4,654 sessions from the full platform population. These details are load-bearing for the central claim that MTM v1 is a reliable first step toward a safe repository.
[Abstract and vision] The abstract and vision section claim the dataset supports broad-coverage and multimodal research, yet MTM v1 is described only as text transcripts; no details are given on whether audio, video, or other modalities are included in v1 or how they will be added in future releases.

minor comments (1)

[Abstract] The abstract refers to 'multimodal interaction data' while MTM v1 is limited to transcripts; clarify this distinction to avoid reader confusion.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a dataset release paper with no mathematical derivations, fitted parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5501 in / 958 out tokens · 52754 ms · 2026-05-13T18:04:14.263707+00:00 · methodology
