Spatio-Temporal Retrieval-based Priors for Adaptive Computational Teaching in Driving

Avinash Balachandran; Deepak Edakkattil Gopinath; Guy Rosman; Jonathan DeCastro; Xiongyi Cui

arxiv: 2606.25224 · v1 · pith:6IL2G6GZnew · submitted 2026-06-23 · 💻 cs.RO

Spatio-Temporal Retrieval-based Priors for Adaptive Computational Teaching in Driving

Deepak Edakkattil Gopinath , Xiongyi Cui , Jonathan DeCastro , Avinash Balachandran , Guy Rosman This is my paper

Pith reviewed 2026-06-25 23:27 UTC · model grok-4.3

classification 💻 cs.RO

keywords adaptive teachingimitation learningnearest neighbor retrievalcross-attentiondriving coachingtemporal reasoninglow-data regimesstudent-teacher interaction

0 comments

The pith

A nearest-neighbor retrieval and cross-attention prior lets an imitation-learning coach adapt to student history in driving tasks under limited data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an imitation-learning model for adaptive computational teaching in high-performance driving that includes a temporal reasoning module to capture the cumulative effects of repeated teacher-student interactions. To handle scarce interactive training data, the model retrieves a narrowed set of semantically similar past sessions via nearest neighbors and applies cross-attention within an encoder-decoder architecture. Experiments on a semi-synthetic closed-loop dataset from Waymo Open Motion and a small real-world race-coaching simulator show consistent gains over a non-adaptive baseline and other adaptive variants that use different priors or fusion methods. A sympathetic reader would care because local, context-only reasoning in coaching systems misses the long-term temporal nature of skill acquisition. If the claim holds, retrieval-based priors become a practical route to data-efficient adaptive teaching for complex motor tasks.

Core claim

The model with nearest-neighbor retrieval and cross-attention prior demonstrates a consistent advantage over a non-adaptive baseline and a suite of adaptive models that vary in their choice of priors and temporal fusion mechanisms, as measured on both a novel semi-synthetic longitudinal student-teacher dataset and a naturalistic simulator race-coaching dataset.

What carries the argument

Nearest-neighbor retrieval and cross-attention prior that restricts reasoning to a narrowed set of semantically similar past interactions inside an encoder-decoder concurrent teaching model.

If this is right

The model reasons over long-term interaction history even when interactive training data are scarce.
Nearest-neighbor retrieval compensates for limited data by exploiting the repetitive structure of teaching sessions.
The approach outperforms both non-adaptive baselines and alternative adaptive models that use different priors or fusion mechanisms.
Validation holds across a semi-synthetic closed-loop longitudinal dataset and a small-scale real-world simulator dataset.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same retrieval prior could be tested in other sequential motor-teaching domains such as sports or rehabilitation coaching where interaction histories are also repetitive.
If the mechanism scales, it would lower the data-collection burden for training autonomous coaching systems in any domain with cumulative student progress.
Extending the temporal module to handle multi-turn or multi-student histories would be a direct next measurement of the same architecture.

Load-bearing premise

The teaching process must produce enough semantically similar past interactions that can be reliably retrieved to offset limited interactive training data.

What would settle it

Run the same model on a dataset of highly varied, non-repetitive student behaviors with no semantically similar past cases; if the performance advantage over baselines disappears, the central claim is falsified.

Figures

Figures reproduced from arXiv: 2606.25224 by Avinash Balachandran, Deepak Edakkattil Gopinath, Guy Rosman, Jonathan DeCastro, Xiongyi Cui.

**Figure 2.** Figure 2: Model performance as a function of dataset size. KNN-CrossAttn exhibits a gradual degradation compared to FullCrossAttn for smaller amounts of data. Results on SIMCOACHCORPUS [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: KNN-CrossAttn model performance as a function of noise in the nearest neighbor retrieval process. The degradation is higher (∼6%) in the beginning followed by a steady decrease (∼2-3%) all the way till Random-CrossAttn. ,where R is the set of the first round(ϵ · (max(0, t − |P|))) number of scenarios in H \P (sorted in increasing distance from qcurr). We then randomly sample K neighbors from Ksample to… view at source ↗

**Figure 4.** Figure 4: Map of Thunderhill West color coded according to segments. In this section, we present results from a naive model mismatch experiment in which we run the non-adaptive teacher’s decision rule on the scenarios in a student sequence generated by the adaptive teacher. When we treat the non-adaptive teacher’s actions as non-neural baseline predictions on the adaptive teacher’s scenarios, the weighted F1-score … view at source ↗

**Figure 5.** Figure 5: Percentage gain in model performance for KNN-CrossAttn compared to NonAdaptive as a function of entropy of instruction distribution per segment. Relative gain of the adaptive model is higher in those segments where entropy of instruction distribution is higher [PITH_FULL_IMAGE:figures/full_fig_p014_5.png] view at source ↗

**Figure 6.** Figure 6: Examples of trajectory and teaching action predictions from the [PITH_FULL_IMAGE:figures/full_fig_p015_6.png] view at source ↗

read the original abstract

Learning-based automated coaching systems for complex motor tasks such as high-performance driving remain limited in the ability to be adaptive by their reliance only on local, context-dependent reasoning, failing to account for the long-term temporal nature of student learning and the cumulative impact of repeated teacher-student interactions. In this paper, we propose an imitation learning based computational model for adaptive teaching with a dedicated temporal reasoning module that can reason over the interaction history under low-data regimes. To compensate for limited amounts of interactive training data, and based on the repetitive nature of the teaching process, the model relies on a nearest neighbor retrieval and cross attention prior, reasoning only on a narrowed-down set of semantically similar past interactions with an encoder-decoder based concurrent teaching model. We validate our approach with (i) a novel semi-synthetic closed-loop longitudinal student-teacher interaction dataset based on Waymo Open Motion Dataset and (ii) a small-scale real-world naturalistic simulator race coaching dataset. Our results reveal the consistent advantage of our adaptive teaching model with the nearest neighbor retrieval and cross-attention prior over a non-adaptive baseline as well as a suite of adaptive models that differ in their choice of priors and temporal fusion mechanisms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The retrieval-plus-cross-attention prior is a concrete idea for low-data temporal coaching, but the Waymo-derived dataset probably lacks the repeated interactions needed to make the mechanism work.

read the letter

The paper's core move is to add nearest-neighbor retrieval over past teacher-student interactions, then feed the retrieved histories through cross-attention inside an encoder-decoder imitation learner. That combination is presented as the way to handle long-term student progress when interactive data is scarce. The claim is that this beats both a non-adaptive baseline and other adaptive models that use different priors or fusion methods.

What is actually new is the specific pairing of NN retrieval with cross-attention for this teaching setting; the abstract does not point to prior work that already does exactly this. The two validation sets—one semi-synthetic from Waymo trajectories and one small real-world race-coaching simulator—are reasonable choices for the domain.

The soft spot is the dataset construction. Waymo segments are short and diverse rather than repeated longitudinal sessions with the same student, so nearest-neighbor lookup is unlikely to surface genuinely similar past interactions. If that is the case, any measured gains cannot be cleanly attributed to the retrieval mechanism and may come from other modeling choices. The abstract also gives no numbers, error bars, or details on how the comparison models were built, which makes it hard to judge the size or reliability of the reported advantage.

This is the kind of paper that would interest people working on practical adaptive coaching or low-data imitation learning in robotics. It is coherent on its own terms and engages the right literature, so it is worth sending to referees even though the current evidence is thin and the central empirical test may not actually exercise the claimed mechanism.

Referee Report

3 major / 2 minor

Summary. The paper proposes an imitation-learning model for adaptive computational teaching in high-performance driving. It augments a concurrent encoder-decoder teaching policy with a nearest-neighbor retrieval module and cross-attention prior that conditions on semantically similar past teacher-student interaction histories, aiming to compensate for limited interactive training data. The approach is evaluated on a novel semi-synthetic closed-loop longitudinal dataset derived from the Waymo Open Motion Dataset and on a small-scale real-world naturalistic simulator race-coaching dataset; the central empirical claim is a consistent advantage of the NN-retrieval + cross-attention variant over a non-adaptive baseline and over other adaptive models that differ in prior choice or temporal fusion.

Significance. If the reported gains are shown to arise specifically from the retrieval mechanism locating repeated similar interactions, the work would provide a concrete, data-efficient way to incorporate long-term temporal structure into computational teaching models for motor skills. The use of retrieval priors to address low-data regimes is a strength worth exploring further in robotics and human-AI interaction.

major comments (3)

[§3.1, §4.1] Dataset construction (§3.1 and §4.1): the semi-synthetic Waymo-derived trajectories are described as short, diverse, non-repeated driving segments. It is therefore unclear whether the nearest-neighbor lookup can locate a sufficient number of semantically similar past interactions; without quantitative evidence (e.g., distribution of retrieval similarity scores or ablation removing the retrieval step) the performance gains cannot be attributed to the spatio-temporal prior rather than other modeling choices.
[§4.2–4.3] Empirical results (§4.2–4.3): the abstract and results claim “consistent advantage” yet the provided description supplies no numerical metrics, error bars, statistical significance tests, or explicit construction details for the suite of comparison models. This absence makes it impossible to assess whether the advantage is robust or load-bearing for the central claim.
[§3.1] Assumption validation: the weakest modeling assumption—that the repetitive nature of teaching produces retrievable similar histories—is not tested on the Waymo-derived data; a direct measurement of how often high-similarity neighbors exist would be required before the retrieval mechanism can be credited for the reported improvements.

minor comments (2)

Notation for the cross-attention prior and the retrieval index should be introduced once with a clear equation reference rather than scattered across sections.
Figure captions for the dataset and model diagrams would benefit from explicit indication of which components are learned versus retrieved.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to supply the requested quantitative evidence, metrics, and analyses.

read point-by-point responses

Referee: [§3.1, §4.1] Dataset construction (§3.1 and §4.1): the semi-synthetic Waymo-derived trajectories are described as short, diverse, non-repeated driving segments. It is therefore unclear whether the nearest-neighbor lookup can locate a sufficient number of semantically similar past interactions; without quantitative evidence (e.g., distribution of retrieval similarity scores or ablation removing the retrieval step) the performance gains cannot be attributed to the spatio-temporal prior rather than other modeling choices.

Authors: We agree that quantitative validation of the retrieval step is needed. In the revision we will add (i) the distribution of retrieval similarity scores on the Waymo-derived data and (ii) an ablation that removes the nearest-neighbor retrieval module while keeping all other components fixed, allowing direct attribution of gains to the spatio-temporal prior. revision: yes
Referee: [§4.2–4.3] Empirical results (§4.2–4.3): the abstract and results claim “consistent advantage” yet the provided description supplies no numerical metrics, error bars, statistical significance tests, or explicit construction details for the suite of comparison models. This absence makes it impossible to assess whether the advantage is robust or load-bearing for the central claim.

Authors: We will expand §4.2–4.3 to report all numerical metrics with error bars, include statistical significance tests, and provide explicit construction details for every baseline and ablation model so that the robustness of the reported advantage can be fully evaluated. revision: yes
Referee: [§3.1] Assumption validation: the weakest modeling assumption—that the repetitive nature of teaching produces retrievable similar histories—is not tested on the Waymo-derived data; a direct measurement of how often high-similarity neighbors exist would be required before the retrieval mechanism can be credited for the reported improvements.

Authors: We will add a dedicated analysis in the revised §3.1 that directly measures the frequency and distribution of high-similarity neighbors retrieved from the Waymo-derived dataset, thereby testing the core assumption on the data used for the main experiments. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical validation of retrieval prior rests on external datasets and baselines

full rationale

The paper introduces a nearest-neighbor retrieval plus cross-attention model for adaptive teaching and reports performance gains on a Waymo-derived semi-synthetic dataset and a small real-world coaching dataset. No equations, fitted parameters, or self-citations are shown that reduce the claimed advantage to an identity or to the inputs by construction. The central result is an empirical comparison against non-adaptive and alternative adaptive baselines; the repetitive-interaction assumption is stated as a modeling premise rather than derived from the model itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that teaching interactions are sufficiently repetitive to allow useful nearest-neighbor retrieval and that semantic similarity can be reliably computed from the available features.

axioms (1)

domain assumption The teaching process is repetitive enough that past interactions contain semantically similar examples useful for current teaching decisions.
Stated in the abstract as the basis for using nearest-neighbor retrieval to compensate for limited interactive data.

pith-pipeline@v0.9.1-grok · 5758 in / 1287 out tokens · 19237 ms · 2026-06-25T23:27:44.627735+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

46 extracted references · 12 canonical work pages · 6 internal anchors

[1]

Rojas-Mu ˜noz, K

E. Rojas-Mu ˜noz, K. Couperus, and J. Wachs. DAISI: Database for AI surgical instruction. arXiv preprint arXiv:2004.02809, 2020

work page arXiv 2004
[2]

Giglio, A

B. Giglio, A. Albeloushi, A. K. Alhaj, M. Alhantoobi, R. Saeedi, V . Davidovic, A. Uthamacumaran, R. Yilmaz, J. Lapointe, N. Balasubramaniam, T. Tee, A. M. Fazlol- lahi, J. A. Correa, and R. F. Del Maestro. Artificial intelligence–augmented human instruc- tion and surgical simulation performance: A randomized clinical trial. JAMA Surgery, 160 (9):993–1003...

work page doi:10.1001/jamasurg.2025.2564 2025
[3]

H. Yin, L. Gu, P. Parmar, L. Xu, T. Guo, W. Fu, Y . Zhang, and T. Zheng. Flex: A large- scale multi-modal multi-action dataset for fitness action quality assessment. arXiv preprint arXiv:2506.03198, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[4]

Pashaie, S

S. Pashaie, S. Mohammadi, and H. Golmohammadi. Unlocking athlete potential: The evo- lution of coaching strategies through artificial intelligence. Proceedings of the Institution of Mechanical Engineers, Part P: Journal of Sports Engineering and Technology, page 17543371241300889, 2024

2024
[5]

Gopinath, X

D. Gopinath, X. Cui, J. DeCastro, E. Sumner, J. Costa, H. Yasuda, A. Morgan, L. Dees, S. Chau, J. Leonard, et al. Computational teaching for driving via multi-task imitation learning. In 2025 IEEE international conference on robotics and automation (ICRA), pages 7019–7027. IEEE, 2025

2025
[6]

Santos, A

L. Santos, A. Geminiani, P. Schydlo, I. Olivieri, J. Santos-Victor, and A. Pedrocchi. Design of a robotic coach for motor, social and cognitive skills training toward applications with asd children. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 29:1223– 1232, 2021

2021
[7]

L. Chen, P. Chen, and Z. Lin. Artificial intelligence in education: A review. IEEE access, 8: 75264–75278, 2020

2020
[8]

Mayhew, K

S. Mayhew, K. Bicknell, C. Brust, B. McDowell, W. Monroe, and B. Settles. Simultaneous translation and paraphrase for language education. In Proceedings of the fourth workshop on neural generation and translation, pages 232–243, 2020

2020
[9]

Y . Choi, Y . Lee, D. Shin, J. Cho, S. Park, S. Lee, J. Baek, C. Bae, B. Kim, and J. Heo. Ednet: A large-scale hierarchical dataset in education. In International conference on artificial intelligence in education, pages 69–73. Springer, 2020

2020
[10]

Lyster and L

R. Lyster and L. Ranta. Corrective feedback and learner uptake: Negotiation of form in com- municative classrooms. Studies in second language acquisition, 19(1):37–66, 1997

1997
[11]

L. Mondada. Driving instruction at high speed on a race circuit: Issues in action formation and sequence organization. International Journal of Applied Linguistics, 28(2):304–325, 2018

2018
[12]

Ettinger, S

S. Ettinger, S. Cheng, B. Caine, C. Liu, H. Zhao, S. Pradhan, Y . Chai, B. Sapp, C. R. Qi, Y . Zhou, et al. Large scale interactive motion forecasting for autonomous driving: The Waymo open motion dataset. In International conference on computer vision, pages 9710–9719, 2021

2021
[13]

Sumner, D

E. Sumner, D. E. Gopinath, L. Dees, P. R. Gomez, X. Cui, A. Silva, J. Costa, A. Morgan, M. Schrum, T. L. Chen, A. Balachandran, and G. Rosman. SimCoachCorpus: A naturalistic dataset with language and trajectories for embodied teaching, 2025. URLhttps://arxiv. org/abs/2509.14548

work page arXiv 2025
[14]

H. Le, N. Jiang, A. Agarwal, M. Dud ´ık, Y . Yue, and H. Daum´e III. Hierarchical imitation and reinforcement learning. In International conference on machine learning, pages 2917–2926. PMLR, 2018. 9

2018
[15]

Z. Liu, C. Li, Y . Wang, N. Yang, X. Fan, J. Ma, and X. Zhao. Multi-scale temporal fusion trans- former for incomplete vehicle trajectory prediction. IEEE transactions on intelligent vehicles, 2024

2024
[16]

R. Fox, R. Shin, W. Paul, Y . Zou, D. Song, K. Goldberg, P. Abbeel, and I. Stoica. Hierarchical variational imitation learning of control programs. arXiv preprint arXiv:1912.12612, 2019

work page arXiv 1912
[17]

L. Chen, K. Lu, A. Rajeswaran, K. Lee, A. Grover, M. Laskin, P. Abbeel, A. Srinivas, and I. Mordatch. Decision transformer: Reinforcement learning via sequence modeling. Advances in neural information processing systems, 34:15084–15097, 2021

2021
[18]

A. T. Corbett and J. R. Anderson. Knowledge tracing: Modeling the acquisition of procedural knowledge. User modeling and user-adapted interaction, 4(4):253–278, 1994

1994
[19]

Piech, J

C. Piech, J. Bassen, J. Huang, S. Ganguli, M. Sahami, L. J. Guibas, and J. Sohl-Dickstein. Deep knowledge tracing. Advances in neural information processing systems, 28, 2015

2015
[20]

Zhang, X

J. Zhang, X. Shi, I. King, and D.-Y . Yeung. Dynamic key-value memory networks for knowl- edge tracing. In Proceedings of the 26th international conference on World Wide Web, pages 765–774, 2017

2017
[21]

P. I. Pavlik and J. R. Anderson. Using a model to compute the optimal schedule of practice. Journal of experimental psychology: applied, 14(2):101, 2008

2008
[22]

R. A. Calvo and S. D’Mello. Affect detection: An interdisciplinary review of models, methods, and their applications. IEEE transactions on affective computing, 1(1):18–37, 2010

2010
[23]

Jeong, A

H. Jeong, A. Gupta, R. Roscoe, J. Wagster, G. Biswas, and D. Schwartz. Using hidden markov models to characterize student behaviors in learning-by-teaching environments. In International conference on intelligent tutoring systems, pages 614–625. Springer, 2008

2008
[24]

Settles and B

B. Settles and B. Meeder. A trainable spaced repetition model for language learning. In Proceedings of the 54th annual meeting of the association for computational linguistics (volume 1: long papers), pages 1848–1858, 2016

2016
[25]

Balakrishnan and D

G. Balakrishnan and D. Coetzee. Predicting student retention in massive open online courses using hidden markov models. Technical report, Electrical Engineering and Computer Sciences dept., University of California at Berkeley, 2013

2013
[26]

S. Pu, M. Yudelson, L. Ou, and Y . Huang. Deep knowledge tracing with transformers. In International conference on artificial intelligence in education, pages 252–256. Springer, 2020

2020
[27]

Srivastava, E

M. Srivastava, E. Biyik, S. Mirchandani, N. Goodman, and D. Sadigh. Assistive teaching of motor control tasks to humans. In Advances in neural information processing systems, Nov. 2022

2022
[28]

Srivastava, N

M. Srivastava, N. Goodman, and D. Sadigh. Generating language corrections for teaching physical control tasks. In International conference on machine learning, volume 202, pages 32561–32574, July 2023

2023
[29]

Srivastava, R

M. Srivastava, R. Iranmanesh, Y . Cui, D. Gopinath, E. S. Sumner, A. Silva, L. Dees, G. Ros- man, and D. Sadigh. Shared autonomy for proximal teaching. In 2025 20th ACM/IEEE International conference on human-robot interaction (HRI), pages 232–241. IEEE, 2025

2025
[30]

Hierarchical Multiscale Recurrent Neural Networks

J. Chung, S. Ahn, and Y . Bengio. Hierarchical multiscale recurrent neural networks, 2017. URLhttps://arxiv.org/abs/1609.01704

work page internal anchor Pith review Pith/arXiv arXiv 2017
[31]

Multi-Scale Context Aggregation by Dilated Convolutions

F. Yu and V . Koltun. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122, 2015. 10

work page internal anchor Pith review Pith/arXiv arXiv 2015
[32]

Z. Dai, Z. Yang, Y . Yang, J. G. Carbonell, Q. V . Le, and R. Salakhutdinov. Transformer-xl: Attentive language models beyond a fixed-length context.CoRR, abs/1901.02860, 2019. URL http://arxiv.org/abs/1901.02860

work page internal anchor Pith review Pith/arXiv arXiv 1901
[33]

Neural Turing Machines

A. Graves, G. Wayne, and I. Danihelka. Neural turing machines. CoRR, abs/1410.5401, 2014. URLhttp://arxiv.org/abs/1410.5401

work page internal anchor Pith review Pith/arXiv arXiv 2014
[34]

Lewis, E

P. Lewis, E. Perez, A. Piktus, F. Petroni, V . Karpukhin, N. Goyal, H. K¨uttler, M. Lewis, W.-t. Yih, T. Rockt¨aschel, et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in neural information processing systems, 33:9459–9474, 2020

2020
[35]

J. Pari, N. M. Shafiullah, S. P. Arunachalam, and L. Pinto. The surprising effectiveness of representation learning for visual imitation. arXiv preprint arXiv:2112.01511, 2021

work page arXiv 2021
[36]

L. Deng, D. Lian, C. Wu, and E. Chen. Learning from highly sparse spatio-temporal data. Advances in neural information processing systems, 37:94022–94046, 2024

2024
[37]

C. Yu, Y . Xu, L. Li, and D. Hsu. Coach: Cooperative robot teaching. In Conference on robot learning, pages 1092–1103. PMLR, 2023

2023
[38]

Shlomov, J

S. Shlomov, J. Muehlstein, N. Guetta, and L. Limonad. Ongoing tracking of engagement in motor learning. arXiv preprint arXiv:2308.07670, 2023

work page arXiv 2023
[39]

Fuchino, M

K. Fuchino, M. Al-Sada, T. Miyake, and T. Nakajima. T2snaker: a robotic coach for table tennis. In Proceedings of the augmented humans international conference 2022, pages 305– 308, 2022

2022
[40]

Ziegenbein, J

N. Ziegenbein, J. Friedman, and A. Moringen. Monitoring the learning progress in piano playing with hidden markov models. In Adjunct proceedings of the 30th ACM conference on user modeling, adaptation and personalization, pages 335–341, 2022

2022
[41]

Forestier, L

G. Forestier, L. Riffaud, F. Petitjean, P.-L. Henaux, and P. Jannin. Surgical skills: Can learning curves be computed from recordings of surgical activities? International journal of computer assisted radiology and surgery, 13(5):629–636, 2018

2018
[42]

Skill-informed Data-driven Haptic Nudges for High-dimensional Human Motor Learning

A. Kamboj, R. Ranganathan, X. Tan, and V . Srivastava. Skill-informed data-driven haptic nudges for high-dimensional human motor learning. arXiv preprint arXiv:2603.12583, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[43]

Ropelato, F

S. Ropelato, F. Z ¨und, S. Magnenat, M. Menozzi, and R. W. Sumner. Adaptive tutoring on a virtual reality driving simulator. In International SERIES on Information Systems and Management in Creative EMedia (CreMedia), 2017

2017
[44]

C. R. Qi, H. Su, K. Mo, and L. J. Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 652–660, 2017

2017
[45]

J. Gu, C. Sun, and H. Zhao. Densetnt: End-to-end trajectory prediction from dense goal sets. In Proceedings of the IEEE/CVF international conference on computer vision, pages 15303– 15312, 2021

2021
[46]

throttle off

S. Garg, D. Tsipras, P. S. Liang, and G. Valiant. What can transformers learn in-context? a case study of simple function classes. Advances in neural information processing systems, 35: 30583–30598, 2022. 11 1 Additional WAYCOACHResults 1.1 Effect ofa h C encoding as part of input We saw in Table 1 thatFull-CrossAttnis easily able to achieve high validati...

2022

[1] [1]

Rojas-Mu ˜noz, K

E. Rojas-Mu ˜noz, K. Couperus, and J. Wachs. DAISI: Database for AI surgical instruction. arXiv preprint arXiv:2004.02809, 2020

work page arXiv 2004

[2] [2]

Giglio, A

B. Giglio, A. Albeloushi, A. K. Alhaj, M. Alhantoobi, R. Saeedi, V . Davidovic, A. Uthamacumaran, R. Yilmaz, J. Lapointe, N. Balasubramaniam, T. Tee, A. M. Fazlol- lahi, J. A. Correa, and R. F. Del Maestro. Artificial intelligence–augmented human instruc- tion and surgical simulation performance: A randomized clinical trial. JAMA Surgery, 160 (9):993–1003...

work page doi:10.1001/jamasurg.2025.2564 2025

[3] [3]

H. Yin, L. Gu, P. Parmar, L. Xu, T. Guo, W. Fu, Y . Zhang, and T. Zheng. Flex: A large- scale multi-modal multi-action dataset for fitness action quality assessment. arXiv preprint arXiv:2506.03198, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[4] [4]

Pashaie, S

S. Pashaie, S. Mohammadi, and H. Golmohammadi. Unlocking athlete potential: The evo- lution of coaching strategies through artificial intelligence. Proceedings of the Institution of Mechanical Engineers, Part P: Journal of Sports Engineering and Technology, page 17543371241300889, 2024

2024

[5] [5]

Gopinath, X

D. Gopinath, X. Cui, J. DeCastro, E. Sumner, J. Costa, H. Yasuda, A. Morgan, L. Dees, S. Chau, J. Leonard, et al. Computational teaching for driving via multi-task imitation learning. In 2025 IEEE international conference on robotics and automation (ICRA), pages 7019–7027. IEEE, 2025

2025

[6] [6]

Santos, A

L. Santos, A. Geminiani, P. Schydlo, I. Olivieri, J. Santos-Victor, and A. Pedrocchi. Design of a robotic coach for motor, social and cognitive skills training toward applications with asd children. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 29:1223– 1232, 2021

2021

[7] [7]

L. Chen, P. Chen, and Z. Lin. Artificial intelligence in education: A review. IEEE access, 8: 75264–75278, 2020

2020

[8] [8]

Mayhew, K

S. Mayhew, K. Bicknell, C. Brust, B. McDowell, W. Monroe, and B. Settles. Simultaneous translation and paraphrase for language education. In Proceedings of the fourth workshop on neural generation and translation, pages 232–243, 2020

2020

[9] [9]

Y . Choi, Y . Lee, D. Shin, J. Cho, S. Park, S. Lee, J. Baek, C. Bae, B. Kim, and J. Heo. Ednet: A large-scale hierarchical dataset in education. In International conference on artificial intelligence in education, pages 69–73. Springer, 2020

2020

[10] [10]

Lyster and L

R. Lyster and L. Ranta. Corrective feedback and learner uptake: Negotiation of form in com- municative classrooms. Studies in second language acquisition, 19(1):37–66, 1997

1997

[11] [11]

L. Mondada. Driving instruction at high speed on a race circuit: Issues in action formation and sequence organization. International Journal of Applied Linguistics, 28(2):304–325, 2018

2018

[12] [12]

Ettinger, S

S. Ettinger, S. Cheng, B. Caine, C. Liu, H. Zhao, S. Pradhan, Y . Chai, B. Sapp, C. R. Qi, Y . Zhou, et al. Large scale interactive motion forecasting for autonomous driving: The Waymo open motion dataset. In International conference on computer vision, pages 9710–9719, 2021

2021

[13] [13]

Sumner, D

E. Sumner, D. E. Gopinath, L. Dees, P. R. Gomez, X. Cui, A. Silva, J. Costa, A. Morgan, M. Schrum, T. L. Chen, A. Balachandran, and G. Rosman. SimCoachCorpus: A naturalistic dataset with language and trajectories for embodied teaching, 2025. URLhttps://arxiv. org/abs/2509.14548

work page arXiv 2025

[14] [14]

H. Le, N. Jiang, A. Agarwal, M. Dud ´ık, Y . Yue, and H. Daum´e III. Hierarchical imitation and reinforcement learning. In International conference on machine learning, pages 2917–2926. PMLR, 2018. 9

2018

[15] [15]

Z. Liu, C. Li, Y . Wang, N. Yang, X. Fan, J. Ma, and X. Zhao. Multi-scale temporal fusion trans- former for incomplete vehicle trajectory prediction. IEEE transactions on intelligent vehicles, 2024

2024

[16] [16]

R. Fox, R. Shin, W. Paul, Y . Zou, D. Song, K. Goldberg, P. Abbeel, and I. Stoica. Hierarchical variational imitation learning of control programs. arXiv preprint arXiv:1912.12612, 2019

work page arXiv 1912

[17] [17]

L. Chen, K. Lu, A. Rajeswaran, K. Lee, A. Grover, M. Laskin, P. Abbeel, A. Srinivas, and I. Mordatch. Decision transformer: Reinforcement learning via sequence modeling. Advances in neural information processing systems, 34:15084–15097, 2021

2021

[18] [18]

A. T. Corbett and J. R. Anderson. Knowledge tracing: Modeling the acquisition of procedural knowledge. User modeling and user-adapted interaction, 4(4):253–278, 1994

1994

[19] [19]

Piech, J

C. Piech, J. Bassen, J. Huang, S. Ganguli, M. Sahami, L. J. Guibas, and J. Sohl-Dickstein. Deep knowledge tracing. Advances in neural information processing systems, 28, 2015

2015

[20] [20]

Zhang, X

J. Zhang, X. Shi, I. King, and D.-Y . Yeung. Dynamic key-value memory networks for knowl- edge tracing. In Proceedings of the 26th international conference on World Wide Web, pages 765–774, 2017

2017

[21] [21]

P. I. Pavlik and J. R. Anderson. Using a model to compute the optimal schedule of practice. Journal of experimental psychology: applied, 14(2):101, 2008

2008

[22] [22]

R. A. Calvo and S. D’Mello. Affect detection: An interdisciplinary review of models, methods, and their applications. IEEE transactions on affective computing, 1(1):18–37, 2010

2010

[23] [23]

Jeong, A

H. Jeong, A. Gupta, R. Roscoe, J. Wagster, G. Biswas, and D. Schwartz. Using hidden markov models to characterize student behaviors in learning-by-teaching environments. In International conference on intelligent tutoring systems, pages 614–625. Springer, 2008

2008

[24] [24]

Settles and B

B. Settles and B. Meeder. A trainable spaced repetition model for language learning. In Proceedings of the 54th annual meeting of the association for computational linguistics (volume 1: long papers), pages 1848–1858, 2016

2016

[25] [25]

Balakrishnan and D

G. Balakrishnan and D. Coetzee. Predicting student retention in massive open online courses using hidden markov models. Technical report, Electrical Engineering and Computer Sciences dept., University of California at Berkeley, 2013

2013

[26] [26]

S. Pu, M. Yudelson, L. Ou, and Y . Huang. Deep knowledge tracing with transformers. In International conference on artificial intelligence in education, pages 252–256. Springer, 2020

2020

[27] [27]

Srivastava, E

M. Srivastava, E. Biyik, S. Mirchandani, N. Goodman, and D. Sadigh. Assistive teaching of motor control tasks to humans. In Advances in neural information processing systems, Nov. 2022

2022

[28] [28]

Srivastava, N

M. Srivastava, N. Goodman, and D. Sadigh. Generating language corrections for teaching physical control tasks. In International conference on machine learning, volume 202, pages 32561–32574, July 2023

2023

[29] [29]

Srivastava, R

M. Srivastava, R. Iranmanesh, Y . Cui, D. Gopinath, E. S. Sumner, A. Silva, L. Dees, G. Ros- man, and D. Sadigh. Shared autonomy for proximal teaching. In 2025 20th ACM/IEEE International conference on human-robot interaction (HRI), pages 232–241. IEEE, 2025

2025

[30] [30]

Hierarchical Multiscale Recurrent Neural Networks

J. Chung, S. Ahn, and Y . Bengio. Hierarchical multiscale recurrent neural networks, 2017. URLhttps://arxiv.org/abs/1609.01704

work page internal anchor Pith review Pith/arXiv arXiv 2017

[31] [31]

Multi-Scale Context Aggregation by Dilated Convolutions

F. Yu and V . Koltun. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122, 2015. 10

work page internal anchor Pith review Pith/arXiv arXiv 2015

[32] [32]

Z. Dai, Z. Yang, Y . Yang, J. G. Carbonell, Q. V . Le, and R. Salakhutdinov. Transformer-xl: Attentive language models beyond a fixed-length context.CoRR, abs/1901.02860, 2019. URL http://arxiv.org/abs/1901.02860

work page internal anchor Pith review Pith/arXiv arXiv 1901

[33] [33]

Neural Turing Machines

A. Graves, G. Wayne, and I. Danihelka. Neural turing machines. CoRR, abs/1410.5401, 2014. URLhttp://arxiv.org/abs/1410.5401

work page internal anchor Pith review Pith/arXiv arXiv 2014

[34] [34]

Lewis, E

P. Lewis, E. Perez, A. Piktus, F. Petroni, V . Karpukhin, N. Goyal, H. K¨uttler, M. Lewis, W.-t. Yih, T. Rockt¨aschel, et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in neural information processing systems, 33:9459–9474, 2020

2020

[35] [35]

J. Pari, N. M. Shafiullah, S. P. Arunachalam, and L. Pinto. The surprising effectiveness of representation learning for visual imitation. arXiv preprint arXiv:2112.01511, 2021

work page arXiv 2021

[36] [36]

L. Deng, D. Lian, C. Wu, and E. Chen. Learning from highly sparse spatio-temporal data. Advances in neural information processing systems, 37:94022–94046, 2024

2024

[37] [37]

C. Yu, Y . Xu, L. Li, and D. Hsu. Coach: Cooperative robot teaching. In Conference on robot learning, pages 1092–1103. PMLR, 2023

2023

[38] [38]

Shlomov, J

S. Shlomov, J. Muehlstein, N. Guetta, and L. Limonad. Ongoing tracking of engagement in motor learning. arXiv preprint arXiv:2308.07670, 2023

work page arXiv 2023

[39] [39]

Fuchino, M

K. Fuchino, M. Al-Sada, T. Miyake, and T. Nakajima. T2snaker: a robotic coach for table tennis. In Proceedings of the augmented humans international conference 2022, pages 305– 308, 2022

2022

[40] [40]

Ziegenbein, J

N. Ziegenbein, J. Friedman, and A. Moringen. Monitoring the learning progress in piano playing with hidden markov models. In Adjunct proceedings of the 30th ACM conference on user modeling, adaptation and personalization, pages 335–341, 2022

2022

[41] [41]

Forestier, L

G. Forestier, L. Riffaud, F. Petitjean, P.-L. Henaux, and P. Jannin. Surgical skills: Can learning curves be computed from recordings of surgical activities? International journal of computer assisted radiology and surgery, 13(5):629–636, 2018

2018

[42] [42]

Skill-informed Data-driven Haptic Nudges for High-dimensional Human Motor Learning

A. Kamboj, R. Ranganathan, X. Tan, and V . Srivastava. Skill-informed data-driven haptic nudges for high-dimensional human motor learning. arXiv preprint arXiv:2603.12583, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[43] [43]

Ropelato, F

S. Ropelato, F. Z ¨und, S. Magnenat, M. Menozzi, and R. W. Sumner. Adaptive tutoring on a virtual reality driving simulator. In International SERIES on Information Systems and Management in Creative EMedia (CreMedia), 2017

2017

[44] [44]

C. R. Qi, H. Su, K. Mo, and L. J. Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 652–660, 2017

2017

[45] [45]

J. Gu, C. Sun, and H. Zhao. Densetnt: End-to-end trajectory prediction from dense goal sets. In Proceedings of the IEEE/CVF international conference on computer vision, pages 15303– 15312, 2021

2021

[46] [46]

throttle off

S. Garg, D. Tsipras, P. S. Liang, and G. Valiant. What can transformers learn in-context? a case study of simple function classes. Advances in neural information processing systems, 35: 30583–30598, 2022. 11 1 Additional WAYCOACHResults 1.1 Effect ofa h C encoding as part of input We saw in Table 1 thatFull-CrossAttnis easily able to achieve high validati...

2022