CANINE: Coaching Visually Impaired Users for Interactive Navigation with a Robot Guide Dog

Anxing Xiao; Cunjun Yu; David Hsu; Linfeng Li; Zishuo Wang

arxiv: 2605.19501 · v1 · pith:AKYGY52Unew · submitted 2026-05-19 · 💻 cs.RO · cs.AI

CANINE: Coaching Visually Impaired Users for Interactive Navigation with a Robot Guide Dog

Cunjun Yu , Zishuo Wang , Anxing Xiao , Linfeng Li , David Hsu This is my paper

Pith reviewed 2026-05-20 05:29 UTC · model grok-4.3

classification 💻 cs.RO cs.AI

keywords robot guide dogvisually impaired navigationhuman-robot coordinationadaptive verbal coachingknowledge tracingfoundation modelsinteractive navigationpersonalized feedback

0 comments

The pith

CANINE uses a two-level system of skill tracking and AI error analysis to deliver personalized verbal coaching for robot guide dog navigation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that an automated coaching system called CANINE trains users to coordinate effectively with a robot guide dog by breaking the task into sub-skills and providing adaptive verbal feedback. At the high level it tracks proficiency with knowledge tracing to focus training on weak areas, while at the low level it watches practice episodes and applies foundation models to infer the causes of coordination errors before generating targeted corrections. A controlled study with blindfolded participants demonstrates faster learning and stronger final navigation performance than generic verbal instructions, with supporting evidence from a retention test after two weeks and a case study with a visually impaired user. This addresses the difficulty of learning subtle human-robot coordination that limits the real-world benefit of guide robots for independent mobility.

Core claim

CANINE decomposes a complex coordination task into sub-skills and operates at two levels. At the high level, it decides what to train by tracking the learner's proficiency across sub-skills using knowledge tracing and prioritizing training on the weakest areas. At the low level, CANINE decides how to train each sub-skill by observing each human practice episode, using foundation models to infer the underlying causes of errors, and generating targeted verbal corrections adaptively. A controlled study with blindfolded participants demonstrates that CANINE significantly improves both learning efficiency and final navigation performance compared to generic verbal instructions.

What carries the argument

Two-level coaching system that combines knowledge tracing for proficiency tracking and prioritization with foundation-model inference of error causes to produce adaptive verbal corrections.

If this is right

Users reach higher navigation performance after fewer practice episodes than with generic instructions.
Skill gains persist for at least two weeks following the training sessions.
The same coaching approach produces measurable benefits when applied to a real visually impaired user.
Targeted corrections derived from observed error causes outperform non-adaptive verbal guidance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Embedding the coaching loop inside the robot itself could allow ongoing skill refinement during everyday use rather than separate training sessions.
The two-level structure of skill tracing plus model-based error diagnosis may transfer to training users for other complex assistive robots such as wheelchairs or manipulators.
Larger field trials that vary user experience levels and environments would test how robust the foundation-model inference step remains outside controlled settings.

Load-bearing premise

Blindfolded participants serve as a valid proxy population for quantitatively evaluating effectiveness with visually impaired users.

What would settle it

A direct comparison study with actual visually impaired users that finds no significant improvement in learning efficiency or navigation performance over generic verbal instructions.

Figures

Figures reproduced from arXiv: 2605.19501 by Anxing Xiao, Cunjun Yu, David Hsu, Linfeng Li, Zishuo Wang.

**Figure 1.** Figure 1: CANINE. In our study, a robot guide dog coaches a visually impaired user to navigate through a doorway. damental challenges. First, these tasks are complex processes composed of distinct phases, each requiring different coordination strategies. For example, locating a door handle requires spatial exploration, whereas passing through the door requires precise coordination. Given the heterogeneity of these … view at source ↗

**Figure 2.** Figure 2: Expert coach insights from formative study. Analysis reveals common learner challenges, effective coaching strategies (timing and content), and design implications for automated coaching systems. and adaptively diagnose “how” the execution failed, modifying feedback to correct specific physical errors. Interactive tasks with timing constraints are most difficult. Tasks requiring active physical coordinatio… view at source ↗

**Figure 3.** Figure 3: Overview of CANINE. CANINE employs a two-level coaching strategy. The inter-skill coaching (up-right) tracks proficiency across sub-skills using knowledge tracing and selects the sub-skill to practice next. The intra-skill coaching (downright) takes in video observations and generates coaching instructions for the selected sub-skill. B. Inter-skill Coaching The inter-skill layer uses the POMDP formulation… view at source ↗

**Figure 4.** Figure 4: Illustration of the summarized timeline. The timeline aggregates per-frame observations sampled every 0.5 s into a structured episode-level summary. symbolic state representations h1:T . Given the observation trajectory, we uniformly sample key frames and apply ht = fframe(o1:T , t) to each timestamp t. In practice, we capture an image every 0.5 seconds and feed it to the VLM. The output ht encodes: user … view at source ↗

**Figure 7.** Figure 7: Frame analysis accuracy comparison. Preference rates of VLM-generated descriptions versus human annotations across different language models (GPT-5.1, Claude-4.5- Sonnet, Gemini-3.0-Pro amd Qwen3-VL). TABLE I: Human evaluation on coaching feedback quality. Preference rate indicates percentage of decisive preference. Other metrics are mean ratings on 1-5 Likert scales (higher is better). Latency is average… view at source ↗

**Figure 9.** Figure 9: Objective performance. a) Performance improvement across sub-skills. b) Final completion time in evaluation trials. SUS TLX PU PEU 0 20 40 60 80 100 * Baseline Ours [PITH_FULL_IMAGE:figures/full_fig_p007_9.png] view at source ↗

**Figure 11.** Figure 11: Qualitative examples of generated instructions. Top-left shows the image captured by the chest-mount camera. Sub-skill Practice Personalized Feedback Adaptive Curriculum Take in Human Feedback Terminal Feedback Timing Progressive Difficulty 6.40 6.40 6.30 6.10 5.60 4.90 [PITH_FULL_IMAGE:figures/full_fig_p008_11.png] view at source ↗

**Figure 12.** Figure 12: User ratings of CANINE components. Using a 1 - 7 Likert scale (1 = not helpful, 7 = very helpful). −2.61, p = .019, g = −1.11, leading to faster completion times (M = 11.36, SD = 0.52 vs. M = 14.42, SD = 0.76), t(18) = 3.00, p = .006, g = 1.35. 4) Results on Subjective Experience: Participants rated the coaching system as more useful than baseline instruction, valuing its targeted feedback and adaptive st… view at source ↗

**Figure 13.** Figure 13: Skill retention. Bars show the mean retention change for each skill in second. substantial improvement in the “Open Door” task. In contrast, “Enter Room” and “Navigate to Door” showed minor performance degradation, indicating these skills remained relatively stable. Interestingly, 60% of participants identified “Open Door” as the most challenging aspect of the task. We hypothesize that the observed impr… view at source ↗

**Figure 15.** Figure 15: User interfaces used for data collection and evaluation. (a) Interface for human annotators to label ground truth frame [PITH_FULL_IMAGE:figures/full_fig_p018_15.png] view at source ↗

**Figure 16.** Figure 16: Objects used for handover experiment. 3) Cross-Task Comparison Interview (5 min): • Comparison of navigation vs. handover coaching experience • Feedback timing preferences: terminal feedback for navigation vs. concurrent feedback for handover • Task-specific coaching requirements and design considerations M. Study 4: Robotic Handover Task Setup This section details the experimental setup for the robotic h… view at source ↗

read the original abstract

Robot guide dogs offer navigation assistance that greatly expands the independent mobility of the visually impaired, but their effective use requires subtle human-robot coordination that is difficult for users to learn from generic verbal instructions. To tackle this challenge, we present CANINE, an automated coaching system that trains users for interactive navigation with a robot guide dog, through personalized, adaptive verbal feedback. CANINE decomposes a complex coordination task into sub-skills and operates at two levels. At the high level, it decides what to train by tracking the learner's proficiency across sub-skills using knowledge tracing and prioritizing training on the weakest areas. At the low level, CANINE decides how to train each sub-skill by observing each human practice episode, using foundation models to infer the underlying causes of errors, and generating targeted verbal corrections adaptively. A controlled study with blindfolded participants, treated as a proxy population for quantitative evaluation, demonstrates that CANINE significantly improves both learning efficiency and final navigation performance compared to generic verbal instructions. We further validate CANINE through a retention study and an exploratory case study. The retention study shows lasting skill improvement after two weeks. The case study confirms CANINE's effectiveness in training a visually impaired user, while revealing additional design considerations for real-world deployment. Both are well aligned with the findings of the controlled study. Project page: https://cunjunyu.github.io/project/canine/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

CANINE combines knowledge tracing and foundation-model error diagnosis for adaptive coaching on robot guide dogs, but the main study relies on blindfolded sighted proxies whose results may not transfer cleanly to visually impaired users.

read the letter

The main point is that CANINE puts together a two-level system: knowledge tracing to decide which sub-skills to focus on, and foundation models to spot why a practice run went wrong and give specific verbal fixes. That architecture is the concrete new piece here for this application area. The paper lays out the decomposition of coordination tasks clearly and shows the controlled study with blindfolded participants produced better learning speed and final performance than plain instructions. The retention check after two weeks and the single case study with a visually impaired user both point in the same direction, which is useful to see even if the numbers are not fully detailed yet. The system description itself is coherent and the components seem to operate without obvious circularity. The soft spot is the proxy population. Blindfolded sighted people do not have the long-term sensory adaptations that visually impaired users develop, so the kinds of errors and how well verbal corrections land could differ. The paper only reports one exploratory case with an actual user and no quantitative comparison, which leaves the central claim resting on thinner ground than it first appears. This is for people working on assistive robotics and human-robot interaction for accessibility. A reader who wants concrete ideas for adaptive training systems in mobility aids will get value from the design and the initial empirical direction. It deserves peer review because the problem is real and the approach is grounded enough to be worth referee time, even if the evaluation needs tightening around the target population.

Referee Report

2 major / 2 minor

Summary. The paper presents CANINE, an automated coaching system for training users in interactive navigation with a robot guide dog. CANINE decomposes coordination into sub-skills, uses knowledge tracing at the high level to prioritize weakest areas, and foundation models at the low level to infer error causes from practice episodes and generate targeted verbal corrections. A controlled study with blindfolded participants (treated as proxy for visually impaired users) reports statistically significant gains in learning efficiency and final navigation performance versus generic verbal instructions. Validation includes a retention study showing lasting skill retention after two weeks and an exploratory case study with one visually impaired user.

Significance. If the central empirical claims hold under proper validation, the work offers a concrete, scalable approach to personalized human-robot coaching in assistive robotics. The combination of knowledge tracing with foundation-model-based error diagnosis is a technically interesting contribution that could generalize to other interactive skill-acquisition settings. The retention and case-study elements provide useful longitudinal and real-user signals that strengthen the overall narrative.

major comments (2)

[Evaluation / Controlled Study] The headline result—that CANINE produces statistically significant improvements in learning efficiency and navigation performance—rests entirely on the controlled study with blindfolded sighted participants (described in the abstract and Evaluation section). The manuscript supplies no sample sizes, power analysis, exact statistical tests, or full methodology, and the single exploratory case study with one VI user reports no quantitative metrics or direct statistical comparison to the proxy cohort. This omission makes it impossible to evaluate whether the observed gains are robust or an artifact of the proxy population.
[Evaluation / Proxy Population Justification] The validity of blindfolded sighted participants as a quantitative proxy for visually impaired users is not adequately justified. Visually impaired users typically possess long-term compensatory spatial cognition and haptic/auditory integration that sighted blindfolded subjects lack; these differences can alter both the distribution of coordination errors and the effectiveness of verbal corrections. The paper acknowledges the proxy choice but provides no evidence (e.g., comparative error distributions or pilot data) that the two populations respond similarly to the coaching interventions.

minor comments (2)

[Abstract] The abstract states that the controlled study demonstrates 'statistically significant gains' yet omits all numerical details (N, p-values, effect sizes). Adding these numbers, even in summary form, would improve transparency.
[Case Study] The case study is described as 'exploratory' and 'well aligned' with the controlled-study findings, but no concrete metrics or qualitative observations are supplied. A brief table or bullet list of observed behaviors and outcomes would help readers assess alignment.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive feedback on our manuscript. We appreciate the emphasis on evaluation transparency and proxy validation, which are critical for establishing the robustness of our claims in assistive robotics. Below we provide point-by-point responses to the major comments and describe the revisions we will make.

read point-by-point responses

Referee: [Evaluation / Controlled Study] The headline result—that CANINE produces statistically significant improvements in learning efficiency and navigation performance—rests entirely on the controlled study with blindfolded sighted participants (described in the abstract and Evaluation section). The manuscript supplies no sample sizes, power analysis, exact statistical tests, or full methodology, and the single exploratory case study with one VI user reports no quantitative metrics or direct statistical comparison to the proxy cohort. This omission makes it impossible to evaluate whether the observed gains are robust or an artifact of the proxy population.

Authors: We agree that greater methodological transparency is necessary to allow readers to assess the strength of our empirical claims. In the revised manuscript we will expand the Evaluation section to report the precise sample size (N=20 for the controlled study), a post-hoc power analysis, the exact statistical procedures (including paired t-tests or equivalent non-parametric tests with all p-values, confidence intervals, and effect sizes), and a complete description of the experimental protocol, participant instructions, and data collection procedures. For the exploratory case study with the single visually impaired participant, we will add the available quantitative session metrics (navigation completion time and error counts) while clearly stating that its small sample size precludes formal statistical comparison to the proxy cohort; the case study is presented as qualitative validation only. These additions will directly address the concern about robustness. revision: yes
Referee: [Evaluation / Proxy Population Justification] The validity of blindfolded sighted participants as a quantitative proxy for visually impaired users is not adequately justified. Visually impaired users typically possess long-term compensatory spatial cognition and haptic/auditory integration that sighted blindfolded subjects lack; these differences can alter both the distribution of coordination errors and the effectiveness of verbal corrections. The paper acknowledges the proxy choice but provides no evidence (e.g., comparative error distributions or pilot data) that the two populations respond similarly to the coaching interventions.

Authors: We acknowledge that the proxy justification in the current manuscript is brief and would benefit from additional support. In the revision we will add a dedicated subsection under Evaluation that (1) cites prior work in assistive navigation and human-robot interaction that has employed blindfolded sighted proxies for similar coordination tasks, (2) reports any available pilot observations from our own development phase comparing error distributions between the two populations, and (3) explicitly discusses the limitations of the proxy approach along with how the two-week retention study and the real-user case study provide complementary evidence of generalizability. We believe these changes will strengthen the argument without overstating the equivalence of the populations. revision: yes

Circularity Check

0 steps flagged

No significant circularity: claims rest on empirical user studies

full rationale

The paper describes an engineering system (CANINE) that decomposes navigation coordination into sub-skills, applies knowledge tracing for proficiency tracking, and uses foundation models to generate verbal corrections from observed episodes. Central claims of improved learning efficiency and navigation performance are supported solely by results from a controlled study with blindfolded participants (treated explicitly as a proxy), a retention study, and an exploratory case study with one visually impaired user. No equations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citation chains appear in the derivation of these outcomes; the evaluation is independent of any internal reduction and stands on external participant data.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach depends on standard assumptions about human skill decomposition and model-based error inference rather than new free parameters or invented physical entities; no ad-hoc constants are introduced to fit results.

axioms (2)

domain assumption Navigation coordination can be decomposed into independent sub-skills whose proficiency can be tracked separately via knowledge tracing.
Invoked at the high level to decide training priorities.
domain assumption Foundation models can accurately infer causes of coordination errors from observed human-robot interaction episodes.
Central to the low-level adaptive feedback generation.

pith-pipeline@v0.9.0 · 5789 in / 1394 out tokens · 46233 ms · 2026-05-20T05:29:47.716713+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

141 extracted references · 141 canonical work pages · 1 internal anchor

[1]

Do As I Can, Not As I Say: Grounding language in robotic affordances

Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Chuyuan Fu, Keerthana Gopalakrishnan, Karol Haus- man, et al. Do As I Can, Not As I Say: Grounding language in robotic affordances. InConference on Robot Learning (CoRL), 2022

work page 2022
[2]

In Robotics: Science and Systems (RSS), 2025

Kevin Black, Noah Brown, Danny Driess, Adnan Es- mail, Michael Equi, Chelsea Finn, Niccolo Fusai, Lachy Groom, Karol Hausman, Brian Ichter, et al.π 0: A vision- language-action flow model for general robot control. In Robotics: Science and Systems (RSS), 2025

work page 2025
[3]

RT-2: Vision-language-action models transfer web knowledge to robotic control

Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Pete Florence, et al. RT-2: Vision-language-action models transfer web knowledge to robotic control. InConference on Robot Learning (CoRL), 2023

work page 2023
[4]

quick and dirty

John Brooke. SUS: A “quick and dirty” usability scale. In Usability Evaluation in Industry, pages 189–194. Taylor & Francis, 1996

work page 1996
[5]

Language models are few-shot learners

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-V oss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott G...

work page 2020
[6]

Navigating real-world challenges: A quadruped robot guiding system for visually impaired people in diverse environments

Shaojun Cai, Ashwin Ram, Zhengtai Gou, Mohd Alqama Wasim Shaikh, Yu-An Chen, Yingjia Wan, Ko- taro Hara, Shengdong Zhao, and David Hsu. Navigating real-world challenges: A quadruped robot guiding system for visually impaired people in diverse environments. InCHI Conference on Human Factors in Computing Systems (CHI), 2024

work page 2024
[7]

Nav- igation beyond wayfinding: Robots collaborating with visually impaired users for environmental interactions

Shaojun Cai, Nuwan Janaka, Ashwin Ram, Janidu She- han, Yingjia Wan, Kotaro Hara, and David Hsu. Nav- igation beyond wayfinding: Robots collaborating with visually impaired users for environmental interactions. InACM/IEEE International Conference on Human-Robot Interaction(HRI), 2026

work page 2026
[8]

A human-inspired ob- ject handover controller.The International Journal of Robotics Research (IJRR), 32(8):971–983, 2013

Wesley P Chan, Chris AC Parker, HF Machiel Van der Loos, and Elizabeth A Croft. A human-inspired ob- ject handover controller.The International Journal of Robotics Research (IJRR), 32(8):971–983, 2013

work page 2013
[9]

Quadruped guidance robot for the visually impaired: A comfort-based approach

Yanbo Chen, Zhengzhe Xu, Zhuozhu Jian, Gengpan Tang, Liyunong Yang, Anxing Xiao, Xueqian Wang, and Bin Liang. Quadruped guidance robot for the visually impaired: A comfort-based approach. InIEEE International Conference on Robotics and Automation (ICRA), 2023

work page 2023
[10]

Upper-limb rehabilita- tion with a dual-mode individualized exoskeleton robot: A generative-model-based solution.The International Journal of Robotics Research (IJRR), 2025

Yu Chen, Shu Miao, Jing Ye, Gong Chen, Jianghua Cheng, Ketao Du, and Xiang Li. Upper-limb rehabilita- tion with a dual-mode individualized exoskeleton robot: A generative-model-based solution.The International Journal of Robotics Research (IJRR), 2025

work page 2025
[11]

Multi-armed bandits for intelligent tutoring systems.Journal of Educational Data Mining, 7(2):20–48, 2015

Benjamin Clement, Didier Roy, Pierre-Yves Oudeyer, and Manuel Lopes. Multi-armed bandits for intelligent tutoring systems.Journal of Educational Data Mining, 7(2):20–48, 2015

work page 2015
[12]

Corbett and John R

Albert T. Corbett and John R. Anderson. Knowledge trac- ing: Modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction (UMUAI), 4(4):253–278, 1994

work page 1994
[13]

No, to the Right: Online language corrections for robotic manipu- lation via shared autonomy

Yuchen Cui, Siddharth Karamcheti, Raj Palleti, Nidhya Shivakumar, Percy Liang, and Dorsa Sadigh. No, to the Right: Online language corrections for robotic manipu- lation via shared autonomy. InACM/IEEE International Conference on Human-Robot Interaction(HRI). Associa- tion for Computing Machinery, 2023

work page 2023
[14]

A policy- blending formalism for shared control.The International Journal of Robotics Research (IJRR), 32(7):790–805, 2013

Anca D Dragan and Siddhartha S Srinivasa. A policy- blending formalism for shared control.The International Journal of Robotics Research (IJRR), 32(7):790–805, 2013

work page 2013
[15]

Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, et al. PaLM-E: An embodied multimodal language model. InInternational Conference on Machine Learning (ICML), 2023

work page 2023
[16]

Will people enjoy a robot trainer? a case study with snoopie the pacerbot

Maximilian Du, Jennifer Grannen, Shuran Song, and Dorsa Sadigh. Will people enjoy a robot trainer? a case study with snoopie the pacerbot. InIEEE International Conference on Robotics and Automation (ICRA), 2026

work page 2026
[17]

Foundation models in robotics: Applications, challenges, and the future.The International Journal of Robotics Research (IJRR), 44(5):701–739, 2025

Roya Firoozi, Johnathan Tucker, Stephen Tian, Anirudha Majumdar, Jiankai Sun, Weiyu Liu, Yuke Zhu, Shuran Song, Ashish Kapoor, Karol Hausman, et al. Foundation models in robotics: Applications, challenges, and the future.The International Journal of Robotics Research (IJRR), 44(5):701–739, 2025

work page 2025
[18]

Computational teaching for driving via multi-task imitation learning.IEEE International Conference on Robotics and Automation (ICRA), 2025

Deepak Gopinath, Xiongyi Cui, Jonathan DeCastro, Emily Sumner, Jean Costa, Hiroshi Yasuda, Allison Morgan, Laporsha Dees, Sheryl Chau, John Leonard, et al. Computational teaching for driving via multi-task imitation learning.IEEE International Conference on Robotics and Automation (ICRA), 2025

work page 2025
[19]

Cabot: De- signing and evaluating an autonomous navigation robot for blind people

Jo ˜ao Guerreiro, Daisuke Sato, Saki Asakawa, Huixu Dong, Kris M Kitani, and Chieko Asakawa. Cabot: De- signing and evaluating an autonomous navigation robot for blind people. InACM ASSETS 2019 (ACM SIGAC- CESS Conference on Computers and Accessibility), 2019

work page 2019
[20]

Hambleton and Hariharan Swaminathan.Item Response Theory: Principles and Applications

Ronald K. Hambleton and Hariharan Swaminathan.Item Response Theory: Principles and Applications. Evalua- tion in Education and Human Services. 1985

work page 1985
[21]

S. G. Hart and Lowell E. Staveland. Development of nasa-tlx (task load index): Results of empirical and theoretical research.Advances in Psychology, 52:139– 183, 1988

work page 1988
[22]

Teachingbot: Robot teacher for human handwriting

Zhimin Hou, Cunjun Yu, David Hsu, and Haoyong Yu. Teachingbot: Robot teacher for human handwriting. IEEE Robotics and Automation Letters (RA-L), 11(3): 2610–2617, 2026

work page 2026
[23]

Guidenav: User-informed devel- opment of a vision-only robotic navigation assistant for blind travelers

Hochul Hwang, Soowan Yang, Jahir S Monon, Nicholas A Giudice, Sunghoon I Lee, Joydeep Biswas, and Donghyun Kim. Guidenav: User-informed devel- opment of a vision-only robotic navigation assistant for blind travelers. InACM/IEEE International Conference on Human-Robot Interaction(HRI), 2026

work page 2026
[24]

Physical human- robot interaction: Mutual learning and adaptation.IEEE Robotics and Automation Magazine (RAM), 19(4):24–35, 2012

Shuhei Ikemoto, Heni Ben Amor, Takashi Minato, Bern- hard Jung, and Hiroshi Ishiguro. Physical human- robot interaction: Mutual learning and adaptation.IEEE Robotics and Automation Magazine (RAM), 19(4):24–35, 2012

work page 2012
[25]

Between reality and delusion: Challenges of applying large language models to companion robots for open-domain dialogues with older adults.Autonomous Robots, 49(1):9, 2025

Bahar Irfan, Sanna-Mari Kuoppam ¨aki, and Gabriel Skantze. Between reality and delusion: Challenges of applying large language models to companion robots for open-domain dialogues with older adults.Autonomous Robots, 49(1):9, 2025

work page 2025
[26]

FEAST: A flexible mealtime-assistance system towards in-the-wild personalization

Rajat Kumar Jenamani, Tom Silver, Ben Dodson, Shiqin Tong, Anthony Song, Yuting Yang, Ziang Liu, Benjamin Howe, Aimee Whitneck, and Tapomayukh Bhattacharjee. FEAST: A flexible mealtime-assistance system towards in-the-wild personalization. InRobotics: Science and Systems (RSS), 2025

work page 2025
[27]

Matthew J ¨orke, Shardul Sapkota, Lyndsea Warkenthien, Niklas Vainio, Paul Schmiedmayer, Emma Brunskill, and James A. Landay. Gptcoach: Towards llm-based physical activity coaching. InCHI Conference on Human Factors in Computing Systems (CHI), CHI ’25, 2025

work page 2025
[28]

Beyond omakase: Designing shared control for navigation robots with blind people

Rie Kamikubo, Seita Kayukawa, Yuka Kaniwa, Allan Wang, Hernisa Kacorri, Hironobu Takagi, and Chieko Asakawa. Beyond omakase: Designing shared control for navigation robots with blind people. InCHI Conference on Human Factors in Computing Systems (CHI), 2025

work page 2025
[29]

Un- derstanding large-language model (llm)-powered human- robot interaction

Callie Y Kim, Christine P Lee, and Bilge Mutlu. Un- derstanding large-language model (llm)-powered human- robot interaction. InACM/IEEE International Confer- ence on Human-Robot Interaction(HRI), 2024

work page 2024
[30]

Learning dynamic robot-to-human object handover from human feedback

Andras Kupcsik, David Hsu, and Wee Sun Lee. Learning dynamic robot-to-human object handover from human feedback. In Antonio Bicchi and Wolfram Burgard, editors,Robotics Research, volume 2 ofSpringer Pro- ceedings in Advanced Robotics, pages 161–176. Springer, Cham, 2018

work page 2018
[31]

Pathfinder: Designing a map-less navigation system for blind people in unfamiliar buildings

Masaki Kuribayashi, Tatsuya Ishihara, Daisuke Sato, Jayakorn V ongkulbhisal, Karnik Ram, Seita Kayukawa, Hironobu Takagi, Shigeo Morishima, and Chieko Asakawa. Pathfinder: Designing a map-less navigation system for blind people in unfamiliar buildings. InCHI Conference on Human Factors in Computing Systems (CHI), 2023

work page 2023
[32]

Lan, Andrew E

Andrew S. Lan, Andrew E. Waters, Christoph Studer, and Richard G. Baraniuk. Sparse factor analysis for learning and content analytics.Journal of Machine Learning Research (JMLR), 15(1):1959–2008, 2014

work page 1959
[33]

Towards robo-coach: Robot interactive stiffness/- position adaptation for human strength and conditioning training

Chenzui Li, Xi Wu, Tao Teng, Sylvain Calinon, and Fei Chen. Towards robo-coach: Robot interactive stiffness/- position adaptation for human strength and conditioning training. InIEEE International Conference on Robotics and Automation (ICRA), 2024

work page 2024
[34]

Continuous role adaptation for human–robot shared control.IEEE Transactions on Robotics (T-RO), 31(3):672–681, 2015

Yanan Li, Keng Peng Tee, Wei Liang Chan, Rui Yan, Yuanwei Chua, and Dilip Kumar Limbu. Continuous role adaptation for human–robot shared control.IEEE Transactions on Robotics (T-RO), 31(3):672–681, 2015

work page 2015
[35]

Code as policies: Language model programs for em- bodied control

Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, and Andy Zeng. Code as policies: Language model programs for em- bodied control. InIEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023

work page 2023
[36]

RadarMath: An intelligent tutoring system for math education

Yu Lu, Yang Pian, Penghe Chen, Qinggang Meng, and Yunbo Cao. RadarMath: An intelligent tutoring system for math education. InAAAI Conference on Artificial Intelligence (AAAI), 2021

work page 2021
[37]

Mariah Lynn Schrum, Srijan Srivatsa, Laporsha Dees, Evelyn Dixon, Patricio Reyes Gomez, Deepak Gopinath, Emily Sarah Sumner, Guy Rosman, and Tiffany L. Chen. Skill Modulates Coaching Language in Embodied Motor Learning. InExtended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA), 2026

work page 2026
[38]

Evaluation of the potential suitability of guide dog candi- dates by continuous observation during training.Journal of Veterinary Behavior, 3(5):193–198, 2008

Mina Mizukoshi, Mana Kondo, and Toru Nakamura. Evaluation of the potential suitability of guide dog candi- dates by continuous observation during training.Journal of Veterinary Behavior, 3(5):193–198, 2008

work page 2008
[39]

Task-agnostic exoskeleton control via biological joint moment estimation.Nature, 635(8038): 337–344, 2024

Dean D Molinaro, Keaton L Scherpereel, Ethan B Schon- haut, Georgios Evangelopoulos, Max K Shepherd, and Aaron J Young. Task-agnostic exoskeleton control via biological joint moment estimation.Nature, 635(8038): 337–344, 2024

work page 2024
[40]

Formalizing human-robot mutual adaptation via a bounded memory based model

Stefanos Nikolaidis, Anton Kuznetsov, David Hsu, and Siddhartha Srinivasa. Formalizing human-robot mutual adaptation via a bounded memory based model. In ACM/IEEE International Conference on Human-Robot Interaction(HRI), 2016

work page 2016
[41]

Octo: An open-source generalist robot policy

Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Tobias Kreiman, Charles Xu, et al. Octo: An open-source generalist robot policy. InRobotics: Science and Systems (RSS), 2024

work page 2024
[42]

GPT-4 Technical Report

OpenAI. GPT-4 technical report.arXiv preprint arXiv:2303.08774, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[43]

W AFFLE: A wearable approach to bite timing estimation in robot-assisted feeding

Akhil Padmanabha, Jessie Yuan, Tanisha Mehta, Ra- jat Kumar Jenamani, Eric Hu, Victoria de Le ´on, An- thony Wertz, Janavi Gupta, Ben Dodson, Yunting Yan, Carmel Majidi, Tapomayukh Bhattacharjee, and Zackory Erickson. W AFFLE: A wearable approach to bite timing estimation in robot-assisted feeding. InACM/IEEE Inter- national Conference on Human-Robot Inte...

work page 2026
[44]

Mutter, editors.Intelligent Tutoring Systems: Lessons Learned

Joseph Psotka, Leonard Daniel Massey, and Sharon A. Mutter, editors.Intelligent Tutoring Systems: Lessons Learned. 1988

work page 1988
[45]

Hamlin, Lydia E

Peizhu Qian, Filip Bajraktari, Carlos Quintero-Pe ˜na, Qingxi Meng, Shannan K. Hamlin, Lydia E. Kavraki, and Vaibhav Unhelkar. ASTRID: A robotic tutor for nurse training to reduce healthcare-associated infections. InRobotics: Science and Systems (RSS), 2025

work page 2025
[46]

Rafferty, Emma Brunskill, Thomas L

Anna N. Rafferty, Emma Brunskill, Thomas L. Griffiths, and Patrick Shafto. Faster teaching by pomdp planning. InArtificial Intelligence in Education, 2011

work page 2011
[47]

Dragan, and Sergey Levine

Siddharth Reddy, Anca D. Dragan, and Sergey Levine. Shared autonomy via deep reinforcement learning. In Robotics: Science and Systems (RSS), 2018

work page 2018
[48]

Learning physical col- laborative robot behaviors from human demonstrations

Leonel Rozo, Sylvain Calinon, Darwin G Caldwell, Pablo Jimenez, and Carme Torras. Learning physical col- laborative robot behaviors from human demonstrations. IEEE Transactions on Robotics (T-RO), 32(3):513–527, 2016

work page 2016
[49]

Adaptive robot language tutoring based on bayesian knowledge tracing and predictive decision-making

Thorsten Schodde, Kirsten Bergmann, and Stefan Kopp. Adaptive robot language tutoring based on bayesian knowledge tracing and predictive decision-making. In ACM/IEEE International Conference on Human-Robot Interaction(HRI), 2017

work page 2017
[50]

Autonomy in phys- ical human-robot interaction: A brief survey.IEEE Robotics and Automation Letters (RA-L), 6(4):7989– 7996, 2021

Mario Selvaggio, Marco Cognetti, Stefanos Nikolaidis, Serena Ivaldi, and Bruno Siciliano. Autonomy in phys- ical human-robot interaction: A brief survey.IEEE Robotics and Automation Letters (RA-L), 6(4):7989– 7996, 2021

work page 2021
[51]

ViNT: A foundation model for visual navigation

Dhruv Shah, Ajay Sridhar, Nitish Dashora, Kyle Sta- chowicz, Kevin Black, Noriaki Hirose, and Sergey Levine. ViNT: A foundation model for visual navigation. InConference on Robot Learning (CoRL), 2023

work page 2023
[52]

Hi Robot: Open-ended instruction follow- ing with hierarchical vision-language-action models

Lucy Xiaoyang Shi, brian ichter, Michael Robert Equi, Liyiming Ke, Karl Pertsch, Quan Vuong, James Tanner, Anna Walling, Haohuan Wang, Niccolo Fusai, Adrian Li- Bell, Danny Driess, Lachy Groom, Sergey Levine, and Chelsea Finn. Hi Robot: Open-ended instruction follow- ing with hierarchical vision-language-action models. In International Conference on Machi...

work page 2025
[53]

Ut Na Sio and Thomas C. Ormerod. Does incubation enhance problem solving? a meta-analytic review.Psy- chological bulletin, 135 1:94–120, 2009

work page 2009
[54]

Assistive teaching of motor control tasks to humans

Megha Srivastava, Erdem Biyik, Suvir Mirchandani, Noah Goodman, and Dorsa Sadigh. Assistive teaching of motor control tasks to humans. InAdvances in Neural Information Processing Systems (NeurIPS), 2022

work page 2022
[55]

Generating language corrections for teaching physical control tasks

Megha Srivastava, Noah Goodman, and Dorsa Sadigh. Generating language corrections for teaching physical control tasks. InInternational Conference on Machine Learning (ICML), 2023

work page 2023
[56]

Shared autonomy for proximal teaching

Megha Srivastava, Reihaneh Iranmanesh, Yuchen Cui, Deepak Gopinath, Emily Sarah Sumner, Andrew Silva, Laporsha Dees, Guy Rosman, and Dorsa Sadigh. Shared autonomy for proximal teaching. InACM/IEEE Inter- national Conference on Human-Robot Interaction(HRI), 2025

work page 2025
[57]

A meta- analysis of the effectiveness of intelligent tutoring sys- tems on k–12 students’ mathematical learning.Journal of Educational Psychology, 105(4):970–987, 2013

Saiying Steenbergen-Hu and Harris Cooper. A meta- analysis of the effectiveness of intelligent tutoring sys- tems on k–12 students’ mathematical learning.Journal of Educational Psychology, 105(4):970–987, 2013

work page 2013
[58]

Toward seamless human-robot handovers

Kyle Strabala, Min Kyung Lee, Anca Dragan, Jodi For- lizzi, Siddhartha S Srinivasa, Maya Cakmak, and Vin- cenzo Micelli. Toward seamless human-robot handovers. Journal of Human-Robot Interaction (JHRI), 2(1):112– 132, 2013

work page 2013
[59]

Merryanna L. Swartz and Masoud Yazdani, editors.Intel- ligent Tutoring Systems for Foreign Language Learning: The Bridge to International Communication, volume 80 ofNATO ASI Series F: Computer and Systems Sciences. Springer, 1992

work page 1992
[60]

Dragan, and Andrea V

Ran Tian, Masayoshi Tomizuka, Anca D. Dragan, and Andrea V . Bajcsy. Towards modeling and influencing the dynamics of human learning.ACM/IEEE International Conference on Human-Robot Interaction(HRI), 2023

work page 2023
[61]

Lami: Large lan- guage models for multi-modal human-robot interaction

Chao Wang, Stephan Hasler, Daniel Tanneberg, Fe- lix Ocker, Frank Joublin, Antonello Ceravola, Joerg Deigmoeller, and Michael Gienger. Lami: Large lan- guage models for multi-modal human-robot interaction. InExtended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA), 2024

work page 2024
[62]

Wilson, Yan Karklin, Bojian Han, and Chai- tanya Ekanadham

Kevin H. Wilson, Yan Karklin, Bojian Han, and Chai- tanya Ekanadham. Back to the basics: Bayesian exten- sions of irt outperform neural networks for proficiency estimation. InEducational Data Mining, 2016

work page 2016
[63]

Blindness and vision impair- ment

World Health Organization. Blindness and vision impair- ment. https://www.who.int/news-room/fact-sheets/detail/ blindness-and-visual-impairment, 2026. Accessed 2026- 05-11

work page 2026
[64]

SA VOR: Skill affordance learning from visuo- haptic perception for robot-assisted bite acquisition

Zhanxin Wu, Bo Ai, Tom Silver, and Tapomayukh Bhat- tacharjee. SA VOR: Skill affordance learning from visuo- haptic perception for robot-assisted bite acquisition. In Conference on Robot Learning (CoRL), 2025

work page 2025
[65]

Robotic guide dog: Leading a human with leash-guided hybrid physical in- teraction

Anxing Xiao, Wenzhe Tong, Lizhi Yang, Jun Zeng, Zhongyu Li, and Koushil Sreenath. Robotic guide dog: Leading a human with leash-guided hybrid physical in- teraction. InIEEE International Conference on Robotics and Automation (ICRA), 2021

work page 2021
[66]

Coach: Cooperative robot teaching

Cunjun Yu, Yiqing Xu, Linfeng Li, and David Hsu. Coach: Cooperative robot teaching. InConference on Robot Learning (CoRL), 2022

work page 2022
[67]

user standing 1.5m from door, facing slightly left

G.A. Zijlstra, Judith Ballemans, and Gertrudis Kempen. Orientation and mobility training for adults with low vision: a new standardized approach.Clinical Rehabili- tation, 27:3–18, 2013. TABLE II: Parameter distributions for the Simulated Learner. Parameters are randomized per skill or per learner episode to create diverse student profiles. Parameter Dist...

work page 2013
[68]

The order of A/B is randomized to prevent position bias

work page
[69]

Judges are instructed to evaluate based on the accuracy of the description

work page
[70]

For each model, we recruited 4 annotators to rate the pairwise comparison

work page
[71]

Each human judge evaluated 50 frames

work page
[72]

equally accurate

Judges were blind to which description came from human vs. VLM Inter-Rater AgreementTable V shows inter-rater agreement metrics, including Fleiss’s kappa and unanimous agreement rates when judges had clear preferences (excluding “equally accurate” responses). TABLE V:Inter-rater agreement for frame analysis evaluation.Four independent judges rated VLM-gen...

work page
[73]

Frame analysis (f frame): Extract structured observations from individual frames

work page
[74]

Timeline summarization (f time): Aggregate frame observations into episode-level summary

work page
[75]

Coaching generation (f coach): Generate personalized feedback based on timeline and proficiency model

work page
[76]

move hand forward 20cm

Robot adaptation (f param): Adjust robot parameters based on diagnosed errors Table VI summarizes the input and output contract for each stage, making the intermediate representations used by the decomposed pipeline explicit. Video SelectionWe selected 10 navigation videos where users made clear errors requiring coaching intervention: •Error types: Wrong ...

work page
[77]

Please rank your top 3 most valuable features from the list above

work page
[78]

What aspects of the navigation task did you find most challenging?

work page
[79]

What aspects of the teaching method were most helpful for your learning?

work page
[80]

Did you develop any specific strategies for successful navigation? If so, please describe them

work page

Showing first 80 references.

[1] [1]

Do As I Can, Not As I Say: Grounding language in robotic affordances

Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Chuyuan Fu, Keerthana Gopalakrishnan, Karol Haus- man, et al. Do As I Can, Not As I Say: Grounding language in robotic affordances. InConference on Robot Learning (CoRL), 2022

work page 2022

[2] [2]

In Robotics: Science and Systems (RSS), 2025

Kevin Black, Noah Brown, Danny Driess, Adnan Es- mail, Michael Equi, Chelsea Finn, Niccolo Fusai, Lachy Groom, Karol Hausman, Brian Ichter, et al.π 0: A vision- language-action flow model for general robot control. In Robotics: Science and Systems (RSS), 2025

work page 2025

[3] [3]

RT-2: Vision-language-action models transfer web knowledge to robotic control

Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Pete Florence, et al. RT-2: Vision-language-action models transfer web knowledge to robotic control. InConference on Robot Learning (CoRL), 2023

work page 2023

[4] [4]

quick and dirty

John Brooke. SUS: A “quick and dirty” usability scale. In Usability Evaluation in Industry, pages 189–194. Taylor & Francis, 1996

work page 1996

[5] [5]

Language models are few-shot learners

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-V oss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott G...

work page 2020

[6] [6]

Navigating real-world challenges: A quadruped robot guiding system for visually impaired people in diverse environments

Shaojun Cai, Ashwin Ram, Zhengtai Gou, Mohd Alqama Wasim Shaikh, Yu-An Chen, Yingjia Wan, Ko- taro Hara, Shengdong Zhao, and David Hsu. Navigating real-world challenges: A quadruped robot guiding system for visually impaired people in diverse environments. InCHI Conference on Human Factors in Computing Systems (CHI), 2024

work page 2024

[7] [7]

Nav- igation beyond wayfinding: Robots collaborating with visually impaired users for environmental interactions

Shaojun Cai, Nuwan Janaka, Ashwin Ram, Janidu She- han, Yingjia Wan, Kotaro Hara, and David Hsu. Nav- igation beyond wayfinding: Robots collaborating with visually impaired users for environmental interactions. InACM/IEEE International Conference on Human-Robot Interaction(HRI), 2026

work page 2026

[8] [8]

A human-inspired ob- ject handover controller.The International Journal of Robotics Research (IJRR), 32(8):971–983, 2013

Wesley P Chan, Chris AC Parker, HF Machiel Van der Loos, and Elizabeth A Croft. A human-inspired ob- ject handover controller.The International Journal of Robotics Research (IJRR), 32(8):971–983, 2013

work page 2013

[9] [9]

Quadruped guidance robot for the visually impaired: A comfort-based approach

Yanbo Chen, Zhengzhe Xu, Zhuozhu Jian, Gengpan Tang, Liyunong Yang, Anxing Xiao, Xueqian Wang, and Bin Liang. Quadruped guidance robot for the visually impaired: A comfort-based approach. InIEEE International Conference on Robotics and Automation (ICRA), 2023

work page 2023

[10] [10]

Upper-limb rehabilita- tion with a dual-mode individualized exoskeleton robot: A generative-model-based solution.The International Journal of Robotics Research (IJRR), 2025

Yu Chen, Shu Miao, Jing Ye, Gong Chen, Jianghua Cheng, Ketao Du, and Xiang Li. Upper-limb rehabilita- tion with a dual-mode individualized exoskeleton robot: A generative-model-based solution.The International Journal of Robotics Research (IJRR), 2025

work page 2025

[11] [11]

Multi-armed bandits for intelligent tutoring systems.Journal of Educational Data Mining, 7(2):20–48, 2015

Benjamin Clement, Didier Roy, Pierre-Yves Oudeyer, and Manuel Lopes. Multi-armed bandits for intelligent tutoring systems.Journal of Educational Data Mining, 7(2):20–48, 2015

work page 2015

[12] [12]

Corbett and John R

Albert T. Corbett and John R. Anderson. Knowledge trac- ing: Modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction (UMUAI), 4(4):253–278, 1994

work page 1994

[13] [13]

No, to the Right: Online language corrections for robotic manipu- lation via shared autonomy

Yuchen Cui, Siddharth Karamcheti, Raj Palleti, Nidhya Shivakumar, Percy Liang, and Dorsa Sadigh. No, to the Right: Online language corrections for robotic manipu- lation via shared autonomy. InACM/IEEE International Conference on Human-Robot Interaction(HRI). Associa- tion for Computing Machinery, 2023

work page 2023

[14] [14]

A policy- blending formalism for shared control.The International Journal of Robotics Research (IJRR), 32(7):790–805, 2013

Anca D Dragan and Siddhartha S Srinivasa. A policy- blending formalism for shared control.The International Journal of Robotics Research (IJRR), 32(7):790–805, 2013

work page 2013

[15] [15]

Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, et al. PaLM-E: An embodied multimodal language model. InInternational Conference on Machine Learning (ICML), 2023

work page 2023

[16] [16]

Will people enjoy a robot trainer? a case study with snoopie the pacerbot

Maximilian Du, Jennifer Grannen, Shuran Song, and Dorsa Sadigh. Will people enjoy a robot trainer? a case study with snoopie the pacerbot. InIEEE International Conference on Robotics and Automation (ICRA), 2026

work page 2026

[17] [17]

Foundation models in robotics: Applications, challenges, and the future.The International Journal of Robotics Research (IJRR), 44(5):701–739, 2025

Roya Firoozi, Johnathan Tucker, Stephen Tian, Anirudha Majumdar, Jiankai Sun, Weiyu Liu, Yuke Zhu, Shuran Song, Ashish Kapoor, Karol Hausman, et al. Foundation models in robotics: Applications, challenges, and the future.The International Journal of Robotics Research (IJRR), 44(5):701–739, 2025

work page 2025

[18] [18]

Computational teaching for driving via multi-task imitation learning.IEEE International Conference on Robotics and Automation (ICRA), 2025

Deepak Gopinath, Xiongyi Cui, Jonathan DeCastro, Emily Sumner, Jean Costa, Hiroshi Yasuda, Allison Morgan, Laporsha Dees, Sheryl Chau, John Leonard, et al. Computational teaching for driving via multi-task imitation learning.IEEE International Conference on Robotics and Automation (ICRA), 2025

work page 2025

[19] [19]

Cabot: De- signing and evaluating an autonomous navigation robot for blind people

Jo ˜ao Guerreiro, Daisuke Sato, Saki Asakawa, Huixu Dong, Kris M Kitani, and Chieko Asakawa. Cabot: De- signing and evaluating an autonomous navigation robot for blind people. InACM ASSETS 2019 (ACM SIGAC- CESS Conference on Computers and Accessibility), 2019

work page 2019

[20] [20]

Hambleton and Hariharan Swaminathan.Item Response Theory: Principles and Applications

Ronald K. Hambleton and Hariharan Swaminathan.Item Response Theory: Principles and Applications. Evalua- tion in Education and Human Services. 1985

work page 1985

[21] [21]

S. G. Hart and Lowell E. Staveland. Development of nasa-tlx (task load index): Results of empirical and theoretical research.Advances in Psychology, 52:139– 183, 1988

work page 1988

[22] [22]

Teachingbot: Robot teacher for human handwriting

Zhimin Hou, Cunjun Yu, David Hsu, and Haoyong Yu. Teachingbot: Robot teacher for human handwriting. IEEE Robotics and Automation Letters (RA-L), 11(3): 2610–2617, 2026

work page 2026

[23] [23]

Guidenav: User-informed devel- opment of a vision-only robotic navigation assistant for blind travelers

Hochul Hwang, Soowan Yang, Jahir S Monon, Nicholas A Giudice, Sunghoon I Lee, Joydeep Biswas, and Donghyun Kim. Guidenav: User-informed devel- opment of a vision-only robotic navigation assistant for blind travelers. InACM/IEEE International Conference on Human-Robot Interaction(HRI), 2026

work page 2026

[24] [24]

Physical human- robot interaction: Mutual learning and adaptation.IEEE Robotics and Automation Magazine (RAM), 19(4):24–35, 2012

Shuhei Ikemoto, Heni Ben Amor, Takashi Minato, Bern- hard Jung, and Hiroshi Ishiguro. Physical human- robot interaction: Mutual learning and adaptation.IEEE Robotics and Automation Magazine (RAM), 19(4):24–35, 2012

work page 2012

[25] [25]

Between reality and delusion: Challenges of applying large language models to companion robots for open-domain dialogues with older adults.Autonomous Robots, 49(1):9, 2025

Bahar Irfan, Sanna-Mari Kuoppam ¨aki, and Gabriel Skantze. Between reality and delusion: Challenges of applying large language models to companion robots for open-domain dialogues with older adults.Autonomous Robots, 49(1):9, 2025

work page 2025

[26] [26]

FEAST: A flexible mealtime-assistance system towards in-the-wild personalization

Rajat Kumar Jenamani, Tom Silver, Ben Dodson, Shiqin Tong, Anthony Song, Yuting Yang, Ziang Liu, Benjamin Howe, Aimee Whitneck, and Tapomayukh Bhattacharjee. FEAST: A flexible mealtime-assistance system towards in-the-wild personalization. InRobotics: Science and Systems (RSS), 2025

work page 2025

[27] [27]

Matthew J ¨orke, Shardul Sapkota, Lyndsea Warkenthien, Niklas Vainio, Paul Schmiedmayer, Emma Brunskill, and James A. Landay. Gptcoach: Towards llm-based physical activity coaching. InCHI Conference on Human Factors in Computing Systems (CHI), CHI ’25, 2025

work page 2025

[28] [28]

Beyond omakase: Designing shared control for navigation robots with blind people

Rie Kamikubo, Seita Kayukawa, Yuka Kaniwa, Allan Wang, Hernisa Kacorri, Hironobu Takagi, and Chieko Asakawa. Beyond omakase: Designing shared control for navigation robots with blind people. InCHI Conference on Human Factors in Computing Systems (CHI), 2025

work page 2025

[29] [29]

Un- derstanding large-language model (llm)-powered human- robot interaction

Callie Y Kim, Christine P Lee, and Bilge Mutlu. Un- derstanding large-language model (llm)-powered human- robot interaction. InACM/IEEE International Confer- ence on Human-Robot Interaction(HRI), 2024

work page 2024

[30] [30]

Learning dynamic robot-to-human object handover from human feedback

Andras Kupcsik, David Hsu, and Wee Sun Lee. Learning dynamic robot-to-human object handover from human feedback. In Antonio Bicchi and Wolfram Burgard, editors,Robotics Research, volume 2 ofSpringer Pro- ceedings in Advanced Robotics, pages 161–176. Springer, Cham, 2018

work page 2018

[31] [31]

Pathfinder: Designing a map-less navigation system for blind people in unfamiliar buildings

Masaki Kuribayashi, Tatsuya Ishihara, Daisuke Sato, Jayakorn V ongkulbhisal, Karnik Ram, Seita Kayukawa, Hironobu Takagi, Shigeo Morishima, and Chieko Asakawa. Pathfinder: Designing a map-less navigation system for blind people in unfamiliar buildings. InCHI Conference on Human Factors in Computing Systems (CHI), 2023

work page 2023

[32] [32]

Lan, Andrew E

Andrew S. Lan, Andrew E. Waters, Christoph Studer, and Richard G. Baraniuk. Sparse factor analysis for learning and content analytics.Journal of Machine Learning Research (JMLR), 15(1):1959–2008, 2014

work page 1959

[33] [33]

Towards robo-coach: Robot interactive stiffness/- position adaptation for human strength and conditioning training

Chenzui Li, Xi Wu, Tao Teng, Sylvain Calinon, and Fei Chen. Towards robo-coach: Robot interactive stiffness/- position adaptation for human strength and conditioning training. InIEEE International Conference on Robotics and Automation (ICRA), 2024

work page 2024

[34] [34]

Continuous role adaptation for human–robot shared control.IEEE Transactions on Robotics (T-RO), 31(3):672–681, 2015

Yanan Li, Keng Peng Tee, Wei Liang Chan, Rui Yan, Yuanwei Chua, and Dilip Kumar Limbu. Continuous role adaptation for human–robot shared control.IEEE Transactions on Robotics (T-RO), 31(3):672–681, 2015

work page 2015

[35] [35]

Code as policies: Language model programs for em- bodied control

Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, and Andy Zeng. Code as policies: Language model programs for em- bodied control. InIEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023

work page 2023

[36] [36]

RadarMath: An intelligent tutoring system for math education

Yu Lu, Yang Pian, Penghe Chen, Qinggang Meng, and Yunbo Cao. RadarMath: An intelligent tutoring system for math education. InAAAI Conference on Artificial Intelligence (AAAI), 2021

work page 2021

[37] [37]

Mariah Lynn Schrum, Srijan Srivatsa, Laporsha Dees, Evelyn Dixon, Patricio Reyes Gomez, Deepak Gopinath, Emily Sarah Sumner, Guy Rosman, and Tiffany L. Chen. Skill Modulates Coaching Language in Embodied Motor Learning. InExtended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA), 2026

work page 2026

[38] [38]

Evaluation of the potential suitability of guide dog candi- dates by continuous observation during training.Journal of Veterinary Behavior, 3(5):193–198, 2008

Mina Mizukoshi, Mana Kondo, and Toru Nakamura. Evaluation of the potential suitability of guide dog candi- dates by continuous observation during training.Journal of Veterinary Behavior, 3(5):193–198, 2008

work page 2008

[39] [39]

Task-agnostic exoskeleton control via biological joint moment estimation.Nature, 635(8038): 337–344, 2024

Dean D Molinaro, Keaton L Scherpereel, Ethan B Schon- haut, Georgios Evangelopoulos, Max K Shepherd, and Aaron J Young. Task-agnostic exoskeleton control via biological joint moment estimation.Nature, 635(8038): 337–344, 2024

work page 2024

[40] [40]

Formalizing human-robot mutual adaptation via a bounded memory based model

Stefanos Nikolaidis, Anton Kuznetsov, David Hsu, and Siddhartha Srinivasa. Formalizing human-robot mutual adaptation via a bounded memory based model. In ACM/IEEE International Conference on Human-Robot Interaction(HRI), 2016

work page 2016

[41] [41]

Octo: An open-source generalist robot policy

Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Tobias Kreiman, Charles Xu, et al. Octo: An open-source generalist robot policy. InRobotics: Science and Systems (RSS), 2024

work page 2024

[42] [42]

GPT-4 Technical Report

OpenAI. GPT-4 technical report.arXiv preprint arXiv:2303.08774, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[43] [43]

W AFFLE: A wearable approach to bite timing estimation in robot-assisted feeding

Akhil Padmanabha, Jessie Yuan, Tanisha Mehta, Ra- jat Kumar Jenamani, Eric Hu, Victoria de Le ´on, An- thony Wertz, Janavi Gupta, Ben Dodson, Yunting Yan, Carmel Majidi, Tapomayukh Bhattacharjee, and Zackory Erickson. W AFFLE: A wearable approach to bite timing estimation in robot-assisted feeding. InACM/IEEE Inter- national Conference on Human-Robot Inte...

work page 2026

[44] [44]

Mutter, editors.Intelligent Tutoring Systems: Lessons Learned

Joseph Psotka, Leonard Daniel Massey, and Sharon A. Mutter, editors.Intelligent Tutoring Systems: Lessons Learned. 1988

work page 1988

[45] [45]

Hamlin, Lydia E

Peizhu Qian, Filip Bajraktari, Carlos Quintero-Pe ˜na, Qingxi Meng, Shannan K. Hamlin, Lydia E. Kavraki, and Vaibhav Unhelkar. ASTRID: A robotic tutor for nurse training to reduce healthcare-associated infections. InRobotics: Science and Systems (RSS), 2025

work page 2025

[46] [46]

Rafferty, Emma Brunskill, Thomas L

Anna N. Rafferty, Emma Brunskill, Thomas L. Griffiths, and Patrick Shafto. Faster teaching by pomdp planning. InArtificial Intelligence in Education, 2011

work page 2011

[47] [47]

Dragan, and Sergey Levine

Siddharth Reddy, Anca D. Dragan, and Sergey Levine. Shared autonomy via deep reinforcement learning. In Robotics: Science and Systems (RSS), 2018

work page 2018

[48] [48]

Learning physical col- laborative robot behaviors from human demonstrations

Leonel Rozo, Sylvain Calinon, Darwin G Caldwell, Pablo Jimenez, and Carme Torras. Learning physical col- laborative robot behaviors from human demonstrations. IEEE Transactions on Robotics (T-RO), 32(3):513–527, 2016

work page 2016

[49] [49]

Adaptive robot language tutoring based on bayesian knowledge tracing and predictive decision-making

Thorsten Schodde, Kirsten Bergmann, and Stefan Kopp. Adaptive robot language tutoring based on bayesian knowledge tracing and predictive decision-making. In ACM/IEEE International Conference on Human-Robot Interaction(HRI), 2017

work page 2017

[50] [50]

Autonomy in phys- ical human-robot interaction: A brief survey.IEEE Robotics and Automation Letters (RA-L), 6(4):7989– 7996, 2021

Mario Selvaggio, Marco Cognetti, Stefanos Nikolaidis, Serena Ivaldi, and Bruno Siciliano. Autonomy in phys- ical human-robot interaction: A brief survey.IEEE Robotics and Automation Letters (RA-L), 6(4):7989– 7996, 2021

work page 2021

[51] [51]

ViNT: A foundation model for visual navigation

Dhruv Shah, Ajay Sridhar, Nitish Dashora, Kyle Sta- chowicz, Kevin Black, Noriaki Hirose, and Sergey Levine. ViNT: A foundation model for visual navigation. InConference on Robot Learning (CoRL), 2023

work page 2023

[52] [52]

Hi Robot: Open-ended instruction follow- ing with hierarchical vision-language-action models

Lucy Xiaoyang Shi, brian ichter, Michael Robert Equi, Liyiming Ke, Karl Pertsch, Quan Vuong, James Tanner, Anna Walling, Haohuan Wang, Niccolo Fusai, Adrian Li- Bell, Danny Driess, Lachy Groom, Sergey Levine, and Chelsea Finn. Hi Robot: Open-ended instruction follow- ing with hierarchical vision-language-action models. In International Conference on Machi...

work page 2025

[53] [53]

Ut Na Sio and Thomas C. Ormerod. Does incubation enhance problem solving? a meta-analytic review.Psy- chological bulletin, 135 1:94–120, 2009

work page 2009

[54] [54]

Assistive teaching of motor control tasks to humans

Megha Srivastava, Erdem Biyik, Suvir Mirchandani, Noah Goodman, and Dorsa Sadigh. Assistive teaching of motor control tasks to humans. InAdvances in Neural Information Processing Systems (NeurIPS), 2022

work page 2022

[55] [55]

Generating language corrections for teaching physical control tasks

Megha Srivastava, Noah Goodman, and Dorsa Sadigh. Generating language corrections for teaching physical control tasks. InInternational Conference on Machine Learning (ICML), 2023

work page 2023

[56] [56]

Shared autonomy for proximal teaching

Megha Srivastava, Reihaneh Iranmanesh, Yuchen Cui, Deepak Gopinath, Emily Sarah Sumner, Andrew Silva, Laporsha Dees, Guy Rosman, and Dorsa Sadigh. Shared autonomy for proximal teaching. InACM/IEEE Inter- national Conference on Human-Robot Interaction(HRI), 2025

work page 2025

[57] [57]

A meta- analysis of the effectiveness of intelligent tutoring sys- tems on k–12 students’ mathematical learning.Journal of Educational Psychology, 105(4):970–987, 2013

Saiying Steenbergen-Hu and Harris Cooper. A meta- analysis of the effectiveness of intelligent tutoring sys- tems on k–12 students’ mathematical learning.Journal of Educational Psychology, 105(4):970–987, 2013

work page 2013

[58] [58]

Toward seamless human-robot handovers

Kyle Strabala, Min Kyung Lee, Anca Dragan, Jodi For- lizzi, Siddhartha S Srinivasa, Maya Cakmak, and Vin- cenzo Micelli. Toward seamless human-robot handovers. Journal of Human-Robot Interaction (JHRI), 2(1):112– 132, 2013

work page 2013

[59] [59]

Merryanna L. Swartz and Masoud Yazdani, editors.Intel- ligent Tutoring Systems for Foreign Language Learning: The Bridge to International Communication, volume 80 ofNATO ASI Series F: Computer and Systems Sciences. Springer, 1992

work page 1992

[60] [60]

Dragan, and Andrea V

Ran Tian, Masayoshi Tomizuka, Anca D. Dragan, and Andrea V . Bajcsy. Towards modeling and influencing the dynamics of human learning.ACM/IEEE International Conference on Human-Robot Interaction(HRI), 2023

work page 2023

[61] [61]

Lami: Large lan- guage models for multi-modal human-robot interaction

Chao Wang, Stephan Hasler, Daniel Tanneberg, Fe- lix Ocker, Frank Joublin, Antonello Ceravola, Joerg Deigmoeller, and Michael Gienger. Lami: Large lan- guage models for multi-modal human-robot interaction. InExtended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA), 2024

work page 2024

[62] [62]

Wilson, Yan Karklin, Bojian Han, and Chai- tanya Ekanadham

Kevin H. Wilson, Yan Karklin, Bojian Han, and Chai- tanya Ekanadham. Back to the basics: Bayesian exten- sions of irt outperform neural networks for proficiency estimation. InEducational Data Mining, 2016

work page 2016

[63] [63]

Blindness and vision impair- ment

World Health Organization. Blindness and vision impair- ment. https://www.who.int/news-room/fact-sheets/detail/ blindness-and-visual-impairment, 2026. Accessed 2026- 05-11

work page 2026

[64] [64]

SA VOR: Skill affordance learning from visuo- haptic perception for robot-assisted bite acquisition

Zhanxin Wu, Bo Ai, Tom Silver, and Tapomayukh Bhat- tacharjee. SA VOR: Skill affordance learning from visuo- haptic perception for robot-assisted bite acquisition. In Conference on Robot Learning (CoRL), 2025

work page 2025

[65] [65]

Robotic guide dog: Leading a human with leash-guided hybrid physical in- teraction

Anxing Xiao, Wenzhe Tong, Lizhi Yang, Jun Zeng, Zhongyu Li, and Koushil Sreenath. Robotic guide dog: Leading a human with leash-guided hybrid physical in- teraction. InIEEE International Conference on Robotics and Automation (ICRA), 2021

work page 2021

[66] [66]

Coach: Cooperative robot teaching

Cunjun Yu, Yiqing Xu, Linfeng Li, and David Hsu. Coach: Cooperative robot teaching. InConference on Robot Learning (CoRL), 2022

work page 2022

[67] [67]

user standing 1.5m from door, facing slightly left

G.A. Zijlstra, Judith Ballemans, and Gertrudis Kempen. Orientation and mobility training for adults with low vision: a new standardized approach.Clinical Rehabili- tation, 27:3–18, 2013. TABLE II: Parameter distributions for the Simulated Learner. Parameters are randomized per skill or per learner episode to create diverse student profiles. Parameter Dist...

work page 2013

[68] [68]

The order of A/B is randomized to prevent position bias

work page

[69] [69]

Judges are instructed to evaluate based on the accuracy of the description

work page

[70] [70]

For each model, we recruited 4 annotators to rate the pairwise comparison

work page

[71] [71]

Each human judge evaluated 50 frames

work page

[72] [72]

equally accurate

Judges were blind to which description came from human vs. VLM Inter-Rater AgreementTable V shows inter-rater agreement metrics, including Fleiss’s kappa and unanimous agreement rates when judges had clear preferences (excluding “equally accurate” responses). TABLE V:Inter-rater agreement for frame analysis evaluation.Four independent judges rated VLM-gen...

work page

[73] [73]

Frame analysis (f frame): Extract structured observations from individual frames

work page

[74] [74]

Timeline summarization (f time): Aggregate frame observations into episode-level summary

work page

[75] [75]

Coaching generation (f coach): Generate personalized feedback based on timeline and proficiency model

work page

[76] [76]

move hand forward 20cm

Robot adaptation (f param): Adjust robot parameters based on diagnosed errors Table VI summarizes the input and output contract for each stage, making the intermediate representations used by the decomposed pipeline explicit. Video SelectionWe selected 10 navigation videos where users made clear errors requiring coaching intervention: •Error types: Wrong ...

work page

[77] [77]

Please rank your top 3 most valuable features from the list above

work page

[78] [78]

What aspects of the navigation task did you find most challenging?

work page

[79] [79]

What aspects of the teaching method were most helpful for your learning?

work page

[80] [80]

Did you develop any specific strategies for successful navigation? If so, please describe them

work page