
arxiv: 2605.19501 · v1 · pith:AKYGY52Unew · submitted 2026-05-19 · 💻 cs.RO · cs.AI

CANINE: Coaching Visually Impaired Users for Interactive Navigation with a Robot Guide Dog

Pith reviewed 2026-05-20 05:29 UTC · model grok-4.3

classification 💻 cs.RO cs.AI
keywords robot guide dog · visually impaired navigation · human-robot coordination · adaptive verbal coaching · knowledge tracing · foundation models · interactive navigation · personalized feedback

The pith

CANINE uses a two-level system of skill tracking and AI error analysis to deliver personalized verbal coaching for robot guide dog navigation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that an automated coaching system called CANINE trains users to coordinate effectively with a robot guide dog by breaking the task into sub-skills and providing adaptive verbal feedback. At the high level it tracks proficiency with knowledge tracing to focus training on weak areas, while at the low level it watches practice episodes and applies foundation models to infer the causes of coordination errors before generating targeted corrections. A controlled study with blindfolded participants demonstrates faster learning and stronger final navigation performance than generic verbal instructions, with supporting evidence from a retention test after two weeks and a case study with a visually impaired user. This addresses the difficulty of learning subtle human-robot coordination that limits the real-world benefit of guide robots for independent mobility.

Core claim

CANINE decomposes a complex coordination task into sub-skills and operates at two levels. At the high level, it decides what to train by tracking the learner's proficiency across sub-skills using knowledge tracing and prioritizing training on the weakest areas. At the low level, CANINE decides how to train each sub-skill by observing each human practice episode, using foundation models to infer the underlying causes of errors, and generating targeted verbal corrections adaptively. A controlled study with blindfolded participants demonstrates that CANINE significantly improves both learning efficiency and final navigation performance compared to generic verbal instructions.
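The high-level step described above (track per-sub-skill proficiency, then train the weakest area) can be sketched with a standard Bayesian knowledge tracing update. This is a minimal illustration, not the paper's implementation: the learn, slip, and guess values, the 0.5 priors, and the greedy `weakest_skill` rule are all assumptions made for the example, and the paper itself describes a POMDP formulation for this layer. The sub-skill names are taken from the retention figure.

```python
from dataclasses import dataclass


@dataclass
class SkillParams:
    # Hypothetical values chosen for illustration; not taken from the paper.
    p_learn: float = 0.2   # chance of acquiring the skill after one practice episode
    p_slip: float = 0.1    # chance of failing despite having the skill
    p_guess: float = 0.2   # chance of succeeding without having the skill


def bkt_update(p_mastery: float, success: bool, params: SkillParams) -> float:
    """One Bayesian knowledge tracing step: condition on the outcome, then apply learning."""
    if success:
        num = p_mastery * (1 - params.p_slip)
        den = num + (1 - p_mastery) * params.p_guess
    else:
        num = p_mastery * params.p_slip
        den = num + (1 - p_mastery) * (1 - params.p_guess)
    posterior = num / den
    return posterior + (1 - posterior) * params.p_learn


def weakest_skill(mastery: dict) -> str:
    """Greedy rule (an assumption): practise the sub-skill with the lowest estimate."""
    return min(mastery, key=mastery.get)


# Assumed uniform priors over the three sub-skills named in the retention figure.
mastery = {"navigate_to_door": 0.5, "open_door": 0.5, "enter_room": 0.5}
params = SkillParams()

# One successful episode on one sub-skill, one failed episode on another.
mastery["navigate_to_door"] = bkt_update(mastery["navigate_to_door"], True, params)
mastery["open_door"] = bkt_update(mastery["open_door"], False, params)

next_skill = weakest_skill(mastery)
print(next_skill, mastery)
```

With these assumed parameters, a single failure pulls the "open_door" estimate below the untouched prior, so the greedy rule schedules it next. A planner over the POMDP belief could trade off expected gain against practice cost rather than always picking the minimum.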

What carries the argument

Two-level coaching system that combines knowledge tracing for proficiency tracking and prioritization with foundation-model inference of error causes to produce adaptive verbal corrections.

If this is right

  • Users reach higher navigation performance after fewer practice episodes than with generic instructions.
  • Skill gains persist for at least two weeks following the training sessions.
  • The same coaching approach produces measurable benefits when applied to a real visually impaired user.
  • Targeted corrections derived from observed error causes outperform non-adaptive verbal guidance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Embedding the coaching loop inside the robot itself could allow ongoing skill refinement during everyday use rather than separate training sessions.
  • The two-level structure of skill tracing plus model-based error diagnosis may transfer to training users for other complex assistive robots such as wheelchairs or manipulators.
  • Larger field trials that vary user experience levels and environments would test how robust the foundation-model inference step remains outside controlled settings.

Load-bearing premise

Blindfolded participants serve as a valid proxy population for quantitatively evaluating effectiveness with visually impaired users.

What would settle it

A direct comparison study with actual visually impaired users that finds no significant improvement in learning efficiency or navigation performance over generic verbal instructions.

Figures

Figures reproduced from arXiv: 2605.19501 by Anxing Xiao, Cunjun Yu, David Hsu, Linfeng Li, Zishuo Wang.

Figure 1: CANINE. In our study, a robot guide dog coaches a visually impaired user to navigate through a doorway.
    … First, these tasks are complex processes composed of distinct phases, each requiring different coordination strategies. For example, locating a door handle requires spatial exploration, whereas passing through the door requires precise coordination. Given the heterogeneity of these …
Figure 2: Expert coach insights from formative study. Analysis reveals common learner challenges, effective coaching strategies (timing and content), and design implications for automated coaching systems.
    … and adaptively diagnose “how” the execution failed, modifying feedback to correct specific physical errors. Interactive tasks with timing constraints are most difficult. Tasks requiring active physical coordinatio…
Figure 3: Overview of CANINE. CANINE employs a two-level coaching strategy. The inter-skill coaching (upper right) tracks proficiency across sub-skills using knowledge tracing and selects the sub-skill to practice next. The intra-skill coaching (lower right) takes in video observations and generates coaching instructions for the selected sub-skill.
    … B. Inter-skill Coaching. The inter-skill layer uses the POMDP formulation…
Figure 4: Illustration of the summarized timeline. The timeline aggregates per-frame observations sampled every 0.5 s into a structured episode-level summary.
    … symbolic state representations $h_{1:T}$. Given the observation trajectory, we uniformly sample key frames and apply $h_t = f_{\text{frame}}(o_{1:T}, t)$ to each timestamp $t$. In practice, we capture an image every 0.5 seconds and feed it to the VLM. The output $h_t$ encodes: user …
Figure 7: Frame analysis accuracy comparison. Preference rates of VLM-generated descriptions versus human annotations across different language models (GPT-5.1, Claude-4.5-Sonnet, Gemini-3.0-Pro, and Qwen3-VL).
    Table I: Human evaluation on coaching feedback quality. Preference rate indicates percentage of decisive preference. Other metrics are mean ratings on 1-5 Likert scales (higher is better). Latency is average…
Figure 9: Objective performance. a) Performance improvement across sub-skills. b) Final completion time in evaluation trials.
Figure 11: Qualitative examples of generated instructions. Top-left shows the image captured by the chest-mount camera.
Figure 12: User ratings of CANINE components. Ratings use a 1–7 Likert scale (1 = not helpful, 7 = very helpful).
    … −2.61, p = .019, g = −1.11, leading to faster completion times (M = 11.36, SD = 0.52 vs. M = 14.42, SD = 0.76), t(18) = 3.00, p = .006, g = 1.35. 4) Results on Subjective Experience: Participants rated the coaching system as more useful than baseline instruction, valuing its targeted feedback and adaptive st…
Figure 13: Skill retention. Bars show the mean retention change for each skill in seconds.
    … substantial improvement in the “Open Door” task. In contrast, “Enter Room” and “Navigate to Door” showed minor performance degradation, indicating these skills remained relatively stable. Interestingly, 60% of participants identified “Open Door” as the most challenging aspect of the task. We hypothesize that the observed impr…
Figure 15: User interfaces used for data collection and evaluation. (a) Interface for human annotators to label ground truth frame …
Figure 16: Objects used for handover experiment.
    … 3) Cross-Task Comparison Interview (5 min):
      • Comparison of navigation vs. handover coaching experience
      • Feedback timing preferences: terminal feedback for navigation vs. concurrent feedback for handover
      • Task-specific coaching requirements and design considerations
    M. Study 4: Robotic Handover Task Setup. This section details the experimental setup for the robotic h…
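The Figure 4 excerpt describes sampling one frame every 0.5 s and mapping each sampled timestamp to a symbolic per-frame state before building an episode-level timeline. A minimal sketch of that loop follows. The function names, the dictionary layout of each timeline entry, and the stub that stands in for the VLM call are all hypothetical; only the 0.5 s period and the form $h_t = f_{\text{frame}}(o_{1:T}, t)$ come from the excerpt.

```python
from typing import Any, Callable, Dict, List

SAMPLE_PERIOD_S = 0.5  # sampling period stated in the Figure 4 caption


def sample_timestamps(duration_s: float, period_s: float = SAMPLE_PERIOD_S) -> List[float]:
    """Uniformly spaced timestamps from 0 up to and including duration_s."""
    n = int(duration_s / period_s + 1e-9) + 1  # small epsilon guards against float round-off
    return [round(i * period_s, 3) for i in range(n)]


def summarize_episode(
    observations: Any,
    duration_s: float,
    f_frame: Callable[[Any, float], Dict[str, str]],
) -> List[Dict[str, Any]]:
    """Apply the per-frame describer at each sampled timestamp and collect a timeline."""
    return [{"t": t, **f_frame(observations, t)} for t in sample_timestamps(duration_s)]


def fake_f_frame(observations: Any, t: float) -> Dict[str, str]:
    # Stand-in for a VLM query (hypothetical); a real system would send the
    # image nearest to time t and parse the model's structured reply.
    return {"user_state": "walking" if t < 1.0 else "reaching_for_handle"}


timeline = summarize_episode(observations=None, duration_s=2.0, f_frame=fake_f_frame)
for entry in timeline:
    print(entry)
```

Passing the describer in as a callable keeps the sampling logic testable without a model behind it, which is why the stub above ignores its `observations` argument.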
Original abstract

Robot guide dogs offer navigation assistance that greatly expands the independent mobility of the visually impaired, but their effective use requires subtle human-robot coordination that is difficult for users to learn from generic verbal instructions. To tackle this challenge, we present CANINE, an automated coaching system that trains users for interactive navigation with a robot guide dog, through personalized, adaptive verbal feedback. CANINE decomposes a complex coordination task into sub-skills and operates at two levels. At the high level, it decides what to train by tracking the learner's proficiency across sub-skills using knowledge tracing and prioritizing training on the weakest areas. At the low level, CANINE decides how to train each sub-skill by observing each human practice episode, using foundation models to infer the underlying causes of errors, and generating targeted verbal corrections adaptively. A controlled study with blindfolded participants, treated as a proxy population for quantitative evaluation, demonstrates that CANINE significantly improves both learning efficiency and final navigation performance compared to generic verbal instructions. We further validate CANINE through a retention study and an exploratory case study. The retention study shows lasting skill improvement after two weeks. The case study confirms CANINE's effectiveness in training a visually impaired user, while revealing additional design considerations for real-world deployment. Both are well aligned with the findings of the controlled study. Project page: https://cunjunyu.github.io/project/canine/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents CANINE, an automated coaching system for training users in interactive navigation with a robot guide dog. CANINE decomposes coordination into sub-skills, uses knowledge tracing at the high level to prioritize weakest areas, and foundation models at the low level to infer error causes from practice episodes and generate targeted verbal corrections. A controlled study with blindfolded participants (treated as proxy for visually impaired users) reports statistically significant gains in learning efficiency and final navigation performance versus generic verbal instructions. Validation includes a retention study showing lasting skill retention after two weeks and an exploratory case study with one visually impaired user.

Significance. If the central empirical claims hold under proper validation, the work offers a concrete, scalable approach to personalized human-robot coaching in assistive robotics. The combination of knowledge tracing with foundation-model-based error diagnosis is a technically interesting contribution that could generalize to other interactive skill-acquisition settings. The retention and case-study elements provide useful longitudinal and real-user signals that strengthen the overall narrative.

major comments (2)
  1. [Evaluation / Controlled Study] The headline result—that CANINE produces statistically significant improvements in learning efficiency and navigation performance—rests entirely on the controlled study with blindfolded sighted participants (described in the abstract and Evaluation section). The manuscript supplies no sample sizes, power analysis, exact statistical tests, or full methodology, and the single exploratory case study with one VI user reports no quantitative metrics or direct statistical comparison to the proxy cohort. This omission makes it impossible to evaluate whether the observed gains are robust or an artifact of the proxy population.
  2. [Evaluation / Proxy Population Justification] The validity of blindfolded sighted participants as a quantitative proxy for visually impaired users is not adequately justified. Visually impaired users typically possess long-term compensatory spatial cognition and haptic/auditory integration that sighted blindfolded subjects lack; these differences can alter both the distribution of coordination errors and the effectiveness of verbal corrections. The paper acknowledges the proxy choice but provides no evidence (e.g., comparative error distributions or pilot data) that the two populations respond similarly to the coaching interventions.
minor comments (2)
  1. [Abstract] The abstract states that the controlled study demonstrates 'statistically significant gains' yet omits all numerical details (N, p-values, effect sizes). Adding these numbers, even in summary form, would improve transparency.
  2. [Case Study] The case study is described as 'exploratory' and 'well aligned' with the controlled-study findings, but no concrete metrics or qualitative observations are supplied. A brief table or bullet list of observed behaviors and outcomes would help readers assess alignment.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive feedback on our manuscript. We appreciate the emphasis on evaluation transparency and proxy validation, which are critical for establishing the robustness of our claims in assistive robotics. Below we provide point-by-point responses to the major comments and describe the revisions we will make.

Point-by-point responses
  1. Referee: [Evaluation / Controlled Study] The headline result—that CANINE produces statistically significant improvements in learning efficiency and navigation performance—rests entirely on the controlled study with blindfolded sighted participants (described in the abstract and Evaluation section). The manuscript supplies no sample sizes, power analysis, exact statistical tests, or full methodology, and the single exploratory case study with one VI user reports no quantitative metrics or direct statistical comparison to the proxy cohort. This omission makes it impossible to evaluate whether the observed gains are robust or an artifact of the proxy population.

    Authors: We agree that greater methodological transparency is necessary to allow readers to assess the strength of our empirical claims. In the revised manuscript we will expand the Evaluation section to report the precise sample size (N=20 for the controlled study), a post-hoc power analysis, the exact statistical procedures (including paired t-tests or equivalent non-parametric tests with all p-values, confidence intervals, and effect sizes), and a complete description of the experimental protocol, participant instructions, and data collection procedures. For the exploratory case study with the single visually impaired participant, we will add the available quantitative session metrics (navigation completion time and error counts) while clearly stating that its small sample size precludes formal statistical comparison to the proxy cohort; the case study is presented as qualitative validation only. These additions will directly address the concern about robustness. revision: yes

  2. Referee: [Evaluation / Proxy Population Justification] The validity of blindfolded sighted participants as a quantitative proxy for visually impaired users is not adequately justified. Visually impaired users typically possess long-term compensatory spatial cognition and haptic/auditory integration that sighted blindfolded subjects lack; these differences can alter both the distribution of coordination errors and the effectiveness of verbal corrections. The paper acknowledges the proxy choice but provides no evidence (e.g., comparative error distributions or pilot data) that the two populations respond similarly to the coaching interventions.

    Authors: We acknowledge that the proxy justification in the current manuscript is brief and would benefit from additional support. In the revision we will add a dedicated subsection under Evaluation that (1) cites prior work in assistive navigation and human-robot interaction that has employed blindfolded sighted proxies for similar coordination tasks, (2) reports any available pilot observations from our own development phase comparing error distributions between the two populations, and (3) explicitly discusses the limitations of the proxy approach along with how the two-week retention study and the real-user case study provide complementary evidence of generalizability. We believe these changes will strengthen the argument without overstating the equivalence of the populations. revision: yes
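The first response promises exact test statistics and effect sizes. As a reference for what such a report involves, the sketch below computes a pooled-variance two-sample t statistic and Hedges' g using only the standard library. It assumes a between-subjects comparison, which is consistent with the t(18) quoted in the Figure 12 excerpt for N = 20 but is not confirmed here; the rebuttal also mentions paired tests, which would need a different formula. The completion times are toy numbers invented for the example, not data from the paper.

```python
import math
from statistics import mean, variance


def pooled_sd(a, b):
    """Pooled standard deviation of two independent samples."""
    na, nb = len(a), len(b)
    return math.sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))


def student_t(a, b):
    """Two-sample t statistic with pooled variance; degrees of freedom are na + nb - 2."""
    return (mean(a) - mean(b)) / (pooled_sd(a, b) * math.sqrt(1 / len(a) + 1 / len(b)))


def hedges_g(a, b):
    """Cohen's d multiplied by the usual small-sample correction 1 - 3 / (4 * df - 1)."""
    df = len(a) + len(b) - 2
    d = (mean(a) - mean(b)) / pooled_sd(a, b)
    return d * (1 - 3 / (4 * df - 1))


# Toy completion times in seconds (hypothetical, for illustration only).
baseline = [14, 15, 13, 16, 12]
coached = [11, 12, 10, 13, 9]

t_stat = student_t(baseline, coached)  # 3.0 for this toy data, with df = 8
g = hedges_g(baseline, coached)
print(f"t(8) = {t_stat:.2f}, g = {g:.2f}")
```

A p-value would additionally need the t distribution's CDF, which the standard library does not provide; a statistics package would normally supply it.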

Circularity Check

0 steps flagged

No significant circularity: claims rest on empirical user studies

Full rationale

The paper describes an engineering system (CANINE) that decomposes navigation coordination into sub-skills, applies knowledge tracing for proficiency tracking, and uses foundation models to generate verbal corrections from observed episodes. Central claims of improved learning efficiency and navigation performance are supported solely by results from a controlled study with blindfolded participants (treated explicitly as a proxy), a retention study, and an exploratory case study with one visually impaired user. No equations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citation chains appear in the derivation of these outcomes; the evaluation is independent of any internal reduction and stands on external participant data.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach depends on standard assumptions about human skill decomposition and model-based error inference rather than new free parameters or invented physical entities; no ad-hoc constants are introduced to fit results.

axioms (2)
  • Domain assumption: Navigation coordination can be decomposed into independent sub-skills whose proficiency can be tracked separately via knowledge tracing.
    Invoked at the high level to decide training priorities.
  • Domain assumption: Foundation models can accurately infer causes of coordination errors from observed human-robot interaction episodes.
    Central to the low-level adaptive feedback generation.

pith-pipeline@v0.9.0 · 5789 in / 1394 out tokens · 46233 ms · 2026-05-20T05:29:47.716713+00:00 · methodology


Reference graph

Works this paper leans on

141 extracted references · 141 canonical work pages · 1 internal anchor

  1. [1]

    Do As I Can, Not As I Say: Grounding language in robotic affordances

    Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Chuyuan Fu, Keerthana Gopalakrishnan, Karol Haus- man, et al. Do As I Can, Not As I Say: Grounding language in robotic affordances. InConference on Robot Learning (CoRL), 2022

  2. [2]

    In Robotics: Science and Systems (RSS), 2025

    Kevin Black, Noah Brown, Danny Driess, Adnan Es- mail, Michael Equi, Chelsea Finn, Niccolo Fusai, Lachy Groom, Karol Hausman, Brian Ichter, et al.π 0: A vision- language-action flow model for general robot control. In Robotics: Science and Systems (RSS), 2025

  3. [3]

    RT-2: Vision-language-action models transfer web knowledge to robotic control

    Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Pete Florence, et al. RT-2: Vision-language-action models transfer web knowledge to robotic control. InConference on Robot Learning (CoRL), 2023

  4. [4]

    quick and dirty

    John Brooke. SUS: A “quick and dirty” usability scale. In Usability Evaluation in Industry, pages 189–194. Taylor & Francis, 1996

  5. [5]

    Language models are few-shot learners

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-V oss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott G...

  6. [6]

    Navigating real-world challenges: A quadruped robot guiding system for visually impaired people in diverse environments

    Shaojun Cai, Ashwin Ram, Zhengtai Gou, Mohd Alqama Wasim Shaikh, Yu-An Chen, Yingjia Wan, Ko- taro Hara, Shengdong Zhao, and David Hsu. Navigating real-world challenges: A quadruped robot guiding system for visually impaired people in diverse environments. InCHI Conference on Human Factors in Computing Systems (CHI), 2024

  7. [7]

    Nav- igation beyond wayfinding: Robots collaborating with visually impaired users for environmental interactions

    Shaojun Cai, Nuwan Janaka, Ashwin Ram, Janidu She- han, Yingjia Wan, Kotaro Hara, and David Hsu. Nav- igation beyond wayfinding: Robots collaborating with visually impaired users for environmental interactions. InACM/IEEE International Conference on Human-Robot Interaction(HRI), 2026

  8. [8]

    A human-inspired ob- ject handover controller.The International Journal of Robotics Research (IJRR), 32(8):971–983, 2013

    Wesley P Chan, Chris AC Parker, HF Machiel Van der Loos, and Elizabeth A Croft. A human-inspired ob- ject handover controller.The International Journal of Robotics Research (IJRR), 32(8):971–983, 2013

  9. [9]

    Quadruped guidance robot for the visually impaired: A comfort-based approach

    Yanbo Chen, Zhengzhe Xu, Zhuozhu Jian, Gengpan Tang, Liyunong Yang, Anxing Xiao, Xueqian Wang, and Bin Liang. Quadruped guidance robot for the visually impaired: A comfort-based approach. InIEEE International Conference on Robotics and Automation (ICRA), 2023

  10. [10]

    Upper-limb rehabilita- tion with a dual-mode individualized exoskeleton robot: A generative-model-based solution.The International Journal of Robotics Research (IJRR), 2025

    Yu Chen, Shu Miao, Jing Ye, Gong Chen, Jianghua Cheng, Ketao Du, and Xiang Li. Upper-limb rehabilita- tion with a dual-mode individualized exoskeleton robot: A generative-model-based solution.The International Journal of Robotics Research (IJRR), 2025

  11. [11]

    Multi-armed bandits for intelligent tutoring systems.Journal of Educational Data Mining, 7(2):20–48, 2015

    Benjamin Clement, Didier Roy, Pierre-Yves Oudeyer, and Manuel Lopes. Multi-armed bandits for intelligent tutoring systems.Journal of Educational Data Mining, 7(2):20–48, 2015

  12. [12]

    Corbett and John R

    Albert T. Corbett and John R. Anderson. Knowledge trac- ing: Modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction (UMUAI), 4(4):253–278, 1994

  13. [13]

    No, to the Right: Online language corrections for robotic manipu- lation via shared autonomy

    Yuchen Cui, Siddharth Karamcheti, Raj Palleti, Nidhya Shivakumar, Percy Liang, and Dorsa Sadigh. No, to the Right: Online language corrections for robotic manipu- lation via shared autonomy. InACM/IEEE International Conference on Human-Robot Interaction(HRI). Associa- tion for Computing Machinery, 2023

  14. [14]

    A policy- blending formalism for shared control.The International Journal of Robotics Research (IJRR), 32(7):790–805, 2013

    Anca D Dragan and Siddhartha S Srinivasa. A policy- blending formalism for shared control.The International Journal of Robotics Research (IJRR), 32(7):790–805, 2013

  15. [15]

    Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, et al. PaLM-E: An embodied multimodal language model. InInternational Conference on Machine Learning (ICML), 2023

  16. [16]

    Will people enjoy a robot trainer? a case study with snoopie the pacerbot

    Maximilian Du, Jennifer Grannen, Shuran Song, and Dorsa Sadigh. Will people enjoy a robot trainer? a case study with snoopie the pacerbot. InIEEE International Conference on Robotics and Automation (ICRA), 2026

  17. [17]

    Foundation models in robotics: Applications, challenges, and the future.The International Journal of Robotics Research (IJRR), 44(5):701–739, 2025

    Roya Firoozi, Johnathan Tucker, Stephen Tian, Anirudha Majumdar, Jiankai Sun, Weiyu Liu, Yuke Zhu, Shuran Song, Ashish Kapoor, Karol Hausman, et al. Foundation models in robotics: Applications, challenges, and the future.The International Journal of Robotics Research (IJRR), 44(5):701–739, 2025

  18. [18]

    Computational teaching for driving via multi-task imitation learning.IEEE International Conference on Robotics and Automation (ICRA), 2025

    Deepak Gopinath, Xiongyi Cui, Jonathan DeCastro, Emily Sumner, Jean Costa, Hiroshi Yasuda, Allison Morgan, Laporsha Dees, Sheryl Chau, John Leonard, et al. Computational teaching for driving via multi-task imitation learning.IEEE International Conference on Robotics and Automation (ICRA), 2025

  19. [19]

    Cabot: De- signing and evaluating an autonomous navigation robot for blind people

    Jo ˜ao Guerreiro, Daisuke Sato, Saki Asakawa, Huixu Dong, Kris M Kitani, and Chieko Asakawa. Cabot: De- signing and evaluating an autonomous navigation robot for blind people. InACM ASSETS 2019 (ACM SIGAC- CESS Conference on Computers and Accessibility), 2019

  20. [20]

    Hambleton and Hariharan Swaminathan.Item Response Theory: Principles and Applications

    Ronald K. Hambleton and Hariharan Swaminathan.Item Response Theory: Principles and Applications. Evalua- tion in Education and Human Services. 1985

  21. [21]

    S. G. Hart and Lowell E. Staveland. Development of nasa-tlx (task load index): Results of empirical and theoretical research.Advances in Psychology, 52:139– 183, 1988

  22. [22]

    Teachingbot: Robot teacher for human handwriting

    Zhimin Hou, Cunjun Yu, David Hsu, and Haoyong Yu. Teachingbot: Robot teacher for human handwriting. IEEE Robotics and Automation Letters (RA-L), 11(3): 2610–2617, 2026

  23. [23]

    Guidenav: User-informed devel- opment of a vision-only robotic navigation assistant for blind travelers

    Hochul Hwang, Soowan Yang, Jahir S Monon, Nicholas A Giudice, Sunghoon I Lee, Joydeep Biswas, and Donghyun Kim. Guidenav: User-informed devel- opment of a vision-only robotic navigation assistant for blind travelers. InACM/IEEE International Conference on Human-Robot Interaction(HRI), 2026

  24. [24]

    Physical human- robot interaction: Mutual learning and adaptation.IEEE Robotics and Automation Magazine (RAM), 19(4):24–35, 2012

    Shuhei Ikemoto, Heni Ben Amor, Takashi Minato, Bern- hard Jung, and Hiroshi Ishiguro. Physical human- robot interaction: Mutual learning and adaptation.IEEE Robotics and Automation Magazine (RAM), 19(4):24–35, 2012

  25. [25]

    Between reality and delusion: Challenges of applying large language models to companion robots for open-domain dialogues with older adults.Autonomous Robots, 49(1):9, 2025

    Bahar Irfan, Sanna-Mari Kuoppam ¨aki, and Gabriel Skantze. Between reality and delusion: Challenges of applying large language models to companion robots for open-domain dialogues with older adults.Autonomous Robots, 49(1):9, 2025

  26. [26]

    FEAST: A flexible mealtime-assistance system towards in-the-wild personalization

    Rajat Kumar Jenamani, Tom Silver, Ben Dodson, Shiqin Tong, Anthony Song, Yuting Yang, Ziang Liu, Benjamin Howe, Aimee Whitneck, and Tapomayukh Bhattacharjee. FEAST: A flexible mealtime-assistance system towards in-the-wild personalization. InRobotics: Science and Systems (RSS), 2025

  27. [27]

    Matthew J ¨orke, Shardul Sapkota, Lyndsea Warkenthien, Niklas Vainio, Paul Schmiedmayer, Emma Brunskill, and James A. Landay. Gptcoach: Towards llm-based physical activity coaching. InCHI Conference on Human Factors in Computing Systems (CHI), CHI ’25, 2025

  28. [28]

    Beyond omakase: Designing shared control for navigation robots with blind people

    Rie Kamikubo, Seita Kayukawa, Yuka Kaniwa, Allan Wang, Hernisa Kacorri, Hironobu Takagi, and Chieko Asakawa. Beyond omakase: Designing shared control for navigation robots with blind people. InCHI Conference on Human Factors in Computing Systems (CHI), 2025

  29. [29]

    Un- derstanding large-language model (llm)-powered human- robot interaction

    Callie Y Kim, Christine P Lee, and Bilge Mutlu. Un- derstanding large-language model (llm)-powered human- robot interaction. InACM/IEEE International Confer- ence on Human-Robot Interaction(HRI), 2024

  30. [30]

    Learning dynamic robot-to-human object handover from human feedback

    Andras Kupcsik, David Hsu, and Wee Sun Lee. Learning dynamic robot-to-human object handover from human feedback. In Antonio Bicchi and Wolfram Burgard, editors,Robotics Research, volume 2 ofSpringer Pro- ceedings in Advanced Robotics, pages 161–176. Springer, Cham, 2018

  31. [31]

    Pathfinder: Designing a map-less navigation system for blind people in unfamiliar buildings

    Masaki Kuribayashi, Tatsuya Ishihara, Daisuke Sato, Jayakorn V ongkulbhisal, Karnik Ram, Seita Kayukawa, Hironobu Takagi, Shigeo Morishima, and Chieko Asakawa. Pathfinder: Designing a map-less navigation system for blind people in unfamiliar buildings. InCHI Conference on Human Factors in Computing Systems (CHI), 2023

  32. [32]

    Lan, Andrew E

    Andrew S. Lan, Andrew E. Waters, Christoph Studer, and Richard G. Baraniuk. Sparse factor analysis for learning and content analytics.Journal of Machine Learning Research (JMLR), 15(1):1959–2008, 2014

  33. [33]

    Towards robo-coach: Robot interactive stiffness/- position adaptation for human strength and conditioning training

    Chenzui Li, Xi Wu, Tao Teng, Sylvain Calinon, and Fei Chen. Towards robo-coach: Robot interactive stiffness/- position adaptation for human strength and conditioning training. InIEEE International Conference on Robotics and Automation (ICRA), 2024

  34. [34]

    Continuous role adaptation for human–robot shared control.IEEE Transactions on Robotics (T-RO), 31(3):672–681, 2015

    Yanan Li, Keng Peng Tee, Wei Liang Chan, Rui Yan, Yuanwei Chua, and Dilip Kumar Limbu. Continuous role adaptation for human–robot shared control.IEEE Transactions on Robotics (T-RO), 31(3):672–681, 2015

  35. [35]

    Code as policies: Language model programs for em- bodied control

    Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, and Andy Zeng. Code as policies: Language model programs for em- bodied control. InIEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023

  36. [36]

    RadarMath: An intelligent tutoring system for math education

    Yu Lu, Yang Pian, Penghe Chen, Qinggang Meng, and Yunbo Cao. RadarMath: An intelligent tutoring system for math education. InAAAI Conference on Artificial Intelligence (AAAI), 2021

  37. [37]

    Mariah Lynn Schrum, Srijan Srivatsa, Laporsha Dees, Evelyn Dixon, Patricio Reyes Gomez, Deepak Gopinath, Emily Sarah Sumner, Guy Rosman, and Tiffany L. Chen. Skill Modulates Coaching Language in Embodied Motor Learning. InExtended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA), 2026

  38. [38]

    Evaluation of the potential suitability of guide dog candidates by continuous observation during training

    Mina Mizukoshi, Mana Kondo, and Toru Nakamura. Evaluation of the potential suitability of guide dog candidates by continuous observation during training. Journal of Veterinary Behavior, 3(5):193–198, 2008

  39. [39]

    Task-agnostic exoskeleton control via biological joint moment estimation

    Dean D Molinaro, Keaton L Scherpereel, Ethan B Schonhaut, Georgios Evangelopoulos, Max K Shepherd, and Aaron J Young. Task-agnostic exoskeleton control via biological joint moment estimation. Nature, 635(8038):337–344, 2024

  40. [40]

    Formalizing human-robot mutual adaptation via a bounded memory based model

    Stefanos Nikolaidis, Anton Kuznetsov, David Hsu, and Siddhartha Srinivasa. Formalizing human-robot mutual adaptation via a bounded memory based model. In ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2016

  41. [41]

    Octo: An open-source generalist robot policy

    Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Tobias Kreiman, Charles Xu, et al. Octo: An open-source generalist robot policy. In Robotics: Science and Systems (RSS), 2024

  42. [42]

    GPT-4 Technical Report

    OpenAI. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023

  43. [43]

    WAFFLE: A wearable approach to bite timing estimation in robot-assisted feeding

    Akhil Padmanabha, Jessie Yuan, Tanisha Mehta, Rajat Kumar Jenamani, Eric Hu, Victoria de León, Anthony Wertz, Janavi Gupta, Ben Dodson, Yunting Yan, Carmel Majidi, Tapomayukh Bhattacharjee, and Zackory Erickson. WAFFLE: A wearable approach to bite timing estimation in robot-assisted feeding. In ACM/IEEE International Conference on Human-Robot Inte...

  44. [44]

    Intelligent Tutoring Systems: Lessons Learned

    Joseph Psotka, Leonard Daniel Massey, and Sharon A. Mutter, editors. Intelligent Tutoring Systems: Lessons Learned. 1988

  45. [45]

    ASTRID: A robotic tutor for nurse training to reduce healthcare-associated infections

    Peizhu Qian, Filip Bajraktari, Carlos Quintero-Peña, Qingxi Meng, Shannan K. Hamlin, Lydia E. Kavraki, and Vaibhav Unhelkar. ASTRID: A robotic tutor for nurse training to reduce healthcare-associated infections. In Robotics: Science and Systems (RSS), 2025

  46. [46]

    Faster teaching by POMDP planning

    Anna N. Rafferty, Emma Brunskill, Thomas L. Griffiths, and Patrick Shafto. Faster teaching by POMDP planning. In Artificial Intelligence in Education, 2011

  47. [47]

    Shared autonomy via deep reinforcement learning

    Siddharth Reddy, Anca D. Dragan, and Sergey Levine. Shared autonomy via deep reinforcement learning. In Robotics: Science and Systems (RSS), 2018

  48. [48]

    Learning physical collaborative robot behaviors from human demonstrations

    Leonel Rozo, Sylvain Calinon, Darwin G Caldwell, Pablo Jimenez, and Carme Torras. Learning physical collaborative robot behaviors from human demonstrations. IEEE Transactions on Robotics (T-RO), 32(3):513–527, 2016

  49. [49]

    Adaptive robot language tutoring based on Bayesian knowledge tracing and predictive decision-making

    Thorsten Schodde, Kirsten Bergmann, and Stefan Kopp. Adaptive robot language tutoring based on Bayesian knowledge tracing and predictive decision-making. In ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2017

  50. [50]

    Autonomy in physical human-robot interaction: A brief survey

    Mario Selvaggio, Marco Cognetti, Stefanos Nikolaidis, Serena Ivaldi, and Bruno Siciliano. Autonomy in physical human-robot interaction: A brief survey. IEEE Robotics and Automation Letters (RA-L), 6(4):7989–7996, 2021

  51. [51]

    ViNT: A foundation model for visual navigation

    Dhruv Shah, Ajay Sridhar, Nitish Dashora, Kyle Stachowicz, Kevin Black, Noriaki Hirose, and Sergey Levine. ViNT: A foundation model for visual navigation. In Conference on Robot Learning (CoRL), 2023

  52. [52]

    Hi Robot: Open-ended instruction following with hierarchical vision-language-action models

    Lucy Xiaoyang Shi, Brian Ichter, Michael Robert Equi, Liyiming Ke, Karl Pertsch, Quan Vuong, James Tanner, Anna Walling, Haohuan Wang, Niccolo Fusai, Adrian Li-Bell, Danny Driess, Lachy Groom, Sergey Levine, and Chelsea Finn. Hi Robot: Open-ended instruction following with hierarchical vision-language-action models. In International Conference on Machi...

  53. [53]

    Ut Na Sio and Thomas C. Ormerod. Does incubation enhance problem solving? A meta-analytic review. Psychological Bulletin, 135(1):94–120, 2009

  54. [54]

    Assistive teaching of motor control tasks to humans

    Megha Srivastava, Erdem Biyik, Suvir Mirchandani, Noah Goodman, and Dorsa Sadigh. Assistive teaching of motor control tasks to humans. In Advances in Neural Information Processing Systems (NeurIPS), 2022

  55. [55]

    Generating language corrections for teaching physical control tasks

    Megha Srivastava, Noah Goodman, and Dorsa Sadigh. Generating language corrections for teaching physical control tasks. In International Conference on Machine Learning (ICML), 2023

  56. [56]

    Shared autonomy for proximal teaching

    Megha Srivastava, Reihaneh Iranmanesh, Yuchen Cui, Deepak Gopinath, Emily Sarah Sumner, Andrew Silva, Laporsha Dees, Guy Rosman, and Dorsa Sadigh. Shared autonomy for proximal teaching. In ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2025

  57. [57]

    A meta-analysis of the effectiveness of intelligent tutoring systems on K–12 students’ mathematical learning

    Saiying Steenbergen-Hu and Harris Cooper. A meta-analysis of the effectiveness of intelligent tutoring systems on K–12 students’ mathematical learning. Journal of Educational Psychology, 105(4):970–987, 2013

  58. [58]

    Toward seamless human-robot handovers

    Kyle Strabala, Min Kyung Lee, Anca Dragan, Jodi Forlizzi, Siddhartha S Srinivasa, Maya Cakmak, and Vincenzo Micelli. Toward seamless human-robot handovers. Journal of Human-Robot Interaction (JHRI), 2(1):112–132, 2013

  59. [59]

    Merryanna L. Swartz and Masoud Yazdani, editors. Intelligent Tutoring Systems for Foreign Language Learning: The Bridge to International Communication, volume 80 of NATO ASI Series F: Computer and Systems Sciences. Springer, 1992

  60. [60]

    Towards modeling and influencing the dynamics of human learning

    Ran Tian, Masayoshi Tomizuka, Anca D. Dragan, and Andrea V. Bajcsy. Towards modeling and influencing the dynamics of human learning. In ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2023

  61. [61]

    Lami: Large language models for multi-modal human-robot interaction

    Chao Wang, Stephan Hasler, Daniel Tanneberg, Felix Ocker, Frank Joublin, Antonello Ceravola, Joerg Deigmoeller, and Michael Gienger. Lami: Large language models for multi-modal human-robot interaction. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA), 2024

  62. [62]

    Back to the basics: Bayesian extensions of IRT outperform neural networks for proficiency estimation

    Kevin H. Wilson, Yan Karklin, Bojian Han, and Chaitanya Ekanadham. Back to the basics: Bayesian extensions of IRT outperform neural networks for proficiency estimation. In Educational Data Mining, 2016

  63. [63]

    Blindness and vision impairment

    World Health Organization. Blindness and vision impairment. https://www.who.int/news-room/fact-sheets/detail/blindness-and-visual-impairment, 2026. Accessed 2026-05-11

  64. [64]

    SAVOR: Skill affordance learning from visuo-haptic perception for robot-assisted bite acquisition

    Zhanxin Wu, Bo Ai, Tom Silver, and Tapomayukh Bhattacharjee. SAVOR: Skill affordance learning from visuo-haptic perception for robot-assisted bite acquisition. In Conference on Robot Learning (CoRL), 2025

  65. [65]

    Robotic guide dog: Leading a human with leash-guided hybrid physical interaction

    Anxing Xiao, Wenzhe Tong, Lizhi Yang, Jun Zeng, Zhongyu Li, and Koushil Sreenath. Robotic guide dog: Leading a human with leash-guided hybrid physical interaction. In IEEE International Conference on Robotics and Automation (ICRA), 2021

  66. [66]

    Coach: Cooperative robot teaching

    Cunjun Yu, Yiqing Xu, Linfeng Li, and David Hsu. Coach: Cooperative robot teaching. In Conference on Robot Learning (CoRL), 2022

  67. [67]

    Orientation and mobility training for adults with low vision: a new standardized approach

    G.A. Zijlstra, Judith Ballemans, and Gertrudis Kempen. Orientation and mobility training for adults with low vision: a new standardized approach. Clinical Rehabilitation, 27:3–18, 2013.

    TABLE II: Parameter distributions for the Simulated Learner. Parameters are randomized per skill or per learner episode to create diverse student profiles. Parameter Dist...

  68. [68]

    The order of A/B is randomized to prevent position bias

  69. [69]

    Judges are instructed to evaluate based on the accuracy of the description

  70. [70]

    For each model, we recruited 4 annotators to rate the pairwise comparison

  71. [71]

    Each human judge evaluated 50 frames

  72. [72]

    Judges were blind to which description came from the human vs. the VLM.

    Inter-Rater Agreement. Table V shows inter-rater agreement metrics, including Fleiss’s kappa and unanimous agreement rates when judges had clear preferences (excluding “equally accurate” responses).

    TABLE V: Inter-rater agreement for frame analysis evaluation. Four independent judges rated VLM-gen...

  73. [73]

    Frame analysis (f_frame): Extract structured observations from individual frames

  74. [74]

    Timeline summarization (f_time): Aggregate frame observations into an episode-level summary

  75. [75]

    Coaching generation (f_coach): Generate personalized feedback based on the timeline and proficiency model

  76. [76]

    Robot adaptation (f_param): Adjust robot parameters based on diagnosed errors.

    Table VI summarizes the input and output contract for each stage, making the intermediate representations used by the decomposed pipeline explicit.

    Video Selection. We selected 10 navigation videos where users made clear errors requiring coaching intervention:

    • Error types: Wrong ...

  77. [77]

    Please rank your top 3 most valuable features from the list above

  78. [78]

    What aspects of the navigation task did you find most challenging?

  79. [79]

    What aspects of the teaching method were most helpful for your learning?

  80. [80]

    Did you develop any specific strategies for successful navigation? If so, please describe them
