Achieving Interaction Fluidity in a Wizard-of-Oz Robotic System: A Prototype for Fluid Error-Correction

Carlos Baptista De Lima; Frank Förster; Julian Hough; Patrick Holthaus; Yongjun Zheng

arXiv: 2604.19374 · v1 · submitted 2026-04-21 · 💻 cs.RO


Pith reviewed 2026-05-10 02:21 UTC · model grok-4.3

classification 💻 cs.RO

keywords Wizard-of-Oz · Human-Robot Interaction · fluid interaction · error correction · virtual reality · robot simulation · mobile manipulators


The pith

A VR simulation environment for robots meets criteria for fluid Wizard-of-Oz error correction through interruptibility, pollability and precise logging.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that current Wizard-of-Oz systems for robot prototyping make interactions feel laboured and frustrating because they lack support for natural corrections during speech-based exchanges. It proposes four specific properties as necessary for fluidity: interruptibility and correction, pollability, latency measurement with optimisation, and time-accurate reproducibility of actions from logs. These properties would let an operator seamlessly stop, fix mistakes, query the system, and replay events exactly as they occurred. The authors describe a virtual reality simulation for mobile manipulators that implements all four properties to enable better HRI development and data collection.
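As an editorial illustration (not code from the paper), the first two properties can be sketched in a few lines of Python. The class name `InterruptibleAction`, its methods, and the layout of the log tuples are all hypothetical; the paper's VR implementation is not described at this level. The sketch assumes an action made of discrete steps that checks a stop flag before each step, so an operator can interrupt it, change the target, resume it, and poll its state at any time.

```python
import threading
import time
from dataclasses import dataclass, field


@dataclass
class InterruptibleAction:
    """Hypothetical step-based robot action; not taken from the paper."""
    name: str
    target: str
    steps: int = 5
    progress: int = 0
    log: list = field(default_factory=list)
    _stop: threading.Event = field(
        default_factory=threading.Event, init=False, repr=False
    )

    def _record(self, event: str) -> None:
        # Monotonic timestamps are unaffected by wall-clock adjustments.
        self.log.append((time.monotonic(), event, self.target, self.progress))

    def run(self, step_duration: float = 0.0) -> str:
        """Execute the remaining steps, checking the stop flag before each."""
        self._record("start")
        while self.progress < self.steps:
            if self._stop.is_set():
                self._record("interrupted")
                return "interrupted"
            time.sleep(step_duration)  # stand-in for one unit of motion
            self.progress += 1
        self._record("done")
        return "done"

    def interrupt(self) -> None:
        """Ask the action to stop before its next step (thread-safe)."""
        self._stop.set()

    def correct(self, new_target: str) -> None:
        """Swap the goal and clear the stop flag so run() can be called again."""
        self.target = new_target
        self._stop.clear()
        self._record("corrected")

    def poll(self) -> dict:
        """Return a snapshot of the current state without blocking."""
        return {
            "name": self.name,
            "target": self.target,
            "progress": self.progress,
            "stopping": self._stop.is_set(),
        }
```

In this sketch the stop flag is only checked between steps, so the delay before an interrupt takes effect is bounded by one step duration. Calling `run()` again after `correct()` resumes from the saved progress with the new target.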

Core claim

Based on previous systems, we propose the properties of interruptibility and correction (IaC), pollability, latency measurement and optimisation and time-accurate reproducibility of actions from logging data as key criteria for a fluid WoZ system to support fluid error correction. We finish by presenting a Virtual Reality (VR) HRI simulation environment for mobile manipulators which meets these criteria.

What carries the argument

The interruptibility and correction (IaC) property together with pollability, latency measurement and optimisation, and time-accurate reproducibility from logging data, all realised inside a VR HRI simulation environment for mobile manipulators.
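Time-accurate reproducibility from logs can be illustrated with a small replay loop. This is an editorial sketch under stated assumptions, not the authors' method: the `(timestamp, action)` log format, the `replay` function, and its `execute`, `clock` and `sleep` parameters are hypothetical. Each action is scheduled against an absolute deadline derived from the first log entry, so timing errors do not accumulate from one action to the next.

```python
import time
from typing import Callable, Iterable

LogEntry = tuple[float, str]  # (timestamp in seconds, action label); assumed format


def replay(
    log: Iterable[LogEntry],
    execute: Callable[[str], None],
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> list[float]:
    """Re-run logged actions with their original spacing.

    Returns the timing error (actual minus due time, in seconds) per action.
    """
    entries = sorted(log)
    if not entries:
        return []
    log_start = entries[0][0]
    replay_start = clock()
    errors = []
    for stamp, action in entries:
        # Absolute deadline: offset from the first entry, not from the
        # previous action, so lateness in one step does not shift the rest.
        due = replay_start + (stamp - log_start)
        remaining = due - clock()
        if remaining > 0:
            sleep(remaining)
        execute(action)
        errors.append(clock() - due)
    return errors
```

The returned errors give a direct measure of how closely a replay matched the logged timing, which is the kind of evidence a reproducibility claim would need.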

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar simulation platforms could let researchers test robot behaviours without physical hardware, reducing setup time for HRI experiments.
The criteria might apply to non-VR simulators or even live robot setups to improve real-world Wizard-of-Oz sessions.
Measuring actual user frustration before and after adding these properties would test whether the technical features translate to perceived fluidity.

Load-bearing premise

The listed properties are the critical and sufficient criteria for fluid interaction in Wizard-of-Oz systems, and the VR prototype actually delivers those properties when used.

What would settle it

A user study in which participants still experience persistent delays, failed corrections, or non-reproducible logs while using the VR system.

Figures

Figures reproduced from arXiv: 2604.19374 by Carlos Baptista De Lima, Frank Förster, Julian Hough, Patrick Holthaus, Yongjun Zheng.

**Figure 1.** A diagram showing the system architecture of the proposed VR-HRI system. Hardware interfaces are blue boxes, …

Original abstract

Achieving truly fluid interaction with robots with speech interfaces remains a hard problem, and the experience of current Human-Robot Interaction (HRI) remains laboured and frustrating. Some of the barriers to fluid interaction stem from a lack of a suitable development platform for HRI for improving interaction, even in robotic Wizard-of-Oz (WoZ) modes of operation used for data collection and prototyping. Based on previous systems, we propose the properties of interruptibility and correction (IaC), pollability, latency measurement and optimisation and time-accurate reproducibility of actions from logging data as key criteria for a fluid WoZ system to support fluid error correction. We finish by presenting a Virtual Reality (VR) HRI simulation environment for mobile manipulators which meets these criteria.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper names four practical criteria for fluid WoZ error correction and claims a VR prototype meets them, but supplies no metrics or implementation details to support that claim.


The main thing to know is that the authors define four properties they see as essential for fluid error correction in Wizard-of-Oz robotic setups and then claim their VR simulation meets all of them. That claim is the core of the paper, yet it comes without supporting measurements or implementation specifics.

What the paper does well is to pull together practical requirements from prior WoZ work and focus them on interruptibility with correction, pollability for quick responses, latency handling, and accurate replay from logs. These seem like reasonable targets for anyone trying to collect natural speech data with robots, especially mobile manipulators where timing matters for physical actions. The VR angle is a solid choice for controlled prototyping without hardware risks. They also do a good job of explaining the problem: current systems make interactions feel laboured because corrections and interruptions are hard to handle smoothly. Naming these as key criteria gives the community something concrete to aim for or debate.

The soft spots are right where the stress test points. The paper asserts that the VR environment achieves low latency, reliable interrupts, and reproducible timing, but supplies no numbers on end-to-end delays, no description of how speech input is routed to allow mid-action corrections, and no logs or examples showing sub-frame accuracy in replays. Without those, the claim that it meets these criteria stays untested. This is a system paper, so readers expect at least basic verification or architecture details to judge if the properties hold in practice.

This work is for HRI researchers who build or use WoZ tools for speech interface development. Someone looking for a new simulation platform or a checklist for their own setup could find the criteria list helpful as a starting point.

It deserves a serious referee. The ideas on fluidity criteria are worth community input, and reviewers could ask for the missing details that would make the prototype claim credible. I would recommend sending it for review rather than desk rejecting, with the expectation that the authors add concrete evidence of the claimed properties.

Referee Report

2 major / 2 minor

Summary. The paper identifies four properties—interruptibility and correction (IaC), pollability, latency measurement and optimisation, and time-accurate reproducibility of logged actions—as necessary criteria for fluid Wizard-of-Oz (WoZ) robotic interaction, especially to support error correction. It concludes by describing a VR-based HRI simulation environment for mobile manipulators that is asserted to satisfy these criteria.

Significance. If the VR prototype were shown to deliver the listed properties with measurable performance, the work would supply a concrete development platform that could accelerate prototyping of fluid speech-based HRI. The explicit enumeration of IaC, pollability, and reproducible logging as design targets is a useful conceptual contribution even if the implementation details remain to be verified.

major comments (2)

[section presenting the VR HRI simulation environment] The central claim that the presented VR HRI simulation environment meets the IaC, pollability, latency-optimisation, and reproducibility criteria (stated in the abstract and reiterated in the final section) is unsupported by any quantitative evidence. No measured end-to-end latency values, interrupt success rates, poll-response timings, or timestamp-accuracy statistics are reported, nor are implementation specifics (e.g., how speech interrupts are routed inside the VR loop or how logged actions are replayed with sub-frame timing) supplied. This absence renders the assertion that the prototype “meets these criteria” unevaluable from the manuscript.
[section proposing the properties of IaC, pollability, latency measurement and optimisation, and reproducibility] The paper treats the four listed properties as both necessary and sufficient for fluid error-correction WoZ without providing a justification or comparison against alternative criteria (e.g., explicit turn-taking protocols or multi-modal fusion latency). Because the sufficiency claim is load-bearing for the recommendation of the VR prototype, an explicit argument or reference to prior empirical work establishing these properties as the critical set is required.

minor comments (2)

The acronym IaC is introduced without an explicit expansion on first use; a parenthetical definition would improve readability.
Figure captions and axis labels in any latency or timing diagrams should explicitly state the measurement method and sampling rate so that reproducibility claims can be assessed.
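
The quantities requested in the first major comment (end-to-end latency and interrupt success rate) could be summarised from paired timestamps along the following lines. This is an editorial sketch only: the `latency_summary` function and the `(command_time, response_time)` input format are assumptions, and the numbers in the usage example are made up for illustration rather than measured from the system.

```python
import statistics
from typing import Optional


def latency_summary(pairs: list[tuple[float, Optional[float]]]) -> dict:
    """Summarise command-to-response latency from timestamp pairs in seconds.

    A response time of None marks a command that never took effect,
    for example an interrupt that was not honoured.
    """
    if not pairs:
        raise ValueError("no trials supplied")
    latencies = [resp - cmd for cmd, resp in pairs if resp is not None]
    if not latencies:
        raise ValueError("no successful responses to summarise")
    return {
        "trials": len(pairs),
        "success_rate": len(latencies) / len(pairs),
        "mean_ms": 1000 * statistics.mean(latencies),
        "median_ms": 1000 * statistics.median(latencies),
        "max_ms": 1000 * max(latencies),
    }


# Illustrative, invented timestamps: four interrupt commands, one missed.
example = latency_summary([(0.0, 0.10), (1.0, 1.30), (2.0, None), (3.0, 3.20)])
```

Reporting the success rate next to the latency figures keeps missed interrupts visible instead of silently dropping them from the average.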

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for your constructive review and the opportunity to clarify our contributions. We address the major comments point by point below and will revise the manuscript accordingly.


Referee: The central claim that the presented VR HRI simulation environment meets the IaC, pollability, latency-optimisation, and reproducibility criteria (stated in the abstract and reiterated in the final section) is unsupported by any quantitative evidence. No measured end-to-end latency values, interrupt success rates, poll-response timings, or timestamp-accuracy statistics are reported, nor are implementation specifics (e.g., how speech interrupts are routed inside the VR loop or how logged actions are replayed with sub-frame timing) supplied. This absence renders the assertion that the prototype “meets these criteria” unevaluable from the manuscript.

Authors: We agree that quantitative measurements would strengthen the claim. The manuscript presents the VR environment through its architectural design choices intended to satisfy the criteria, but does not include performance data. In revision we will add an evaluation subsection reporting preliminary end-to-end latency, interrupt success rates, poll-response timings, and timestamp accuracy from our test runs, together with expanded implementation details on speech interrupt routing within the VR loop and sub-frame-accurate replay of logged actions. revision: yes
Referee: The paper treats the four listed properties as both necessary and sufficient for fluid error-correction WoZ without providing a justification or comparison against alternative criteria (e.g., explicit turn-taking protocols or multi-modal fusion latency). Because the sufficiency claim is load-bearing for the recommendation of the VR prototype, an explicit argument or reference to prior empirical work establishing these properties as the critical set is required.

Authors: The four properties were derived from observed failure modes in existing speech-based WoZ systems that hinder fluid error correction. We will revise the relevant section to include an explicit justification, supported by references to prior HRI literature on interruptibility and latency, and a brief comparison to alternative design criteria such as turn-taking protocols and multi-modal fusion latency, thereby clarifying why these properties form a minimal necessary set for the targeted WoZ use case. revision: yes

Circularity Check

0 steps flagged

No circularity: descriptive proposal with no derivations or self-referential reductions


The paper proposes IaC, pollability, latency optimisation and logging reproducibility as criteria for fluid WoZ systems, then asserts that its VR prototype meets them. No equations, fitted parameters, predictions, or derivation chains exist that could reduce any claim to its own inputs by construction. The text contains no self-citation load-bearing steps, uniqueness theorems, or ansatzes smuggled via prior work; it is a straightforward system description whose central assertion is simply an unverified claim rather than a circular derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on domain assumptions about what properties produce fluid interaction and on the unverified assertion that the VR prototype satisfies them.

axioms (1)

Domain assumption: Interruptibility and correction, pollability, latency measurement/optimisation, and time-accurate reproducibility are the key criteria required for fluid WoZ error correction.
These properties are proposed without supporting evidence or derivation in the abstract.



