arxiv: 2605.08886 · v1 · submitted 2026-05-09 · 📡 eess.IV · cs.RO


VISTA: A Benchmark for Real-Time Video Streaming under Network Impairments in Surgical Teleoperation

Zexin Deng, Zhenhui Yuan, Tian Lu, Gaofeng Li, Meipeng Huang, Longhao Zou


Pith reviewed 2026-05-12 01:37 UTC · model grok-4.3


keywords surgical teleoperation · video streaming · network impairments · benchmark · NetEm · Gilbert-Elliott model · peg transfer task · video quality metrics


The pith

Network impairments reduce surgical teleoperation success from 97% in hospital LAN to 12% in GEO satellite conditions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces VISTA, a benchmark that emulates five network conditions to measure their effects on video streaming quality and human performance during a peg transfer task. It combines controlled impairments created by NetEm and a Gilbert-Elliott loss model with synchronized tracking of network QoS, video metrics such as PSNR, SSIM and VMAF, freeze rates, and task outcomes. Across 375 trials, every impaired profile shows a lower success rate and a longer completion time than the Hospital LAN baseline, with 4G Rural and GEO Satellite faring worst. The work supplies a reproducible testbed for evaluating video delivery under constraints typical of remote surgery.

Core claim

VISTA integrates a standardized peg transfer task with synchronized measurements of network quality of service, objective video quality through PSNR, SSIM and VMAF, and temporal continuity through freeze rate, while keeping a stable reverse control channel. Under emulated Hospital LAN, 5G Urban, 4G Rural, LEO Satellite and GEO Satellite profiles, the benchmark records success rates of 97%, 79%, 35%, 71% and 12% respectively, and mean completion times for successful trials of 80 s, 117 s, 211 s, 152 s and 255 s.
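To make one of the objective video metrics concrete, here is a minimal sketch of PSNR between a reference frame and a received frame. It assumes 8-bit pixels passed as flat sequences; the function name `psnr` and the sample pixel values are illustrative and do not come from the VISTA code base, which may compute the metric differently (for example per channel or via an external tool).

```python
import math


def psnr(reference, received, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized frames.

    Frames are flat sequences of pixel intensities (assumed 8-bit here).
    """
    if len(reference) != len(received) or len(reference) == 0:
        raise ValueError("frames must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, received)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical 4-pixel frames, for illustration only.
reference_frame = [100, 120, 140, 160]
received_frame = [101, 119, 142, 158]
print(round(psnr(reference_frame, received_frame), 2))
```

Higher PSNR means the received frame is closer to the reference. SSIM and VMAF need considerably more machinery and are not sketched here.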

What carries the argument

The VISTA benchmark, which uses Linux Traffic Control together with NetEm and the Gilbert-Elliott loss model to generate repeatable forward-path impairments while collecting multi-modal data on QoS, video quality and operator task performance.
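The Gilbert-Elliott model named above is a two-state Markov chain: a "good" state with low loss and a "bad" state with high loss, which produces bursty rather than independent packet drops. The sketch below simulates it in plain Python. The parameter names (`p`, `r`, `loss_good`, `loss_bad`) and the sample values are assumptions for illustration, not the parameters used in the paper.

```python
import random


def gilbert_elliott(n_packets, p, r, loss_good=0.0, loss_bad=1.0, seed=0):
    """Simulate packet loss with a two-state Gilbert-Elliott chain.

    p: probability of moving good -> bad after each packet
    r: probability of moving bad -> good after each packet
    Returns a list of booleans, True meaning the packet was lost.
    """
    rng = random.Random(seed)
    in_bad_state = False  # start in the good state
    losses = []
    for _ in range(n_packets):
        loss_probability = loss_bad if in_bad_state else loss_good
        losses.append(rng.random() < loss_probability)
        # State transition for the next packet.
        if in_bad_state:
            if rng.random() < r:
                in_bad_state = False
        elif rng.random() < p:
            in_bad_state = True
    return losses


def stationary_loss(p, r, loss_good=0.0, loss_bad=1.0):
    """Long-run average loss rate implied by the chain."""
    share_bad = p / (p + r)
    return (1 - share_bad) * loss_good + share_bad * loss_bad


# Illustrative values only: about 3.8% average loss in bursts of mean length 1/r = 4.
trace = gilbert_elliott(200_000, p=0.01, r=0.25)
print(sum(trace) / len(trace), stationary_loss(0.01, 0.25))
```

The two printed numbers should be close, which is a quick sanity check that a chosen parameter set produces the intended average loss.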

If this is right

Video streaming pipelines for teleoperation must maintain low freeze rates even under high-loss satellite profiles to preserve task success.
Minimum network specifications for remote surgery can be derived by comparing the measured performance across the five emulated profiles.
Objective video quality scores such as VMAF show correlation with human task completion under these impairments.
Reproducible emulation enables direct comparison of alternative codecs or transport protocols without requiring physical network access.
LEO satellite links support substantially higher success rates than GEO links for the same task.
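Freeze rate appears in several of the points above but is not defined on this page. As a hedged illustration, the sketch below assumes one plausible definition: the fraction of session time spent in inter-frame gaps longer than a threshold. Both the definition and the 200 ms threshold are assumptions, and the paper's own definition may differ.

```python
def freeze_rate(arrival_times, freeze_threshold=0.2):
    """Fraction of the session spent in frame gaps longer than the threshold.

    arrival_times: sorted frame display timestamps in seconds.
    freeze_threshold: gap length in seconds counted as a freeze (assumed value).
    """
    if len(arrival_times) < 2:
        return 0.0
    session_length = arrival_times[-1] - arrival_times[0]
    if session_length <= 0:
        return 0.0
    gaps = [later - earlier
            for earlier, later in zip(arrival_times, arrival_times[1:])]
    frozen_time = sum(gap for gap in gaps if gap > freeze_threshold)
    return frozen_time / session_length


# Hypothetical timestamps: steady 10 fps with one stall of about one second.
timestamps = [0.0, 0.1, 0.2, 1.2, 1.3]
print(freeze_rate(timestamps))
```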

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Adaptive streaming techniques could be tested inside the same benchmark to quantify how much they offset the observed drops in success rate.
Extending the task set beyond peg transfer would show whether more complex or time-critical procedures suffer even larger performance losses.
The benchmark data could guide prioritization of low-latency links over raw bandwidth in the design of future medical networks.
Real packet traces captured at actual surgical sites could later replace the synthetic loss model to increase ecological validity.

Load-bearing premise

The five emulated network conditions created with NetEm and the Gilbert-Elliott loss model accurately represent the impairments that occur in real surgical teleoperation deployments.
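For readers unfamiliar with how such a profile is expressed, the sketch below assembles a `tc qdisc ... netem` command line with delay, jitter, rate and a `loss gemodel` clause. It only builds and prints the string; it does not run it. The interface name `eth0` and every numeric value are placeholders, since the paper's parameter table is not reproduced on this page.

```python
import shlex


def netem_command(dev, delay_ms, jitter_ms, rate_mbit,
                  p_pct, r_pct, loss_bad_pct=100.0, loss_good_pct=0.0):
    """Build a tc/netem command string for one emulated profile.

    netem's `loss gemodel` takes p, r, 1-h and 1-k, where p and r are the
    state transition probabilities, 1-h is the loss probability in the bad
    state and 1-k is the loss probability in the good state.
    """
    args = [
        "tc", "qdisc", "add", "dev", dev, "root", "netem",
        "delay", f"{delay_ms}ms", f"{jitter_ms}ms",
        "rate", f"{rate_mbit}mbit",
        "loss", "gemodel",
        f"{p_pct}%", f"{r_pct}%", f"{loss_bad_pct}%", f"{loss_good_pct}%",
    ]
    return shlex.join(args)


# Placeholder values, not the paper's profile parameters.
print(netem_command("eth0", delay_ms=40, jitter_ms=5, rate_mbit=20,
                    p_pct=1.0, r_pct=25.0))
```

Applying such a command requires root privileges and affects only egress traffic on the named interface, which is consistent with impairing the forward video path while leaving a separate control link untouched.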

What would settle it

Repeating the peg transfer trials over live rather than emulated 4G rural or GEO satellite links and checking whether the observed success rates match the reported 35% and 12% figures and the mean completion times match the reported 211 s and 255 s.

Figures

Figures reproduced from arXiv: 2605.08886 by Gaofeng Li, Longhao Zou, Meipeng Huang, Tian Lu, Zexin Deng, Zhenhui Yuan.

**Figure 2.** Overview of the VISTA HIL testbed at the University of Warwick. The emulated network injects impairments on the forward video path between the laparoscopic box trainer and the operator side, while control commands are transmitted to the UFactory Lite 6 manipulator over a dedicated unimpaired low-latency link.

**Table II.** Benchmark network tiers and impairment parameters. Column headers: Networks, Bandwidth (Mbps), Nominal one-…

**Figure 3.** Central-to-Peripheral Peg Transfer (C2P) task used in VISTA. Four…

**Figure 4.** Results across the five benchmark tiers. Panels (a)–(c) show objective video quality metrics, where higher values indicate better received video quality.

**Figure 5.** Experimental protocol used in VISTA. Operators first undergo…


Real-time video streaming is crucial in surgical teleoperation, yet reproducible evaluation under realistic network impairments remains limited. This paper presents VISTA, a benchmark designed to study how impairments along the forward video path affect received video quality, temporal continuity, and human task performance. VISTA employs Linux Traffic Control with NetEm and a Gilbert-Elliott loss model to emulate five network conditions: Hospital LAN, 5G Urban, 4G Rural, LEO Satellite, and GEO Satellite. The benchmark integrates a standardised peg transfer task with synchronized measurements of network quality of service (QoS), objective video quality (PSNR, SSIM, and VMAF), and temporal continuity through freeze rate, while maintaining a stable reverse control channel. Across 375 experimental trials, network degradation substantially reduced teleoperation performance: success rate decreased from 97% in Hospital LAN to 79% in 5G Urban, 35% in 4G Rural, 71% in LEO Satellite, and 12% in GEO Satellite, while mean task completion time for successful trials increased from 80 s in Hospital LAN to 117 s in 5G Urban, 211 s in 4G Rural, 152 s in LEO Satellite, and 255 s in GEO Satellite. These findings show that network impairments have a direct impact on task completion and success in surgical teleoperation, and provide a reproducible basis for evaluating teleoperation video under realistic network constraints. Source code available at https://github.com/Dzxx623/VISTA.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

VISTA gives a clean, reproducible benchmark with direct numbers showing that degraded network conditions cut surgical teleoperation success rates and lengthen completion times, though how well the emulations match real deployments is the main assumption to check.


The key takeaway is that this paper introduces VISTA, an open benchmark showing that surgical teleoperation performance degrades substantially as network quality declines, backed by 375 trials across five emulated conditions.

What the work does well is combine standard network emulation with a standardized peg-transfer task and track several metrics at once: network QoS, objective video quality scores, freeze rates, and actual task success and timing. The public GitHub repo means others can run the same setup without starting from scratch. The numbers are concrete: success drops to 35% on 4G Rural and 12% on GEO Satellite, and mean completion time for successful trials climbs from 80 s on the Hospital LAN to 255 s on GEO Satellite.

A potential soft spot is the assumption that the NetEm and Gilbert-Elliott emulations accurately reflect real-world surgical teleoperation networks. The paper uses these standard tools, but real deployments might involve loss patterns or jitter not fully captured here. Details such as participant numbers and controls for learning effects are not given in the abstract, though the trial count suggests some robustness.

This kind of benchmark is aimed at engineers and researchers focused on real-time video for medical robotics or other critical remote operations. Readers who need data to guide codec choices or protocol improvements under impairment will get concrete numbers to work with. I would send it to peer review. The empirical grounding and reproducibility make it worth a closer look by specialists in the area.

Referee Report

1 major / 2 minor

Summary. The paper introduces VISTA, a benchmark for real-time video streaming in surgical teleoperation. It emulates five network conditions (Hospital LAN, 5G Urban, 4G Rural, LEO Satellite, GEO Satellite) via NetEm and the Gilbert-Elliott loss model, integrates a standardized peg-transfer task, and reports synchronized QoS, objective video quality (PSNR, SSIM, VMAF), freeze rate, and human performance metrics across 375 trials. Key findings include success-rate drops from 97% (Hospital LAN) to 12% (GEO Satellite) and corresponding increases in mean completion time for successful trials.

Significance. If the results hold, VISTA supplies a reproducible, publicly coded empirical benchmark that quantifies the sensitivity of surgical teleoperation to forward-path network impairments. The use of standard tools, a fixed task, and direct measurement of both objective video metrics and task outcomes strengthens its utility for the community and supports the central claim of direct performance impact.

major comments (1)

Network Emulation section: the claim that the five NetEm/Gilbert-Elliott profiles represent realistic surgical teleoperation impairments rests on parameter selection without reported validation against field traces from actual operating-room or remote-surgery deployments; this assumption is load-bearing for the benchmark's claimed applicability.

minor comments (2)

Abstract: the summary omits the number of participants/operators, whether trial order was randomized, and any statistical tests or confidence intervals supporting the reported success-rate and time differences.
Results presentation: tables or figures reporting per-condition success rates and completion times should include standard deviations or inter-quartile ranges to convey variability across the 375 trials.
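The kind of uncertainty reporting requested in the minor comments can be sketched with a Wilson score interval for each success rate. The per-condition trial count is not stated on this page; the sketch assumes an even split of 375 trials over five conditions (75 each), which is an assumption and may not match the actual design. The rates are the rounded figures quoted in the abstract.

```python
import math


def wilson_interval(rate, n_trials, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    z2 = z * z
    denominator = 1 + z2 / n_trials
    centre = (rate + z2 / (2 * n_trials)) / denominator
    half_width = (z / denominator) * math.sqrt(
        rate * (1 - rate) / n_trials + z2 / (4 * n_trials ** 2)
    )
    return centre - half_width, centre + half_width


# Success rates as reported in the abstract.
reported_success = {
    "Hospital LAN": 0.97,
    "5G Urban": 0.79,
    "4G Rural": 0.35,
    "LEO Satellite": 0.71,
    "GEO Satellite": 0.12,
}
ASSUMED_TRIALS_PER_CONDITION = 75  # assumption: 375 trials split evenly

for condition, rate in reported_success.items():
    low, high = wilson_interval(rate, ASSUMED_TRIALS_PER_CONDITION)
    print(f"{condition}: {rate:.0%} (approx. 95% CI {low:.1%} to {high:.1%})")
```

With the true per-condition counts and raw success tallies the same function applies unchanged; only the inputs would differ.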

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for minor revision. We address the major comment point by point below.


Referee: Network Emulation section: the claim that the five NetEm/Gilbert-Elliott profiles represent realistic surgical teleoperation impairments rests on parameter selection without reported validation against field traces from actual operating-room or remote-surgery deployments; this assumption is load-bearing for the benchmark's claimed applicability.

Authors: We agree that the original manuscript does not report direct validation of the chosen parameters against field traces collected from actual surgical teleoperation deployments. Such traces are not publicly available, and collecting them would require access to operational remote-surgery systems that was outside the scope of this work. The five profiles were instead constructed from standard latency, jitter, bandwidth, and loss values drawn from the networking literature for the respective environments (Hospital LAN, 5G urban, 4G rural, LEO, and GEO satellite). In the revised manuscript we will (i) explicitly cite the literature sources used for each parameter set, (ii) describe the Gilbert-Elliott model configuration in greater detail, and (iii) add a limitations paragraph that qualifies the applicability claim and notes the absence of surgical-specific field validation. These changes make the parameter-selection process transparent and address the load-bearing assumption without overstating the realism of the emulation.

revision: partial

Circularity Check

0 steps flagged

No significant circularity; purely empirical benchmark


The paper reports direct experimental measurements from 375 controlled trials using NetEm/Gilbert-Elliott emulations on a peg-transfer task. Success rates, completion times, PSNR/SSIM/VMAF, and freeze rates are measured outcomes, not derived via equations, fitted parameters, or predictions that reduce to prior definitions. No load-bearing self-citations, ansatzes, or uniqueness theorems appear in the central claims. The study is self-contained against its stated experimental conditions and external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work is an empirical benchmark relying on established open-source tools (Linux Traffic Control, NetEm) and standard video quality metrics (PSNR, SSIM, VMAF); no free parameters, domain axioms, or invented entities are required for the central claim.

