
arxiv: 2501.09209 · v2 · submitted 2025-01-16 · 💻 cs.CV

Surgical Visual Understanding (SurgVU) Dataset

Pith reviewed 2026-05-23 05:55 UTC · model grok-4.3

classification 💻 cs.CV
keywords surgical dataset · robotic surgery · visual understanding · machine learning · tool detection · visual question answering · surgical data science

The pith

A dataset of robotic surgery videos paired with labels is released to support machine learning work in surgical data science.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents the Surgical Visual Understanding dataset, which consists of surgical videos collected during robotic-assisted procedures along with their corresponding labels. It describes the collection process and notes several distinctive attributes of the data. The authors outline example problems such as tool detection and visual question answering that the dataset can address. Although assembled around a specific set of challenges, the work states that the resource is general enough for a wide range of machine learning questions. The release is intended to connect the broader machine learning community with open problems in surgical visual understanding.

Core claim

We present a large dataset of surgical videos and their accompanying labels for foundational work in surgical data science. The videos come from robotic-assisted surgeries and carry labels suited to multiple tasks. A validation set for tool detection and a sample set of question-answer pairs for visual question answering are also supplied. The dataset is made available through public links so that it can serve as a shared resource for future research.

What carries the argument

The SurgVU dataset of surgical videos and labels carries the argument: it supplies the raw visual data and annotations needed to train and test models on surgical scenes.

If this is right

  • Models can be trained to detect surgical tools directly from the labeled video frames.
  • Question-answer pairs enable development of systems that answer queries about surgical scenes.
  • The dataset supplies a common benchmark that different research groups can use to compare methods.
  • Public release of both videos and labels allows reproduction and extension of experiments in surgical data science.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same videos could be reused to study temporal patterns such as procedure phase recognition if additional time-stamped labels were added later.
  • Combining SurgVU with non-surgical video datasets might test whether general video models transfer to the surgical domain.
  • Hospitals could use the dataset to prototype privacy-preserving training pipelines before applying them to their own private recordings.

Load-bearing premise

The dataset, although curated for a particular set of scientific challenges, is general enough to be used for a broad range of machine learning questions.

What would settle it

A controlled test would settle it: if models trained on the SurgVU training split showed no improvement over random baselines on the provided public validation set for tool detection, that would indicate the labels do not support the intended tasks.
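One way such a comparison against a random baseline could be set up is sketched below. This is only an illustration under stated assumptions: the box format `(x_min, y_min, x_max, y_max)`, the frame-keyed dictionaries, and the names `iou`, `recall_at_iou`, and `random_boxes` are hypothetical and are not taken from the paper or from the SurgVU label files, whose actual schema would need to be checked first. The metric is a simple recall at an IoU threshold, not the scoring used in any accompanying challenge.

```python
import random
from typing import Dict, Iterable, List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(preds: Dict[str, List[Box]],
                  truth: Dict[str, List[Box]],
                  thr: float = 0.5) -> float:
    """Fraction of ground-truth boxes overlapped by some prediction with IoU >= thr."""
    total = 0
    hits = 0
    for frame_id, gt_boxes in truth.items():
        for gt in gt_boxes:
            total += 1
            if any(iou(p, gt) >= thr for p in preds.get(frame_id, [])):
                hits += 1
    return hits / total if total else 0.0


def random_boxes(frame_ids: Iterable[str], width: float, height: float,
                 per_frame: int, seed: int = 0) -> Dict[str, List[Box]]:
    """Random baseline: boxes drawn uniformly inside a width x height frame."""
    rng = random.Random(seed)
    out: Dict[str, List[Box]] = {}
    for frame_id in frame_ids:
        boxes = []
        for _ in range(per_frame):
            x1, x2 = sorted(rng.uniform(0, width) for _ in range(2))
            y1, y2 = sorted(rng.uniform(0, height) for _ in range(2))
            boxes.append((x1, y1, x2, y2))
        out[frame_id] = boxes
    return out
```

A trained detector's predictions and the output of `random_boxes` would both be scored with `recall_at_iou` against the same ground truth. A trained model that did not clearly beat the random score would be the outcome described above.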

Figures

Figures reproduced from arXiv: 2501.09209 by Aneeq Zia, Anthony Jarc, Benjamin Mueller, Conor Perreault, Kiran Bhattacharyya, Max Berniker, Rogerio Nespolo, Ryan Schmidt, Xiaorui Zhang, Xi Liu, Ziheng Wang.

Figure 1: Sample frames of multiple surgical tasks included in our dataset. Note that some frames capture
Figure 2: Example images of surgical instruments present in the dataset
Figure 3: Distribution of tool labels in training dataset
Figure 4: Distribution of surgical tasks in training dataset
Figure 5: Class distribution of tools in the validation dataset
Figure 6: Sample frames with bounding boxes. To prevent the embedded information (description of the
Original abstract

Owing to recent advances in machine learning and the ability to harvest large amounts of data during robotic-assisted surgeries, surgical data science is ripe for foundational work. We present a large dataset of surgical videos and their accompanying labels for this purpose. We describe how the data was collected and some of its unique attributes. Multiple example problems are outlined. Although the dataset was curated for a particular set of scientific challenges (in an accompanying paper), it is general enough to be used for a broad range machine learning questions. Our hope is that this dataset exposes the larger machine learning community to the challenging problems within surgical data science, and becomes a touch-stone for future research. The videos are available at https://storage.googleapis.com/isi-surgvu/surgvu24_videos_only.zip, the labels at https://storage.googleapis.com/isi-surgvu/surgvu24_labels_updated_v2.zip, a validation set for tool detection problem at https://storage.googleapis.com/isi-surgvu/cat1_test_set_public.zip, and a sample set of question & answer pairs dataset for surgical visual question answering at https://storage.googleapis.com/isi-surgvu/SURGVU25_cat_2_sample_set_public.zip.
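The four archives named in the abstract could be fetched with a small helper like the one below. This is a minimal sketch: the URLs are copied from the abstract, while the dictionary keys, the function names `local_path` and `fetch`, and the default `surgvu` directory are illustrative choices rather than anything the authors provide. Archive sizes are not stated here, and the videos archive may be large, so it is worth checking available disk space before calling `fetch("videos")`.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

# URLs as listed in the abstract; the short keys are hypothetical labels.
SURGVU_ARCHIVES = {
    "videos": "https://storage.googleapis.com/isi-surgvu/surgvu24_videos_only.zip",
    "labels": "https://storage.googleapis.com/isi-surgvu/surgvu24_labels_updated_v2.zip",
    "tool_detection_validation": "https://storage.googleapis.com/isi-surgvu/cat1_test_set_public.zip",
    "vqa_sample": "https://storage.googleapis.com/isi-surgvu/SURGVU25_cat_2_sample_set_public.zip",
}


def local_path(url: str, dest_dir: Path) -> Path:
    """Map an archive URL to a file path inside dest_dir, keeping the file name."""
    return dest_dir / Path(urlparse(url).path).name


def fetch(name: str, dest_dir: Path = Path("surgvu")) -> Path:
    """Download one archive by key unless the file already exists locally."""
    url = SURGVU_ARCHIVES[name]
    target = local_path(url, dest_dir)
    if not target.exists():
        dest_dir.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, str(target))
    return target
```

Calling `fetch("labels")` would place `surgvu24_labels_updated_v2.zip` under `surgvu/`. Starting with the labels or the VQA sample set is a cheap way to inspect the annotation format before pulling the videos.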

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript announces the release of the Surgical Visual Understanding (SurgVU) Dataset consisting of surgical videos from robotic-assisted procedures together with accompanying labels. It describes the collection process and unique attributes of the data, outlines multiple example problems, and supplies public download links for the full videos, labels, a validation set for tool detection, and a sample set for surgical visual question answering. The authors note that the dataset was curated for specific challenges in an accompanying paper but assert that it remains general enough for a broad range of machine learning questions, with the goal of engaging the larger ML community in surgical data science.

Significance. If the dataset is released and documented as described, the contribution is a publicly accessible, large-scale labeled surgical video resource that can support benchmarking and foundational modeling in computer vision and surgical data science. The inclusion of task-specific subsets (tool detection validation and VQA samples) and direct Google Cloud links strengthens accessibility and reproducibility. This type of data descriptor can serve as a touchstone resource for the field.

minor comments (2)
  1. [Abstract] The phrase 'a broad range machine learning questions' is missing 'of' and should read 'a broad range of machine learning questions'.
  2. [Abstract] 'touch-stone' should be written as the single word 'touchstone'.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision for our manuscript describing the SurgVU dataset release. No specific major comments were listed in the report.

Circularity Check

0 steps flagged

No significant circularity: dataset release paper with no derivations or fitted claims

full rationale

The paper is a data release announcement describing collection of surgical videos and labels, providing download links, and outlining example problems. No equations, predictions, parameters, or derivation chains exist. The generality statement is an assertion, not a tested result derived from the data. No self-citations are load-bearing for any mathematical claim. The central contribution (public data availability) is directly supported by external links and does not reduce to any internal construction or fit.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a dataset release paper with no mathematical content, free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5778 in / 954 out tokens · 54415 ms · 2026-05-23T05:55:55.775573+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. SurgiSR4K: A High-Resolution Endoscopic Video Dataset for Robotic-Assisted Minimally Invasive Procedures

    eess.IV 2025-06 unverdicted novelty 7.0

    Introduces the first publicly accessible native 4K resolution endoscopic video dataset for robotic-assisted minimally invasive procedures.
