OR3 converts OR clips to action-driven digital twins, uses LLM imagination for hypothetical ActDTs, and achieves 57.6 R@1 and 77.3 R@5 on 276 implicit queries from 386 robotic knee procedure clips, outperforming baselines.
fine-clip: Enhancing zero-shot fine-grained surgical action recognition with vision-language models.arXiv preprint arXiv:2503.19670, 2025
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
CONDITIONAL 1representative citing papers
citing papers explorer
-
Reasoning Text-to-Video Retrieval for Operating Room Clips via Action-Driven Digital Twins
OR3 converts OR clips to action-driven digital twins, uses LLM imagination for hypothetical ActDTs, and achieves 57.6 R@1 and 77.3 R@5 on 276 implicit queries from 386 robotic knee procedure clips, outperforming baselines.