arXiv preprint arXiv:2502.14149 (2025)
3 Pith papers cite this work.
Fields: cs.CV (3 papers)

Citing papers
When to Trust the Answer: Question-Aligned Semantic Nearest Neighbor Entropy for Safer Surgical VQA
QA-SNNE adds question-answer alignment via bilateral gating to semantic nearest neighbor entropy, yielding higher AUROC for uncertainty detection in surgical VQA models under both standard and rephrased questions.

SurgLQA: Scalable Long-Horizon Surgical Video Question Answering
SurgLQA introduces FTC for compact long-range video representations and TMS for adaptive test-time scaling, reporting gains on restructured Colon-LQA and REAL-Colon-VQA benchmarks.

SurgViVQA: Temporally-Grounded Video Question Answering for Surgical Scene Understanding
SurgViVQA adds temporal video encoding to surgical VideoQA and reports 9-11% gains in keyword accuracy over image-only baselines on two datasets, plus improved robustness to question rephrasing.