SeViCES: Unifying semantic-visual evidence consensus for long video understanding
Two Pith papers cite this work.
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: unverdicted (2)
Citing papers
- Reasoning as Intersection: Consensus-Frame Alignment for Visual Focus in Video-MLLMs
  CF-GRPO builds a consensus frame prior from intrinsic video cues and aligns it with the model's frame-use scores through a reward signal, enabling evidence-aware reasoning in Video-MLLMs without temporal annotations.
- See More, Think Deeper: Query-Expanded Visual Evidence and Answer-Clue Guided Reflection for Long Video Understanding
  The CoVER framework lets Video-LLMs gather query-expanded visual evidence and verify their answers using answer-clue visual feedback, improving long-video understanding.