arXiv preprint arXiv:2511.23478

Video-R2: Reinforcing Consistent, Grounded Reasoning in Multimodal Language Models · 2025 · arXiv 2511.23478

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

VTI-CoT: Visual-Textual Interleaved Chain of Thought for Video Reasoning

cs.CV · 2026-06-04 · unverdicted · novelty 6.0

VTI-CoT proposes a visual-textual interleaved chain-of-thought method for video reasoning, built via automated annotation and OCR compression, claiming SOTA performance and better training efficiency on same-scale models.

See More, Think Deeper: Query-Expanded Visual Evidence and Answer-Clue Guided Reflection for Long Video Understanding

cs.CV · 2026-06-08 · unverdicted · novelty 5.0

CoVER framework lets Video-LLMs gather query-expanded visual evidence and verify answers with answer-clue visual feedback to improve long-video understanding.

Watch, Remember, Reason: Human-View Video Understanding with MLLMs

cs.CV · 2026-06-05 · unverdicted · novelty 4.0

This is a survey that frames video MLLM research via a human-view formulation of perceptual representations, memory states, reasoning traces, and predictions, then reviews methods, datasets, benchmarks, and open problems.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Watch, Remember, Reason: Human-View Video Understanding with MLLMs

cs.CV · 2026-06-05 · unverdicted · none · ref 231

This is a survey that frames video MLLM research via a human-view formulation of perceptual representations, memory states, reasoning traces, and predictions, then reviews methods, datasets, benchmarks, and open problems.
