Title resolution pending

Hu, J · 2025 · arXiv 2502.06428

5 Pith papers cite this work. Polarity classification is still indexing.


Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Bridging VideoQA and Video-Guided Agentic Tasks via Generalized Keyframe Extraction

cs.CV · 2026-06-28 · unverdicted · novelty 7.0

Introduces VG-GUIBench benchmark and TASKER keyframe extraction algorithm that improves performance on VideoQA and video-guided agentic tasks.

Act2See: Emergent Active Visual Perception for Video Reasoning

cs.CV · 2026-05-03 · unverdicted · novelty 7.0

Act2See trains VLMs via supervised fine-tuning on verified reasoning traces to interleave active frame calls within text CoTs, yielding SOTA results on video reasoning benchmarks.

Swift Sampling: Selecting Temporal Surprises via Taylor Series

cs.CV · 2026-05-21 · unverdicted · novelty 5.0

Swift Sampling is a training-free frame selection method that uses Taylor expansions on video latent trajectories to pick temporally surprising frames, outperforming uniform sampling on long-video QA tasks.

PointLLM-R: Enhancing 3D Point Cloud Reasoning via Chain-of-Thought

cs.CV · 2026-05-21 · unverdicted · novelty 5.0

PointLLM-R is a 3D multimodal model fine-tuned on the new 55K-sample PoCoTI CoT dataset built via VLM-based refinement and Human-in-the-Loop Prompt Optimization, achieving SOTA on generative 3D classification and captioning.

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

cs.CV · 2025-03-16 · unverdicted · novelty 2.0

The paper provides the first comprehensive survey of multimodal chain-of-thought reasoning, including foundational concepts, a taxonomy of methodologies, application analyses, challenges, and future directions.

citing papers explorer

Showing 5 of 5 citing papers.

Bridging VideoQA and Video-Guided Agentic Tasks via Generalized Keyframe Extraction · cs.CV · 2026-06-28 · unverdicted · none · ref 19
Introduces VG-GUIBench benchmark and TASKER keyframe extraction algorithm that improves performance on VideoQA and video-guided agentic tasks.
Act2See: Emergent Active Visual Perception for Video Reasoning · cs.CV · 2026-05-03 · unverdicted · none · ref 18
Act2See trains VLMs via supervised fine-tuning on verified reasoning traces to interleave active frame calls within text CoTs, yielding SOTA results on video reasoning benchmarks.
Swift Sampling: Selecting Temporal Surprises via Taylor Series · cs.CV · 2026-05-21 · unverdicted · none · ref 46
Swift Sampling is a training-free frame selection method that uses Taylor expansions on video latent trajectories to pick temporally surprising frames, outperforming uniform sampling on long-video QA tasks.
PointLLM-R: Enhancing 3D Point Cloud Reasoning via Chain-of-Thought · cs.CV · 2026-05-21 · unverdicted · none · ref 3
PointLLM-R is a 3D multimodal model fine-tuned on the new 55K-sample PoCoTI CoT dataset built via VLM-based refinement and Human-in-the-Loop Prompt Optimization, achieving SOTA on generative 3D classification and captioning.
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey · cs.CV · 2025-03-16 · unverdicted · none · ref 112
The paper provides the first comprehensive survey of multimodal chain-of-thought reasoning, including foundational concepts, a taxonomy of methodologies, application analyses, challenges, and future directions.
