Conan: Progressive learning to reason like a detective over multi-scale visual evidence

Kun Ouyang, Yuanxin Liu, Linli Yao, Yishuo Cai, Hao Zhou, Jie Zhou, Fandong Meng, Xu Sun · 2025 · arXiv 2510.20470

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

representative citing papers

CrashSight: A Phase-Aware, Infrastructure-Centric Video Benchmark for Traffic Crash Scene Understanding and Reasoning

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

CrashSight is a new infrastructure-focused benchmark showing that state-of-the-art vision-language models can describe crash scenes but fail at temporal and causal reasoning.

ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning

cs.CV · 2026-05-19 · unverdicted · novelty 6.0 · 2 refs

ParaVT introduces the first multi-agent RL framework for parallel video tool calling in LMMs, using PARA-GRPO to resolve the Tool Prior Paradox and achieve +7.9% average improvement over Qwen3-VL baseline across six benchmarks.

citing papers explorer

Showing 2 of 2 citing papers.

CrashSight: A Phase-Aware, Infrastructure-Centric Video Benchmark for Traffic Crash Scene Understanding and Reasoning cs.CV · 2026-04-09 · unverdicted · none · ref 18
CrashSight is a new infrastructure-focused benchmark showing that state-of-the-art vision-language models can describe crash scenes but fail at temporal and causal reasoning.
ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning cs.CV · 2026-05-19 · unverdicted · none · ref 21 · 2 links
ParaVT introduces the first multi-agent RL framework for parallel video tool calling in LMMs, using PARA-GRPO to resolve the Tool Prior Paradox and achieve +7.9% average improvement over Qwen3-VL baseline across six benchmarks.

Conan: Progressive learning to reason like a detective over multi-scale visual evidence

fields

years

verdicts

representative citing papers

citing papers explorer