CrashSight is a new infrastructure-focused benchmark showing that state-of-the-art vision-language models can describe crash scenes but fail at temporal and causal reasoning.
Conan: Progressive learning to reason like a detective over multi-scale visual evidence
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CV 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
ParaVT introduces the first multi-agent RL framework for parallel video tool calling in LMMs, using PARA-GRPO to resolve the Tool Prior Paradox and achieve +7.9% average improvement over Qwen3-VL baseline across six benchmarks.
citing papers explorer
-
CrashSight: A Phase-Aware, Infrastructure-Centric Video Benchmark for Traffic Crash Scene Understanding and Reasoning
CrashSight is a new infrastructure-focused benchmark showing that state-of-the-art vision-language models can describe crash scenes but fail at temporal and causal reasoning.
-
ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning
ParaVT introduces the first multi-agent RL framework for parallel video tool calling in LMMs, using PARA-GRPO to resolve the Tool Prior Paradox and achieve +7.9% average improvement over Qwen3-VL baseline across six benchmarks.