Unveiling the cognitive compass: Theory-of-mind-guided multimodal emotion reasoning, 2026
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
The paper creates InsightVQA, a 725K QA-pair benchmark with perception, grounded-understanding, and cognition levels for emotion-cognitive visual question answering, plus a 30K-sample evaluation set and InsightNet baseline.
citing papers explorer
- CaC: Advancing Video Reward Models via Hierarchical Spatiotemporal Concentrating
  CaC presents a new spatiotemporal concentrating reward model for video anomalies, built on a novel large-scale dataset and three-stage training with RL and IoU rewards, claiming 25.7% accuracy gains and 11.7% anomaly reduction.
- InsightVQA: High-Dimensional Emotion-Cognitive Visual Question Answering Benchmark
  The paper creates InsightVQA, a 725K QA-pair benchmark with perception, grounded-understanding, and cognition levels for emotion-cognitive visual question answering, plus a 30K-sample evaluation set and InsightNet baseline.