Forced CoT produces video-dependent reasoning chains but does not improve MCQ accuracy on Qwen2.5-VL with Video-MME and causes a small drop on the 7B variant.
Findings of the Association for Computational Linguistics: EMNLP 2025 , pages =
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
years
2026 3representative citing papers
Configuration choices alone flip pairwise safety verdicts on every tested alignment benchmark, isolated via a finite-envelope proposition linking disagreement rate to strict ordering reversal.
A prefix-window mean-NLL memorization probe disagrees with full-span NLL and exact-recall in three cases on a controlled autoregressive testbed, leading to recommendations for multi-probe reporting.
citing papers explorer
-
Chains That See, Answers That Don't: A Multi-Aspect Evaluation Recipe for Forced Chain-of-Thought on Video-MME
Forced CoT produces video-dependent reasoning chains but does not improve MCQ accuracy on Qwen2.5-VL with Video-MME and causes a small drop on the 7B variant.