MLLMs drop from over 85% accuracy on action presence to under 50% on matched action-denial videos, exposing a causal verification gap that causal graph prompts partially close.
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
BridgeVLM internalizes causal supervision in VLMs via causal graph induction, Causal Tokens, and RAMP layers with M3S training, raising intervention accuracy on CausalVLBench from 33.2% to 54.4% and structure learning F1 from 33.4% to 75.1%.
citing papers explorer
- Learning to Deny: Action Denial in Multimodal Large Language Models
  MLLMs drop from over 85% accuracy on action presence to under 50% on matched action-denial videos, exposing a causal verification gap that causal graph prompts partially close.
- From Prompts to Tokens: Internalizing Causal Supervision in Vision-Language Model for Multi-Image Causal Reasoning
  BridgeVLM internalizes causal supervision in VLMs via causal graph induction, Causal Tokens, and RAMP layers with M3S training, raising intervention accuracy on CausalVLBench from 33.2% to 54.4% and structure learning F1 from 33.4% to 75.1%.