arXiv preprint arXiv:2503.02863 (2025)
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
citing papers explorer
- PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media
  PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.
- Look-Closer-Then-Diagnose: Confidence-Aware Ultrasound VQA via Active Zooming
  Introduces Zoom-then-Diagnose paradigm and uncertainty-aware reward in GRPO for confidence-aware ultrasound VQA, reporting 39.3% improvement in lesion localization across liver, breast, and thyroid datasets.
- CoMet: Context and Multiplicity Decomposition for Multimodal Uncertainty Estimation
  CoMet decomposes MLLM uncertainty into context-specific and multiplicity-specific terms estimated by a trained post-hoc module, improving performance on open-ended multimodal benchmarks and hallucination detection.