arXiv preprint arXiv:2503.02863 (2025)

Zhou, Z · 2025 · arXiv 2503.02863

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media

cs.CL · 2026-05-16 · unverdicted · novelty 7.0

PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.

CaliDist: Calibrating Large Language Models via Behavioral Robustness to Distraction

cs.LG · 2026-06-04 · unverdicted · novelty 6.0

CaliDist calibrates LLMs by scaling confidence according to how much predictions change under semantic distractors, cutting average ECE from 23% to 7% on seven NLU benchmarks across six models.

Look-Closer-Then-Diagnose: Confidence-Aware Ultrasound VQA via Active Zooming

cs.CV · 2026-05-20 · unverdicted · novelty 6.0 · 2 refs

Introduces a Zoom-then-Diagnose paradigm and an uncertainty-aware reward in GRPO for confidence-aware ultrasound VQA, reporting a 39.3% improvement in lesion localization across liver, breast, and thyroid datasets.

CoMet: Context and Multiplicity Decomposition for Multimodal Uncertainty Estimation

cs.LG · 2026-06-30 · unverdicted · novelty 5.0

CoMet decomposes MLLM uncertainty into context-specific and multiplicity-specific terms estimated by a trained post-hoc module, improving performance on open-ended multimodal benchmarks and hallucination detection.
