CapRL++ applies reinforcement learning with verifiable rewards to dense image and video captioning by scoring captions via the accuracy of a vision-free LLM answering MCQs from the caption alone.
Toward robust hyper-detailed image captioning: A multiagent approach and dual evaluation metrics for factuality and coverage.arXiv preprint arXiv:2412.15484, 2024
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CV 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
TLVS mitigates hallucinations in LVLMs via token-level extraction and visual-sensitivity-adaptive steering applied only at critical decoding steps.
citing papers explorer
-
CapRL++: Unified Reinforcement Learning with Verifiable Rewards for Dense Image and Video Captioning
CapRL++ applies reinforcement learning with verifiable rewards to dense image and video captioning by scoring captions via the accuracy of a vision-free LLM answering MCQs from the caption alone.
-
Steer Where It Matters: Token-Level Visual-Sensitivity Steering for LVLMs Hallucination Mitigation
TLVS mitigates hallucinations in LVLMs via token-level extraction and visual-sensitivity-adaptive steering applied only at critical decoding steps.