arXiv preprint arXiv:2406.02536
3 Pith papers cite this work.
Citing papers explorer
- Towards Order Fairness: Mitigating LLMs Order Sensitivity through Dual Group Advantage Optimization
  DGAO uses reinforcement learning to optimize LLMs for both accuracy and order stability by balancing intra-group accuracy advantages and inter-group stability advantages.
- Logit-Attention Divergence: Mitigating Position Bias in Multi-Image Retrieval via Attention-Guided Calibration
  A training-free attention-guided debiasing framework mitigates position bias in MLLM multi-image retrieval by exploiting the observed mismatch between biased logits and aligned attention maps, yielding over 40% accuracy gains on MS-COCO benchmarks.
- Saliency-R1: Enforcing Interpretable and Faithful Vision-language Reasoning via Saliency-map Alignment Reward
  Saliency-R1 combines a novel saliency-map technique with GRPO, using overlap with human bounding boxes as the reward, to improve the faithfulness and interpretability of VLM reasoning.
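The DGAO summary above hinges on two advantage terms. The paper's actual objective is not reproduced on this page, so everything in the sketch below is an assumption: the function names, the majority-vote stability reward, and the blending weight `lam` are all hypothetical, illustrating only the general shape of a GRPO-style intra-group advantage combined with an order-stability bonus.

```python
# Hedged sketch of a dual-group advantage in the spirit of DGAO.
# All names and the exact blending rule are hypothetical.
from statistics import mean, pstdev

def intra_group_advantage(rewards):
    """GRPO-style normalization: (r - mean) / std within one group of rollouts."""
    mu, sigma = mean(rewards), pstdev(rewards) or 1.0
    return [(r - mu) / sigma for r in rewards]

def inter_group_stability(answers_per_order):
    """Reward answers that agree with the majority answer across
    groups sampled under different option orderings (assumed proxy
    for the paper's inter-group stability advantage)."""
    flat = [a for grp in answers_per_order for a in grp]
    majority = max(set(flat), key=flat.count)
    return [[1.0 if a == majority else 0.0 for a in grp]
            for grp in answers_per_order]

def dual_advantage(rewards_per_order, answers_per_order, lam=0.5):
    """Blend accuracy and stability advantages; lam is hypothetical."""
    stab = inter_group_stability(answers_per_order)
    return [[acc + lam * s
             for acc, s in zip(intra_group_advantage(rs), ss)]
            for rs, ss in zip(rewards_per_order, stab)]

# Two option orderings of one prompt, three sampled responses each:
adv = dual_advantage(
    rewards_per_order=[[1.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
    answers_per_order=[["B", "A", "B"], ["B", "A", "B"]],
)
```

Responses that are both correct and order-consistent end up with the largest advantage, which is the balance the summary describes.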
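For the Logit-Attention Divergence entry, the claimed mechanism is reweighting position-biased logits by attention evidence. A minimal, hypothetical sketch follows; the paper's actual calibration rule is not given on this page, so the geometric blend and the `alpha` weight are assumptions.

```python
# Hedged sketch of attention-guided logit calibration.
# The blend rule and alpha are hypothetical, not the paper's method.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention_calibrated_probs(logits, attn_mass, alpha=0.5):
    """Down-weight position-biased logits by the attention mass each
    candidate image received, then renormalize."""
    p = softmax(logits)
    a_total = sum(attn_mass)
    a = [x / a_total for x in attn_mass]
    blended = [pi ** (1 - alpha) * ai ** alpha for pi, ai in zip(p, a)]
    z = sum(blended)
    return [b / z for b in blended]

probs = attention_calibrated_probs(
    logits=[1.0, 0.8, 0.4],     # mildly biased toward position 0
    attn_mass=[0.2, 0.6, 0.2],  # attention mass favors image 1
)
best = max(range(3), key=probs.__getitem__)
```

When the logit gap is small but the attention maps clearly favor another candidate, the calibrated ranking flips — the mismatch the summary calls logit-attention divergence.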
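Saliency-R1's reward is described only as overlap with human bounding boxes. A plain intersection-over-union is one standard overlap score of that kind; how the model's saliency map is reduced to a single box, and the function names here, are assumptions rather than the paper's code.

```python
# Hedged sketch of a bounding-box overlap reward (IoU).
# The reduction of a saliency map to one box is assumed, not shown.
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def overlap_reward(pred_box, human_box):
    """Per-rollout reward: overlap between the box derived from the
    model's saliency map and the human annotation."""
    return iou(pred_box, human_box)

r = overlap_reward((0, 0, 2, 2), (1, 1, 3, 3))  # inter 1, union 7
```

Such a reward is dense in the degree of overlap, which is what lets GRPO push the model's saliency toward the regions humans marked.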