DGAO uses reinforcement learning to optimize LLMs for both accuracy and order stability by balancing intra-group accuracy advantages and inter-group stability advantages.
arXiv preprint arXiv:2410.16983
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
Logit-Attention Divergence: Mitigating Position Bias in Multi-Image Retrieval via Attention-Guided Calibration
A training-free attention-guided debiasing framework mitigates position bias in MLLM multi-image retrieval by exploiting the observed mismatch between biased logits and aligned attention maps, yielding over 40% accuracy gains on MS-COCO benchmarks.