Set-to-set distances on sentence embeddings provide a permutation-invariant reward signal that improves GRPO training and enables efficient test-time scaling for vision-language models generating chest X-ray reports.
More bang for the buck: Process reward modeling with entropy-driven uncertainty.arXiv preprint arXiv:2503.22233, 2025
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
SDR: Set-Distance Rewards for Radiology Report Generation
Set-to-set distances on sentence embeddings provide a permutation-invariant reward signal that improves GRPO training and enables efficient test-time scaling for vision-language models generating chest X-ray reports.