EAPO uses the policy entropy ratio to adaptively weight positive samples in RLVR for open-ended QA, claiming better diversity and stability than fixed-weight baselines on medical datasets.
2 Pith papers cite this work. Polarity classification is still indexing.
Pith papers citing it: 2
Fields: cs.AI (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
C-MIG uses multi-view information gain from retrieved documents and refinements to supervise RAG-RL for clinical diagnosis, claiming top performance on four medical benchmarks.
citing papers explorer
- EAPO: Entropy-Driven Adaptive Positive-Negative Sample Weighting for Policy Optimization in Open-Ended QA
  EAPO uses the policy entropy ratio to adaptively weight positive samples in RLVR for open-ended QA, claiming better diversity and stability than fixed-weight baselines on medical datasets.
- C-MIG: Multi-view Information Gain-based Retrieval-Augmented Generation for Clinical Diagnosis Reasoning
  C-MIG uses multi-view information gain from retrieved documents and refinements to supervise RAG-RL for clinical diagnosis, claiming top performance on four medical benchmarks.