arXiv preprint arXiv:2311.08045

Adversarial preference optimization: Enhancing your alignment via RM-LLM game · arXiv 2311.08045

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

representative citing papers

BV-Blend: Uncertainty-Weighted Historical Baselines for Stable Critic-Free RL with Verifiable Rewards

cs.AI · 2026-06-27 · unverdicted · novelty 4.0

BV-Blend blends prompt-local and semantic-cluster historical reward statistics via SEM-derived weights to stabilize critic-free RL advantage estimation.

citing papers explorer

Showing 1 of 1 citing paper.

BV-Blend: Uncertainty-Weighted Historical Baselines for Stable Critic-Free RL with Verifiable Rewards cs.AI · 2026-06-27 · unverdicted · none · ref 42
