pith. machine review for the scientific record.

CDRRM: Contrast-Driven Rubric Generation for Reliable and Interpretable Reward Modeling

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

fields

cs.LG 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

Policy Improvement Reinforcement Learning

cs.LG · 2026-04-01 · unverdicted · novelty 6.0

PIRL maximizes cumulative policy improvement across iterations instead of surrogate rewards and is proven aligned with final performance; PIPO implements it via retrospective verification for stable closed-loop optimization.

citing papers explorer

Showing 1 of 1 citing paper.

  • Policy Improvement Reinforcement Learning cs.LG · 2026-04-01 · unverdicted · none · ref 31
