Rethinking KL regularization in RLHF: From value estimation to gradient optimization

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 1, method 1

citation-polarity summary: still indexing

years: 2026 (4), 2024 (1)

verdicts: unverdicted (5)

representative citing papers

KLip-PPO: A per-sample KL perspective on PPO-Clip

cs.LG · 2026-06-22 · unverdicted · novelty 7.0

The PPO-Clip gradient equals the gradient of a per-sample KL surrogate with a closed-form coefficient in the importance ratio and advantage, yielding identical curves on five MuJoCo tasks.
