Why Is RLHF Alignment Shallow? A Gradient Analysis. arXiv preprint arXiv:2603.04851

Robin Young · 2026 · arXiv 2603.04851

2 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

HTPO: Towards Exploration-Exploitation Balanced Policy Optimization via Hierarchical Token-level Objective Control

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

HTPO introduces hierarchical token-level objective control in RLVR to balance exploration and exploitation by grouping tokens according to difficulty, correctness, and entropy, yielding up to 8.6% gains on AIME benchmarks over DAPO.

Gradient-Based LoRA Rank Allocation Under GRPO: An Empirical Study

cs.CL · 2026-05-08 · unverdicted · novelty 6.0

Gradient-based proportional LoRA rank allocation under GRPO reduces accuracy by 4.5 points versus uniform allocation because GRPO gradients are flatter across layers and non-uniform ranks amplify importance differences.

citing papers explorer

Showing 2 of 2 citing papers.

HTPO: Towards Exploration-Exploitation Balanced Policy Optimization via Hierarchical Token-level Objective Control · cs.LG · 2026-05-08 · unverdicted · none · ref 40
Gradient-Based LoRA Rank Allocation Under GRPO: An Empirical Study · cs.CL · 2026-05-08 · unverdicted · none · ref 17
