12 Haoxiang Wang, Wei Xiong, Tengyang Xie, Han Zhao, and Tong Zhang

URL: https://arxiv · 2024 · DOI 10.18653/v1/2024.findings-emnlp.620

4 Pith papers cite this work.

representative citing papers

Understanding helpfulness and harmless tension in reward models

cs.LG · 2026-06-11 · unverdicted · novelty 6.0

Mixed-objective reward models underperform single-objective ones because shared neurons support one objective while negatively affecting the other, creating alignment tension.

Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

cs.LG · 2026-06-02 · unverdicted · novelty 6.0

Skill-RM unifies heterogeneous reward criteria by modeling reward computation as dynamic execution of a reusable Reward-Evaluation Skill within an agent framework.

DynaCF: Mitigating Shortcut Learning in Reward Models via Dynamic Counterfactual Sensitivity

cs.LG · 2026-06-08 · unverdicted · novelty 5.0

DynaCF dynamically downweights shortcut-sensitive samples in reward model training by tracking margin shifts under online counterfactual perturbations within the Bradley-Terry loss.

Focal Reward: Balanced Reinforcement Learning under Rubric-Based Rewards

cs.LG · 2026-05-26 · unverdicted · novelty 4.0

Focal Reward balances rubric-based RL by saturation-aware reweighting derived from inverse reward projection, outperforming static aggregation on 18 model-benchmark pairs.

citing papers explorer

Understanding helpfulness and harmless tension in reward models cs.LG · 2026-06-11 · unverdicted · none · ref 45
Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill cs.LG · 2026-06-02 · unverdicted · none · ref 13
DynaCF: Mitigating Shortcut Learning in Reward Models via Dynamic Counterfactual Sensitivity cs.LG · 2026-06-08 · unverdicted · none · ref 44
Focal Reward: Balanced Reinforcement Learning under Rubric-Based Rewards cs.LG · 2026-05-26 · unverdicted · none · ref 4