pith. sign in

hub

15 UNA: Unifying Alignments of RLHF/PPO, DPO and KTO by a Generalized Implicit Reward Function Nvidia, :, Bo Adler, Niket Agarwal, Ashwath Aithal, Dong H

10 Pith papers cite this work. Polarity classification is still indexing.

10 Pith papers citing it

hub tools

representative citing papers

Incentivizing High-Quality Human Annotations with Golden Questions

cs.GT · 2025-05-25 · unverdicted · novelty 7.0

The paper derives a Θ(1/√(n log n)) hypothesis testing rate under strategic annotator behavior and shows that high-certainty, format-similar golden questions better reveal annotation quality than standard checks.

KTO: Model Alignment as Prospect Theoretic Optimization

cs.LG · 2024-02-02 · conditional · novelty 7.0

KTO aligns LLMs by directly maximizing prospect-theoretic utility on binary signals and matches or exceeds preference-based methods like DPO from 1B to 30B parameters.

Multiplayer Nash Preference Optimization

cs.AI · 2025-09-27 · unverdicted · novelty 6.0

MNPO extends NLHF to multiplayer Nash games, inheriting equilibrium guarantees while showing empirical gains on instruction-following benchmarks under diverse preferences.

citing papers explorer

Showing 10 of 10 citing papers.