pith. machine review for the scientific record.

Open problems and fundamental limitations of reinforcement learning from human feedback

20 Pith papers cite this work. Polarity classification is still indexing.

years: 2026 (19), 2024 (1)

representative citing papers

Efficient Preference Poisoning Attack on Offline RLHF

cs.LG · 2026-05-04 · unverdicted · novelty 8.0

Label-flip attacks on log-linear DPO reduce to binary sparse approximation problems that can be solved efficiently by lattice-based and binary matching pursuit methods with recovery guarantees.
