Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

20 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background 1

citation-polarity summary

years

2026: 19 · 2024: 1

polarities

background 1

representative citing papers

Efficient Preference Poisoning Attack on Offline RLHF

cs.LG · 2026-05-04 · unverdicted · novelty 8.0

Label-flip attacks on log-linear DPO reduce to binary sparse approximation problems that can be solved efficiently by lattice-based and binary matching pursuit methods with recovery guarantees.
