pith. machine review for the scientific record. sign in

hub

Rlhf workflow: From reward modeling to online rlhf.arXiv preprint arXiv:2405.07863

11 Pith papers cite this work. Polarity classification is still indexing.

11 Pith papers citing it

hub tools

citation-role summary

background 1

citation-polarity summary

years

2026 10 2025 1

verdicts

UNVERDICTED 11

roles

background 1

polarities

background 1

representative citing papers

Optimal Transport for LLM Reward Modeling from Noisy Preference

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

SelectiveRM applies optimal transport with a joint consistency discrepancy and partial mass relaxation to produce reward models that optimize a tighter upper bound on clean risk while autonomously dropping noisy preference samples.

SceneCritic: A Symbolic Evaluator for 3D Indoor Scene Synthesis

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

SceneCritic is a symbolic, ontology-grounded evaluator for floor-plan layouts that identifies specific semantic, orientation, and geometric violations and aligns better with human judgments than VLM-based scorers.

citing papers explorer

Showing 11 of 11 citing papers.