pith. machine review for the scientific record.

arXiv preprint arXiv:2310.12036

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

method: 1

citation-polarity summary

fields

cs.LG: 4 · cs.CL: 2

years

2026: 5 · 2025: 1

roles

method: 1

polarities

use method: 1

representative citing papers

Process Reinforcement through Implicit Rewards

cs.LG · 2025-02-03 · conditional · novelty 6.0

PRIME enables online process reward model updates in LLM RL using implicit rewards from rollouts and outcome labels, yielding 15.1% average gains on reasoning benchmarks and surpassing a stronger instruct model with 10% of the data.

citing papers explorer

Showing 6 of 6 citing papers.