Back to Basics: Revisiting REINFORCE-Style Optimization for Learning from Human Feedback in LLMs

Ahmadian A, Cremer C, Gallé M, et al · 2024

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Scalable Stewardship of an LLM-Assisted Clinical Benchmark with Physician Oversight

cs.AI · 2025-12-22 · conditional · novelty 6.0

Physician oversight reveals high error rates in LLM-generated labels for a clinical benchmark and demonstrates that corrected labels improve both evaluation accuracy and downstream model training.

citing papers explorer

Showing 1 of 1 citing paper.

Scalable Stewardship of an LLM-Assisted Clinical Benchmark with Physician Oversight

cs.AI · 2025-12-22 · conditional · none · ref 26