Configuration [11] (2 Analysts + 1 Evaluator) represents the optimal trade-off chosen for the main pipeline, achieving perfect precision (1.0) with robust recall (0.300)

nearly eliminates recall

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Debate as Reward: A Multi-Agent Reward System for Scientific Ideation via RL Post-Training

cs.AI · 2026-04-17 · unverdicted · novelty 7.0 · ref 3

A multi-agent binary reward system with unbiased GRPO post-training on ICLR-320 data outperforms baselines on expert-rated novelty, feasibility, and effectiveness for scientific idea generation.
