Example GSM8K problems and solutions. We include several examples to provide a qualitative sense of the task and learned model behavior.

20 Solving math word problems with process-, outcome-based feedback A · 2021

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Solving math word problems with process- and outcome-based feedback

cs.LG · 2022-11-25 · unverdicted · novelty 6.0

On GSM8K, outcome-based supervision achieves final-answer error rates similar to those of process-based supervision with less labeling, but process-based supervision or learned reward models are needed to reach a 3.4% reasoning error rate among solutions with correct final answers.
