A conformal procedure for chain-of-thought (CoT) reasoning replaces majority voting with weighted aggregation and calibrates abstention to guarantee a low confident-error rate, achieving 90.1% selective accuracy on GSM8K by abstaining on under 5% of cases.
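The two ingredients named in the summary, weighted aggregation and calibrated abstention, can be illustrated with a minimal sketch. This is not the paper's procedure: it assumes each sampled chain comes with a scalar weight, uses the winning answer's share of total weight as the confidence, and picks the abstention threshold with a conformal-risk-control-style bound on a held-out calibration set. All function names (`weighted_vote`, `calibrate_threshold`, `answer_or_abstain`) are hypothetical.

```python
import math
from collections import defaultdict


def weighted_vote(answers, weights):
    """Aggregate sampled answers by summed weight instead of raw counts.

    Returns (top_answer, confidence), where confidence is the winning
    answer's share of the total weight (an assumption of this sketch).
    """
    totals = defaultdict(float)
    for answer, weight in zip(answers, weights):
        totals[answer] += weight
    top = max(totals, key=totals.get)
    return top, totals[top] / sum(totals.values())


def calibrate_threshold(confidences, correct, alpha):
    """Pick the smallest confidence threshold whose risk bound is <= alpha.

    The loss for one calibration item is 1 if it is answered (confidence
    >= threshold) and wrong, else 0. The bound (errors + 1) / (n + 1) is
    the conformal-risk-control adjustment of the empirical confident-error
    rate for a loss bounded by 1.
    """
    n = len(confidences)
    for t in sorted(set(confidences)):
        confident_errors = sum(
            1 for c, ok in zip(confidences, correct) if c >= t and not ok
        )
        if (confident_errors + 1) / (n + 1) <= alpha:
            return t
    # No finite threshold meets the target: abstain on everything.
    return math.inf


def answer_or_abstain(answers, weights, threshold):
    """Return the weighted-vote answer, or None to abstain."""
    top, confidence = weighted_vote(answers, weights)
    return top if confidence >= threshold else None
```

Note that a single heavily weighted chain can outvote two lightly weighted ones, which is where weighted aggregation departs from majority voting; the threshold then decides whether that winner is confident enough to be returned at all.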
4 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (4)
roles: background (1)
polarities: background (1)
representative citing papers
APPS approximates power targets p(x)^alpha via parallel particle propagation with proposal-corrected reweighting and future-value-guided selection at block boundaries, improving accuracy-runtime trade-offs in training-free decoding.
ProcessThinker assigns step-level rewards in GRPO by sampling continuations from each step prefix and using empirical success rates, improving video reasoning benchmarks without training a separate PRM.
R2IF uses a composite reward (binary correctness, CoT effectiveness, and parameter-level SMV) under GRPO to align LLM reasoning with function-calling decisions, improving accuracy and reasoning quality on BFCL/ACEBench.