The majority is not always right: RL training for solution aggregation

Wenting Zhao, Pranjal Aggarwal, Swarnadeep Saha, Asli Celikyilmaz, Jason Weston, Ilia Kulikov · 2025 · arXiv 2509.06870

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

cs.CL · 2026-05-08 · conditional · novelty 8.0 · 2 refs

AutoTTS discovers width-depth test-time scaling controllers through agentic search in a pre-collected trajectory environment, yielding better accuracy-cost tradeoffs than hand-designed baselines on math reasoning tasks at low cost.

CAPS: Cascaded Adaptive Pairwise Selection for Efficient Parallel Reasoning

cs.AI · 2026-05-15 · unverdicted · novelty 7.0

CAPS is a four-stage inference-only cascade that adapts how much of each solution the verifier sees and how comparisons are distributed, halving per-candidate verifier tokens while outperforming uniform pairwise verification on most benchmarks.

MoCo: A One-Stop Shop for Model Collaboration Research

cs.CL · 2026-01-29 · accept · novelty 6.0

MoCo supplies a unified library of 26 collaboration strategies and benchmarks demonstrating average outperformance over single models in 61 percent of (model, data) pairs.

Understanding Performance Gap Between Parallel and Sequential Sampling in Large Reasoning Models

cs.CL · 2026-04-07 · unverdicted · novelty 5.0

Lack of exploration from conditioning on prior answers is the primary reason parallel sampling outperforms sequential sampling in large reasoning models.

Evolutionary Profiles for Protein Fitness Prediction

cs.LG · 2025-10-08 · unverdicted · novelty 5.0

EvoIF integrates within-family and cross-family evolutionary signals into a compact model to achieve competitive or state-of-the-art zero-shot fitness prediction on ProteinGym using only 0.15% of typical training data.

citing papers explorer

Showing 5 of 5 citing papers.

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling · cs.CL · 2026-05-08 · conditional · none · ref 10 · 2 links
CAPS: Cascaded Adaptive Pairwise Selection for Efficient Parallel Reasoning · cs.AI · 2026-05-15 · unverdicted · none · ref 56
MoCo: A One-Stop Shop for Model Collaboration Research · cs.CL · 2026-01-29 · accept · none · ref 32
Understanding Performance Gap Between Parallel and Sequential Sampling in Large Reasoning Models · cs.CL · 2026-04-07 · unverdicted · none · ref 31
Evolutionary Profiles for Protein Fitness Prediction · cs.LG · 2025-10-08 · unverdicted · none · ref 32
