The majority is not always right: RL training for solution aggregation
5 Pith papers cite this work.
citing papers explorer
- LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
  AutoTTS discovers width-depth test-time scaling controllers through agentic search in a pre-collected trajectory environment, yielding better accuracy-cost tradeoffs than hand-designed baselines on math reasoning tasks at low cost.
- CAPS: Cascaded Adaptive Pairwise Selection for Efficient Parallel Reasoning
  CAPS is a four-stage inference-only cascade that adapts how much of each solution the verifier sees and how comparisons are distributed, halving per-candidate verifier tokens while outperforming uniform pairwise verification on most benchmarks.
- MoCo: A One-Stop Shop for Model Collaboration Research
  MoCo supplies a unified library of 26 collaboration strategies and benchmarks demonstrating average outperformance over single models in 61 percent of (model, data) pairs.
- Understanding Performance Gap Between Parallel and Sequential Sampling in Large Reasoning Models
  Lack of exploration from conditioning on prior answers is the primary reason parallel sampling outperforms sequential sampling in large reasoning models.
- Evolutionary Profiles for Protein Fitness Prediction
  EvoIF integrates within-family and cross-family evolutionary signals into a compact model to achieve competitive or state-of-the-art zero-shot fitness prediction on ProteinGym using only 0.15% of typical training data.