RISC reformulates self-consistency answer selection as a ranking task solved by a lightweight LambdaRank model with five hand-designed features, yielding better accuracy-efficiency trade-offs than majority voting on QA benchmarks.
2505.10772 , archiveprefix =
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
Correct reasoning traces exhibit positive confidence gain while incorrect traces show declining confidence, enabling CDG-based voting that boosts performance on AIME, HMMT and BRUMO benchmarks across multiple LLM architectures.
CSI meta-scaffold unifies five LLM agent harnesses; a blackboard multi-agent system solves 19/33 cybench challenges (57.6%) versus 15/33 for the best single scaffold.
citing papers explorer
-
Boosting Self-Consistency with Ranking
RISC reformulates self-consistency answer selection as a ranking task solved by a lightweight LambdaRank model with five hand-designed features, yielding better accuracy-efficiency trade-offs than majority voting on QA benchmarks.
-
Inference Time Optimization with Confidence Dynamics
Correct reasoning traces exhibit positive confidence gain while incorrect traces show declining confidence, enabling CDG-based voting that boosts performance on AIME, HMMT and BRUMO benchmarks across multiple LLM architectures.
-
Towards Cybersecurity SuperIntelligence (CSI): What's the best harness for cybersecurity?
CSI meta-scaffold unifies five LLM agent harnesses; a blackboard multi-agent system solves 19/33 cybench challenges (57.6%) versus 15/33 for the best single scaffold.