RISC reformulates self-consistency answer selection as a ranking task solved by a lightweight LambdaRank model with five hand-designed features, yielding better accuracy-efficiency trade-offs than majority voting on QA benchmarks.
arXiv: 2505.10772
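The ranking formulation in the summary above can be illustrated with a small, self-contained sketch. It groups sampled (answer, confidence) pairs into candidate answers, builds a feature vector per candidate, trains a linear scorer with LambdaRank-style gradients (pairwise logistic gradients scaled by |ΔNDCG|), and returns the top-scoring candidate instead of the most frequent one. This is not RISC's implementation. The paper's five hand-designed features are not listed here, so the three features used below (vote share, mean confidence, max confidence), the function names, the per-sample confidence input, and the toy training data are all hypothetical. A real setup would more likely use a library ranker, for example LightGBM with its `lambdarank` objective.

```python
import math
from collections import defaultdict


def candidate_features(samples):
    """Group (answer, confidence) samples by answer; return {answer: features}.
    Hypothetical features, standing in for RISC's five hand-designed ones."""
    groups = defaultdict(list)
    for answer, conf in samples:
        groups[answer].append(conf)
    n = len(samples)
    return {
        answer: [
            len(confs) / n,           # vote share (the signal majority voting uses)
            sum(confs) / len(confs),  # mean confidence of supporting samples
            max(confs),               # best single-sample confidence
        ]
        for answer, confs in groups.items()
    }


def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def ndcg_swap_delta(rels, scores, i, j):
    """|Delta NDCG| obtained by swapping items i and j in the current ranking."""
    order = sorted(range(len(scores)), key=lambda k: -scores[k])
    rank = {k: r for r, k in enumerate(order)}
    ideal = sorted(rels, reverse=True)
    idcg = sum((2 ** r - 1) / math.log2(p + 2) for p, r in enumerate(ideal))
    if idcg == 0:
        return 0.0
    gi, gj = 2 ** rels[i] - 1, 2 ** rels[j] - 1
    di, dj = 1 / math.log2(rank[i] + 2), 1 / math.log2(rank[j] + 2)
    return abs((gi - gj) * (di - dj)) / idcg


def train_lambdarank(queries, dim, epochs=50, lr=0.1, sigma=1.0):
    """queries: list of (feature_rows, relevance_labels), one per question."""
    w = [0.0] * dim
    for _ in range(epochs):
        for X, rels in queries:
            scores = [score(w, x) for x in X]
            grad = [0.0] * dim
            for i in range(len(X)):
                for j in range(len(X)):
                    if rels[i] <= rels[j]:
                        continue  # only pairs where i should rank above j
                    rho = 1.0 / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
                    lam = sigma * rho * ndcg_swap_delta(rels, scores, i, j)
                    for d in range(dim):
                        grad[d] += lam * (X[i][d] - X[j][d])
            # Step that lowers the |Delta NDCG|-weighted pairwise logistic cost.
            w = [wd + lr * gd for wd, gd in zip(w, grad)]
    return w


def select_answer(samples, w):
    feats = candidate_features(samples)
    return max(feats, key=lambda a: score(w, feats[a]))


def majority_vote(samples):
    counts = defaultdict(int)
    for answer, _ in samples:
        counts[answer] += 1
    return max(counts, key=counts.get)


# Toy training data (hypothetical): (sampled answers with confidences, gold answer).
train = [
    ([("A", 0.3), ("A", 0.3), ("A", 0.3), ("B", 0.9), ("B", 0.9)], "B"),
    ([("C", 0.8), ("C", 0.8), ("C", 0.8), ("D", 0.2)], "C"),
]
queries = []
for samples, gold in train:
    feats = candidate_features(samples)
    answers = list(feats)
    queries.append(([feats[a] for a in answers], [int(a == gold) for a in answers]))

w = train_lambdarank(queries, dim=3)
test = [("X", 0.2), ("X", 0.25), ("X", 0.2), ("Y", 0.95), ("Y", 0.9)]
print(majority_vote(test), select_answer(test, w))  # prints "X Y"
```

On the test question the low-confidence majority answer "X" wins a plain vote, while the trained ranker prefers the high-confidence minority answer "Y". Because every ranking step adds a positive multiple of the correct-minus-incorrect feature difference, the confidence weights stay positive on this toy data.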
3 Pith papers cite this work; all 3 are from 2026, and none has been assigned a verdict yet.

Representative citing papers:
Correct reasoning traces exhibit positive confidence gain while incorrect traces show declining confidence, enabling CDG-based voting that boosts performance on AIME, HMMT and BRUMO benchmarks across multiple LLM architectures.
Towards Cybersecurity SuperIntelligence (CSI): What's the best harness for cybersecurity?
The CSI meta-scaffold unifies five LLM agent harnesses; a blackboard multi-agent system solves 19/33 Cybench challenges (57.6%) versus 15/33 (45.5%) for the best single scaffold.
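The confidence-gain voting summarized for the first representative citing paper can be sketched as a weighted vote. The source does not define CDG, so this sketch assumes a simple gain score, final-step confidence minus first-step confidence, and lets only traces with positive gain vote, weighted by that gain. The per-step confidence input, the function names, and the fallback to plain majority voting are hypothetical choices for illustration, not the citing paper's method.

```python
from collections import defaultdict


def confidence_gain(step_confs):
    """Hypothetical gain: confidence at the last step minus the first step."""
    return step_confs[-1] - step_confs[0]


def gain_weighted_vote(traces):
    """traces: list of (answer, per-step confidence list).

    Traces whose confidence rises vote with weight equal to the rise;
    traces whose confidence falls or stays flat are ignored. If no trace
    gains confidence, fall back to an unweighted majority vote."""
    weights = defaultdict(float)
    for answer, confs in traces:
        gain = confidence_gain(confs)
        if gain > 0:
            weights[answer] += gain
    if not weights:
        for answer, _ in traces:
            weights[answer] += 1.0
    return max(weights, key=weights.get)


traces = [
    ("12", [0.4, 0.5, 0.8]),   # confidence rises
    ("7", [0.7, 0.6, 0.5]),    # confidence declines
    ("7", [0.6, 0.55, 0.4]),
    ("7", [0.5, 0.5, 0.45]),
]
print(gain_weighted_vote(traces))  # prints "12"; majority voting would pick "7"
```

Dropping declining traces entirely, rather than giving them negative weight, keeps the sketch simple; a negative-weight variant would be an equally plausible reading of the summary.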