and Baek, Jinheon and Hwang, Sung Ju , booktitle=
2 Pith papers cite this work.
years: 2026 (2)
representative citing papers
citing papers explorer
- Sketch-and-Verify: Structured Inference-Time Scaling via Program Sketching
  Sketch-and-Verify improves small-LLM code generation on HumanEval+ by factorizing search into K algorithmic sketches and M fillings each, outperforming flat sampling by up to 32 percentage points at matched budget while remaining cheaper than upgrading model tier.
- Semantic Voting: Execution-Grounded Consensus for LLM Code Generation
  Execution-based selectors for LLM code candidates outperform textual voting by large margins across configurations, with input generation quality mattering more than the specific aggregation rule.