LiveCodeBench: Holistic and contamination free evaluation of large language models for code
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.LG (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- CoSPlay: Cooperative Self-Play at Test-Time with Self-Generated Code and Unit Test
  CoSPlay jointly refines self-generated codes and unit tests via bidirectional pass-count signals and consensus selection, raising pass@N and UT accuracy on code benchmarks without ground-truth data.
- Majority Voting for Code Generation
  Functional Majority Voting selects code by runtime agreement on tests, boosting LiveCodeBench performance and serving as an aggregation method for label-free test-time RL without exceeding base model limits.