LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

Naman Jain, King Han, Alex Gu, Wen-Ding Li, Fanjia Yan, Tianjun Zhang, Sida Wang, Armando Solar-Lezama, Koushik Sen, Ion Stoica · 2025

2 Pith papers cite this work.

representative citing papers

CoSPlay: Cooperative Self-Play at Test-Time with Self-Generated Code and Unit Test

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

CoSPlay jointly refines self-generated code and unit tests via bidirectional pass-count signals and consensus selection, raising pass@N and unit-test (UT) accuracy on code benchmarks without ground-truth data.
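
To make "bidirectional pass-count signals" concrete, here is a minimal sketch under stated assumptions. It is not CoSPlay's published algorithm. Candidate programs and candidate unit tests form a pass matrix. A test's score is the number of candidates that pass it, and a candidate's consensus score is the total score of the tests it passes. All names here (`pass_matrix`, `pass_count_scores`, `select_by_consensus`) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical unit test: takes a candidate function, returns True if it passes.
Test = Callable[[Callable], bool]


def pass_matrix(codes: Sequence[Callable], tests: Sequence[Test]) -> List[List[bool]]:
    """passes[i][j] is True when candidate i passes test j; exceptions count as failures."""
    matrix = []
    for code in codes:
        row = []
        for test in tests:
            try:
                row.append(bool(test(code)))
            except Exception:
                row.append(False)
        matrix.append(row)
    return matrix


def pass_count_scores(matrix: List[List[bool]]) -> Tuple[List[int], List[int]]:
    """Code score = number of tests it passes; test score = number of candidates passing it."""
    code_scores = [sum(row) for row in matrix]
    n_tests = len(matrix[0]) if matrix else 0
    test_scores = [sum(row[j] for row in matrix) for j in range(n_tests)]
    return code_scores, test_scores


def select_by_consensus(codes: Sequence[Callable], tests: Sequence[Test]) -> Tuple[int, List[int]]:
    """Pick the candidate whose passed tests carry the most total support (first one wins ties)."""
    matrix = pass_matrix(codes, tests)
    _, test_scores = pass_count_scores(matrix)
    weighted = [sum(s for passed, s in zip(row, test_scores) if passed) for row in matrix]
    best = max(range(len(codes)), key=lambda i: weighted[i])
    return best, weighted
```

For an absolute-value task with candidates `abs`, `lambda x: x` (buggy) and a hand-written correct version, and one wrong test `f(-1) == -1` among the tests, every candidate passes three tests. The raw pass counts therefore tie. Weighting each test by how many candidates agree with it breaks the tie in favor of the correct programs, which is the kind of mutual signal between code and tests that the summary describes.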

Majority Voting for Code Generation

cs.LG · 2026-04-17 · unverdicted · novelty 5.0

Functional Majority Voting selects code by runtime agreement on tests, boosting LiveCodeBench performance and serving as an aggregation method for label-free test-time RL without exceeding base model limits.
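
The general idea of functional majority voting can be sketched as follows. This is an illustration under assumptions, not the cited paper's implementation. Each candidate program runs on the same test inputs, with no expected outputs, so no labels are needed. Candidates are grouped by their tuple of outputs, and a representative of the largest group is returned. The function names `output_signature` and `functional_majority_vote` are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Sequence, Tuple


def output_signature(code: Callable, inputs: Sequence[Any]) -> Tuple[str, ...]:
    """Record each output (or the exception type) so candidates can be compared by behavior."""
    sig = []
    for x in inputs:
        try:
            sig.append(repr(code(x)))
        except Exception as exc:
            sig.append(f"error:{type(exc).__name__}")
    return tuple(sig)


def functional_majority_vote(codes: Sequence[Callable], inputs: Sequence[Any]) -> Tuple[int, int]:
    """Return (index of a representative from the largest behavior cluster, cluster size)."""
    clusters: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
    for i, code in enumerate(codes):
        clusters[output_signature(code, inputs)].append(i)
    # Largest cluster wins; ties go to the cluster seen first (dicts keep insertion order).
    winner = max(clusters.values(), key=len)
    return winner[0], len(winner)
```

With candidates for "square x" such as `x * x`, `x ** 2`, `2 * x`, `x * x` and `1 // x` on inputs `[0, 1, 2, 3]`, the three correct programs share one output signature and form the majority. `2 * x` agrees only on some inputs and lands in its own cluster. Voting on behavior rather than on source text lets syntactically different but equivalent programs count together.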

citing papers explorer

CoSPlay: Cooperative Self-Play at Test-Time with Self-Generated Code and Unit Test cs.LG · 2026-05-22 · unverdicted · none · ref 12
Majority Voting for Code Generation cs.LG · 2026-04-17 · unverdicted · none · ref 4