Title resolution pending

Do task workers have access to evaluation assets? Some behavioral tests exercise the executable with input files such as images, audio files, videos, spreadsheets, or domain-specific configurations · 2000

1 Pith paper cites this work. Polarity classification is still indexing.


Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

ProgramBench: Can Language Models Rebuild Programs From Scratch?

cs.SE · 2026-05-05 · unverdicted · novelty 7.0

ProgramBench introduces 200 tasks in which models must reconstruct full programs such as FFmpeg or SQLite from documentation alone; none of the 9 evaluated LMs fully solves any task, and the best model passes 95% of tests on only 3% of tasks while favoring monolithic code.

citing papers explorer


ProgramBench: Can Language Models Rebuild Programs From Scratch? cs.SE · 2026-05-05 · unverdicted · none · ref 20
