LilyBench evaluates open-weight LLMs on zero-shot LilyPond generation (achievable) and structural understanding tasks (challenging), with metric disagreements noted and code released.
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.SD (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Can LLMs understand LilyPond? A benchmark for symbolic music generation and understanding