LilyBench evaluates open-weight LLMs on zero-shot LilyPond generation (achievable) and structural understanding tasks (challenging), with metric disagreements noted and code released.
A survey on evaluation metrics for music generation
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.SD (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
APEX jointly predicts popularity and aesthetic quality for AI-generated music from MERT embeddings and shows that aesthetic features improve human preference prediction on unseen generative systems.