A survey on evaluation metrics for music generation

2025 · arXiv 2509.00051

2 Pith papers cite this work.

representative citing papers

Can LLMs understand LilyPond? A benchmark for symbolic music generation and understanding

cs.SD · 2026-06-07 · unverdicted · novelty 6.0

LilyBench evaluates open-weight LLMs on zero-shot LilyPond generation, which they can largely achieve, and on structural understanding tasks, which remain challenging. It also notes disagreements between evaluation metrics and releases its code.

APEX: Large-scale Multi-task Aesthetic-Informed Popularity Prediction for AI-Generated Music

cs.SD · 2026-05-05 · unverdicted · novelty 6.0 · 2 refs

APEX jointly predicts popularity and aesthetic quality for AI-generated music from MERT embeddings and shows that aesthetic features improve human preference prediction on unseen generative systems.
