{"paper":{"title":"ServImage: An Image Generation and Editing Benchmark from Real-world Commercial Imaging Services","license":"http://creativecommons.org/licenses/by/4.0/","headline":"ServImage benchmark evaluates image models by their ability to produce outputs that clients would pay for in real design projects.","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Fengxian Ji, Jinghui Zhang, Jingpu Yang, Junhong Liang, Lang Gao, Xiuying Chen, Zhenhao Chen, Zirui Song","submitted_at":"2026-04-27T04:11:06Z","abstract_excerpt":"Recent image generation and editing models demonstrate robust adherence to instructions and high visual quality on academic benchmarks. However, their performance on paid, real-world design projects remains uncertain. We introduce \\textbf{ServImage}, a benchmark that explicitly correlates model outputs with economic value in commercial design projects. ServImage consists of (i) \\textbf{\\textit{ServImageBench}}: a dataset of 1.07k paid commercial design tasks and 2.05k designer deliverables totaling over \\$295k, covering portrait, product, and digital content, along with 33k candidate images an"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Under the ServImageScore system, the proposed payment prediction model achieves 82.00% accuracy in predicting human payment decisions and produces calibrated payment probabilities.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the three quality dimensions (baseline requirements, visual execution, commercial necessity) fully capture the factors that drive human payment decisions in commercial design projects.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"ServImage supplies a commercial-design benchmark, three-dimensional scoring rubric, and 82%-accurate payment predictor trained on 33k human-annotated images from paid projects.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"ServImage benchmark evaluates image models by their ability to produce outputs that clients would pay for in real design projects.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"6ea42207c1ce409d5531e2ee32fac9335210b088fa370182b465656497db5c9e"},"source":{"id":"2604.24023","kind":"arxiv","version":2},"verdict":{"id":"0aa0da11-34f2-4e81-882a-9faac51cecff","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-08T04:53:02.555357Z","strongest_claim":"Under the ServImageScore system, the proposed payment prediction model achieves 82.00% accuracy in predicting human payment decisions and produces calibrated payment probabilities.","one_line_summary":"ServImage supplies a commercial-design benchmark, three-dimensional scoring rubric, and 82%-accurate payment predictor trained on 33k human-annotated images from paid projects.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the three quality dimensions (baseline requirements, visual execution, commercial necessity) fully capture the factors that drive human payment decisions in commercial design projects.","pith_extraction_headline":"ServImage benchmark evaluates image models by their ability to produce outputs that clients would pay for in real design projects."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2604.24023/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"ai_meta_artifact","ran_at":"2026-05-21T07:39:57.128776Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T22:32:09.376346Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"d09a02e3151c6e546a7f933ffa3506a990236088da9389284f9f4667f1efc585"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}