MUSE is a new benchmark and three-stage evaluation protocol for text-to-CAD generation that assesses functionality, manufacturability, and assemblability of B-Rep assemblies beyond geometric similarity.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Representative citing papers
citing papers explorer
- MUSE: Benchmarking Manufacturable, Functional, and Assemblable Text-to-CAD Generation
  MUSE is a new benchmark and three-stage evaluation protocol for text-to-CAD generation that assesses functionality, manufacturability, and assemblability of B-Rep assemblies beyond geometric similarity.
- Agent Safety Is Action Alignment
  Agent safety cannot be achieved via model refusal training and instead requires external least-privilege enforcement evaluated as action alignment.
- PaintBench: Deterministic Evaluation of Precise Visual Editing
  PaintBench provides a scalable deterministic benchmark for precise visual editing operations, revealing that even the best of 11 models achieves only 17.1% mIoU and that scores correlate strongly with applied data visualization editing performance.