Cube: A Roblox View of 3D Intelligence. arXiv preprint arXiv:2503.15475
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: other 1
citation-polarity summary
fields: cs.CV 2
years: 2026 2
verdicts: UNVERDICTED 2
roles: other 1
polarities: unclear 1
representative citing papers
DB-3DME supplies a human-rated 3D mesh dataset and shows that fine-tuning the visual encoder of Qwen-2.5-VL-7B produces automatic evaluations that align better with humans than prior VLMs.
citing papers explorer
- Co-generation of Layout and Shape from Text via Autoregressive 3D Diffusion
  3D-ARD+ unifies autoregressive token prediction with diffusion-based 3D latent generation to co-produce indoor scene layouts and object geometries that follow complex text-specified spatial and semantic constraints.
- DB-3DME: From Dataset to Benchmark for Human-aligned Automatic 3D Mesh Evaluation
  DB-3DME supplies a human-rated 3D mesh dataset and shows that fine-tuning the visual encoder of Qwen-2.5-VL-7B produces automatic evaluations that align better with humans than prior VLMs.