Pretraining frequency predicts compositional generalization of CLIP on real-world tasks. arXiv preprint arXiv:2502.18326.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
When Do Diffusion Models Learn to Generate Multiple Objects?
Using the mosaic controlled dataset framework, experiments show scene complexity dominates over concept imbalance in diffusion model failures for multi-object generation, with counting especially hard in low-data regimes and compositional generalization collapsing under held-out combinations.