Scalable diffusion models with transformers
2 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
CSF is the first black-box method to attribute fine-tuned text-to-image models to original lineages via compositional semantic probes and Bayesian decisions across multiple model families.
citing papers explorer
- ETCHR: Editing To Clarify and Harness Reasoning
A decoupled question-conditioned image editor trained via supervised imitation then VLM-reward enhancement improves MLLM visual reasoning Pass@1 by 4.6-5.5 points across models and tasks.
- CSF: Black-box Fingerprinting via Compositional Semantics for Text-to-Image Models
CSF is the first black-box method to attribute fine-tuned text-to-image models to original lineages via compositional semantic probes and Bayesian decisions across multiple model families.