OmniMamba: Efficient and unified multimodal understanding and generation via state space models. arXiv preprint arXiv:2503.08686, 2025. https://arxiv.org/abs/2503.08686
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
citing papers explorer
- Ask, Solve, Generate: Self-Evolving Unified Multimodal Understanding and Generation via Self-Consistency Rewards
  A self-evolving framework with proposer-solver-generator roles, Solver Token Entropy, and multi-scale internal evaluation improves unified LMMs on understanding and generation tasks using only self-derived consistency signals.
- Safe Autoregressive Image Generation with Iterative Self-Improving Codebooks
  Iterative self-improving codebooks enhance safety in autoregressive multimodal models by self-identifying unsafe generations and updating the codebook to eliminate harmful visual token mappings without external feedback.
- Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling
  Visual generation models are evolving from passive renderers to interactive agentic world modelers, but current systems lack spatial reasoning, temporal consistency, and causal understanding, with evaluations overemphasizing perceptual quality.