A new shared video-image tokenizer enables large language models to surpass diffusion models on standard visual generation benchmarks.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 3representative citing papers
VICReg prevents collapse in self-supervised image embeddings via explicit variance, invariance, and covariance regularization and matches state-of-the-art downstream performance.
MambaPanoptic is a fully Mamba-based panoptic segmentation model that uses MambaFPN for multi-scale features and a QuadMamba kernel generator to outperform PanopticDeepLab and PanopticFCN on Cityscapes and COCO while using fewer parameters than Mask2Former.
citing papers explorer
-
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
A new shared video-image tokenizer enables large language models to surpass diffusion models on standard visual generation benchmarks.
-
VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning
VICReg prevents collapse in self-supervised image embeddings via explicit variance, invariance, and covariance regularization and matches state-of-the-art downstream performance.
-
MambaPanoptic: A Vision Mamba-based Structured State Space Framework for Panoptic Segmentation
MambaPanoptic is a fully Mamba-based panoptic segmentation model that uses MambaFPN for multi-scale features and a QuadMamba kernel generator to outperform PanopticDeepLab and PanopticFCN on Cityscapes and COCO while using fewer parameters than Mask2Former.