MergeTok unifies VAE and VQ tokenizers via token merging to impose semantic alignment on continuous latents and stabilize discrete codebook training, achieving lower rFID on ImageNet-256.
org/abs/2503.17760
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
NSVQ mitigates codebook collapse in large-codebook VQ by addressing encoder drift via non-stationary loss, replacement, and staged freezing, improving rFID from 2.39 to 2.10 on ImageNet-1k while achieving 100% utilization.
citing papers explorer
- MergeTok: Unified Continuous and Discrete Visual Tokenization via Token Merging
- NSVQ: Mitigating Codebook Collapse by Stabilizing Encoder Drift in Vector Quantization