ChannelTok introduces channel-wise tokenization with stochastic tail-dropping to achieve rFID 2.92 on ImageNet at 8.6x faster decoding and 2.1x smaller size than prior flexible tokenizers.
Jiahui Zhang, Fangneng Zhan, Christian Theobalt, and Shijian Lu
2 Pith papers cite this work.
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
NSVQ mitigates codebook collapse in large-codebook VQ by addressing encoder drift via non-stationary loss, replacement, and staged freezing, improving rFID from 2.39 to 2.10 on ImageNet-1k while achieving 100% utilization.
citing papers explorer
- ChannelTok: Efficient Flexible-Length Vision Tokenization
ChannelTok introduces channel-wise tokenization with stochastic tail-dropping to achieve rFID 2.92 on ImageNet at 8.6x faster decoding and 2.1x smaller size than prior flexible tokenizers.
- NSVQ: Mitigating Codebook Collapse by Stabilizing Encoder Drift in Vector Quantization
NSVQ mitigates codebook collapse in large-codebook VQ by addressing encoder drift via non-stationary loss, replacement, and staged freezing, improving rFID from 2.39 to 2.10 on ImageNet-1k while achieving 100% utilization.