pith. sign in

Language model beats diffusion - tokenizer is key to visual generation

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

citation-role summary

method 1

citation-polarity summary

fields

cs.CV 5 cs.LG 1

years

2026 4 2025 2

roles

method 1

polarities

use method 1

clear filters

representative citing papers

Autoregressive Visual Generation Needs a Prologue

cs.CV · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

Prologue adds a small set of learnable tokens trained exclusively with AR cross-entropy loss to decouple generation from reconstruction in autoregressive visual models, yielding lower gFID on ImageNet 256x256.

Distilling Specialized Orders for Visual Generation

cs.CV · 2025-04-23 · unverdicted · novelty 7.0

OAR distills specialized generation orders from any-order AR models via self-distillation, improving FID from 2.39 to 2.17 on ImageNet 256x256 while preserving multi-task flexibility.

Amplifying Membership Signal Through Chained Regeneration

cs.LG · 2026-06-30 · unverdicted · novelty 6.0

MADreMIA amplifies membership inference signals by showing that memorized samples maintain higher coherence and slower degradation in chained regeneration trajectories than non-members.

Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation

cs.CV · 2025-05-08 · unverdicted · novelty 6.0

Mogao presents a causal unified model with deep fusion, dual encoders, and interleaved position embeddings that achieves strong performance on multi-modal understanding, text-to-image generation, and coherent interleaved outputs including zero-shot editing.

citing papers explorer

Showing 5 of 5 citing papers after filters.

  • AdaTok: Self-Budgeting Image Tokenization with Quality-Preserving Dynamic Tokens cs.CV · 2026-06-05 · unverdicted · none · ref 11

    AdaTok learns content-dependent token budgets for discrete 1D image tokenization via prioritized representation learning and a GRPO allocation policy, achieving rFID 1.50 at ~118 tokens average versus fixed 256-token baselines.

  • Autoregressive Visual Generation Needs a Prologue cs.CV · 2026-05-07 · unverdicted · none · ref 57 · 2 links

    Prologue adds a small set of learnable tokens trained exclusively with AR cross-entropy loss to decouple generation from reconstruction in autoregressive visual models, yielding lower gFID on ImageNet 256x256.

  • Distilling Specialized Orders for Visual Generation cs.CV · 2025-04-23 · unverdicted · none · ref 14

    OAR distills specialized generation orders from any-order AR models via self-distillation, improving FID from 2.39 to 2.17 on ImageNet 256x256 while preserving multi-task flexibility.

  • Amplifying Membership Signal Through Chained Regeneration cs.LG · 2026-06-30 · unverdicted · none · ref 45

    MADreMIA amplifies membership inference signals by showing that memorized samples maintain higher coherence and slower degradation in chained regeneration trajectories than non-members.

  • Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation cs.CV · 2025-05-08 · unverdicted · none · ref 92

    Mogao presents a causal unified model with deep fusion, dual encoders, and interleaved position embeddings that achieves strong performance on multi-modal understanding, text-to-image generation, and coherent interleaved outputs including zero-shot editing.