arXiv preprint arXiv:2404.02905

Keyu Tian, Yi Jiang, Zehuan Yuan, Bingyue Peng, Liwei Wang · 2024 · arXiv 2404.02905

6 Pith papers cite this work. Polarity classification is still being indexed.

representative citing papers

Normalizing Trajectory Models

cs.CV · 2026-05-08 · unverdicted · novelty 7.0 · 2 refs

NTM models each generative reverse step as a conditional normalizing flow with a hybrid shallow-deep architecture, enabling exact-likelihood training and strong four-step sampling performance on text-to-image tasks.

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

cs.CV · 2024-06-10 · conditional · novelty 7.0

Scaled vanilla autoregressive models based on Llama achieve 2.18 FID on ImageNet 256x256 image generation, beating popular diffusion models without visual inductive biases.

TRACE: Temporal Routing with Autoregressive Cross-channel Experts for EEG Representation Learning

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

TRACE is an autoregressive EEG pre-training framework using temporally adaptive cross-channel expert routing to learn transferable representations, achieving best results on several of eight downstream benchmarks.

Text-To-Speech with Chain-of-Details: modeling temporal dynamics in speech generation

eess.AS · 2026-04-21 · unverdicted · novelty 6.0

Chain-of-Details (CoD) is a cascaded TTS method that explicitly models temporal coarse-to-fine dynamics with a shared decoder, achieving competitive performance using significantly fewer parameters.

Generative Refinement Networks for Visual Synthesis

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

GRN uses hierarchical binary quantization and entropy-guided refinement to set new ImageNet records of 0.56 rFID for reconstruction and 1.81 gFID for class-conditional generation while releasing code and models.

MMCORE: MultiModal COnnection with Representation Aligned Latent Embeddings

cs.CV · 2026-04-21 · unverdicted · novelty 4.0

MMCORE transfers VLM reasoning into diffusion-based image generation and editing via aligned latent embeddings from learnable queries, outperforming baselines on text-to-image and editing tasks.

citing papers explorer

Normalizing Trajectory Models · cs.CV · 2026-05-08 · unverdicted · none · ref 19 · 2 links
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation · cs.CV · 2024-06-10 · conditional · none · ref 33
TRACE: Temporal Routing with Autoregressive Cross-channel Experts for EEG Representation Learning · cs.LG · 2026-05-12 · unverdicted · none · ref 13
Text-To-Speech with Chain-of-Details: modeling temporal dynamics in speech generation · eess.AS · 2026-04-21 · unverdicted · none · ref 28
Generative Refinement Networks for Visual Synthesis · cs.CV · 2026-04-14 · unverdicted · none · ref 52
MMCORE: MultiModal COnnection with Representation Aligned Latent Embeddings · cs.CV · 2026-04-21 · unverdicted · none · ref 34
