hub Mixed citations

Imagereward: Learning and evaluating human preferences for text-to-image generation

Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, Yuxiao Dong · 2023

Mixed citation behavior. Most common role is background (33%).

15 Pith papers citing it

Background 33% of classified citations

browse 15 citing papers

hub tools

JSON dossier citing papers JSON

citation-role summary

background 2 baseline 2 method 2

citation-polarity summary

background 2 baseline 2 use method 2

representative citing papers

AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment

cs.AI · 2026-05-17 · unverdicted · novelty 7.0 · 2 refs

AutoRubric-T2I learns and selects explicit rubrics from preference pairs to guide VLM judges, producing high-quality interpretable rewards for T2I alignment with far less data than traditional Bradley-Terry models.

LENS: Low-Frequency Eigen Noise Shaping for Efficient Diffusion Sampling

cs.CV · 2026-05-08 · unverdicted · novelty 7.0

LENS shapes low-frequency eigen noise with a lightweight network to enable efficient, high-quality sampling in distilled diffusion models.

Arena as Offline Reward: Efficient Fine-Grained Preference Optimization for Diffusion Models

cs.CV · 2026-05-07 · unverdicted · novelty 7.0

ArenaPO infers Gaussian capability distributions from pairwise preferences and applies truncated-normal latent inference to derive fine-grained offline rewards for preference optimization of text-to-image diffusion models.

LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories

cs.CV · 2026-04-16 · unverdicted · novelty 7.0

LeapAlign fine-tunes flow matching models by constructing two consecutive leaps that skip multiple ODE steps with randomized timesteps and consistency weighting, enabling stable updates at any generation step.

Hierarchical Variational Policies for Reward-Guided Diffusion

cs.LG · 2026-05-20 · conditional · novelty 6.0

A hierarchical variational formulation amortizes test-time guidance in diffusion models to achieve strong quality-speed tradeoffs with significantly reduced inference compute.

CPC-VAR:Continual Personalized and Compositional Generation in Visual Autoregressive Models

cs.CV · 2026-05-19 · unverdicted · novelty 6.0

CPC-VAR adds Gradient-based Concept Neuron Selection for continual single-concept learning and a context-aware multi-branch composition strategy to reduce forgetting and entanglement in VAR-based personalized image generation.

When Policy Entropy Constraint Fails: Preserving Diversity in Flow-based RLHF via Perceptual Entropy

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

Policy entropy remains constant in flow-matching models during RLHF due to fixed noise schedules while perceptual diversity collapses from mode-seeking policy gradients, so perceptual entropy constraints are introduced to preserve diversity and improve quality.

Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria

cs.AI · 2026-05-08 · unverdicted · novelty 6.0

Auto-Rubric as Reward externalizes VLM preferences into structured rubrics and applies Rubric Policy Optimization to create more reliable binary rewards for multimodal generation, outperforming pairwise models on text-to-image and editing benchmarks.

Stream-T1: Test-Time Scaling for Streaming Video Generation

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

Stream-T1 is a test-time scaling framework for streaming video generation using scaled noise propagation from history, reward pruning across short and long windows, and feedback-guided memory sinking to improve temporal consistency and visual quality.

Leveraging Verifier-Based Reinforcement Learning in Image Editing

cs.CV · 2026-04-30 · unverdicted · novelty 6.0 · 2 refs

Edit-R1 builds a CoT-based reasoning reward model (RRM) via SFT and GCPO, then applies it with GRPO to improve image editing models such as FLUX.1-kontext.

DanceGRPO: Unleashing GRPO on Visual Generation

cs.CV · 2025-05-12 · unverdicted · novelty 6.0

DanceGRPO applies GRPO to visual generation tasks to achieve stable policy optimization across diffusion models, rectified flows, multiple tasks, and diverse reward models, outperforming prior RL methods.

Towards General Preference Alignment: Diffusion Models at Nash Equilibrium

cs.LG · 2026-05-06 · unverdicted · novelty 5.0

Diff.-NPO frames diffusion alignment as a self-play game reaching Nash equilibrium and reports better text-to-image results than prior DPO-style methods.

A Systematic Post-Train Framework for Video Generation

cs.CV · 2026-04-28 · unverdicted · novelty 5.0

A post-training pipeline for video generation models combines SFT, RLHF with novel GRPO, prompt enhancement, and inference optimization to improve visual quality, temporal coherence, and instruction following.

CaC: Advancing Video Reward Models via Hierarchical Spatiotemporal Concentrating

cs.CV · 2026-05-12

SARA: Semantically Adaptive Relational Alignment for Video Diffusion Models

cs.CV · 2026-05-08

citing papers explorer

Showing 15 of 15 citing papers.

AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment cs.AI · 2026-05-17 · unverdicted · none · ref 32 · 2 links
AutoRubric-T2I learns and selects explicit rubrics from preference pairs to guide VLM judges, producing high-quality interpretable rewards for T2I alignment with far less data than traditional Bradley-Terry models.
LENS: Low-Frequency Eigen Noise Shaping for Efficient Diffusion Sampling cs.CV · 2026-05-08 · unverdicted · none · ref 44
LENS shapes low-frequency eigen noise with a lightweight network to enable efficient, high-quality sampling in distilled diffusion models.
Arena as Offline Reward: Efficient Fine-Grained Preference Optimization for Diffusion Models cs.CV · 2026-05-07 · unverdicted · none · ref 37
ArenaPO infers Gaussian capability distributions from pairwise preferences and applies truncated-normal latent inference to derive fine-grained offline rewards for preference optimization of text-to-image diffusion models.
LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories cs.CV · 2026-04-16 · unverdicted · none · ref 53
LeapAlign fine-tunes flow matching models by constructing two consecutive leaps that skip multiple ODE steps with randomized timesteps and consistency weighting, enabling stable updates at any generation step.
Hierarchical Variational Policies for Reward-Guided Diffusion cs.LG · 2026-05-20 · conditional · none · ref 51
A hierarchical variational formulation amortizes test-time guidance in diffusion models to achieve strong quality-speed tradeoffs with significantly reduced inference compute.
CPC-VAR:Continual Personalized and Compositional Generation in Visual Autoregressive Models cs.CV · 2026-05-19 · unverdicted · none · ref 41
CPC-VAR adds Gradient-based Concept Neuron Selection for continual single-concept learning and a context-aware multi-branch composition strategy to reduce forgetting and entanglement in VAR-based personalized image generation.
When Policy Entropy Constraint Fails: Preserving Diversity in Flow-based RLHF via Perceptual Entropy cs.CV · 2026-05-12 · unverdicted · none · ref 72
Policy entropy remains constant in flow-matching models during RLHF due to fixed noise schedules while perceptual diversity collapses from mode-seeking policy gradients, so perceptual entropy constraints are introduced to preserve diversity and improve quality.
Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria cs.AI · 2026-05-08 · unverdicted · none · ref 47
Auto-Rubric as Reward externalizes VLM preferences into structured rubrics and applies Rubric Policy Optimization to create more reliable binary rewards for multimodal generation, outperforming pairwise models on text-to-image and editing benchmarks.
Stream-T1: Test-Time Scaling for Streaming Video Generation cs.CV · 2026-05-06 · unverdicted · none · ref 42
Stream-T1 is a test-time scaling framework for streaming video generation using scaled noise propagation from history, reward pruning across short and long windows, and feedback-guided memory sinking to improve temporal consistency and visual quality.
Leveraging Verifier-Based Reinforcement Learning in Image Editing cs.CV · 2026-04-30 · unverdicted · none · ref 64 · 2 links
Edit-R1 builds a CoT-based reasoning reward model (RRM) via SFT and GCPO, then applies it with GRPO to improve image editing models such as FLUX.1-kontext.
DanceGRPO: Unleashing GRPO on Visual Generation cs.CV · 2025-05-12 · unverdicted · none · ref 9
DanceGRPO applies GRPO to visual generation tasks to achieve stable policy optimization across diffusion models, rectified flows, multiple tasks, and diverse reward models, outperforming prior RL methods.
Towards General Preference Alignment: Diffusion Models at Nash Equilibrium cs.LG · 2026-05-06 · unverdicted · none · ref 42
Diff.-NPO frames diffusion alignment as a self-play game reaching Nash equilibrium and reports better text-to-image results than prior DPO-style methods.
A Systematic Post-Train Framework for Video Generation cs.CV · 2026-04-28 · unverdicted · none · ref 41
A post-training pipeline for video generation models combines SFT, RLHF with novel GRPO, prompt enhancement, and inference optimization to improve visual quality, temporal coherence, and instruction following.
CaC: Advancing Video Reward Models via Hierarchical Spatiotemporal Concentrating cs.CV · 2026-05-12 · unreviewed · ref 59
SARA: Semantically Adaptive Relational Alignment for Video Diffusion Models cs.CV · 2026-05-08 · unreviewed · ref 25

Imagereward: Learning and evaluating human preferences for text-to-image generation

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer