Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis

Feng Zhu, Keqiang Sun, Rui Zhao, Xiaoshi Wu, Yiming Hao, Yixiong Chen · 2023 · cs.CV · arXiv 2306.09341

Mixed citation behavior. The most common role is background (42% of classified citations).

Cited by 159 Pith papers.

abstract

Recent text-to-image generative models can generate high-fidelity images from text inputs, but the quality of these generated images cannot be accurately evaluated by existing evaluation metrics. To address this issue, we introduce Human Preference Dataset v2 (HPD v2), a large-scale dataset that captures human preferences on images from a wide range of sources. HPD v2 comprises 798,090 human preference choices on 433,760 pairs of images, making it the largest dataset of its kind. The text prompts and images are deliberately collected to eliminate potential bias, which is a common issue in previous datasets. By fine-tuning CLIP on HPD v2, we obtain Human Preference Score v2 (HPS v2), a scoring model that can more accurately predict human preferences on generated images. Our experiments demonstrate that HPS v2 generalizes better than previous metrics across various image distributions and is responsive to algorithmic improvements of text-to-image generative models, making it a preferable evaluation metric for these models. We also investigate the design of the evaluation prompts for text-to-image generative models, to make the evaluation stable, fair and easy-to-use. Finally, we establish a benchmark for text-to-image generative models using HPS v2, which includes a set of recent text-to-image models from academia, the community, and industry. The code and dataset are available at https://github.com/tgxs002/HPSv2.
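
The abstract says only that HPS v2 is obtained by fine-tuning CLIP on pairwise preference choices. As a rough illustration of how a CLIP-style scorer could turn a prompt and two candidate images into a predicted preference, the sketch below uses the usual CLIP formulation (scaled cosine similarity, then a softmax over the pair). The scoring form, the `scale` value, the toy embeddings, and the function names are all assumptions made for illustration; none of them come from the paper or its repository.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def preference_probs(text_emb, image_embs, scale=100.0):
    """Softmax over scaled text-image similarities (hypothetical helper).

    `scale` stands in for a CLIP-like logit scale; its value is assumed.
    """
    logits = [scale * cosine(text_emb, emb) for emb in image_embs]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - top) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 2-D embeddings standing in for encoder outputs (made up for this sketch).
text = [1.0, 0.0]
img_a = [0.9, 0.1]   # closer to the prompt embedding
img_b = [0.1, 0.9]   # farther from the prompt embedding
probs = preference_probs(text, [img_a, img_b])
preferred = "A" if probs[0] > probs[1] else "B"
```

In a fine-tuning setup of this kind, the predicted probabilities would typically be compared against the recorded human choice for each pair; the exact objective used for HPS v2 is not stated on this page.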

citation-role summary

background 17 · dataset 13 · method 4 · baseline 3 · other 3
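
The 42% background share reported at the top of the page appears to follow from these counts, assuming the classified citations are exactly the ones tallied here. A minimal sketch of that arithmetic (variable names are illustrative):

```python
# Role counts as listed in the citation-role summary.
role_counts = {
    "background": 17,
    "dataset": 13,
    "method": 4,
    "baseline": 3,
    "other": 3,
}

total = sum(role_counts.values())  # number of classified citations
shares = {role: 100.0 * n / total for role, n in role_counts.items()}
top_role = max(role_counts, key=role_counts.get)
```

Under this assumption there are 40 classified citations and the background share is 17/40 = 42.5%, which the page presumably displays as 42%.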

citation-polarity summary

background 17 · use dataset 12 · baseline 4 · use method 4 · unclear 2 · support 1

representative citing papers

How to Guide Your Flow: Few-Step Alignment via Flow Map Reward Guidance

cs.LG · 2026-04-29 · unverdicted · novelty 8.0 · 3 refs

FMRG reformulates guidance as deterministic optimal control, deriving a single-trajectory method using the flow map that matches or exceeds baselines on reward-guided generation and inverse problems with 3 NFEs at text-to-image scale.

OP-GRPO: Efficient Off-Policy GRPO for Flow-Matching Models

cs.CV · 2026-04-05 · unverdicted · novelty 8.0

OP-GRPO is the first off-policy GRPO method for flow-matching models that reuses trajectories via replay buffer and importance sampling corrections, matching on-policy performance with 34.2% of the training steps.

Cross-Space Distillation: Teaching One-Step Students with Modern Diffusion Teachers

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

Introduces a Bridge latent interface that maps mismatched student latents into teacher space, enabling distillation from modern diffusion teachers to compact one-step students and raising SD 1.5 HPSv3 from 5.4 to 9.4 while keeping one-step speed.

Variance Reduction on the Camera Axis: Multi-View Score Distillation for 3D

cs.CV · 2026-06-29 · unverdicted · novelty 7.0

MV-SDI aggregates K-view gradients per step via accumulation and antithetic pairs at fixed UNet budget, raising CLIP R-Precision from 74.8% to 83.8% (K=2) and halving steps while keeping the 2D prior frozen.

DiT-Reward: Generative Representations for Text-to-Image Reward Modeling

cs.LG · 2026-06-22 · unverdicted · novelty 7.0

DiT-Reward converts pretrained DiT models into reward predictors that outperform HPSv3 on four benchmarks while providing 1.65x inference speedup.

Through the PRISM: Preference Representation in Intermediate States of Video Diffusion Models

cs.CV · 2026-06-18 · unverdicted · novelty 7.0

PRISM shows video diffusion models inherently encode preference information in noisy latents, achieving SOTA accuracy and enabling noise-robust early-stage sampling with a correlation to generative performance.

The Reward Was in Your Data All Along: Correcting Flow Matching with Discriminator-Guided RL

cs.LG · 2026-06-17 · unverdicted · novelty 7.0

DRL trains a discriminator on data versus base-model samples in pretrained representation space and uses its logit as reward in KL-regularized RL, cutting guidance-free FID from 9.38 to 2.62 on SiT and similar gains on other backbones.

ReFree: Towards Realistic Co-Speech Video Generation via Reward-Free RL and Multilevel Speech Guidance

cs.CV · 2026-06-11 · unverdicted · novelty 7.0

ReFree-S2V applies multilevel speech guidance and reward-free reinforcement learning inside a flow-matching model built on a pretrained video generator to improve lip synchronization and natural expressivity in talking-head videos.

Itô maps for any-step SDEs

stat.ML · 2026-06-09 · unverdicted · novelty 7.0

Defines Itô maps for any-step SDE integration and shows their use for conditional endpoint sampling and steering on synthetic and image tasks.

Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models

cs.LG · 2026-06-09 · unverdicted · novelty 7.0 · 2 refs

Flow-DPPO replaces PPO ratio clipping with an asymmetric KL divergence mask for flow models, claiming higher rewards, reduced forgetting, and stable multi-epoch training.

HACK++: Towards More Effective Head-Aware Key-Value Compression for Efficient Visual Autoregressive Modeling

cs.CV · 2026-06-06 · unverdicted · novelty 7.0

HACK++ is a head-aware KV cache compression framework for VAR models that decouples current-scale attention from historical cache under adaptive per-head budgets to achieve near-lossless generation at 30% attention and 10% cache budgets.

Parallel Jacobi Decoding for Fast Autoregressive Image Generation

cs.CV · 2026-06-04 · conditional · novelty 7.0

Parallel Jacobi Decoding accelerates autoregressive image models 4.8x-6.4x by using 2D spatial draft expansion and adjusted attention masks while keeping generation quality competitive.

A Dataset for Dynamic Human Preferences for Vision Language Models

cs.CV · 2026-06-02 · unverdicted · novelty 7.0

Introduces a benchmark dataset with automated pipeline for evaluating VLMs on dynamic in-context human preferences, distinct from static benchmarks.

Drifting Preference Optimization for One-Step Generative Models

cs.LG · 2026-06-01 · unverdicted · novelty 7.0

DrPO enables online preference optimization for deterministic one-step generators via non-parametric dipole updates from ranked samples plus base-model drift, without reward backpropagation.

Where to Refine, When to Stop: Rethinking Redundancy via Latent Discrepancy for Efficient Visual Autoregressive Generation

cs.CV · 2026-05-29 · unverdicted · novelty 7.0

LD-Pruning applies latent discrepancy to prune tokens and adaptively skip unconditional branches in VAR models for up to 2.35x faster inference with preserved quality.

Explicit Critic Guidance for Aligning Diffusion Models

cs.LG · 2026-05-26 · unverdicted · novelty 7.0

Introduces a state-aligned latent actor-critic framework that lets diffusion models act as their own timestep-conditioned value functions for trajectory-level RL post-training and inference steering.

Towards Anatomically Plausible Human Image Generation via Synthetic Localized Preferences

cs.CV · 2026-05-25 · unverdicted · novelty 7.0

ASAP generates over 10K synthetic anatomical preference pairs via targeted degradation of high-fidelity images and applies a localized margin-bounded DPO to reduce anatomical errors in text-to-image human generation, supported by the new HAP dataset and HAF-Bench.

DRM: Diffusion-based Reward Model With Step-wise Guidance

cs.CV · 2026-05-25 · unverdicted · novelty 7.0

DRM turns a pre-trained diffusion model into a step-wise reward model and uses it for dense RL training (Step-wise GRPO) and guided sampling to improve final image quality.

Inference-Time Alignment of Diffusion Models via Trust-Region Iterative Twisted Sequential Monte Carlo

cs.LG · 2026-05-24 · conditional · novelty 7.0

TRI-TSMC is a trust-region framework for learning twisting functions in SMC-based inference-time alignment of diffusion models that yields zero-variance samplers in theory and better alignment on text and image tasks under fixed budgets.

RankE: End-to-End Post-Training for Discrete Text-to-Image Generation with Decoder Co-Evolution

cs.CV · 2026-05-20 · conditional · novelty 7.0

RankE co-evolves AR policy and decoder via alternating ranking optimization, improving both FID and CLIP scores on LlamaGen-XL and Janus-Pro where policy-only RL degrades FID.

Linear-DPO: Linear Direct Preference Optimization for Diffusion and Flow-Matching Generative Models

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

Linear-DPO replaces sigmoid utility with linear utility and adds EMA reference to improve preference alignment in diffusion and flow-matching text-to-image models.

CAdam: Context-Adaptive Moment Estimation for 3D Gaussian Densification in Generative Distillation

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

CAdam reinterprets densification in generative 3DGS as signal verification via gradient-moment interference, quantile context, and SNR gating to achieve large reductions in primitive count with comparable quality.

TASTE: A Designer-Annotated Multi-Dimensional Preference Dataset for AI-Generated Graphic Design

cs.CV · 2026-05-20 · unverdicted · novelty 7.0 · 2 refs

TASTE supplies designer multi-dimensional rankings of T2I graphic outputs with statistical validation showing moderate agreement and benchmarks where a TASTE-trained MLP outperforms off-the-shelf VLMs.

Probability-Conserving Flow Guidance

cs.CV · 2026-05-19 · unverdicted · novelty 7.0

AdaMaG is a guidance rule for generative models derived from decomposing continuity-equation effects into divergence and score-parallel terms, with a proof that divergence diverges near the manifold and a time-dependent bound that improves realism at no extra cost.

citing papers explorer

Showing 50 of 159 citing papers.

How to Guide Your Flow: Few-Step Alignment via Flow Map Reward Guidance cs.LG · 2026-04-29 · unverdicted · none · ref 59 · 3 links · internal anchor
FMRG reformulates guidance as deterministic optimal control, deriving a single-trajectory method using the flow map that matches or exceeds baselines on reward-guided generation and inverse problems with 3 NFEs at text-to-image scale.
OP-GRPO: Efficient Off-Policy GRPO for Flow-Matching Models cs.CV · 2026-04-05 · unverdicted · none · ref 42 · internal anchor
OP-GRPO is the first off-policy GRPO method for flow-matching models that reuses trajectories via replay buffer and importance sampling corrections, matching on-policy performance with 34.2% of the training steps.
Cross-Space Distillation: Teaching One-Step Students with Modern Diffusion Teachers cs.CV · 2026-06-30 · unverdicted · none · ref 51 · internal anchor
Introduces a Bridge latent interface that maps mismatched student latents into teacher space, enabling distillation from modern diffusion teachers to compact one-step students and raising SD 1.5 HPSv3 from 5.4 to 9.4 while keeping one-step speed.
Variance Reduction on the Camera Axis: Multi-View Score Distillation for 3D cs.CV · 2026-06-29 · unverdicted · none · ref 47 · internal anchor
MV-SDI aggregates K-view gradients per step via accumulation and antithetic pairs at fixed UNet budget, raising CLIP R-Precision from 74.8% to 83.8% (K=2) and halving steps while keeping the 2D prior frozen.
DiT-Reward: Generative Representations for Text-to-Image Reward Modeling cs.LG · 2026-06-22 · unverdicted · none · ref 119 · internal anchor
DiT-Reward converts pretrained DiT models into reward predictors that outperform HPSv3 on four benchmarks while providing 1.65x inference speedup.
Through the PRISM: Preference Representation in Intermediate States of Video Diffusion Models cs.CV · 2026-06-18 · unverdicted · none · ref 32 · internal anchor
PRISM shows video diffusion models inherently encode preference information in noisy latents, achieving SOTA accuracy and enabling noise-robust early-stage sampling with a correlation to generative performance.
The Reward Was in Your Data All Along: Correcting Flow Matching with Discriminator-Guided RL cs.LG · 2026-06-17 · unverdicted · none · ref 161 · internal anchor
DRL trains a discriminator on data versus base-model samples in pretrained representation space and uses its logit as reward in KL-regularized RL, cutting guidance-free FID from 9.38 to 2.62 on SiT and similar gains on other backbones.
ReFree: Towards Realistic Co-Speech Video Generation via Reward-Free RL and Multilevel Speech Guidance cs.CV · 2026-06-11 · unverdicted · none · ref 8 · internal anchor
ReFree-S2V applies multilevel speech guidance and reward-free reinforcement learning inside a flow-matching model built on a pretrained video generator to improve lip synchronization and natural expressivity in talking-head videos.
Itô maps for any-step SDEs stat.ML · 2026-06-09 · unverdicted · none · ref 19 · internal anchor
Defines Itô maps for any-step SDE integration and shows their use for conditional endpoint sampling and steering on synthetic and image tasks.
Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models cs.LG · 2026-06-09 · unverdicted · none · ref 11 · 2 links · internal anchor
Flow-DPPO replaces PPO ratio clipping with an asymmetric KL divergence mask for flow models, claiming higher rewards, reduced forgetting, and stable multi-epoch training.
HACK++: Towards More Effective Head-Aware Key-Value Compression for Efficient Visual Autoregressive Modeling cs.CV · 2026-06-06 · unverdicted · none · ref 63 · internal anchor
HACK++ is a head-aware KV cache compression framework for VAR models that decouples current-scale attention from historical cache under adaptive per-head budgets to achieve near-lossless generation at 30% attention and 10% cache budgets.
Parallel Jacobi Decoding for Fast Autoregressive Image Generation cs.CV · 2026-06-04 · conditional · none · ref 58 · internal anchor
Parallel Jacobi Decoding accelerates autoregressive image models 4.8x-6.4x by using 2D spatial draft expansion and adjusted attention masks while keeping generation quality competitive.
A Dataset for Dynamic Human Preferences for Vision Language Models cs.CV · 2026-06-02 · unverdicted · none · ref 36 · internal anchor
Introduces a benchmark dataset with automated pipeline for evaluating VLMs on dynamic in-context human preferences, distinct from static benchmarks.
Drifting Preference Optimization for One-Step Generative Models cs.LG · 2026-06-01 · unverdicted · none · ref 37 · internal anchor
DrPO enables online preference optimization for deterministic one-step generators via non-parametric dipole updates from ranked samples plus base-model drift, without reward backpropagation.
Where to Refine, When to Stop: Rethinking Redundancy via Latent Discrepancy for Efficient Visual Autoregressive Generation cs.CV · 2026-05-29 · unverdicted · none · ref 14 · internal anchor
LD-Pruning applies latent discrepancy to prune tokens and adaptively skip unconditional branches in VAR models for up to 2.35x faster inference with preserved quality.
Explicit Critic Guidance for Aligning Diffusion Models cs.LG · 2026-05-26 · unverdicted · none · ref 79 · internal anchor
Introduces a state-aligned latent actor-critic framework that lets diffusion models act as their own timestep-conditioned value functions for trajectory-level RL post-training and inference steering.
Towards Anatomically Plausible Human Image Generation via Synthetic Localized Preferences cs.CV · 2026-05-25 · unverdicted · none · ref 41 · internal anchor
ASAP generates over 10K synthetic anatomical preference pairs via targeted degradation of high-fidelity images and applies a localized margin-bounded DPO to reduce anatomical errors in text-to-image human generation, supported by the new HAP dataset and HAF-Bench.
DRM: Diffusion-based Reward Model With Step-wise Guidance cs.CV · 2026-05-25 · unverdicted · none · ref 41 · internal anchor
DRM turns a pre-trained diffusion model into a step-wise reward model and uses it for dense RL training (Step-wise GRPO) and guided sampling to improve final image quality.
Inference-Time Alignment of Diffusion Models via Trust-Region Iterative Twisted Sequential Monte Carlo cs.LG · 2026-05-24 · conditional · none · ref 37 · internal anchor
TRI-TSMC is a trust-region framework for learning twisting functions in SMC-based inference-time alignment of diffusion models that yields zero-variance samplers in theory and better alignment on text and image tasks under fixed budgets.
RankE: End-to-End Post-Training for Discrete Text-to-Image Generation with Decoder Co-Evolution cs.CV · 2026-05-20 · conditional · none · ref 66 · internal anchor
RankE co-evolves AR policy and decoder via alternating ranking optimization, improving both FID and CLIP scores on LlamaGen-XL and Janus-Pro where policy-only RL degrades FID.
Linear-DPO: Linear Direct Preference Optimization for Diffusion and Flow-Matching Generative Models cs.CV · 2026-05-20 · unverdicted · none · ref 30 · internal anchor
Linear-DPO replaces sigmoid utility with linear utility and adds EMA reference to improve preference alignment in diffusion and flow-matching text-to-image models.
CAdam: Context-Adaptive Moment Estimation for 3D Gaussian Densification in Generative Distillation cs.LG · 2026-05-20 · unverdicted · none · ref 71 · internal anchor
CAdam reinterprets densification in generative 3DGS as signal verification via gradient-moment interference, quantile context, and SNR gating to achieve large reductions in primitive count with comparable quality.
TASTE: A Designer-Annotated Multi-Dimensional Preference Dataset for AI-Generated Graphic Design cs.CV · 2026-05-20 · unverdicted · none · ref 36 · 2 links · internal anchor
TASTE supplies designer multi-dimensional rankings of T2I graphic outputs with statistical validation showing moderate agreement and benchmarks where a TASTE-trained MLP outperforms off-the-shelf VLMs.
Probability-Conserving Flow Guidance cs.CV · 2026-05-19 · unverdicted · none · ref 11 · internal anchor
AdaMaG is a guidance rule for generative models derived from decomposing continuity-equation effects into divergence and score-parallel terms, with a proof that divergence diverges near the manifold and a time-dependent bound that improves realism at no extra cost.
AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment cs.AI · 2026-05-17 · unverdicted · none · ref 29 · 2 links · internal anchor
AutoRubric-T2I learns and selects explicit rubrics from preference pairs to guide VLM judges, producing high-quality interpretable rewards for T2I alignment with far less data than traditional Bradley-Terry models.
SeamCam: Quantifying Seamless Camouflage via Multi-Cue Visual Detectability cs.CV · 2026-05-15 · conditional · none · ref 51 · internal anchor
SeamCam quantifies camouflage by computing one minus the highest IoU recoverable from category-conditioned detection proposals against a ground-truth mask, achieving 78.82% agreement with human judgments.
HeatKV: Head-tuned KV-cache Compression for Visual Autoregressive Modeling cs.CV · 2026-05-14 · conditional · none · ref 35 · 2 links · internal anchor
HeatKV doubles KV-cache compression ratios over prior methods for VAR models by creating static head-specific pruning schedules from attention rankings on a calibration set, while preserving image quality on Infinity-2B.
Pareto-Guided Optimal Transport for Multi-Reward Alignment cs.CV · 2026-05-13 · unverdicted · none · ref 15 · internal anchor
PG-OT builds prompt-specific Pareto frontiers and applies distribution-aware optimal transport to improve multi-reward alignment while introducing JDR and JCR metrics to measure synergy and hacking.
Asymmetric Flow Models cs.CV · 2026-05-13 · unverdicted · none · ref 70 · 2 links · internal anchor
AsymFlow uses rank-asymmetric velocity prediction to reach 1.57 FID on ImageNet 256x256 and enables finetuning of latent flow models into superior pixel-space text-to-image generators.
STRIDE: Training-Free Diversity Guidance via PCA-Directed Feature Perturbation in Single-Step Diffusion Models cs.CV · 2026-05-12 · unverdicted · none · ref 39 · internal anchor
STRIDE boosts diversity in one-step diffusion models by injecting PCA-aligned pink noise into transformer features while preserving text alignment and quality.
ExtraVAR: Stage-Aware RoPE Remapping for Resolution Extrapolation in Visual Autoregressive Models cs.CV · 2026-05-11 · unverdicted · none · ref 39 · internal anchor
ExtraVAR enables resolution extrapolation in visual autoregressive models by stage-aware RoPE remapping and entropy-driven attention scaling, suppressing repetition and detail loss.
TMPO: Trajectory Matching Policy Optimization for Diverse and Efficient Diffusion Alignment cs.LG · 2026-05-09 · unverdicted · none · ref 11 · 2 links · internal anchor
TMPO uses Softmax Trajectory Balance to match policy probabilities over multiple trajectories to a Boltzmann reward distribution, improving diversity by 9.1% in diffusion alignment tasks.
LENS: Low-Frequency Eigen Noise Shaping for Efficient Diffusion Sampling cs.CV · 2026-05-08 · unverdicted · none · ref 41 · internal anchor
LENS shapes low-frequency eigen noise with a lightweight network to enable efficient, high-quality sampling in distilled diffusion models.
Arena as Offline Reward: Efficient Fine-Grained Preference Optimization for Diffusion Models cs.CV · 2026-05-07 · unverdicted · none · ref 36 · internal anchor
ArenaPO infers Gaussian capability distributions from pairwise preferences and applies truncated-normal latent inference to derive fine-grained offline rewards for preference optimization of text-to-image diffusion models.
Oracle Noise: Faster Semantic Spherical Alignment for Interpretable Latent Optimization cs.CV · 2026-04-26 · unverdicted · none · ref 44 · internal anchor
Oracle Noise optimizes diffusion model noise on a Riemannian hypersphere guided by key prompt words to preserve the Gaussian prior, eliminate norm inflation, and achieve faster semantic alignment than Euclidean methods.
Z²-Sampling: Zero-Cost Zigzag Trajectories for Semantic Alignment in Diffusion Models cs.CV · 2026-04-26 · unverdicted · none · ref 47 · internal anchor
Z²-Sampling implicitly realizes zero-cost zigzag trajectories for curvature-aware semantic alignment in diffusion models by reducing multi-step paths via operator dualities and temporal caching while synthesizing a directional derivative penalty.
Learning to Credit the Right Steps: Objective-aware Process Optimization for Visual Generation cs.CV · 2026-04-21 · unverdicted · none · ref 43 · internal anchor
OTCA improves GRPO training for visual generation by estimating step importance in trajectories and adaptively weighting multiple reward objectives.
Guiding Distribution Matching Distillation with Gradient-Based Reinforcement Learning cs.LG · 2026-04-21 · unverdicted · none · ref 53 · internal anchor
GDMD replaces raw-sample rewards with distillation-gradient rewards in RL-guided diffusion distillation, yielding 4-step models that surpass their multi-step teachers on GenEval and human preference metrics.
Depth Adaptive Efficient Visual Autoregressive Modeling cs.CV · 2026-04-19 · unverdicted · none · ref 60 · internal anchor
DepthVAR adaptively allocates per-token computational depth in VAR models using a cyclic rotated scheduler and dynamic layer masking to achieve 2.3-3.1x inference speedup with minimal quality loss.
Comparison Drives Preference: Reference-Aware Modeling for AI-Generated Video Quality Assessment cs.CV · 2026-04-18 · unverdicted · none · ref 51 · internal anchor
RefVQA uses a query-centered reference graph and graph-guided difference aggregation to improve AI-generated video quality assessment by incorporating inter-video comparisons.
LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories cs.CV · 2026-04-16 · unverdicted · none · ref 51 · internal anchor
LeapAlign fine-tunes flow matching models by constructing two consecutive leaps that skip multiple ODE steps with randomized timesteps and consistency weighting, enabling stable updates at any generation step.
OneHOI: Unifying Human-Object Interaction Generation and Editing cs.CV · 2026-04-15 · unverdicted · none · ref 40 · internal anchor
OneHOI unifies HOI generation and editing in one conditional diffusion transformer using role-aware tokens, structured attention, and joint training on mixed datasets to reach SOTA on both tasks.
SOAR: Self-Correction for Optimal Alignment and Refinement in Diffusion Models cs.LG · 2026-04-14 · unverdicted · none · ref 5 · internal anchor
SOAR is a reward-free on-policy method that supplies dense per-timestep supervision to correct exposure bias in diffusion model denoising trajectories, raising GenEval from 0.70 to 0.78 and OCR from 0.64 to 0.67 over SFT on SD3.5-Medium.
RewardFlow: Generate Images by Optimizing What You Reward cs.CV · 2026-04-09 · unverdicted · none · ref 43 · internal anchor
RewardFlow unifies differentiable rewards including a new VQA-based one and uses a prompt-aware adaptive policy with Langevin dynamics to achieve state-of-the-art image editing and compositional generation.
Personalizing Text-to-Image Generation to Individual Taste cs.CV · 2026-04-08 · unverdicted · none · ref 56 · internal anchor
PAMELA provides a multi-user rating dataset and personalized reward model that predicts individual image preferences more accurately than prior population-level aesthetic models.
Hierarchical SVG Tokenization: Learning Compact Visual Programs for Scalable Vector Graphics Modeling cs.LG · 2026-04-06 · unverdicted · none · ref 35 · internal anchor
HiVG introduces hierarchical SVG tokenization with atomic and segment tokens plus HMN initialization to enable more efficient and stable autoregressive generation of vector graphics programs.
1.x-Distill: Breaking the Diversity, Quality, and Efficiency Barrier in Distribution Matching Distillation cs.CV · 2026-04-05 · conditional · none · ref 47 · internal anchor
1.x-Distill achieves better quality and diversity than prior few-step distillation methods at 1.67 and 1.74 effective NFEs on SD3 models with up to 33x speedup.
SHARP: Spectrum-aware Highly-dynamic Adaptation for Resolution Promotion in Remote Sensing Synthesis cs.CV · 2026-03-23 · conditional · none · ref 34 · internal anchor
SHARP applies a spectrum-aware dynamic RoPE scaling schedule that promotes resolution more strongly in early denoising stages and relaxes it later, outperforming static baselines on quality metrics for remote sensing images.
Improving Text-to-Image Generation with Intrinsic Self-Confidence Rewards cs.CV · 2026-03-01 · unverdicted · none · ref 75 · internal anchor
SOLACE improves text-to-image generation by using intrinsic self-confidence rewards from noise reconstruction accuracy during reinforcement learning post-training without external supervision.
Stroke of Surprise: Progressive Semantic Illusions in Vector Sketching cs.CV · 2026-02-12 · unverdicted · none · ref 122 · internal anchor
Stroke of Surprise is a framework that generates vector sketches undergoing semantic transformation from one concept to another by adding strokes, using dual-branch SDS and overlay loss for optimization.