pith. machine review for the scientific record.

arxiv: 2603.05947 · v3 · submitted 2026-03-06 · 💻 cs.CV

Recognition: 2 theorem links · Lean Theorem

LucidNFT: LR-Anchored Multi-Reward Preference Optimization for Flow-Based Real-World Super-Resolution

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 15:40 UTC · model grok-4.3

classification 💻 cs.CV
keywords real-world image super-resolution · generative ISR · reinforcement learning · preference optimization · flow-matching models · hallucination control · LR consistency · multi-reward RL

The pith

LucidNFT anchors RL updates to LR evidence via a new consistency evaluator, raising perceptual quality in flow-based real-world super-resolution without increasing hallucinations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that generative real-world super-resolution models often produce sharp outputs that stray from the low-resolution input through semantic or structural hallucinations. To counter this, LucidNFT trains LucidConsistency—an evaluator that stays invariant to degradations yet flags localized hallucinations—on content-consistent degradation pools and original-inpainted hard negatives. It then folds this signal into a multi-reward RL loop that uses decoupled normalization to keep objective contrasts intact inside each LR-conditioned rollout group. The resulting fine-tuning on the LucidLR collection of real degraded images yields restorations that look better to humans while staying more faithful to the original LR evidence.
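Read mechanically, that evaluator recipe is a contrastive one: degraded views of the same content act as positives, and original-inpainted variants act as hard negatives. The sketch below is a minimal editorial rendering of such an objective and of the LR-referenced score it would produce at inference; the InfoNCE form, the cosine score, and all names are assumptions, not the paper's published implementation (whose Eq. (10) is not reproduced here).

```python
import torch
import torch.nn.functional as F

def consistency_infonce(anchor_emb, positive_embs, negative_embs, tau=0.07):
    """Contrastive loss for a degradation-robust consistency evaluator.

    anchor_emb:    (D,)   embedding of the original content
    positive_embs: (P, D) the same content under a pool of degradations
    negative_embs: (N, D) original-inpainted variants, i.e. localized
                          hallucinations used as hard negatives
    """
    a = F.normalize(anchor_emb, dim=-1)
    pos = F.normalize(positive_embs, dim=-1)
    neg = F.normalize(negative_embs, dim=-1)

    # Similarities to the anchor, temperature-scaled.
    logits = torch.cat([pos @ a, neg @ a]) / tau

    # Pull degraded views toward the anchor (degradation invariance),
    # push inpainted variants away (hallucination sensitivity).
    log_probs = F.log_softmax(logits, dim=0)
    return -log_probs[: pos.shape[0]].mean()

def consistency_score(lr_emb, sr_emb):
    """Inference-time LR-referenced score: cosine similarity between the
    LR-input and SR-output embeddings (a stand-in for the paper's Eq. (10))."""
    return F.cosine_similarity(lr_emb, sr_emb, dim=-1)
```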

Core claim

LucidNFT introduces LucidConsistency, a degradation-invariant and hallucination-sensitive LR-referenced evaluator trained with content-consistent degradation pools and original-inpainted hard negatives; a decoupled reward normalization strategy that preserves objective-wise contrasts within each LR-conditioned rollout group before fusion; and LucidLR, a large-scale collection of real-world degraded images for robust RL fine-tuning. This multi-reward preference optimization improves perceptual quality on strong flow-based Real-ISR baselines while generally maintaining LR-referenced consistency across diverse real-world scenarios.

What carries the argument

LucidConsistency, an LR-referenced evaluator trained to be invariant to degradation yet sensitive to hallucinations, used inside a multi-reward RL loop with decoupled normalization to guide flow-matching super-resolution updates.
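To make the load-bearing optimization step concrete, here is a minimal sketch of decoupled normalization inside one LR-conditioned rollout group, contrasted with the scalarize-first baseline the abstract criticizes. The per-objective z-scoring, the fusion weights, and the function names are editorial assumptions rather than the paper's exact equations.

```python
import numpy as np

def decoupled_advantages(rewards, weights, eps=1e-8):
    """Decoupled normalization: z-score each objective across the M rollouts
    of one LR-conditioned group, then fuse.

    rewards: (M, K) array, M rollouts for one LR input, K reward objectives
             (e.g. a UniPercept-style IQA score and an LR-consistency score).
    weights: (K,) fusion weights.
    """
    mu = rewards.mean(axis=0, keepdims=True)           # (1, K)
    sigma = rewards.std(axis=0, keepdims=True) + eps   # (1, K)
    per_objective = (rewards - mu) / sigma             # contrasts survive per objective
    return per_objective @ np.asarray(weights)         # (M,) fused advantages

def scalarized_advantages(rewards, weights, eps=1e-8):
    """Baseline criticized in the abstract: fuse raw rewards first, then
    normalize the scalar, which lets the largest-scale objective dominate
    the group statistics and compresses the other objectives' contrasts."""
    scalar = rewards @ np.asarray(weights)             # (M,)
    return (scalar - scalar.mean()) / (scalar.std() + eps)
```

With, say, an IQA reward on a 0-100 scale and a consistency reward on a 0-1 scale, the scalar-first variant lets the IQA term dominate the group statistics, while the decoupled variant preserves the ordering each objective induces before fusion.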

If this is right

  • Perceptual gains appear without loss of LR fidelity on diverse real degradations.
  • Decoupled normalization keeps separate reward contrasts alive inside each rollout group instead of compressing them.
  • The LucidLR collection supplies broader degradation coverage for stable RL fine-tuning.
  • The same anchored preference loop can be applied to other flow-based generative restoration tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same consistency signal could be reused as a training objective for non-RL super-resolution models.
  • Extending the hard-negative construction to video frames might reduce temporal hallucinations in video super-resolution.
  • LucidLR-style collections could support preference data for related tasks such as real-world deblurring or denoising.

Load-bearing premise

LucidConsistency supplies a degradation-invariant yet hallucination-sensitive faithfulness signal that reliably guides the RL updates without introducing its own biases.

What would settle it

Human preference tests or automated LR-consistency metrics on held-out real-world images showing whether LucidNFT outputs contain more or fewer semantic or structural hallucinations than the flow-based baselines.

Figures

Figures reproduced from arXiv: 2603.05947 by Jianyu Lai, Lei Zhu, Sixiang Chen, Song Fei, Tian Ye, Zhaohu Xing.

Figure 1: Overview of LucidConsistency. Left: inference stage where embeddings of the LR input and SR output are extracted and their semantic consistency is computed via Eq. (10). Right: training stage where LR–HR pairs are used to optimize the projection head. view at source ↗
Figure 2: Advantage separability analysis on the LucidFlux backbone using the RealLQ250 dataset [2]. (a) DAGC versus rollout count M; (b) mean pairwise advantage gap |∆A| versus M using the top-1 max-∆r pair per group; (c) distribution of |∆A| at M = 12. LucidNFT consistently yields larger advantage gaps and higher separability than DiffusionNFT, indicating reduced advantage compression under decoupled normalization. view at source ↗
Figure 3: Representative examples from LucidLR. view at source ↗
Figure 4: Training dynamics of LucidNFT on LucidFlux. From left to right: training LucidConsistency score, evaluation LucidConsistency score, training UniPercept IQA score, and evaluation UniPercept IQA score. The smoothed curves exhibit a consistent upward trend, indicating stable multi-reward optimization during RL. view at source ↗
Figure 5: Visual comparison on RealLQ250 [2]. LucidNFT further improves semantic consistency and perceptual quality over the baseline LucidFlux, producing more faithful structures and richer texture details. view at source ↗
Figure 6: Annotation interface used for the LR-faithfulness study. The first page shows the standard interface displaying the LR input and three SR candidates. The second page illustrates the enlarged view used by annotators to inspect local structures. view at source ↗
Figure 7: Training dynamics of LucidNFT on the DiT4SR backbone. The curves show the evolution of the image quality reward (UniPercept IQA) and the consistency reward (LucidConsistency) during RL optimization. view at source ↗
Figure 8: Primary degradation distribution across four real-world low-quality datasets. Each image contributes to exactly one dominant degradation category. Compared with RealSR and DRealSR, which are strongly dominated by defocus blur, LucidLR presents a more balanced primary degradation distribution over multiple categories. view at source ↗
Figure 9: Occurrence frequency of degradation categories across four real-world low-quality datasets: LucidLR, RealLQ250, RealSR, and DRealSR. Occurrence frequency is computed in a multi-label manner, so percentages do not sum to 100%. LucidLR exhibits substantially broader degradation coverage and a richer long-tail distribution than existing benchmark datasets. view at source ↗
Figure 10: Additional qualitative comparisons on RealLQ250 [2]. Compared with the LucidFlux baseline, LucidNFT consistently produces more faithful structures and richer texture details while maintaining strong perceptual quality. view at source ↗
Figure 11: Optimization curves under different reward formulations. The plots show UniPercept rewards during RL training (left) and evaluation (right). LucidNFT achieves higher rewards and smoother convergence compared with IQA-only RL and scalar-aggregated multi-reward RL. view at source ↗
read the original abstract

Generative real-world image super-resolution (Real-ISR) can synthesize visually convincing details from severely degraded low-resolution (LR) inputs, yet its stochastic sampling makes a critical failure mode hard to avoid: outputs may look sharp but be unfaithful to the LR evidence, exhibiting semantic or structural hallucinations. Preference-based reinforcement learning (RL) is a natural fit because each LR input yields a rollout group of candidate restorations. However, effective alignment in Real-ISR is hindered by three coupled challenges: (i) the lack of an LR-referenced faithfulness signal that is robust to degradation yet sensitive to localized hallucinations, (ii) a rollout-group optimization bottleneck where scalarizing heterogeneous rewards before normalization compresses objective-wise contrasts and weakens DiffusionNFT-style reward-weighted updates, and (iii) limited coverage of real degradations, which restricts rollout diversity and preference signal quality. We propose LucidNFT, a multi-reward RL framework for flow-matching Real-ISR. LucidNFT introduces LucidConsistency, a degradation-invariant and hallucination-sensitive LR-referenced evaluator trained with content-consistent degradation pools and original-inpainted hard negatives; a decoupled reward normalization strategy that preserves objective-wise contrasts within each LR-conditioned rollout group before fusion; and LucidLR, a large-scale collection of real-world degraded images for robust RL fine-tuning. Extensive experiments show that LucidNFT improves perceptual quality on strong flow-based Real-ISR baselines while generally maintaining LR-referenced consistency across diverse real-world scenarios.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, a simulated author's rebuttal, a circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents LucidNFT, a multi-reward preference optimization framework for flow-based real-world super-resolution (Real-ISR). It introduces LucidConsistency as a degradation-invariant, hallucination-sensitive LR-referenced evaluator trained using content-consistent degradation pools and original-inpainted hard negatives, a decoupled reward normalization strategy to preserve objective contrasts in rollout groups, and the LucidLR dataset for robust fine-tuning. The central claim is that this approach improves perceptual quality on strong flow-based baselines while maintaining LR-referenced consistency across diverse real-world scenarios.

Significance. If the empirical results and the properties of LucidConsistency hold, this work could meaningfully advance preference-based RL for generative image restoration by providing a more robust faithfulness signal and better optimization strategy. The new dataset and evaluator may serve as useful resources for the community in addressing hallucinations in Real-ISR.

major comments (2)
  1. [LucidConsistency training and evaluation] The central claim rests on LucidConsistency supplying a degradation-invariant yet hallucination-sensitive signal. However, no cross-degradation consistency metric, ablation removing the inpainted hard negatives, or correlation with human faithfulness judgments on out-of-distribution degradations is reported. This leaves open whether the evaluator introduces systematic biases (e.g., penalizing plausible high-frequency detail) that would propagate into the multi-reward RL updates.
  2. [Experiments and results] The experiments claim improvements on strong flow-based Real-ISR baselines, yet the manuscript supplies no quantitative tables with perceptual metrics (LPIPS, FID, NIQE), consistency measures, baseline comparisons, or error analysis across degradation types. Without these, the support for the perceptual-quality and consistency claims cannot be verified.
minor comments (2)
  1. [Abstract] The abstract would be strengthened by including one or two key quantitative results (e.g., LPIPS delta or human preference win rate) rather than only qualitative statements.
  2. [Method] Clarify the precise form of the decoupled normalization (e.g., per-objective z-scoring within each LR-conditioned rollout group) and how it differs from standard DiffusionNFT reward weighting; one assumed form is sketched below.
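For concreteness, one plausible reading of the requested form, offered here as an editorial assumption rather than the manuscript's definition: for rollout i and objective k within a group of M restorations of the same LR input,

```latex
% Decoupled (assumed): normalize each objective within the group, then fuse
A_i^{(k)} = \frac{r_i^{(k)} - \mu^{(k)}}{\sigma^{(k)}}, \qquad
A_i = \sum_{k} w_k \, A_i^{(k)},
% where \mu^{(k)}, \sigma^{(k)} are the group mean and std of objective k

% Scalar-aggregated baseline: fuse raw rewards first, normalize once
\tilde{r}_i = \sum_{k} w_k \, r_i^{(k)}, \qquad
\tilde{A}_i = \frac{\tilde{r}_i - \tilde{\mu}}{\tilde{\sigma}}.
```

Under the second form, objectives with larger reward variance dominate the group statistics, which is the advantage-compression effect the decoupled variant is said to avoid.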

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights important areas for strengthening the validation of LucidConsistency and the experimental presentation. We address each major comment below and will revise the manuscript accordingly to improve rigor and verifiability.

read point-by-point responses
  1. Referee: [LucidConsistency training and evaluation] The central claim rests on LucidConsistency supplying a degradation-invariant yet hallucination-sensitive signal. However, no cross-degradation consistency metric, ablation removing the inpainted hard negatives, or correlation with human faithfulness judgments on out-of-distribution degradations is reported. This leaves open whether the evaluator introduces systematic biases (e.g., penalizing plausible high-frequency detail) that would propagate into the multi-reward RL updates.

    Authors: We agree that these additional analyses would strengthen the central claim regarding LucidConsistency. In the revised manuscript, we will add a cross-degradation consistency metric evaluated across multiple real-world degradation types to demonstrate invariance. We will also include an ablation study removing the original-inpainted hard negatives to quantify their contribution to hallucination sensitivity. For human correlation, we will report results from a targeted user study correlating LucidConsistency scores with human faithfulness judgments on out-of-distribution degradations. To address potential biases, we will add discussion and examples showing that the evaluator preserves plausible high-frequency details rather than penalizing them, based on the content-consistent degradation pools used in training. revision: yes

  2. Referee: [Experiments and results] The experiments claim improvements on strong flow-based Real-ISR baselines, yet the manuscript supplies no quantitative tables with perceptual metrics (LPIPS, FID, NIQE), consistency measures, baseline comparisons, or error analysis across degradation types. Without these, the support for the perceptual-quality and consistency claims cannot be verified.

    Authors: We acknowledge the need for clearer quantitative support and will add comprehensive tables in the revised manuscript. These will include perceptual metrics such as LPIPS, FID, and NIQE; our LR-referenced consistency measures; direct numerical comparisons against the strong flow-based Real-ISR baselines; and error analysis stratified across degradation types (e.g., blur, noise, and compression). This will make the improvements in perceptual quality and maintained consistency fully verifiable. revision: yes

Circularity Check

0 steps flagged

No circularity: framework builds independent components on external RL/flow-matching literature

full rationale

The paper introduces LucidNFT as an empirical multi-reward RL framework for flow-based Real-ISR, with LucidConsistency trained on content-consistent degradation pools and inpainted negatives, decoupled normalization, and the LucidLR dataset. No equations or derivations appear that reduce the claimed perceptual gains or consistency maintenance to fitted parameters renamed as predictions, to self-definitions, or to self-citation chains. The central claims rest on experimental validation against baselines rather than on any load-bearing step that equates outputs to inputs by construction. Self-citations, where present, are not invoked to justify uniqueness theorems or ansatzes, and the argument is grounded in external benchmarks rather than a closed self-referential chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

The central claim rests on the unverified effectiveness of the newly introduced LucidConsistency evaluator and on the assumption that rollout-group rewards can be meaningfully decoupled; the abstract describes no free parameters or standard axioms, and the only invented entities are the paper's own evaluator and dataset.

invented entities (2)
  • LucidConsistency no independent evidence
    purpose: Degradation-invariant LR-referenced faithfulness evaluator
    Introduced as a trained model using content-consistent degradation pools and hard negatives; no independent evidence outside the paper is stated.
  • LucidLR no independent evidence
    purpose: Large-scale collection of real-world degraded images for RL fine-tuning
    New dataset introduced to improve coverage of real degradations; access and construction details absent from abstract.

pith-pipeline@v0.9.0 · 5582 in / 1306 out tokens · 38565 ms · 2026-05-15T15:40:50.335690+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · 13 internal anchors

  1. [1] Agustsson, E., Timofte, R.: NTIRE 2017 challenge on single image super-resolution: Dataset and study. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 126–135 (2017)
  2. [2] Ai, Y., Zhou, X., Huang, H., Han, X., Chen, Z., You, Q., Yang, H.: DreamClear: High-capacity real-world image restoration with privacy-safe dataset curation. Advances in Neural Information Processing Systems 37, 55443–55469 (2024)
  3. [3] Black, K., Janner, M., Du, Y., Kostrikov, I., Levine, S.: Training diffusion models with reinforcement learning. arXiv preprint arXiv:2305.13301 (2023)
  4. [4] Cai, J., Zeng, H., Yong, H., Cao, Z., Zhang, L.: Toward real-world single image super-resolution: A new benchmark and a new model. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3086–3095 (2019)
  5. [5] Cao, S., Li, J., Li, X., Pu, Y., Zhu, K., Gao, Y., Luo, S., Xin, Y., Qin, Q., Zhou, Y., Chen, X., Zhang, W., Fu, B., Qiao, Y., Liu, Y.: UniPercept: Towards unified perceptual-level image understanding across aesthetics, quality, structure, and texture (2025)
  6. [6] Chen, J., Yu, J., Ge, C., Yao, L., Xie, E., Wu, Y., Wang, Z., Kwok, J., Luo, P., Lu, H., Li, Z.: PixArt-α: Fast training of diffusion transformer for photorealistic text-to-image synthesis (2023)
  7. [7] Chen, X., Wang, X., Zhou, J., Qiao, Y., Dong, C.: Activating more pixels in image super-resolution transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 22367–22377 (2023)
  8. [8] Cohen, R., Kligvasser, I., Rivlin, E., Freedman, D.: Looks too good to be true: An information-theoretic analysis of hallucinations in generative restoration models. Advances in Neural Information Processing Systems 37, 22596–22623 (2024)
  9. [9] Dong, C., Loy, C.C., He, K., Tang, X.: Learning a deep convolutional network for image super-resolution. In: Proceedings of the European Conference on Computer Vision, pp. 184–199. Springer (2014)
  10. [10] Duan, Z.P., Zhang, J., Jin, X., Zhang, Z., Xiong, Z., Zou, D., Ren, J., Guo, C.L., Li, C.: DiT4SR: Taming diffusion transformer for real-world image super-resolution. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (2025)
  11. [11] Esser, P., Kulal, S., Blattmann, A., Entezari, R., Müller, J., Saini, H., Levi, Y., Lorenz, D., Sauer, A., Boesel, F., et al.: Scaling rectified flow transformers for high-resolution image synthesis. In: Forty-first International Conference on Machine Learning (2024)
  12. [12] Falconsai: NSFW image detection model (2026)
  13. [13] Fei, S., Ye, T., Wang, L., Zhu, L.: LucidFlux: Caption-free universal image restoration via a large-scale diffusion transformer. arXiv preprint arXiv:2509.22414 (2025)
  14. [14] Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W., et al.: LoRA: Low-rank adaptation of large language models. ICLR 1(2), 3 (2022)
  15. [15] Ke, J., Wang, Q., Wang, Y., Milanfar, P., Yang, F.: MUSIQ: Multi-scale image quality transformer. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5148–5157 (2021)
  16. [16] Labs, B.F.: Flux (2024)
  17. [17] Li, M., Zhang, Y., Long, D., Chen, K., Song, S., Bai, S., Yang, Z., Xie, P., Yang, A., Liu, D., et al.: Qwen3-VL-Embedding and Qwen3-VL-Reranker: A unified framework for state-of-the-art multimodal retrieval and ranking. arXiv preprint arXiv:2601.04720 (2026)
  18. [18] Li, Y., Zhang, K., Liang, J., Cao, J., Liu, C., Gong, R., Zhang, Y., Tang, H., Liu, Y., Demandolx, D., Ranjan, R., Timofte, R., Van Gool, L.: LSDIR: A large scale dataset for image restoration. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1775–1787 (2023)
  19. [19] Lin, X., He, J., Chen, Z., Lyu, Z., Dai, B., Yu, F., Qiao, Y., Ouyang, W., Dong, C.: DiffBIR: Toward blind image restoration with generative diffusion prior. In: European Conference on Computer Vision, pp. 430–448. Springer (2024)
  20. [20] Lipman, Y., Chen, R.T., Ben-Hamu, H., Nickel, M., Le, M.: Flow matching for generative modeling. arXiv preprint arXiv:2210.02747 (2022)
  21. [21] Liu, J., Liu, G., Liang, J., Li, Y., Liu, J., Wang, X., Wan, P., Zhang, D., Ouyang, W.: Flow-GRPO: Training flow matching models via online RL. arXiv preprint arXiv:2505.05470 (2025)
  22. [22] Liu, X., Gong, C., Liu, Q.: Flow straight and fast: Learning to generate and transfer data with rectified flow. arXiv preprint arXiv:2209.03003 (2022)
  23. [23] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017)
  24. [24] Oord, A.v.d., Li, Y., Vinyals, O.: Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748 (2018)
  25. [25] Podell, D., English, Z., Lacey, K., Blattmann, A., Dockhorn, T., Müller, J., Penna, J., Rombach, R.: SDXL: Improving latent diffusion models for high-resolution image synthesis. arXiv preprint arXiv:2307.01952 (2023)
  26. [26] Rafailov, R., Sharma, A., Mitchell, E., Manning, C.D., Ermon, S., Finn, C.: Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems 36, 53728–53741 (2023)
  27. [27] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10684–10695 (2022)
  28. [28] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)
  29. [29] Sutton, R.S., Barto, A.G., et al.: Reinforcement learning: An introduction, vol. 1. MIT Press, Cambridge (1998)
  30. [30] Talebi, H., Milanfar, P.: NIMA: Neural image assessment. IEEE Transactions on Image Processing 27(8), 3998–4011 (2018)
  31. [31] Wallace, B., Dang, M., Rafailov, R., Zhou, L., Lou, A., Purushwalkam, S., Ermon, S., Xiong, C., Joty, S., Naik, N.: Diffusion model alignment using direct preference optimization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8228–8238 (2024)
  32. [32] Wang, J., Chan, K.C., Loy, C.C.: Exploring CLIP for assessing the look and feel of images. In: AAAI (2023)
  33. [33] Wang, J., Yue, Z., Zhou, S., Chan, K.C., Loy, C.C.: Exploiting diffusion prior for real-world image super-resolution. International Journal of Computer Vision 132(12), 5929–5949 (2024)
  34. [34] Wang, X., Xie, L., Dong, C., Shan, Y.: Real-ESRGAN: Training real-world blind super-resolution with pure synthetic data. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1905–1914 (2021)
  35. [35] Wang, Y., Yang, W., Chen, X., Wang, Y., Guo, L., Chau, L.P., Liu, Z., Qiao, Y., Kot, A.C., Wen, B.: SinSR: Diffusion-based image super-resolution in a single step. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 25796–25805 (2024)
  36. [36] Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing 13(4), 600–612 (2004)
  37. [37] Wei, P., Xie, Z., Lu, H., Zhan, Z., Ye, Q., Zuo, W., Lin, L.: Component divide-and-conquer for real-world image super-resolution. In: Proceedings of the European Conference on Computer Vision, pp. 101–117. Springer (2020)
  38. [38] Wu, H., Zhang, Z., Zhang, W., Chen, C., Li, C., Liao, L., Wang, A., Zhang, E., Sun, W., Yan, Q., Min, X., Zhai, G., Lin, W.: Q-Align: Teaching LMMs for visual scoring via discrete text-defined levels. arXiv preprint arXiv:2312.17090 (2023)
  39. [39] Wu, R., Yang, T., Sun, L., Zhang, Z., Li, S., Zhang, L.: SeeSR: Towards semantics-aware real-world image super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 25456–25467 (2024)
  40. [40] Wu, T., Zou, J., Liang, J., Zhang, L., Ma, K.: VisualQuality-R1: Reasoning-induced image quality assessment via reinforcement learning to rank. arXiv preprint arXiv:2505.14460 (2025)
  41. [41] Xue, Z., Wu, J., Gao, Y., Kong, F., Zhu, L., Chen, M., Liu, Z., Liu, W., Guo, Q., Huang, W., et al.: DanceGRPO: Unleashing GRPO on visual generation. arXiv preprint arXiv:2505.07818 (2025)
  42. [42] Yang, S., Wu, T., Shi, S., Lao, S., Gong, Y., Cao, M., Wang, J., Yang, Y.: MANIQA: Multi-dimension attention network for no-reference image quality assessment. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1191–1200 (2022)
  43. [43] Yu, F., Gu, J., Li, Z., Hu, J., Kong, X., Wang, X., He, J., Qiao, Y., Dong, C.: Scaling up to excellence: Practicing model scaling for photo-realistic image restoration in the wild (2024)
  44. [44] Yue, Z., Wang, J., Loy, C.C.: ResShift: Efficient diffusion model for image super-resolution by residual shifting. Advances in Neural Information Processing Systems 36, 13294–13307 (2023)
  45. [45] Zhang, K., Liang, J., Van Gool, L., Timofte, R.: Designing a practical degradation model for deep blind image super-resolution. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4791–4800 (2021)
  46. [46] Zhang, L., Zhang, L., Bovik, A.C.: A feature-enriched completely blind image quality evaluator. IEEE Transactions on Image Processing 24(8), 2579–2591 (2015)
  47. [47] Zhang, Y., Li, K., Li, K., Wang, L., Zhong, B., Fu, Y.: Image super-resolution using very deep residual channel attention networks. In: Proceedings of the European Conference on Computer Vision, pp. 286–301 (2018)
  48. [48] Zheng, K., Chen, H., Ye, H., Wang, H., Zhang, Q., Jiang, K., Su, H., Ermon, S., Zhu, J., Liu, M.Y.: DiffusionNFT: Online diffusion reinforcement with forward process. arXiv preprint arXiv:2509.16117 (2025)