pith. machine review for the scientific record.

arxiv: 2603.05947 · v3 · submitted 2026-03-06 · 💻 cs.CV

Recognition: 2 theorem links · Lean Theorem

LucidNFT: LR-Anchored Multi-Reward Preference Optimization for Flow-Based Real-World Super-Resolution

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 15:40 UTC · model grok-4.3

classification 💻 cs.CV
keywords real-world image super-resolution · generative ISR · reinforcement learning · preference optimization · flow-matching models · hallucination control · LR consistency · multi-reward RL

The pith

LucidNFT anchors RL updates to LR evidence via a new consistency evaluator, raising perceptual quality in flow-based real-world super-resolution without increasing hallucinations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that generative real-world super-resolution models often produce sharp outputs that stray from the low-resolution input through semantic or structural hallucinations. To counter this, LucidNFT trains LucidConsistency—an evaluator that stays invariant to degradations yet flags localized hallucinations—on content-consistent degradation pools and original-inpainted hard negatives. It then folds this signal into a multi-reward RL loop that uses decoupled normalization to keep objective contrasts intact inside each LR-conditioned rollout group. The resulting fine-tuning on the LucidLR collection of real degraded images yields restorations that look better to humans while staying more faithful to the original LR evidence.
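Read mechanically, that evaluator recipe is a contrastive one: degraded views of the same content act as positives, and original-inpainted variants act as hard negatives. The sketch below is a minimal editorial rendering of such an objective and of the LR-referenced score it would produce at inference; the InfoNCE form, the cosine score, and all names are assumptions, not the paper's published implementation (whose Eq. (10) is not reproduced here).

```python
import torch
import torch.nn.functional as F

def consistency_infonce(anchor_emb, positive_embs, negative_embs, tau=0.07):
    """Contrastive loss for a degradation-robust consistency evaluator.

    anchor_emb:    (D,)   embedding of the original content
    positive_embs: (P, D) the same content under a pool of degradations
    negative_embs: (N, D) original-inpainted variants, i.e. localized
                          hallucinations used as hard negatives
    """
    a = F.normalize(anchor_emb, dim=-1)
    pos = F.normalize(positive_embs, dim=-1)
    neg = F.normalize(negative_embs, dim=-1)

    # Similarities to the anchor, temperature-scaled.
    logits = torch.cat([pos @ a, neg @ a]) / tau

    # Pull degraded views toward the anchor (degradation invariance),
    # push inpainted variants away (hallucination sensitivity).
    log_probs = F.log_softmax(logits, dim=0)
    return -log_probs[: pos.shape[0]].mean()

def consistency_score(lr_emb, sr_emb):
    """Inference-time LR-referenced score: cosine similarity between the
    LR-input and SR-output embeddings (a stand-in for the paper's Eq. (10))."""
    return F.cosine_similarity(lr_emb, sr_emb, dim=-1)
```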

Core claim

LucidNFT introduces LucidConsistency, a degradation-invariant and hallucination-sensitive LR-referenced evaluator trained with content-consistent degradation pools and original-inpainted hard negatives; a decoupled reward normalization strategy that preserves objective-wise contrasts within each LR-conditioned rollout group before fusion; and LucidLR, a large-scale collection of real-world degraded images for robust RL fine-tuning. This multi-reward preference optimization improves perceptual quality on strong flow-based Real-ISR baselines while generally maintaining LR-referenced consistency across diverse real-world scenarios.

What carries the argument

LucidConsistency, an LR-referenced evaluator trained to be invariant to degradation yet sensitive to hallucinations, used inside a multi-reward RL loop with decoupled normalization to guide flow-matching super-resolution updates.
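To make the load-bearing optimization step concrete, here is a minimal sketch of decoupled normalization inside one LR-conditioned rollout group, contrasted with the scalarize-first baseline the abstract criticizes. The per-objective z-scoring, the fusion weights, and the function names are editorial assumptions rather than the paper's exact equations.

```python
import numpy as np

def decoupled_advantages(rewards, weights, eps=1e-8):
    """Decoupled normalization: z-score each objective across the M rollouts
    of one LR-conditioned group, then fuse.

    rewards: (M, K) array, M rollouts for one LR input, K reward objectives
             (e.g. a UniPercept-style IQA score and an LR-consistency score).
    weights: (K,) fusion weights.
    """
    mu = rewards.mean(axis=0, keepdims=True)           # (1, K)
    sigma = rewards.std(axis=0, keepdims=True) + eps   # (1, K)
    per_objective = (rewards - mu) / sigma             # contrasts survive per objective
    return per_objective @ np.asarray(weights)         # (M,) fused advantages

def scalarized_advantages(rewards, weights, eps=1e-8):
    """Baseline criticized in the abstract: fuse raw rewards first, then
    normalize the scalar, which lets the largest-scale objective dominate
    the group statistics and compresses the other objectives' contrasts."""
    scalar = rewards @ np.asarray(weights)             # (M,)
    return (scalar - scalar.mean()) / (scalar.std() + eps)
```

With, say, an IQA reward on a 0-100 scale and a consistency reward on a 0-1 scale, the scalar-first variant lets the IQA term dominate the group statistics, while the decoupled variant preserves the ordering each objective induces before fusion.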

If this is right

  • Perceptual gains appear without loss of LR fidelity on diverse real degradations.
  • Decoupled normalization keeps separate reward contrasts alive inside each rollout group instead of compressing them.
  • The LucidLR collection supplies broader degradation coverage for stable RL fine-tuning.
  • The same anchored preference loop can be applied to other flow-based generative restoration tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same consistency signal could be reused as a training objective for non-RL super-resolution models.
  • Extending the hard-negative construction to video frames might reduce temporal hallucinations in video super-resolution.
  • LucidLR-style collections could support preference data for related tasks such as real-world deblurring or denoising.

Load-bearing premise

LucidConsistency supplies a degradation-invariant yet hallucination-sensitive faithfulness signal that reliably guides the RL updates without introducing its own biases.

What would settle it

Human preference tests or automated LR-consistency metrics on held-out real-world images showing whether LucidNFT outputs contain more or fewer semantic or structural hallucinations than the flow-based baselines.

Figures

Figures reproduced from arXiv: 2603.05947 by Jianyu Lai, Lei Zhu, Sixiang Chen, Song Fei, Tian Ye, Zhaohu Xing.

Figure 1: Overview of LucidConsistency. Left: inference stage where embeddings of the LR input and SR output are extracted and their semantic consistency is computed via Eq. (10). Right: training stage where LR–HR pairs are used to optimize the projection head. view at source ↗
Figure 2: Advantage separability analysis on the LucidFlux backbone using the RealLQ250 dataset [2]. (a) DAGC versus rollout count M; (b) mean pairwise advantage gap |∆A| versus M using the top-1 max-∆r pair per group; (c) distribution of |∆A| at M = 12. LucidNFT consistently yields larger advantage gaps and higher separability than DiffusionNFT, indicating reduced advantage compression under decoupled normalization. view at source ↗
Figure 3: Representative examples from LucidLR. view at source ↗
Figure 4: Training dynamics of LucidNFT on LucidFlux. From left to right: training LucidConsistency score, evaluation LucidConsistency score, training UniPercept IQA score, and evaluation UniPercept IQA score. The smoothed curves exhibit a consistent upward trend, indicating stable multi-reward optimization during RL. view at source ↗
Figure 5: Visual comparison on RealLQ250 [2]. LucidNFT further improves semantic consistency and perceptual quality over the baseline LucidFlux, producing more faithful structures and richer texture details. view at source ↗
Figure 6: Annotation interface used for the LR-faithfulness study. The first page shows the standard interface displaying the LR input and three SR candidates. The second page illustrates the enlarged view used by annotators to inspect local structures. view at source ↗
Figure 7: Training dynamics of LucidNFT on the DiT4SR backbone. The curves show the evolution of the image quality reward (UniPercept IQA) and the consistency reward (LucidConsistency) during RL optimization. view at source ↗
Figure 8: Primary degradation distribution across four real-world low-quality datasets. Each image contributes to exactly one dominant degradation category. Compared with RealSR and DRealSR, which are strongly dominated by defocus blur, LucidLR presents a more balanced primary degradation distribution over multiple categories. view at source ↗
Figure 9: Occurrence frequency of degradation categories across four real-world low-quality datasets: LucidLR, RealLQ250, RealSR, and DRealSR. Occurrence frequency is computed in a multi-label manner, so percentages do not sum to 100%. LucidLR exhibits substantially broader degradation coverage and a richer long-tail distribution than existing benchmark datasets. view at source ↗
Figure 10: Additional qualitative comparisons on RealLQ250 [2]. Compared with the LucidFlux baseline, LucidNFT consistently produces more faithful structures and richer texture details while maintaining strong perceptual quality. view at source ↗
Figure 11: Optimization curves under different reward formulations. The plots show UniPercept rewards during RL training (left) and evaluation (right). LucidNFT achieves higher rewards and smoother convergence compared with IQA-only RL and scalar-aggregated multi-reward RL. view at source ↗
read the original abstract

Generative real-world image super-resolution (Real-ISR) can synthesize visually convincing details from severely degraded low-resolution (LR) inputs, yet its stochastic sampling makes a critical failure mode hard to avoid: outputs may look sharp but be unfaithful to the LR evidence, exhibiting semantic or structural hallucinations. Preference-based reinforcement learning (RL) is a natural fit because each LR input yields a rollout group of candidate restorations. However, effective alignment in Real-ISR is hindered by three coupled challenges: (i) the lack of an LR-referenced faithfulness signal that is robust to degradation yet sensitive to localized hallucinations, (ii) a rollout-group optimization bottleneck where scalarizing heterogeneous rewards before normalization compresses objective-wise contrasts and weakens DiffusionNFT-style reward-weighted updates, and (iii) limited coverage of real degradations, which restricts rollout diversity and preference signal quality. We propose LucidNFT, a multi-reward RL framework for flow-matching Real-ISR. LucidNFT introduces LucidConsistency, a degradation-invariant and hallucination-sensitive LR-referenced evaluator trained with content-consistent degradation pools and original-inpainted hard negatives; a decoupled reward normalization strategy that preserves objective-wise contrasts within each LR-conditioned rollout group before fusion; and LucidLR, a large-scale collection of real-world degraded images for robust RL fine-tuning. Extensive experiments show that LucidNFT improves perceptual quality on strong flow-based Real-ISR baselines while generally maintaining LR-referenced consistency across diverse real-world scenarios.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, a simulated author's rebuttal, a circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents LucidNFT, a multi-reward preference optimization framework for flow-based real-world super-resolution (Real-ISR). It introduces LucidConsistency as a degradation-invariant, hallucination-sensitive LR-referenced evaluator trained using content-consistent degradation pools and original-inpainted hard negatives, a decoupled reward normalization strategy to preserve objective contrasts in rollout groups, and the LucidLR dataset for robust fine-tuning. The central claim is that this approach improves perceptual quality on strong flow-based baselines while maintaining LR-referenced consistency across diverse real-world scenarios.

Significance. If the empirical results and the properties of LucidConsistency hold, this work could meaningfully advance preference-based RL for generative image restoration by providing a more robust faithfulness signal and better optimization strategy. The new dataset and evaluator may serve as useful resources for the community in addressing hallucinations in Real-ISR.

major comments (2)
  1. [LucidConsistency training and evaluation] The central claim rests on LucidConsistency supplying a degradation-invariant yet hallucination-sensitive signal. However, no cross-degradation consistency metric, ablation removing the inpainted hard negatives, or correlation with human faithfulness judgments on out-of-distribution degradations is reported. This leaves open whether the evaluator introduces systematic biases (e.g., penalizing plausible high-frequency detail) that would propagate into the multi-reward RL updates.
  2. [Experiments and results] The experiments claim improvements on strong flow-based Real-ISR baselines, yet the manuscript supplies no quantitative tables with perceptual metrics (LPIPS, FID, NIQE), consistency measures, baseline comparisons, or error analysis across degradation types. Without these, the support for the perceptual-quality and consistency claims cannot be verified.
minor comments (2)
  1. [Abstract] The abstract would be strengthened by including one or two key quantitative results (e.g., LPIPS delta or human preference win rate) rather than only qualitative statements.
  2. [Method] Clarify the precise form of the decoupled normalization (e.g., per-objective z-scoring within each LR-conditioned rollout group) and how it differs from standard DiffusionNFT reward weighting; one assumed form is sketched below.
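For concreteness, one plausible reading of the requested form, offered here as an editorial assumption rather than the manuscript's definition: for rollout i and objective k within a group of M restorations of the same LR input,

```latex
% Decoupled (assumed): normalize each objective within the group, then fuse
A_i^{(k)} = \frac{r_i^{(k)} - \mu^{(k)}}{\sigma^{(k)}}, \qquad
A_i = \sum_{k} w_k \, A_i^{(k)},
% where \mu^{(k)}, \sigma^{(k)} are the group mean and std of objective k

% Scalar-aggregated baseline: fuse raw rewards first, normalize once
\tilde{r}_i = \sum_{k} w_k \, r_i^{(k)}, \qquad
\tilde{A}_i = \frac{\tilde{r}_i - \tilde{\mu}}{\tilde{\sigma}}.
```

Under the second form, objectives with larger reward variance dominate the group statistics, which is the advantage-compression effect the decoupled variant is said to avoid.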

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights important areas for strengthening the validation of LucidConsistency and the experimental presentation. We address each major comment below and will revise the manuscript accordingly to improve rigor and verifiability.

read point-by-point responses
  1. Referee: [LucidConsistency training and evaluation] The central claim rests on LucidConsistency supplying a degradation-invariant yet hallucination-sensitive signal. However, no cross-degradation consistency metric, ablation removing the inpainted hard negatives, or correlation with human faithfulness judgments on out-of-distribution degradations is reported. This leaves open whether the evaluator introduces systematic biases (e.g., penalizing plausible high-frequency detail) that would propagate into the multi-reward RL updates.

    Authors: We agree that these additional analyses would strengthen the central claim regarding LucidConsistency. In the revised manuscript, we will add a cross-degradation consistency metric evaluated across multiple real-world degradation types to demonstrate invariance. We will also include an ablation study removing the original-inpainted hard negatives to quantify their contribution to hallucination sensitivity. For human correlation, we will report results from a targeted user study correlating LucidConsistency scores with human faithfulness judgments on out-of-distribution degradations. To address potential biases, we will add discussion and examples showing that the evaluator preserves plausible high-frequency details rather than penalizing them, based on the content-consistent degradation pools used in training. revision: yes

  2. Referee: [Experiments and results] The experiments claim improvements on strong flow-based Real-ISR baselines, yet the manuscript supplies no quantitative tables with perceptual metrics (LPIPS, FID, NIQE), consistency measures, baseline comparisons, or error analysis across degradation types. Without these, the support for the perceptual-quality and consistency claims cannot be verified.

    Authors: We acknowledge the need for clearer quantitative support and will add comprehensive tables in the revised manuscript. These will include perceptual metrics such as LPIPS, FID, and NIQE; our LR-referenced consistency measures; direct numerical comparisons against the strong flow-based Real-ISR baselines; and error analysis stratified across degradation types (e.g., blur, noise, and compression). This will make the improvements in perceptual quality and maintained consistency fully verifiable. revision: yes

Circularity Check

0 steps flagged

No circularity: framework builds independent components on external RL/flow-matching literature

full rationale

The paper introduces LucidNFT as an empirical multi-reward RL framework for flow-based Real-ISR, with LucidConsistency trained on content-consistent degradation pools and inpainted negatives, decoupled normalization, and the LucidLR dataset. No equations or derivations appear that reduce the claimed perceptual gains or consistency maintenance to fitted parameters renamed as predictions, to self-definitions, or to self-citation chains. The central claims rest on experimental validation against baselines rather than on any load-bearing step that equates outputs to inputs by construction. Self-citations, where present, are not invoked to justify uniqueness theorems or ansatzes, and the argument is grounded in external benchmarks rather than a closed self-referential chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

The central claim rests on the unverified effectiveness of the newly introduced LucidConsistency evaluator and on the assumption that rollout-group rewards can be meaningfully decoupled; the abstract describes no free parameters or standard axioms, and the only invented entities are the paper's own evaluator and dataset.

invented entities (2)
  • LucidConsistency no independent evidence
    purpose: Degradation-invariant LR-referenced faithfulness evaluator
    Introduced as a trained model using content-consistent degradation pools and hard negatives; no independent evidence outside the paper is stated.
  • LucidLR no independent evidence
    purpose: Large-scale collection of real-world degraded images for RL fine-tuning
    New dataset introduced to improve coverage of real degradations; access and construction details absent from abstract.

pith-pipeline@v0.9.0 · 5582 in / 1306 out tokens · 38565 ms · 2026-05-15T15:40:50.335690+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · 13 internal anchors

  1. [1] Agustsson, E., Timofte, R.: NTIRE 2017 challenge on single image super-resolution: Dataset and study. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 126–135 (2017)
  2. [2] Ai, Y., Zhou, X., Huang, H., Han, X., Chen, Z., You, Q., Yang, H.: DreamClear: High-capacity real-world image restoration with privacy-safe dataset curation. Advances in Neural Information Processing Systems 37, 55443–55469 (2024)
  3. [3] Black, K., Janner, M., Du, Y., Kostrikov, I., Levine, S.: Training diffusion models with reinforcement learning. arXiv preprint arXiv:2305.13301 (2023)
  4. [4] Cai, J., Zeng, H., Yong, H., Cao, Z., Zhang, L.: Toward real-world single image super-resolution: A new benchmark and a new model. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3086–3095 (2019)
  5. [5] Cao, S., Li, J., Li, X., Pu, Y., Zhu, K., Gao, Y., Luo, S., Xin, Y., Qin, Q., Zhou, Y., Chen, X., Zhang, W., Fu, B., Qiao, Y., Liu, Y.: UniPercept: Towards unified perceptual-level image understanding across aesthetics, quality, structure, and texture (2025)
  6. [6] Chen, J., Yu, J., Ge, C., Yao, L., Xie, E., Wu, Y., Wang, Z., Kwok, J., Luo, P., Lu, H., Li, Z.: PixArt-α: Fast training of diffusion transformer for photorealistic text-to-image synthesis (2023)
  7. [7] Chen, X., Wang, X., Zhou, J., Qiao, Y., Dong, C.: Activating more pixels in image super-resolution transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 22367–22377 (2023)
  8. [8] Cohen, R., Kligvasser, I., Rivlin, E., Freedman, D.: Looks too good to be true: An information-theoretic analysis of hallucinations in generative restoration models. Advances in Neural Information Processing Systems 37, 22596–22623 (2024)
  9. [9] Dong, C., Loy, C.C., He, K., Tang, X.: Learning a deep convolutional network for image super-resolution. In: Proceedings of the European Conference on Computer Vision, pp. 184–199. Springer (2014)
  10. [10] Duan, Z.P., Zhang, J., Jin, X., Zhang, Z., Xiong, Z., Zou, D., Ren, J., Guo, C.L., Li, C.: DiT4SR: Taming diffusion transformer for real-world image super-resolution. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (2025)
  11. [11] Esser, P., Kulal, S., Blattmann, A., Entezari, R., Müller, J., Saini, H., Levi, Y., Lorenz, D., Sauer, A., Boesel, F., et al.: Scaling rectified flow transformers for high-resolution image synthesis. In: Forty-first International Conference on Machine Learning (2024)
  12. [12] Falconsai: NSFW image detection model (2026)
  13. [13] Fei, S., Ye, T., Wang, L., Zhu, L.: LucidFlux: Caption-free universal image restoration via a large-scale diffusion transformer. arXiv preprint arXiv:2509.22414 (2025)
  14. [14] Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W., et al.: LoRA: Low-rank adaptation of large language models. ICLR 1(2), 3 (2022)
  15. [15] Ke, J., Wang, Q., Wang, Y., Milanfar, P., Yang, F.: MUSIQ: Multi-scale image quality transformer. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5148–5157 (2021)
  16. [16] Labs, B.F.: Flux (2024)
  17. [17] Li, M., Zhang, Y., Long, D., Chen, K., Song, S., Bai, S., Yang, Z., Xie, P., Yang, A., Liu, D., et al.: Qwen3-VL-Embedding and Qwen3-VL-Reranker: A unified framework for state-of-the-art multimodal retrieval and ranking. arXiv preprint arXiv:2601.04720 (2026)
  18. [18] Li, Y., Zhang, K., Liang, J., Cao, J., Liu, C., Gong, R., Zhang, Y., Tang, H., Liu, Y., Demandolx, D., Ranjan, R., Timofte, R., Van Gool, L.: LSDIR: A large scale dataset for image restoration. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1775–1787 (2023)
  19. [19] Lin, X., He, J., Chen, Z., Lyu, Z., Dai, B., Yu, F., Qiao, Y., Ouyang, W., Dong, C.: DiffBIR: Toward blind image restoration with generative diffusion prior. In: European Conference on Computer Vision, pp. 430–448. Springer (2024)
  20. [20] Lipman, Y., Chen, R.T., Ben-Hamu, H., Nickel, M., Le, M.: Flow matching for generative modeling. arXiv preprint arXiv:2210.02747 (2022)
  21. [21] Liu, J., Liu, G., Liang, J., Li, Y., Liu, J., Wang, X., Wan, P., Zhang, D., Ouyang, W.: Flow-GRPO: Training flow matching models via online RL. arXiv preprint arXiv:2505.05470 (2025)
  22. [22] Liu, X., Gong, C., Liu, Q.: Flow straight and fast: Learning to generate and transfer data with rectified flow. arXiv preprint arXiv:2209.03003 (2022)
  23. [23] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017)
  24. [24] Oord, A.v.d., Li, Y., Vinyals, O.: Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748 (2018)
  25. [25] Podell, D., English, Z., Lacey, K., Blattmann, A., Dockhorn, T., Müller, J., Penna, J., Rombach, R.: SDXL: Improving latent diffusion models for high-resolution image synthesis. arXiv preprint arXiv:2307.01952 (2023)
  26. [26] Rafailov, R., Sharma, A., Mitchell, E., Manning, C.D., Ermon, S., Finn, C.: Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems 36, 53728–53741 (2023)
  27. [27] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10684–10695 (2022)
  28. [28] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)
  29. [29] Sutton, R.S., Barto, A.G., et al.: Reinforcement learning: An introduction, vol. 1. MIT Press, Cambridge (1998)
  30. [30] Talebi, H., Milanfar, P.: NIMA: Neural image assessment. IEEE Transactions on Image Processing 27(8), 3998–4011 (2018)
  31. [31] Wallace, B., Dang, M., Rafailov, R., Zhou, L., Lou, A., Purushwalkam, S., Ermon, S., Xiong, C., Joty, S., Naik, N.: Diffusion model alignment using direct preference optimization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8228–8238 (2024)
  32. [32] Wang, J., Chan, K.C., Loy, C.C.: Exploring CLIP for assessing the look and feel of images. In: AAAI (2023)
  33. [33] Wang, J., Yue, Z., Zhou, S., Chan, K.C., Loy, C.C.: Exploiting diffusion prior for real-world image super-resolution. International Journal of Computer Vision 132(12), 5929–5949 (2024)
  34. [34] Wang, X., Xie, L., Dong, C., Shan, Y.: Real-ESRGAN: Training real-world blind super-resolution with pure synthetic data. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1905–1914 (2021)
  35. [35] Wang, Y., Yang, W., Chen, X., Wang, Y., Guo, L., Chau, L.P., Liu, Z., Qiao, Y., Kot, A.C., Wen, B.: SinSR: Diffusion-based image super-resolution in a single step. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 25796–25805 (2024)
  36. [36] Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing 13(4), 600–612 (2004)
  37. [37] Wei, P., Xie, Z., Lu, H., Zhan, Z., Ye, Q., Zuo, W., Lin, L.: Component divide-and-conquer for real-world image super-resolution. In: Proceedings of the European Conference on Computer Vision, pp. 101–117. Springer (2020)
  38. [38] Wu, H., Zhang, Z., Zhang, W., Chen, C., Li, C., Liao, L., Wang, A., Zhang, E., Sun, W., Yan, Q., Min, X., Zhai, G., Lin, W.: Q-Align: Teaching LMMs for visual scoring via discrete text-defined levels. arXiv preprint arXiv:2312.17090 (2023)
  39. [39] Wu, R., Yang, T., Sun, L., Zhang, Z., Li, S., Zhang, L.: SeeSR: Towards semantics-aware real-world image super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 25456–25467 (2024)
  40. [40] Wu, T., Zou, J., Liang, J., Zhang, L., Ma, K.: VisualQuality-R1: Reasoning-induced image quality assessment via reinforcement learning to rank. arXiv preprint arXiv:2505.14460 (2025)
  41. [41] Xue, Z., Wu, J., Gao, Y., Kong, F., Zhu, L., Chen, M., Liu, Z., Liu, W., Guo, Q., Huang, W., et al.: DanceGRPO: Unleashing GRPO on visual generation. arXiv preprint arXiv:2505.07818 (2025)
  42. [42] Yang, S., Wu, T., Shi, S., Lao, S., Gong, Y., Cao, M., Wang, J., Yang, Y.: MANIQA: Multi-dimension attention network for no-reference image quality assessment. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1191–1200 (2022)
  43. [43] Yu, F., Gu, J., Li, Z., Hu, J., Kong, X., Wang, X., He, J., Qiao, Y., Dong, C.: Scaling up to excellence: Practicing model scaling for photo-realistic image restoration in the wild (2024)
  44. [44] Yue, Z., Wang, J., Loy, C.C.: ResShift: Efficient diffusion model for image super-resolution by residual shifting. Advances in Neural Information Processing Systems 36, 13294–13307 (2023)
  45. [45] Zhang, K., Liang, J., Van Gool, L., Timofte, R.: Designing a practical degradation model for deep blind image super-resolution. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4791–4800 (2021)
  46. [46] Zhang, L., Zhang, L., Bovik, A.C.: A feature-enriched completely blind image quality evaluator. IEEE Transactions on Image Processing 24(8), 2579–2591 (2015)
  47. [47] Zhang, Y., Li, K., Li, K., Wang, L., Zhong, B., Fu, Y.: Image super-resolution using very deep residual channel attention networks. In: Proceedings of the European Conference on Computer Vision, pp. 286–301 (2018)
  48. [48] Zheng, K., Chen, H., Ye, H., Wang, H., Zhang, Q., Jiang, K., Su, H., Ermon, S., Zhu, J., Liu, M.Y.: DiffusionNFT: Online diffusion reinforcement with forward process. arXiv preprint arXiv:2509.16117 (2025)