MetaRanker: Human-in-the-loop Active Ranking for Metalens Image Quality

Haejun Chung; Ikbeom Jang; Yujin Park

arxiv: 2605.29212 · v2 · pith:MFQOFASLnew · submitted 2026-05-28 · 💻 cs.CV · cs.HC

MetaRanker: Human-in-the-loop Active Ranking for Metalens Image Quality

Yujin Park , Haejun Chung , Ikbeom Jang This is my paper

Pith reviewed 2026-06-30 11:12 UTC · model grok-4.3

classification 💻 cs.CV cs.HC

keywords metalensimage quality assessmentactive rankinghuman-in-the-loopsemantic interpretabilitypreference modelingvision-language models

0 comments

The pith

MetaRanker ranks metalens images by semantic interpretability using active human feedback, aligning with human judgments while cutting annotations by about 80%.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MetaRanker to evaluate metalens image quality based on how interpretable the images are to humans rather than traditional distortion metrics. It combines probabilistic models and uncertainty sampling with vision-language model priors to select which image pairs humans should compare. This approach produces rankings that match human assessments more closely than standard metrics across different metalens datasets. A sympathetic reader would care because better evaluation can guide the design of practical ultra-thin lenses for miniaturized cameras. The method shows that perception-distortion trade-offs make current proxies inadequate for metalens pipelines.

Core claim

MetaRanker formalizes metalens image quality in terms of semantic interpretability and employs a human-in-the-loop active ranking framework that uses a probabilistic preference model with uncertainty-aware query selection, guided by vision-language model priors only for sampling, to achieve rankings closely aligned with human assessments while requiring approximately 80% fewer pairwise annotations than exhaustive evaluation.

What carries the argument

MetaRanker, a human-in-the-loop active ranking framework that combines a probabilistic preference model with uncertainty-aware query selection and leverages vision-language models to guide sampling of comparisons while keeping human judgments as the primary signal.

Load-bearing premise

Human judgments of semantic interpretability remain the primary supervision signal and are not materially influenced by the VLM priors that are used only to guide sampling of comparisons.

What would settle it

A controlled study on a new metalens dataset where exhaustive human pairwise comparisons produce rankings that diverge substantially from those generated by MetaRanker, while aligning better with PSNR or similar metrics.

Figures

Figures reproduced from arXiv: 2605.29212 by Haejun Chung, Ikbeom Jang, Yujin Park.

**Figure 3.** Figure 3: Overview of MetaRanker. (A) Input: Metalens outputs exhibit diverse, metric-misaligned artifacts (e.g., chromatic [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗

**Figure 4.** Figure 4: The structured prompt used to extract semantic [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗

**Figure 5.** Figure 5: Dataset-level optical severity. CA Shift (px) mea [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗

**Figure 6.** Figure 6: Controlled simulation against exhaustive PSNR [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗

read the original abstract

Image quality in modern imaging systems emerges from the coupled effects of the sensor, optics, and computational reconstruction. Ultra-thin metalenses offer a path toward substantial miniaturization of optical modules, but practical designs often exhibit pronounced chromatic and field-dependent aberrations that necessitate computational reconstruction. In current metalens pipelines, reconstruction models are commonly trained and selected using distortion-based fidelity objectives, such as PSNR, yet these proxies can be weakly correlated with human preference and downstream utility, reflecting the well-known perception--distortion trade-off. We introduce MetaRanker, a human-in-the-loop active ranking framework that formalizes metalens image quality in terms of semantic interpretability, defined as the degree to which humans can reliably recognize objects and structures in the presence of optical artifacts. MetaRanker combines a probabilistic preference model with uncertainty-aware query selection, and leverages vision--language models to provide lightweight semantic priors. Importantly, these priors are used only to guide the sampling of informative comparisons; human judgments remain the primary supervision signal throughout. Across real-world and synthetic metalens datasets with distinct degradation profiles, MetaRanker produces rankings that align most closely with human assessments, while reducing the number of pairwise annotations required by approximately 80% relative to exhaustive pairwise evaluation. Finally, we show that standard image quality assessment metrics exhibit limited alignment with human interpretability in the metalens domain, positioning MetaRanker as a practical step toward perceptually grounded metalens evaluation and co-design.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

MetaRanker shows a workable active-ranking loop that cuts human comparisons by ~80% on metalens data while tracking human interpretability better than PSNR-style metrics, but the abstract leaves the VLM-sampling bias question open.

read the letter

The main takeaway is that this paper gives a concrete way to rank metalens reconstructions by how well humans can actually read the content, rather than by pixel error. It combines a preference model, uncertainty sampling, and VLM priors only for picking pairs, then lets humans label those pairs. On the datasets they tested, the resulting rankings line up better with people than the usual distortion numbers, and they claim an 80% drop in the number of labels needed versus full pairwise comparison.

What stands out is the domain focus. Metalens work has a known gap between standard IQA and real usability, and the authors treat semantic interpretability as the target instead of trying to fix PSNR. The active-selection piece is standard in ranking literature but applied here with the VLM step to keep the human load low. That combination is new enough for the metalens setting.

The soft spot is exactly the one the stress-test flags. The method says VLM priors only guide which pairs to show humans, yet if those priors already favor pairs where VLM and human views agree, the collected labels are no longer a clean human signal. The abstract does not describe the held-out collection protocol or an ablation with random selection, so it is impossible to judge whether the alignment gain is real or partly an artifact of the sampling. Without those details the 80% reduction claim is hard to evaluate.

This is for groups doing metalens co-design or perceptual evaluation in optics. A reader who needs a drop-in way to get human-aligned rankings with fewer labels will find the framework useful if the experiments check out. The thinking is straightforward and the motivation is honest, so it deserves a serious referee even if the current write-up needs more experimental transparency on the bias question.

Referee Report

1 major / 1 minor

Summary. The paper introduces MetaRanker, a human-in-the-loop active ranking framework for metalens image quality assessment. It formalizes quality via semantic interpretability (human ability to recognize objects despite optical artifacts) rather than distortion metrics like PSNR. The method uses a probabilistic preference model with uncertainty-aware query selection, employing VLM semantic priors exclusively to guide sampling of comparisons while keeping human judgments as the primary supervision. Evaluated on real-world and synthetic metalens datasets with distinct degradation profiles, it claims rankings that align most closely with human assessments and an ~80% reduction in pairwise annotations relative to exhaustive evaluation. It further shows that standard IQA metrics have limited alignment with human interpretability in this domain.

Significance. If the central claims hold, the work provides a concrete step toward perceptually grounded evaluation and co-design for metalens systems, directly addressing the perception-distortion tradeoff. The explicit design choice to restrict VLMs to query selection (with humans as sole primary signal) and the active-learning reduction in annotation effort are strengths that could generalize to other computational imaging pipelines where traditional metrics diverge from downstream utility.

major comments (1)

[Abstract and Evaluation section] Abstract and § on Experiments/Evaluation: The strongest claim (superior human alignment + ~80% annotation reduction) is load-bearing on the assertion that 'human judgments remain the primary supervision signal throughout' and that VLM priors affect only sampling. The manuscript does not report (i) whether held-out human rankings used for alignment metrics were drawn from the VLM-guided pool or an independent exhaustive set, nor (ii) an ablation of VLM-guided selection versus random or uncertainty-only baselines. Without these, the reported alignment and efficiency figures are consistent with selection bias toward VLM-human concordant pairs, undermining the separation claim.

minor comments (1)

[Abstract] The abstract states 'distinct degradation profiles' for the datasets but provides no quantitative characterization (e.g., aberration statistics or PSNR ranges) in the main text; adding a short table or paragraph would improve reproducibility.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. The concern regarding potential selection bias and the need for explicit validation of the VLM-prior separation is well-taken and directly addresses the load-bearing claims in the abstract and evaluation sections. We address the points below and will revise the manuscript accordingly.

read point-by-point responses

Referee: [Abstract and Evaluation section] Abstract and § on Experiments/Evaluation: The strongest claim (superior human alignment + ~80% annotation reduction) is load-bearing on the assertion that 'human judgments remain the primary supervision signal throughout' and that VLM priors affect only sampling. The manuscript does not report (i) whether held-out human rankings used for alignment metrics were drawn from the VLM-guided pool or an independent exhaustive set, nor (ii) an ablation of VLM-guided selection versus random or uncertainty-only baselines. Without these, the reported alignment and efficiency figures are consistent with selection bias toward VLM-human concordant pairs, undermining the separation claim.

Authors: We agree that the manuscript should explicitly document these experimental controls to substantiate the claim that VLM priors are restricted to query selection. (i) The held-out human rankings used for the alignment metrics (Kendall tau and top-k overlap) were collected via an independent exhaustive pairwise comparison protocol on a disjoint set of images, with no VLM involvement in either sampling or judgment collection. (ii) While the current experiments compare against exhaustive and random baselines, we did not include a dedicated ablation isolating VLM-guided selection from pure uncertainty sampling. In the revision we will add both the clarification on the held-out set and the requested ablation (VLM-guided vs. random vs. uncertainty-only) using the same human annotation budget, reporting the resulting alignment and annotation counts. These additions will be placed in the Experiments section with corresponding updates to the abstract if the numbers change. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain

full rationale

The paper describes an active ranking framework where human judgments serve as the primary supervision signal and VLM outputs are used solely for query selection. No equations, fitted parameters, or self-citations appear in the abstract or described method that reduce the reported alignment with human assessments or the 80% annotation reduction to a self-referential definition or input. Performance claims rest on external human evaluations across datasets, rendering the approach self-contained against independent benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no information on free parameters, background axioms, or new entities; ledger entries are therefore empty.

pith-pipeline@v0.9.1-grok · 5796 in / 1132 out tokens · 36538 ms · 2026-06-30T11:12:44.844599+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Surprise-Guided MergeSort: Budget-Efficient Human-in-the-Loop Ranking via Adaptive Comparison Scheduling
cs.LG 2026-06 unverdicted novelty 6.0

Surprise-Guided MergeSort combines a MergeSort scheduler, a composite VLM-based surprise scorer, and adaptive budget allocation to reduce human comparisons while improving Kendall tau by 6-12 points over Active Elo on...
Surprise-Guided MergeSort: Budget-Efficient Human-in-the-Loop Ranking via Adaptive Comparison Scheduling
cs.LG 2026-06 unverdicted novelty 6.0

Surprise-Guided MergeSort uses a VLM-based composite surprise scorer to prioritize human comparisons inside a MergeSort scheduler, skipping up to 535 pairs per session and raising Kendall's τ by 6-12 points over Activ...

Reference graph

Works this paper leans on

49 extracted references · 4 canonical work pages · cited by 1 Pith paper

[1]

Eirikur Agustsson and Radu Timofte. 2017. Ntire 2017 challenge on single image super-resolution: Dataset and study. InProceedings of the IEEE conference on computer vision and pattern recognition workshops. 126–135

2017
[2]

Rodrigo Benenson and Vittorio Ferrari. 2022. From colouring-in to pointillism: revisiting semantic segmentation supervision.arXiv preprint arXiv:2210.14142 (2022)

work page arXiv 2022
[3]

Johansson

Herman Bergström, Emil Carlsson, Devdatt Dubhashi, and Fredrik D. Johansson
[4]

In Advances in Neural Information Processing Systems (NeurIPS)

Active Preference Learning for Ordering Items In- and Out-of-Sample. In Advances in Neural Information Processing Systems (NeurIPS)
[5]

Erdem Biyik, Nima Anari, and Dorsa Sadigh. 2024. Batch active learning of reward functions from human preferences.ACM Transactions on Human-Robot Interaction13, 2 (2024), 1–27

2024
[6]

Yochai Blau and Tomer Michaeli. 2018. The perception-distortion tradeoff. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 6228–6237

2018
[7]

Ralph Allan Bradley and Milton E Terry. 1952. Rank analysis of incomplete block designs: I. The method of paired comparisons.Biometrika39, 3/4 (1952), 324–345

1952
[8]

Paul F Christiano, Jan Leike, Tom Brown, Miljan Marber, Buck Shlegeris, and Dario Amodei. 2017. Deep reinforcement learning from human preferences. In Advances in Neural Information Processing Systems, Vol. 30

2017
[9]

Mark E Glickman. 1995. The glicko system.Boston University16, 8 (1995), 9

1995
[10]

Glickman

Mark E. Glickman. 2012.Example of the Glicko-2 System. Technical Report. Boston University. 1–6 pages. https://www.glicko.net/glicko/glicko2.pdf Online technical report (commonly cited as 2012)

2012
[11]

Andy Gray, Alma Rahat, Tom Crick, and Stephen Lindsay. 2024. A Bayesian active learning approach to comparative judgement within education assessment. Computers and Education: Artificial Intelligence6 (2024), 100245

2024
[12]

Xuemei Hu, Weizhu Xu, Qingbin Fan, Tao Yue, Feng Yan, Yanqing Lu, and Ting Xu
[13]

Metasurface-based computational imaging: a review.Advanced Photonics 6, 1 (2024), 014002–014002

2024
[14]

Quan Huynh-Thu and Mohammed Ghanbari. 2008. Scope of validity of PSNR in image/video quality assessment.Electronics Letters44, 13 (2008), 800–801

2008
[15]

Andrey Ignatov, Nikolay Kobyshev, Radu Timofte, Kenneth Vanhoey, and Luc Van Gool. 2017. DSLR-quality photos on mobile devices with deep convolutional networks. InProceedings of the IEEE International Conference on Computer Vision. 3277–3285

2017
[16]

Ikbeom Jang, Garrison Danley, Ken Chang, and Jayashree Kalpathy-Cramer
[17]

Decreasing annotation burden of pairwise comparisons with human-in- the-loop sorting: Application in medical image artifact rating.arXiv preprint arXiv:2202.04823(2022)

work page arXiv 2022
[18]

Junjie Ke, Qifei Wang, Yilin Wang, Peyman Milanfar, and Feng Yang. 2021. MUSIQ: Multi-scale image quality transformer. InProceedings of the IEEE/CVF International Conference on Computer Vision. 5148–5157

2021
[19]

Mohammadreza Khorasaninejad and Federico Capasso. 2017. Metalenses: Versa- tile multifunctional photonic components.Science358, 6367 (2017), eaam8100

2017
[20]

Joohoon Kim, Junhwa Seong, Wonjoong Kim, Gun-Yeal Lee, Seokwoo Kim, Hongyoon Kim, Seong-Won Moon, Dong Kyo Oh, Younghwan Yang, Jeonghoon Park, et al. 2023. Scalable manufacturing of high-index atomic layer–polymer hybrid metasurfaces for metaphotonics in the visible.Nature Materials22, 4 (2023), 474–481

2023
[21]

Byeonghyeon Lee, Youbin Kim, Yongjae Jo, Hyunsu Kim, Hyemi Park, Yangkyu Kim, Debabrata Mandal, Praneeth Chakravarthula, Inki Kim, and Eunbyung Park
[22]

Aberration Correcting Vision Transformers for High-Fidelity Metalens Imaging.arXiv preprint arXiv:2412.04591(2024)

work page arXiv 2024
[23]

Chunyi Li, Yuan Tian, Xiaoyue Ling, Zicheng Zhang, Haodong Duan, Haoning Wu, Ziheng Jia, Xiaohong Liu, Xiongkuo Min, Guo Lu, et al. 2025. Image Quality Assessment: From Human to Machine Preference. InProceedings of the Computer Vision and Pattern Recognition Conference. 7570–7581

2025
[24]

Suiyi Ling, Jing Li, Anne Flore Perrin, Zhi Li, Lukáš Krasula, and Patrick Le Callet
[25]

Strategy for boosting pair comparison and improving quality assessment accuracy.arXiv preprint arXiv:2010.00370(2020)

work page arXiv 2010
[26]

Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. 2023. Visual in- struction tuning.Advances in Neural Information Processing Systems36 (2023), 34892–34916

2023
[27]

Lucas Maystre and Matthias Grossglauser. 2017. Just sort it! A simple and effective approach to active preference learning. InInternational Conference on Machine Learning (ICML). PMLR, 2344–2353

2017
[28]

Aliaksei Mikhailiuk, María Pérez-Ortiz, and Rafal Mantiuk. 2018. Psychometric scaling of TID2013 dataset. In2018 Tenth International Conference on Quality of Multimedia Experience (QoMEX). IEEE, 1–6

2018
[29]

completely blind

Anish Mittal, Rajiv Soundararajan, and Alan C Bovik. 2012. Making a “completely blind” image quality analyzer.IEEE Signal Processing Letters20, 3 (2012), 209–212

2012
[30]

Shima Mohammadi and João Ascenso. 2025. Uncertainty-driven Sampling for Efficient Pairwise Comparison Subjective Assessment.IEEE Transactions on Multimedia(2025)

2025
[31]

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, et al. 2022. Training lan- guage models to follow instructions with human feedback. InAdvances in Neural Information Processing Systems, Vol. 35

2022
[32]

Yujin Park, Haejun Chung, and Ikbeom Jang. 2025. EZ-Sort: Efficient Pairwise Comparison via Zero-Shot CLIP-Based Pre-Ordering and Human-in-the-Loop Sorting. InProceedings of the 34th ACM International Conference on Information and Knowledge Management. 5120–5124

2025
[33]

Judith Alice Redi, Tobias Hoßfeld, Pavel Korshunov, Filippo Mazza, Isabel Povoa, and Christian Keimel. 2013. Crowdsourcing-based multimedia subjective evalua- tions: a case study on image recognizability and aesthetic appeal. InProceedings of the 2nd ACM International Workshop on Crowdsourcing for Multimedia. 29–34

2013
[34]

Eli Schwartz, Raja Giryes, and Alex M Bronstein. 2018. DeepISP: Toward learning an end-to-end image processing pipeline.IEEE Transactions on Image Processing 28, 2 (2018), 912–923

2018
[35]

Joonhyuk Seo, Jaegang Jo, Joohoon Kim, Joonho Kang, Chanik Kang, Seong-Won Moon, Eunji Lee, Jehyeong Hong, Junsuk Rho, and Haejun Chung. 2024. Deep- learning-driven end-to-end metalens imaging.Advanced Photonics6, 6 (2024), 066002–066002

2024
[36]

Xiangfei Sheng, Xiaofeng Pan, Zhichao Yang, Pengfei Chen, and Leida Li. 2026. Fine-grained image quality assessment for perceptual image restoration. InPro- ceedings of the AAAI Conference on Artificial Intelligence, Vol. 40. 8914–8922

2026
[37]

Vincent Sitzmann, Steven Diamond, Yifan Peng, Xiong Dun, Stephen Boyd, Wolf- gang Heidrich, Felix Heide, and Gordon Wetzstein. 2018. End-to-end optimization KDD ’26, August 9–13, 2026, Jeju, Republic of Korea Yujin Park, Haejun Chung, and Ikbeom Jang of optics and image processing for achromatic extended depth of field and super- resolution imaging.ACM Tr...

2018
[38]

Tianshu Song, Leida Li, Hancheng Zhu, and Jiansheng Qian. 2021. IE-IQA: Intel- ligibility enriched generalizable no-reference image quality assessment.Frontiers in Neuroscience15 (2021), 739138

2021
[39]

Ethan Tseng, Shane Colburn, James Whitehead, Luocheng Huang, Seung-Hwan Baek, Arka Majumdar, and Felix Heide. 2021. Neural nano-optics for high-quality thin lens imaging.Nature Communications12, 1 (2021), 6493

2021
[40]

Chan, and Chen Change Loy

Jianyi Wang, Kelvin C.K. Chan, and Chen Change Loy. 2023. Exploring CLIP for Assessing the Look and Feel of Images. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 37. 2555–2563

2023
[41]

Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. 2004. Image quality assessment: from error visibility to structural similarity.IEEE Transactions on Image Processing13, 4 (2004), 600–612

2004
[42]

Long Xu, Jia Li, Weisi Lin, Yun Zhang, Yongbing Zhang, and Yihua Yan. 2016. Pairwise comparison and rank learning for image quality assessment.Displays 44 (2016), 21–26

2016
[43]

Miao Yang, Ge Yin, Yixiang Du, and Zhiqiang Wei. 2021. Pair comparison based progressive subjective quality ranking for underwater images.Signal Processing: Image Communication99 (2021), 116444

2021
[44]

Sidi Yang, Tianhe Wu, Shuwei Shi, Shanshan Lao, Yuan Gong, Mingdeng Cao, Jia- hao Wang, and Yujiu Yang. 2022. MANIQA: Multi-Dimension Attention Network for No-Reference Image Quality Assessment. InCVPR Workshops (NTIRE)

2022
[45]

Joel Yeo, N Duane Loh, Ramon Paniagua-Dominguez, and Arseniy I Kuznetsov
[46]

EigenCWD: a spatially varying deconvolution algorithm for single metalens imaging.Optics Express33, 13 (2025), 28481–28492

2025
[47]

Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang
[48]

InProceedings of the IEEE conference on computer vision and pattern recognition

The unreasonable effectiveness of deep features as a perceptual metric. InProceedings of the IEEE conference on computer vision and pattern recognition. 586–595
[49]

bad prior

Simone Zini and Marco Buzzelli. 2025. Bayesian nights: Optimizing night pho- tography rendering with Bayesian derivative-free methods.Pattern Recognition 161 (2025), 111314. A Ablation Study Details A.1 Hyperparameter Configuration Table 5 lists the exact values used in all main experiments. Sensi- tivity to 𝛼 and RD0 is higher than for other parameters b...

2025

[1] [1]

Eirikur Agustsson and Radu Timofte. 2017. Ntire 2017 challenge on single image super-resolution: Dataset and study. InProceedings of the IEEE conference on computer vision and pattern recognition workshops. 126–135

2017

[2] [2]

Rodrigo Benenson and Vittorio Ferrari. 2022. From colouring-in to pointillism: revisiting semantic segmentation supervision.arXiv preprint arXiv:2210.14142 (2022)

work page arXiv 2022

[3] [3]

Johansson

Herman Bergström, Emil Carlsson, Devdatt Dubhashi, and Fredrik D. Johansson

[4] [4]

In Advances in Neural Information Processing Systems (NeurIPS)

Active Preference Learning for Ordering Items In- and Out-of-Sample. In Advances in Neural Information Processing Systems (NeurIPS)

[5] [5]

Erdem Biyik, Nima Anari, and Dorsa Sadigh. 2024. Batch active learning of reward functions from human preferences.ACM Transactions on Human-Robot Interaction13, 2 (2024), 1–27

2024

[6] [6]

Yochai Blau and Tomer Michaeli. 2018. The perception-distortion tradeoff. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 6228–6237

2018

[7] [7]

Ralph Allan Bradley and Milton E Terry. 1952. Rank analysis of incomplete block designs: I. The method of paired comparisons.Biometrika39, 3/4 (1952), 324–345

1952

[8] [8]

Paul F Christiano, Jan Leike, Tom Brown, Miljan Marber, Buck Shlegeris, and Dario Amodei. 2017. Deep reinforcement learning from human preferences. In Advances in Neural Information Processing Systems, Vol. 30

2017

[9] [9]

Mark E Glickman. 1995. The glicko system.Boston University16, 8 (1995), 9

1995

[10] [10]

Glickman

Mark E. Glickman. 2012.Example of the Glicko-2 System. Technical Report. Boston University. 1–6 pages. https://www.glicko.net/glicko/glicko2.pdf Online technical report (commonly cited as 2012)

2012

[11] [11]

Andy Gray, Alma Rahat, Tom Crick, and Stephen Lindsay. 2024. A Bayesian active learning approach to comparative judgement within education assessment. Computers and Education: Artificial Intelligence6 (2024), 100245

2024

[12] [12]

Xuemei Hu, Weizhu Xu, Qingbin Fan, Tao Yue, Feng Yan, Yanqing Lu, and Ting Xu

[13] [13]

Metasurface-based computational imaging: a review.Advanced Photonics 6, 1 (2024), 014002–014002

2024

[14] [14]

Quan Huynh-Thu and Mohammed Ghanbari. 2008. Scope of validity of PSNR in image/video quality assessment.Electronics Letters44, 13 (2008), 800–801

2008

[15] [15]

Andrey Ignatov, Nikolay Kobyshev, Radu Timofte, Kenneth Vanhoey, and Luc Van Gool. 2017. DSLR-quality photos on mobile devices with deep convolutional networks. InProceedings of the IEEE International Conference on Computer Vision. 3277–3285

2017

[16] [16]

Ikbeom Jang, Garrison Danley, Ken Chang, and Jayashree Kalpathy-Cramer

[17] [17]

Decreasing annotation burden of pairwise comparisons with human-in- the-loop sorting: Application in medical image artifact rating.arXiv preprint arXiv:2202.04823(2022)

work page arXiv 2022

[18] [18]

Junjie Ke, Qifei Wang, Yilin Wang, Peyman Milanfar, and Feng Yang. 2021. MUSIQ: Multi-scale image quality transformer. InProceedings of the IEEE/CVF International Conference on Computer Vision. 5148–5157

2021

[19] [19]

Mohammadreza Khorasaninejad and Federico Capasso. 2017. Metalenses: Versa- tile multifunctional photonic components.Science358, 6367 (2017), eaam8100

2017

[20] [20]

Joohoon Kim, Junhwa Seong, Wonjoong Kim, Gun-Yeal Lee, Seokwoo Kim, Hongyoon Kim, Seong-Won Moon, Dong Kyo Oh, Younghwan Yang, Jeonghoon Park, et al. 2023. Scalable manufacturing of high-index atomic layer–polymer hybrid metasurfaces for metaphotonics in the visible.Nature Materials22, 4 (2023), 474–481

2023

[21] [21]

Byeonghyeon Lee, Youbin Kim, Yongjae Jo, Hyunsu Kim, Hyemi Park, Yangkyu Kim, Debabrata Mandal, Praneeth Chakravarthula, Inki Kim, and Eunbyung Park

[22] [22]

Aberration Correcting Vision Transformers for High-Fidelity Metalens Imaging.arXiv preprint arXiv:2412.04591(2024)

work page arXiv 2024

[23] [23]

Chunyi Li, Yuan Tian, Xiaoyue Ling, Zicheng Zhang, Haodong Duan, Haoning Wu, Ziheng Jia, Xiaohong Liu, Xiongkuo Min, Guo Lu, et al. 2025. Image Quality Assessment: From Human to Machine Preference. InProceedings of the Computer Vision and Pattern Recognition Conference. 7570–7581

2025

[24] [24]

Suiyi Ling, Jing Li, Anne Flore Perrin, Zhi Li, Lukáš Krasula, and Patrick Le Callet

[25] [25]

Strategy for boosting pair comparison and improving quality assessment accuracy.arXiv preprint arXiv:2010.00370(2020)

work page arXiv 2010

[26] [26]

Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. 2023. Visual in- struction tuning.Advances in Neural Information Processing Systems36 (2023), 34892–34916

2023

[27] [27]

Lucas Maystre and Matthias Grossglauser. 2017. Just sort it! A simple and effective approach to active preference learning. InInternational Conference on Machine Learning (ICML). PMLR, 2344–2353

2017

[28] [28]

Aliaksei Mikhailiuk, María Pérez-Ortiz, and Rafal Mantiuk. 2018. Psychometric scaling of TID2013 dataset. In2018 Tenth International Conference on Quality of Multimedia Experience (QoMEX). IEEE, 1–6

2018

[29] [29]

completely blind

Anish Mittal, Rajiv Soundararajan, and Alan C Bovik. 2012. Making a “completely blind” image quality analyzer.IEEE Signal Processing Letters20, 3 (2012), 209–212

2012

[30] [30]

Shima Mohammadi and João Ascenso. 2025. Uncertainty-driven Sampling for Efficient Pairwise Comparison Subjective Assessment.IEEE Transactions on Multimedia(2025)

2025

[31] [31]

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, et al. 2022. Training lan- guage models to follow instructions with human feedback. InAdvances in Neural Information Processing Systems, Vol. 35

2022

[32] [32]

Yujin Park, Haejun Chung, and Ikbeom Jang. 2025. EZ-Sort: Efficient Pairwise Comparison via Zero-Shot CLIP-Based Pre-Ordering and Human-in-the-Loop Sorting. InProceedings of the 34th ACM International Conference on Information and Knowledge Management. 5120–5124

2025

[33] [33]

Judith Alice Redi, Tobias Hoßfeld, Pavel Korshunov, Filippo Mazza, Isabel Povoa, and Christian Keimel. 2013. Crowdsourcing-based multimedia subjective evalua- tions: a case study on image recognizability and aesthetic appeal. InProceedings of the 2nd ACM International Workshop on Crowdsourcing for Multimedia. 29–34

2013

[34] [34]

Eli Schwartz, Raja Giryes, and Alex M Bronstein. 2018. DeepISP: Toward learning an end-to-end image processing pipeline.IEEE Transactions on Image Processing 28, 2 (2018), 912–923

2018

[35] [35]

Joonhyuk Seo, Jaegang Jo, Joohoon Kim, Joonho Kang, Chanik Kang, Seong-Won Moon, Eunji Lee, Jehyeong Hong, Junsuk Rho, and Haejun Chung. 2024. Deep- learning-driven end-to-end metalens imaging.Advanced Photonics6, 6 (2024), 066002–066002

2024

[36] [36]

Xiangfei Sheng, Xiaofeng Pan, Zhichao Yang, Pengfei Chen, and Leida Li. 2026. Fine-grained image quality assessment for perceptual image restoration. InPro- ceedings of the AAAI Conference on Artificial Intelligence, Vol. 40. 8914–8922

2026

[37] [37]

Vincent Sitzmann, Steven Diamond, Yifan Peng, Xiong Dun, Stephen Boyd, Wolf- gang Heidrich, Felix Heide, and Gordon Wetzstein. 2018. End-to-end optimization KDD ’26, August 9–13, 2026, Jeju, Republic of Korea Yujin Park, Haejun Chung, and Ikbeom Jang of optics and image processing for achromatic extended depth of field and super- resolution imaging.ACM Tr...

2018

[38] [38]

Tianshu Song, Leida Li, Hancheng Zhu, and Jiansheng Qian. 2021. IE-IQA: Intel- ligibility enriched generalizable no-reference image quality assessment.Frontiers in Neuroscience15 (2021), 739138

2021

[39] [39]

Ethan Tseng, Shane Colburn, James Whitehead, Luocheng Huang, Seung-Hwan Baek, Arka Majumdar, and Felix Heide. 2021. Neural nano-optics for high-quality thin lens imaging.Nature Communications12, 1 (2021), 6493

2021

[40] [40]

Chan, and Chen Change Loy

Jianyi Wang, Kelvin C.K. Chan, and Chen Change Loy. 2023. Exploring CLIP for Assessing the Look and Feel of Images. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 37. 2555–2563

2023

[41] [41]

Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. 2004. Image quality assessment: from error visibility to structural similarity.IEEE Transactions on Image Processing13, 4 (2004), 600–612

2004

[42] [42]

Long Xu, Jia Li, Weisi Lin, Yun Zhang, Yongbing Zhang, and Yihua Yan. 2016. Pairwise comparison and rank learning for image quality assessment.Displays 44 (2016), 21–26

2016

[43] [43]

Miao Yang, Ge Yin, Yixiang Du, and Zhiqiang Wei. 2021. Pair comparison based progressive subjective quality ranking for underwater images.Signal Processing: Image Communication99 (2021), 116444

2021

[44] [44]

Sidi Yang, Tianhe Wu, Shuwei Shi, Shanshan Lao, Yuan Gong, Mingdeng Cao, Jia- hao Wang, and Yujiu Yang. 2022. MANIQA: Multi-Dimension Attention Network for No-Reference Image Quality Assessment. InCVPR Workshops (NTIRE)

2022

[45] [45]

Joel Yeo, N Duane Loh, Ramon Paniagua-Dominguez, and Arseniy I Kuznetsov

[46] [46]

EigenCWD: a spatially varying deconvolution algorithm for single metalens imaging.Optics Express33, 13 (2025), 28481–28492

2025

[47] [47]

Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang

[48] [48]

InProceedings of the IEEE conference on computer vision and pattern recognition

The unreasonable effectiveness of deep features as a perceptual metric. InProceedings of the IEEE conference on computer vision and pattern recognition. 586–595

[49] [49]

bad prior

Simone Zini and Marco Buzzelli. 2025. Bayesian nights: Optimizing night pho- tography rendering with Bayesian derivative-free methods.Pattern Recognition 161 (2025), 111314. A Ablation Study Details A.1 Hyperparameter Configuration Table 5 lists the exact values used in all main experiments. Sensi- tivity to 𝛼 and RD0 is higher than for other parameters b...

2025