VersusQ: Pairwise Margin Reasoning for Generalizable Video Quality Assessment

Binxin Yang; Hubery Yin; Jiexuan Zhang; Qiang Xu; Shibei Meng; Yuan Liu; Zhengyao Lv

arxiv: 2605.21130 · v1 · pith:D2M446GKnew · submitted 2026-05-20 · 💻 cs.CV

VersusQ: Pairwise Margin Reasoning for Generalizable Video Quality Assessment

Shibei Meng , Binxin Yang , Yuan Liu , Jiexuan Zhang , Zhengyao Lv , Hubery Yin , Qiang Xu This is my paper

Pith reviewed 2026-05-21 04:52 UTC · model grok-4.3

classification 💻 cs.CV

keywords video quality assessmentpairwise comparisonmargin reasoninglarge multimodal modelsgeneralizationrelative evaluation

0 comments

The pith

Pairwise margin reasoning lets video quality models generalize by comparing videos directly instead of assigning absolute scores.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that predicting absolute quality scores for videos ties models to dataset-specific rating habits and calibration biases, hurting performance on new domains. By instead having the model compare pairs of videos and reason about their perceptual differences, VersusQ focuses purely on relative quality. It predicts a signed continuous margin that indicates which video is better and by how much. This is optimized using Margin-Coupled GRPO that aligns the reasoning process with the numerical margin prediction. Experiments show this leads to better results on various benchmarks and stronger generalization.

Core claim

VersusQ performs LMM-based comparison between two videos, reasons about their visual and temporal quality differences, and predicts a signed continuous margin that captures both the preferred choice and the degree of difference, with Margin-Coupled GRPO jointly optimizing rollout-based relational reasoning and continuous margin regression.

What carries the argument

Pairwise margin reasoning, where the model directly compares two videos to output a signed continuous margin representing preference and difference magnitude.

Load-bearing premise

That focusing on perceptual differences through direct comparisons avoids the biases from dataset-specific rating habits and score distributions.

What would settle it

A new video quality benchmark with entirely different rating protocols where VersusQ shows no better cross-domain performance than absolute scoring baselines would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.21130 by Binxin Yang, Hubery Yin, Jiexuan Zhang, Qiang Xu, Shibei Meng, Yuan Liu, Zhengyao Lv.

**Figure 1.** Figure 1: Two comparisons motivating the proposed design. Left: under a gradient-based visualization protocol [13], pointwise scoring (e.g., VQ-Insight [14]) concentrates on a few positions, while VersusQ attends more broadly to scene-level evidence. Right: NTP (next-token prediction) directly predicts numeric score tokens and can collapse to repeated outputs, while Hybrid combines reasoning with margin regression t… view at source ↗

**Figure 2.** Figure 2: Overview of the proposed VersusQ pipeline. VersusQ warm-starts a motion-aware VLM with rationale supervision and MSE margin regression from the designated <reg> hidden state. MC-GRPO then uses score and format rewards for reasoning, with an auxiliary regression gradient for calibration. At inference, sparse margins are aggregated into an anchor-free leaderboard by zero-mean least squares. coarsens fine-gra… view at source ↗

**Figure 3.** Figure 3: Qualitative case study of VersusQ on video pairs. The model first produces comparisonoriented reasoning and then predicts a continuous regression margin aligned with the ground-truth quality difference. Training recipe. The lower panel studies how reasoning supervision and reinforcement optimization affect the margin-based model. Reasoning supervision improves transfer by encouraging the model to ground … view at source ↗

**Figure 4.** Figure 4: Convergence of sparse-graph leaderboard reconstruction. Curves are computed from prefixes of the sampled comparison-edge stream on the seven reported benchmarks. A method is marked stable for each metric once the corresponding SRCC or PLCC is within 0.005 of its final value, where SRCC denotes Spearman rank correlation coefficient and PLCC denotes Pearson linear correlation coefficient. LSQ denotes least-s… view at source ↗

**Figure 5.** Figure 5: Additional qualitative case study. The case illustrates comparison-oriented reasoning for a video pair, followed by a continuous regression margin aligned with the ground-truth quality difference. 18 [PITH_FULL_IMAGE:figures/full_fig_p018_5.png] view at source ↗

**Figure 6.** Figure 6: Additional qualitative case study. The case provides another example of comparisonoriented reasoning for a video pair, followed by a continuous regression margin aligned with the ground-truth quality difference. 19 [PITH_FULL_IMAGE:figures/full_fig_p019_6.png] view at source ↗

read the original abstract

Large Multimodal Models (LMMs) have shown promise for video quality assessment, but most methods still predict an absolute score for each video. Such pointwise supervision often mixes perceptual quality with dataset-specific calibration, including annotation protocols, rating habits, and score distributions. As a result, the learned scoring rule may work well within a benchmark but transfer poorly across unseen domains. We argue that relative comparisons alleviate the absolute-scale calibration bias by focusing purely on perceptual differences rather than dataset-specific rating habits. Consequently, we propose \textbf{VersusQ}, a pairwise margin reasoning framework driven entirely by direct comparisons. Specifically, VersusQ performs LMM-based comparison between two videos, reasons about their visual and temporal quality differences, and predicts a signed continuous margin that captures both the preferred choice and the degree of difference. Furthermore, to align interpretable comparison rationales with fine-grained numerical differences, we introduce Margin-Coupled GRPO, which jointly optimizes rollout-based relational reasoning and continuous margin regression. Extensive experiments on multiple public VQA benchmarks demonstrate that VersusQ achieves state-of-the-art performance, strong cross-domain generalization, and reliable fine-grained ranking under heterogeneous evaluation scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

VersusQ shifts video quality assessment to pairwise signed margin prediction with LMMs and GRPO coupling, but the bias-avoidance claim probably does not hold once you look at how the regression targets are built.

read the letter

The paper's core move is to replace absolute score prediction with LMM-driven pairwise comparisons that output both a preference and a continuous signed margin, then train the whole thing with Margin-Coupled GRPO so the reasoning steps stay tied to the numerical difference. That framing is new relative to the pointwise LMM baselines they cite, and the reported cross-domain gains on several VQA benchmarks are the part worth checking first if the numbers are real.

Referee Report

1 major / 0 minor

Summary. The paper proposes VersusQ, a pairwise margin reasoning framework for video quality assessment using large multimodal models. It claims that absolute score prediction mixes perceptual quality with dataset-specific calibration biases from annotation protocols and rating habits, whereas relative comparisons focus purely on perceptual differences. The method performs LMM-based pairwise comparisons to predict signed continuous margins capturing both preference and degree of difference, optimized via Margin-Coupled GRPO that jointly handles rollout-based relational reasoning and continuous margin regression. Extensive experiments on multiple public VQA benchmarks are reported to achieve state-of-the-art performance, strong cross-domain generalization, and reliable fine-grained ranking under heterogeneous scenarios.

Significance. If the central claim holds without hidden dependencies on absolute scores, the approach could meaningfully advance generalizable VQA by reducing domain-specific biases, enabling better transfer across benchmarks with differing rating distributions. This would be particularly valuable for applications involving unseen video domains or heterogeneous evaluation protocols.

major comments (1)

[Abstract and §3] Abstract and §3 (method description): The claim that the framework is 'driven entirely by direct comparisons' and alleviates absolute-scale calibration bias is load-bearing for the cross-domain generalization results. However, a continuous margin regressor requires explicit target margins for each pair. Standard VQA practice derives these as signed differences of mean opinion scores (MOS) or normalized variants from the training benchmarks; these targets embed annotation protocols, rater biases, and score distributions, reintroducing the calibration information the method claims to sidestep. Please specify exactly how ground-truth margins are constructed and demonstrate that they do not rely on absolute scores from the same datasets used for training.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the detailed and constructive review of our manuscript. We address the major comment point by point below, with plans to revise the paper accordingly.

read point-by-point responses

Referee: [Abstract and §3] Abstract and §3 (method description): The claim that the framework is 'driven entirely by direct comparisons' and alleviates absolute-scale calibration bias is load-bearing for the cross-domain generalization results. However, a continuous margin regressor requires explicit target margins for each pair. Standard VQA practice derives these as signed differences of mean opinion scores (MOS) or normalized variants from the training benchmarks; these targets embed annotation protocols, rater biases, and score distributions, reintroducing the calibration information the method claims to sidestep. Please specify exactly how ground-truth margins are constructed and demonstrate that they do not rely on absolute scores from the same datasets used for training.

Authors: We appreciate the referee's careful reading and agree that explicit clarification is needed. In the revised manuscript, we will add the following details to §3: ground-truth margins for training are constructed as m = (MOS_v1 - MOS_v2) / σ_dataset, where σ_dataset is the standard deviation of MOS values in the training benchmark (or a normalized range variant). This construction does rely on absolute MOS from the training datasets. However, the framework remains driven by direct comparisons because the LMM never receives or predicts an absolute score for any individual video; supervision and inference are strictly pairwise. We argue this still reduces calibration bias relative to pointwise methods, as the model learns perceptual difference magnitudes rather than dataset-specific absolute scales. To demonstrate, we will add cross-domain experiments in the revision training on one benchmark and testing on another with substantially shifted MOS distributions, where VersusQ outperforms pointwise baselines. We will also include the exact margin formula and a short ablation on normalization choices. revision: yes

Circularity Check

0 steps flagged

No load-bearing circularity; pairwise framework presented as independent proposal without self-referential reduction

full rationale

The provided abstract and context contain no equations, no self-citations invoked as uniqueness theorems, and no fitted parameters renamed as predictions. The central argument—that relative margin reasoning sidesteps absolute-scale bias—is a methodological claim justified by the framework design itself rather than by reducing to dataset MOS values by construction. No step matches the enumerated circularity patterns; the derivation remains self-contained against external benchmarks and experiments.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no equations, training details, or modeling choices; ledger entries cannot be populated.

pith-pipeline@v0.9.0 · 5750 in / 996 out tokens · 24968 ms · 2026-05-21T04:52:35.507015+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

m⋆_ij = s_MOS_i − s_MOS_j ... least-squares ... s∗ = arg min ∥As−b∥2_2 s.t. Σ si=0
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean J_uniquely_calibrated_via_higher_derivative unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Margin-Coupled GRPO ... rollout-based relational reasoning and continuous margin regression

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 7 internal anchors

[1]

Perceptual video quality assessment: A survey.Science China Information Sciences, 67(11):211301, 2024

Xiongkuo Min, Huiyu Duan, Wei Sun, Yucheng Zhu, and Guangtao Zhai. Perceptual video quality assessment: A survey.Science China Information Sciences, 67(11):211301, 2024

work page 2024
[2]

Blind image quality assessment: A survey on blind image quality assessment.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(5): 5833–5857, 2023

Guangtao Zhai and Xiongkuo Min. Blind image quality assessment: A survey on blind image quality assessment.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(5): 5833–5857, 2023

work page 2023
[3]

Q-align: Teaching lmms for visual scoring via discrete text-defined levels

Haoning Wu, Zicheng Zhang, Weixia Zhang, Chaofeng Chen, Liang Liao, Chunyi Li, Yixuan Gao, Annan Wang, Erli Zhang, Wenxiu Sun, Qiong Yan, Xiongkuo Min, Guangtao Zhai, and Weisi Lin. Q-align: Teaching lmms for visual scoring via discrete text-defined levels. In Proceedings of the 41st International Conference on Machine Learning (ICML), 2024

work page 2024
[4]

Teaching large language models to regress accurate image quality scores using score distribution

Zhiyuan You, Xin Cai, Jinjin Gu, Tianfan Xue, and Chao Dong. Teaching large language models to regress accurate image quality scores using score distribution. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

work page 2025
[5]

Q-insight: Understanding image quality via visual reinforcement learning

Weiqi Li, Xuanyu Zhang, Shijie Zhao, Yabin Zhang, Junlin Li, Li Zhang, and Jian Zhang. Q-insight: Understanding image quality via visual reinforcement learning. InAdvances in Neural Information Processing Systems (NeurIPS), 2025

work page 2025
[6]

Visualquality-r1: Reasoning-induced image quality assessment via reinforcement learning to rank

Tianhe Wu, Jian Zou, Jie Liang, Lei Zhang, and Kede Ma. Visualquality-r1: Reasoning-induced image quality assessment via reinforcement learning to rank. InAdvances in Neural Information Processing Systems (NeurIPS), 2025

work page 2025
[7]

Vqathinker: Exploring generalizable and explainable video quality assessment via reinforcement learning

Chunyi Cao, Wei Sun, Zicheng Zhang, Yingjie Jia, Xiaohong Liu, Xiongkuo Min, and Guangtao Zhai. Vqathinker: Exploring generalizable and explainable video quality assessment via reinforcement learning. InProceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2026

work page 2026
[8]

Rank analysis of incomplete block designs: I

Ralph Allan Bradley and Milton E Terry. Rank analysis of incomplete block designs: I. the method of paired comparisons.Biometrika, 39(3/4):324–345, 1952

work page 1952
[9]

Learning to rank using gradient descent

Chris Burges, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamilton, and Greg Svore. Learning to rank using gradient descent. InProceedings of the 22nd International Conference on Machine Learning (ICML), pages 89–96, 2005

work page 2005
[10]

Learning to rank for information retrieval.Foundations and Trends in Information Retrieval, 3(3):225–331, 2009

Tie-Yan Liu. Learning to rank for information retrieval.Foundations and Trends in Information Retrieval, 3(3):225–331, 2009

work page 2009
[11]

Rankiqa: Learning from rankings for no-reference image quality assessment

Xialei Liu, Joost van de Weijer, and Andrew D Bagdanov. Rankiqa: Learning from rankings for no-reference image quality assessment. InProceedings of the IEEE International Conference on Computer Vision (ICCV), pages 1040–1049, 2017

work page 2017
[12]

Adaptive image quality assessment via teaching large multimodal models to compare

Hanwei Zhu, Haoning Wu, Yixuan Li, Zicheng Zhang, Baoliang Chen, Lingyu Zhu, Yuming Fang, Guangtao Zhai, Weisi Lin, and Shiqi Wang. Adaptive image quality assessment via teaching large multimodal models to compare. InAdvances in Neural Information Processing Systems (NeurIPS), 2024

work page 2024
[13]

Mllms know where to look: Training-free perception of small visual details with multimodal llms

Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara, and Filip Ilievski. Mllms know where to look: Training-free perception of small visual details with multimodal llms. InInternational Conference on Learning Representations (ICLR), 2025

work page 2025
[14]

Vq-insight: Teaching vlms for ai-generated video quality understanding via progressive visual reinforcement learning

Xuanyu Zhang, Weiqi Li, Shijie Zhao, Junlin Li, Li Zhang, and Jian Zhang. Vq-insight: Teaching vlms for ai-generated video quality understanding via progressive visual reinforcement learning. InProceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2026

work page 2026
[15]

Statistical ranking and combina- torial Hodge theory.Mathematical Programming, 127(1):203–244, 2011

Xiaoye Jiang, Lek-Heng Lim, Yuan Yao, and Yinyu Ye. Statistical ranking and combina- torial Hodge theory.Mathematical Programming, 127(1):203–244, 2011. doi: 10.1007/ s10107-010-0419-x

work page 2011
[16]

Image quality assessment: from error visibility to structural similarity.IEEE transactions on image processing, 13(4): 600–612, 2004

Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity.IEEE transactions on image processing, 13(4): 600–612, 2004. 10

work page 2004
[17]

Motion-based perceptual quality assessment of video.IEEE Transactions on Image Processing, 19(2):335–350, 2010

Kalpana Seshadrinathan, Rajiv Soundararajan, Alan C Bovik, and Lawrence K Cormack. Motion-based perceptual quality assessment of video.IEEE Transactions on Image Processing, 19(2):335–350, 2010

work page 2010
[18]

Toward a practical perceptual video quality metric

Zhi Li, Anne Aaron, Ioannis Katsavounidis, Anush Moorthy, and Megha Manohara. Toward a practical perceptual video quality metric. InThe Netflix Tech Blog, 2016

work page 2016
[19]

Video quality assessment by reduced reference spatio-temporal entropic differencing.IEEE Transactions on Circuits and Systems for Video Technology, 23(4):684–694, 2013

Rajiv Soundararajan and Alan C Bovik. Video quality assessment by reduced reference spatio-temporal entropic differencing.IEEE Transactions on Circuits and Systems for Video Technology, 23(4):684–694, 2013

work page 2013
[20]

Convolutional neural networks for no-reference image quality assessment

Le Kang, Peng Ye, Yi Li, and David Doermann. Convolutional neural networks for no-reference image quality assessment. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1733–1740, 2014

work page 2014
[21]

Blind image quality assessment using a deep bilinear convolutional neural network.IEEE Transactions on Circuits and Systems for Video Technology, 30(1):36–47, 2020

Weixia Zhang, Kede Ma, Jia Yan, Dexiang Deng, and Zhou Wang. Blind image quality assessment using a deep bilinear convolutional neural network.IEEE Transactions on Circuits and Systems for Video Technology, 30(1):36–47, 2020

work page 2020
[22]

Quality assessment of in-the-wild videos

Dingquan Li, Tingting Jiang, and Ming Jiang. Quality assessment of in-the-wild videos. In Proceedings of the 27th ACM International Conference on Multimedia (ACM MM), pages 2351–2359, 2019

work page 2019
[23]

Fast-vqa: Efficient end-to-end video quality assessment with fragment sampling

Haoning Wu, Chaofeng Chen, Jingwen Hou, Liang Liao, Annan Wang, Wenxiu Sun, Qiong Yan, and Weisi Lin. Fast-vqa: Efficient end-to-end video quality assessment with fragment sampling. InProceedings of the European Conference on Computer Vision (ECCV), 2022

work page 2022
[24]

Exploring video quality assessment on user generated contents from aesthetic and technical perspectives

Haoning Wu, Erli Zhang, Liang Liao, Chaofeng Chen, Jingwen Hou, Annan Wang, Wenxiu Sun, Qiong Yan, and Weisi Lin. Exploring video quality assessment on user generated contents from aesthetic and technical perspectives. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023

work page 2023
[25]

KVQ: Boosting video quality assessment via saliency-guided local perception

Yunpeng Qu, Kun Yuan, Qizhi Xie, Ming Sun, Chao Zhou, and Jian Wang. KVQ: Boosting video quality assessment via saliency-guided local perception. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

work page 2025
[26]

Visual instruction tuning

Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning. In Advances in Neural Information Processing Systems (NeurIPS), 2023

work page 2023
[27]

Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks

Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, et al. Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

work page 2024
[28]

Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond

Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, and Jingren Zhou. Qwen-vl: A versatile vision-language model for understanding, localization, text reading, and beyond.arXiv preprint arXiv:2308.12966, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[29]

Vqa 2: Visual question answering for video quality assessment

Ziheng Jia, Zicheng Zhang, Jiaying Qian, Haoning Wu, Wei Sun, Chunyi Li, Xiaohong Liu, Weisi Lin, Guangtao Zhai, and Xiongkuo Min. Vqa 2: Visual question answering for video quality assessment. InProceedings of the 33rd ACM International Conference on Multimedia (ACM MM), 2025

work page 2025
[30]

Rea- soning as representation: Rethinking visual reinforcement learning in image quality assessment

Shijie Zhao, Xuanyu Zhang, Weiqi Li, Junlin Li, Li Zhang, Tianfan Xue, and Jian Zhang. Rea- soning as representation: Rethinking visual reinforcement learning in image quality assessment. InProceedings of the International Conference on Learning Representations (ICLR), 2026

work page 2026
[31]

Deep reinforcement learning from human preferences

Paul F Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei. Deep reinforcement learning from human preferences. InAdvances in Neural Information Processing Systems (NeurIPS), volume 30, 2017

work page 2017
[32]

Training language models to follow instructions with human feedback.Advances in Neural Information Processing Systems (NeurIPS), 35:27730–27744, 2022

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback.Advances in Neural Information Processing Systems (NeurIPS), 35:27730–27744, 2022. 11

work page 2022
[34]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek-AI. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning.arXiv preprint arXiv:2501.12948, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[35]

Video-r1: Reinforcing video reasoning in mllms

Kaituo Feng, Kaixiong Gong, Bohao Li, Zonghao Guo, Yibing Wang, Tianshuo Peng, Junfei Wu, Xiaoying Zhang, Benyou Wang, and Xiangyu Yue. Video-r1: Reinforcing video reasoning in mllms. InAdvances in Neural Information Processing Systems (NeurIPS), 2025

work page 2025
[36]

OneThinker: All-in-one Reasoning Model for Image and Video

Kaituo Feng, Manyuan Zhang, Hongyu Li, Kaixuan Fan, Shuang Chen, Yilei Jiang, Dian Zheng, Peiwen Sun, Yiyuan Zhang, Haoze Sun, Yan Feng, Peng Pei, Xunliang Cai, and Xiangyu Yue. Onethinker: All-in-one reasoning model for image and video.arXiv preprint arXiv:2512.03043, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[37]

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Weiyun Wang, Zhangwei Gao, Lixin Gu, Hengjun Pu, Long Cui, Xingguang Wei, Zhaoyang Liu, Linglin Jing, Shenglong Ye, Jie Shao, et al. Internvl3.5: Advancing open-source multimodal models in versatility, reasoning, and efficiency.arXiv preprint arXiv:2508.18265, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[38]

Qwen3-VL Technical Report

Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, et al. Qwen3-vl technical report.arXiv preprint arXiv:2511.21631, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[39]

Visual-rft: Visual reinforcement fine-tuning

Ziyu Liu, Zhiquan Sun, Yifan Li, Zhi Wang, Haotian Guo, Peng Yuan, Nan Shen, Fan Yang, and Zheng Yu. Visual-rft: Visual reinforcement fine-tuning. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025

work page 2025
[40]

R1-onevision: Advanc- ing generalized multimodal reasoning through cross-modal formalization.arXiv preprint arXiv:2503.12478, 2025

Jiahui Yang, Hao Dong, Sen Wang, Yi Wang, and Guansong Xia. R1-onevision: Advanc- ing generalized multimodal reasoning through cross-modal formalization.arXiv preprint arXiv:2503.12478, 2025

work page arXiv 2025
[41]

Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement

Peng Liu, Kai Luo, Jiayuan Zheng, Enze Xie, and Zhenguo Li. Seg-zero: Reasoning-chain guided segmentation via cognitive reinforcement.arXiv preprint arXiv:2503.06520, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[42]

Slowfast networks for video recognition

Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, and Kaiming He. Slowfast networks for video recognition. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 6202–6211, 2019

work page 2019
[43]

Gemini 3 pro model evaluation, 2025

Google DeepMind. Gemini 3 pro model evaluation, 2025. https://storage.googleapis. com/deepmind-media/gemini/gemini_3_pro_model_evaluation.pdf

work page 2025
[44]

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y . K. Li, Y . Wu, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models.arXiv preprint arXiv:2402.03300, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[45]

Patch-vq: ‘patching up’ the video quality problem

Zhenqiang Ying, Maitreya Manivasagam, Hasnain Mannan, Eero P Simoncelli, and Alan C Bovik. Patch-vq: ‘patching up’ the video quality problem. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 14019–14029, 2021

work page 2021
[46]

The konstanz natural video database (konvid-1k)

Vlad Hosu, Franz Hahn, Mohsen Jenadeleh, Hanhe Lin, Hui Men, Tamás Szirányi, Shujun Li, and Dietmar Saupe. The konstanz natural video database (konvid-1k). In2017 Ninth International Conference on Quality of Multimedia Experience (QoMEX), pages 1–6. IEEE, 2017

work page 2017
[47]

Zeina Sinno and Alan C. Bovik. Large-scale study of perceptual video quality.IEEE Transac- tions on Image Processing, 28(2):612–627, 2019. doi: 10.1109/TIP.2018.2869673

work page doi:10.1109/tip.2018.2869673 2019
[48]

Xiangxu Yu, Zhengzhong Tu, Neil Birkbeck, Yilin Wang, Balu Adsumilli, and Alan C. Bovik. Perceptual quality assessment of ugc gaming videos.arXiv preprint arXiv:2204.00128, 2022

work page arXiv 2022
[49]

Avc, hevc, vp9, avs2 or av1? a comparative study of state-of-the-art video encoders on 4k videos

Zhuoran Li, Zhengfang Duanmu, Wentao Liu, and Zhou Wang. Avc, hevc, vp9, avs2 or av1? a comparative study of state-of-the-art video encoders on 4k videos. In16th International Conference on Image Analysis and Recognition (ICIAR), 2019. 12

work page 2019
[50]

Vdpve: Vqa dataset for perceptual video enhancement

Yixuan Gao, Yuqin Cao, Tengchuan Kou, Wei Sun, Yunlong Dong, Xiaohong Liu, Xiongkuo Min, and Guangtao Zhai. Vdpve: Vqa dataset for perceptual video enhancement. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1474–1483, 2023

work page 2023
[51]

completely blind

Anish Mittal, Rajiv Soundararajan, and Alan C Bovik. Making a “completely blind” image quality analyzer.IEEE Signal Processing Letters, 20(3):209–212, 2013

work page 2013
[52]

Saad, and Alan C

Anish Mittal, Michele A. Saad, and Alan C. Bovik. A completely blind video integrity oracle. IEEE Transactions on Image Processing, 25(1):289–300, 2016. doi: 10.1109/TIP.2015.2502725

work page doi:10.1109/tip.2015.2502725 2016
[53]

Channappayya

Parimala Kancharla and Sumohana S. Channappayya. Completely blind quality assessment of user generated video content.IEEE Transactions on Image Processing, 31:263–274, 2022. doi: 10.1109/TIP.2021.3130541

work page doi:10.1109/tip.2021.3130541 2022
[54]

Jianyi Wang, Kelvin C. K. Chan, and Chen Change Loy. Exploring clip for assessing the look and feel of images. InProceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2023

work page 2023
[55]

A deep learning based no-reference quality assessment model for ugc videos

Wei Sun, Xiongkuo Min, Wei Lu, and Guangtao Zhai. A deep learning based no-reference quality assessment model for ugc videos. InProceedings of the 30th ACM International Conference on Multimedia (ACM MM), pages 856–865, 2022

work page 2022
[56]

Analysis of video quality datasets via design of minimalistic video quality models.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

Wei Sun, Wen Wen, Xiongkuo Min, Long Lan, Guangtao Zhai, and Kede Ma. Analysis of video quality datasets via design of minimalistic video quality models.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

work page 2024
[57]

Fu, Stefano Ermon, Atri Rudra, and Christopher Ré

Tri Dao, Daniel Y . Fu, Stefano Ermon, Atri Rudra, and Christopher Ré. Flashattention: Fast and memory-efficient exact attention with io-awareness. InAdvances in Neural Information Processing Systems (NeurIPS), 2022

work page 2022
[58]

Zero: Memory optimiza- tions toward training trillion parameter models

Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, and Yuxiong He. Zero: Memory optimiza- tions toward training trillion parameter models. InSC20: International Conference for High Performance Computing, Networking, Storage and Analysis, 2020

work page 2020
[59]

Video 1 is better

Arpad E. Elo.The Rating of Chessplayers, Past and Present. Arco Publishing, 1978. 13 A Sparse Graph Analysis Sparse graph construction.For a benchmark with N videos, the evaluation graph is an undirected sparse comparison graph G= (V,E) , where each vertex is a test video and each queried unordered pair becomes one edge. The edge orientation is assigned o...

work page 1978

[1] [1]

Perceptual video quality assessment: A survey.Science China Information Sciences, 67(11):211301, 2024

Xiongkuo Min, Huiyu Duan, Wei Sun, Yucheng Zhu, and Guangtao Zhai. Perceptual video quality assessment: A survey.Science China Information Sciences, 67(11):211301, 2024

work page 2024

[2] [2]

Blind image quality assessment: A survey on blind image quality assessment.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(5): 5833–5857, 2023

Guangtao Zhai and Xiongkuo Min. Blind image quality assessment: A survey on blind image quality assessment.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(5): 5833–5857, 2023

work page 2023

[3] [3]

Q-align: Teaching lmms for visual scoring via discrete text-defined levels

Haoning Wu, Zicheng Zhang, Weixia Zhang, Chaofeng Chen, Liang Liao, Chunyi Li, Yixuan Gao, Annan Wang, Erli Zhang, Wenxiu Sun, Qiong Yan, Xiongkuo Min, Guangtao Zhai, and Weisi Lin. Q-align: Teaching lmms for visual scoring via discrete text-defined levels. In Proceedings of the 41st International Conference on Machine Learning (ICML), 2024

work page 2024

[4] [4]

Teaching large language models to regress accurate image quality scores using score distribution

Zhiyuan You, Xin Cai, Jinjin Gu, Tianfan Xue, and Chao Dong. Teaching large language models to regress accurate image quality scores using score distribution. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

work page 2025

[5] [5]

Q-insight: Understanding image quality via visual reinforcement learning

Weiqi Li, Xuanyu Zhang, Shijie Zhao, Yabin Zhang, Junlin Li, Li Zhang, and Jian Zhang. Q-insight: Understanding image quality via visual reinforcement learning. InAdvances in Neural Information Processing Systems (NeurIPS), 2025

work page 2025

[6] [6]

Visualquality-r1: Reasoning-induced image quality assessment via reinforcement learning to rank

Tianhe Wu, Jian Zou, Jie Liang, Lei Zhang, and Kede Ma. Visualquality-r1: Reasoning-induced image quality assessment via reinforcement learning to rank. InAdvances in Neural Information Processing Systems (NeurIPS), 2025

work page 2025

[7] [7]

Vqathinker: Exploring generalizable and explainable video quality assessment via reinforcement learning

Chunyi Cao, Wei Sun, Zicheng Zhang, Yingjie Jia, Xiaohong Liu, Xiongkuo Min, and Guangtao Zhai. Vqathinker: Exploring generalizable and explainable video quality assessment via reinforcement learning. InProceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2026

work page 2026

[8] [8]

Rank analysis of incomplete block designs: I

Ralph Allan Bradley and Milton E Terry. Rank analysis of incomplete block designs: I. the method of paired comparisons.Biometrika, 39(3/4):324–345, 1952

work page 1952

[9] [9]

Learning to rank using gradient descent

Chris Burges, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamilton, and Greg Svore. Learning to rank using gradient descent. InProceedings of the 22nd International Conference on Machine Learning (ICML), pages 89–96, 2005

work page 2005

[10] [10]

Learning to rank for information retrieval.Foundations and Trends in Information Retrieval, 3(3):225–331, 2009

Tie-Yan Liu. Learning to rank for information retrieval.Foundations and Trends in Information Retrieval, 3(3):225–331, 2009

work page 2009

[11] [11]

Rankiqa: Learning from rankings for no-reference image quality assessment

Xialei Liu, Joost van de Weijer, and Andrew D Bagdanov. Rankiqa: Learning from rankings for no-reference image quality assessment. InProceedings of the IEEE International Conference on Computer Vision (ICCV), pages 1040–1049, 2017

work page 2017

[12] [12]

Adaptive image quality assessment via teaching large multimodal models to compare

Hanwei Zhu, Haoning Wu, Yixuan Li, Zicheng Zhang, Baoliang Chen, Lingyu Zhu, Yuming Fang, Guangtao Zhai, Weisi Lin, and Shiqi Wang. Adaptive image quality assessment via teaching large multimodal models to compare. InAdvances in Neural Information Processing Systems (NeurIPS), 2024

work page 2024

[13] [13]

Mllms know where to look: Training-free perception of small visual details with multimodal llms

Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara, and Filip Ilievski. Mllms know where to look: Training-free perception of small visual details with multimodal llms. InInternational Conference on Learning Representations (ICLR), 2025

work page 2025

[14] [14]

Vq-insight: Teaching vlms for ai-generated video quality understanding via progressive visual reinforcement learning

Xuanyu Zhang, Weiqi Li, Shijie Zhao, Junlin Li, Li Zhang, and Jian Zhang. Vq-insight: Teaching vlms for ai-generated video quality understanding via progressive visual reinforcement learning. InProceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2026

work page 2026

[15] [15]

Statistical ranking and combina- torial Hodge theory.Mathematical Programming, 127(1):203–244, 2011

Xiaoye Jiang, Lek-Heng Lim, Yuan Yao, and Yinyu Ye. Statistical ranking and combina- torial Hodge theory.Mathematical Programming, 127(1):203–244, 2011. doi: 10.1007/ s10107-010-0419-x

work page 2011

[16] [16]

Image quality assessment: from error visibility to structural similarity.IEEE transactions on image processing, 13(4): 600–612, 2004

Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity.IEEE transactions on image processing, 13(4): 600–612, 2004. 10

work page 2004

[17] [17]

Motion-based perceptual quality assessment of video.IEEE Transactions on Image Processing, 19(2):335–350, 2010

Kalpana Seshadrinathan, Rajiv Soundararajan, Alan C Bovik, and Lawrence K Cormack. Motion-based perceptual quality assessment of video.IEEE Transactions on Image Processing, 19(2):335–350, 2010

work page 2010

[18] [18]

Toward a practical perceptual video quality metric

Zhi Li, Anne Aaron, Ioannis Katsavounidis, Anush Moorthy, and Megha Manohara. Toward a practical perceptual video quality metric. InThe Netflix Tech Blog, 2016

work page 2016

[19] [19]

Video quality assessment by reduced reference spatio-temporal entropic differencing.IEEE Transactions on Circuits and Systems for Video Technology, 23(4):684–694, 2013

Rajiv Soundararajan and Alan C Bovik. Video quality assessment by reduced reference spatio-temporal entropic differencing.IEEE Transactions on Circuits and Systems for Video Technology, 23(4):684–694, 2013

work page 2013

[20] [20]

Convolutional neural networks for no-reference image quality assessment

Le Kang, Peng Ye, Yi Li, and David Doermann. Convolutional neural networks for no-reference image quality assessment. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1733–1740, 2014

work page 2014

[21] [21]

Blind image quality assessment using a deep bilinear convolutional neural network.IEEE Transactions on Circuits and Systems for Video Technology, 30(1):36–47, 2020

Weixia Zhang, Kede Ma, Jia Yan, Dexiang Deng, and Zhou Wang. Blind image quality assessment using a deep bilinear convolutional neural network.IEEE Transactions on Circuits and Systems for Video Technology, 30(1):36–47, 2020

work page 2020

[22] [22]

Quality assessment of in-the-wild videos

Dingquan Li, Tingting Jiang, and Ming Jiang. Quality assessment of in-the-wild videos. In Proceedings of the 27th ACM International Conference on Multimedia (ACM MM), pages 2351–2359, 2019

work page 2019

[23] [23]

Fast-vqa: Efficient end-to-end video quality assessment with fragment sampling

Haoning Wu, Chaofeng Chen, Jingwen Hou, Liang Liao, Annan Wang, Wenxiu Sun, Qiong Yan, and Weisi Lin. Fast-vqa: Efficient end-to-end video quality assessment with fragment sampling. InProceedings of the European Conference on Computer Vision (ECCV), 2022

work page 2022

[24] [24]

Exploring video quality assessment on user generated contents from aesthetic and technical perspectives

Haoning Wu, Erli Zhang, Liang Liao, Chaofeng Chen, Jingwen Hou, Annan Wang, Wenxiu Sun, Qiong Yan, and Weisi Lin. Exploring video quality assessment on user generated contents from aesthetic and technical perspectives. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023

work page 2023

[25] [25]

KVQ: Boosting video quality assessment via saliency-guided local perception

Yunpeng Qu, Kun Yuan, Qizhi Xie, Ming Sun, Chao Zhou, and Jian Wang. KVQ: Boosting video quality assessment via saliency-guided local perception. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

work page 2025

[26] [26]

Visual instruction tuning

Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning. In Advances in Neural Information Processing Systems (NeurIPS), 2023

work page 2023

[27] [27]

Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks

Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, et al. Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

work page 2024

[28] [28]

Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond

Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, and Jingren Zhou. Qwen-vl: A versatile vision-language model for understanding, localization, text reading, and beyond.arXiv preprint arXiv:2308.12966, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[29] [29]

Vqa 2: Visual question answering for video quality assessment

Ziheng Jia, Zicheng Zhang, Jiaying Qian, Haoning Wu, Wei Sun, Chunyi Li, Xiaohong Liu, Weisi Lin, Guangtao Zhai, and Xiongkuo Min. Vqa 2: Visual question answering for video quality assessment. InProceedings of the 33rd ACM International Conference on Multimedia (ACM MM), 2025

work page 2025

[30] [30]

Rea- soning as representation: Rethinking visual reinforcement learning in image quality assessment

Shijie Zhao, Xuanyu Zhang, Weiqi Li, Junlin Li, Li Zhang, Tianfan Xue, and Jian Zhang. Rea- soning as representation: Rethinking visual reinforcement learning in image quality assessment. InProceedings of the International Conference on Learning Representations (ICLR), 2026

work page 2026

[31] [31]

Deep reinforcement learning from human preferences

Paul F Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei. Deep reinforcement learning from human preferences. InAdvances in Neural Information Processing Systems (NeurIPS), volume 30, 2017

work page 2017

[32] [32]

Training language models to follow instructions with human feedback.Advances in Neural Information Processing Systems (NeurIPS), 35:27730–27744, 2022

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback.Advances in Neural Information Processing Systems (NeurIPS), 35:27730–27744, 2022. 11

work page 2022

[33] [34]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek-AI. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning.arXiv preprint arXiv:2501.12948, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[34] [35]

Video-r1: Reinforcing video reasoning in mllms

Kaituo Feng, Kaixiong Gong, Bohao Li, Zonghao Guo, Yibing Wang, Tianshuo Peng, Junfei Wu, Xiaoying Zhang, Benyou Wang, and Xiangyu Yue. Video-r1: Reinforcing video reasoning in mllms. InAdvances in Neural Information Processing Systems (NeurIPS), 2025

work page 2025

[35] [36]

OneThinker: All-in-one Reasoning Model for Image and Video

Kaituo Feng, Manyuan Zhang, Hongyu Li, Kaixuan Fan, Shuang Chen, Yilei Jiang, Dian Zheng, Peiwen Sun, Yiyuan Zhang, Haoze Sun, Yan Feng, Peng Pei, Xunliang Cai, and Xiangyu Yue. Onethinker: All-in-one reasoning model for image and video.arXiv preprint arXiv:2512.03043, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[36] [37]

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Weiyun Wang, Zhangwei Gao, Lixin Gu, Hengjun Pu, Long Cui, Xingguang Wei, Zhaoyang Liu, Linglin Jing, Shenglong Ye, Jie Shao, et al. Internvl3.5: Advancing open-source multimodal models in versatility, reasoning, and efficiency.arXiv preprint arXiv:2508.18265, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[37] [38]

Qwen3-VL Technical Report

Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, et al. Qwen3-vl technical report.arXiv preprint arXiv:2511.21631, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[38] [39]

Visual-rft: Visual reinforcement fine-tuning

Ziyu Liu, Zhiquan Sun, Yifan Li, Zhi Wang, Haotian Guo, Peng Yuan, Nan Shen, Fan Yang, and Zheng Yu. Visual-rft: Visual reinforcement fine-tuning. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025

work page 2025

[39] [40]

R1-onevision: Advanc- ing generalized multimodal reasoning through cross-modal formalization.arXiv preprint arXiv:2503.12478, 2025

Jiahui Yang, Hao Dong, Sen Wang, Yi Wang, and Guansong Xia. R1-onevision: Advanc- ing generalized multimodal reasoning through cross-modal formalization.arXiv preprint arXiv:2503.12478, 2025

work page arXiv 2025

[40] [41]

Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement

Peng Liu, Kai Luo, Jiayuan Zheng, Enze Xie, and Zhenguo Li. Seg-zero: Reasoning-chain guided segmentation via cognitive reinforcement.arXiv preprint arXiv:2503.06520, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[41] [42]

Slowfast networks for video recognition

Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, and Kaiming He. Slowfast networks for video recognition. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 6202–6211, 2019

work page 2019

[42] [43]

Gemini 3 pro model evaluation, 2025

Google DeepMind. Gemini 3 pro model evaluation, 2025. https://storage.googleapis. com/deepmind-media/gemini/gemini_3_pro_model_evaluation.pdf

work page 2025

[43] [44]

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y . K. Li, Y . Wu, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models.arXiv preprint arXiv:2402.03300, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[44] [45]

Patch-vq: ‘patching up’ the video quality problem

Zhenqiang Ying, Maitreya Manivasagam, Hasnain Mannan, Eero P Simoncelli, and Alan C Bovik. Patch-vq: ‘patching up’ the video quality problem. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 14019–14029, 2021

work page 2021

[45] [46]

The konstanz natural video database (konvid-1k)

Vlad Hosu, Franz Hahn, Mohsen Jenadeleh, Hanhe Lin, Hui Men, Tamás Szirányi, Shujun Li, and Dietmar Saupe. The konstanz natural video database (konvid-1k). In2017 Ninth International Conference on Quality of Multimedia Experience (QoMEX), pages 1–6. IEEE, 2017

work page 2017

[46] [47]

Zeina Sinno and Alan C. Bovik. Large-scale study of perceptual video quality.IEEE Transac- tions on Image Processing, 28(2):612–627, 2019. doi: 10.1109/TIP.2018.2869673

work page doi:10.1109/tip.2018.2869673 2019

[47] [48]

Xiangxu Yu, Zhengzhong Tu, Neil Birkbeck, Yilin Wang, Balu Adsumilli, and Alan C. Bovik. Perceptual quality assessment of ugc gaming videos.arXiv preprint arXiv:2204.00128, 2022

work page arXiv 2022

[48] [49]

Avc, hevc, vp9, avs2 or av1? a comparative study of state-of-the-art video encoders on 4k videos

Zhuoran Li, Zhengfang Duanmu, Wentao Liu, and Zhou Wang. Avc, hevc, vp9, avs2 or av1? a comparative study of state-of-the-art video encoders on 4k videos. In16th International Conference on Image Analysis and Recognition (ICIAR), 2019. 12

work page 2019

[49] [50]

Vdpve: Vqa dataset for perceptual video enhancement

Yixuan Gao, Yuqin Cao, Tengchuan Kou, Wei Sun, Yunlong Dong, Xiaohong Liu, Xiongkuo Min, and Guangtao Zhai. Vdpve: Vqa dataset for perceptual video enhancement. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1474–1483, 2023

work page 2023

[50] [51]

completely blind

Anish Mittal, Rajiv Soundararajan, and Alan C Bovik. Making a “completely blind” image quality analyzer.IEEE Signal Processing Letters, 20(3):209–212, 2013

work page 2013

[51] [52]

Saad, and Alan C

Anish Mittal, Michele A. Saad, and Alan C. Bovik. A completely blind video integrity oracle. IEEE Transactions on Image Processing, 25(1):289–300, 2016. doi: 10.1109/TIP.2015.2502725

work page doi:10.1109/tip.2015.2502725 2016

[52] [53]

Channappayya

Parimala Kancharla and Sumohana S. Channappayya. Completely blind quality assessment of user generated video content.IEEE Transactions on Image Processing, 31:263–274, 2022. doi: 10.1109/TIP.2021.3130541

work page doi:10.1109/tip.2021.3130541 2022

[53] [54]

Jianyi Wang, Kelvin C. K. Chan, and Chen Change Loy. Exploring clip for assessing the look and feel of images. InProceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2023

work page 2023

[54] [55]

A deep learning based no-reference quality assessment model for ugc videos

Wei Sun, Xiongkuo Min, Wei Lu, and Guangtao Zhai. A deep learning based no-reference quality assessment model for ugc videos. InProceedings of the 30th ACM International Conference on Multimedia (ACM MM), pages 856–865, 2022

work page 2022

[55] [56]

Analysis of video quality datasets via design of minimalistic video quality models.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

Wei Sun, Wen Wen, Xiongkuo Min, Long Lan, Guangtao Zhai, and Kede Ma. Analysis of video quality datasets via design of minimalistic video quality models.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

work page 2024

[56] [57]

Fu, Stefano Ermon, Atri Rudra, and Christopher Ré

Tri Dao, Daniel Y . Fu, Stefano Ermon, Atri Rudra, and Christopher Ré. Flashattention: Fast and memory-efficient exact attention with io-awareness. InAdvances in Neural Information Processing Systems (NeurIPS), 2022

work page 2022

[57] [58]

Zero: Memory optimiza- tions toward training trillion parameter models

Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, and Yuxiong He. Zero: Memory optimiza- tions toward training trillion parameter models. InSC20: International Conference for High Performance Computing, Networking, Storage and Analysis, 2020

work page 2020

[58] [59]

Video 1 is better

Arpad E. Elo.The Rating of Chessplayers, Past and Present. Arco Publishing, 1978. 13 A Sparse Graph Analysis Sparse graph construction.For a benchmark with N videos, the evaluation graph is an undirected sparse comparison graph G= (V,E) , where each vertex is a test video and each queried unordered pair becomes one edge. The edge orientation is assigned o...

work page 1978