VersusQ: Pairwise Margin Reasoning for Generalizable Video Quality Assessment
Pith reviewed 2026-05-21 04:52 UTC · model grok-4.3
The pith
Pairwise margin reasoning lets video quality models generalize by comparing videos directly instead of assigning absolute scores.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
VersusQ performs LMM-based comparison between two videos, reasons about their visual and temporal quality differences, and predicts a signed continuous margin that captures both the preferred choice and the degree of difference, with Margin-Coupled GRPO jointly optimizing rollout-based relational reasoning and continuous margin regression.
What carries the argument
Pairwise margin reasoning, where the model directly compares two videos to output a signed continuous margin representing preference and difference magnitude.
Load-bearing premise
That focusing on perceptual differences through direct comparisons avoids the biases from dataset-specific rating habits and score distributions.
What would settle it
A new video quality benchmark with entirely different rating protocols where VersusQ shows no better cross-domain performance than absolute scoring baselines would falsify the central claim.
Figures
read the original abstract
Large Multimodal Models (LMMs) have shown promise for video quality assessment, but most methods still predict an absolute score for each video. Such pointwise supervision often mixes perceptual quality with dataset-specific calibration, including annotation protocols, rating habits, and score distributions. As a result, the learned scoring rule may work well within a benchmark but transfer poorly across unseen domains. We argue that relative comparisons alleviate the absolute-scale calibration bias by focusing purely on perceptual differences rather than dataset-specific rating habits. Consequently, we propose \textbf{VersusQ}, a pairwise margin reasoning framework driven entirely by direct comparisons. Specifically, VersusQ performs LMM-based comparison between two videos, reasons about their visual and temporal quality differences, and predicts a signed continuous margin that captures both the preferred choice and the degree of difference. Furthermore, to align interpretable comparison rationales with fine-grained numerical differences, we introduce Margin-Coupled GRPO, which jointly optimizes rollout-based relational reasoning and continuous margin regression. Extensive experiments on multiple public VQA benchmarks demonstrate that VersusQ achieves state-of-the-art performance, strong cross-domain generalization, and reliable fine-grained ranking under heterogeneous evaluation scenarios.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes VersusQ, a pairwise margin reasoning framework for video quality assessment using large multimodal models. It claims that absolute score prediction mixes perceptual quality with dataset-specific calibration biases from annotation protocols and rating habits, whereas relative comparisons focus purely on perceptual differences. The method performs LMM-based pairwise comparisons to predict signed continuous margins capturing both preference and degree of difference, optimized via Margin-Coupled GRPO that jointly handles rollout-based relational reasoning and continuous margin regression. Extensive experiments on multiple public VQA benchmarks are reported to achieve state-of-the-art performance, strong cross-domain generalization, and reliable fine-grained ranking under heterogeneous scenarios.
Significance. If the central claim holds without hidden dependencies on absolute scores, the approach could meaningfully advance generalizable VQA by reducing domain-specific biases, enabling better transfer across benchmarks with differing rating distributions. This would be particularly valuable for applications involving unseen video domains or heterogeneous evaluation protocols.
major comments (1)
- [Abstract and §3] Abstract and §3 (method description): The claim that the framework is 'driven entirely by direct comparisons' and alleviates absolute-scale calibration bias is load-bearing for the cross-domain generalization results. However, a continuous margin regressor requires explicit target margins for each pair. Standard VQA practice derives these as signed differences of mean opinion scores (MOS) or normalized variants from the training benchmarks; these targets embed annotation protocols, rater biases, and score distributions, reintroducing the calibration information the method claims to sidestep. Please specify exactly how ground-truth margins are constructed and demonstrate that they do not rely on absolute scores from the same datasets used for training.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review of our manuscript. We address the major comment point by point below, with plans to revise the paper accordingly.
read point-by-point responses
-
Referee: [Abstract and §3] Abstract and §3 (method description): The claim that the framework is 'driven entirely by direct comparisons' and alleviates absolute-scale calibration bias is load-bearing for the cross-domain generalization results. However, a continuous margin regressor requires explicit target margins for each pair. Standard VQA practice derives these as signed differences of mean opinion scores (MOS) or normalized variants from the training benchmarks; these targets embed annotation protocols, rater biases, and score distributions, reintroducing the calibration information the method claims to sidestep. Please specify exactly how ground-truth margins are constructed and demonstrate that they do not rely on absolute scores from the same datasets used for training.
Authors: We appreciate the referee's careful reading and agree that explicit clarification is needed. In the revised manuscript, we will add the following details to §3: ground-truth margins for training are constructed as m = (MOS_v1 - MOS_v2) / σ_dataset, where σ_dataset is the standard deviation of MOS values in the training benchmark (or a normalized range variant). This construction does rely on absolute MOS from the training datasets. However, the framework remains driven by direct comparisons because the LMM never receives or predicts an absolute score for any individual video; supervision and inference are strictly pairwise. We argue this still reduces calibration bias relative to pointwise methods, as the model learns perceptual difference magnitudes rather than dataset-specific absolute scales. To demonstrate, we will add cross-domain experiments in the revision training on one benchmark and testing on another with substantially shifted MOS distributions, where VersusQ outperforms pointwise baselines. We will also include the exact margin formula and a short ablation on normalization choices. revision: yes
Circularity Check
No load-bearing circularity; pairwise framework presented as independent proposal without self-referential reduction
full rationale
The provided abstract and context contain no equations, no self-citations invoked as uniqueness theorems, and no fitted parameters renamed as predictions. The central argument—that relative margin reasoning sidesteps absolute-scale bias—is a methodological claim justified by the framework design itself rather than by reducing to dataset MOS values by construction. No step matches the enumerated circularity patterns; the derivation remains self-contained against external benchmarks and experiments.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
m⋆_ij = s_MOS_i − s_MOS_j ... least-squares ... s∗ = arg min ∥As−b∥2_2 s.t. Σ si=0
-
IndisputableMonolith/Foundation/AlphaCoordinateFixation.leanJ_uniquely_calibrated_via_higher_derivative unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Margin-Coupled GRPO ... rollout-based relational reasoning and continuous margin regression
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Xiongkuo Min, Huiyu Duan, Wei Sun, Yucheng Zhu, and Guangtao Zhai. Perceptual video quality assessment: A survey.Science China Information Sciences, 67(11):211301, 2024
work page 2024
-
[2]
Guangtao Zhai and Xiongkuo Min. Blind image quality assessment: A survey on blind image quality assessment.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(5): 5833–5857, 2023
work page 2023
-
[3]
Q-align: Teaching lmms for visual scoring via discrete text-defined levels
Haoning Wu, Zicheng Zhang, Weixia Zhang, Chaofeng Chen, Liang Liao, Chunyi Li, Yixuan Gao, Annan Wang, Erli Zhang, Wenxiu Sun, Qiong Yan, Xiongkuo Min, Guangtao Zhai, and Weisi Lin. Q-align: Teaching lmms for visual scoring via discrete text-defined levels. In Proceedings of the 41st International Conference on Machine Learning (ICML), 2024
work page 2024
-
[4]
Teaching large language models to regress accurate image quality scores using score distribution
Zhiyuan You, Xin Cai, Jinjin Gu, Tianfan Xue, and Chao Dong. Teaching large language models to regress accurate image quality scores using score distribution. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
work page 2025
-
[5]
Q-insight: Understanding image quality via visual reinforcement learning
Weiqi Li, Xuanyu Zhang, Shijie Zhao, Yabin Zhang, Junlin Li, Li Zhang, and Jian Zhang. Q-insight: Understanding image quality via visual reinforcement learning. InAdvances in Neural Information Processing Systems (NeurIPS), 2025
work page 2025
-
[6]
Visualquality-r1: Reasoning-induced image quality assessment via reinforcement learning to rank
Tianhe Wu, Jian Zou, Jie Liang, Lei Zhang, and Kede Ma. Visualquality-r1: Reasoning-induced image quality assessment via reinforcement learning to rank. InAdvances in Neural Information Processing Systems (NeurIPS), 2025
work page 2025
-
[7]
Chunyi Cao, Wei Sun, Zicheng Zhang, Yingjie Jia, Xiaohong Liu, Xiongkuo Min, and Guangtao Zhai. Vqathinker: Exploring generalizable and explainable video quality assessment via reinforcement learning. InProceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2026
work page 2026
-
[8]
Rank analysis of incomplete block designs: I
Ralph Allan Bradley and Milton E Terry. Rank analysis of incomplete block designs: I. the method of paired comparisons.Biometrika, 39(3/4):324–345, 1952
work page 1952
-
[9]
Learning to rank using gradient descent
Chris Burges, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamilton, and Greg Svore. Learning to rank using gradient descent. InProceedings of the 22nd International Conference on Machine Learning (ICML), pages 89–96, 2005
work page 2005
-
[10]
Tie-Yan Liu. Learning to rank for information retrieval.Foundations and Trends in Information Retrieval, 3(3):225–331, 2009
work page 2009
-
[11]
Rankiqa: Learning from rankings for no-reference image quality assessment
Xialei Liu, Joost van de Weijer, and Andrew D Bagdanov. Rankiqa: Learning from rankings for no-reference image quality assessment. InProceedings of the IEEE International Conference on Computer Vision (ICCV), pages 1040–1049, 2017
work page 2017
-
[12]
Adaptive image quality assessment via teaching large multimodal models to compare
Hanwei Zhu, Haoning Wu, Yixuan Li, Zicheng Zhang, Baoliang Chen, Lingyu Zhu, Yuming Fang, Guangtao Zhai, Weisi Lin, and Shiqi Wang. Adaptive image quality assessment via teaching large multimodal models to compare. InAdvances in Neural Information Processing Systems (NeurIPS), 2024
work page 2024
-
[13]
Mllms know where to look: Training-free perception of small visual details with multimodal llms
Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara, and Filip Ilievski. Mllms know where to look: Training-free perception of small visual details with multimodal llms. InInternational Conference on Learning Representations (ICLR), 2025
work page 2025
-
[14]
Xuanyu Zhang, Weiqi Li, Shijie Zhao, Junlin Li, Li Zhang, and Jian Zhang. Vq-insight: Teaching vlms for ai-generated video quality understanding via progressive visual reinforcement learning. InProceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2026
work page 2026
-
[15]
Statistical ranking and combina- torial Hodge theory.Mathematical Programming, 127(1):203–244, 2011
Xiaoye Jiang, Lek-Heng Lim, Yuan Yao, and Yinyu Ye. Statistical ranking and combina- torial Hodge theory.Mathematical Programming, 127(1):203–244, 2011. doi: 10.1007/ s10107-010-0419-x
work page 2011
-
[16]
Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity.IEEE transactions on image processing, 13(4): 600–612, 2004. 10
work page 2004
-
[17]
Kalpana Seshadrinathan, Rajiv Soundararajan, Alan C Bovik, and Lawrence K Cormack. Motion-based perceptual quality assessment of video.IEEE Transactions on Image Processing, 19(2):335–350, 2010
work page 2010
-
[18]
Toward a practical perceptual video quality metric
Zhi Li, Anne Aaron, Ioannis Katsavounidis, Anush Moorthy, and Megha Manohara. Toward a practical perceptual video quality metric. InThe Netflix Tech Blog, 2016
work page 2016
-
[19]
Rajiv Soundararajan and Alan C Bovik. Video quality assessment by reduced reference spatio-temporal entropic differencing.IEEE Transactions on Circuits and Systems for Video Technology, 23(4):684–694, 2013
work page 2013
-
[20]
Convolutional neural networks for no-reference image quality assessment
Le Kang, Peng Ye, Yi Li, and David Doermann. Convolutional neural networks for no-reference image quality assessment. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1733–1740, 2014
work page 2014
-
[21]
Weixia Zhang, Kede Ma, Jia Yan, Dexiang Deng, and Zhou Wang. Blind image quality assessment using a deep bilinear convolutional neural network.IEEE Transactions on Circuits and Systems for Video Technology, 30(1):36–47, 2020
work page 2020
-
[22]
Quality assessment of in-the-wild videos
Dingquan Li, Tingting Jiang, and Ming Jiang. Quality assessment of in-the-wild videos. In Proceedings of the 27th ACM International Conference on Multimedia (ACM MM), pages 2351–2359, 2019
work page 2019
-
[23]
Fast-vqa: Efficient end-to-end video quality assessment with fragment sampling
Haoning Wu, Chaofeng Chen, Jingwen Hou, Liang Liao, Annan Wang, Wenxiu Sun, Qiong Yan, and Weisi Lin. Fast-vqa: Efficient end-to-end video quality assessment with fragment sampling. InProceedings of the European Conference on Computer Vision (ECCV), 2022
work page 2022
-
[24]
Haoning Wu, Erli Zhang, Liang Liao, Chaofeng Chen, Jingwen Hou, Annan Wang, Wenxiu Sun, Qiong Yan, and Weisi Lin. Exploring video quality assessment on user generated contents from aesthetic and technical perspectives. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023
work page 2023
-
[25]
KVQ: Boosting video quality assessment via saliency-guided local perception
Yunpeng Qu, Kun Yuan, Qizhi Xie, Ming Sun, Chao Zhou, and Jian Wang. KVQ: Boosting video quality assessment via saliency-guided local perception. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
work page 2025
-
[26]
Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning. In Advances in Neural Information Processing Systems (NeurIPS), 2023
work page 2023
-
[27]
Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks
Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, et al. Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
work page 2024
-
[28]
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, and Jingren Zhou. Qwen-vl: A versatile vision-language model for understanding, localization, text reading, and beyond.arXiv preprint arXiv:2308.12966, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[29]
Vqa 2: Visual question answering for video quality assessment
Ziheng Jia, Zicheng Zhang, Jiaying Qian, Haoning Wu, Wei Sun, Chunyi Li, Xiaohong Liu, Weisi Lin, Guangtao Zhai, and Xiongkuo Min. Vqa 2: Visual question answering for video quality assessment. InProceedings of the 33rd ACM International Conference on Multimedia (ACM MM), 2025
work page 2025
-
[30]
Rea- soning as representation: Rethinking visual reinforcement learning in image quality assessment
Shijie Zhao, Xuanyu Zhang, Weiqi Li, Junlin Li, Li Zhang, Tianfan Xue, and Jian Zhang. Rea- soning as representation: Rethinking visual reinforcement learning in image quality assessment. InProceedings of the International Conference on Learning Representations (ICLR), 2026
work page 2026
-
[31]
Deep reinforcement learning from human preferences
Paul F Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei. Deep reinforcement learning from human preferences. InAdvances in Neural Information Processing Systems (NeurIPS), volume 30, 2017
work page 2017
-
[32]
Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback.Advances in Neural Information Processing Systems (NeurIPS), 35:27730–27744, 2022. 11
work page 2022
-
[34]
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning.arXiv preprint arXiv:2501.12948, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[35]
Video-r1: Reinforcing video reasoning in mllms
Kaituo Feng, Kaixiong Gong, Bohao Li, Zonghao Guo, Yibing Wang, Tianshuo Peng, Junfei Wu, Xiaoying Zhang, Benyou Wang, and Xiangyu Yue. Video-r1: Reinforcing video reasoning in mllms. InAdvances in Neural Information Processing Systems (NeurIPS), 2025
work page 2025
-
[36]
OneThinker: All-in-one Reasoning Model for Image and Video
Kaituo Feng, Manyuan Zhang, Hongyu Li, Kaixuan Fan, Shuang Chen, Yilei Jiang, Dian Zheng, Peiwen Sun, Yiyuan Zhang, Haoze Sun, Yan Feng, Peng Pei, Xunliang Cai, and Xiangyu Yue. Onethinker: All-in-one reasoning model for image and video.arXiv preprint arXiv:2512.03043, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[37]
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
Weiyun Wang, Zhangwei Gao, Lixin Gu, Hengjun Pu, Long Cui, Xingguang Wei, Zhaoyang Liu, Linglin Jing, Shenglong Ye, Jie Shao, et al. Internvl3.5: Advancing open-source multimodal models in versatility, reasoning, and efficiency.arXiv preprint arXiv:2508.18265, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[38]
Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, et al. Qwen3-vl technical report.arXiv preprint arXiv:2511.21631, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[39]
Visual-rft: Visual reinforcement fine-tuning
Ziyu Liu, Zhiquan Sun, Yifan Li, Zhi Wang, Haotian Guo, Peng Yuan, Nan Shen, Fan Yang, and Zheng Yu. Visual-rft: Visual reinforcement fine-tuning. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025
work page 2025
-
[40]
Jiahui Yang, Hao Dong, Sen Wang, Yi Wang, and Guansong Xia. R1-onevision: Advanc- ing generalized multimodal reasoning through cross-modal formalization.arXiv preprint arXiv:2503.12478, 2025
-
[41]
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement
Peng Liu, Kai Luo, Jiayuan Zheng, Enze Xie, and Zhenguo Li. Seg-zero: Reasoning-chain guided segmentation via cognitive reinforcement.arXiv preprint arXiv:2503.06520, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[42]
Slowfast networks for video recognition
Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, and Kaiming He. Slowfast networks for video recognition. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 6202–6211, 2019
work page 2019
-
[43]
Gemini 3 pro model evaluation, 2025
Google DeepMind. Gemini 3 pro model evaluation, 2025. https://storage.googleapis. com/deepmind-media/gemini/gemini_3_pro_model_evaluation.pdf
work page 2025
-
[44]
Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y . K. Li, Y . Wu, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models.arXiv preprint arXiv:2402.03300, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[45]
Patch-vq: ‘patching up’ the video quality problem
Zhenqiang Ying, Maitreya Manivasagam, Hasnain Mannan, Eero P Simoncelli, and Alan C Bovik. Patch-vq: ‘patching up’ the video quality problem. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 14019–14029, 2021
work page 2021
-
[46]
The konstanz natural video database (konvid-1k)
Vlad Hosu, Franz Hahn, Mohsen Jenadeleh, Hanhe Lin, Hui Men, Tamás Szirányi, Shujun Li, and Dietmar Saupe. The konstanz natural video database (konvid-1k). In2017 Ninth International Conference on Quality of Multimedia Experience (QoMEX), pages 1–6. IEEE, 2017
work page 2017
-
[47]
Zeina Sinno and Alan C. Bovik. Large-scale study of perceptual video quality.IEEE Transac- tions on Image Processing, 28(2):612–627, 2019. doi: 10.1109/TIP.2018.2869673
- [48]
-
[49]
Avc, hevc, vp9, avs2 or av1? a comparative study of state-of-the-art video encoders on 4k videos
Zhuoran Li, Zhengfang Duanmu, Wentao Liu, and Zhou Wang. Avc, hevc, vp9, avs2 or av1? a comparative study of state-of-the-art video encoders on 4k videos. In16th International Conference on Image Analysis and Recognition (ICIAR), 2019. 12
work page 2019
-
[50]
Vdpve: Vqa dataset for perceptual video enhancement
Yixuan Gao, Yuqin Cao, Tengchuan Kou, Wei Sun, Yunlong Dong, Xiaohong Liu, Xiongkuo Min, and Guangtao Zhai. Vdpve: Vqa dataset for perceptual video enhancement. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1474–1483, 2023
work page 2023
-
[51]
Anish Mittal, Rajiv Soundararajan, and Alan C Bovik. Making a “completely blind” image quality analyzer.IEEE Signal Processing Letters, 20(3):209–212, 2013
work page 2013
-
[52]
Anish Mittal, Michele A. Saad, and Alan C. Bovik. A completely blind video integrity oracle. IEEE Transactions on Image Processing, 25(1):289–300, 2016. doi: 10.1109/TIP.2015.2502725
-
[53]
Parimala Kancharla and Sumohana S. Channappayya. Completely blind quality assessment of user generated video content.IEEE Transactions on Image Processing, 31:263–274, 2022. doi: 10.1109/TIP.2021.3130541
-
[54]
Jianyi Wang, Kelvin C. K. Chan, and Chen Change Loy. Exploring clip for assessing the look and feel of images. InProceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2023
work page 2023
-
[55]
A deep learning based no-reference quality assessment model for ugc videos
Wei Sun, Xiongkuo Min, Wei Lu, and Guangtao Zhai. A deep learning based no-reference quality assessment model for ugc videos. InProceedings of the 30th ACM International Conference on Multimedia (ACM MM), pages 856–865, 2022
work page 2022
-
[56]
Wei Sun, Wen Wen, Xiongkuo Min, Long Lan, Guangtao Zhai, and Kede Ma. Analysis of video quality datasets via design of minimalistic video quality models.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024
work page 2024
-
[57]
Fu, Stefano Ermon, Atri Rudra, and Christopher Ré
Tri Dao, Daniel Y . Fu, Stefano Ermon, Atri Rudra, and Christopher Ré. Flashattention: Fast and memory-efficient exact attention with io-awareness. InAdvances in Neural Information Processing Systems (NeurIPS), 2022
work page 2022
-
[58]
Zero: Memory optimiza- tions toward training trillion parameter models
Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, and Yuxiong He. Zero: Memory optimiza- tions toward training trillion parameter models. InSC20: International Conference for High Performance Computing, Networking, Storage and Analysis, 2020
work page 2020
-
[59]
Arpad E. Elo.The Rating of Chessplayers, Past and Present. Arco Publishing, 1978. 13 A Sparse Graph Analysis Sparse graph construction.For a benchmark with N videos, the evaluation graph is an undirected sparse comparison graph G= (V,E) , where each vertex is a test video and each queried unordered pair becomes one edge. The edge orientation is assigned o...
work page 1978
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.