pith. machine review for the scientific record.

arxiv: 2604.14816 · v1 · submitted 2026-04-16 · 💻 cs.CV · cs.HC · cs.MM

Recognition: unknown

NTIRE 2026 Challenge on Video Saliency Prediction: Methods and Results

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 12:19 UTC · model grok-4.3

classification 💻 cs.CV · cs.HC · cs.MM
keywords video saliency · saliency prediction · crowdsourced data · video dataset · saliency maps · benchmark · prediction methods · quality metrics

The pith

A dataset of 2000 videos with crowdsourced saliency maps from mouse tracking supports evaluation of automatic prediction methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper describes the setup of a challenge in which teams create methods to generate saliency maps for given video sequences. A collection of 2000 diverse videos was assembled and annotated with fixations and saliency maps obtained through crowdsourced mouse tracking involving more than 5000 assessors. A test subset of 800 videos serves as the basis for scoring submissions according to standard quality metrics. More than 20 teams submitted entries, and 7 advanced through the final code-review stage. The complete set of videos and annotations has been released for public use.

Core claim

The authors assembled a benchmark dataset consisting of 2000 videos paired with saliency maps derived from crowdsourced mouse-tracking fixations, and they evaluated multiple submitted prediction algorithms on a held-out test portion using accepted quality measures.

What carries the argument

Crowdsourced mouse tracking to collect viewing fixations and generate corresponding saliency maps for video content.
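The paper does not spell out the post-processing pipeline here, but the standard construction in the saliency literature is to rasterize fixation points into a fixation map and blur it with a Gaussian whose width approximates foveal spread. A minimal sketch under that assumption (the function name and the `sigma` value are illustrative, not taken from the paper):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixations_to_saliency(fixations, height, width, sigma=25.0):
    """Turn a list of (x, y) fixation points into a fixation map and a
    blurred, normalized saliency map. `sigma` is in pixels and stands in
    for the spread of foveal vision; the paper does not state its value."""
    fixation_map = np.zeros((height, width), dtype=np.float64)
    for x, y in fixations:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < height and 0 <= xi < width:
            fixation_map[yi, xi] += 1.0  # accumulate fixation counts
    saliency = gaussian_filter(fixation_map, sigma=sigma)
    if saliency.max() > 0:
        saliency /= saliency.max()  # scale to [0, 1] for comparison
    return fixation_map, saliency
```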

If this is right

  • Participants' methods can be directly compared using the shared test videos and metrics (a toy scoring loop follows this list).
  • The public release allows any researcher to train models on the training videos and evaluate on the test set.
  • Results from the seven final teams indicate achievable performance levels for current saliency prediction approaches.
  • Future improvements in prediction accuracy can be measured against this fixed benchmark.
  • Data availability removes barriers for developing and validating new techniques.
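As a concrete picture of the first bullet, a toy scoring loop over a held-out split. The file layout, `.npy` storage, and simple averaging are assumptions for illustration; the challenge's actual evaluation code lives in the linked repository.

```python
from pathlib import Path
import numpy as np

def score_submission(pred_dir, gt_dir, metric):
    """Average a per-map metric over all ground-truth saliency maps in
    `gt_dir`, pairing each with the identically named prediction file."""
    scores = []
    for gt_path in sorted(Path(gt_dir).glob("*.npy")):
        gt = np.load(gt_path)
        pred = np.load(Path(pred_dir) / gt_path.name)
        scores.append(metric(pred, gt))
    return float(np.mean(scores))
```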

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Such a large annotated video collection could support training of models that generalize better to unseen video types.
  • Integrating the saliency predictions into video processing pipelines might reduce computational load by focusing on important areas.
  • Similar crowdsourcing approaches could be applied to other visual attention tasks like object detection in videos.
  • Discrepancies between mouse-based and eye-tracker data might reveal biases in the collected maps.

Load-bearing premise

The assumption that mouse tracking by crowdsourced assessors produces saliency maps that match actual human gaze patterns across varied video content.

What would settle it

An experiment that records both mouse movements and eye positions from the same viewers on the same set of videos and compares the resulting saliency maps: close agreement would support the premise; substantial divergence would undermine it.
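A sketch of that comparison, assuming per-video saliency maps from both modalities are available as equally shaped arrays. The agreement threshold `cc_floor` is illustrative; nothing in the paper fixes it.

```python
import numpy as np
from scipy.stats import pearsonr, wilcoxon

def mouse_vs_eye(mouse_maps, eye_maps, cc_floor=0.8):
    """Per-video linear correlation (CC) between mouse-tracking and
    eye-tracking saliency maps, plus a one-sided Wilcoxon test of
    whether agreement falls below the chosen floor."""
    ccs = np.array([
        pearsonr(m.ravel(), e.ravel())[0]
        for m, e in zip(mouse_maps, eye_maps)
    ])
    stat, p = wilcoxon(ccs - cc_floor, alternative="less")
    return ccs, p
```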

Figures

Figures reproduced from arXiv: 2604.14816 by Alexey Bryncev, Andrey Moskalenko, Athanasia Zlatintsi, Cong Wu, Dmitry Vatolin, Gen Zhan, Gongyang Li, Guoyi Xu, Haibin Ling, Hao Liu, Ivan Kosmynin, Jiachen Tu, Jiajia Liu, Jianlin Chen, Katerina Pastra, Keren Fu, Kira Shilovskaya, Konstantinos Chaldaiopoulos, Kun He, Kun Wang, Linze Li, Liqiang Nie, Li Yang, Mikhail Erofeev, Niki Efthymiou, Panagiotis Filntisis, Petros Maragos, Qianlong Xiang, Radu Timofte, Shixiang Shi, Tianyang Xu, Wenzhuo Zhao, Xiaojun Wu, Xuefeng Zhu, Xu Wu, Yabin Zhang, Yaokun Shi, Yaoxin Jiang, Yiting Liao, Yunheng Zheng, Yupeng Hu, Yuxin Liu, Zhiran Li.

Figure 1. iLearn Video Saliency Prediction Pipeline.
Figure 2. An overview of PredJSal.
Figure 3. ARK MMLAB video saliency prediction framework.
Original abstract

This paper presents an overview of the NTIRE 2026 Challenge on Video Saliency Prediction. The goal of the challenge participants was to develop automatic saliency map prediction methods for the provided video sequences. The novel dataset of 2,000 diverse videos with an open license was prepared for this challenge. The fixations and corresponding saliency maps were collected using crowdsourced mouse tracking and contain viewing data from over 5,000 assessors. Evaluation was performed on a subset of 800 test videos using generally accepted quality metrics. The challenge attracted over 20 teams making submissions, and 7 teams passed the final phase with code review. All data used in this challenge is made publicly available - https://github.com/msu-video-group/NTIRE26_Saliency_Prediction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper provides an overview of the NTIRE 2026 Challenge on Video Saliency Prediction. It describes the creation of a novel dataset of 2,000 diverse videos with saliency maps collected via crowdsourced mouse tracking from over 5,000 assessors. The challenge received submissions from over 20 teams, with 7 teams advancing to the final phase after code review. Evaluation was carried out on 800 test videos using generally accepted quality metrics, and all data is made publicly available on GitHub.

Significance. The public release of the full dataset and associated viewing data constitutes a clear strength, as it enables direct reproducibility and supports ongoing benchmark development in video saliency prediction. The reported participation numbers (>20 submissions, 7 code-reviewed finalists) document community engagement with the task.

minor comments (2)
  1. [Abstract] The abstract states that evaluation used 'generally accepted quality metrics' but does not name them (e.g., AUC, NSS, CC, or SIM). Explicitly listing the metrics and any implementation details would improve clarity without altering the factual claims (minimal reference implementations follow this list).
  2. The description of crowdsourced mouse tracking for saliency map generation is brief. Adding one or two sentences on the collection protocol (number of viewers per video, interface used, post-processing steps) would address potential reader questions about data provenance while remaining proportionate to the paper's scope as a challenge report.
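For concreteness, minimal reference implementations of three of the metrics named in comment 1, following standard definitions from the saliency-metrics literature; the challenge's exact implementations may differ.

```python
import numpy as np

def cc(pred, gt):
    """Linear correlation coefficient between two saliency maps."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    g = (gt - gt.mean()) / (gt.std() + 1e-8)
    return float((p * g).mean())

def nss(pred, fixation_map):
    """Normalized scanpath saliency: mean of the standardized
    prediction at human fixation locations (binary fixation map)."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    return float(p[fixation_map > 0].mean())

def sim(pred, gt):
    """Histogram intersection between maps normalized to sum to 1."""
    p = pred / (pred.sum() + 1e-8)
    g = gt / (gt.sum() + 1e-8)
    return float(np.minimum(p, g).sum())
```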

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive review of our manuscript on the NTIRE 2026 Challenge on Video Saliency Prediction. The summary accurately reflects the content of the paper, including the dataset creation, crowdsourced saliency maps, participation from over 20 teams, and public availability of the data. We appreciate the recommendation for minor revision. As there were no major comments, we omit point-by-point responses here; the two minor comments (naming the evaluation metrics and expanding the description of the mouse-tracking collection protocol) will be addressed in the revised manuscript.

Circularity Check

0 steps flagged

No significant circularity in factual challenge report

full rationale

The paper is a factual overview of the NTIRE 2026 Challenge on Video Saliency Prediction. It describes dataset creation (2,000 videos), crowdsourced data collection, evaluation on 800 test videos using standard metrics, participation statistics (>20 teams submitting, 7 passing code review), and public data release via GitHub. No equations, derivations, predictions, fitted parameters, or load-bearing self-citations appear in the provided text. All central claims are organizational facts that can be verified directly from submission records and the linked repository, with no reduction to self-defined inputs or ansatzes.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper contains no mathematical derivations, free parameters, or invented entities; it is an organizational report on a dataset and competition.

pith-pipeline@v0.9.0 · 5618 in / 1129 out tokens · 43513 ms · 2026-05-10T12:19:44.098707+00:00 · methodology

