arxiv: 2604.07959 · v2 · submitted 2026-04-09 · 💻 cs.GR

Recognition: 2 theorem links

· Lean Theorem

Seeing enough: non-reference perceptual resolution selection for power-efficient client-side rendering

Alexander Hepburn, Ali Bozorgian, Arnau Raventos, Chengxi Zeng, Dayllon Vin\'icius Xavier Lemos, Yaru Liu

Authors on Pith no claims yet

Pith reviewed 2026-05-12 02:01 UTC · model grok-4.3

classification 💻 cs.GR

keywords perceptual renderingresolution selectionnon-reference predictionpower-efficient renderingclient-side graphicsvideo quality metricneural network

0 comments

The pith

A non-reference neural network predicts the lowest resolution that remains perceptually identical to full resolution in rendered video.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a way to pick the smallest resolution for client-rendered video such that it looks the same as the highest resolution to the viewer, using only the video itself as input. Existing perceptual metrics can flag when lower resolution is sufficient but require a full reference frame and heavy computation that prevents on-device use. The approach trains a network on large sets of rendered scenes labeled by those full-reference metrics so that the network can make the decision in a no-reference setting. This supports power-efficient rendering in games and similar apps by lowering resolution without visible quality loss. Only small changes to existing pipelines are needed and the method stays independent of the video codec in use.

Core claim

The central claim is that a neural network trained on labels from full-reference perceptual video quality metrics can operate without a reference to predict, from the rendered video alone, the lowest resolution that remains perceptually indistinguishable from the best available option, thereby enabling power-efficient client-side rendering with minimal infrastructure changes.

What carries the argument

A neural network that takes rendered video frames as input and outputs the minimal resolution level predicted to be perceptually indistinguishable from higher resolutions.

Load-bearing premise

A network trained on full-reference metric labels will accurately detect perceptual indistinguishability when run without a reference on new rendered content and will not introduce visible quality degradation.

What would settle it

A controlled viewing test on rendered scenes where the method selects a lower resolution yet viewers consistently detect a difference or prefer the full resolution.

Figures

Figures reproduced from arXiv: 2604.07959 by Alexander Hepburn, Ali Bozorgian, Arnau Raventos, Chengxi Zeng, Dayllon Vin\'icius Xavier Lemos, Yaru Liu.

**Figure 1.** Figure 1: Client devices render interactive content at a high frame rate (120 Hz). For each 250 ms clip, we extract motion vectors and a short rendered image sequence; our resolution predictor encodes the relevant spatial and temporal structures that give rise to visible distortions. These motion and appearance features are fused by a non-reference resolution predictor, which estimates the next perceptually suffici… view at source ↗

**Figure 2.** Figure 2: ColorVideoVDP predictions for two camera paths under different scenes, rendered at different resolutions. Each color denotes a rendering configuration that introduces distortions. Resolutions that yield the highest predicted quality are shown as square markers, and among those within 0.1 JOD of the maximum, the resolutions that minimize compute cost (computed from Eq. (1)) are shown as triangle markers. … view at source ↗

**Figure 3.** Figure 3: Overview of our resolution-prediction network. A frozen DINOv2 backbone encodes spatial features of a cropped video clip, fused with encoded motion information via gated addition, and passed through an MLP to predict a discrete resolution. 3.2 Resolution predictor We use a neural network as a non-reference resolution predictor because the perceptually optimal resolution at high frame rate depends on comple… view at source ↗

**Figure 4.** Figure 4: Transition graph showing the weights used by the Viterbi algorithm to control resolution changes. The current state, 1080p, is highlighted in red. bution is the systematic establishment of a perceptually grounded baseline for HFR content. We demonstrate that at 120 Hz, temporal masking effects differ substantially from the well-studied 30/60 Hz regimes, allowing for significant resolution trade-offs that … view at source ↗

**Figure 5.** Figure 5: Average rendering power savings. Mean percentage reduction in rendering cost across all test scenes compared to the 1080p baseline. Each bar represents a distinct rendering configuration (Settings 1–4, Sec. 3.1) with varying spatio-temporal distortion patterns. Savings are calculated for all frames where predicted quality remains within a 0.1 JOD threshold of the 1080p reference. watts, we use pixel throug… view at source ↗

**Figure 6.** Figure 6: The figures present validation results obtained from scenes containing both static content (camera motion only) and dynamic content (with both object and camera motion). The left figure shows results against a 720p (1280 × 720), 120 fps reference, aggregated across observers, scenes, and camera paths. The right figure compares our method against a 1080p (1920 × 1080), 120 fps reference. The y-axis indicate… view at source ↗

**Figure 7.** Figure 7: Examples of ColorVideoVDP predictions across resolutions and distortion levels. The square marks the highest JOD score, and the triangle indicates the lowestresolution setting within 0.1 JOD of the maximum [PITH_FULL_IMAGE:figures/full_fig_p020_7.png] view at source ↗

**Figure 8.** Figure 8: Part1: rendering cost savings for all seven scenes, shown in 2-second segments. Each subplot corresponds to one video from a single scene. Note that our technique does not always reduce rendering costs; for example, in FantasyDungeon2-1, video 2, it results in higher rendering costs across all four distortion settings [PITH_FULL_IMAGE:figures/full_fig_p021_8.png] view at source ↗

**Figure 9.** Figure 9: Part2: same format as Part 1. For OvergrownVillage-3, videos 1 through 5 are not shown because our technique predicts 1080p at 120 fps for all of them, resulting in rendering costs identical to the baseline (1080p, 120 fps) [PITH_FULL_IMAGE:figures/full_fig_p022_9.png] view at source ↗

read the original abstract

Many client-side applications, especially games, render video at high resolution and frame rate on power-constrained devices, even when users perceive little or no benefit from all those extra pixels. Existing perceptual video quality metrics can indicate when a lower resolution is "good enough", but they are full-reference and computationally expensive, making them impractical for real-world applications and deployment on-device. In this work, we leverage the spatio-temporal limits of the human visual system and propose a non-reference method that predicts, from the rendered video alone, the lowest resolution that remains perceptually indistinguishable from the best available option, enabling power-efficient client-side rendering. Our approach is codec-agnostic and requires only minimal modifications to existing infrastructure. The network is trained on a large dataset of rendered content labeled with a full-reference perceptual video quality metric. The prediction significantly enhances perceptual quality while substantially reducing computational costs, suggesting a practical path toward perception-guided, power-efficient client-side rendering.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

A non-reference network for picking minimal perceptual resolution in rendered video, trained on full-reference labels, but with no results shown to confirm it avoids quality loss.

read the letter

The main takeaway is that this paper describes a non-reference neural network to select the lowest resolution for client-side rendered video that should still look indistinguishable from higher options, trained using labels from a full-reference perceptual metric on a dataset of rendered content. The goal is power savings on mobile and embedded devices for games and similar apps, without needing the reference frame at runtime or heavy computation.

Referee Report

2 major / 2 minor

Summary. The paper proposes a non-reference perceptual resolution selection method for power-efficient client-side rendering in applications such as games. A neural network is trained on a large dataset of rendered content labeled by a full-reference perceptual video quality metric; at inference the network predicts, from the rendered video alone, the lowest resolution that remains perceptually indistinguishable from higher-resolution options. The method is described as codec-agnostic and requiring only minimal infrastructure changes, with the goal of reducing computational cost while preserving perceived quality.

Significance. If the central claim holds, the work would offer a practical route to perception-guided resolution adaptation on power-constrained devices, avoiding the runtime cost of full-reference metrics. The codec-agnostic design and minimal infrastructure requirement are genuine strengths that could translate to deployable systems. However, the absence of any quantitative results, ablation studies, correlation metrics, or human-subject validation in the manuscript prevents assessment of whether the non-reference predictor actually preserves the indistinguishability guarantee across diverse rendered content.

major comments (2)

[Abstract] Abstract: the claim that the prediction 'significantly enhances perceptual quality while substantially reducing computational costs' is unsupported; no error rates, PSNR/SSIM equivalents, perceptual metric correlations on held-out data, or power measurements are supplied to substantiate the central performance assertion.
[Method] Training and inference description: the manuscript states that the network is trained on labels from a full-reference perceptual video quality metric yet supplies no definition of the indistinguishability threshold, no specification of the metric itself, no characterization of dataset diversity (motion, lighting, genres), and no transfer-validation experiment showing that non-reference predictions match full-reference decisions without introducing visible artifacts.

minor comments (2)

The abstract and method sections would benefit from an explicit statement of the network architecture, input representation (e.g., spatio-temporal features), and inference latency on target hardware.
Figure captions and table headings should be expanded to include the precise perceptual metric and threshold used for labeling so that the experimental protocol is reproducible from the text alone.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive review. The comments highlight important areas where the manuscript can be strengthened with additional details and evidence. We address each major comment below and indicate the revisions planned for the next version of the manuscript.

read point-by-point responses

Referee: [Abstract] Abstract: the claim that the prediction 'significantly enhances perceptual quality while substantially reducing computational costs' is unsupported; no error rates, PSNR/SSIM equivalents, perceptual metric correlations on held-out data, or power measurements are supplied to substantiate the central performance assertion.

Authors: We agree that the abstract asserts performance benefits without accompanying quantitative support in the current manuscript. The submission emphasizes the methodological contribution of the non-reference predictor. In the revised manuscript we will add a dedicated evaluation section reporting the non-reference network's agreement with full-reference decisions on held-out rendered sequences, including accuracy at the indistinguishability threshold, correlation with the underlying perceptual metric, and estimated power savings derived from the selected resolutions. These additions will directly substantiate the claims. revision: yes
Referee: [Method] Training and inference description: the manuscript states that the network is trained on labels from a full-reference perceptual video quality metric yet supplies no definition of the indistinguishability threshold, no specification of the metric itself, no characterization of dataset diversity (motion, lighting, genres), and no transfer-validation experiment showing that non-reference predictions match full-reference decisions without introducing visible artifacts.

Authors: We acknowledge the need for these missing specifications. The revised manuscript will explicitly name the full-reference perceptual metric, define the indistinguishability threshold used to generate training labels, and provide a characterization of the dataset (including motion statistics, lighting conditions, and genre coverage). We will also include transfer-validation results that quantify how closely the non-reference predictions align with full-reference decisions and report any observed discrepancies or potential artifacts on diverse test content. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation relies on standard supervised training

full rationale

The paper's core proposal is a non-reference network trained on labels generated by a full-reference perceptual metric to predict the lowest perceptually indistinguishable resolution. This setup does not reduce any claimed prediction to its inputs by construction, nor does it involve self-definitional equations, fitted parameters renamed as predictions, or load-bearing self-citations. The provided text contains no equations, uniqueness theorems, or ansatzes that collapse the output to the training labels. The method is self-contained as an empirical ML pipeline whose validity rests on generalization performance rather than tautology.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The claim rests on the domain assumption that spatio-temporal limits of human vision can be captured by existing full-reference metrics and that a learned non-reference predictor will generalize from those labels.

free parameters (1)

Neural network weights
Weights are fitted during supervised training on the labeled dataset of rendered content.

axioms (1)

domain assumption Spatio-temporal limits of the human visual system allow lower resolutions to remain perceptually indistinguishable from higher ones
Invoked to justify predicting a minimal sufficient resolution from rendered video alone.

pith-pipeline@v0.9.0 · 5480 in / 1277 out tokens · 76959 ms · 2026-05-12T02:01:38.448037+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear
The network is trained on a large dataset of rendered content labeled with a full-reference perceptual video quality metric... lowest resolution whose JOD lies within 0.1 of the maximum... 0 JOD difference corresponds to the chance level (50% probability)
IndisputableMonolith/Foundation/ArrowOfTime.lean forward_accumulates unclear
leverage the spatio-temporal limits of the human visual system... temporal masking effects... 120Hz

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages

[1]

AMD GPUOpen: Fidelityfx super resolution 3 (fsr 3).https://gpuopen.com/ fidelityfx-super-resolution-3/(2025), accessed: 2025-11-12

work page 2025
[2]

ACM Transactions on Graphics (TOG)39(6), 1–15 (2020)

Andersson, P.L., Nilsson, J., Akenine-Möller, T.: Flip: A difference evaluator for alternating images. ACM Transactions on Graphics (TOG)39(6), 1–15 (2020). https://doi.org/10.1145/3414685.3417836

work page doi:10.1145/3414685.3417836 2020
[3]

Deep high dynamic range imaging of dynamic scenes

Bako, S., Vogels, T., McWilliams, B., Meyer, M., Novák, J., Harvill, A., Sen, P., DeRose, T., Rousselle, F.: Kernel-predicting convolutional networks for denoising monte carlo renderings. ACM Transactions on Graphics (TOG)36(4), 1–14 (2017). https://doi.org/10.1145/3072959.3073607

work page doi:10.1145/3072959.3073607 2017
[4]

In: IEEE International Conference on Image Process- ing (ICIP)

Barman, N., Martini, M.G., Möller, S.: Vmaf revisited: towards a perceptually op- timized video quality metric. In: IEEE International Conference on Image Process- ing (ICIP). pp. 1914–1918 (2021).https://doi.org/10.1109/ICIP42928.2021. 9506628

work page doi:10.1109/icip42928.2021 1914
[5]

In: ACM Multimedia Systems Confer- ence (MMSys)

Barman, N., Zadtootaghaj, S., Martini, M.G., Möller, S.: Gamingvideoset: A dataset for gaming video quality assessment. In: ACM Multimedia Systems Confer- ence (MMSys). pp. 261–271 (2019).https://doi.org/10.1145/3304109.3306215

work page doi:10.1145/3304109.3306215 2019
[6]

Yoav Freund and Robert E Schapire

Bhat, G., Chen, Z., Bovik, A.C.: Adaptive bitrate streaming with perceptual qual- ity metrics. In: IEEE International Conference on Image Processing (ICIP). pp. 1416–1420 (2020).https://doi.org/10.1109/ICIP40778.2020.9190931

work page doi:10.1109/icip40778.2020.9190931 2020
[7]

Binks, D.: Dynamic resolution rendering. Game Developers Conference (GDC) presentation / Intel whitepaper (2011),https://www.intel.cn/content/dam/ develop / external / us / en / documents / dynamicresolutionrendering - 183334 . pdf, describes dynamic resolution rendering as a technique to vary rendering res- olution to help maintain stable and appropriate...

work page 2011
[8]

In: IEEE International Con- ference on Image Processing (ICIP)

Bosse, S., Maniry, D., Muller, T., Wiegand, T.: Deep neural networks for no- reference and full-reference image quality assessment. In: IEEE International Con- ference on Image Processing (ICIP). pp. 2349–2353 (2018).https://doi.org/10. 1109/ICIP.2018.8451477

work page arXiv 2018
[9]

Deep high dynamic range imaging of dynamic scenes

Chaitanya, C.R.A., Kaplanyan, A.S., Schied, C., Salvi, M., Lefohn, A., Nowrouzezahrai, D., Aila, T.: Interactive reconstruction of monte carlo image se- quences using a recurrent denoising autoencoder. In: ACM SIGGRAPH. vol. 36, pp. 98:1–98:12 (2017).https://doi.org/10.1145/3072959.3073601

work page doi:10.1145/3072959.3073601 2017
[10]

In: IEEE Conference on Virtual Reality and 3D User Interfaces (VR)

Denes, G., Gonzalez-Franco, M., Ofek, E., Steed, A.: Perceptual evaluation for foveated rendering in virtual reality. In: IEEE Conference on Virtual Reality and 3D User Interfaces (VR). pp. 893–902 (2020).https://doi.org/10.1109/ VR46266.2020.00062

work page arXiv 2020
[11]

Epic Games: Unreal engine 5.https://www.unrealengine.com/en-US/unreal- engine-5(2025), accessed: 2025-11-12

work page 2025
[12]

ACM Transactions on Graphics (TOG)31(6), 1–10 (2012).https://doi.org/10

Guenter, B., Finch, M., Drucker, S., Tan, D., Snyder, J.: Foveated 3d graphics. ACM Transactions on Graphics (TOG)31(6), 1–10 (2012).https://doi.org/10. 1145/2366145.2366183

work page arXiv 2012
[13]

In: Proceedings of the 21st Annual International Con- ference

He, S., Liu, Y., Zhou, H.: Optimizing smartphone power consumption through dynamic resolution scaling. In: Proceedings of the 21st Annual International Con- ference. ACM (2015).https://doi.org/10.1145/2789168.2790117, implements per-frame dynamic resolution scaling to trade off rendering load and performance, highlighting DRS usage in interactive real-tim...

work page doi:10.1145/2789168.2790117 2015
[14]

2022 , keywords =

Hladky, J., Stengel, M., Vining, N., Kerbl, B., Seidel, H.P., Steinberger, M.: Quad- Stream: A Quad-Based Scene Streaming Architecture for Novel Viewpoint Re- construction. ACM Transactions on Graphics41(6), 233:1–233:13 (Nov 2022). 16 Authors Suppressed Due to Excessive Length https://doi.org/10.1145/3550454.3555524,https://dl.acm.org/doi/10. 1145/355045...

work page doi:10.1145/3550454.3555524 2022
[15]

In: IEEE Conference on Games (CoG)

Jindal, A., Lin, X., Kaeli, D.: Perceptual-based variable rate shading for foveated rendering in games. In: IEEE Conference on Games (CoG). pp. 52–59 (2021). https://doi.org/10.1109/CoG52621.2021.9619019

work page doi:10.1109/cog52621.2021.9619019 2021
[16]

In: IEEE International Conference on Image Processing (ICIP)

Katsavounidis, I., Zhou, Z.L., Aaron, A., Dube, P.: Towards perceptually opti- mized end-to-end adaptive video streaming. In: IEEE International Conference on Image Processing (ICIP). pp. 1–5 (2018).https://doi.org/10.1109/ICIP.2018. 8451512

work page doi:10.1109/icip.2018 2018
[17]

In: Netflix Tech Blog (2016), online at:https://netflixtechblog.com/toward- a- practical- perceptual- video- quality-metric-653f208b9652

Li, Z.L., Aaron, A., Katsavounidis, I., Moorthy, A., Manohara, J.: Towards a practical perceptual video quality metric. In: Netflix Tech Blog (2016), online at:https://netflixtechblog.com/toward- a- practical- perceptual- video- quality-metric-653f208b9652

work page 2016
[18]

: Instant field-aligned meshes

Liu, H., Yu, L., Zhang, X., Zhou, K.: Adaptive rendering based on spatio-temporal sensitivity of human visual system. ACM Transactions on Graphics (TOG)34(6), 1–13 (2015).https://doi.org/10.1145/2816795.2818077

work page doi:10.1145/2816795.2818077 2015
[19]

IEEE Transactions on Image Processing32, 1626–1640 (2023)

Liu, X., Zhang, L., Ma, K.: Temporal and spatial attention network for blind video quality assessment. IEEE Transactions on Image Processing32, 1626–1640 (2023). https://doi.org/10.1109/TIP.2023.3245600

work page doi:10.1109/tip.2023.3245600 2023
[20]

IEEE Transactions on Multimedia21(6), 1499–1512 (2018)

Mackin, A., Zhang, F., Bull, D.R.: A study of high frame rate video formats. IEEE Transactions on Multimedia21(6), 1499–1512 (2018)

work page 2018
[21]

ACM Transactions on Graphics (TOG)43(4) (2024).https://doi.org/10.1145/3658166

Mantiuk, R., Ashraf, Y., Chapiro, A., Wuerger, S.: Colorvideovdp: A visual dif- ference predictor for image, video and display distortions. ACM Transactions on Graphics (TOG)43(4) (2024).https://doi.org/10.1145/3658166

work page doi:10.1145/3658166 2024
[22]

Proceedings of SPIE8291(2012).https://doi

Mantiuk, R., Daly, S., Kerofsky, L.: Hdr-vdp-3: A model of visibility of differences in high dynamic range images. Proceedings of SPIE8291(2012).https://doi. org/10.1117/12.912416

work page doi:10.1117/12.912416 2012
[23]

ACM Transactions on Graphics (TOG)30(4), 1–14 (2011).https://doi.org/10

Mantiuk, R., Kim, K.J., Rempel, A.G., Heidrich, W.: Hdr-vdp-2: A calibrated visual metric for visibility and quality predictions in all luminance conditions. ACM Transactions on Graphics (TOG)30(4), 1–14 (2011).https://doi.org/10. 1145/2010324.1964935

work page arXiv 2011
[24]

ACM Transactions on Graphics (Proc

Mantiuk, R.K., Hanji, P., Ashraf, G., Wanat, R., Wernikowski, P., Mikhailiuk, A., Pérez, A., Mantiuk, R.: Fovvideovdp: A visible difference predictor for wide field- of-view video. ACM Transactions on Graphics (Proc. SIGGRAPH)40(4), 1–19 (2021)

work page 2021
[25]

In: ACM SIGCOMM

Mao, H., Netravali, R., Alizadeh, M.: Neural adaptive video streaming with pen- sieve. In: ACM SIGCOMM. pp. 197–210 (2017).https://doi.org/10.1145/ 3098822.3098843

work page arXiv 2017
[26]

IEEE Transactions on Image Processing21(12), 4695–4708 (2012).https://doi.org/10.1109/TIP.2012.2214050

Mittal, A., Moorthy, A.K., Bovik, A.C.: No-reference image quality assessment in the spatial domain. IEEE Transactions on Image Processing21(12), 4695–4708 (2012).https://doi.org/10.1109/TIP.2012.2214050

work page doi:10.1109/tip.2012.2214050 2012
[27]

ACM Transactions on Graphics , author =

Mueller, J.H., Voglreiter, P., Dokter, M., Neff, T., Makar, M., Steinberger, M., Schmalstieg, D.: Shading atlas streaming. ACM Transactions on Graphics37(6), 1–16 (Dec 2018).https://doi.org/10.1145/3272127.3275087,https://dl.acm. org/doi/10.1145/3272127.3275087

work page doi:10.1145/3272127.3275087 2018
[28]

In: ACM SIGGRAPH Asia

Patney, A., Salvi, M., Kim, J., Kaplanyan, A.S., Wyman, C., Benty, N., Luebke, D., Lefohn, A.: Towards foveated rendering for gaze-tracked virtual reality. In: ACM SIGGRAPH Asia. vol. 35, pp. 1–12 (2016).https://doi.org/10.1145/2980179. 2980246 Abbreviated paper title 17

work page doi:10.1145/2980179 2016
[29]

ACM Computing Surveys57(12), 1–47 (2025)

Peroni, L., Gorinsky, S.: An end-to-end pipeline perspective on video streaming in best-effort networks: a survey and tutorial. ACM Computing Surveys57(12), 1–47 (2025)

work page 2025
[30]

Sawhney, H., Barman, N., Martini, M.G.: No-reference image and video quality assessment:Aclassificationandreviewofrecentapproaches.In:IEEEInternational Conference on Image Processing (ICIP). pp. 4584–4588 (2019).https://doi.org/ 10.1109/ICIP.2019.8803031

work page doi:10.1109/icip.2019.8803031 2019
[31]

2021 , issue_date =

Sun, Q., Patney, A., Shirley, P., Ramamoorthi, R.: Perceptual temporal rendering: Improving quality and performance via temporal masking. In: ACM SIGGRAPH. vol. 40, pp. 1–15 (2021).https://doi.org/10.1145/3450626.3459786

work page doi:10.1145/3450626.3459786 2021
[32]

In: IEEE International Conference on Communications (ICC)

Thammineni, P., Claypool, M., Kinicki, R.: Temporal adaptation for video stream- ing over wireless networks. In: IEEE International Conference on Communications (ICC). pp. 177–182 (2008).https://doi.org/10.1109/ICC.2008.68

work page doi:10.1109/icc.2008.68 2008
[33]

IEEE Transactions on Circuits and Systems for Video Tech- nology12(6), 438–452 (2001).https://doi.org/10.1109/TCSVT.2002.800560

Vetro, A., Christopoulos, C., Sun, H.: Dynamic frame skipping for joint source and channel video coding. IEEE Transactions on Circuits and Systems for Video Tech- nology12(6), 438–452 (2001).https://doi.org/10.1109/TCSVT.2002.800560

work page doi:10.1109/tcsvt.2002.800560 2001
[34]

In: IEEE/CVF Conference on Computer Vision and Pattern Recog- nition (CVPR)

Wang, H., Li, J., Ma, K.: Streamvqa: Adaptive video quality assessment for stream- ing scenarios. In: IEEE/CVF Conference on Computer Vision and Pattern Recog- nition (CVPR). pp. 10460–10469 (2023).https://doi.org/10.1109/CVPR52729. 2023.01008

work page doi:10.1109/cvpr52729 2023
[35]

Bovik, H.R

Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Process- ing13(4), 600–612 (2004).https://doi.org/10.1109/TIP.2003.819861

work page doi:10.1109/tip.2003.819861 2004
[36]

In: IEEE International Conference on Multimedia and Expo (ICME)

Yan, J., Zhang, W., Ma, K.: Toward intelligent video streaming: A reinforcement learning approach. In: IEEE International Conference on Multimedia and Expo (ICME). pp. 1–6 (2020).https://doi.org/10.1109/ICME46284.2020.9102839

work page doi:10.1109/icme46284.2020.9102839 2020
[37]

In: Proceedings of the 33rd ACM International Conference on Multimedia

Yılmaz, D., Wang, H., Takikawa, T., Ceylan, D., Akşit, K.: Learned single-pass multitasking perceptual graphics for immersive displays. In: Proceedings of the 33rd ACM International Conference on Multimedia. pp. 10719–10727 (2025)

work page 2025
[38]

IEEE Transactions on Image Processing (2020).https: //doi.org/10.1109/TIP.2020.3045875

Ying, Z., Mandal, M., Bovik, A.C.: Patch-vq: Patch-level vision transformer for video quality assessment. IEEE Transactions on Image Processing (2020).https: //doi.org/10.1109/TIP.2020.3045875

work page doi:10.1109/tip.2020.3045875 2020
[39]

IEEE Transactions on Image Processing31, 4797– 4811 (2022).https://doi.org/10.1109/TIP.2022.3185593

Yu, M., Zhang, L., Ma, K.: Gamingvqa: Large-scale gaming video quality assess- ment dataset and benchmark. IEEE Transactions on Image Processing31, 4797– 4811 (2022).https://doi.org/10.1109/TIP.2022.3185593

work page doi:10.1109/tip.2022.3185593 2022
[40]

In: IEEE Conference on Quality of Multi- media Experience (QoMEX)

Zadtootaghaj, S., Barman, N., Möller, S.: Modeling gaming video qoe: Towards a qoe model for gaming video streaming. In: IEEE Conference on Quality of Multi- media Experience (QoMEX). pp. 1–6 (2018).https://doi.org/10.1109/QoMEX. 2018.8463422

work page doi:10.1109/qomex 2018
[41]

lowest resolution within 0.1 JOD of the optimum

Zhang, F., Rangrej, S.B., Aumentado-Armstrong, T., Fazly, A., Levinshtein, A.: Augmenting perceptual super-resolution via image quality predictors. In: Proceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 2311–2322 (2025) 18 Authors Suppressed Due to Excessive Length Supplementary Material Seeing Enough: Non-Refe...

work page 2025