
arxiv: 2606.24916 · v1 · pith:OZEK4WACnew · submitted 2026-06-19 · 💻 cs.AR · cs.MM

SPORT: Spherical-PSNR-Optimized tRuncaTion for Power-Efficient 360-Degree Video Systems

Pith reviewed 2026-06-26 12:23 UTC · model grok-4.3

classification 💻 cs.AR cs.MM
keywords: 360-degree video · VR headsets · memory power reduction · bit truncation · WS-PSNR · gaze prediction · ASIC · power efficiency

The pith

SPORT cuts VR memory power by 51.6% by truncating bits outside the gaze-predicted field of view while meeting spherical PSNR targets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SPORT, a bit-truncation method that stores fewer bits for pixels outside a viewer's predicted field of view in 360-degree video to lower memory bandwidth power in standalone VR headsets. It incorporates weighted-to-spherically-uniform PSNR (WS-PSNR) directly into the truncation decisions so that the optimization constraint and the quality target use the same metric. Gaze prediction adjusts tile classification to offset the 9.33 ms pipeline latency and limit boundary errors. The adaptive SPORT-A version reaches 51.6% power reduction, outperforming a standard-PSNR optimizer by 3.1 percentage points at matched quality, and is confirmed on a fabricated ASIC with byte-exact hardware-software agreement.

Core claim

SPORT is a bit-truncation framework that reduces display-path memory power by storing only the most significant bits of pixels outside the user's field of view. It uses WS-PSNR directly in the optimization constraint to satisfy per-region quality thresholds. Gaze-predictive tile classification compensates for the 9.33 ms end-to-end pipeline latency, reducing boundary misclassifications. The full adaptive variant SPORT-A reaches 51.6% power saving with byte-exact silicon-software agreement on the TrunMEM360 ASIC and WS-PSNR/SSIM matching within 0.1 dB and 0.001.
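
To make the truncation idea concrete, here is a minimal sketch of storing only the most significant bits of a pixel depending on its region. The region names follow the three banks named in the paper's Figure 2 caption (FoV, Border, Background); the number of dropped bits per region, the 8-bit depth, and the function names are assumptions for illustration, not values or interfaces from the paper.

```python
# Hypothetical per-region settings: how many least significant bits are dropped.
# These numbers are illustrative only; the paper selects them via its optimizer.
DROPPED_BITS = {"fov": 0, "border": 2, "background": 4}


def truncate_pixel(value: int, dropped_bits: int, bit_depth: int = 8) -> int:
    """Keep the (bit_depth - dropped_bits) most significant bits and zero the rest."""
    if not 0 <= dropped_bits <= bit_depth:
        raise ValueError("dropped_bits must lie in [0, bit_depth]")
    full_mask = (1 << bit_depth) - 1
    low_mask = (1 << dropped_bits) - 1
    return value & full_mask & ~low_mask


def truncate_tile(pixels, region):
    """Apply the region's truncation level to every pixel of a tile."""
    return [truncate_pixel(p, DROPPED_BITS[region]) for p in pixels]


def stored_bit_fraction(region, bit_depth: int = 8) -> float:
    """Fraction of bits that still has to be written to and read from memory."""
    return (bit_depth - DROPPED_BITS[region]) / bit_depth
```

Under these assumed settings a background tile stores half of its bits, which is the mechanism by which truncation reduces memory traffic; the actual savings reported in the paper depend on its own bit levels and region sizes.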

What carries the argument

WS-PSNR-constrained bit-truncation optimizer paired with gaze-predictive tile classification that offsets 9.33 ms latency.
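
For readers unfamiliar with the metric, the following sketch computes WS-PSNR for a single-channel frame. It assumes an equirectangular projection, for which the standard per-row weight is the cosine of the row's latitude; the review does not state which projection the paper uses, so treat that and the helper names as assumptions.

```python
import math


def erp_row_weight(row: int, height: int) -> float:
    """Cosine-of-latitude weight for a row of an equirectangular frame."""
    return math.cos((row + 0.5 - height / 2) * math.pi / height)


def ws_psnr(reference, distorted, max_value: int = 255) -> float:
    """WS-PSNR in dB for two frames given as lists of rows of integer samples."""
    height = len(reference)
    weighted_error = 0.0
    weight_sum = 0.0
    for i in range(height):
        w = erp_row_weight(i, height)
        for ref, dist in zip(reference[i], distorted[i]):
            weighted_error += w * (ref - dist) ** 2
            weight_sum += w
    if weighted_error == 0:
        return math.inf  # identical frames
    ws_mse = weighted_error / weight_sum
    return 10 * math.log10(max_value ** 2 / ws_mse)
```

Because rows near the poles receive small weights, the same pixel error costs less WS-PSNR there than at the equator. A plain PSNR constraint ignores this, which is the metric inconsistency the paper says it removes.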

If this is right

  • SPORT-B keeps the attended field of view lossless, delivers 47.9% memory power and bandwidth reduction across 4K sequences, and maintains SSIM of 1.000 in the attended region.
  • SPORT-A delivers 3.1 percentage points more power saving than a PSNR-based optimizer at equal measured quality.
  • The 9.33 ms motion-to-photon latency satisfies the 20 ms VR comfort budget with a 53.3% safety margin.
  • CACTI analysis shows 48.72% DRAM leakage reduction and 36.4%/36.7% read/write energy reduction.
  • Validation on the SkyWater 130 nm TrunMEM360 ASIC confirms byte-exact agreement and WS-PSNR/SSIM fidelity within 0.1 dB and 0.001.
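
The latency margin quoted above can be checked with one line of arithmetic. The sketch below uses only the two figures from the paper (9.33 ms latency, 20 ms budget); the function name is a hypothetical helper.

```python
def safety_margin(latency_ms: float, budget_ms: float) -> float:
    """Unused fraction of the latency budget."""
    return (budget_ms - latency_ms) / budget_ms


margin = safety_margin(9.33, 20.0)  # (20 - 9.33) / 20 = 0.5335
print(f"{margin:.4f}")
```

The result, 0.5335, is consistent with the reported 53.3% margin up to rounding.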

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If gaze prediction models improve beyond the current compensation, the same quality targets could be met with even fewer bits allocated to peripheral regions.
  • The truncation approach could be applied to other spherical or panoramic content streams that share similar memory-bandwidth bottlenecks.
  • Lower memory power draw may allow higher-resolution 360 video or longer battery runtime in mobile VR without changing the display hardware.
  • The ASIC implementation could be retargeted to smaller process nodes to compound the reported leakage and dynamic energy reductions.

Load-bearing premise

Gaze prediction can reliably forecast viewer direction far enough ahead to keep boundary misclassifications low enough that all per-region WS-PSNR thresholds stay satisfied.
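
One way to picture what this premise demands is a constant-velocity extrapolation of the gaze over the latency window, followed by tile classification against the predicted direction. This is a sketch under stated assumptions: the review does not describe the paper's predictor, and the FoV half-angle, border width, and head velocity used here are hypothetical. Only the 9.33 ms horizon comes from the paper.

```python
import math

LATENCY_S = 9.33e-3  # end-to-end pipeline latency reported in the paper


def predict_gaze(yaw_deg, pitch_deg, yaw_rate_dps, pitch_rate_dps, horizon_s=LATENCY_S):
    """Constant-velocity extrapolation (an assumed model, not the paper's)."""
    yaw = (yaw_deg + yaw_rate_dps * horizon_s + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    pitch = max(-90.0, min(90.0, pitch_deg + pitch_rate_dps * horizon_s))
    return yaw, pitch


def angular_distance_deg(yaw1, pitch1, yaw2, pitch2):
    """Great-circle angle between two viewing directions, in degrees."""
    y1, p1, y2, p2 = map(math.radians, (yaw1, pitch1, yaw2, pitch2))
    cos_d = math.sin(p1) * math.sin(p2) + math.cos(p1) * math.cos(p2) * math.cos(y1 - y2)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_d))))


def classify_tile(tile_yaw, tile_pitch, gaze_yaw, gaze_pitch,
                  fov_half_angle=45.0, border_width=15.0):
    """Assign a tile to FoV, Border, or Background (hypothetical angular limits)."""
    d = angular_distance_deg(tile_yaw, tile_pitch, gaze_yaw, gaze_pitch)
    if d <= fov_half_angle:
        return "fov"
    if d <= fov_half_angle + border_width:
        return "border"
    return "background"
```

With a head turning at an assumed 300 degrees per second, the gaze moves about 2.8 degrees during the latency window. A tile 47 degrees from the current gaze is classified as Border without prediction but as FoV once the predicted gaze is used, which is the kind of boundary misclassification the predictor is meant to prevent.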

What would settle it

Measure WS-PSNR on a test sequence where actual gaze deviates from the prediction model by the full 9.33 ms compensation window and check whether any region drops below its assigned threshold.

Figures

Figures reproduced from arXiv: 2606.24916 by Hasibur Rahman Hemel, Hritom Das, Jinhui Wang, Kyle Mooney, Mario Renteria-Pinon, Md. Sajjad Hossain, Na Gong, William Oswald, Yiwen Xu, Zhenlin Pei.

Figure 1. Proposed SPORT. The decoded frame is intercepted by TrunMEM360 …
Figure 2. Proposed TrunMEM360 architecture with three heterogeneous banks (FoV, Border, Background).
Figure 3. Quality comparison across all methods, 4K 60 fps …
Figure 4. Cumulative distribution of per-frame WS-PSNR per region, 4K 60 fps, 100 frames. The dashed vertical line marks the per-region quality threshold.
Figure 5. SPORT-B Border and Background WS-PSNR across five 4K 30 fps …
Figure 6. Experimental hardware setup with the fabricated TrunMEM360 chip.
Figure 7. Measured timing waveforms of TrunMEM360 write and read operations. The waveforms verify correct control signal sequencing and truncation …
Original abstract

Memory bandwidth accounts for 30-40% of total power consumption in standalone virtual reality (VR) headsets, yet existing systems typically store the entire 360-degree frame at a uniform resolution regardless of viewer gaze. This paper presents SPORT (Spherical-PSNR Optimized tRuncaTion), a bit-truncation framework that reduces display-path memory power by storing only the most significant bits of pixels outside the user's field of view (FoV). Specifically, a new bit-truncation framework is developed to use weighted-to-spherically-uniform PSNR (WS-PSNR) directly in the optimization constraint, eliminating the metric inconsistency that arises when standard PSNR is used for a WS-PSNR quality target. Also, gaze-predictive tile classification compensates for the 9.33 ms end-to-end pipeline latency, reducing boundary misclassifications by 5.2 percentage points at a cost of only 0.01 ms. In addition, the developed SPORT-B variant, which keeps the FoV lossless, achieves 47.9% memory power saving and 47.9% bandwidth reduction across different 4K video sequences while satisfying all three per-region WS-PSNR thresholds and maintaining SSIM = 1.000 in the attended region. The full adaptive variant SPORT-A reaches 51.6% power saving, 3.1percentage points more than a PSNR-based optimizer at equal measured quality. SPORT is validated on the TrunMEM360 flexible SRAM Application-Specific Integrated Circuit (ASIC) fabricated in SkyWater 130 nm CMOS, confirming byte-exact silicon-software agreement, with WS-PSNR and SSIM matching within 0.1 dB and 0.001. CACTI-based analysis confirms 48.72% DRAM leakage reduction and 36.4%/36.7% read/write energy reduction. The total motion-to-photon latency of 9.33 ms satisfies the 20 ms VR comfort budget with a 53.3% safety margin.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents SPORT, a bit-truncation framework for power-efficient 360° video storage in VR headsets. It optimizes truncation levels outside the FoV using WS-PSNR directly as the constraint (avoiding PSNR/WS-PSNR mismatch), augments this with gaze-predictive tile classification to compensate for 9.33 ms end-to-end latency (reducing boundary misclassifications by 5.2 pp), and evaluates two variants: SPORT-B (FoV kept lossless, SSIM=1.000) at 47.9% memory power/bandwidth reduction and SPORT-A (full adaptive) at 51.6% power saving (3.1 pp above a PSNR-based optimizer at equal measured quality). Claims are supported by measurements on the fabricated TrunMEM360 ASIC (SkyWater 130 nm) showing byte-exact silicon-software agreement, WS-PSNR/SSIM within 0.1 dB and 0.001, plus CACTI analysis reporting 48.72% DRAM leakage and 36.4/36.7% read/write energy reductions. Total latency satisfies the 20 ms VR budget with margin.

Significance. If the quality-preservation claims hold under realistic conditions, the work directly targets the 30-40% memory-bandwidth power share in standalone VR headsets and supplies concrete hardware evidence via a fabricated ASIC plus CACTI modeling. The methodological choice to embed WS-PSNR in the optimizer is a clear improvement over prior metric-inconsistent approaches. Reproducible silicon validation and numeric outcomes (51.6% saving, exact byte agreement) strengthen the result; the approach could inform future gaze-aware memory architectures if the latency-compensation assumption is further substantiated.

major comments (2)
  1. [Gaze-predictive tile classification and latency compensation] Gaze-predictive tile classification: The central 51.6% saving claim for SPORT-A at equal measured quality rests on the assertion that the predictor (reducing misclassifications by 5.2 pp at 0.01 ms cost) keeps all three per-region WS-PSNR thresholds satisfied despite 9.33 ms pipeline latency. The manuscript reports aggregate misclassification reduction and SSIM=1.000 for SPORT-B but does not supply worst-case or per-sequence statistics on residual boundary-tile errors under realistic head-motion velocity distributions, nor the resulting WS-PSNR deviation when a near-FoV tile is erroneously assigned a lower truncation level. This leaves the 'equal measured quality' comparison to the PSNR baseline unverified for the load-bearing case.
  2. [ASIC validation and CACTI analysis] Experimental validation and reproducibility: The ASIC results (byte-exact agreement, WS-PSNR/SSIM within 0.1 dB/0.001, 51.6% saving) are presented as confirmation, yet the evaluation section provides neither the number/diversity of 4K sequences and frames used for the power/quality measurements nor the exact test-vector coverage that produced the reported CACTI DRAM reductions. Without these details the numeric outcomes cannot be independently reproduced or stress-tested against the gaze-latency assumption.
minor comments (2)
  1. [Abstract] Abstract contains the typo '3.1percentage points' (missing space).
  2. [Optimization framework] Notation for the three per-region WS-PSNR thresholds and the exact truncation bit levels outside FoV should be defined once with symbols before being used in the optimization description.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of the work's significance. We address each major comment below. Where the manuscript lacks requested details, we will revise to incorporate them for improved reproducibility and verification of the quality claims.

  1. Referee: [Gaze-predictive tile classification and latency compensation] The manuscript reports aggregate misclassification reduction and SSIM=1.000 for SPORT-B but does not supply worst-case or per-sequence statistics on residual boundary-tile errors under realistic head-motion velocity distributions, nor the resulting WS-PSNR deviation when a near-FoV tile is erroneously assigned a lower truncation level. This leaves the 'equal measured quality' comparison to the PSNR baseline unverified for the load-bearing case.

    Authors: We agree that aggregate statistics alone leave the worst-case behavior unverified. The revised manuscript will include per-sequence misclassification rates, analysis under high-velocity head-motion distributions, and the maximum observed WS-PSNR deviation for any residual boundary-tile misclassifications. This will directly substantiate the equal-quality comparison to the PSNR baseline. revision: yes

  2. Referee: [ASIC validation and CACTI analysis] The ASIC results (byte-exact agreement, WS-PSNR/SSIM within 0.1 dB/0.001, 51.6% saving) are presented as confirmation, yet the evaluation section provides neither the number/diversity of 4K sequences and frames used for the power/quality measurements nor the exact test-vector coverage that produced the reported CACTI DRAM reductions. Without these details the numeric outcomes cannot be independently reproduced or stress-tested against the gaze-latency assumption.

    Authors: We concur that explicit dataset and coverage details are required for reproducibility. The revised manuscript will specify the number and diversity of 4K sequences/frames used for the power and quality measurements, along with the test-vector coverage underlying the CACTI DRAM reductions. This will allow independent verification of the reported savings and quality metrics. revision: yes

Circularity Check

0 steps flagged

No load-bearing circularity; savings and quality claims rest on ASIC measurements and explicit design choices


The paper presents SPORT as an optimization framework that directly incorporates WS-PSNR into the truncation constraint and uses gaze prediction to handle latency. These are methodological decisions, not self-referential definitions. Reported power savings (47.9% and 51.6%) and quality metrics (WS-PSNR thresholds, SSIM=1.000) are obtained from fabricated ASIC validation with byte-exact silicon-software agreement and CACTI modeling. No equations, fitted parameters, or self-citations reduce the central claims back to the inputs by construction. The derivation chain is self-contained against external hardware benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

Abstract-only review supplies insufficient detail to enumerate concrete free parameters or axioms; the optimization thresholds and truncation bit-widths are implicitly chosen to meet quality targets, and WS-PSNR is treated as the appropriate spherical metric without further justification.

free parameters (2)
  • per-region WS-PSNR thresholds
    Quality targets used as optimization constraints; values chosen to satisfy the three per-region requirements while enabling truncation.
  • truncation bit levels outside FoV
    Number of bits dropped for non-attended pixels; selected during the WS-PSNR optimization.
axioms (1)
  • domain assumption: WS-PSNR is the correct quality metric for spherical 360 video truncation decisions
    The framework is built around using WS-PSNR directly in the constraint; this choice is not derived in the abstract.
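
The interplay between the two free parameters can be sketched as a search: for each region, drop as many bits as possible while a weighted PSNR stays at or above that region's threshold. The exhaustive search, the threshold values, and the helper names below are assumptions for illustration; the review does not describe how the paper's optimizer actually searches.

```python
import math


def truncate(value: int, dropped_bits: int, bit_depth: int = 8) -> int:
    """Zero the dropped_bits least significant bits of a sample."""
    return value & ((1 << bit_depth) - 1) & ~((1 << dropped_bits) - 1)


def weighted_psnr(pixels, weights, dropped_bits, max_value: int = 255) -> float:
    """Weighted PSNR (dB) of a region after truncation; weights play the WS-PSNR role."""
    num = sum(w * (p - truncate(p, dropped_bits)) ** 2 for p, w in zip(pixels, weights))
    if num == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / (num / sum(weights)))


def max_dropped_bits(pixels, weights, threshold_db: float, bit_depth: int = 8) -> int:
    """Largest truncation level whose weighted PSNR still meets the threshold."""
    best = 0
    for k in range(bit_depth + 1):
        if weighted_psnr(pixels, weights, k) >= threshold_db:
            best = k
        else:
            break  # per-pixel error never shrinks as k grows, so stop at the first failure
    return best
```

For a region whose samples cover all 256 values with equal weights, a hypothetical 40 dB threshold allows two dropped bits and a 30 dB threshold allows three, so looser peripheral thresholds translate directly into fewer stored bits.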

pith-pipeline@v0.9.1-grok · 5948 in / 1572 out tokens · 49564 ms · 2026-06-26T12:23:47.017666+00:00 · methodology

