pith. machine review for the scientific record.

arxiv: 2604.09220 · v1 · submitted 2026-04-10 · 💻 cs.CV

Recognition: 2 theorem links

TinyNeRV: Compact Neural Video Representations via Capacity Scaling, Distillation, and Low-Precision Inference

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:23 UTC · model grok-4.3

classification 💻 cs.CV
keywords: neural video representations · NeRV · model compression · knowledge distillation · quantization · compact models · video reconstruction · efficient inference

The pith

Tiny NeRV models achieve favorable quality-efficiency trade-offs by scaling capacity, adding distillation, and applying low-precision inference.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how neural networks that encode entire videos can be made small enough for devices with tight memory and power limits. It introduces two reduced-capacity architectures, NeRV-T and NeRV-T+, and tests them on multiple video datasets to track changes in reconstruction quality, speed, and resource use. To offset quality loss from the size reduction, the work applies knowledge distillation guided by frequency-aware focal supervision and evaluates both post-training quantization and quantization-aware training. These steps demonstrate that compact neural video representations can retain usable fidelity while cutting parameter counts, computation, and memory needs.

Core claim

Tiny NeRV variants, when trained with frequency-aware focal knowledge distillation and run under reduced numerical precision, achieve favorable reconstruction-quality-versus-efficiency trade-offs, substantially lowering parameter count, computational cost, and memory requirements compared with larger NeRV models.

What carries the argument

The NeRV-T and NeRV-T+ architectures, combined with frequency-aware focal knowledge distillation and either post-training quantization or quantization-aware training for low-precision inference.

Load-bearing premise

The quality gains from distillation and quantization seen on the tested videos and model sizes will hold for new videos and real deployments without major degradation or unexpected artifacts.

What would settle it

Running the distilled and quantized tiny models on a fresh set of videos from a different source and measuring whether reconstruction metrics drop markedly below those of the larger full-precision baseline.

Figures

Figures reproduced from arXiv: 2604.09220 by Ihab Amer, Muhammad Hannan Akhtar, Tamer Shanableh.

Figure 1. Width-based scaling of NeRV variants. All models share identical depth and decoding structure; capacity differences arise solely from channel dimensionality.
Figure 2. Proposed frequency–focal knowledge distillation. Teacher and student predictions are decomposed into low-frequency (smooth) and high-frequency (edge-dominant residual) components. A focal weighting term w(k) emphasizes spatial locations with large teacher–student discrepancies in the high-frequency domain, guiding the tiny student toward improved detail reconstruction while preserving global structure.
Figure 3. Representative frames from the four evaluated sequences: Big Buck Bunny, honeybee, readysetgo, and yachtride. The sequences exhibit distinct structural patterns, motion characteristics, and texture density, providing a diverse evaluation setting for tiny neural video representations. For every dataset, models are trained for 300 epochs using identical optimization settings, including learning rate schedule…
Figure 4. Visualizes the same scaling behavior as a quality–compute trade-off curve, highlighting the separation between the tiny operating points (NeRV-T/T+) and the high-compute NeRV-S/M/L regime.
Figure 5. Qualitative comparison across the honeybee, readysetgo, and yachtride sequences. Each row corresponds to one sequence, and each column shows reconstructions from different model configurations. NeRV-L serves as a high-capacity reference, while NeRV-S provides a baseline. NeRV-T+ and NeRV-T illustrate the effect of aggressive capacity reduction in the tiny regime, and the distilled NeRV-T model shows the im…
read the original abstract

Implicit neural video representations encode entire video sequences within the parameters of a neural network and enable constant-time frame reconstruction. Recent work on Neural Representations for Videos (NeRV) has demonstrated competitive reconstruction performance while avoiding the sequential decoding process of conventional video codecs. However, most existing studies focus on moderate or high capacity models, leaving the behavior of extremely compact configurations required for constrained environments insufficiently explored. This paper presents a systematic study of tiny NeRV architectures designed for efficient deployment. Two lightweight configurations, NeRV-T and NeRV-T+, are introduced and evaluated across multiple video datasets in order to analyze how aggressive capacity reduction affects reconstruction quality, computational complexity, and decoding throughput. Beyond architectural scaling, the work investigates strategies for improving the performance of compact models without increasing inference cost. Knowledge distillation with frequency-aware focal supervision is explored to enhance reconstruction fidelity in low-capacity networks. In addition, the impact of low-precision inference is examined through both post-training quantization and quantization-aware training to study the robustness of tiny models under reduced numerical precision. Experimental results demonstrate that carefully designed tiny NeRV variants can achieve favorable quality-efficiency trade-offs while substantially reducing parameter count, computational cost, and memory requirements. These findings provide insight into the practical limits of compact neural video representations and offer guidance for deploying NeRV-style models in resource-constrained and real-time environments. The official implementation is available at https://github.com/HannanAkhtar/TinyNeRV-Implementation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper introduces two compact NeRV variants, NeRV-T and NeRV-T+, obtained via aggressive capacity scaling. It augments these with frequency-aware knowledge distillation and studies low-precision inference through both post-training quantization (PTQ) and quantization-aware training (QAT). Experiments across standard video datasets report that the resulting models maintain competitive reconstruction quality while delivering large reductions in parameter count, FLOPs, and memory footprint, together with improved decoding throughput. The work concludes with practical guidance for deploying implicit neural video representations under tight resource constraints. The official implementation is released.

Significance. If the reported trade-offs hold under broader testing, the paper meaningfully extends the NeRV line of work into the tiny-model regime that is relevant for edge and real-time applications. The systematic comparison of capacity scaling, distillation, and quantization provides concrete empirical guidance rather than isolated point results. The public release of the implementation is a clear strength that supports reproducibility and follow-on research.

major comments (2)
  1. [§4.2 and Table 4] The frequency-aware distillation objective is shown to improve PSNR by 0.8–1.2 dB over vanilla distillation, yet the focal weighting hyper-parameter is selected via grid search on the validation split of each dataset; this makes the reported gains partly dependent on per-dataset tuning and weakens the claim that the method is generally applicable to unseen videos.
  2. [§5.1, Table 6] The 4-bit QAT results are presented without error bars or multiple random seeds; given that the PSNR gap versus 8-bit is only 0.3 dB on average, the conclusion that tiny NeRV models are “robust” to low-precision inference rests on a single-run comparison whose statistical reliability is unclear.
minor comments (3)
  1. [Figure 2] The capacity-scaling curves would be easier to interpret if the x-axis were labeled in absolute parameter counts rather than only relative reduction percentages.
  2. [§3.1] The architectural differences between NeRV-T and NeRV-T+ are described only in prose; a compact table listing layer widths, MLP depths, and positional-encoding frequencies would improve clarity.
  3. [References] Several citations to the original NeRV paper and follow-up works lack arXiv identifiers or page numbers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment of our work and the constructive feedback. We address each major comment point by point below, indicating the revisions we will incorporate into the manuscript.

read point-by-point responses
  1. Referee: [§4.2 and Table 4] The frequency-aware distillation objective is shown to improve PSNR by 0.8–1.2 dB over vanilla distillation, yet the focal weighting hyper-parameter is selected via grid search on the validation split of each dataset; this makes the reported gains partly dependent on per-dataset tuning and weakens the claim that the method is generally applicable to unseen videos.

    Authors: We agree that selecting the focal weighting hyper-parameter via per-dataset grid search on the validation split limits the strength of the general-applicability claim. In the revised manuscript we will add a new set of results in which a single fixed value (the median of the per-dataset optima) is used across all datasets. The corresponding PSNR gains will be reported in an updated Table 4, and a short sensitivity analysis will be added to §4.2. These changes will directly address the concern while preserving the original per-dataset results for reference. revision: yes

  2. Referee: [§5.1, Table 6] The 4-bit QAT results are presented without error bars or multiple random seeds; given that the PSNR gap versus 8-bit is only 0.3 dB on average, the conclusion that tiny NeRV models are “robust” to low-precision inference rests on a single-run comparison whose statistical reliability is unclear.

    Authors: We concur that the absence of error bars and multiple seeds reduces the statistical reliability of the 4-bit QAT comparison. We will rerun the 4-bit QAT experiments with at least three independent random seeds, report mean PSNR and standard deviation in a revised Table 6, and update the accompanying text in §5.1 to reflect the observed variability. This will provide a more rigorous basis for the robustness statement. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical study only

full rationale

The paper is an empirical investigation of tiny NeRV architectures, capacity scaling, frequency-aware distillation, and PTQ/QAT quantization. It reports experimental results on video datasets without any mathematical derivation chain, uniqueness theorems, or predictions that reduce to fitted inputs by construction. No self-citation load-bearing steps or ansatz smuggling appear in the provided abstract or described methods. The central claims rest on measured quality-efficiency trade-offs, which are externally falsifiable via the released implementation and datasets.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard deep-learning assumptions about neural network expressivity for video signals and the effectiveness of distillation and quantization; no new entities are postulated and free parameters are the usual architectural and training hyperparameters.

free parameters (2)
  • Capacity scaling hyperparameters for NeRV-T and NeRV-T+
    Specific layer widths and depths chosen to achieve extreme parameter reduction while remaining trainable.
  • Quantization bit widths and training schedules
    Low-precision levels and whether post-training or quantization-aware training is used are selected to balance quality and efficiency.
axioms (2)
  • domain assumption: A neural network can implicitly encode an entire video sequence in its parameters for constant-time frame reconstruction.
    Core premise inherited from prior NeRV work and assumed to hold for the scaled-down variants; a sketch of such a network follows this ledger.
  • domain assumption: Knowledge distillation with frequency-aware supervision can improve low-capacity model fidelity without raising inference cost.
    Standard ML technique applied here; its effectiveness on tiny NeRV is taken as transferable.

pith-pipeline@v0.9.0 · 5568 in / 1469 out tokens · 85038 ms · 2026-05-10T18:23:57.375440+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

90 extracted references · 35 canonical work pages · 3 internal anchors

  1. [1]

H. Chen, M. Gwilliam, S.-N. Lim, A. Shrivastava, NeRV: Neural representations for videos, in: Advances in Neural Information Processing Systems, 2021

  2. [2]

W. Roth, G. Schindler, B. Klein, R. Peharz, S. Tschiatschek, H. Fröning, F. Pernkopf, Z. Ghahramani, Resource-efficient neural networks for embedded systems, Journal of Machine Learning Research 25 (50) (2024) 1–51. URL: http://jmlr.org/papers/v25/18-566.html

  3. [3]

M. T. Lê, P. Wolinski, J. Arbel, Efficient neural networks for tiny machine learning: A comprehensive review, ACM Trans. Intell. Syst. Technol., Just Accepted. doi:10.1145/3798276. URL: https://doi.org/10.1145/3798276

  4. [4]

A. Gutierrez-Torre, K. Bahadori, S.-U.-R. Baig, W. Iqbal, T. Vardanega, J. L. Berral, D. Carrera, Automatic distributed deep learning using resource-constrained edge devices, IEEE Internet of Things Journal 9 (16) (2022) 15018–15029. doi:10.1109/JIOT.2021.3098973

  5. [5]

C. Yahyati, I. Lamaakal, Y. Maleh, K. El Makkaoui, I. Ouahbi, M. Almousa, A. A. Abd El-Latif, A systematic review of state-of-the-art TinyML applications in healthcare, education, and transportation, IEEE Access 13 (2025) 204513–204562

  6. [6]

Z. Németh, C. H. See, K. Goh, A. Ghani, S. Keates, R. A. Abd-Alhameed, Machine learning-based position detection using Hall-effect sensor arrays on resource-constrained microcontroller, Sensors 25 (20) (2025) 6444

  7. [7]

    M. J. Reis, Low-power embedded sensor node for real-time environmental monitoring with on-board machine-learning inference, Sensors 26 (2) (2026) 703

  8. [8]

    J. H. Kadhum, A. D. Radhi, L. S. Ismail, M. G. Waheed, H. Alzamily, I. A. Mohammad, W. A. Hashim, Ultra-low power sensor data processing in smart agriculture using attention-optimized tinyml models for edge microcontrollers, in: 2025 3rd International Conference on Cyber Resilience (ICCR), IEEE, 2025, pp. 1–8

  9. [9]

K. Kim, S.-J. Jang, J. Park, E. Lee, S.-S. Lee, Lightweight and energy-efficient deep learning accelerator for real-time object detection on edge devices, Sensors 23 (3). doi:10.3390/s23031185. URL: https://www.mdpi.com/1424-8220/23/3/1185

  10. [10]

    B. Kim, S. Lee, On-nas: On-device neural architecture search on memory-constrained intelligent embedded systems, in: Proceedings of the 21st ACM Conference on Embedded Networked Sensor Systems, 2023, pp. 152–166

  11. [11]

    H.-S. Zheng, Y.-Y. Liu, C.-F. Hsu, T. T. Yeh, Streamnet: Memory-efficient streaming tiny deep learning inference on the microcontroller, Advances in Neural Information Processing Systems 36 (2023) 37160–37172

  12. [12]

    P.-G. Ye, W. Wang, B. Mi, K. Chen, Edgestreaming: Secure computation intelligence in distributed edge networks for streaming analytics, ACM Transactions on Multimedia Computing, Communications and Applications 21 (8) (2025) 1–15

  13. [13]

C. M. Bhushan, P. Koppuravuri, N. Prasanthi, F. Gazi, M. M. Hussain, M. Abdussami, A. A. Devi, J. Faizi, Deploying TinyML for energy-efficient object detection and communication in low-power edge AI systems, Scientific Reports

  14. [14]

J. Tu, L. Yang, J. Cao, Distributed machine learning in edge computing: Challenges, solutions and future directions, ACM Computing Surveys 57 (5) (2025) 1–37

  15. [15]

    A. Berthelier, T. Chateau, S. Duffner, C. Garcia, C. Blanc, Deep model compression and architecture optimization for embedded systems: A survey, Journal of Signal Processing Systems 93 (8) (2021) 863–878

  16. [16]

    H.-I. Liu, M. Galindo, H. Xie, L.-K. Wong, H.-H. Shuai, Y.-H. Li, W.-H. Cheng, Lightweight deep learning for resource-constrained environments: A survey, ACM Computing Surveys 56 (10) (2024) 1–42

  17. [17]

V. Kulkarni, V. Jujare, TinyML using neural networks for resource-constrained devices, in: TinyML for Edge Intelligence in IoT and LPWAN Networks, Elsevier, 2024, pp. 87–101

  18. [18]

    S. Heydari, Q. H. Mahmoud, Tiny machine learning and on-device inference: A survey of applications, challenges, and future directions, Sensors 25 (10) (2025) 3191

  19. [19]

    V. Sitzmann, J. Martel, A. Bergman, D. B. Lindell, G. Wetzstein, Implicit neural representations with periodic activation functions, in: Advances in Neural Information Processing Systems, 2020

  20. [20]

Z. Li, et al., HNeRV: A hybrid neural representation for videos, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

  21. [21]

B. He, X. Yang, H. Wang, Z. Wu, H. Chen, S. Huang, Y. Ren, S.-N. Lim, HiNeRV: Video compression with hierarchical encoding-based neural representation, in: Advances in Neural Information Processing Systems, 2023

  22. [22]

M. Tarchouli, T. Guionnet, M. Riviere, W. Hamidouche, M. Outtas, O. Deforges, Res-NeRV: Residual blocks for a practical implicit neural video decoder, in: Proceedings of the IEEE International Conference on Image Processing, 2024

  23. [23]

L. Yu, et al., High-frequency enhanced hybrid neural representations for video, in: Proceedings of the IEEE International Conference on Image Processing, 2023

  24. [24]

    J. Kim, J.-W. Kang, Spatio-temporal spectra-preserving neural representation for video modeling, ACM Transactions on Multimedia Computing, Communications and Applications

  25. [25]

Z. Jia, B. Li, J. Li, W. Xie, L. Qi, H. Li, Y. Lu, Towards practical real-time neural video compression, in: Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 12543–12552

  26. [26]

    M. Tan, Q. V. Le, Efficientnet: Rethinking model scaling for convolutional neural networks, in: Proceedings of the International Conference on Machine Learning, 2019

  27. [27]

    B. Jacob, S. Kligys, B. Chen, M. Zhu, M. Tang, A. Howard, H. Adam, D. Kalenichenko, Quantization and training of neural networks for efficient integer-arithmetic-only inference, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018

  28. [28]

Y. Bahri, E. Dyer, J. Kaplan, J. Lee, U. Sharma, Explaining neural scaling laws, Proceedings of the National Academy of Sciences 121 (27) (2024) e2311878121. arXiv:https://www.pnas.org/doi/pdf/10.1073/pnas.2311878121, doi:10.1073/pnas.2311878121. URL: https://www.pnas.org/doi/abs/10.1073/pnas.2311878121

  29. [29]

A. Sengupta, Y. Goel, T. Chakraborty, How to upscale neural networks with scaling law?, Transactions on Machine Learning Research. URL: https://openreview.net/forum?id=AL7N0UOfgI

  30. [30]

J. Shao, H. Zhang, H. Yu, J. Wu, Memory-efficient generative models via product quantization, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 16871–16881

  31. [31]

    H. M. Kwan, T. Peng, G. Gao, F. Zhang, M. Nilsson, A. Gower, D. Bull, Ultra-lightweight neural video representation compression, arXiv preprint arXiv:2512.04019

  32. [32]

    R. Cordova-Cardenas, D. Amor, Á. Gutiérrez, Edge ai in practice: A survey and deployment framework for neural networks on embedded systems, Electronics 14 (24) (2025) 4877

  33. [33]

G. Hinton, O. Vinyals, J. Dean, Distilling the knowledge in a neural network (2015). arXiv:1503.02531. URL: https://arxiv.org/abs/1503.02531

  34. [34]

A. Romero, N. Ballas, S. E. Kahou, A. Chassang, C. Gatta, Y. Bengio, FitNets: Hints for thin deep nets (2014), arXiv preprint arXiv:1412.6550

  35. [35]

S. Zagoruyko, N. Komodakis, Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer, in: International Conference on Learning Representations, 2017. URL: https://openreview.net/forum?id=Sks9_ajex

  36. [36]

A. Moslemi, A. Briskina, Z. Dang, J. Li, A survey on knowledge distillation: Recent advancements, Machine Learning with Applications 18 (2024) 100605. doi:10.1016/j.mlwa.2024.100605. URL: https://www.sciencedirect.com/science/article/pii/S2666827024000811

  37. [37]

A. M. Mansourian, R. Ahmadi, M. Ghafouri, A. M. Babaei, E. B. Golezani, Z. Yasamani Ghamchi, V. Ramezanian, A. Taherian, K. Dinashi, A. Miri, S. Kasaei, A comprehensive survey on knowledge distillation, Transactions on Machine Learning Research. URL: https://openreview.net/forum?id=3cbJzdR78B

  38. [38]

L. Wang, K.-J. Yoon, Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks, IEEE Transactions on Pattern Analysis and Machine Intelligence 44 (6) (2022) 3048–3068. doi:10.1109/TPAMI.2021.3055564

  39. [39]

X. Li, L. Li, M. Li, P. Yan, T. Feng, H. Luo, Y. Zhao, S. Yin, Knowledge distillation and teacher–student learning in medical imaging: Comprehensive overview, pivotal role, and future directions, Medical Image Analysis 107 (2026) 103819. doi:10.1016/j.media.2025.103819. URL: https://www.sciencedirect.com/science/article/pii/S1361841525003652

  40. [40]

R. Aalishah, M. Navardi, T. Mohsenin, MedMambaLite: Hardware-aware Mamba for medical image classification, in: 2025 IEEE Biomedical Circuits and Systems Conference (BioCAS), 2025, pp. 224–228. doi:10.1109/BioCAS67066.2025.00057

  41. [41]

Z. Li, S. Xia, J. Yue, L. Fang, HyperKD: Lifelong hyperspectral image classification with cross-spectral–spatial knowledge distillation, IEEE Transactions on Geoscience and Remote Sensing 63 (2025) 1–17. doi:10.1109/TGRS.2025.3552449

  42. [42]

K. Wang, F. Zheng, D. Guan, J. Liu, J. Qin, Distilling heterogeneous knowledge with aligned biological entities for histological image classification, Pattern Recognition 160 (2025) 111173. doi:10.1016/j.patcog.2024.111173. URL: https://www.sciencedirect.com/science/article/pii/S0031320324009245

  43. [43]

Y. Himeur, N. Aburaed, O. Elharrouss, I. Varlamis, S. Atalla, W. Mansoor, H. Al-Ahmad, Applications of knowledge distillation in remote sensing: A survey, Information Fusion 115 (2025) 102742. doi:10.1016/j.inffus.2024.102742. URL: https://www.sciencedirect.com/science/article/pii/S1566253524005207

  44. [44]

J. Shao, H. Zhang, H. Yu, J. Wu, Memory-efficient generative models via product quantization, in: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 16871–16881

  45. [45]

W. Zhu, X. Zhou, P. Zhu, Y. Wang, Q. Hu, CKD: Contrastive knowledge distillation from a sample-wise perspective, IEEE Transactions on Image Processing 34 (2025) 3578–3592. doi:10.1109/TIP.2025.3573474

  46. [46]

    A. T. Xu, A. Wilf, P. P. Liang, A. Obolenskiy, D. Fried, L.-P. Morency, Comparative knowledge distillation, in: Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 7690–7699

  47. [47]

    Y. Chen, A. Habibian, L. Benini, Y. Li, Gated relational alignment via confidence-based distillation for efficient vlms, arXiv preprint arXiv:2601.22709

  48. [48]

K. Zheng, Y. Wang, Y. Yuan, Boosting contrastive learning with relation knowledge distillation, Proceedings of the AAAI Conference on Artificial Intelligence 36 (3) (2022) 3508–3516. doi:10.1609/aaai.v36i3.20262. URL: https://ojs.aaai.org/index.php/AAAI/article/view/20262

  49. [49]

    H. Wu, L. Xiao, X. Zhang, Y. Miao, Aligning in a compact space: Contrastive knowledge distillation between heterogeneous architectures, arXiv preprint arXiv:2405.18524

  50. [50]

F. Camarena, M. Gonzalez-Mendoza, L. Chang, Knowledge distillation in video-based human action recognition: An intuitive approach to efficient and flexible model training, Journal of Imaging 10 (4). doi:10.3390/jimaging10040085. URL: https://www.mdpi.com/2313-433X/10/4/85

  51. [51]

Z. Xiao, H. Xing, R. Qu, H. Li, X. Cheng, L. Xu, L. Feng, Q. Wan, Heterogeneous mutual knowledge distillation for wearable human activity recognition, IEEE Transactions on Neural Networks and Learning Systems 36 (9) (2025) 16589–16603. doi:10.1109/TNNLS.2025.3556317

  52. [52]

    X. Zhang, R. Yang, D. He, X. Ge, T. Xu, Y. Wang, H. Qin, J. Zhang, Boosting neural representations for videos with a conditional decoder, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 2556–2566

  53. [53]

C. Zhao, Z. Chen, Y. Xu, E. Gu, J. Li, Z. Yi, Q. Wang, J. Yang, Y. Tai, From zero to detail: Deconstructing ultra-high-definition image restoration from progressive spectral perspective, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 17935–17946

  54. [54]

Y. Zhang, Simultaneous learning knowledge distillation for image restoration: Efficient model compression for drones, Drones 9 (3). doi:10.3390/drones9030209. URL: https://www.mdpi.com/2504-446X/9/3/209

  55. [55]

F. Ali Dharejo, B. Alawode, I. Iyappan Ganapathi, L. Wang, A. Mian, R. Timofte, S. Javed, Multi-distillation underwater image super-resolution via wavelet transform, IEEE Access 12 (2024) 131083–131099. doi:10.1109/ACCESS.2024.3449136

  56. [56]

    X. Li, S. Gao, S. Shang, Knowledge distillation method for spatio-temporal tasks: a survey, GeoInformatica 30 (1) (2026) 4

  57. [57]

    Y. Li, C. Yang, H. Zeng, Z. Dong, Z. An, Y. Xu, Y. Tian, H. Wu, Frequency-aligned knowledge distillation for lightweight spatiotemporal forecasting, in: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 7262–7272

  58. [58]

J. Kim, J. Lee, J.-W. Kang, SNeRV: Spectra-preserving neural representation for video, in: A. Leonardis, E. Ricci, S. Roth, O. Russakovsky, T. Sattler, G. Varol (Eds.), Computer Vision – ECCV 2024, Springer Nature Switzerland, Cham, 2025, pp. 332–348

  59. [59]

H. A. Abushahla, D. Varam, A. J. N. Panopio, M. I. AlHajri, Neural network quantization for microcontrollers: A comprehensive survey of methods, platforms, and applications (2026). arXiv:2508.15008. URL: https://arxiv.org/abs/2508.15008

  60. [60]

K. Liu, Q. Zheng, K. Tao, Z. Li, H. Qin, W. Li, Y. Guo, X. Liu, L. Kong, G. Chen, Y. Zhang, X. Yang, Low-bit model quantization for deep neural networks: A survey (2025). arXiv:2505.05530. URL: https://arxiv.org/abs/2505.05530

  61. [61]

J. Yang, X. Shen, J. Xing, X. Tian, H. Li, B. Deng, J. Huang, X.-s. Hua, Quantization networks, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019

  62. [62]

P. Yin, J. Lyu, S. Zhang, S. J. Osher, Y. Qi, J. Xin, Understanding straight-through estimator in training activation quantized neural nets, in: International Conference on Learning Representations, 2019. URL: https://openreview.net/forum?id=Skh4jRcKQ

  63. [63]

M. Schoenbauer, D. Moro, L. Lew, A. G. Howard, Custom gradient estimators are straight-through estimators in disguise (2025). URL: https://openreview.net/forum?id=3j72egd8q1

  64. [64]

M. Nagel, M. Fournarakis, Y. Bondarenko, T. Blankevoort, Overcoming oscillations in quantization-aware training, in: K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, S. Sabato (Eds.), Proceedings of the 39th International Conference on Machine Learning, Vol. 162 of ...

  65. [65]

B. Lee, D. Kim, Y. You, Y.-M. Kim, LittleBit: Ultra low-bit quantization via latent factorization, in: The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL: https://openreview.net/forum?id=zJzu9evD5K

  66. [66]

T. Chen, S. Chen, X. Qu, D. Zhao, R. Yan, J. Ko, L. Liang, P. Cameron, StableQAT: Stable quantization-aware training at ultra-low bitwidths, arXiv preprint arXiv:2601.19320

  67. [67]

    S. Tabesh, M. Safaryan, A. Panferov, A. Volkova, D. Alistarh, Cage: Curvature-aware gradient estimation for accurate quantization-aware training, arXiv preprint arXiv:2510.18784

  68. [68]

S. K. Esser, J. L. McKinstry, D. Bablani, R. Appuswamy, D. S. Modha, Learned step size quantization, in: International Conference on Learning Representations, 2020

  69. [69]

S. Bai, J. Chen, X. Shen, Y. Qian, Y. Liu, Unified data-free compression: Pruning and quantization without fine-tuning, in: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 5876–5885

  70. [70]

M. Li, Z. Huang, L. Chen, J. Ren, M. Jiang, F. Li, J. Fu, C. Gao, Contemporary advances in neural network quantization: A survey, in: 2024 International Joint Conference on Neural Networks (IJCNN), 2024, pp. 1–10. doi:10.1109/IJCNN60899.2024.10650109

  71. [71]

B. Rokh, A. Azarpeyvand, A. Khanteymoori, A comprehensive survey on model quantization for deep neural networks in image classification, ACM Trans. Intell. Syst. Technol. 14 (6). doi:10.1145/3623402. URL: https://doi.org/10.1145/3623402

  72. [72]

S. Cho, S. Yoo, Per-channel quantization level allocation for quantizing convolutional neural networks, in: 2020 IEEE International Conference on Consumer Electronics - Asia (ICCE-Asia), 2020, pp. 1–3. doi:10.1109/ICCE-Asia49877.2020.9276878

  73. [73]

Q. Li, J. Zou, J. Zhang, W. Long, X. Zhou, S. Gu, Texture vector-quantization and reconstruction aware prediction for generative super-resolution, arXiv preprint arXiv:2509.23774

  74. [74]

C. Hong, S. Baik, H. Kim, S. Nah, K. M. Lee, CADyQ: Content-aware dynamic quantization for image super-resolution, in: European Conference on Computer Vision, Springer, 2022, pp. 367–383

  75. [75]

Y. Qin, F. Luo, M. Li, P. Guo, L. Li, Optimization algorithm for 3D image visual communication based on digital image reconstruction, Int. J. Simul. Multidisci. Des. Optim. 16 (2025) 14. doi:10.1051/smdo/2025015. URL: https://doi.org/10.1051/smdo/2025015

  76. [76]

J. Deng, X. Zhang, Z. Yang, C. Zhou, R. Wang, K. Zhang, X. Lv, L. Yang, Z. Wang, P. Li, Z. Ma, Pixel-level regression for UAV hyperspectral images: Deep learning-based quantitative inverse of wheat stripe rust disease index, Computers and Electronics in Agriculture 215 (2023) 108434. doi:10.1016/j.compag.2023.108434. URL: https://www.sciencedire...

  77. [77]

Y. Seo, I. Kim, J. Lee, W. Choi, S. Song, On quantization of convolutional neural networks for image restoration, Electronic Imaging 34 (7) (2022) 183-1–183-1. doi:10.2352/EI.2022.34.7.ISS-183. URL: https://library.imaging.org/ei/articles/34/7/ISS-183

  78. [78]

    Z. Yan, Y. Shi, Y. Wang, M. Tan, Z. Li, W. Tan, Y. Tian, Towards accurate low bit-width quantization with multiple phase adaptations, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, 2020, pp. 6591–6598

  79. [79]

    X. Ouyang, T. Ge, T. Hartvigsen, Z. Zhang, H. Mi, D. Yu, Low-bit quantization favors undertrained llms, in: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025, pp. 32338–32348

  80. [80]

A. Mercat, M. Viitanen, J. Vanne, UVG dataset: 50/120fps 4K sequences for video codec analysis and development, in: Proceedings of the 11th ACM Multimedia Systems Conference, 2020, pp. 297–302

Showing first 80 references.