pith. machine review for the scientific record.

arxiv: 2604.09220 · v1 · submitted 2026-04-10 · 💻 cs.CV

Recognition: 2 theorem links

TinyNeRV: Compact Neural Video Representations via Capacity Scaling, Distillation, and Low-Precision Inference

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:23 UTC · model grok-4.3

classification 💻 cs.CV
keywords: neural video representations · NeRV · model compression · knowledge distillation · quantization · compact models · video reconstruction · efficient inference

The pith

Tiny NeRV models achieve favorable quality-efficiency trade-offs by scaling capacity, adding distillation, and applying low-precision inference.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how neural networks that encode entire videos can be made small enough for devices with tight memory and power limits. It introduces two reduced-capacity architectures, NeRV-T and NeRV-T+, and tests them on multiple video datasets to track changes in reconstruction quality, speed, and resource use. To offset quality loss from the size reduction, the work applies knowledge distillation guided by frequency-aware focal supervision and evaluates both post-training quantization and quantization-aware training. These steps demonstrate that compact neural video representations can retain usable fidelity while cutting parameter counts, computation, and memory needs.

Core claim

Tiny NeRV variants, when trained with frequency-aware focal knowledge distillation and run under reduced numerical precision, achieve favorable reconstruction-quality-versus-efficiency trade-offs, substantially lowering parameter count, computational cost, and memory requirements compared with larger NeRV models.

What carries the argument

The NeRV-T and NeRV-T+ architectures, combined with frequency-aware focal knowledge distillation and either post-training quantization or quantization-aware training for low-precision inference.

Load-bearing premise

The quality gains from distillation and quantization seen on the tested videos and model sizes will hold for new videos and real deployments without major degradation or unexpected artifacts.

What would settle it

Running the distilled and quantized tiny models on a fresh set of videos from a different source and measuring whether reconstruction metrics drop markedly below those of the larger full-precision baseline.

Figures

Figures reproduced from arXiv: 2604.09220 by Ihab Amer, Muhammad Hannan Akhtar, Tamer Shanableh.

Figure 1. Width-based scaling of NeRV variants. All models share identical depth and decoding structure; capacity differences arise solely from channel dimensionality.
Figure 2. Proposed frequency–focal knowledge distillation. Teacher and student predictions are decomposed into low-frequency (smooth) and high-frequency (edge-dominant residual) components. A focal weighting term w(k) emphasizes spatial locations with large teacher–student discrepancies in the high-frequency domain, guiding the tiny student toward improved detail reconstruction while preserving global structure.
Figure 3. Representative frames from the four evaluated sequences: Big Buck Bunny, honeybee, readysetgo, and yachtride. The sequences exhibit distinct structural patterns, motion characteristics, and texture density, providing a diverse evaluation setting for tiny neural video representations. For every dataset, models are trained for 300 epochs using identical optimization settings, including learning rate schedule…
Figure 4. Visualizes the same scaling behavior as a quality–compute trade-off curve, highlighting the separation between the tiny operating points (NeRV-T/T+) and the high-compute NeRV-S/M/L regime.
Figure 5. Qualitative comparison across the honeybee, readysetgo, and yachtride sequences. Each row corresponds to one sequence, and each column shows reconstructions from different model configurations. NeRV-L serves as a high-capacity reference, while NeRV-S provides a baseline. NeRV-T+ and NeRV-T illustrate the effect of aggressive capacity reduction in the tiny regime, and the distilled NeRV-T model shows the im…
read the original abstract

Implicit neural video representations encode entire video sequences within the parameters of a neural network and enable constant-time frame reconstruction. Recent work on Neural Representations for Videos (NeRV) has demonstrated competitive reconstruction performance while avoiding the sequential decoding process of conventional video codecs. However, most existing studies focus on moderate or high capacity models, leaving the behavior of extremely compact configurations required for constrained environments insufficiently explored. This paper presents a systematic study of tiny NeRV architectures designed for efficient deployment. Two lightweight configurations, NeRV-T and NeRV-T+, are introduced and evaluated across multiple video datasets in order to analyze how aggressive capacity reduction affects reconstruction quality, computational complexity, and decoding throughput. Beyond architectural scaling, the work investigates strategies for improving the performance of compact models without increasing inference cost. Knowledge distillation with frequency-aware focal supervision is explored to enhance reconstruction fidelity in low-capacity networks. In addition, the impact of low-precision inference is examined through both post-training quantization and quantization-aware training to study the robustness of tiny models under reduced numerical precision. Experimental results demonstrate that carefully designed tiny NeRV variants can achieve favorable quality-efficiency trade-offs while substantially reducing parameter count, computational cost, and memory requirements. These findings provide insight into the practical limits of compact neural video representations and offer guidance for deploying NeRV-style models in resource-constrained and real-time environments. The official implementation is available at https://github.com/HannanAkhtar/TinyNeRV-Implementation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper introduces two compact NeRV variants, NeRV-T and NeRV-T+, obtained via aggressive capacity scaling. It augments these with frequency-aware knowledge distillation and studies low-precision inference through both post-training quantization (PTQ) and quantization-aware training (QAT). Experiments across standard video datasets report that the resulting models maintain competitive reconstruction quality while delivering large reductions in parameter count, FLOPs, and memory footprint, together with improved decoding throughput. The work concludes with practical guidance for deploying implicit neural video representations under tight resource constraints. The official implementation is released.

Significance. If the reported trade-offs hold under broader testing, the paper meaningfully extends the NeRV line of work into the tiny-model regime that is relevant for edge and real-time applications. The systematic comparison of capacity scaling, distillation, and quantization provides concrete empirical guidance rather than isolated point results. The public release of the implementation is a clear strength that supports reproducibility and follow-on research.

major comments (2)
  1. [§4.2 and Table 4] The frequency-aware distillation objective is shown to improve PSNR by 0.8–1.2 dB over vanilla distillation, yet the focal weighting hyper-parameter is selected via grid search on the validation split of each dataset; this makes the reported gains partly dependent on per-dataset tuning and weakens the claim that the method is generally applicable to unseen videos.
  2. [§5.1, Table 6] The 4-bit QAT results are presented without error bars or multiple random seeds; given that the PSNR gap versus 8-bit is only 0.3 dB on average, the conclusion that tiny NeRV models are “robust” to low-precision inference rests on a single-run comparison whose statistical reliability is unclear.
minor comments (3)
  1. [Figure 2] The capacity-scaling curves would be easier to interpret if the x-axis were labeled in absolute parameter counts rather than only relative reduction percentages.
  2. [§3.1] The architectural differences between NeRV-T and NeRV-T+ are described only in prose; a compact table listing layer widths, MLP depths, and positional-encoding frequencies would improve clarity.
  3. [References] Several citations to the original NeRV paper and follow-up works lack arXiv identifiers or page numbers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment of our work and the constructive feedback. We address each major comment point by point below, indicating the revisions we will incorporate into the manuscript.

read point-by-point responses
  1. Referee: [§4.2 and Table 4] The frequency-aware distillation objective is shown to improve PSNR by 0.8–1.2 dB over vanilla distillation, yet the focal weighting hyper-parameter is selected via grid search on the validation split of each dataset; this makes the reported gains partly dependent on per-dataset tuning and weakens the claim that the method is generally applicable to unseen videos.

    Authors: We agree that selecting the focal weighting hyper-parameter via per-dataset grid search on the validation split limits the strength of the general-applicability claim. In the revised manuscript we will add a new set of results in which a single fixed value (the median of the per-dataset optima) is used across all datasets. The corresponding PSNR gains will be reported in an updated Table 4, and a short sensitivity analysis will be added to §4.2. These changes will directly address the concern while preserving the original per-dataset results for reference. revision: yes

  2. Referee: [§5.1, Table 6] The 4-bit QAT results are presented without error bars or multiple random seeds; given that the PSNR gap versus 8-bit is only 0.3 dB on average, the conclusion that tiny NeRV models are “robust” to low-precision inference rests on a single-run comparison whose statistical reliability is unclear.

    Authors: We concur that the absence of error bars and multiple seeds reduces the statistical reliability of the 4-bit QAT comparison. We will rerun the 4-bit QAT experiments with at least three independent random seeds, report mean PSNR and standard deviation in a revised Table 6, and update the accompanying text in §5.1 to reflect the observed variability. This will provide a more rigorous basis for the robustness statement. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical study only

full rationale

The paper is an empirical investigation of tiny NeRV architectures, capacity scaling, frequency-aware distillation, and PTQ/QAT quantization. It reports experimental results on video datasets without any mathematical derivation chain, uniqueness theorems, or predictions that reduce to fitted inputs by construction. No self-citation load-bearing steps or ansatz smuggling appear in the provided abstract or described methods. The central claims rest on measured quality-efficiency trade-offs, which are externally falsifiable via the released implementation and datasets.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard deep-learning assumptions about neural network expressivity for video signals and the effectiveness of distillation and quantization; no new entities are postulated and free parameters are the usual architectural and training hyperparameters.

free parameters (2)
  • Capacity scaling hyperparameters for NeRV-T and NeRV-T+
    Specific layer widths and depths chosen to achieve extreme parameter reduction while remaining trainable.
  • Quantization bit widths and training schedules
    Low-precision levels and whether post-training or quantization-aware training is used are selected to balance quality and efficiency.
axioms (2)
  • domain assumption: A neural network can implicitly encode an entire video sequence in its parameters for constant-time frame reconstruction.
    Core premise inherited from prior NeRV work and assumed to hold for the scaled-down variants; a sketch of such a network follows this ledger.
  • domain assumption: Knowledge distillation with frequency-aware supervision can improve low-capacity model fidelity without raising inference cost.
    Standard ML technique applied here; its effectiveness on tiny NeRV is taken as transferable.

pith-pipeline@v0.9.0 · 5568 in / 1469 out tokens · 85038 ms · 2026-05-10T18:23:57.375440+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

90 extracted references · 35 canonical work pages · 3 internal anchors

  1. [1]

H. Chen, M. Gwilliam, S.-N. Lim, A. Shrivastava, NeRV: Neural representations for videos, in: Advances in Neural Information Processing Systems, 2021

  2. [2]

W. Roth, G. Schindler, B. Klein, R. Peharz, S. Tschiatschek, H. Fröning, F. Pernkopf, Z. Ghahramani, Resource-efficient neural networks for embedded systems, Journal of Machine Learning Research 25 (50) (2024) 1–51. URL: http://jmlr.org/papers/v25/18-566.html

  3. [3]

M. T. Lê, P. Wolinski, J. Arbel, Efficient neural networks for tiny machine learning: A comprehensive review, ACM Trans. Intell. Syst. Technol., Just Accepted. doi:10.1145/3798276. URL: https://doi.org/10.1145/3798276

  4. [4]

A. Gutierrez-Torre, K. Bahadori, S.-U.-R. Baig, W. Iqbal, T. Vardanega, J. L. Berral, D. Carrera, Automatic distributed deep learning using resource-constrained edge devices, IEEE Internet of Things Journal 9 (16) (2022) 15018–15029. doi:10.1109/JIOT.2021.3098973

  5. [5]

C. Yahyati, I. Lamaakal, Y. Maleh, K. El Makkaoui, I. Ouahbi, M. Almousa, A. A. Abd El-Latif, A systematic review of state-of-the-art TinyML applications in healthcare, education, and transportation, IEEE Access 13 (2025) 204513–204562

  6. [6]

Z. Németh, C. H. See, K. Goh, A. Ghani, S. Keates, R. A. Abd-Alhameed, Machine learning-based position detection using Hall-effect sensor arrays on resource-constrained microcontroller, Sensors 25 (20) (2025) 6444

  7. [7]

    M. J. Reis, Low-power embedded sensor node for real-time environmental monitoring with on-board machine-learning inference, Sensors 26 (2) (2026) 703

  8. [8]

    J. H. Kadhum, A. D. Radhi, L. S. Ismail, M. G. Waheed, H. Alzamily, I. A. Mohammad, W. A. Hashim, Ultra-low power sensor data processing in smart agriculture using attention-optimized tinyml models for edge microcontrollers, in: 2025 3rd International Conference on Cyber Resilience (ICCR), IEEE, 2025, pp. 1–8

  9. [9]

K. Kim, S.-J. Jang, J. Park, E. Lee, S.-S. Lee, Lightweight and energy-efficient deep learning accelerator for real-time object detection on edge devices, Sensors 23 (3). doi:10.3390/s23031185. URL: https://www.mdpi.com/1424-8220/23/3/1185

  10. [10]

    B. Kim, S. Lee, On-nas: On-device neural architecture search on memory-constrained intelligent embedded systems, in: Proceedings of the 21st ACM Conference on Embedded Networked Sensor Systems, 2023, pp. 152–166

  11. [11]

    H.-S. Zheng, Y.-Y. Liu, C.-F. Hsu, T. T. Yeh, Streamnet: Memory-efficient streaming tiny deep learning inference on the microcontroller, Advances in Neural Information Processing Systems 36 (2023) 37160–37172

  12. [12]

    P.-G. Ye, W. Wang, B. Mi, K. Chen, Edgestreaming: Secure computation intelligence in distributed edge networks for streaming analytics, ACM Transactions on Multimedia Computing, Communications and Applications 21 (8) (2025) 1–15

  13. [13]

C. M. Bhushan, P. Koppuravuri, N. Prasanthi, F. Gazi, M. M. Hussain, M. Abdussami, A. A. Devi, J. Faizi, Deploying TinyML for energy-efficient object detection and communication in low-power edge AI systems, Scientific Reports

  14. [14]

J. Tu, L. Yang, J. Cao, Distributed machine learning in edge computing: Challenges, solutions and future directions, ACM Computing Surveys 57 (5) (2025) 1–37

  15. [15]

    A. Berthelier, T. Chateau, S. Duffner, C. Garcia, C. Blanc, Deep model compression and architecture optimization for embedded systems: A survey, Journal of Signal Processing Systems 93 (8) (2021) 863–878

  16. [16]

    H.-I. Liu, M. Galindo, H. Xie, L.-K. Wong, H.-H. Shuai, Y.-H. Li, W.-H. Cheng, Lightweight deep learning for resource-constrained environments: A survey, ACM Computing Surveys 56 (10) (2024) 1–42

  17. [17]

V. Kulkarni, V. Jujare, TinyML using neural networks for resource-constrained devices, in: TinyML for Edge Intelligence in IoT and LPWAN Networks, Elsevier, 2024, pp. 87–101

  18. [18]

    S. Heydari, Q. H. Mahmoud, Tiny machine learning and on-device inference: A survey of applications, challenges, and future directions, Sensors 25 (10) (2025) 3191

  19. [19]

    V. Sitzmann, J. Martel, A. Bergman, D. B. Lindell, G. Wetzstein, Implicit neural representations with periodic activation functions, in: Advances in Neural Information Processing Systems, 2020

  20. [20]

Z. Li, et al., HNeRV: A hybrid neural representation for videos, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

  21. [21]

B. He, X. Yang, H. Wang, Z. Wu, H. Chen, S. Huang, Y. Ren, S.-N. Lim, HiNeRV: Video compression with hierarchical encoding-based neural representation, in: Advances in Neural Information Processing Systems, 2023

  22. [22]

M. Tarchouli, T. Guionnet, M. Riviere, W. Hamidouche, M. Outtas, O. Deforges, Res-NeRV: Residual blocks for a practical implicit neural video decoder, in: Proceedings of the IEEE International Conference on Image Processing, 2024

  23. [23]

L. Yu, et al., High-frequency enhanced hybrid neural representations for video, in: Proceedings of the IEEE International Conference on Image Processing, 2023

  24. [24]

    J. Kim, J.-W. Kang, Spatio-temporal spectra-preserving neural representation for video modeling, ACM Transactions on Multimedia Computing, Communications and Applications

  25. [25]

Z. Jia, B. Li, J. Li, W. Xie, L. Qi, H. Li, Y. Lu, Towards practical real-time neural video compression, in: Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 12543–12552

  26. [26]

    M. Tan, Q. V. Le, Efficientnet: Rethinking model scaling for convolutional neural networks, in: Proceedings of the International Conference on Machine Learning, 2019

  27. [27]

    B. Jacob, S. Kligys, B. Chen, M. Zhu, M. Tang, A. Howard, H. Adam, D. Kalenichenko, Quantization and training of neural networks for efficient integer-arithmetic-only inference, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018

  28. [28]

Y. Bahri, E. Dyer, J. Kaplan, J. Lee, U. Sharma, Explaining neural scaling laws, Proceedings of the National Academy of Sciences 121 (27) (2024) e2311878121. arXiv:https://www.pnas.org/doi/pdf/10.1073/pnas.2311878121, doi:10.1073/pnas.2311878121. URL: https://www.pnas.org/doi/abs/10.1073/pnas.2311878121

  29. [29]

A. Sengupta, Y. Goel, T. Chakraborty, How to upscale neural networks with scaling law?, Transactions on Machine Learning Research. URL: https://openreview.net/forum?id=AL7N0UOfgI

  30. [30]

J. Shao, H. Zhang, H. Yu, J. Wu, Memory-efficient generative models via product quantization, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 16871–16881

  31. [31]

    H. M. Kwan, T. Peng, G. Gao, F. Zhang, M. Nilsson, A. Gower, D. Bull, Ultra-lightweight neural video representation compression, arXiv preprint arXiv:2512.04019

  32. [32]

    R. Cordova-Cardenas, D. Amor, Á. Gutiérrez, Edge ai in practice: A survey and deployment framework for neural networks on embedded systems, Electronics 14 (24) (2025) 4877

  33. [33]

G. Hinton, O. Vinyals, J. Dean, Distilling the knowledge in a neural network (2015). arXiv:1503.02531. URL: https://arxiv.org/abs/1503.02531

  34. [34]

A. Romero, N. Ballas, S. E. Kahou, A. Chassang, C. Gatta, Y. Bengio, FitNets: Hints for thin deep nets (2014), arXiv preprint arXiv:1412.6550

  35. [35]

S. Zagoruyko, N. Komodakis, Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer, in: International Conference on Learning Representations, 2017. URL: https://openreview.net/forum?id=Sks9_ajex

  36. [36]

A. Moslemi, A. Briskina, Z. Dang, J. Li, A survey on knowledge distillation: Recent advancements, Machine Learning with Applications 18 (2024) 100605. doi:10.1016/j.mlwa.2024.100605. URL: https://www.sciencedirect.com/science/article/pii/S2666827024000811

  37. [37]

A. M. Mansourian, R. Ahmadi, M. Ghafouri, A. M. Babaei, E. B. Golezani, Z. Yasamani Ghamchi, V. Ramezanian, A. Taherian, K. Dinashi, A. Miri, S. Kasaei, A comprehensive survey on knowledge distillation, Transactions on Machine Learning Research. URL: https://openreview.net/forum?id=3cbJzdR78B

  38. [38]

L. Wang, K.-J. Yoon, Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks, IEEE Transactions on Pattern Analysis and Machine Intelligence 44 (6) (2022) 3048–3068. doi:10.1109/TPAMI.2021.3055564

  39. [39]

X. Li, L. Li, M. Li, P. Yan, T. Feng, H. Luo, Y. Zhao, S. Yin, Knowledge distillation and teacher–student learning in medical imaging: Comprehensive overview, pivotal role, and future directions, Medical Image Analysis 107 (2026) 103819. doi:10.1016/j.media.2025.103819. URL: https://www.sciencedirect.com/science/article/pii/S1361841525003652

  40. [40]

R. Aalishah, M. Navardi, T. Mohsenin, MedMambaLite: Hardware-aware Mamba for medical image classification, in: 2025 IEEE Biomedical Circuits and Systems Conference (BioCAS), 2025, pp. 224–228. doi:10.1109/BioCAS67066.2025.00057

  41. [41]

Z. Li, S. Xia, J. Yue, L. Fang, HyperKD: Lifelong hyperspectral image classification with cross-spectral–spatial knowledge distillation, IEEE Transactions on Geoscience and Remote Sensing 63 (2025) 1–17. doi:10.1109/TGRS.2025.3552449

  42. [42]

K. Wang, F. Zheng, D. Guan, J. Liu, J. Qin, Distilling heterogeneous knowledge with aligned biological entities for histological image classification, Pattern Recognition 160 (2025) 111173. doi:10.1016/j.patcog.2024.111173. URL: https://www.sciencedirect.com/science/article/pii/S0031320324009245

  43. [43]

Y. Himeur, N. Aburaed, O. Elharrouss, I. Varlamis, S. Atalla, W. Mansoor, H. Al-Ahmad, Applications of knowledge distillation in remote sensing: A survey, Information Fusion 115 (2025) 102742. doi:10.1016/j.inffus.2024.102742. URL: https://www.sciencedirect.com/science/article/pii/S1566253524005207

  44. [44]

J. Shao, H. Zhang, H. Yu, J. Wu, Memory-efficient generative models via product quantization, in: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 16871–16881

  45. [45]

W. Zhu, X. Zhou, P. Zhu, Y. Wang, Q. Hu, CKD: Contrastive knowledge distillation from a sample-wise perspective, IEEE Transactions on Image Processing 34 (2025) 3578–3592. doi:10.1109/TIP.2025.3573474

  46. [46]

    A. T. Xu, A. Wilf, P. P. Liang, A. Obolenskiy, D. Fried, L.-P. Morency, Comparative knowledge distillation, in: Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 7690–7699

  47. [47]

    Y. Chen, A. Habibian, L. Benini, Y. Li, Gated relational alignment via confidence-based distillation for efficient vlms, arXiv preprint arXiv:2601.22709

  48. [48]

K. Zheng, Y. Wang, Y. Yuan, Boosting contrastive learning with relation knowledge distillation, Proceedings of the AAAI Conference on Artificial Intelligence 36 (3) (2022) 3508–3516. doi:10.1609/aaai.v36i3.20262. URL: https://ojs.aaai.org/index.php/AAAI/article/view/20262

  49. [49]

    H. Wu, L. Xiao, X. Zhang, Y. Miao, Aligning in a compact space: Contrastive knowledge distillation between heterogeneous architectures, arXiv preprint arXiv:2405.18524

  50. [50]

F. Camarena, M. Gonzalez-Mendoza, L. Chang, Knowledge distillation in video-based human action recognition: An intuitive approach to efficient and flexible model training, Journal of Imaging 10 (4). doi:10.3390/jimaging10040085. URL: https://www.mdpi.com/2313-433X/10/4/85

  51. [51]

Z. Xiao, H. Xing, R. Qu, H. Li, X. Cheng, L. Xu, L. Feng, Q. Wan, Heterogeneous mutual knowledge distillation for wearable human activity recognition, IEEE Transactions on Neural Networks and Learning Systems 36 (9) (2025) 16589–16603. doi:10.1109/TNNLS.2025.3556317

  52. [52]

    X. Zhang, R. Yang, D. He, X. Ge, T. Xu, Y. Wang, H. Qin, J. Zhang, Boosting neural representations for videos with a conditional decoder, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 2556–2566

  53. [53]

C. Zhao, Z. Chen, Y. Xu, E. Gu, J. Li, Z. Yi, Q. Wang, J. Yang, Y. Tai, From zero to detail: Deconstructing ultra-high-definition image restoration from progressive spectral perspective, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 17935–17946

  54. [54]

Y. Zhang, Simultaneous learning knowledge distillation for image restoration: Efficient model compression for drones, Drones 9 (3). doi:10.3390/drones9030209. URL: https://www.mdpi.com/2504-446X/9/3/209

  55. [55]

F. Ali Dharejo, B. Alawode, I. Iyappan Ganapathi, L. Wang, A. Mian, R. Timofte, S. Javed, Multi-distillation underwater image super-resolution via wavelet transform, IEEE Access 12 (2024) 131083–131099. doi:10.1109/ACCESS.2024.3449136

  56. [56]

    X. Li, S. Gao, S. Shang, Knowledge distillation method for spatio-temporal tasks: a survey, GeoInformatica 30 (1) (2026) 4

  57. [57]

    Y. Li, C. Yang, H. Zeng, Z. Dong, Z. An, Y. Xu, Y. Tian, H. Wu, Frequency-aligned knowledge distillation for lightweight spatiotemporal forecasting, in: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 7262–7272

  58. [58]

J. Kim, J. Lee, J.-W. Kang, SNeRV: Spectra-preserving neural representation for video, in: A. Leonardis, E. Ricci, S. Roth, O. Russakovsky, T. Sattler, G. Varol (Eds.), Computer Vision – ECCV 2024, Springer Nature Switzerland, Cham, 2025, pp. 332–348

  59. [59]

H. A. Abushahla, D. Varam, A. J. N. Panopio, M. I. AlHajri, Neural network quantization for microcontrollers: A comprehensive survey of methods, platforms, and applications (2026). arXiv:2508.15008. URL: https://arxiv.org/abs/2508.15008

  60. [60]

K. Liu, Q. Zheng, K. Tao, Z. Li, H. Qin, W. Li, Y. Guo, X. Liu, L. Kong, G. Chen, Y. Zhang, X. Yang, Low-bit model quantization for deep neural networks: A survey (2025). arXiv:2505.05530. URL: https://arxiv.org/abs/2505.05530

  61. [61]

J. Yang, X. Shen, J. Xing, X. Tian, H. Li, B. Deng, J. Huang, X.-s. Hua, Quantization networks, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019

  62. [62]

P. Yin, J. Lyu, S. Zhang, S. J. Osher, Y. Qi, J. Xin, Understanding straight-through estimator in training activation quantized neural nets, in: International Conference on Learning Representations, 2019. URL: https://openreview.net/forum?id=Skh4jRcKQ

  63. [63]

M. Schoenbauer, D. Moro, L. Lew, A. G. Howard, Custom gradient estimators are straight-through estimators in disguise (2025). URL: https://openreview.net/forum?id=3j72egd8q1

  64. [64]

M. Nagel, M. Fournarakis, Y. Bondarenko, T. Blankevoort, Overcoming oscillations in quantization-aware training, in: K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, S. Sabato (Eds.), Proceedings of the 39th International Conference on Machine Learning, Vol. 162 of ...

  65. [65]

B. Lee, D. Kim, Y. You, Y.-M. Kim, LittleBit: Ultra low-bit quantization via latent factorization, in: The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL: https://openreview.net/forum?id=zJzu9evD5K

  66. [66]

T. Chen, S. Chen, X. Qu, D. Zhao, R. Yan, J. Ko, L. Liang, P. Cameron, StableQAT: Stable quantization-aware training at ultra-low bitwidths, arXiv preprint arXiv:2601.19320

  67. [67]

    S. Tabesh, M. Safaryan, A. Panferov, A. Volkova, D. Alistarh, Cage: Curvature-aware gradient estimation for accurate quantization-aware training, arXiv preprint arXiv:2510.18784

  68. [68]

S. K. Esser, J. L. McKinstry, D. Bablani, R. Appuswamy, D. S. Modha, Learned step size quantization, in: International Conference on Learning Representations, 2020

  69. [69]

S. Bai, J. Chen, X. Shen, Y. Qian, Y. Liu, Unified data-free compression: Pruning and quantization without fine-tuning, in: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 5876–5885

  70. [70]

M. Li, Z. Huang, L. Chen, J. Ren, M. Jiang, F. Li, J. Fu, C. Gao, Contemporary advances in neural network quantization: A survey, in: 2024 International Joint Conference on Neural Networks (IJCNN), 2024, pp. 1–10. doi:10.1109/IJCNN60899.2024.10650109

  71. [71]

B. Rokh, A. Azarpeyvand, A. Khanteymoori, A comprehensive survey on model quantization for deep neural networks in image classification, ACM Trans. Intell. Syst. Technol. 14 (6). doi:10.1145/3623402. URL: https://doi.org/10.1145/3623402

  72. [72]

S. Cho, S. Yoo, Per-channel quantization level allocation for quantizing convolutional neural networks, in: 2020 IEEE International Conference on Consumer Electronics - Asia (ICCE-Asia), 2020, pp. 1–3. doi:10.1109/ICCE-Asia49877.2020.9276878

  73. [73]

Q. Li, J. Zou, J. Zhang, W. Long, X. Zhou, S. Gu, Texture vector-quantization and reconstruction aware prediction for generative super-resolution, arXiv preprint arXiv:2509.23774

  74. [74]

C. Hong, S. Baik, H. Kim, S. Nah, K. M. Lee, CADyQ: Content-aware dynamic quantization for image super-resolution, in: European Conference on Computer Vision, Springer, 2022, pp. 367–383

  75. [75]

Y. Qin, F. Luo, M. Li, P. Guo, L. Li, Optimization algorithm for 3D image visual communication based on digital image reconstruction, Int. J. Simul. Multidisci. Des. Optim. 16 (2025) 14. doi:10.1051/smdo/2025015. URL: https://doi.org/10.1051/smdo/2025015

  76. [76]

J. Deng, X. Zhang, Z. Yang, C. Zhou, R. Wang, K. Zhang, X. Lv, L. Yang, Z. Wang, P. Li, Z. Ma, Pixel-level regression for UAV hyperspectral images: Deep learning-based quantitative inverse of wheat stripe rust disease index, Computers and Electronics in Agriculture 215 (2023) 108434. doi:10.1016/j.compag.2023.108434. URL: https://www.sciencedire...

  77. [77]

Y. Seo, I. Kim, J. Lee, W. Choi, S. Song, On quantization of convolutional neural networks for image restoration, Electronic Imaging 34 (7) (2022) 183-1–183-1. doi:10.2352/EI.2022.34.7.ISS-183. URL: https://library.imaging.org/ei/articles/34/7/ISS-183

  78. [78]

    Z. Yan, Y. Shi, Y. Wang, M. Tan, Z. Li, W. Tan, Y. Tian, Towards accurate low bit-width quantization with multiple phase adaptations, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, 2020, pp. 6591–6598

  79. [79]

    X. Ouyang, T. Ge, T. Hartvigsen, Z. Zhang, H. Mi, D. Yu, Low-bit quantization favors undertrained llms, in: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025, pp. 32338–32348

  80. [80]

A. Mercat, M. Viitanen, J. Vanne, UVG dataset: 50/120fps 4K sequences for video codec analysis and development, in: Proceedings of the 11th ACM Multimedia Systems Conference, 2020, pp. 297–302

Showing first 80 references.