TVRN: Invertible Neural Networks for Compression-Aware Temporal Video Rescaling

arxiv: 2605.15579 · v1 · pith:OJKE2CHN · submitted 2026-05-15 · 📡 eess.IV · cs.CV

Xinmin Feng, Li Li, Dong Liu, Feng Wu

Pith reviewed 2026-05-19 19:49 UTC · model grok-4.3

keywords: video rescaling · invertible neural networks · frame rate conversion · video compression · temporal wavelet transform · high frequency reconstruction · surrogate network · compression aware

The pith

TVRN's invertible architecture and surrogate network enable end-to-end compression-aware video frame rate rescaling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces TVRN to optimize the downscaling of high-frame-rate videos to low-frame-rate versions and their subsequent upscaling, while accounting for the effects of lossy compression codecs. Prior methods connect these operations only loosely through objectives and overlook codec impacts, resulting in lost high-frequency details. The approach uses an invertible network structure featuring a temporal wavelet transform and high-frequency reconstruction to maintain information, along with a surrogate model that allows gradient propagation through the codec. An asymmetric version adds robustness to varying compression levels using learned features. If the method works as described, videos could maintain higher fidelity when adapted to different frame rates and bandwidths in practical compression pipelines.

Core claim

The authors present TVRN as an end-to-end framework that regularizes high-frequency information loss in frame-rate downscaling through an invertible architecture with Multi-Input Multi-Output Temporal Wavelet Transform and high-frequency reconstruction module, approximates codec gradients via a surrogate network for end-to-end training, and incorporates compression-aware features in an asymmetric architecture via learning-to-rank for robustness under various compression levels.

What carries the argument

Invertible neural network that combines Multi-Input Multi-Output Temporal Wavelet Transform with a high-frequency reconstruction module, using a surrogate network to enable gradients through non-differentiable codecs.
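To make the invertibility idea concrete, here is a minimal sketch of a plain temporal Haar transform written as lifting steps. It is not the paper's MIMO-TWT, which this page describes only at a high level; the names `haar_forward` and `haar_inverse` and the flattened-frame representation are assumptions made for illustration.

```python
from typing import List, Tuple

Frame = List[float]  # a frame flattened to a list of pixel values (assumption)


def haar_forward(frames: List[Frame]) -> Tuple[List[Frame], List[Frame]]:
    """Split an even-length group of frames into low and high temporal bands."""
    if len(frames) % 2 != 0:
        raise ValueError("expected an even number of frames")
    low, high = [], []
    for even, odd in zip(frames[0::2], frames[1::2]):
        # Predict step: the high band is the part of the odd frame
        # that the even frame fails to predict.
        h = [o - e for e, o in zip(even, odd)]
        # Update step: the low band becomes the temporal average of the pair.
        l = [e + 0.5 * d for e, d in zip(even, h)]
        high.append(h)
        low.append(l)
    return low, high


def haar_inverse(low: List[Frame], high: List[Frame]) -> List[Frame]:
    """Undo haar_forward by reversing the lifting steps in opposite order."""
    frames: List[Frame] = []
    for l, h in zip(low, high):
        even = [a - 0.5 * d for a, d in zip(l, h)]  # undo update
        odd = [e + d for e, d in zip(even, h)]      # undo predict
        frames.extend([even, odd])
    return frames
```

Each lifting step is undone by subtracting exactly what was added, so the round trip is lossless as long as both bands are kept intact. In the rescaling setting only the low band is transmitted, and it passes through a lossy codec, which is why the high band has to be reconstructed rather than read back.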

If this is right

Improved reconstruction quality for upscaled videos after lossy compression.
Better handling of high-frequency details that would otherwise be lost in rescaling.
Robust performance across different industrial compression settings and levels.
End-to-end optimization becomes possible for reciprocal downscaling and upscaling operations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar invertible designs could be applied to other video processing tasks involving irreversible operations like compression.
The learning-to-rank strategy for features might help in other adaptive quality optimization scenarios.
Future work could test if the surrogate generalizes to unseen codec types without retraining.

Load-bearing premise

The surrogate network approximates the gradients of lossy codecs accurately enough that the optimized model performs well when deployed with the actual non-differentiable codec.

What would settle it

Train the model with the surrogate, then evaluate reconstruction quality with the true codec and compare against a variant in which the codec is made differentiable or bypassed, to check whether performance holds.

Figures

Figures reproduced from arXiv: 2605.15579 by Dong Liu, Feng Wu, Li Li, Xinmin Feng.

**Figure 1.** Overview of the proposed Temporal Video Rescaling Network (TVRN). The MIMO-TWT followed by MIMO-VRN first decomposes a group of frames x from high-frame-rate (HFR) videos into visually pleasing low-frame-rate (LFR) videos y and high-frequency components z. To reconstruct high-frame-rate videos x̂, the inverse process is applied using the compressed LFR videos ỹ and the reconstructed high-frequency compone…

**Figure 2.** Bidirectional Optical Flow-Guided High-Frequency Component Reconstruction Module. This module reconstructs the high-frequency components z_t at time t using the neighboring frames I_0 and I_1. We first compute the difference between the bidirectionally warped frames as the initial estimate ẑ_init. Then, multi-scale contextual features are aligned to time t using the bidirectional optical flows F_{0→t} and F_{1→t}.…

**Figure 3.** Structure of the proposed surrogate network. The network emulates non-differentiable lossy video codecs by degrading the original frame y(t) into a low-quality frame ŷ(t), using the previously degraded frame ŷ(t−1) as a reference. We first apply the MISO temporal wavelet transform with the motion vector derived from bitstreams, followed by stacked Q-Invertible Blocks, to decompose the temporal frequency …

**Figure 4.** Reconstruction performance gain using different strategies for integrating enhancement modules into our framework, compared to the symmetric rescaling network. The striped box denotes separate models trained for each QP, while the spotted box indicates a single model trained across all four QPs. For a clear comparison of PSNR gains at different compression levels, the pre-trained MIMO-TWT and MIMO-VRN are …

**Figure 5.** Frequency Analysis of Compressed LFR Frames. We visualize the log-magnitude spectrum of LFR frames produced by frame skipping, downscaling, and downscaling-then-upscaling on the SNU-FILM dataset. The overlap ratio ρ_ovl quantifies spectral similarity to frame skipping. Results indicate that as QP decreases, downscaled frames exhibit increasingly rich high-frequency content, complicating subsequent enhanceme…

**Figure 6.** Structure of the LFR restoration module. Built on the classical video enhancement method STDR [57], we integrate compression-aware features learned via a learning-to-rank strategy to adaptively control enhancement strength through a residual shortcut.

**Figure 7.** Qualitative comparison between our method and competitive baselines on sequences 00023 0059, 00026 0036, 00026 0003, and 00025 0034 from the Vimeo90K septuplet dataset [60]. To facilitate visual comparison, we crop and display regions with noticeable interpolation artifacts. The full frames are shown at the bottom right for context. Bold: best performance, Underline: second best performance. Best viewed by…

**Figure 8.** Rate–Distortion Curves on test datasets. For frame-skipping methods [42], [50], [51], fixed QPs of 18, 22, 27, 32, and 37 are used for HEVC and VVC, and 20, 26, 32, 38, and 44 for AV1. For learned frame-rate downscaling methods [7], [8], the QP is slightly increased to align bitrates. The concurrent method CSTVR [8] also employs a partially invertible architecture to embed motion information into LFR video…

**Figure 9.** Rate–Perception Curves on the SNU-FILM Medium testset [69]. Temporal consistency is evaluated using tOF [66] and PSNR_warp [67], while perceptual quality is assessed by LPIPS [65] and VMAF [64]. Red boxes denote metrics where higher values indicate better performance, whereas blue boxes denote metrics where lower values are preferred.

**Figure 10.** Comparison of temporal profiles in reconstructed high-frame-rate videos. We examine a specific row across consecutive frames to assess temporal consistency. The red line indicates row 400 of the GOPR0384 11 05 sequence from the SNU-FILM test dataset. The visualization spans 50 frames starting from the 200th frame. Error-prone regions are marked with green boxes and arrows. Best viewed by zooming in. We ob…

**Figure 11.** Qualitative comparison on sequence 00026 0036 from the Vimeo90K septuplet dataset, showing low-frame-rate frames compressed by HEVC. Compared to frame-skipping, learned frame-rate downscaling methods better suppress color shifts (red box) and preserve fine details (green box). However, since high-frequency motion information is implicitly embedded in the downscaled frames, these methods typically yield lo…

**Figure 12.** Subjective quality comparison in terms of Mean Opinion Score (MOS). Scores are averaged over all users, and error bars indicate standard deviation. Higher MOS values indicate better perceived visual quality.

**Table IV.** Ablation study of different gradient simulation strategies. Experiments are conducted on the Vimeo90K-septuplet test set. BDBR is calculated using GIMM-VFI [51] as the baseline.

**Figure 14.** Ablation study on the guidance loss weight λ. (a) Training loss and validation performance across different λ values. The left subfigure shows HFR reconstruction loss over the first 10,000 iterations, while the right subfigure shows RD performance on the validation set over 50,000 iterations, evaluated every 2,000 iterations. (b) RD performance comparison for reconstructed HFR (left) and downscaled LFR (r…

**Figure 13.** Comparison of predicted compressed frames among compression simulation methods on the YouTube 0000 sequence from the SNU-FILM test dataset with the QP of 37. The second row shows the difference between the predicted and actual compressed frames in the luma channel.

**Figure 15.** Visual comparison of ablation studies on the YouTube 0017 sequence from the SNU-FILM test dataset. [Plot residue: PSNR and SSIM versus BPP for HEVC, VVC, and AV1; legend: RIFE, IFRNet, EMA-VFI, GIMM-VFI, TVRN, Reference.]

**Figure 16.** Rate–distortion curves on 65-frame clips from the SNU-FILM test set. The comparison includes three categories of methods: (1) frame-skipping-based approaches [42], [50], [51]; (2) learned frame-rate downscaling methods [7], [8]; and (3) direct compression of high-frame-rate (HFR) videos using various lossy codecs, which serves as a reference for offline coding efficiency.

**Figure 17.** Visualization of temporal High-Frequency (HF) Components. (a) Overlapped neighbouring and target frames, (b) Ground-truth HF components produced by the downscaler, (c) Initial estimation of HF components using bidirectional optical flow at t = 0.3, as defined in Eq. (9), (d) Reconstructed HF components from our model, (e) Reconstruction without contextual features, (f) Reconstruction produced by stacked D…

**Figure 18.** Visualization of failure cases on the GOPR0881 11 01 sequence from the SNU-FILM test dataset. All experiments are conducted on the SNU-FILM Medium dataset [69] using HEVC codecs under the same configurations as the main experiments. For all methods, we report end-to-end latency, peak memory usage, and RD performance in Table VI. To further assess the behavior under different resource constraints, we eval…

**Figure 13.** Visualization of downscaled videos produced by CSTVR* and our method on the 00026 0036 sequence from the Vimeo test dataset. Both methods produce visually pleasant results. We also show the difference between the downscaled and original frames after compression, amplified 10× for clarity. Lastly, we visualize the motion vector field derived from the HEVC codec. [Plot residue: panel (a) L1 loss with QP17, QP22, QP27, QP32, QP37; panel (b) max…]

**Figure 14.** Visualization of compression-aware features learned with different loss functions using t-SNE [81]. Forty videos from the Vimeo90k test dataset are compressed with a set of different QPs. Rank loss effectively strengthens the clustering of latent variables for medium QPs.

**Figure 15.** Visualization of compression-aware results. From left to right: the compressed low-frame-rate video, the compression-aware feature f_c, and the compensation residues produced by the restoration modules before (#1) and after (#2) upscaling. The visualization of f_c is obtained by averaging the absolute values across all channels. For the compensation residues, negative and positive values are represented by …

**Figure 16.** Python-style code for calculating the log-magnitude spectrum.

**Figure 17.** Empirical validation of Theorem 1. Scatter plot of the gradient error Δg_k = …

**Figure 18.** Detailed Structure of DenseBlock. [Diagram residue: Down Block-1 (7,8,3,2↓), Down Block-2 (16,16,3,2↓), Down Block-3 (32,32,3,2↓), Down Block-4 (64,64,3,2↓); Up Block-1 (16,4,3,2↑), Up Block-2 (32,8,4,2↑), Up Block-3 (64,16,4,2↑), Up Block-4 (128,32,4,2↑); Conv2d (4,4,3,1); Sigmoid(·) * 2 − 1; Sigmoid.]

**Figure 19.** Detailed Structure of Context-aware U-Net. H_ℓ(·) produces k feature maps, then the ℓ-th layer receives k0 + k · (ℓ − 1) input channels, where k0 is the number of channels in the input layer. Here, k0 is set to 24 in our work. We design a context-aware U-Net architecture to hierarchically incorporate multi-scale contextual features into reconstructed high-frequency components, as s…
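The channel count stated in this caption follows the usual dense-connectivity rule, which a few lines of Python make explicit. Only k0 = 24 comes from the caption; the growth rate k = 12 and the layer count used in the usage note are hypothetical values chosen for the example, and `dense_block_input_channels` is an illustrative name.

```python
from typing import List


def dense_block_input_channels(k0: int, k: int, num_layers: int) -> List[int]:
    """Input channels seen by layers 1..num_layers of a densely connected block.

    Layer l receives the k0 input channels plus the k feature maps produced
    by each of the l - 1 preceding layers, i.e. k0 + k * (l - 1).
    """
    if k0 <= 0 or k <= 0 or num_layers <= 0:
        raise ValueError("k0, k and num_layers must be positive")
    return [k0 + k * (layer - 1) for layer in range(1, num_layers + 1)]
```

With k0 = 24 and the assumed k = 12, `dense_block_input_channels(24, 12, 4)` gives 24, 36, 48, and 60 input channels for the four layers.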

**Figure 20.** Detailed Structure of Compression Encoder and Ranker.

**Table VI.** Configuration of the Context-Aware U-Net. “In” and “Out” indicate the input and output channel dimensions, respectively. “K” denotes the kernel size, “S” the stride, and “P” the padding. Columns: Module, Layer Type, In, Out, K, S, P. Down Block-1: Conv2d + PReLU (7, 8, 3×3, 2, 1); Conv2d + PReLU (8, 8, 3×3, 1, 1); Conv2d + PReLU (8, 8, 3×3, 1, 1); Conv2d + PReLU (8, 8, 3×3, 1, 1). Down B…

**Figure 21.** Qualitative comparisons on SNU-FILM datasets [69] with the rounding-based quantization. [Figure residue: columns Ground Truth, GIMM-VFI, and TVRN (Ours) over the 1st to 7th frames with per-frame scores; rows Frame Skipping, TVRN, and TVRN w/o φ; panel (a) Visual comparison of the upscaled…]

**Figure 22.** Qualitative comparisons of upscaled HFR videos on the Vimeo90K Septuplet test dataset [60] with the lossy H.265 codec.

**Figure 23.** Qualitative comparisons of downscaled LFR videos on the Vimeo90K Septuplet test dataset [60] with the lossy H.265 codec.

**Figure 24.** Qualitative comparison of our method and competitive methods on the Vimeo septuplet dataset [60]. We crop the frames for easier comparison and visualize the interpolated frames at the bottom right. Error-prone regions are highlighted with red boxes; best viewed by zooming in.

**Figure 25.** Qualitative comparison of our method and competitive methods on the full-length sequence 00026 0036 from the Vimeo septuplet dataset [60]. We crop the frames for easier comparison and visualize the interpolated frames at the bottom right. TVRN↓ denotes the downscaled low-frame-rate video.

**Figure 26.** Qualitative comparison of our method and competitive methods on the full-length sequence 00023 0059 from the Vimeo septuplet dataset [60]. We crop the frames for easier comparison and visualize the interpolated frames at the bottom right. TVRN↓ denotes the downscaled low-frame-rate video.

**Figure 27.** Qualitative comparison of our method and competitive methods on the full-length sequence 00026 0003 from the Vimeo septuplet dataset [60]. We crop the frames for easier comparison and visualize the interpolated frames at the bottom right. TVRN↓ denotes the downscaled low-frame-rate video.

**Figure 28.** Qualitative comparison of our method and competitive methods on the full-length sequence 00002 0238 from the Vimeo septuplet dataset [60]. We crop the frames for easier comparison and visualize the interpolated frames at the bottom right. TVRN↓ denotes the downscaled low-frame-rate video.
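Figures 5 and 16 refer to a log-magnitude spectrum, but the code in Figure 16 is only available as an image. As a stand-in, the sketch below computes log(1 + |F(u, v)|) with a naive 2-D DFT using only the standard library. In practice `numpy.fft.fft2` would be the usual choice, and the exact scaling and any frequency shift used by the authors are not known from this page; `dft2` and `log_magnitude_spectrum` are illustrative names.

```python
import cmath
import math
from typing import List

Image = List[List[float]]  # small grayscale image as nested lists (assumption)


def dft2(image: Image) -> List[List[complex]]:
    """Naive 2-D discrete Fourier transform; O(N^4), fine for tiny inputs only."""
    rows, cols = len(image), len(image[0])
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for r in range(rows):
                for c in range(cols):
                    angle = -2.0 * math.pi * (u * r / rows + v * c / cols)
                    total += image[r][c] * cmath.exp(1j * angle)
            out[u][v] = total
    return out


def log_magnitude_spectrum(image: Image) -> List[List[float]]:
    """Return log(1 + |F|) per frequency bin; the DC term sits at index [0][0]."""
    spectrum = dft2(image)
    return [[math.log1p(abs(value)) for value in row] for row in spectrum]
```

A constant image puts all of its energy in the DC bin, while a pixel-level checkerboard puts it in the highest-frequency bin, which is a quick way to sanity-check the orientation of the output.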

Abstract

To fit diverse display and bandwidth constraints, high-frame-rate videos are temporally downscaled to low-frame-rate (LFR) and later upscaled, requiring joint optimization for effective frame-rate rescaling. However, existing methods typically link the two operations via training objectives, without fully exploiting their reciprocal nature, which may cause high-frequency information loss. Moreover, they overlook the impact of lossy codecs on LFR videos, limiting real-world applicability. In this work, we propose an end-to-end framework for compression-aware frame-rate rescaling, named TVRN. To regularize high-frequency information lost during frame-rate downscaling, TVRN adopts an invertible architecture that combines a Multi-Input Multi-Output Temporal Wavelet Transform with a high-frequency reconstruction module. To enable end-to-end training through non-differentiable lossy codecs, we design a surrogate network that approximates their gradients. Finally, to improve robustness under various compression levels, we extend TVRN to an asymmetric architecture by incorporating compression-aware features learned via a learning-to-rank strategy. Extensive experiments show that TVRN outperforms existing methods in reconstruction quality under industrial video compression settings. Source code is publicly available at https://github.com/fengxinmin/TVRN_public.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

TVRN pairs an invertible temporal wavelet with a surrogate gradient network and a learning-to-rank asymmetry to optimize frame-rate rescaling through real codecs, but the surrogate remains the part that needs the most scrutiny.

The main takeaway is that this paper gives a concrete end-to-end method for temporal video rescaling that tries to account for lossy compression instead of treating the codec as an afterthought. It uses an invertible Multi-Input Multi-Output Temporal Wavelet Transform plus a high-frequency reconstruction module to limit information loss on the downscaling side, then adds a surrogate network so gradients can flow through non-differentiable codecs during training. The asymmetric version with a learning-to-rank strategy is meant to make the model more robust across different compression levels. Public code is available, which is helpful for anyone who wants to test the claims directly.

Referee Report

2 major / 2 minor

Summary. The paper introduces TVRN, an end-to-end invertible neural network framework for compression-aware temporal video rescaling. It employs a Multi-Input Multi-Output Temporal Wavelet Transform paired with a high-frequency reconstruction module to mitigate information loss during downscaling, a surrogate network to approximate gradients through non-differentiable lossy codecs for joint optimization, and an asymmetric architecture incorporating compression-aware features via a learning-to-rank strategy. Extensive experiments are reported to demonstrate superior reconstruction quality compared to prior methods under industrial video compression settings such as HEVC, VVC, and AV1.

Significance. If the surrogate gradient approximation holds under real codecs, the approach could meaningfully improve video rescaling pipelines by jointly handling frame-rate conversion and compression effects, addressing a practical gap in existing methods. The invertible architecture and public code release support reproducibility and potential follow-on work in learned video codecs.

major comments (2)

[Method section describing surrogate network and gradient approximation] The surrogate network (introduced to enable end-to-end training through non-differentiable codecs) is load-bearing for the central claim of compression-aware optimization. No quantitative validation is provided comparing its gradient approximations to true codec gradients (e.g., via cosine similarity or per-frequency error on H.264/HEVC across QP levels), leaving open the possibility that reported gains reflect surrogate-specific artifacts rather than genuine codec behavior.
[Experiments and results section] The experimental claims of outperformance under industrial compression settings rest on training and evaluation that route through the surrogate. An ablation replacing the surrogate with direct (non-differentiable) codec simulation or post-training evaluation on actual codecs is needed to confirm that the learned model does not overfit to surrogate idiosyncrasies.
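One way to run the check requested in the first major comment is to compare gradient vectors by cosine similarity. The sketch below shows only the metric. How a reference gradient for a real codec would be obtained (for example by finite differences) is left open here, and `cosine_similarity` is an illustrative name.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two flattened gradient vectors."""
    if len(a) != len(b):
        raise ValueError("gradient vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

A value near 1 means the surrogate points in the same descent direction as the reference, a value near 0 means the two are unrelated, and a negative value means the surrogate would push the optimizer the wrong way.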

minor comments (2)

[Architecture description] Clarify the exact formulation of the Multi-Input Multi-Output Temporal Wavelet Transform and how it interfaces with the high-frequency reconstruction module; include a diagram or pseudocode if not already present.
[Quantitative results tables] Add error bars or statistical significance tests to the quantitative tables comparing PSNR/SSIM across methods and compression levels.
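The second minor comment can be illustrated with a short sketch that computes PSNR for one sequence and then summarizes several sequences by mean and sample standard deviation. `psnr` assumes 8-bit samples with a peak of 255, both function names are illustrative, and any per-sequence values fed to `mean_and_std` would come from the reader's own evaluation, not from this page.

```python
import math
import statistics
from typing import Sequence, Tuple


def psnr(reference: Sequence[float], test: Sequence[float], peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized sample lists."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


def mean_and_std(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation, suitable for table error bars."""
    return statistics.mean(values), statistics.stdev(values)
```

Reporting the pair returned by `mean_and_std` next to each table entry would let a reader judge whether a gap between two methods exceeds the spread across test sequences.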

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. We have addressed each of the major comments below and will incorporate revisions to strengthen the validation of the surrogate network and clarify the experimental setup.

Referee: The surrogate network (introduced to enable end-to-end training through non-differentiable codecs) is load-bearing for the central claim of compression-aware optimization. No quantitative validation is provided comparing its gradient approximations to true codec gradients (e.g., via cosine similarity or per-frequency error on H.264/HEVC across QP levels), leaving open the possibility that reported gains reflect surrogate-specific artifacts rather than genuine codec behavior.

Authors: We agree with the referee that quantitative validation of the surrogate's gradient approximations would provide additional confidence in the approach. While the manuscript demonstrates the effectiveness through superior performance under real compression, we will revise the paper to include a dedicated analysis. Specifically, we will compute and report the similarity between the surrogate gradients and gradients approximated from the codec (using methods like straight-through estimation for the quantization steps) across multiple QP values and codecs. This will be added to the experiments section. revision: yes
Referee: The experimental claims of outperformance under industrial compression settings rest on training and evaluation that route through the surrogate. An ablation replacing the surrogate with direct (non-differentiable) codec simulation or post-training evaluation on actual codecs is needed to confirm that the learned model does not overfit to surrogate idiosyncrasies.

Authors: We clarify that the evaluation of TVRN and all compared methods is performed using actual industrial codecs (HEVC, VVC, and AV1) on the temporally rescaled videos, as described in the experimental setup. The surrogate is solely used to facilitate differentiable training. To address the potential for surrogate-specific artifacts, we will add an ablation in the revised manuscript that evaluates the model trained with the surrogate directly on real codecs without any surrogate involvement during inference, and compare it to a non-joint optimization baseline. Note that fully replacing the surrogate with direct simulation for training is challenging due to the non-differentiable nature of the codecs, but the post-training evaluation on actual codecs already supports the generalization of our results. revision: partial
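The straight-through estimation mentioned in the first response can be illustrated on a toy scalar problem. This is a sketch of the generic technique, not of TVRN's surrogate network: the forward pass calls a real, piecewise-constant quantizer, while the backward pass treats the quantizer's derivative as 1. The names `quantize` and `train_gain` and all numeric settings are assumptions.

```python
def quantize(x: float, step: float) -> float:
    """Uniform scalar quantizer; piecewise constant, so its true derivative is 0
    almost everywhere and undefined at the bin edges."""
    return step * round(x / step)


def train_gain(x: float, target: float, step: float,
               lr: float = 0.05, iters: int = 200) -> float:
    """Fit a scalar gain w so that quantize(w * x) approaches target."""
    w = 0.0
    for _ in range(iters):
        y = quantize(w * x, step)       # forward pass uses the real quantizer
        dloss_dy = 2.0 * (y - target)   # derivative of (y - target) ** 2
        dy_dw = x                       # straight-through: d quantize / d input := 1
        w -= lr * dloss_dy * dy_dw      # plain gradient descent step
    return w
```

With the true derivative the update would be zero at every step and `w` would never leave its initial value. The straight-through substitution keeps the optimizer moving until the quantized output lands in the bin that contains the target.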

Circularity Check

0 steps flagged

No significant circularity; empirical framework with independent evaluation

The paper proposes an architectural framework (invertible TVRN with wavelet transform, surrogate gradient approximator, and asymmetric learning-to-rank extension) and supports its claims via extensive experiments on reconstruction quality under real compression. No derivation chain, uniqueness theorem, or fitted parameter is presented as a 'prediction' that reduces by construction to the training inputs or prior self-citations. The surrogate network is a learned component for enabling end-to-end training, but performance is measured against actual codecs on held-out data, keeping the central empirical claim self-contained and externally falsifiable.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 1 invented entity

The framework rests on standard neural network training assumptions plus two paper-specific modeling choices: that a learned surrogate can stand in for codec gradients and that ranking-based features capture compression robustness. No new physical entities are postulated.

free parameters (2)

surrogate network weights
Learned parameters that approximate codec behavior; their values are fitted during end-to-end training.
compression-aware feature extractor weights
Parameters trained via learning-to-rank on different compression levels.
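The page does not give the exact ranking objective, so the following is only a generic pairwise hinge ranking loss, sketched under the assumption that a clip compressed at a higher QP should receive a higher scalar score. `pairwise_rank_loss`, the margin, and any scores passed to it are hypothetical.

```python
from typing import Sequence


def pairwise_rank_loss(scores: Sequence[float], qps: Sequence[int],
                       margin: float = 1.0) -> float:
    """Mean hinge loss over all ordered pairs (i, j) with qps[i] > qps[j].

    A pair contributes zero once scores[i] exceeds scores[j] by at least
    the margin; otherwise it is penalized linearly.
    """
    if len(scores) != len(qps):
        raise ValueError("scores and qps must have the same length")
    total, count = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if qps[i] > qps[j]:
                total += max(0.0, margin - (scores[i] - scores[j]))
                count += 1
    return total / count if count else 0.0
```

A loss of this kind constrains only the ordering of compression levels rather than their absolute values, which is one plausible reason to prefer ranking over direct QP regression for a feature extractor.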

axioms (2)

domain assumption: Invertible networks can perfectly preserve information in the absence of compression and quantization.
Invoked when the Multi-Input Multi-Output Temporal Wavelet Transform is introduced to regularize high-frequency loss.
ad hoc to paper: The surrogate network gradient approximation is close enough to the true codec gradient for stable optimization.
Required for the end-to-end training claim through non-differentiable codecs.
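The domain assumption can be seen in miniature with an additive coupling layer, a standard building block of invertible networks. The sketch below is generic and not taken from the paper; `shift` stands in for an arbitrary learned sub-network and the function names are illustrative. It inverts exactly until the transmitted half is quantized.

```python
from typing import Callable, Tuple


def coupling_forward(x1: float, x2: float,
                     shift: Callable[[float], float]) -> Tuple[float, float]:
    """Additive coupling: y1 = x1, y2 = x2 + shift(x1)."""
    return x1, x2 + shift(x1)


def coupling_inverse(y1: float, y2: float,
                     shift: Callable[[float], float]) -> Tuple[float, float]:
    """Exact inverse: x1 = y1, x2 = y2 - shift(y1).

    shift itself never has to be inverted, so it can be any function.
    """
    return y1, y2 - shift(y1)
```

If `y1` is rounded before the inverse is applied, `shift` is evaluated at a different point and the recovered `x2` drifts away from the original. This is the toy version of why compression and quantization are excluded from the assumption.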

invented entities (1)

surrogate network for codec gradients (no independent evidence)
purpose: Enable back-propagation through non-differentiable lossy codecs during training
New learned component introduced to bypass non-differentiability; no independent falsifiable prediction outside the training loop is provided.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation: unclear

invertible architecture that combines a Multi-Input Multi-Output Temporal Wavelet Transform with a high-frequency reconstruction module... surrogate network that approximates their gradients

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

89 extracted references · 89 canonical work pages · 5 internal anchors

[1]

BETA: bandwidth-efficient temporal adaptation for video streaming over reliable transports,

C. James, M. Wang, and E. Halepovic, “BETA: bandwidth-efficient temporal adaptation for video streaming over reliable transports,” in Proceedings of the 10th ACM Multimedia Systems Conference, 2019, pp. 98–109

work page 2019
[2]

VOXEL: Cross-layer optimization for video streaming with imperfect transmission,

M. Palmer, M. Appel, K. Spiteri, B. Chandrasekaran, A. Feldmann, and R. K. Sitaraman, “VOXEL: Cross-layer optimization for video streaming with imperfect transmission,” inProceedings of the 17th International Conference on emerging Networking EXperiments and Technologies, 2021, pp. 359–374

work page 2021
[3]

Reparo: Qoe-aware live video streaming in low- rate networks by intelligent frame recovery,

F. Wang, Q. Li, W. Shi, G. Tyson, Y . Jiang, L. Ma, P. Zhang, Y . Lan, and Z. Li, “Reparo: Qoe-aware live video streaming in low- rate networks by intelligent frame recovery,” inProceedings of the 31st ACM International Conference on Multimedia, 2023, pp. 9194–9204

work page 2023
[4]

Enabling high quality Real-Time communications with adaptive Frame-Rate,

Z. Meng, T. Wang, Y . Shen, B. Wang, M. Xu, R. Han, H. Liu, V . Arun, H. Hu, and X. Wei, “Enabling high quality Real-Time communications with adaptive Frame-Rate,” in20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23). Boston, MA: USENIX Association, Apr. 2023, pp. 1429–1450

work page 2023
[5]

SAFR: A real- time communication system with adaptive frame rate,

W. Yin, B. Lu, Y . Zhao, J. Xu, L. Song, and W. Zhang, “SAFR: A real- time communication system with adaptive frame rate,” inProceedings of the 1st International Workshop on Networked AI Systems, ser. NetAISys ’23. New York, NY , USA: Association for Computing Machinery,

work page
[6]

Available: https://doi.org/10.1145/3597062.3597277

[Online]. Available: https://doi.org/10.1145/3597062.3597277

work page doi:10.1145/3597062.3597277
[7]

Enabling high frame- rate uhd real-time communication with frame-skipping,

T. Wang, Z. Meng, M. Xu, R. Han, and H. Liu, “Enabling high frame- rate uhd real-time communication with frame-skipping,” inProceedings of the 3rd ACM Workshop on Hot Topics in Video Analytics and Intelligent Edges, 2021, pp. 19–24

work page 2021
[8]

Learning spatio-temporal downsampling for effective video upscaling,

X. Xiang, Y . Tian, V . Rengarajan, L. D. Young, B. Zhu, and R. Ranjan, “Learning spatio-temporal downsampling for effective video upscaling,” inEuropean Conference on Computer Vision. Springer, 2022, pp. 162– 181

work page 2022
[9]

Continuous space-time video resampling with invertible motion steganography,

Y . Zhang and Z. Chen, “Continuous space-time video resampling with invertible motion steganography,” inCVPR, 2025, pp. 2116–2126

[10] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, “Overview of the high efficiency video coding (HEVC) standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649–1668, 2012.

[11] Y.-C. Huang, Y.-H. Chen, C.-Y. Lu, H.-P. Wang, W.-H. Peng, and C.-C. Huang, “Video rescaling networks with joint optimization strategies for downscaling and upscaling,” in CVPR, 2021, pp. 3527–3536.

[12] Y. Tian, G. Lu, X. Min, Z. Che, G. Zhai, G. Guo, and Z. Gao, “Self-conditioned probabilistic learning of video rescaling,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 4490–4499.

[13] G. Ding and C. W. Chen, “Towards omniscient feature alignment for video rescaling,” in ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2024, pp. 4190–4194.

[14] J. Ho, T. Salimans, A. A. Gritsenko, W. Chan, M. Norouzi, and D. J. Fleet, “Video diffusion models,” in Adv. in Neural Inform. Process. Syst., A. H. Oh, A. Agarwal, D. Belgrave, and K. Cho, Eds., 2022. [Online]. Available: https://openreview.net/forum?id=f3zNgKga_ep

[15] Y. Tian, Y. Yan, G. Zhai, L. Chen, and Z. Gao, “CLSA: a contrastive learning framework with selective aggregation for video rescaling,” IEEE Transactions on Image Processing, vol. 32, pp. 1300–1314, 2023.

[16] C. Dong, H. Ma, Z. Li, L. Li, and D. Liu, “Temporal wavelet transform-based low-complexity perceptual quality enhancement of compressed video,” IEEE Transactions on Circuits and Systems for Video Technology, 2023.

[17] F. Iandola, M. Moskewicz, S. Karayev, R. Girshick, T. Darrell, and K. Keutzer, “DenseNet: Implementing efficient ConvNet descriptor pyramids,” arXiv preprint arXiv:1404.1869, 2014.

[18] L. Dinh, D. Krueger, and Y. Bengio, “NICE: Non-linear independent components estimation,” arXiv preprint arXiv:1410.8516, 2014.

[19] L. Dinh, J. Sohl-Dickstein, and S. Bengio, “Density estimation using Real NVP,” arXiv preprint arXiv:1605.08803, 2016.

[20] D. P. Kingma and P. Dhariwal, “Glow: Generative flow with invertible 1x1 convolutions,” Advances in Neural Information Processing Systems, vol. 31, 2018.

ACCEPTED BY IEEE TRANSACTIONS ON IMAGE PROCESSING

[21] H. Li, J. Li, D. Zhao, and L. Xu, “Dehazeflow: Multi-scale conditional flow network for single image dehazing,” in Proceedings of the 29th ACM International Conference on Multimedia, 2021, pp. 2577–2585.

[22] J.-J. Huang and P. L. Dragotti, “Winnet: Wavelet-inspired invertible network for image denoising,” IEEE Transactions on Image Processing, vol. 31, pp. 4377–4392, 2022.

[23] Y. Liu, Z. Qin, S. Anwar, P. Ji, D. Kim, S. Caldwell, and T. Gedeon, “Invertible denoising network: A light solution for real noise removal,” in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 13 360–13 369.

[24] H. Kim, M. Choi, B. Lim, and K. M. Lee, “Task-aware image downscaling,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 399–414.

[25] Y. Li, D. Liu, H. Li, L. Li, Z. Li, and F. Wu, “Learning a convolutional neural network for image compact-resolution,” IEEE Transactions on Image Processing, vol. 28, no. 3, pp. 1092–1107, 2018.

[26] W. Sun and Z. Chen, “Learned image downscaling for upscaling using content adaptive resampler,” IEEE Transactions on Image Processing, vol. 29, pp. 4027–4040, 2020.

[27] Y. Chen, X. Xiao, T. Dai, and S.-T. Xia, “Hrnet: Hamiltonian rescaling network for image downscaling,” in 2020 IEEE International Conference on Image Processing (ICIP). IEEE, 2020, pp. 523–527.

[28] M. Xiao, S. Zheng, C. Liu, Y. Wang, D. He, G. Ke, J. Bian, Z. Lin, and T.-Y. Liu, “Invertible image rescaling,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I 16. Springer, 2020, pp. 126–144.

[29] M. Xiao, S. Zheng, C. Liu, Z. Lin, and T.-Y. Liu, “Invertible rescaling network and its extensions,” International Journal of Computer Vision, vol. 131, no. 1, pp. 134–159, 2023.

[30] Y.-A. Chen, C.-C. Hsiao, W.-H. Peng, and C.-C. Huang, “Direct: Discrete image rescaling with enhancement from case-specific textures,” in 2021 International Conference on Visual Communications and Image Processing (VCIP). IEEE, 2021, pp. 1–5.

[31] J. Liang, A. Lugmayr, K. Zhang, M. Danelljan, L. Van Gool, and R. Timofte, “Hierarchical conditional flow: A unified framework for image super-resolution and image rescaling,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 4076–4085.

[32] Y. Zhu, C. Wang, C. Dong, K. Zhang, H. Gao, and C. Yuan, “High-frequency normalizing flow for image rescaling,” IEEE Transactions on Image Processing, vol. 32, pp. 6223–6233, 2022.

[33] J. Yang, M. Guo, S. Zhao, J. Li, and L. Zhang, “Self-asymmetric invertible network for compression-aware image rescaling,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 3, 2023, pp. 3155–3163.

[34] C. Qi, X. Yang, K. L. Cheng, Y.-C. Chen, and Q. Chen, “Real-time 6K image rescaling with rate-distortion optimization,” in CVPR, 2023, pp. 14 092–14 101.

[35] C. Huang, W. Sun, and Z. Chen, “Learned scale-arbitrary image downscaling for non-learnable upscaling,” IEEE Signal Process. Lett., vol. 30, pp. 264–268, 2023.

[36] C. Wang, Z. Hu, W. Sun, and Z. Chen, “Timestep-aware diffusion model for extreme image rescaling,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 15 594–15 603.

[37] F. Gao, X. Deng, J. Jing, X. Zou, and M. Xu, “Extremely low bit-rate image compression via invertible image generation,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 8, pp. 6993–7004, 2023.

[38] S. Niklaus and F. Liu, “Context-aware synthesis for video frame interpolation,” in Proc. of the IEEE Conf. on Comput. Vis. and Pattern Recog., 2018, pp. 1701–1710.

[39] S. Niklaus and F. Liu, “Softmax splatting for video frame interpolation,” in Proc. of the IEEE Conf. on Comput. Vis. and Pattern Recog., 2020, pp. 5437–5446.

[40] H. Sim, J. Oh, M. Kim, and J. Oh, “XVFI: extreme video frame interpolation,” in Proc. of the IEEE Int. Conf. on Comput. Vis., 2021, pp. 14 489–14 498.

[41] J. Park, K. Ko, C. Lee, and C.-S. Kim, “BMBC: Bilateral motion estimation with bilateral cost volume for video interpolation,” in Comput. Vis.–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIV 16. Springer, 2020, pp. 109–125.

[42] L. Lu, R. Wu, H. Lin, J. Lu, and J. Jia, “Video frame interpolation with transformer,” in Proc. of the IEEE Conf. on Comput. Vis. and Pattern Recog., 2022, pp. 3532–3542.

[43] L. Kong, B. Jiang, D. Luo, W. Chu, X. Huang, Y. Tai, C. Wang, and J. Yang, “IFRNet: Intermediate feature refine network for efficient frame interpolation,” in Proc. of the IEEE Conf. on Comput. Vis. and Pattern Recog., 2022, pp. 1969–1978.

[44] X. Jin, L. Wu, J. Chen, Y. Chen, J. Koo, C.-H. Hahm, and Z.-M. Chen, “Upr-net: A unified pyramid recurrent network for video frame interpolation,” International Journal of Computer Vision, vol. 133, no. 1, pp. 16–30, 2025.

[45] J. Lew, J. Choi, C. Shin, D. Jung, and S. Yoon, “Disentangled motion modeling for video frame interpolation,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 5, 2025, pp. 4607–4615.

[46] C. Liu, G. Zhang, R. Zhao, and L. Wang, “Sparse global matching for video frame interpolation with large motion,” in CVPR, 2024, pp. 19 125–19 134.

[47] T. Ding, L. Liang, Z. Zhu, and I. Zharkov, “CDFI: Compression-driven network design for frame interpolation,” in Proc. of the IEEE Conf. on Comput. Vis. and Pattern Recog., 2021, pp. 8001–8011.

[48] X. Cheng and Z. Chen, “Video frame interpolation via deformable separable convolution,” in Proc. of the AAAI Conf. on Artificial Intell., 2020, pp. 10 607–10 614.

[49] X. Cheng and Z. Chen, “Multiple video frame interpolation via enhanced deformable separable convolution,” IEEE Trans. on Pattern Anal. Mach. Intell., vol. 44, no. 10, pp. 7029–7045, 2022.

[50] D. Danier, F. Zhang, and D. Bull, “Enhancing deformable convolution based video frame interpolation with coarse-to-fine 3D CNN,” in IEEE Int. Conf. on Image Process., 2022, pp. 1396–1400.

[51] G. Zhang, Y. Zhu, H. Wang, Y. Chen, G. Wu, and L. Wang, “Extracting motion and appearance via inter-frame attention for efficient video frame interpolation,” in CVPR, 2023, pp. 5682–5692.

[52] Z. Guo, W. Li, and C. C. Loy, “Generalizable implicit motion modeling for video frame interpolation,” Advances in Neural Information Processing Systems, vol. 37, pp. 63 747–63 770, 2024.

[53] A. Hyvärinen and P. Pajunen, “Nonlinear independent component analysis: Existence and uniqueness results,” Neural Networks, vol. 12, no. 3, pp. 429–439, 1999.

[54] R. L. Claypoole, G. M. Davis, W. Sweldens, and R. G. Baraniuk, “Nonlinear wavelet transforms for image coding via lifting,” IEEE Transactions on Image Processing, vol. 12, no. 12, pp. 1449–1459, 2003.

[55] D. Li, Y. Liu, Z. Wang, and J. Yang, “Video rescaling with recurrent diffusion,” IEEE Transactions on Circuits and Systems for Video Technology, 2024.

[56] X. Jin, L. Wu, G. Shen, Y. Chen, J. Chen, J. Koo, and C.-h. Hahm, “Enhanced bi-directional motion estimation for video frame interpolation,” arXiv preprint arXiv:2206.08572, 2022.

[57] G. Lu, X. Ge, T. Zhong, Q. Hu, and J. Geng, “Preprocessing enhanced image compression for machine vision,” IEEE Transactions on Circuits and Systems for Video Technology, 2024.

[58] D. Luo, M. Ye, S. Li, C. Zhu, and X. Li, “Spatio-temporal detail information retrieval for compressed video quality enhancement,” IEEE Transactions on Multimedia, vol. 25, pp. 6808–6820, 2022.

[59] Y. Wang, T. Isobe, X. Jia, X. Tao, H. Lu, and Y.-W. Tai, “Compression-aware video super-resolution,” in CVPR, 2023, pp. 2012–2021.

[60] F. Ye, B. Liu, L. Li, and D. Liu, “Rate-distortion-optimized deep preprocessing for JPEG compression,” IEEE Transactions on Circuits and Systems for Video Technology, pp. 1–1, 2025.

[61] T. Xue, B. Chen, J. Wu, D. Wei, and W. T. Freeman, “Video enhancement with task-oriented flow,” IJCV, 2019.

[62] K. Soomro, A. R. Zamir, and M. Shah, “UCF101: A dataset of 101 human actions classes from videos in the wild,” arXiv preprint arXiv:1212.0402, 2012.

[63] M. Choi, J. Choi, S. Baik, T. H. Kim, and K. M. Lee, “Scene-adaptive video frame interpolation via meta-learning,” in Proc. of the IEEE Conf. on Comput. Vis. and Pattern Recog., 2020, pp. 9444–9453.

[64] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. on Image Process., vol. 13, no. 4, pp. 600–612, 2004.

[65] Z. Li, A. Aaron, I. Katsavounidis, A. Moorthy, and M. Manohara, “Toward a practical perceptual video quality metric,” The Netflix Tech Blog, vol. 6, no. 2, 2016.

[66] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 586–595.

[67] M. Chu, Y. Xie, J. Mayer, L. Leal-Taixé, and N. Thuerey, “Learning temporal coherence via self-supervision for GAN-based video generation,” ACM Transactions on Graphics (TOG), vol. 39, no. 4, p. 75, 2020.

[68] W.-S. Lai, J.-B. Huang, O. Wang, E. Shechtman, E. Yumer, and M.-H. Yang, “Learning blind video temporal consistency,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 170–185.

[69] Z. Teed and J. Deng, “RAFT: Recurrent all-pairs field transforms for optical flow,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part II, vol. 16. Springer, 2020, pp. 402–419.

[70] M. Choi, H. Kim, B. Han, N. Xu, and K. M. Lee, “Channel attention is all you need for video frame interpolation,” in AAAI, 2020.

[71] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in Int. Conf. on Learn. Represent., 2015.

[72] J. Han, B. Li, D. Mukherjee, C.-H. Chiang, A. Grange, C. Chen, H. Su, S. Parker, S. Deng, U. Joshi, Y. Chen, Y. Wang, P. Wilkins, Y. Xu, and J. Bankoski, “A technical overview of AV1,” Proceedings of the IEEE, vol. 109, no. 9, pp. 1435–1462, 2021.

[73] B. Bross, Y.-K. Wang, Y. Ye, S. Liu, J. Chen, G. J. Sullivan, and J.-R. Ohm, “Overview of the versatile video coding (VVC) standard and its applications,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 10, pp. 3736–3764, 2021.

[74] A. Wieckowski, J. Brandenburg, T. Hinz, C. Bartnik, V. George, G. Hege, C. Helmrich, A. Henkel, C. Lehmann, C. Stoffers, I. Zupancic, B. Bross, and D. Marpe, “Vvenc: An open and optimized vvc encoder implementation,” in Proc. IEEE International Conference on Multimedia Expo Workshops (ICMEW), pp. 1–2.

[75] International Telecommunication Union, “Methods for the subjective assessment of video quality, audio quality and audiovisual quality of internet video and distribution quality television in any environment,” International Telecommunication Union (ITU), Geneva, Switzerland, Recommendation ITU-T P.913, Mar. 2016.

[76] International Telecommunication Union, “Methodology for the subjective assessment of the quality of television pictures,” International Telecommunication Union (ITU), Geneva, Switzerland, Recommendation ITU-R BT.500-13, Jun. 2012.

[77] G. Bjontegaard, “Calculation of average PSNR differences between RD-curves,” VCEG, Tech. Rep. VCEG-M33, 2001.

[78] Y. Bengio, N. Léonard, and A. Courville, “Estimating or propagating gradients through stochastic neurons for conditional computation,” arXiv preprint arXiv:1308.3432, 2013.

[79] D. Sun, X. Yang, M.-Y. Liu, and J. Kautz, “PWC-net: CNNs for optical flow using pyramid, warping, and cost volume,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8934–8943.

[80] Z. Huang, T. Zhang, W. Heng, B. Shi, and S. Zhou, “Real-time intermediate flow estimation for video frame interpolation,” in Proceedings of the European Conference on Computer Vision (ECCV), 2022.

Showing first 80 references.

[1] [1]

BETA: bandwidth-efficient temporal adaptation for video streaming over reliable transports,

C. James, M. Wang, and E. Halepovic, “BETA: bandwidth-efficient temporal adaptation for video streaming over reliable transports,” in Proceedings of the 10th ACM Multimedia Systems Conference, 2019, pp. 98–109

work page 2019

[2] [2]

VOXEL: Cross-layer optimization for video streaming with imperfect transmission,

M. Palmer, M. Appel, K. Spiteri, B. Chandrasekaran, A. Feldmann, and R. K. Sitaraman, “VOXEL: Cross-layer optimization for video streaming with imperfect transmission,” inProceedings of the 17th International Conference on emerging Networking EXperiments and Technologies, 2021, pp. 359–374

work page 2021

[3] [3]

Reparo: Qoe-aware live video streaming in low- rate networks by intelligent frame recovery,

F. Wang, Q. Li, W. Shi, G. Tyson, Y . Jiang, L. Ma, P. Zhang, Y . Lan, and Z. Li, “Reparo: Qoe-aware live video streaming in low- rate networks by intelligent frame recovery,” inProceedings of the 31st ACM International Conference on Multimedia, 2023, pp. 9194–9204

work page 2023

[4] [4]

Enabling high quality Real-Time communications with adaptive Frame-Rate,

Z. Meng, T. Wang, Y . Shen, B. Wang, M. Xu, R. Han, H. Liu, V . Arun, H. Hu, and X. Wei, “Enabling high quality Real-Time communications with adaptive Frame-Rate,” in20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23). Boston, MA: USENIX Association, Apr. 2023, pp. 1429–1450

work page 2023

[5] [5]

SAFR: A real- time communication system with adaptive frame rate,

W. Yin, B. Lu, Y . Zhao, J. Xu, L. Song, and W. Zhang, “SAFR: A real- time communication system with adaptive frame rate,” inProceedings of the 1st International Workshop on Networked AI Systems, ser. NetAISys ’23. New York, NY , USA: Association for Computing Machinery,

work page

[6] [6]

Available: https://doi.org/10.1145/3597062.3597277

[Online]. Available: https://doi.org/10.1145/3597062.3597277

work page doi:10.1145/3597062.3597277

[7] [7]

Enabling high frame- rate uhd real-time communication with frame-skipping,

T. Wang, Z. Meng, M. Xu, R. Han, and H. Liu, “Enabling high frame- rate uhd real-time communication with frame-skipping,” inProceedings of the 3rd ACM Workshop on Hot Topics in Video Analytics and Intelligent Edges, 2021, pp. 19–24

work page 2021

[8] [8]

Learning spatio-temporal downsampling for effective video upscaling,

X. Xiang, Y . Tian, V . Rengarajan, L. D. Young, B. Zhu, and R. Ranjan, “Learning spatio-temporal downsampling for effective video upscaling,” inEuropean Conference on Computer Vision. Springer, 2022, pp. 162– 181

work page 2022

[9] [9]

Continuous space-time video resampling with invertible motion steganography,

Y . Zhang and Z. Chen, “Continuous space-time video resampling with invertible motion steganography,” inCVPR, 2025, pp. 2116–2126

work page 2025

[10] [10]

Overview of the high efficiency video coding (HEVC) standard,

G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, “Overview of the high efficiency video coding (HEVC) standard,”IEEE Transactions on circuits and systems for video technology, vol. 22, no. 12, pp. 1649– 1668, 2012

work page 2012

[11] [11]

Video rescaling networks with joint optimization strategies for downscaling and upscaling,

Y .-C. Huang, Y .-H. Chen, C.-Y . Lu, H.-P. Wang, W.-H. Peng, and C.-C. Huang, “Video rescaling networks with joint optimization strategies for downscaling and upscaling,” inCVPR, 2021, pp. 3527–3536

work page 2021

[12] [12]

Self- conditioned probabilistic learning of video rescaling,

Y . Tian, G. Lu, X. Min, Z. Che, G. Zhai, G. Guo, and Z. Gao, “Self- conditioned probabilistic learning of video rescaling,” inProceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 4490–4499

work page 2021

[13] [13]

Towards omniscient feature alignment for video rescaling,

G. Ding and C. W. Chen, “Towards omniscient feature alignment for video rescaling,” inICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2024, pp. 4190–4194

work page 2024

[14] [14]

Video diffusion models,

J. Ho, T. Salimans, A. A. Gritsenko, W. Chan, M. Norouzi, and D. J. Fleet, “Video diffusion models,” inAdv. in Neural Inform. Process. Syst., A. H. Oh, A. Agarwal, D. Belgrave, and K. Cho, Eds., 2022. [Online]. Available: https://openreview.net/forum?id=f3zNgKga ep

work page 2022

[15] [15]

CLSA: a contrastive learning framework with selective aggregation for video rescaling,

Y . Tian, Y . Yan, G. Zhai, L. Chen, and Z. Gao, “CLSA: a contrastive learning framework with selective aggregation for video rescaling,”IEEE Transactions on Image Processing, vol. 32, pp. 1300–1314, 2023

work page 2023

[16] [16]

Temporal wavelet transform- based low-complexity perceptual quality enhancement of compressed video,

C. Dong, H. Ma, Z. Li, L. Li, and D. Liu, “Temporal wavelet transform- based low-complexity perceptual quality enhancement of compressed video,”IEEE Transactions on Circuits and Systems for Video Technol- ogy, 2023

work page 2023

[17] [17]

DenseNet: Implementing Efficient ConvNet Descriptor Pyramids

F. Iandola, M. Moskewicz, S. Karayev, R. Girshick, T. Darrell, and K. Keutzer, “Densenet: Implementing efficient convnet descriptor pyra- mids,”arXiv preprint arXiv:1404.1869, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[18] [18]

NICE: Non-linear Independent Components Estimation

L. Dinh, D. Krueger, and Y . Bengio, “Nice: Non-linear independent components estimation,”arXiv preprint arXiv:1410.8516, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[19] [19]

Density estimation using Real NVP

L. Dinh, J. Sohl-Dickstein, and S. Bengio, “Density estimation using real nvp,”arXiv preprint arXiv:1605.08803, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[20] [20]

Glow: Generative flow with invertible 1x1 convolutions,

D. P. Kingma and P. Dhariwal, “Glow: Generative flow with invertible 1x1 convolutions,”Advances in neural information processing systems, vol. 31, 2018. 16 ACCEPTED BY IEEE TRANSACTIONS ON IMAGE PROCESSING

work page 2018

[21] [21]

Dehazeflow: Multi-scale conditional flow network for single image dehazing,

H. Li, J. Li, D. Zhao, and L. Xu, “Dehazeflow: Multi-scale conditional flow network for single image dehazing,” inProceedings of the 29th ACM International Conference on Multimedia, 2021, pp. 2577–2585

work page 2021

[22] [22]

Winnet: Wavelet-inspired invertible network for image denoising,

J.-J. Huang and P. L. Dragotti, “Winnet: Wavelet-inspired invertible network for image denoising,”IEEE Transactions on Image Processing, vol. 31, pp. 4377–4392, 2022

work page 2022

[23] [23]

Invertible denoising network: A light solution for real noise removal,

Y . Liu, Z. Qin, S. Anwar, P. Ji, D. Kim, S. Caldwell, and T. Gedeon, “Invertible denoising network: A light solution for real noise removal,” in2021 IEEE/CVF Conference on Computer Vision and Pattern Recog- nition (CVPR), 2021, pp. 13 360–13 369

work page 2021

[24] [24]

Task-aware image down- scaling,

H. Kim, M. Choi, B. Lim, and K. M. Lee, “Task-aware image down- scaling,” inProceedings of the European conference on computer vision (ECCV), 2018, pp. 399–414

work page 2018

[25] [25]

Learning a convolutional neural network for image compact-resolution,

Y . Li, D. Liu, H. Li, L. Li, Z. Li, and F. Wu, “Learning a convolutional neural network for image compact-resolution,”IEEE Transactions on Image Processing, vol. 28, no. 3, pp. 1092–1107, 2018

work page 2018

[26] [26]

Learned image downscaling for upscaling using content adaptive resampler,

W. Sun and Z. Chen, “Learned image downscaling for upscaling using content adaptive resampler,”IEEE Transactions on Image Processing, vol. 29, pp. 4027–4040, 2020

work page 2020

[27] [27]

Hrnet: Hamiltonian rescaling network for image downscaling,

Y . Chen, X. Xiao, T. Dai, and S.-T. Xia, “Hrnet: Hamiltonian rescaling network for image downscaling,” in2020 IEEE International Conference on Image Processing (ICIP). IEEE, 2020, pp. 523–527

work page 2020

[28] [28]

Invertible image rescaling,

M. Xiao, S. Zheng, C. Liu, Y . Wang, D. He, G. Ke, J. Bian, Z. Lin, and T.-Y . Liu, “Invertible image rescaling,” inComputer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I 16. Springer, 2020, pp. 126–144

work page 2020

[29] [29]

Invertible rescaling network and its extensions,

M. Xiao, S. Zheng, C. Liu, Z. Lin, and T.-Y . Liu, “Invertible rescaling network and its extensions,”International Journal of Computer Vision, vol. 131, no. 1, pp. 134–159, 2023

work page 2023

[30] [30]

Direct: Discrete image rescaling with enhancement from case-specific textures,

Y .-A. Chen, C.-C. Hsiao, W.-H. Peng, and C.-C. Huang, “Direct: Discrete image rescaling with enhancement from case-specific textures,” in2021 International Conference on Visual Communications and Image Processing (VCIP). IEEE, 2021, pp. 1–5

work page 2021

[31] [31]

Hierarchical conditional flow: A unified framework for image super-resolution and image rescaling,

J. Liang, A. Lugmayr, K. Zhang, M. Danelljan, L. Van Gool, and R. Timofte, “Hierarchical conditional flow: A unified framework for image super-resolution and image rescaling,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 4076–4085

work page 2021

[32] [32]

High- frequency normalizing flow for image rescaling,

Y . Zhu, C. Wang, C. Dong, K. Zhang, H. Gao, and C. Yuan, “High- frequency normalizing flow for image rescaling,”IEEE Transactions on Image Processing, vol. 32, pp. 6223–6233, 2022

work page 2022

[33] [33]

Self-asymmetric invert- ible network for compression-aware image rescaling,

J. Yang, M. Guo, S. Zhao, J. Li, and L. Zhang, “Self-asymmetric invert- ible network for compression-aware image rescaling,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 3, 2023, pp. 3155–3163

work page 2023

[34] [34]

Real-time 6k image rescaling with rate-distortion optimization,

C. Qi, X. Yang, K. L. Cheng, Y .-C. Chen, and Q. Chen, “Real-time 6k image rescaling with rate-distortion optimization,” inCVPR, 2023, pp. 14 092–14 101

work page 2023

[35] [35]

Learned scale-arbitrary image down- scaling for non-learnable upscaling,

C. Huang, W. Sun, and Z. Chen, “Learned scale-arbitrary image down- scaling for non-learnable upscaling,”IEEE Signal Process. Lett., vol. 30, pp. 264–268, 2023

work page 2023

[36] [36]

Timestep-aware diffusion model for extreme image rescaling,

C. Wang, Z. Hu, W. Sun, and Z. Chen, “Timestep-aware diffusion model for extreme image rescaling,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 15 594–15 603

work page 2025

[37] [37]

Extremely low bit-rate image compression via invertible image generation,

F. Gao, X. Deng, J. Jing, X. Zou, and M. Xu, “Extremely low bit-rate image compression via invertible image generation,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 8, pp. 6993– 7004, 2023

work page 2023

[38] [38]

Context-aware synthesis for video frame interpo- lation,

S. Niklaus and F. Liu, “Context-aware synthesis for video frame interpo- lation,” inProc. of the IEEE Conf. on Comput. Vis. and Pattern Recog., 2018, pp. 1701–1710

work page 2018

[39] ——, “Softmax splatting for video frame interpolation,” in Proc. of the IEEE Conf. on Comput. Vis. and Pattern Recog., 2020, pp. 5437–5446

[40] H. Sim, J. Oh, M. Kim, and J. Oh, “XVFI: extreme video frame interpolation,” in Proc. of the IEEE Int. Conf. on Comput. Vis., 2021, pp. 14 489–14 498

[41] J. Park, K. Ko, C. Lee, and C.-S. Kim, “BMBC: Bilateral motion estimation with bilateral cost volume for video interpolation,” in Comput. Vis.–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIV 16. Springer, 2020, pp. 109–125

[42] L. Lu, R. Wu, H. Lin, J. Lu, and J. Jia, “Video frame interpolation with transformer,” in Proc. of the IEEE Conf. on Comput. Vis. and Pattern Recog., 2022, pp. 3532–3542

[43] L. Kong, B. Jiang, D. Luo, W. Chu, X. Huang, Y. Tai, C. Wang, and J. Yang, “IFRNet: Intermediate feature refine network for efficient frame interpolation,” in Proc. of the IEEE Conf. on Comput. Vis. and Pattern Recog., 2022, pp. 1969–1978

[44] X. Jin, L. Wu, J. Chen, Y. Chen, J. Koo, C.-H. Hahm, and Z.-M. Chen, “Upr-net: A unified pyramid recurrent network for video frame interpolation,” International Journal of Computer Vision, vol. 133, no. 1, pp. 16–30, 2025

[45] J. Lew, J. Choi, C. Shin, D. Jung, and S. Yoon, “Disentangled motion modeling for video frame interpolation,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 5, 2025, pp. 4607–4615

[46] C. Liu, G. Zhang, R. Zhao, and L. Wang, “Sparse global matching for video frame interpolation with large motion,” in CVPR, 2024, pp. 19 125–19 134

[47] T. Ding, L. Liang, Z. Zhu, and I. Zharkov, “CDFI: Compression-driven network design for frame interpolation,” in Proc. of the IEEE Conf. on Comput. Vis. and Pattern Recog., 2021, pp. 8001–8011

[48] X. Cheng and Z. Chen, “Video frame interpolation via deformable separable convolution,” in Proc. of the AAAI Conf. on Artificial Intell., 2020, pp. 10 607–10 614

[49] ——, “Multiple video frame interpolation via enhanced deformable separable convolution,” IEEE Trans. on Pattern Anal. Mach. Intell., vol. 44, no. 10, pp. 7029–7045, 2022

[50] D. Danier, F. Zhang, and D. Bull, “Enhancing deformable convolution based video frame interpolation with coarse-to-fine 3d CNN,” in IEEE Int. Conf. on Image Process., 2022, pp. 1396–1400

[51] G. Zhang, Y. Zhu, H. Wang, Y. Chen, G. Wu, and L. Wang, “Extracting motion and appearance via inter-frame attention for efficient video frame interpolation,” in CVPR, 2023, pp. 5682–5692

[52] Z. Guo, W. Li, and C. C. Loy, “Generalizable implicit motion modeling for video frame interpolation,” Advances in Neural Information Processing Systems, vol. 37, pp. 63 747–63 770, 2024

[53] A. Hyvärinen and P. Pajunen, “Nonlinear independent component analysis: Existence and uniqueness results,” Neural networks, vol. 12, no. 3, pp. 429–439, 1999

[54] R. L. Claypoole, G. M. Davis, W. Sweldens, and R. G. Baraniuk, “Nonlinear wavelet transforms for image coding via lifting,” IEEE Transactions on Image Processing, vol. 12, no. 12, pp. 1449–1459, 2003

[55] D. Li, Y. Liu, Z. Wang, and J. Yang, “Video rescaling with recurrent diffusion,” IEEE Transactions on Circuits and Systems for Video Technology, 2024

[56] X. Jin, L. Wu, G. Shen, Y. Chen, J. Chen, J. Koo, and C.-h. Hahm, “Enhanced bi-directional motion estimation for video frame interpolation,” arXiv preprint arXiv:2206.08572, 2022

[57] G. Lu, X. Ge, T. Zhong, Q. Hu, and J. Geng, “Preprocessing enhanced image compression for machine vision,” IEEE Transactions on Circuits and Systems for Video Technology, 2024

[58] D. Luo, M. Ye, S. Li, C. Zhu, and X. Li, “Spatio-temporal detail information retrieval for compressed video quality enhancement,” IEEE Transactions on Multimedia, vol. 25, pp. 6808–6820, 2022

[59] Y. Wang, T. Isobe, X. Jia, X. Tao, H. Lu, and Y.-W. Tai, “Compression-aware video super-resolution,” in CVPR, 2023, pp. 2012–2021

[60] F. Ye, B. Liu, L. Li, and D. Liu, “Rate-distortion-optimized deep preprocessing for jpeg compression,” IEEE Transactions on Circuits and Systems for Video Technology, pp. 1–1, 2025

[61] T. Xue, B. Chen, J. Wu, D. Wei, and W. T. Freeman, “Video enhancement with task-oriented flow,” IJCV, 2019

[62] K. Soomro, A. R. Zamir, and M. Shah, “UCF101: A dataset of 101 human actions classes from videos in the wild,” arXiv preprint arXiv:1212.0402, 2012

[63] M. Choi, J. Choi, S. Baik, T. H. Kim, and K. M. Lee, “Scene-adaptive video frame interpolation via meta-learning,” in Proc. of the IEEE Conf. on Comput. Vis. and Pattern Recog., 2020, pp. 9444–9453

[64] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. on Image Process., vol. 13, no. 4, pp. 600–612, 2004

[65] Z. Li, A. Aaron, I. Katsavounidis, A. Moorthy, and M. Manohara, “Toward a practical perceptual video quality metric,” The Netflix Tech Blog, vol. 6, no. 2, 2016

[66] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 586–595

[67] M. Chu, Y. Xie, J. Mayer, L. Leal-Taixé, and N. Thuerey, “Learning temporal coherence via self-supervision for gan-based video generation,” ACM Transactions on Graphics (TOG), vol. 39, no. 4, p. 75, 2020

[68] W.-S. Lai, J.-B. Huang, O. Wang, E. Shechtman, E. Yumer, and M.-H. Yang, “Learning blind video temporal consistency,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 170–185

[69] Z. Teed and J. Deng, “Raft: Recurrent all-pairs field transforms for optical flow,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part II, vol. 16. Springer, 2020, pp. 402–419

[70] M. Choi, H. Kim, B. Han, N. Xu, and K. M. Lee, “Channel attention is all you need for video frame interpolation,” in AAAI, 2020

[71] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in Int. Conf. on Learn. Represent., 2015

[72] J. Han, B. Li, D. Mukherjee, C.-H. Chiang, A. Grange, C. Chen, H. Su, S. Parker, S. Deng, U. Joshi, Y. Chen, Y. Wang, P. Wilkins, Y. Xu, and J. Bankoski, “A technical overview of av1,” Proceedings of the IEEE, vol. 109, no. 9, pp. 1435–1462, 2021

[73] B. Bross, Y.-K. Wang, Y. Ye, S. Liu, J. Chen, G. J. Sullivan, and J.-R. Ohm, “Overview of the versatile video coding (vvc) standard and its applications,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 10, pp. 3736–3764, 2021

[74] A. Wieckowski, J. Brandenburg, T. Hinz, C. Bartnik, V. George, G. Hege, C. Helmrich, A. Henkel, C. Lehmann, C. Stoffers, I. Zupancic, B. Bross, and D. Marpe, “Vvenc: An open and optimized vvc encoder implementation,” in Proc. IEEE International Conference on Multimedia Expo Workshops (ICMEW), pp. 1–2

[75] International Telecommunication Union, “Methods for the subjective assessment of video quality, audio quality and audiovisual quality of internet video and distribution quality television in any environment,” International Telecommunication Union (ITU), Geneva, Switzerland, Recommendation ITU-T P.913, Mar. 2016

[76] ——, “Methodology for the subjective assessment of the quality of television pictures,” International Telecommunication Union (ITU), Geneva, Switzerland, Recommendation ITU-R BT.500-13, Jun. 2012

[77] G. Bjontegaard, “Calculation of average PSNR differences between RD-curves,” VCEG, Tech. Rep. VCEG-M33, 2001

[78] Y. Bengio, N. Léonard, and A. Courville, “Estimating or propagating gradients through stochastic neurons for conditional computation,” arXiv preprint arXiv:1308.3432, 2013

[79] D. Sun, X. Yang, M.-Y. Liu, and J. Kautz, “PWC-net: CNNs for optical flow using pyramid, warping, and cost volume,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 8934–8943

[80] Z. Huang, T. Zhang, W. Heng, B. Shi, and S. Zhou, “Real-time intermediate flow estimation for video frame interpolation,” in Proceedings of the European Conference on Computer Vision (ECCV), 2022