pith. machine review for the scientific record.

arxiv: 2605.02849 · v1 · submitted 2026-05-04 · 💻 cs.CV

Recognition: 2 theorem links


Active Sampling for Ultra-Low-Bit-Rate Video Compression via Conditional Controlled Diffusion

Amirhosein Javadi, Shirin Saeedi Bidokhti, Tara Javidi

Pith reviewed 2026-05-08 18:26 UTC · model grok-4.3

classification 💻 cs.CV
keywords video compression · diffusion models · ultra-low bitrate · conditional generation · keyframe selection · trajectory tracking · perceptual reconstruction

The pith

Sparse adaptive keyframes and tracked trajectories let a conditional diffusion model reconstruct video at much lower bitrates while keeping perceptual quality high.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a video compression approach that transmits only a few keyframes chosen according to content changes and a small set of tracked point trajectories to summarize motion. These compact signals condition a diffusion model that generates the missing frames, targeting the regime where bitrates are too low for conventional codecs to avoid visible artifacts. The design partitions video into variable-length segments so that keyframes are sent only when they add new information and trajectories are selected to respect a rate budget. This yields large measured gains in perceptual metrics on standard test sets compared with learned and diffusion baselines.

Core claim

ActDiff-VC partitions videos into variable-length segments, transmits keyframes only when needed, and summarizes temporal dynamics using a compact set of tracked point trajectories. Conditioned on these sparse signals, a conditional diffusion decoder synthesizes the remaining frames, enabling perceptually realistic reconstruction under severe rate constraints through content-adaptive keyframe selection and budget-aware sparse trajectory selection.

What carries the argument

Content-adaptive keyframe selection and budget-aware sparse trajectory selection that together supply compact conditioning signals to a conditional diffusion decoder for frame synthesis.
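The content-adaptive rule can be sketched in a few lines: a new keyframe is sent at the earliest frame t where the forward-splatted occupancy occ(t) or perceptual similarity sim_perc(t) has stayed below its threshold for L consecutive frames. This is a minimal illustration under assumed interfaces — the occ and sim_perc callables, the function name, and the convention of returning the end of the violating run are not the paper's code:

```python
# Hedged sketch of content-adaptive keyframe selection: advance through the
# segment and place the next keyframe at the earliest frame t where the
# occupancy or perceptual-similarity score has stayed below its threshold
# for L consecutive frames (the paper's Figure 2 visualizes L = 1 and
# theta_occ = theta_perc = 0.8).
def select_next_keyframe(occ, sim_perc, num_frames,
                         theta_occ=0.8, theta_perc=0.8, L=1):
    """occ, sim_perc: callables mapping a frame index t to a score in [0, 1].
    Returns the index of the next keyframe, or num_frames - 1 if the
    segment stays stable to the end."""
    run = 0  # consecutive frames violating a threshold
    for t in range(1, num_frames):
        if occ(t) < theta_occ or sim_perc(t) < theta_perc:
            run += 1
            if run >= L:  # condition has held for L consecutive frames
                return t
        else:
            run = 0
    return num_frames - 1

# Toy usage: occupancy collapses at frame 5, so frame 5 becomes the keyframe.
occ = lambda t: 1.0 if t < 5 else 0.5
print(select_next_keyframe(occ, lambda t: 1.0, num_frames=10))  # 5
```

Sending keyframes only at such breakpoints is what lets segment length vary with content stability, as the bullet on variable-length segmentation above notes.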

If this is right

  • Up to 64.6 percent bitrate reduction at matched NIQE on UVG and MCL-JCV benchmarks.
  • KID improved by up to 64.6 percent and FID by up to 37.7 percent at comparable bitrates versus strong learned codecs.
  • Favorable perceptual rate-distortion curves relative to both learned and diffusion-based baselines in the ultra-low-bitrate regime.
  • Variable-length segmentation reduces the frequency of keyframe transmission when content remains stable.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same sparse-conditioning idea could be tested on live video streams if diffusion sampling speed is improved.
  • Hybrid codecs that fall back to conventional prediction when generative artifacts appear might combine the reported gains with robustness.
  • Extending the trajectory selection to include semantic object tracks could further reduce required bitrate for scenes with moving foregrounds.

Load-bearing premise

Sparse conditioning signals from adaptive keyframes and tracked trajectories are enough for the diffusion decoder to produce temporally coherent frames without major artifacts.

What would settle it

A direct side-by-side comparison on sequences with rapid or complex motion where the selected trajectories fail to capture key dynamics, showing visible flickering or inconsistencies in the synthesized frames.

Figures

Figures reproduced from arXiv: 2605.02849 by Amirhosein Javadi, Shirin Saeedi Bidokhti, Tara Javidi.

Figure 1. Framework of ActDiff-VC. Given the first frame, a dense point tracker estimates the dense tracking field M across subsequent frames. The sparse point selector, guided by a sketch of the first frame, subsamples the dense tracking field to form the conditioning sparse trajectory set P(k). On the decoder side, the diffusion model is conditioned on P(k) together with the first and last frames to reconstruct…

Figure 2. Content-Adaptive Keyframe Selection. The first frame of each segment is forward-splatted through the next frames using a dense tracker, yielding target-space occupancy occ(t) and perceptual similarity sim_perc(t). The next keyframe is selected at the earliest t such that occ(t) < θ_occ or sim_perc(t) < θ_perc holds for L consecutive frames. In this visualization, L = 1 and θ_occ = θ_perc = 0.8. The next segment…

Figure 3. Budget-Aware Sparse Trajectory Selection. Given the current tracking set S (red dots), the dense tracking field is estimated as M̂(· | S) using the RBF kernel interpolation in equation 3. The residual r(p), as defined in equation 11, quantifies the discrepancy between the dense tracking field M(·) and its reconstruction M̂(· | S). Points with the largest sketch-weighted residuals (blue dots) are added to S…

Figure 4. Quantitative comparison on the UVG and MCL-JCV datasets. We report LPIPS, FID, KID, and…

Figure 5. Qualitative comparison on representative sequences from UVG and MCL-JCV. We compare…

Figure 6. Visualization of content-adaptive keyframe selection. We show a frame before a scene change…

Figure 7. Sensitivity of Adaptive GOP thresholds. Heatmaps over occupancy threshold…
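The selection loop described in the Figure 3 caption can be sketched as greedy residual-driven subsampling: interpolate the dense tracking field from the current sparse set S with an RBF kernel, then add the point with the largest (optionally sketch-weighted) residual until the rate budget is met. The kernel form, the seed choice, and all names below are illustrative assumptions standing in for the paper's equations 3 and 11:

```python
import numpy as np

def rbf_interpolate(points, values, query, gamma=1.0):
    """Normalized-RBF interpolation of a tracked displacement field from the
    sparse set `points` — an assumed stand-in for the paper's equation 3.
    points: (k, 2), values: (k, d), query: (n, 2) -> (n, d)."""
    d2 = ((query[:, None, :] - points[None, :, :]) ** 2).sum(-1)  # (n, k)
    w = np.exp(-gamma * d2)
    w /= w.sum(axis=1, keepdims=True)
    return w @ values

def select_trajectories(coords, field, budget, weights=None, gamma=1.0):
    """Greedy budget-aware selection: repeatedly add the point whose
    reconstruction residual r(p) (an analogue of the paper's equation 11,
    optionally sketch-weighted) is largest, until `budget` points are chosen."""
    n = coords.shape[0]
    if weights is None:
        weights = np.ones(n)
    # Seed with the point of largest weighted motion (an assumption; the
    # paper may initialize S differently).
    selected = [int(np.argmax(weights * np.linalg.norm(field, axis=1)))]
    while len(selected) < budget:
        recon = rbf_interpolate(coords[selected], field[selected], coords, gamma)
        resid = np.linalg.norm(field - recon, axis=1) * weights
        resid[selected] = -np.inf  # never reselect a chosen point
        selected.append(int(np.argmax(resid)))
    return selected
```

Each added trajectory maximally reduces the current reconstruction error of the motion field, which is what lets a small budget of points summarize the dynamics of a whole segment.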
Original abstract

Diffusion models provide a powerful generative prior for perceptual reconstruction at ultra-low bitrates, but effective video compression requires controlling the generative process using highly compact conditioning signals. In this work, we present ActDiff-VC, a diffusion-based video compression framework for the ultra-low-bitrate regime. Our method partitions videos into variable-length segments, transmits keyframes only when needed, and summarizes temporal dynamics using a compact set of tracked point trajectories. Conditioned on these sparse signals, a conditional diffusion decoder synthesizes the remaining frames, enabling perceptually realistic reconstruction under severe rate constraints. To support this design, we introduce two mechanisms: content-adaptive keyframe selection and budget-aware sparse trajectory selection, which together enable compact yet effective conditioning for generative reconstruction. Experiments on the UVG and MCL-JCV benchmarks show that ActDiff-VC achieves up to 64.6% bitrate reduction at matched NIQE, improves KID by up to 64.6% and FID by up to 37.7% at comparable bitrates against strong learned codecs, and delivers favorable perceptual rate-distortion trade-offs relative to learned and diffusion-based baselines in the ultra-low-bitrate regime.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents ActDiff-VC, a diffusion-based video compression framework for the ultra-low-bitrate regime. Videos are partitioned into variable-length segments; content-adaptive keyframes and budget-aware tracked point trajectories serve as sparse conditioning signals for a conditional diffusion decoder that synthesizes the remaining frames. Experiments on UVG and MCL-JCV benchmarks report up to 64.6% bitrate reduction at matched NIQE together with KID and FID gains relative to learned and diffusion-based baselines.

Significance. If the empirical claims are substantiated, the work would demonstrate that carefully chosen sparse, adaptive conditioning can steer conditional diffusion models to deliver strong perceptual rate-distortion performance at rates where conventional codecs degrade, thereby extending the applicability of generative priors to practical video compression.

major comments (2)
  1. [Experiments] The headline quantitative gains (64.6% bitrate reduction at matched NIQE, KID/FID improvements) rest on the unverified assumption that sparse keyframe-plus-trajectory conditioning suffices to prevent temporal drift, flickering, or hallucinated motion in the diffusion synthesis. The manuscript provides no dedicated coherence metrics, qualitative frame-by-frame analysis, or ablation removing the trajectory component, leaving the central claim vulnerable to the stress-test concern.
  2. [Method] The methods description of the budget-aware sparse trajectory selection and its injection into the conditional diffusion decoder is insufficiently detailed to allow reproduction or assessment of how motion constraints are enforced across variable-length segments. Without explicit equations for trajectory encoding, conditioning strength, or the diffusion sampling schedule, it is impossible to evaluate whether the reported perceptual improvements are robust or artifact-free.
minor comments (2)
  1. [Abstract] The abstract reports the same numerical value (64.6%) for both bitrate reduction and KID improvement; clarify whether this is coincidental or indicates a reporting error.
  2. [Experiments] Tables comparing against baselines should explicitly state whether the learned and diffusion-based competitors were re-trained or taken from published numbers, and should report variance across sequences.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments below and will revise the manuscript accordingly to improve clarity and validation.

Point-by-point responses
  1. Referee: [Experiments] The headline quantitative gains (64.6% bitrate reduction at matched NIQE, KID/FID improvements) rest on the unverified assumption that sparse keyframe-plus-trajectory conditioning suffices to prevent temporal drift, flickering, or hallucinated motion in the diffusion synthesis. The manuscript provides no dedicated coherence metrics, qualitative frame-by-frame analysis, or ablation removing the trajectory component, leaving the central claim vulnerable to the stress-test concern.

    Authors: We agree that dedicated validation of temporal coherence is important to substantiate the claims. In the revised manuscript, we will add an ablation study isolating the contribution of the trajectory component, include quantitative temporal coherence metrics (e.g., frame-to-frame consistency measures), and provide additional qualitative frame-by-frame visualizations with analysis of motion fidelity and absence of drift or flickering. revision: yes

  2. Referee: [Method] The methods description of the budget-aware sparse trajectory selection and its injection into the conditional diffusion decoder is insufficiently detailed to allow reproduction or assessment of how motion constraints are enforced across variable-length segments. Without explicit equations for trajectory encoding, conditioning strength, or the diffusion sampling schedule, it is impossible to evaluate whether the reported perceptual improvements are robust or artifact-free.

    Authors: We acknowledge that the method description lacks sufficient detail for reproducibility. In the revision, we will expand the relevant sections to include explicit equations and pseudocode for the budget-aware sparse trajectory selection algorithm, the encoding of trajectories as conditioning signals, the mechanism and strength of conditioning injection into the diffusion decoder, and the precise diffusion sampling schedule used across variable-length segments. revision: yes
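As a concrete (and deliberately crude) example of the frame-to-frame consistency measures promised in response 1, one could average consecutive-frame PSNR over a clip. The paper's actual metric is unspecified, so this is only an assumed stand-in; a flow-warped variant would be fairer to genuinely moving content:

```python
import numpy as np

def temporal_consistency(frames):
    """Mean consecutive-frame PSNR (dB) over a clip; `frames` is a
    (T, H, W, C) array with values in [0, 1]. A crude stand-in for a
    flow-warped coherence metric: it penalizes flicker, but it also
    penalizes real motion, which warping by optical flow would credit."""
    frames = np.asarray(frames, dtype=np.float64)
    psnrs = []
    for t in range(len(frames) - 1):
        mse = np.mean((frames[t + 1] - frames[t]) ** 2)
        psnrs.append(10.0 * np.log10(1.0 / max(mse, 1e-12)))  # cap at ~120 dB
    return float(np.mean(psnrs))

# A static clip scores the ~120 dB cap; a hard-flickering clip scores 0 dB.
still = np.zeros((4, 8, 8, 3))
print(round(temporal_consistency(still)))  # 120
```

Under such a measure, a reconstruction that flickers between samples scores far lower than a temporally smooth one even at identical per-frame quality, which is exactly the failure mode the referee's stress test targets.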

Circularity Check

0 steps flagged

No circularity: empirical design validated on external benchmarks

Full rationale

The paper describes an algorithmic framework (ActDiff-VC) that partitions video into segments, selects content-adaptive keyframes and budget-aware trajectories, and uses these as conditioning for a conditional diffusion decoder. All load-bearing claims are experimental performance numbers (bitrate reductions, NIQE/KID/FID improvements) measured on independent UVG and MCL-JCV datasets against external baselines. No equations, derivations, or first-principles results are presented that reduce by construction to fitted parameters, self-definitions, or self-citations; the method choices are design decisions whose efficacy is tested rather than assumed via internal tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no explicit free parameters, axioms, or invented entities are identifiable with precision. The adaptive selection mechanisms likely involve tunable thresholds or budgets that may be chosen or fitted to data, but details are absent.

pith-pipeline@v0.9.0 · 5513 in / 1267 out tokens · 62396 ms · 2026-05-08T18:26:03.996352+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

85 extracted references · 15 canonical work pages · 4 internal anchors
