GS-STVSR: Ultra-Efficient Continuous Spatio-Temporal Video Super-Resolution via 2D Gaussian Splatting
Pith reviewed 2026-05-10 04:43 UTC · model grok-4.3
The pith
A 2D Gaussian splatting method performs continuous video super-resolution at arbitrary scales without dense grid queries.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GS-STVSR is an ultra-efficient continuous spatio-temporal video super-resolution framework based on 2D Gaussian Splatting that drives the spatiotemporal evolution of Gaussian kernels through continuous motion modeling, bypassing dense grid queries entirely. It exploits the strong temporal stability of covariance parameters for lightweight intermediate fitting, designs an optical flow-guided motion module to derive Gaussian position and color at arbitrary time steps, introduces a covariance resampling alignment module to prevent covariance drift, and proposes an adaptive offset window for large-scale motion. Experiments on Vid4, GoPro, and Adobe240 show state-of-the-art quality across all benchmarks.
What carries the argument
2D Gaussian Splatting representation whose kernels are evolved by an optical flow-guided motion module together with covariance resampling alignment, allowing direct computation of any intermediate frame without grid sampling.
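To make the contrast with dense grid queries concrete, here is a minimal sketch of rendering one frame from a set of 2D Gaussian kernels (center, covariance, color). It uses a normalized weighted sum rather than the paper's actual rasterizer, and all names (`render_frame`, the per-kernel loop) are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def render_frame(means, covs, colors, H, W):
    """Render an H x W RGB frame by splatting 2D Gaussian kernels.

    means:  (N, 2) kernel centers (x, y) in pixel coordinates
    covs:   (N, 2, 2) per-kernel covariance matrices
    colors: (N, 3) per-kernel RGB values in [0, 1]

    Hypothetical sketch: a normalized weighted sum of kernel colors,
    not the paper's exact alpha-compositing rasterizer.
    """
    ys, xs = np.mgrid[0:H, 0:W]
    pix = np.stack([xs, ys], axis=-1).reshape(-1, 2).astype(float)  # (HW, 2)
    acc = np.zeros((H * W, 3))
    wsum = np.zeros((H * W, 1))
    for mu, cov, c in zip(means, covs, colors):
        d = pix - mu                              # offsets to this center
        inv = np.linalg.inv(cov)
        # Mahalanobis-weighted Gaussian falloff at every pixel
        w = np.exp(-0.5 * np.einsum('ni,ij,nj->n', d, inv, d))[:, None]
        acc += w * c
        wsum += w
    return (acc / np.maximum(wsum, 1e-8)).reshape(H, W, 3)
```

The key structural point survives the simplification: the cost is driven by the number of kernels, not by a learned network queried once per output pixel per frame.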
If this is right
- Inference cost for temporal upsampling becomes independent of the number of output frames.
- Extreme-scale frame-rate conversion remains practical on standard hardware.
- Quality stays competitive with slower INR-based methods on established video benchmarks.
- The same Gaussian-kernel motion model can be reused for other continuous video tasks.
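The first bullet hinges on deriving intermediate kernel states without per-pixel work. A minimal linear sketch of flow-guided advection (hypothetical, far simpler than the paper's motion module) shows the shape of such a step:

```python
import numpy as np

def advect_kernels(means, flow, t):
    """Derive kernel centers at an arbitrary time step t in [0, 1] by
    scaling a dense forward optical flow field sampled at each center.

    means: (N, 2) kernel centers (x, y) at t = 0
    flow:  (H, W, 2) forward flow from frame 0 to frame 1 (assumed given)

    Nearest-neighbor flow lookup keeps the sketch short; a real module
    would interpolate the flow and refine the result.
    """
    H, W, _ = flow.shape
    xi = np.clip(np.round(means[:, 0]).astype(int), 0, W - 1)
    yi = np.clip(np.round(means[:, 1]).astype(int), 0, H - 1)
    return means + t * flow[yi, xi]        # (N, 2) centers at time t
```

Each additional output frame costs one such O(N) update plus a render, which is why the per-frame cost does not inherit the linear pixel-query scaling of INR pipelines.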
Where Pith is reading between the lines
- The same constant-time property could be tested on longer uncurated video sequences where motion complexity exceeds the benchmark clips.
- If the motion module generalizes, Gaussian splatting might replace INR pipelines in other domains that currently suffer from query-density scaling.
- Extending the kernels to a 3D spatio-temporal volume could link this approach to continuous novel-view synthesis from video.
Load-bearing premise
Covariance parameters remain stable enough over time that only lightweight fitting is required, and the optical flow module can derive accurate Gaussian positions and colors at large temporal scales without drift.
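The stability half of this premise is directly measurable. A sketch of one plausible probe (the metric and its name are our assumptions, not the paper's protocol): fit covariances independently at several frames and check how much they change between neighbors.

```python
import numpy as np

def covariance_drift(covs_seq):
    """Measure temporal stability of per-kernel covariance parameters.

    covs_seq: (T, N, 2, 2) covariances fitted independently at T frames.
    Returns the mean relative Frobenius change between consecutive
    frames; a small value supports the premise that only lightweight
    intermediate fitting is needed.
    """
    diffs = covs_seq[1:] - covs_seq[:-1]                   # (T-1, N, 2, 2)
    num = np.linalg.norm(diffs, axis=(-2, -1))
    den = np.linalg.norm(covs_seq[:-1], axis=(-2, -1)) + 1e-8
    return float((num / den).mean())
```

A drift that stays near zero on benchmark clips but grows on long uncurated footage would localize exactly where the premise breaks.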
What would settle it
Run the method on a 32x temporal upsampling task and measure both perceptual quality against ground truth and wall-clock inference time; visible motion artifacts or linear growth in runtime with frame count would disprove the constant-time claim.
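The runtime half of that experiment needs only a thin harness. This sketch assumes the method under test is wrapped as a `render_one_frame(t)` callable (a hypothetical interface, not the authors' API):

```python
import time
import numpy as np

def runtime_vs_frames(render_one_frame, frame_counts):
    """Wall-clock probe for the constant-time claim.

    render_one_frame: callable t -> frame, the method under test.
    frame_counts: iterable of temporal upsampling factors to try.

    Returns {k: total seconds to render k intermediate frames}.
    Near-flat totals support the claim; totals that grow linearly
    with k would contradict it.
    """
    results = {}
    for k in frame_counts:
        t0 = time.perf_counter()
        for t in np.linspace(0.0, 1.0, k):
            render_one_frame(t)
        results[k] = time.perf_counter() - t0
    return results
```

Pairing these timings with per-frame perceptual scores against ground truth at X32 would settle both halves of the question at once.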
Original abstract
Continuous Spatio-Temporal Video Super-Resolution (C-STVSR) aims to simultaneously enhance the spatial resolution and frame rate of videos by arbitrary scale factors, offering greater flexibility than fixed-scale methods that are constrained by predefined upsampling ratios. In recent years, methods based on Implicit Neural Representations (INR) have made significant progress in C-STVSR by learning continuous mappings from spatio-temporal coordinates to pixel values. However, these methods fundamentally rely on dense pixel-wise grid queries, causing computational cost to scale linearly with the number of interpolated frames and severely limiting inference efficiency. We propose GS-STVSR, an ultra-efficient C-STVSR framework based on 2D Gaussian Splatting (2D-GS) that drives the spatiotemporal evolution of Gaussian kernels through continuous motion modeling, bypassing dense grid queries entirely. We exploit the strong temporal stability of covariance parameters for lightweight intermediate fitting, design an optical flow-guided motion module to derive Gaussian position and color at arbitrary time steps, introduce a Covariance resampling alignment module to prevent covariance drift, and propose an adaptive offset window for large-scale motion. Extensive experiments on Vid4, GoPro, and Adobe240 show that GS-STVSR achieves state-of-the-art quality across all benchmarks. Moreover, its inference time remains nearly constant at conventional temporal scales (X2--X8) and delivers over X3 speedup at extreme scales X32, demonstrating strong practical applicability.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces GS-STVSR, a framework for continuous spatio-temporal video super-resolution (C-STVSR) based on 2D Gaussian Splatting. It replaces dense pixel-wise INR queries with continuous motion modeling of Gaussian kernels to achieve arbitrary spatial and temporal upsampling. The approach exploits temporal stability of covariance parameters for lightweight fitting, uses an optical flow-guided motion module to derive position and color at arbitrary times, adds a covariance resampling alignment module to avoid drift, and incorporates an adaptive offset window for large-scale motion. Experiments on Vid4, GoPro, and Adobe240 are reported to achieve SOTA quality while keeping inference time nearly constant for X2--X8 scales and delivering over 3x speedup at X32.
Significance. If the covariance stability and drift-free optical-flow motion modeling hold under the reported conditions, the work would offer a meaningful efficiency advance for C-STVSR by removing the linear dependence of compute on the number of interpolated frames. The near-constant inference time across conventional and extreme temporal scales addresses a practical bottleneck in prior INR methods and could support real-world video enhancement pipelines. The SOTA quality claims, if backed by full ablations and fair comparisons, would add to the contribution; the efficiency result at X32 is particularly noteworthy for scalability.
Simulated Author's Rebuttal
We thank the referee for their review and for recognizing the potential of GS-STVSR to address the inference-time bottleneck in continuous spatio-temporal video super-resolution. We note the 'uncertain' recommendation and are happy to provide further details or revisions if specific concerns arise.
Circularity Check
No significant circularity in derivation chain
full rationale
The paper introduces GS-STVSR as a new construction based on 2D Gaussian Splatting for C-STVSR, with explicitly described independent modules (optical flow-guided motion, covariance resampling alignment, adaptive offset window) that bypass dense pixel queries. These are presented as design choices rather than reductions of outputs to inputs by definition or fitted parameters renamed as predictions. Claims of SOTA quality and near-constant inference time rest on benchmark experiments (Vid4, GoPro, Adobe240) and are not shown to collapse into self-citations or self-definitional loops in the abstract or described framework. The derivation chain appears self-contained with external validation via empirical results.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: strong temporal stability of covariance parameters allows lightweight intermediate fitting.