pith. machine review for the scientific record.

arxiv: 2605.04581 · v1 · submitted 2026-05-06 · 💻 cs.CV

Recognition: unknown

GTF: Omnidirectional EPI Transformer for Light Field Super-Resolution

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 16:23 UTC · model grok-4.3

classification 💻 cs.CV
keywords light field · super-resolution · epipolar plane image · transformer · directional fusion · omnidirectional · NTIRE challenge

The pith

An omnidirectional Transformer that processes all four EPI directions improves light field image super-resolution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents GTF, an omnidirectional EPI Transformer for light field super-resolution. It explicitly models epipolar plane images in horizontal, vertical, 45-degree, and 135-degree directions, unlike prior methods that focused only on horizontal and vertical. The architecture uses directional EPI processing, MacPI-based prior injection, adaptive directional fusion, and a topology-preserving feed-forward network to better capture light field geometry. This approach leads to improved reconstruction on both real-captured and synthetic scenes, as shown by high PSNR scores on benchmarks and competitive challenge rankings. A lightweight variant meets strict efficiency requirements while maintaining strong performance.

Core claim

GTF combines directional EPI processing, MacPI-based prior injection, adaptive directional fusion, and a topology-preserving feed-forward network to explicitly model horizontal, vertical, 45-degree, and 135-degree EPIs in a unified framework for superior light field super-resolution.
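To make "omnidirectional" concrete: the four EPI orientations in the claim can be read off a 4D light field L(u, v, h, w) as 2D slices. The sketch below uses central-view anchoring and integer-grid diagonal sampling as simplifying assumptions; the paper's exact sampling scheme may differ.

```python
import numpy as np

def epi_slices(lf):
    """Extract one EPI per direction from a 4D light field lf[u, v, h, w].

    horizontal: fix (v, h), vary (u, w); vertical: fix (u, w), vary (v, h).
    The diagonal EPIs walk the angular grid along its 45-degree/135-degree
    diagonals at a fixed spatial row -- a simplified integer-pixel
    approximation, not necessarily the paper's exact scheme.
    """
    U, V, H, W = lf.shape
    cu, cv, ch, cw = U // 2, V // 2, H // 2, W // 2

    horizontal = lf[:, cv, ch, :]                # shape (U, W)
    vertical = lf[cu, :, :, cw]                  # shape (V, H)

    n = min(U, V)
    # 45-degree EPI: angular index (i, i) along the main diagonal
    diag45 = np.stack([lf[i, i, ch, :] for i in range(n)])           # (n, W)
    # 135-degree EPI: angular index (i, n-1-i) along the anti-diagonal
    diag135 = np.stack([lf[i, n - 1 - i, ch, :] for i in range(n)])  # (n, W)
    return horizontal, vertical, diag45, diag135
```

In each slice, a scene point traces a line whose slope encodes its disparity; the diagonal slices expose the disparity components that horizontal-vertical-only models never see directly.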

What carries the argument

Omnidirectional EPI Transformer with adaptive directional fusion of four EPI orientations to capture full epipolar geometry.
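The fusion step can be pictured as a learned weighting over the four directional branches. The scalar softmax gate below is a hypothetical stand-in for illustration only; the paper's adaptive directional fusion may be spatially varying or attention-based.

```python
import numpy as np

def adaptive_fusion(branch_feats, logits):
    """Fuse per-direction feature maps with softmax-normalized weights.

    branch_feats: array (D, C, H, W) -- one feature map per EPI direction.
    logits: array (D,) -- learnable scalar gates (a hypothetical stand-in
            for the paper's adaptive fusion module).
    """
    w = np.exp(logits - logits.max())
    w = w / w.sum()                               # softmax over directions
    return np.tensordot(w, branch_feats, axes=1)  # weighted sum -> (C, H, W)
```

With equal logits this reduces to plain averaging; the claim is that letting the weights adapt per scene beats any fixed mixture.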

If this is right

  • GTF achieves 32.78 dB PSNR on five standard benchmarks without additional inference enhancements.
  • The lightweight GTF-Tiny reaches 32.57 dB using only 0.915 million parameters and 19.81 GFLOPs.
  • The model secures 3rd place on two tracks and 4th on one in the NTIRE 2026 LF SR Challenge.
  • Ablation studies validate the contribution of diagonal EPI modeling and the fusion strategy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This directional approach could be adapted to other light field tasks such as depth estimation or novel view synthesis where diagonal disparities matter.
  • The adaptive fusion might apply to multi-directional data in other domains like video processing or medical imaging.
  • Testing the model on more diverse real-world LF datasets could reveal its robustness to noise and varying scene complexities.

Load-bearing premise

That modeling the diagonal 45° and 135° EPIs with adaptive fusion yields a meaningful improvement over horizontal-vertical-only Transformer designs.

What would settle it

Running an ablation study that removes the diagonal EPI branches and measures whether performance drops on the standard benchmarks.
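A minimal harness for that settling experiment might look like the sketch below. `build_gtf`, `shared_hparams`, and `benchmark` are assumed names, not the authors' code; only the PSNR metric and the twin-configuration structure are standard.

```python
import numpy as np

def psnr(pred, ref, peak=1.0):
    """Peak signal-to-noise ratio in dB, the metric the benchmarks report."""
    mse = np.mean((pred - ref) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# Twin configurations that differ ONLY in the set of EPI directions;
# MacPI injection, fusion, FFN, and training settings stay fixed.
ABLATIONS = {
    "hv_only": ("horizontal", "vertical"),
    "omni":    ("horizontal", "vertical", "diag45", "diag135"),
}

# Hypothetical evaluation loop (build_gtf and benchmark are assumed names):
# for name, dirs in ABLATIONS.items():
#     model = build_gtf(directions=dirs, **shared_hparams)
#     scores[name] = np.mean([psnr(model(x), y) for x, y in benchmark])
```

If the "omni" run does not clearly outscore "hv_only" under these matched settings, the load-bearing premise fails.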

Figures

Figures reproduced from arXiv: 2605.04581 by Bihong Li, Fei Wang, Junjie Liu, Kunyu Li, Lichao Zhang.

Figure 1. Overview of the proposed GTF framework. (a) Overall network architecture of GTF with MacPI prior injection, stacked Omni […]
Figure 2. Qualitative comparison on representative real and synthetic benchmark scenes. From left to right, each group shows DistgSSR, […]
original abstract

Light field (LF) image super-resolution benefits from Epipolar Plane Images (EPIs), whose line slopes explicitly encode disparity. However, existing Transformer-based LF SR methods mainly attend to horizontal and vertical EPIs, leaving diagonal epipolar geometry underexplored. We present GTF, an omnidirectional EPI Transformer that explicitly models horizontal, vertical, 45-degree, and 135-degree EPIs within a unified reconstruction framework. GTF combines directional EPI processing, MacPI-based prior injection, adaptive directional fusion, and a topology-preserving feed-forward network to better exploit LF geometry. For the NTIRE 2026 fidelity tracks, we use GTF as the main model, while a lightweight GTF-Tiny variant targets the efficiency track. On five standard LF SR benchmarks covering both real-captured and synthetic scenes, GTF reaches 32.78 dB without inference-time enhancement, and stronger inference settings with EPSW and test-time augmentation further improve performance. Under the NTIRE 2026 efficiency constraint, GTF-Tiny attains 32.57 dB with only 0.915M parameters and 19.81 GFLOPs. In the NTIRE 2026 Light Field Image Super-Resolution Challenge, our submissions rank 3rd on Track 1 and Track 3 and 4th on Track 2. Architecture-evolution, channel-width, and inference analyses further support the effectiveness of diagonal EPI modeling, directional fusion, and the lightweight design.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes GTF, an omnidirectional EPI Transformer for light field super-resolution that explicitly processes horizontal, vertical, 45°, and 135° epipolar plane images via directional EPI branches, MacPI prior injection, adaptive directional fusion, and a topology-preserving FFN. It reports 32.78 dB PSNR on five standard LF SR benchmarks (real and synthetic) without inference enhancements, with GTF-Tiny reaching 32.57 dB under 0.915M parameters and 19.81 GFLOPs for the NTIRE 2026 efficiency track; submissions rank 3rd/4th in the challenge. Architecture-evolution, channel-width, and inference analyses are presented to support the design choices.

Significance. If the performance gains hold under controlled evaluation, the work would provide a concrete demonstration that incorporating diagonal epipolar geometry can improve Transformer-based LF SR beyond horizontal-vertical baselines, with added value from the efficiency variant and challenge results. The empirical focus on standard benchmarks and parameter/FLOP reporting strengthens its practical relevance for multi-view imaging tasks.

major comments (1)
  1. [Architecture-evolution analysis] The headline attribution of the 32.78 dB result to omnidirectional (including 45°/135°) EPI modeling is not supported by an isolated ablation that enables only the diagonal branches while holding MacPI injection, adaptive fusion, the topology-preserving FFN, and all training settings identical to a strict horizontal-vertical baseline. Without this controlled comparison, the gains cannot be separated from capacity increases or fusion effects, which directly weakens the central claim.
minor comments (2)
  1. [Abstract] The abstract states concrete PSNR, parameter, and ranking numbers but does not name the five specific benchmarks or provide error bars/standard deviations; this should be added for reproducibility.
  2. [Methods] Notation for the four directional EPIs and the MacPI prior should be defined with explicit equations or diagrams in the methods section to clarify how 45°/135° slopes are discretized and fused.
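For readers unfamiliar with the MacPI representation the comment refers to: a macro-pixel image interleaves the full U × V block of angular samples inside each spatial position. A plain NumPy sketch of the rearrangement follows; the interleaving layout shown is the common convention, assumed rather than confirmed from the paper.

```python
import numpy as np

def sai_to_macpi(lf):
    """Rearrange a sub-aperture-image stack lf[u, v, h, w] into a
    macro-pixel image of shape (H*U, W*V), where each spatial position
    (h, w) holds its full U x V block of angular samples."""
    U, V, H, W = lf.shape
    return lf.transpose(2, 0, 3, 1).reshape(H * U, W * V)

def macpi_to_sai(macpi, U, V):
    """Invert sai_to_macpi, recovering the (U, V, H, W) stack."""
    HU, WV = macpi.shape
    H, W = HU // U, WV // V
    return macpi.reshape(H, U, W, V).transpose(1, 3, 0, 2)
```

The round trip is lossless, which is what lets a MacPI-based prior be injected and the result mapped back to sub-aperture views.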

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the thorough review and constructive feedback. We address the major comment below and will revise the manuscript to strengthen the evidence for our claims.

point-by-point responses
  1. Referee: [Architecture-evolution analysis] The headline attribution of the 32.78 dB result to omnidirectional (including 45°/135°) EPI modeling is not supported by an isolated ablation that enables only the diagonal branches while holding MacPI injection, adaptive fusion, the topology-preserving FFN, and all training settings identical to a strict horizontal-vertical baseline. Without this controlled comparison, the gains cannot be separated from capacity increases or fusion effects, which directly weakens the central claim.

    Authors: We appreciate the referee highlighting this point. Our architecture-evolution analysis shows incremental gains when adding diagonal EPI branches, but we acknowledge that the current presentation does not include a strictly isolated ablation enabling only the diagonal branches on top of a fixed horizontal-vertical baseline while holding MacPI injection, adaptive directional fusion, topology-preserving FFN, and all training settings identical. To better isolate and substantiate the contribution of omnidirectional (including 45°/135°) EPI modeling to the reported performance, we will perform and include this controlled ablation in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical architecture evaluated on external benchmarks

full rationale

The paper presents GTF as a novel Transformer architecture for light-field super-resolution that processes omnidirectional EPIs, with performance measured directly on five standard external benchmarks (real and synthetic). No equations, derivations, or first-principles predictions are claimed; results are reported as empirical outcomes of training and inference on held-out data. Architecture-evolution and channel-width analyses are internal ablations supporting design choices but do not reduce the headline metrics to quantities defined by the inputs or self-citations. The central claim remains falsifiable against independent test sets and does not collapse by construction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The performance claims rest on standard deep-learning components (attention, residual connections, fusion modules) whose effectiveness is demonstrated empirically rather than derived from first principles; no new physical or mathematical axioms are introduced.

free parameters (1)
  • model hyperparameters and training settings
    Typical learned weights, learning rates, and architectural widths in a Transformer model; not enumerated in the abstract.
axioms (1)
  • domain assumption: Transformer attention layers can capture directional epipolar features when applied to EPI slices
    Invoked by the directional EPI processing blocks described in the abstract.

pith-pipeline@v0.9.0 · 5575 in / 1373 out tokens · 32898 ms · 2026-05-08T16:23:20.669177+00:00 · methodology

discussion (0)

