ProDiG: Progressive Diffusion-Guided Gaussian Splatting for Aerial to Ground Reconstruction
Pith reviewed 2026-05-13 21:58 UTC · model grok-4.3
The pith
ProDiG progressively refines an aerial Gaussian scene into ground-level 3D views by synthesizing intermediate-altitude views with diffusion guidance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ProDiG is a diffusion-guided framework that progressively transforms aerial 3D representations toward ground-level fidelity. It synthesizes intermediate-altitude views and refines the Gaussian representation at each stage using a geometry-aware causal attention module that injects epipolar structure into reference-view diffusion, plus a distance-adaptive Gaussian module that dynamically adjusts scale and opacity based on camera distance.
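The abstract does not give the modulation law for the distance-adaptive module. As a rough illustration of what scale-and-opacity adjustment keyed to camera distance could look like, here is a minimal sketch assuming a simple power law; the function name, the exponents `gamma` and `beta`, and the reference-distance normalization are all our assumptions, not the paper's formula.

```python
import numpy as np

def distance_adaptive_params(base_scale, base_opacity, cam_pos, centers,
                             ref_dist, gamma=1.0, beta=0.5):
    """Hypothetical power-law modulation of per-Gaussian scale and opacity.

    Splats shrink as the camera approaches (finer detail up close) and their
    opacity is damped at close range (to avoid blobby close-ups). The
    functional form and exponents are illustrative guesses only.
    """
    d = np.linalg.norm(centers - cam_pos, axis=-1, keepdims=True)  # per-Gaussian camera distance
    ratio = d / ref_dist                                           # 1.0 at the aerial reference view
    scale = base_scale * ratio ** gamma                            # smaller splats when closer
    opacity = base_opacity * np.minimum(ratio ** beta, 1.0)        # damp opacity only when closer
    return scale, opacity
```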
What carries the argument
Progressive refinement loop that synthesizes intermediate views and applies geometry-aware causal attention together with distance-adaptive Gaussian scaling to maintain consistency across large viewpoint gaps.
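Read structurally, the loop is easy to state. The sketch below is our reading of it, not the authors' code; every callable is a placeholder for a component the paper describes only at a high level.

```python
from typing import Any, Callable, Sequence

def progressive_refine(
    gaussians: Any,
    altitudes: Sequence[float],
    render: Callable[[Any, float], list],
    refine_views: Callable[[list, Any], list],
    refit: Callable[[Any, list, float], Any],
) -> Any:
    """Descend from aerial toward ground level, one intermediate altitude at a time."""
    for altitude in sorted(altitudes, reverse=True):      # highest stage first
        raw = render(gaussians, altitude)                 # render current scene at this altitude
        refined = refine_views(raw, gaussians)            # diffusion cleanup, epipolar-guided
        gaussians = refit(gaussians, refined, altitude)   # update splats, incl. scale/opacity
    return gaussians
```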
If this is right
- Produces ground-level renderings whose visual quality and geometric consistency exceed those of prior single-stage or post-hoc methods.
- Maintains stable reconstruction when viewpoint change is extreme, without requiring any additional ground-truth viewpoints.
- Enables coherent 3D site models from aerial-only input on both synthetic and real-world scenes.
- Supports applications that need ground-level fidelity from drone or satellite imagery alone.
Where Pith is reading between the lines
- The same progressive intermediate-view strategy might reduce domain gaps in other 3D tasks such as indoor-to-outdoor or day-to-night translation.
- If the distance-adaptive scaling generalizes, it could stabilize Gaussian splatting under arbitrary camera trajectories beyond aerial-ground pairs.
- The geometry-aware attention could be tested as a drop-in module for other diffusion-based view-synthesis pipelines to improve epipolar consistency.
Load-bearing premise
Synthesizing and refining through intermediate altitudes will reliably close the gap between aerial and ground viewpoints even when no real ground-truth images exist at any lower height.
What would settle it
Run ProDiG on a dataset containing paired aerial and actual ground-level photographs of the same sites, then measure whether the generated ground renderings match the real photographs in both pixel appearance and 3D geometric alignment within a chosen error threshold.
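A minimal sketch of that test, assuming paired renderings and photographs plus depth maps for the geometric check; the metric choices (PSNR for pixel appearance, mean absolute depth error for 3D alignment) and both thresholds are ours, not a protocol from the paper.

```python
import numpy as np

def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio between a rendered view and a real photograph."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def mean_abs_depth_error(pred_depth, gt_depth):
    """Geometric-alignment proxy: mean |depth error| over pixels with valid GT depth."""
    valid = gt_depth > 0
    return float(np.mean(np.abs(pred_depth[valid] - gt_depth[valid])))

def settles_it(renders, photos, pred_depths, gt_depths,
               psnr_thresh=20.0, depth_thresh=0.5):
    """Pass iff every paired site clears both thresholds (thresholds are illustrative)."""
    return all(
        psnr(r, p) >= psnr_thresh and mean_abs_depth_error(dp, dg) <= depth_thresh
        for r, p, dp, dg in zip(renders, photos, pred_depths, gt_depths)
    )
```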
Original abstract
Generating ground-level views and coherent 3D site models from aerial-only imagery is challenging due to extreme viewpoint changes, missing intermediate observations, and large scale variations. Existing methods either refine renderings post-hoc, often producing geometrically inconsistent results, or rely on multi-altitude ground-truth, which is rarely available. Gaussian Splatting and diffusion-based refinements improve fidelity under small variations but fail under wide aerial-to-ground gaps. To address these limitations, we introduce ProDiG (Progressive Diffusion-Guided Gaussian Splatting for Aerial to Ground Reconstruction), a diffusion-guided framework that progressively transforms aerial 3D representations toward ground-level fidelity. ProDiG synthesizes intermediate-altitude views and refines the Gaussian representation at each stage using a geometry-aware causal attention module that injects epipolar structure into reference-view diffusion. A distance-adaptive Gaussian module dynamically adjusts Gaussian scale and opacity based on camera distance, ensuring stable reconstruction across large viewpoint gaps. Together, these components enable progressive, geometrically grounded refinement without requiring additional ground-truth viewpoints. Extensive experiments on synthetic and real-world datasets demonstrate that ProDiG produces visually realistic ground-level renderings and coherent 3D geometry, significantly outperforming existing approaches in terms of visual quality, geometric consistency, and robustness to extreme viewpoint changes. Project Page: https://sirsh07.github.io/research/prodig
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents ProDiG, a progressive diffusion-guided Gaussian splatting method for reconstructing ground-level views and 3D models from aerial imagery. The approach synthesizes intermediate-altitude views and iteratively refines the Gaussian representation using a geometry-aware causal attention module that incorporates epipolar geometry and a distance-adaptive Gaussian module that adjusts scale and opacity based on camera distance. It claims to achieve superior visual quality, geometric consistency, and robustness to extreme viewpoint changes on both synthetic and real datasets without requiring additional ground-truth ground-level viewpoints.
Significance. If the empirical results and consistency claims are substantiated, this work would represent a meaningful advance in novel view synthesis for large viewpoint gaps, with potential applications in aerial photogrammetry, virtual tourism, and disaster assessment. The progressive refinement strategy combined with explicit geometric constraints in the diffusion process offers a practical solution where prior methods either require multi-altitude data or produce inconsistent geometry.
major comments (1)
- [Method description of progressive refinement] In the description of the geometry-aware causal attention module and distance-adaptive Gaussian module: the paper states that these components inject epipolar structure and dynamically adjust scale/opacity to ensure stable reconstruction, but provides no derivation, bound, or explicit consistency loss (e.g., cycle-consistency or bundle-adjustment term) showing that cumulative geometric drift is prevented across multiple refinement stages. This directly bears on the central claim that the method operates without any ground-truth viewpoints, as diffusion priors could introduce hallucinations not constrained by the initial aerial 3D structure.
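To make the objection concrete: an epipolar attention mask does have at least one standard construction from a fundamental matrix F relating the target and reference views (cf. the localized epipolar constraints of Epidiff [13]). The sketch below shows that construction under our assumptions; the paper may build its mask differently, and nothing here bounds cumulative drift on its own.

```python
import numpy as np

def epipolar_attention_mask(F, query_px, key_px, tol=2.0):
    """Boolean (Q, K) mask: query pixel q may attend to key pixel k only if k
    lies within `tol` pixels of q's epipolar line in the reference view.

    F is the 3x3 fundamental matrix mapping target-view points to reference-
    view epipolar lines (l = F @ q_h). One standard construction, not
    necessarily the paper's exact mask.
    """
    q_h = np.concatenate([query_px, np.ones((len(query_px), 1))], axis=1)  # (Q, 3) homogeneous
    k_h = np.concatenate([key_px, np.ones((len(key_px), 1))], axis=1)      # (K, 3) homogeneous
    lines = q_h @ F.T                                    # (Q, 3): one epipolar line per query
    dist = np.abs(lines @ k_h.T)                         # |ax + by + c| for every (q, k) pair
    dist /= np.sqrt(lines[:, 0] ** 2 + lines[:, 1] ** 2)[:, None] + 1e-8
    return dist <= tol
```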
minor comments (1)
- [Abstract] The abstract asserts outperformance in visual quality and geometric consistency but omits any quantitative metrics, ablation results, or experimental setup details (e.g., dataset names, baseline methods, or evaluation protocols), which would strengthen the summary for readers.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on geometric consistency in our progressive refinement pipeline. We address the single major comment below and outline targeted revisions to strengthen the manuscript.
Point-by-point responses
Referee: [Method description of progressive refinement] In the description of the geometry-aware causal attention module and distance-adaptive Gaussian module: the paper states that these components inject epipolar structure and dynamically adjust scale/opacity to ensure stable reconstruction, but provides no derivation, bound, or explicit consistency loss (e.g., cycle-consistency or bundle-adjustment term) showing that cumulative geometric drift is prevented across multiple refinement stages. This directly bears on the central claim that the method operates without any ground-truth viewpoints, as diffusion priors could introduce hallucinations not constrained by the initial aerial 3D structure.
Authors: We acknowledge that the current manuscript lacks a formal derivation, theoretical bound, or explicit auxiliary loss (such as cycle-consistency) to prove the absence of cumulative drift. The geometry-aware causal attention module enforces epipolar constraints by restricting attention to geometrically corresponding rays derived from the initial aerial Gaussian splats, while the distance-adaptive module modulates Gaussian parameters to preserve scale consistency with camera distance. These mechanisms are intended to ground each diffusion step in the original 3D structure, limiting hallucinations. Our experiments on synthetic data with known ground-truth geometry show low drift in multi-stage metrics (e.g., PSNR and depth error remain stable across refinement stages). We agree that a more explicit analysis would improve rigor. In revision we will (1) add a dedicated paragraph deriving the attention mask from epipolar geometry and (2) include an ablation quantifying drift over refinement stages. Revision: partial.
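The promised drift ablation can be stated in a few lines. A minimal sketch, assuming synthetic scenes with known ground-truth depth and a rendered depth map after each stage; the function name and the depth-error metric are our choices:

```python
import numpy as np

def drift_curve(stage_depths, gt_depth):
    """Mean absolute depth error after each refinement stage on synthetic data.

    A roughly flat curve supports the no-drift claim; monotone growth would
    indicate the diffusion prior pulling geometry away from the aerial
    initialization.
    """
    valid = gt_depth > 0
    return [float(np.mean(np.abs(d[valid] - gt_depth[valid]))) for d in stage_depths]

# Usage: errors = drift_curve(depths_per_stage, gt_depth); plot errors vs. stage index.
```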
Circularity Check
No circularity detected; the method is an algorithmic pipeline without load-bearing derivations or self-referential reductions.
full rationale
The paper describes ProDiG as a progressive diffusion-guided Gaussian splatting pipeline that synthesizes intermediate views and refines representations via geometry-aware causal attention and distance-adaptive Gaussians. No equations, derivations, or fitted parameters are presented that reduce by construction to the inputs. The central claims rest on the described components and experimental validation on synthetic and real-world datasets rather than on any self-definition, fitted-input prediction, or self-citation chain. The approach is self-contained as an engineering framework without mathematical steps that equate outputs to inputs by definition.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Gaussian Splatting can represent 3D scenes from images
- domain assumption: Diffusion models can synthesize coherent novel views when guided
invented entities (2)
- geometry-aware causal attention module: no independent evidence
- distance-adaptive Gaussian module: no independent evidence
Reference graph
Works this paper leans on
- [1] Myron Brown, Michael Chan, and Michael Twardowski. Wriva public data, 2024.
- [2] Eric Ming Chen, Sidhanth Holalkere, Ruyu Yan, Kai Zhang, and Abe Davis. Ray conditioning: Trading photo-consistency for photo-realism in multi-view image generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 23242–23251, 2023.
- [3] Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, and Jianfei Cai. Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images. In European Conference on Computer Vision, pages 370–386. Springer, 2024.
- [4] Yuedong Chen, Chuanxia Zheng, Haofei Xu, Bohan Zhuang, Andrea Vedaldi, Tat-Jen Cham, and Jianfei Cai. Mvsplat360: Feed-forward 360 scene synthesis from sparse views. Advances in Neural Information Processing Systems, 37:107064–107086, 2024.
- [5] Tobias Fischer, Jonas Kulhanek, Samuel Rota Bulo, Lorenzo Porzi, Marc Pollefeys, and Peter Kontschieder. Dynamic 3d gaussian fields for urban areas. arXiv preprint arXiv:2406.03175, 2024.
- [6] Martin A Fischler and Robert C Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981.
- [7] Stephanie Fu, Netanel Tamir, Shobhita Sundaram, Lucy Chai, Richard Zhang, Tali Dekel, and Phillip Isola. Dreamsim: Learning new dimensions of human visual similarity using synthetic data. arXiv preprint arXiv:2306.09344, 2023.
- [8] Ruiyuan Gao, Kai Chen, Zhihao Li, Lanqing Hong, Zhenguo Li, and Qiang Xu. Magicdrive3d: Controllable 3d generation for any-view rendering in street scenes. arXiv preprint arXiv:2405.14475, 2024.
- [9] Zhiyuan Gao, Wenbin Teng, Gonglin Chen, Jinsen Wu, Ningli Xu, Rongjun Qin, Andrew Feng, and Yajie Zhao. Skyeyes: Ground roaming using aerial view images. In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 3045–3054. IEEE, 2025.
- [10] Yujin Ham, Mateusz Michalkiewicz, and Guha Balakrishnan. Dragon: Drone and ground gaussian splatting for 3d building reconstruction. In 2024 IEEE International Conference on Computational Photography (ICCP), pages 1–12. IEEE, 2024.
- [11] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
- [12] Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2d gaussian splatting for geometrically accurate radiance fields. In ACM SIGGRAPH 2024 Conference Papers, pages 1–11, 2024.
- [13] Zehuan Huang, Hao Wen, Junting Dong, Yaohui Wang, Yangguang Li, Xinyuan Chen, Yan-Pei Cao, Ding Liang, Yu Qiao, Bo Dai, et al. Epidiff: Enhancing multi-view synthesis via localized epipolar-constrained diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9784–9794, 2024.
- [14] Lihan Jiang, Kerui Ren, Mulin Yu, Linning Xu, Junting Dong, Tao Lu, Feng Zhao, Dahua Lin, and Bo Dai. Horizon-gs: Unified 3d gaussian splatting for large-scale aerial-to-ground scenes. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 26789–26799, 2025.
- [15] Yash Kant, Aliaksandr Siarohin, Ziyi Wu, Michael Vasilkovsky, Guocheng Qian, Jian Ren, Riza Alp Guler, Bernard Ghanem, Sergey Tulyakov, and Igor Gilitschenski. Spad: Spatially aware multi-view diffusers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10026–10038, 2024.
- [16] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4), 2023.
- [17] Bernhard Kerbl, Andreas Meuleman, Georgios Kopanas, Michael Wimmer, Alexandre Lanvin, and George Drettakis. A hierarchical 3d gaussian representation for real-time rendering of very large datasets. ACM Transactions on Graphics (TOG), 43(4):1–15, 2024.
- [18] Shakiba Kheradmand, Daniel Rebain, Gopal Sharma, Weiwei Sun, Yang-Che Tseng, Hossam Isack, Abhishek Kar, Andrea Tagliasacchi, and Kwang Moo Yi. 3d gaussian splatting as markov chain monte carlo. Advances in Neural Information Processing Systems, 37:80965–80986, 2024.
- [19] Jonas Kulhanek, Songyou Peng, Zuzana Kukelova, Marc Pollefeys, and Torsten Sattler. Wildgaussians: 3d gaussian splatting in the wild. arXiv preprint arXiv:2407.08447, 2024.
- [20] Jie-Ying Lee, Yi-Ruei Liu, Shr-Ruei Tsai, Wei-Cheng Chang, Chung-Ho Wu, Jiewen Chan, Zhenjun Zhao, Chieh Hubert Lin, and Yu-Lun Liu. Skyfall-gs: Synthesizing immersive 3d urban scenes from satellite imagery. arXiv preprint arXiv:2510.15869, 2025.
- [21] Yixuan Li, Lihan Jiang, Linning Xu, Yuanbo Xiangli, Zhenzhi Wang, Dahua Lin, and Bo Dai. Matrixcity: A large-scale city dataset for city-scale neural rendering and beyond. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3205–3215, 2023.
- [22] Xinqi Lin, Jingwen He, Ziyan Chen, Zhaoyang Lyu, Bo Dai, Fanghua Yu, Yu Qiao, Wanli Ouyang, and Chao Dong. Diffbir: Toward blind image restoration with generative diffusion prior. In European Conference on Computer Vision, pages 430–448. Springer, 2024.
- [23] Yang Liu, Chuanchen Luo, Lue Fan, Naiyan Wang, Junran Peng, and Zhaoxiang Zhang. Citygaussian: Real-time high-quality large-scale scene rendering with gaussians. In European Conference on Computer Vision, pages 265–282. Springer, 2024.
- [24] Tao Lu, Mulin Yu, Linning Xu, Yuanbo Xiangli, Limin Wang, Dahua Lin, and Bo Dai. Scaffold-gs: Structured 3d gaussians for view-adaptive rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20654–20664, 2024.
- [25] Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
- [26] Chong Mou, Xintao Wang, Liangbin Xie, Yanze Wu, Jian Zhang, Zhongang Qi, and Ying Shan. T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 4296–4304, 2024.
- [27] Gaurav Parmar, Taesung Park, Srinivasa Narasimhan, and Jun-Yan Zhu. One-step image translation with text-to-image models. arXiv preprint arXiv:2403.12036, 2024.
- [28] Kerui Ren, Lihan Jiang, Tao Lu, Mulin Yu, Linning Xu, Zhangkai Ni, and Bo Dai. Octree-gs: Towards consistent real-time rendering with lod-structured 3d gaussians. arXiv preprint arXiv:2403.17898, 2024.
- [29] Johannes Lutz Schönberger and Jan-Michael Frahm. Structure-from-motion revisited. In Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- [30] Johannes Lutz Schönberger, Enliang Zheng, Marc Pollefeys, and Jan-Michael Frahm. Pixelwise view selection for unstructured multi-view stereo. In European Conference on Computer Vision (ECCV), 2016.
- [31] Vincent Sitzmann, Semon Rezchikov, Bill Freeman, Josh Tenenbaum, and Fredo Durand. Light field networks: Neural scene representations with single-evaluation rendering. Advances in Neural Information Processing Systems, 34:19313–19325, 2021.
- [32] Mohammed Suhail, Carlos Esteves, Leonid Sigal, and Ameesh Makadia. Generalizable patch-based neural rendering. In European Conference on Computer Vision, pages 156–174. Springer, 2022.
- [33] Mohammed Suhail, Carlos Esteves, Leonid Sigal, and Ameesh Makadia. Light field neural rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8269–8279, 2022.
- [34] Jiadong Tang, Yu Gao, Dianyi Yang, Liqi Yan, Yufeng Yue, and Yi Yang. Dronesplat: 3d gaussian splatting for robust 3d reconstruction from in-the-wild drone imagery. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 833–843, 2025.
- [35] Haithem Turki, Deva Ramanan, and Mahadev Satyanarayanan. Mega-nerf: Scalable construction of large-scale nerfs for virtual fly-throughs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12922–12931, 2022.
- [36] Khiem Vuong, Anurag Ghosh, Deva Ramanan, Srinivasa Narasimhan, and Shubham Tulsiani. Aerialmegadepth: Learning aerial-ground reconstruction and view synthesis. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 21674–21684, 2025.
- [37] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
- [38] Jay Zhangjie Wu, Yuxuan Zhang, Haithem Turki, Xuanchi Ren, Jun Gao, Mike Zheng Shou, Sanja Fidler, Zan Gojcic, and Huan Ling. Difix3d+: Improving 3d reconstructions with single-step diffusion models. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 26024–26035, 2025.
- [39] Yuanbo Xiangli, Linning Xu, Xingang Pan, Nanxuan Zhao, Anyi Rao, Christian Theobalt, Bo Dai, and Dahua Lin. Bungeenerf: Progressive neural radiance field for extreme multi-scale scene rendering. In European Conference on Computer Vision, pages 106–122. Springer, 2022.
- [40] Butian Xiong, Nanjun Zheng, Junhua Liu, and Zhen Li. Gauu-scene v2: Assessing the reliability of image-based metrics with expansive lidar image dataset using 3dgs and nerf. arXiv preprint arXiv:2404.04880, 2024.
- [41] Jiacong Xu, Yiqun Mei, and Vishal Patel. Wild-gs: Real-time novel view synthesis from unconstrained photo collections. Advances in Neural Information Processing Systems, 37:103334–103355, 2024.
- [42] Yunzhi Yan, Haotong Lin, Chenxu Zhou, Weijie Wang, Haiyang Sun, Kun Zhan, Xianpeng Lang, Xiaowei Zhou, and Sida Peng. Street gaussians: Modeling dynamic urban scenes with gaussian splatting. In European Conference on Computer Vision, pages 156–173. Springer, 2024.
- [43] Vickie Ye, Ruilong Li, Justin Kerr, Matias Turkulainen, Brent Yi, Zhuoyang Pan, Otto Seiskari, Jianbo Ye, Jeffrey Hu, Matthew Tancik, and Angjoo Kanazawa. gsplat: An open-source library for gaussian splatting. Journal of Machine Learning Research, 26(34):1–17, 2025.
- [44] Wangbo Yu, Jinbo Xing, Li Yuan, Wenbo Hu, Xiaoyu Li, Zhipeng Huang, Xiangjun Gao, Tien-Tsin Wong, Ying Shan, and Yonghong Tian. Viewcrafter: Taming video diffusion models for high-fidelity novel view synthesis. arXiv preprint arXiv:2409.02048, 2024.
- [45] Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, and Andreas Geiger. Mip-splatting: Alias-free 3d gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19447–19456, 2024.
- [46] Chenhao Zhang, Yuanping Cao, and Lei Zhang. Crossview-gs: Cross-view gaussian splatting for large-scale scene reconstruction. arXiv preprint arXiv:2501.01695, 2025.
- [47] Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3836–3847, 2023.
- [48] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018.