arxiv: 2605.26115 · v1 · pith:AKWE6OUWnew · submitted 2026-05-25 · 💻 cs.CV

TriSplat: Simulation-Ready Feed-Forward 3D Scene Reconstruction

Pith reviewed 2026-06-29 22:08 UTC · model grok-4.3

classification 💻 cs.CV
keywords feed-forward 3D reconstruction · triangle primitives · simulation-ready meshes · sparse-view reconstruction · novel view synthesis · mesh extraction · Gaussian splatting

The pith

TriSplat reconstructs scenes as oriented triangle meshes in one forward pass for direct use in physics engines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that representing scenes with surface triangles instead of Gaussians allows a feed-forward network to output meshes ready for simulation and collision detection without extra conversion steps. This matters because Gaussian-based methods still need expensive post-processing to produce usable geometry, which breaks the promise of instant reconstruction, especially when camera poses must be estimated jointly from sparse views. The approach builds triangle orientations by deriving normals from predicted point maps, refining them via an image-conditioned head, and forming local frames, with training stabilized by a mono-normal bootstrap and progressive opacity and blur schedules. A sympathetic reader would care because the output can feed straight into standard rendering pipelines and embodied interaction systems.

Core claim

TriSplat is a feed-forward reconstruction network that represents scenes with oriented triangle primitives and directly exports simulation-ready mesh scenes from a single forward pass. Given input images, the network predicts local 3D point maps, triangle attributes, camera poses, and optional intrinsics. Rather than regressing triangle orientation as an unconstrained latent variable, the method constructs geometry normals from the predicted point maps, refines them with an image-conditioned normal head, and converts them into stable local frames for triangle parameterization. A mono-normal bootstrap schedule further stabilizes early training, while opacity and blur scheduling progressively sharpens the learned surface representation for direct mesh extraction.

What carries the argument

Oriented triangle primitives whose local frames are derived from point-map normals refined by an image-conditioned normal head.
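
The geometric part of this idea can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract gives no formulas, so the central-difference normal, the helper-axis frame construction, and the names `point_map_normal` and `local_frame` are all assumptions made for illustration. The learned image-conditioned refinement step is omitted entirely.

```python
import math

Vec = tuple[float, float, float]

def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v: Vec) -> Vec:
    n = math.sqrt(dot(v, v))
    if n < 1e-12:
        raise ValueError("degenerate vector")
    return (v[0] / n, v[1] / n, v[2] / n)

def point_map_normal(pm: list[list[Vec]], r: int, c: int) -> Vec:
    """Normal at interior pixel (r, c) of a point map pm[r][c] = (x, y, z).

    Hypothetical helper: uses central differences along the two image axes
    and takes their cross product. The sign depends on the coordinate
    convention; a real pipeline would likely flip normals toward the camera.
    """
    du = sub(pm[r][c + 1], pm[r][c - 1])  # tangent along image columns
    dv = sub(pm[r + 1][c], pm[r - 1][c])  # tangent along image rows
    return normalize(cross(du, dv))

def local_frame(n: Vec) -> tuple[Vec, Vec, Vec]:
    """Right-handed orthonormal frame (t, b, n) around a unit normal n.

    Hypothetical helper: picks a helper axis that is not nearly parallel
    to n, so the cross product stays well conditioned.
    """
    helper = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    t = normalize(cross(helper, n))
    b = cross(n, t)
    return t, b, n

# Example: a flat 3x3 point map lying in the plane z = 1.
plane = [[(float(c), float(r), 1.0) for c in range(3)] for r in range(3)]
normal = point_map_normal(plane, 1, 1)
tangent, bitangent, _ = local_frame(normal)
```

Deriving the frame from the normal, rather than regressing a free rotation, leaves only the in-plane degrees of freedom to be predicted, which is one plausible reading of why the abstract calls the resulting frames "stable".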

If this is right

  • Output meshes can be ingested directly by physics engines, collision detectors, and standard rendering pipelines without conversion.
  • Reconstructions are more faithful to scene geometry than those produced by Gaussian feed-forward baselines.
  • Novel-view rendering quality stays competitive with existing methods on RealEstate10K and DL3DV.
  • The network jointly estimates scene structure and camera parameters from sparse pose-free observations.
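
To make the "no conversion" point concrete, here is a minimal sketch of writing a list of triangles to Wavefront OBJ text, a format most physics engines and renderers can load. The paper's actual export format is not stated in the abstract; the function name `triangles_to_obj` and the exact-coordinate vertex deduplication are assumptions for illustration only.

```python
def triangles_to_obj(triangles):
    """Serialize triangles as Wavefront OBJ text.

    `triangles` is a list of triangles, each a sequence of three (x, y, z)
    vertices. Vertices with identical coordinates are merged so that
    adjacent triangles share indices (an illustrative choice, not something
    the paper describes).
    """
    index = {}      # vertex coordinates -> 1-based OBJ index
    v_lines = []
    f_lines = []
    for tri in triangles:
        ids = []
        for v in tri:
            key = tuple(float(x) for x in v)
            if key not in index:
                index[key] = len(index) + 1  # OBJ indices start at 1
                v_lines.append("v {:.6f} {:.6f} {:.6f}".format(*key))
            ids.append(index[key])
        f_lines.append("f {} {} {}".format(*ids))
    return "\n".join(v_lines + f_lines) + "\n"

# Example: two triangles forming a unit square in the plane z = 0.
square = [
    ((0, 0, 0), (1, 0, 0), (0, 1, 0)),
    ((1, 0, 0), (1, 1, 0), (0, 1, 0)),
]
obj_text = triangles_to_obj(square)
```

A file in this form loads anywhere OBJ is accepted, but that only demonstrates format compatibility. Whether the geometry behaves well under simulation is a separate question.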

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The triangle output could let robotics systems run physics-based planning directly on images captured in the field.
  • The normal-refinement and scheduling techniques might transfer to other explicit primitive representations to improve surface stability.
  • If the method scales to larger scenes, it could shorten the pipeline from casual video capture to interactive simulation.

Load-bearing premise

Deriving and refining normals from point maps produces accurate stable triangles that need no post-hoc fixes for mesh extraction.

What would settle it

Running the exported triangles through a physics engine and finding that they require additional cleanup or produce unstable collisions would show the simulation-readiness claim does not hold.
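
A cheaper first check than a full physics run is to count how many faces share each edge. The sketch below assumes the export is an indexed triangle mesh with shared vertices, which the abstract does not confirm; if each triangle primitive carries its own vertices, every edge would be reported as a boundary edge. The function name `edge_report` is hypothetical, and the check covers edge-manifoldness only, not self-intersections or consistent face orientation.

```python
from collections import Counter

def edge_report(faces):
    """Count face usage per undirected edge of an indexed triangle mesh.

    `faces` is a list of (i, j, k) vertex-index triples. An edge used by
    one face is a boundary edge; an edge used by more than two faces is
    non-manifold. The mesh is called watertight here when neither occurs.
    """
    counts = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1
    boundary = sum(1 for n in counts.values() if n == 1)
    non_manifold = sum(1 for n in counts.values() if n > 2)
    return {
        "edges": len(counts),
        "boundary": boundary,
        "non_manifold": non_manifold,
        "watertight": boundary == 0 and non_manifold == 0,
    }

# Example: a closed tetrahedron versus a single open triangle.
tetra = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
tetra_report = edge_report(tetra)
open_report = edge_report([(0, 1, 2)])
```

High boundary or non-manifold counts on exported scenes would suggest that cleanup is needed before collision handling, which bears directly on the load-bearing premise above.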

Original abstract

Sparse-view 3D reconstruction is increasingly addressed with feed-forward splatting networks that predict explicit primitives directly from images. Yet most existing methods remain centered on Gaussian primitives and expose surfaces only indirectly: extracting a usable mesh for downstream simulation, physics reasoning, or embodied interaction still requires expensive post-hoc steps that break the feed-forward promise. This limitation is especially pronounced in pose-free settings, where scene structure and camera parameters must be estimated jointly from sparse observations. We present TriSplat, a feed-forward reconstruction network that represents scenes with oriented triangle primitives and directly exports simulation-ready mesh scenes from a single forward pass. Given input images, the network predicts local 3D point maps, triangle attributes, camera poses, and optional intrinsics. Rather than regressing triangle orientation as an unconstrained latent variable, our approach constructs geometry normals from the predicted point maps, refines them with an image-conditioned normal head, and converts them into stable local frames for triangle parameterization. A mono-normal bootstrap schedule further stabilizes early training, while opacity and blur scheduling progressively sharpens the learned surface representation for direct mesh extraction. Experiments on RealEstate10K and DL3DV show that this representation produces more geometry-faithful reconstructions than Gaussian feed-forward baselines while maintaining competitive novel-view rendering quality. Because the rendering primitives are themselves surface triangles, the output can be directly ingested by physics engines, collision detectors, and standard rendering pipelines without any conversion, making it a practical simulation-ready solution for feed-forward 3D scene reconstruction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces TriSplat, a feed-forward network for sparse-view 3D scene reconstruction that predicts oriented triangle primitives directly from images. It constructs geometry normals from predicted point maps, refines them with an image-conditioned normal head, and uses scheduling for stable training to produce simulation-ready meshes. Experiments on RealEstate10K and DL3DV demonstrate improved geometry faithfulness over Gaussian baselines with competitive rendering quality.

Significance. If the simulation-readiness claim holds, the work would meaningfully advance feed-forward reconstruction by eliminating post-hoc mesh extraction steps and enabling direct integration into physics and collision pipelines.

major comments (2)
  1. [Abstract] The central claim that 'the output can be directly ingested by physics engines, collision detectors, and standard rendering pipelines without any conversion' is load-bearing for the title and contribution but is unsupported by evidence; all reported experiments are confined to geometry faithfulness and novel-view synthesis quality on RealEstate10K and DL3DV, with no measurements of manifoldness, self-intersection rates, or dynamic stability.
  2. [Abstract] The statement that the representation 'produces more geometry-faithful reconstructions than Gaussian feed-forward baselines' is presented without any quantitative metrics, baseline names, tables, or error analysis, preventing assessment of whether the geometry improvement is meaningful or statistically reliable.
minor comments (1)
  1. [Abstract] The mono-normal bootstrap schedule, opacity scheduling, and blur scheduling are mentioned only at a high level; a brief description of their implementation or an ablation would clarify their role in producing stable triangles.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract claims. We address each point below and will revise the manuscript to ensure all statements are appropriately supported by the presented evidence.

Point-by-point responses
  1. Referee: [Abstract] The central claim that 'the output can be directly ingested by physics engines, collision detectors, and standard rendering pipelines without any conversion' is load-bearing for the title and contribution but is unsupported by evidence; all reported experiments are confined to geometry faithfulness and novel-view synthesis quality on RealEstate10K and DL3DV, with no measurements of manifoldness, self-intersection rates, or dynamic stability.

    Authors: We agree that the simulation-readiness claim would benefit from additional supporting evidence. The representation uses oriented triangles, which are a standard primitive directly compatible with mesh-based pipelines and require no conversion step, but the manuscript provides no quantitative checks on manifoldness, self-intersections, or simulation stability. We will revise the abstract to qualify the claim (emphasizing format compatibility rather than untested downstream performance) and add a short discussion or appendix with basic mesh-quality statistics derived from the existing outputs. revision: yes

  2. Referee: [Abstract] The statement that the representation 'produces more geometry-faithful reconstructions than Gaussian feed-forward baselines' is presented without any quantitative metrics, baseline names, tables, or error analysis, preventing assessment of whether the geometry improvement is meaningful or statistically reliable.

    Authors: The abstract summarizes results that are quantified in the experiments section (specific baselines, geometry metrics, and tables). To make this immediately verifiable from the abstract, we will incorporate key quantitative comparisons and baseline references directly into the abstract text. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical method with no self-referential derivations

full rationale

The paper presents a neural architecture that predicts point maps, normals, and triangle attributes from images, with a training schedule for stabilization. These are design and implementation choices, not mathematical derivations or predictions that reduce to fitted inputs by construction. The simulation-ready claim follows directly from the choice of triangle primitives rather than from any tautological equation or parameter fit. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked in the provided text. The work is checked against external benchmarks through the reported experiments on RealEstate10K and DL3DV.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no equations, training details, or parameter counts; free parameters, axioms, and invented entities cannot be enumerated from available text.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Latent Spatial Memory for Video World Models

    cs.CV 2026-06 unverdicted novelty 6.0

    Mirage stores and queries 3D scene information in diffusion latent space via depth-guided lifting and warping, yielding 10.57× faster generation and 55× smaller memory than explicit RGB point-cloud baselines while rea...

Reference graph

Works this paper leans on

81 extracted references · 27 canonical work pages · cited by 1 Pith paper · 7 internal anchors

  1. [1]

    Splatsim: Zero-shot sim2real transfer of rgb manipulation policies using gaussian splatting.arXiv preprint arXiv:2409.10161, 2024

    Mohammad Nomaan Qureshi, Sparsh Garg, Francisco Yandun, David Held, George Kantor, and Abhisesh Silwal. Splatsim: Zero-shot sim2real transfer of rgb manipulation policies using gaussian splatting.arXiv preprint arXiv:2409.10161, 2024

  2. [2]

    Embodiedsplat: Personalized real-to-sim-to-real navigation with gaussian splats from a mobile device

    Gunjan Chhablani, Xiaomeng Ye, Muhammad Zubair Irshad, and Zsolt Kira. Embodiedsplat: Personalized real-to-sim-to-real navigation with gaussian splats from a mobile device. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 25431– 25441, 2025

  3. [3]

    Structure-from-motion revisited

    Johannes Lutz Schönberger and Jan-Michael Frahm. Structure-from-motion revisited. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2016

  4. [4]

    Pixelwise view selection for unstructured multi-view stereo

    Johannes Lutz Schönberger, Enliang Zheng, Marc Pollefeys, and Jan-Michael Frahm. Pixelwise view selection for unstructured multi-view stereo. InEuropean Conference on Computer Vision, 2016

  5. [5]

    Multi-view stereo: A tutorial.Foundations and trends®in Computer Graphics and Vision, 9(1-2):1–148, 2015

    Yasutaka Furukawa, Carlos Hernández, et al. Multi-view stereo: A tutorial.Foundations and trends®in Computer Graphics and Vision, 9(1-2):1–148, 2015

  6. [6]

    pixelnerf: Neural radiance fields from one or few images

    Alex Yu, Vickie Ye, Matthew Tancik, and Angjoo Kanazawa. pixelnerf: Neural radiance fields from one or few images. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4578–4587, 2021

  7. [7]

    Mvsnerf: Fast generalizable radiance field reconstruction from multi-view stereo

    Anpei Chen, Zexiang Xu, Fuqiang Zhao, Xiaoshuai Zhang, Fanbo Xiang, Jingyi Yu, and Hao Su. Mvsnerf: Fast generalizable radiance field reconstruction from multi-view stereo. In IEEE/CVF International Conference on Computer Vision, pages 14124–14133, 2021. 13

  8. [8]

    Mvsplat: Efficient3dgaussiansplattingfromsparsemulti-view images

    Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-JenCham, andJianfeiCai. Mvsplat: Efficient3dgaussiansplattingfromsparsemulti-view images. InEuropean Conference on Computer Vision, pages 370–386. Springer, 2024

  9. [9]

    Advances in feed-forward 3d reconstruction and view synthesis: A survey.arXiv preprint arXiv:2507.14501, 2025

    Jiahui Zhang, Yuelei Li, Anpei Chen, Muyu Xu, Kunhao Liu, Jianyuan Wang, Xiao-Xiao Long, Hanxue Liang, Zexiang Xu, Hao Su, et al. Advances in feed-forward 3d reconstruction and view synthesis: A survey.arXiv preprint arXiv:2507.14501, 2025

  10. [10]

    3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42(4):139–1, 2023

    Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42(4):139–1, 2023

  11. [11]

    pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction

    David Charatan, Sizhe Lester Li, Andrea Tagliasacchi, and Vincent Sitzmann. pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19457–19467, 2024

  12. [12]

    Depthsplat: Connecting gaussian splatting and depth

    Haofei Xu, Songyou Peng, Fangjinhua Wang, Hermann Blum, Daniel Barath, Andreas Geiger, and Marc Pollefeys. Depthsplat: Connecting gaussian splatting and depth. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

  13. [13]

    VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction

    Weijie Wang, Yeqing Chen, Zeyu Zhang, Hengyu Liu, Haoxiao Wang, Zhiyuan Feng, Wenkang Qin, Zheng Zhu, Donny Y. Chen, and Bohan Zhuang. Volsplat: Rethinking feed-forward 3d gaussian splatting with voxel-aligned prediction.arXiv preprint arXiv:2509.19297, 2025

  14. [14]

    Dust3r: Geometric 3d vision made easy

    Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, and Jerome Revaud. Dust3r: Geometric 3d vision made easy. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20697–20709, 2024

  15. [15]

    Flare: Feed-forward geometry, appearance and camera estimation from uncalibrated sparse views

    Shangzhan Zhang, Jianyuan Wang, Yinghao Xu, Nan Xue, Christian Rupprecht, Xiaowei Zhou, Yujun Shen, and Gordon Wetzstein. Flare: Feed-forward geometry, appearance and camera estimation from uncalibrated sparse views. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21936–21947, 2025

  16. [16]

    arXiv preprint arXiv:2503.11651 (2025)

    Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny. Vggt: Visual geometry grounded transformer.arXiv preprint arXiv:2503.11651, 2025

  17. [17]

    No pose, no problem: Surprisingly simple 3d gaussian splats from sparse unposed images,

    Botao Ye, Sifei Liu, Haofei Xu, Xueting Li, Marc Pollefeys, Ming-Hsuan Yang, and Songyou Peng. No pose, no problem: Surprisingly simple 3d gaussian splats from sparse unposed images.arXiv preprint arXiv:2410.24207, 2024

  18. [18]

    Yonosplat: You only need one model for feedforward 3d gaussian splatting.arXiv preprint arXiv:2511.07321, 2025

    Botao Ye, Boqi Chen, Haofei Xu, Daniel Barath, and Marc Pollefeys. Yonosplat: You only need one model for feedforward 3d gaussian splatting.arXiv preprint arXiv:2511.07321, 2025

  19. [19]

    2d gaussian splat- ting for geometrically accurate radiance fields

    Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2d gaussian splat- ting for geometrically accurate radiance fields. InACM SIGGRAPH Conference Proceedings, pages 1–11, 2024

  20. [20]

    Gaussian opacity fields: Efficient adaptive surface reconstruction in unbounded scenes.ACM Transactions on Graphics, 43(6):1–13, 2024

    Zehao Yu, Torsten Sattler, and Andreas Geiger. Gaussian opacity fields: Efficient adaptive surface reconstruction in unbounded scenes.ACM Transactions on Graphics, 43(6):1–13, 2024

  21. [21]

    3dgsr: Implicit surface reconstruction with 3d gaussian splatting

    Xiaoyang Lyu, Yang-Tian Sun, Yi-Hua Huang, Xiuzhe Wu, Ziyi Yang, Yilun Chen, Jiangmiao Pang, and Xiaojuan Qi. 3dgsr: Implicit surface reconstruction with 3d gaussian splatting. ACM Transactions on Graphics, 43(6):1–12, 2024. 14

  22. [22]

    Meshsplat: Generalizable sparse-view surface reconstruction via gaussian splatting.arXiv preprint arXiv:2508.17811, 2025

    Hanzhi Chang, Ruijie Zhu, Wenjie Chang, Mulin Yu, Yanzhe Liang, Jiahao Lu, Zhuoyuan Li, and Tianzhu Zhang. Meshsplat: Generalizable sparse-view surface reconstruction via gaussian splatting.arXiv preprint arXiv:2508.17811, 2025

  23. [23]

    Surfelsplat: Learning efficient and generalizable gaussian surfel representations for sparse-view surface reconstruction

    Chensheng Dai, Shengjun Zhang, Min Chen, and Yueqi Duan. Surfelsplat: Learning efficient and generalizable gaussian surfel representations for sparse-view surface reconstruction. In Advances in Neural Information Processing Systems, 2025

  24. [24]

    InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models

    Jiale Xu, Weihao Cheng, Yiming Gao, Xintao Wang, Shenghua Gao, and Ying Shan. In- stantmesh: Efficient 3d mesh generation from a single image with sparse-view large recon- struction models.arXiv preprint arXiv:2404.07191, 2024

  25. [25]

    Meshlrm: Large reconstruction model for high-quality meshes

    XinyueWei, KaiZhang, SaiBi, HaoTan, FujunLuan, ValentinDeschaintre, KalyanSunkavalli, Hao Su, and Zexiang Xu. Meshlrm: Large reconstruction model for high-quality meshes. arXiv preprint arXiv:2404.12385, 2024

  26. [26]

    Meshformer: High-quality mesh generation with 3d-guided reconstruction model.Advances in Neural Information Processing Systems, 37:59314–59341, 2024

    Minghua Liu, Chong Zeng, Xinyue Wei, Ruoxi Shi, Linghao Chen, Chao Xu, Mengqi Zhang, Zhaoning Wang, Xiaoshuai Zhang, Isabella Liu, et al. Meshformer: High-quality mesh generation with 3d-guided reconstruction model.Advances in Neural Information Processing Systems, 37:59314–59341, 2024

  27. [27]

    3d-r2n2: A unified approach for single and multi-view 3d object reconstruction

    Christopher B Choy, Danfei Xu, JunYoung Gwak, Kevin Chen, and Silvio Savarese. 3d-r2n2: A unified approach for single and multi-view 3d object reconstruction. InEuropean Conference on Computer Vision, pages 628–644. Springer, 2016

  28. [28]

    Learning category- specific mesh reconstruction from image collections

    Angjoo Kanazawa, Shubham Tulsiani, Alexei A Efros, and Jitendra Malik. Learning category- specific mesh reconstruction from image collections. InEuropean Conference on Computer Vision, pages 371–386, 2018

  29. [29]

    Pixel2mesh: Generating 3d mesh models from single rgb images

    Nanyang Wang, Yinda Zhang, Zhuwen Li, Yanwei Fu, Wei Liu, and Yu-Gang Jiang. Pixel2mesh: Generating 3d mesh models from single rgb images. InEuropean Conference on Computer Vision, pages 52–67, 2018

  30. [30]

    Triangle splatting for real-time radiance field rendering.arXiv, 2025

    Jan Held, Renaud Vandeghen, Adrien Deliege, Abdullah Hamdi, Anthony Cioppa, Silvio Gian- cola, Andrea Vedaldi, Bernard Ghanem, Andrea Tagliasacchi, and Marc Van Droogenbroeck. Triangle splatting for real-time radiance field rendering.arXiv, 2025

  31. [31]

    Stereo magnification: learning view synthesis using multiplane images.ACM Transactions on Graphics, 37(4):1–12, 2018

    Tinghui Zhou, Richard Tucker, John Flynn, Graham Fyffe, and Noah Snavely. Stereo magnification: learning view synthesis using multiplane images.ACM Transactions on Graphics, 37(4):1–12, 2018

  32. [32]

    Dl3dv-10k: A large-scale scene dataset for deep learning-based 3d vision

    Lu Ling, Yichen Sheng, Zhi Tu, Wentian Zhao, Cheng Xin, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, Yawen Lu, et al. Dl3dv-10k: A large-scale scene dataset for deep learning-based 3d vision. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22160–22169, 2024

  33. [33]

    Scannet: Richly-annotated 3d reconstructions of indoor scenes

    AngelaDai, AngelXChang, ManolisSavva, MaciejHalber, ThomasFunkhouser, andMatthias Nießner. Scannet: Richly-annotated 3d reconstructions of indoor scenes. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5828–5839, 2017

  34. [34]

    Fregs: 3d gaussian splatting with progressive frequency regularization

    Jiahui Zhang, Fangneng Zhan, Muyu Xu, Shijian Lu, and Eric Xing. Fregs: 3d gaussian splatting with progressive frequency regularization. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21424–21433, 2024. 15

  35. [35]

    Gaussianshader: 3d gaussian splatting with shading functions for reflective surfaces

    Yingwenqi Jiang, Jiadong Tu, Yuan Liu, Xifeng Gao, Xiaoxiao Long, Wenping Wang, and Yuexin Ma. Gaussianshader: 3d gaussian splatting with shading functions for reflective surfaces. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5322–5332, 2024

  36. [36]

    Scaffold-gs: Structured 3d gaussians for view-adaptive rendering

    Tao Lu, Mulin Yu, Linning Xu, Yuanbo Xiangli, Limin Wang, Dahua Lin, and Bo Dai. Scaffold-gs: Structured 3d gaussians for view-adaptive rendering. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20654–20664, 2024

  37. [37]

    Bags: Blur agnostic gaussian splatting through multi-scale kernel modeling

    Cheng Peng, Yutao Tang, Yifan Zhou, Nengyu Wang, Xijun Liu, Deming Li, and Rama Chellappa. Bags: Blur agnostic gaussian splatting through multi-scale kernel modeling. In European Conference on Computer Vision, pages 293–310. Springer, 2024

  38. [38]

    Bad-gaussians: Bundle adjusted deblur gaussian splatting

    Lingzhe Zhao, Peng Wang, and Peidong Liu. Bad-gaussians: Bundle adjusted deblur gaussian splatting. InEuropean Conference on Computer Vision, pages 233–250. Springer, 2024

  39. [39]

    Compact 3d gaussian representation for radiance field

    Joo Chan Lee, Daniel Rho, Xiangyu Sun, Jong Hwan Ko, and Eunbyung Park. Compact 3d gaussian representation for radiance field. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21719–21728, 2024

  40. [40]

    Hac: Hash-grid assisted context for 3d gaussian splatting compression

    Yihang Chen, Qianyi Wu, Weiyao Lin, Mehrtash Harandi, and Jianfei Cai. Hac: Hash-grid assisted context for 3d gaussian splatting compression. InEuropean Conference on Computer Vision, pages 422–438. Springer, 2024

  41. [41]

    Lightgaussian: Unbounded 3d gaussian compression with 15x reduction and 200+ fps

    Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang, et al. Lightgaussian: Unbounded 3d gaussian compression with 15x reduction and 200+ fps. Advances in Neural Information Processing Systems, 37:140138–140158, 2024

  42. [42]

    Compressed 3d gaussian splatting for accelerated novel view synthesis

    Simon Niedermayr, Josef Stumpfegger, and Rüdiger Westermann. Compressed 3d gaussian splatting for accelerated novel view synthesis. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10349–10358, 2024

  43. [43]

    Surfacesplat: Connecting surface reconstruction and gaussian splatting.arXiv preprint arXiv:2507.15602, 2025

    Zihui Gao, Jia-Wang Bian, Guosheng Lin, Hao Chen, and Chunhua Shen. Surfacesplat: Connecting surface reconstruction and gaussian splatting.arXiv preprint arXiv:2507.15602, 2025

  44. [44]

    Trim 3d gaussian splatting for accurate geometry representation.arXiv preprint arXiv:2406.07499, 2024

    Lue Fan, Yuxue Yang, Minxing Li, Hongsheng Li, and Zhaoxiang Zhang. Trim 3d gaussian splatting for accurate geometry representation.arXiv preprint arXiv:2406.07499, 2024

  45. [45]

    Surface reconstruction from gaussian splatting via novel stereo views.arXiv e-prints, pages arXiv–2404, 2024

    Yaniv Wolf, Amit Bracha, and Ron Kimmel. Surface reconstruction from gaussian splatting via novel stereo views.arXiv e-prints, pages arXiv–2404, 2024

  46. [46]

    Gaussianpro: 3d gaussian splatting with progressive propagation

    Kai Cheng, Xiaoxiao Long, Kaizhi Yang, Yao Yao, Wei Yin, Yuexin Ma, Wenping Wang, and Xuejin Chen. Gaussianpro: 3d gaussian splatting with progressive propagation. In International Conference on Machine Learning, 2024

  47. [47]

    Sags: structure-aware 3d gaussian splatting

    Evangelos Ververas, Rolandos Alexandros Potamias, Jifei Song, Jiankang Deng, and Stefanos Zafeiriou. Sags: structure-aware 3d gaussian splatting. InEuropean Conference on Computer Vision, pages 221–238. Springer, 2024

  48. [48]

    Maxime Oquab, Timothée Darcet, Theo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Russell Howes, Po-Yao Huang, Hu Xu, Vasu Sharma, Shang-Wen Li, Wojciech Galuba, Mike Rabbat, Mido Assran, Nicolas Ballas, Gabriel Synnaeve, Ishan Misra, Herve Jegou, Julien Mairal, Patrick Labatu...

  49. [49]

    Omnidata: A scalable pipeline for making multi-task mid-level vision datasets from 3d scans

    Ainaz Eftekhar, Alexander Sax, Jitendra Malik, and Amir Zamir. Omnidata: A scalable pipeline for making multi-task mid-level vision datasets from 3d scans. InIEEE/CVF International Conference on Computer Vision, pages 10786–10796, 2021

  50. [50]

    Boostmvsnerfs: Boosting mvs-based nerfs to generalizable view synthesis in large-scale scenes

    Chih-Hai Su, Chih-Yao Hu, Shr-Ruei Tsai, Jie-Ying Lee, Chin-Yang Lin, and Yu-Lun Liu. Boostmvsnerfs: Boosting mvs-based nerfs to generalizable view synthesis in large-scale scenes. InACM SIGGRAPH Conference Proceedings, pages 1–12, 2024

  51. [51]

    Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective

    Weijie Wang, Qihang Cao, Sensen Gao, Donny Y Chen, Haofei Xu, Wenjing Bian, Songyou Peng, Tat-Jen Cham, Chuanxia Zheng, Andreas Geiger, et al. Feed-forward 3d scene modeling: A problem-driven perspective.arXiv preprint arXiv:2604.14025, 2026

  52. [52]

    latentsplat: Autoencoding variational gaussians for fast generalizable 3d reconstruction

    Christopher Wewer, Kevin Raj, Eddy Ilg, Bernt Schiele, and Jan Eric Lenssen. latentsplat: Autoencoding variational gaussians for fast generalizable 3d reconstruction. InEuropean Conference on Computer Vision, pages 456–473. Springer, 2024

  53. [53]

    Transplat: Generalizable 3d gaussian splatting from sparse multi-view images with transformers

    Chuanrui Zhang, Yingshuang Zou, Zhuoling Li, Minmin Yi, and Haoqian Wang. Transplat: Generalizable 3d gaussian splatting from sparse multi-view images with transformers. In AAAI Conference on Artificial Intelligence, volume 39, pages 9869–9877, 2025

  54. [54]

    Epipolar-free 3d gaussian splat- ting for generalizable novel view synthesis

    Zhiyuan Min, Yawei Luo, Jianwen Sun, and Yi Yang. Epipolar-free 3d gaussian splat- ting for generalizable novel view synthesis. In A. Globerson, L. Mackey, D. Bel- grave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors,Advances in Neu- ral Information Processing Systems, volume 37, pages 39573–39596. Curran Associates, Inc., 2024. URL https://proceed...

  55. [55]

    Hisplat: Hierarchical 3d gaussian splatting for generalizable sparse-view reconstruction, 2024

    Shengji Tang, Weicai Ye, Peng Ye, Weihao Lin, Yang Zhou, Tao Chen, and Wanli Ouyang. Hisplat: Hierarchical 3d gaussian splatting for generalizable sparse-view reconstruction, 2024. URLhttps://arxiv.org/abs/2410.06245

  56. [56]

    Pixelgaussian: Generalizable 3d gaussian reconstruction from arbitrary views, 2024

    Xin Fei, Wenzhao Zheng, Yueqi Duan, Wei Zhan, Masayoshi Tomizuka, Kurt Keutzer, and Jiwen Lu. Pixelgaussian: Generalizable 3d gaussian reconstruction from arbitrary views, 2024. URLhttps://arxiv.org/abs/2410.18979

  57. [57]

    arXiv preprint arXiv:2505.23716 (2025)

    Lihan Jiang, Yucheng Mao, Linning Xu, Tao Lu, Kerui Ren, Yichen Jin, Xudong Xu, Mulin Yu, Jiangmiao Pang, Feng Zhao, et al. Anysplat: Feed-forward 3d gaussian splatting from unconstrained views.arXiv preprint arXiv:2505.23716, 2025

  58. [58]

    Freesplat++: Generalizable 3dgaussiansplattingforefficientindoorscenereconstruction.arXiv preprint arXiv:2503.22986, 2025

    Yunsong Wang, Tianxin Huang, Hanlin Chen, and Gim Hee Lee. Freesplat++: Generalizable 3dgaussiansplattingforefficientindoorscenereconstruction.arXiv preprint arXiv:2503.22986, 2025

  59. [59]

    Longsplat: Online generalizable 3d gaussian splatting from long sequence images

    Guichen Huang, Ruoyu Wang, Xiangjun Gao, Che Sun, Yuwei Wu, Shenghua Gao, and Yunde Jia. Longsplat: Online generalizable 3d gaussian splatting from long sequence images. arXiv preprint arXiv:2507.16144, 2025

  60. [60]

    Jointsplat: Probabilistic joint flow-depth optimization for sparse-view gaussian splatting.arXiv preprint arXiv:2506.03872, 2025

    Yang Xiao, Guoan Xu, Qiang Wu, and Wenjing Jia. Jointsplat: Probabilistic joint flow-depth optimization for sparse-view gaussian splatting.arXiv preprint arXiv:2506.03872, 2025

  61. [61]

    Zpressor: Bottleneck-aware compression for scalable feed-forward 3dgs.Advances in Neural Information Processing Systems, 38:113407–113436, 2026

    Weijie Wang, Donny Y Chen, Zeyu Zhang, Duochao Shi, Akide Liu, and Bohan Zhuang. Zpressor: Bottleneck-aware compression for scalable feed-forward 3dgs.Advances in Neural Information Processing Systems, 38:113407–113436, 2026. 17

  62. [62]

    Chen, Zeyu Zhang, Jiawang Bian, Bohan Zhuang, and Chunhua Shen

    Duochao Shi, Weijie Wang, Donny Y. Chen, Zeyu Zhang, Jiawang Bian, Bohan Zhuang, and Chunhua Shen. Revisiting depth representations for feed-forward 3d gaussian splatting. arXiv preprint arXiv:2506.05327, 2025

  63. [63]

    Weijie Wang, Jiagang Zhu, Zeyu Zhang, Xiaofeng Wang, Zheng Zhu, Guosheng Zhao, Chaojun Ni, Haoxiao Wang, Guan Huang, Xinze Chen, Yukun Zhou, Wenkang Qin, Duochao Shi, Haoyun Li, Yicheng Xiao, Donny Y. Chen, and Jiwen Lu. Drivegen3d: Boosting feed-forward driving scene generation with efficient video diffusion. arXiv preprint arXiv:2510.15264, 2025

  64. [64]

    Xinhang Liu, Yuxi Xiao, Donny Y Chen, Jiashi Feng, Yu-Wing Tai, Chi-Keung Tang, and Bingyi Kang. Trace anything: Representing any video in 4d via trajectory fields. arXiv preprint arXiv:2510.13802, 2025

  65. [65]

    Vincent Leroy, Yohann Cabon, and Jérôme Revaud. Grounding image matching in 3d with mast3r. In European Conference on Computer Vision, pages 71–91. Springer, 2024

  66. [66]

    Yohann Cabon, Lucas Stoffl, Leonid Antsfeld, Gabriela Csurka, Boris Chidlovskii, Jerome Revaud, and Vincent Leroy. Must3r: Multi-view network for stereo 3d reconstruction. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1050–1060, 2025

  67. [67]

    Jianing Yang, Alexander Sax, Kevin J Liang, Mikael Henaff, Hao Tang, Ang Cao, Joyce Chai, Franziska Meier, and Matt Feiszli. Fast3r: Towards 3d reconstruction of 1000+ images in one forward pass. arXiv preprint arXiv:2501.13928, 2025

  68. [68]

    Zhenggang Tang, Yuchen Fan, Dilin Wang, Hongyu Xu, Rakesh Ranjan, Alexander Schwing, and Zhicheng Yan. Mv-dust3r+: Single-stage scene reconstruction from sparse views in 2 seconds. arXiv preprint arXiv:2412.06974, 2024

  69. [69]

    Nikhil Keetha, Norman Müller, Johannes Schönberger, Lorenzo Porzi, Yuchen Zhang, Tobias Fischer, Arno Knapitsch, Duncan Zauss, Ethan Weber, Nelson Antunes, et al. Mapanything: Universal feed-forward metric 3d reconstruction. arXiv preprint arXiv:2509.13414, 2025

  70. [70]

    Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy, Lourdes Agapito, and Jerome Revaud. Pow3r: Empowering unconstrained 3d reconstruction with camera and scene priors. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1071–1081, 2025

  71. [71]

    Zhiwen Fan, Wenyan Cong, Kairun Wen, Kevin Wang, Jian Zhang, Xinghao Ding, Danfei Xu, Boris Ivanovic, Marco Pavone, Georgios Pavlakos, et al. Instantsplat: Unbounded sparse-view pose-free gaussian splatting in 40 seconds. CoRR, 2024

  72. [72]

    Brandon Smart, Chuanxia Zheng, Iro Laina, and Victor Adrian Prisacariu. Splatt3r: Zero-shot gaussian splatting from uncalibrated image pairs. arXiv preprint arXiv:2408.13912, 2024

  73. [73]

    Jiale Xu, Shenghua Gao, and Ying Shan. Freesplatter: Pose-free gaussian splatting for sparse-view 3d reconstruction. In IEEE/CVF International Conference on Computer Vision, 2025

  74. [74]

    Chong Cheng, Yu Hu, Sicheng Yu, Beizhen Zhao, Zijian Wang, and Hao Wang. RegGS: Unposed sparse views gaussian splatting with 3DGS registration. In IEEE/CVF International Conference on Computer Vision, 2025

  75. [75]

    Yuki Fujimura, Takahiro Kushida, Kazuya Kitano, Takuya Funatomi, and Yasuhiro Mukaigawa. Ufv-splatter: Pose-free feed-forward 3d gaussian splatting adapted to unfavorable views. arXiv preprint arXiv:2507.22342, 2025

  76. [76]

    Jake Levinson, Carlos Esteves, Kefan Chen, Noah Snavely, Angjoo Kanazawa, Afshin Rostamizadeh, and Ameesh Makadia. An analysis of svd for deep rotation estimation. Advances in Neural Information Processing Systems, 33:22554–22565, 2020

  77. [77]

    Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. Scheduled sampling for sequence prediction with recurrent neural networks. Advances in Neural Information Processing Systems, 28, 2015

  78. [78]

    Wenzhe Shi, Jose Caballero, Ferenc Huszár, Johannes Totz, Andrew P Aitken, Rob Bishop, Daniel Rueckert, and Zehan Wang. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1874–1883, 2016

  79. [79]

    Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018

  80. [80]

    Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004

Showing first 80 references.