Recognition: unknown
WildSplatter: Feed-forward 3D Gaussian Splatting with Appearance Control from Unconstrained Images
Pith reviewed 2026-05-09 22:53 UTC · model grok-4.3
The pith
A feed-forward model reconstructs 3D Gaussians from sparse unconstrained images with unknown cameras and varying lighting in under one second.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
WildSplatter is a feed-forward 3D Gaussian Splatting model trained on unconstrained photo collections that jointly learns 3D Gaussians and appearance embeddings conditioned on input images, enabling reconstruction from sparse views in under one second and flexible modulation of Gaussian colors to represent lighting variations.
What carries the argument
Joint learning of 3D Gaussians and appearance embeddings conditioned on input images, which modulates per-Gaussian colors to handle appearance changes.
If this is right
- Scene reconstruction becomes possible from everyday photo collections without requiring calibrated rigs or controlled lighting.
- Appearance can be edited after reconstruction by changing the conditioning embedding rather than re-optimizing the entire model.
- Real-time novel-view synthesis is available immediately after the single forward pass instead of minutes or hours of iterative fitting.
- Pose-free pipelines can now handle real-world datasets that contain illumination changes, outperforming prior methods that assume consistent lighting.
Where Pith is reading between the lines
- The same conditioning mechanism could be extended to video frames to produce temporally coherent 3D models of moving scenes.
- Appearance embeddings might serve as a compact way to store multiple lighting states of one geometry, reducing storage for AR content.
- Combining the feed-forward reconstruction with downstream tasks such as object insertion or relighting would create end-to-end pipelines that start from casual photos.
Load-bearing premise
A network trained on unconstrained photo collections will generalize to new scenes whose camera parameters and lighting lie outside the training distribution while keeping appearance control accurate.
What would settle it
Reconstructing a held-out scene captured under lighting or camera angles markedly different from the training set and checking whether appearance edits produce color-consistent renderings without introducing artifacts or geometry drift.
Original abstract
We propose WildSplatter, a feed-forward 3D Gaussian Splatting (3DGS) model for unconstrained images with unknown camera parameters and varying lighting conditions. 3DGS is an effective scene representation that enables high-quality, real-time rendering; however, it typically requires iterative optimization and multi-view images captured under consistent lighting with known camera parameters. WildSplatter is trained on unconstrained photo collections and jointly learns 3D Gaussians and appearance embeddings conditioned on input images. This design enables flexible modulation of Gaussian colors to represent significant variations in lighting and appearance. Our method reconstructs 3D Gaussians from sparse input views in under one second, while also enabling appearance control under diverse lighting conditions. Experimental results demonstrate that our approach outperforms existing pose-free 3DGS methods on challenging real-world datasets with varying illumination.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes WildSplatter, a feed-forward 3D Gaussian Splatting model trained on unconstrained photo collections. It jointly predicts 3D Gaussians and appearance embeddings conditioned on input images with unknown camera parameters and varying lighting. This enables reconstruction from sparse views in under one second and flexible modulation of Gaussian colors for appearance control under diverse lighting conditions. Experiments report that the method outperforms existing pose-free 3DGS methods on challenging real-world datasets with illumination variations.
Significance. If the claims hold, the work would be significant for practical 3D reconstruction from casual, unconstrained imagery, removing the need for controlled capture setups or per-scene optimization. The feed-forward design and reported sub-second inference are notable strengths, as is the attempt to handle appearance variation without explicit lighting supervision. No machine-checked proofs or parameter-free derivations are present, but the empirical focus on real-world varying-illumination datasets provides a concrete testbed.
Major comments (2)
- [§3.2] §3.2 (Appearance Embedding Module): The architecture conditions appearance embeddings solely on input images, without explicit regularization, adversarial losses, or cycle-consistency terms to enforce disentanglement from geometry and albedo. This bears directly on the central claim of generalizable lighting modulation: end-to-end training on per-scene illumination correlations in unconstrained collections risks the embeddings memorizing dataset-specific patterns rather than learning transferable modulation.
- [§4.2] §4.2 (Quantitative Evaluation on Varying-Illumination Datasets): The outperformance over pose-free baselines is reported, but the evaluation protocol for appearance control (e.g., how novel lighting conditions are synthesized or measured via PSNR/SSIM on held-out illuminations) is not detailed enough to rule out overfitting; if the test scenes share lighting statistics with the training data, the generalization claim lacks load-bearing support.
Minor comments (2)
- [Figure 3] Figure 3 and §4.1: The qualitative examples of appearance control would benefit from side-by-side comparison with ground-truth novel lighting captures rather than only relit renderings.
- [§3.1] Notation in §3.1: The definition of the appearance embedding vector a_i should explicitly state its dimensionality and how it is injected into the Gaussian color prediction (e.g., via concatenation or FiLM); both injection options are sketched below.
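The last comment touches a mechanism the abstract leaves unspecified, so a minimal sketch of the two injection options (concatenation vs. FiLM) may be useful. This is an illustrative assumption rather than the paper's architecture: the PyTorch framing, class names, and layer sizes are hypothetical, and WildSplatter may inject a_i differently.

```python
# Hypothetical sketch, not the paper's code: two common ways an appearance
# embedding a_i could be injected into per-Gaussian color prediction.
# All dimensions and module names are assumptions for illustration.
import torch
import torch.nn as nn

D_FEAT, D_APP = 64, 32  # assumed per-Gaussian feature dim and embedding dim


class ConcatColorHead(nn.Module):
    """Option A: concatenate a_i with each Gaussian's feature, then predict RGB."""

    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(D_FEAT + D_APP, 64), nn.ReLU(), nn.Linear(64, 3), nn.Sigmoid()
        )

    def forward(self, feats, a):          # feats: (N, D_FEAT), a: (D_APP,)
        a = a.expand(feats.shape[0], -1)  # broadcast one embedding to all N Gaussians
        return self.mlp(torch.cat([feats, a], dim=-1))  # (N, 3) colors in [0, 1]


class FiLMColorHead(nn.Module):
    """Option B: FiLM-style modulation; a_i predicts a per-channel scale and shift."""

    def __init__(self):
        super().__init__()
        self.to_feat = nn.Linear(D_FEAT, 64)
        self.film = nn.Linear(D_APP, 2 * 64)  # gamma and beta from a_i
        self.to_rgb = nn.Sequential(nn.ReLU(), nn.Linear(64, 3), nn.Sigmoid())

    def forward(self, feats, a):                      # feats: (N, D_FEAT), a: (D_APP,)
        gamma, beta = self.film(a).chunk(2, dim=-1)   # each of shape (64,)
        h = self.to_feat(feats) * (1 + gamma) + beta  # modulate every Gaussian's features
        return self.to_rgb(h)                         # (N, 3) colors in [0, 1]
```

Either head leaves geometry (positions, scales, rotations, opacities) untouched and changes only color, which is what would allow appearance edits by swapping a_i after reconstruction.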
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, providing clarifications and indicating where revisions will be made to improve the paper.
Point-by-point responses
-
Referee: [§3.2] §3.2 (Appearance Embedding Module): The architecture conditions appearance embeddings solely on input images, without explicit regularization, adversarial losses, or cycle-consistency terms to enforce disentanglement from geometry and albedo. This bears directly on the central claim of generalizable lighting modulation: end-to-end training on per-scene illumination correlations in unconstrained collections risks the embeddings memorizing dataset-specific patterns rather than learning transferable modulation.
Authors: We acknowledge the absence of explicit regularization mechanisms such as adversarial or cycle-consistency losses in the appearance embedding module. The embeddings are predicted directly from input images and modulate Gaussian colors via a lightweight network, with the entire model trained end-to-end using only photometric reconstruction losses on diverse unconstrained photo collections. This setup encourages the embeddings to capture residual appearance factors after geometry is accounted for by the Gaussian prediction branch. Our experiments demonstrate that the learned embeddings generalize to novel scenes and enable plausible lighting modulation on held-out data, suggesting they capture transferable rather than purely memorized patterns. Nevertheless, we agree that further analysis would strengthen the disentanglement claim, and we will add visualizations of the embedding space along with ablations on modulation behavior in the revised manuscript. revision: partial
-
Referee: [§4.2] §4.2 (Quantitative Evaluation on Varying-Illumination Datasets): The outperformance over pose-free baselines is reported, but the evaluation protocol for appearance control (e.g., how novel lighting conditions are synthesized or measured via PSNR/SSIM on held-out illuminations) is not detailed enough to rule out overfitting; if the test scenes share lighting statistics with the training data, the generalization claim lacks load-bearing support.
Authors: We thank the referee for highlighting the need for greater detail in the evaluation protocol. In the revised version of §4.2, we will explicitly describe the appearance control evaluation: novel lighting conditions are synthesized by extracting appearance embeddings from reference images captured under different illuminations (distinct from the input views) and applying them to modulate the colors of the predicted 3D Gaussians. Quantitative metrics (PSNR/SSIM) are then computed on held-out test views. The dataset splits ensure that training and test scenes come from separate capture sessions with non-overlapping lighting statistics, as verified by per-scene illumination histograms. We will also report additional cross-dataset results to further support generalization (a sketch of this protocol follows below). revision: yes
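The protocol in the response above is concrete enough to restate as a short sketch. This is a hedged reconstruction from the rebuttal text, not the authors' evaluation code: predict_gaussians, encode_appearance, apply_appearance, and render are hypothetical placeholders for the paper's components, and PSNR/SSIM follow standard definitions (SSIM via scikit-image) rather than the authors' exact implementation.

```python
# Hypothetical sketch of the appearance-control evaluation the rebuttal describes:
# reconstruct from the input views, take the appearance embedding from a reference
# image under a different illumination, re-color the Gaussians, score held-out views.
import numpy as np
from skimage.metrics import structural_similarity


def psnr(gt: np.ndarray, pred: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio for images scaled to [0, max_val]."""
    mse = np.mean((gt - pred) ** 2)
    return float(10.0 * np.log10(max_val ** 2 / mse))


def evaluate_appearance_control(model, input_views, reference_image, test_views, render):
    """Reconstruct once, swap in a reference appearance, score held-out views.

    input_views:     sparse images used for the single feed-forward reconstruction
    reference_image: same scene under a *different* illumination than the inputs
    test_views:      (camera, ground_truth_image) pairs excluded from the inputs
    render:          rasterizer mapping (gaussians, camera) -> HxWx3 array in [0, 1]
    """
    gaussians = model.predict_gaussians(input_views)      # one sub-second forward pass
    a_ref = model.encode_appearance(reference_image)      # appearance embedding
    gaussians = model.apply_appearance(gaussians, a_ref)  # modulates colors only

    scores = []
    for camera, gt in test_views:
        pred = np.clip(render(gaussians, camera), 0.0, 1.0)
        scores.append((
            psnr(gt, pred),
            structural_similarity(gt, pred, channel_axis=-1, data_range=1.0),
        ))
    return np.mean(scores, axis=0)  # (mean PSNR, mean SSIM) over held-out views
```

The check the referee asks for then comes down to which reference_image and test_views are allowed: drawing them from capture sessions whose lighting statistics do not overlap the training data is what would make these numbers bear on generalization.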
Circularity Check
No circularity: empirical feed-forward model with external validation
Full rationale
The paper describes a neural architecture for feed-forward 3D Gaussian reconstruction and appearance embedding from unconstrained images. All performance claims rest on end-to-end training followed by quantitative evaluation on held-out real-world datasets with varying illumination. No mathematical derivation, uniqueness theorem, or parameter-fitting step is presented that reduces by construction to the training inputs; the architecture is a standard encoder-decoder design whose outputs are tested against independent benchmarks rather than being tautological with the loss or data statistics.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Wildcat3d: Appearance-aware multi-view diffusion in the wild
Morris Alper, David Novotny, Filippos Kokkinos, Hadar Averbuch-Elor, and Tom Monnier. Wildcat3d: Appearance-aware multi-view diffusion in the wild. In NeurIPS, 2025
2025
-
[2]
pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction
David Charatan, Sizhe Lester Li, Andrea Tagliasacchi, and Vincent Sitzmann. pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction. In CVPR, pages 19457–19467, 2024
2024
-
[3]
Hallucinated neural radiance fields in the wild
Xingyu Chen, Qi Zhang, Xiaoyu Li, Yue Chen, Ying Feng, Xuan Wang, and Jue Wang. Hallucinated neural radiance fields in the wild. In CVPR, pages 12943–12952, 2022
2022
-
[4]
Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images
Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, and Jianfei Cai. Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images. In ECCV, pages 370–386, 2024
2024
-
[5]
Mvsplat360: Feed-forward 360 scene synthesis from sparse views
Yuedong Chen, Chuanxia Zheng, Haofei Xu, Bohan Zhuang, Andrea Vedaldi, Tat-Jen Cham, and Jianfei Cai. Mvsplat360: Feed-forward 360 scene synthesis from sparse views. In NeurIPS, 2024
2024
-
[6]
Swag: Splatting in the wild images with appearance-conditioned gaussians
Hiba Dahmani, Moussab Bennehar, Nathan Piasco, Luis Roldão, and Dzmitry Tsishkou. Swag: Splatting in the wild images with appearance-conditioned gaussians. In ECCV, pages 325–340, 2024
2024
-
[7]
Ufv-splatter: Pose-free feed-forward 3d gaussian splatting adapted to unfavorable views
Yuki Fujimura, Takahiro Kushida, Kazuya Kitano, Takuya Funatomi, and Yasuhiro Mukaigawa. Ufv-splatter: Pose-free feed-forward 3d gaussian splatting adapted to unfavorable views. arXiv preprint arXiv:2507.22342, 2025
2025
-
[8]
Multiple View Geometry in Computer Vision
Richard Hartley and Andrew Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2003
2003
-
[9]
Pf3plat: Pose-free feed-forward 3d gaussian splatting
Sunghwan Hong, Jaewoo Jung, Heeseong Shin, Jisang Han, Jiaolong Yang, Chong Luo, and Seungryong Kim. Pf3plat: Pose-free feed-forward 3d gaussian splatting. In ICML, 2025
2025
-
[10]
2d gaussian splatting for geometrically accurate radiance fields
Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2d gaussian splatting for geometrically accurate radiance fields. In ACM SIGGRAPH, 2024
2024
-
[11]
No pose at all: Self-supervised pose-free 3d gaussian splatting from sparse views
Ranran Huang and Krystian Mikolajczyk. No pose at all: Self-supervised pose-free 3d gaussian splatting from sparse views. In ICCV, pages 27947–27957, 2025
2025
-
[12]
Anysplat: Feed-forward 3d gaussian splatting from unconstrained views
Lihan Jiang, Yucheng Mao, Linning Xu, Tao Lu, Kerui Ren, Yichen Jin, Xudong Xu, Mulin Yu, Jiangmiao Pang, Feng Zhao, et al. Anysplat: Feed-forward 3d gaussian splatting from unconstrained views. ACM TOG, 44(6):1–16, 2025
2025
-
[13]
Lumigauss: Relightable gaussian splatting in the wild
Joanna Kaleta, Kacper Kania, Tomasz Trzcinski, and Marek Kowalski. Lumigauss: Relightable gaussian splatting in the wild. In WACV, pages 1–10, 2025
2025
-
[14]
3d gaussian splatting for real-time radiance field rendering
Bernhard Kerbl, Georgios Kopanas, Thomas Leimkuehler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM TOG, 42(4), 2023
2023
-
[15]
Wildgaussians: 3d gaussian splatting in the wild
Jonas Kulhanek, Songyou Peng, Zuzana Kukelova, Marc Pollefeys, and Torsten Sattler. Wildgaussians: 3d gaussian splatting in the wild. In NeurIPS, 2024
2024
-
[16]
Depth anything 3: Recovering the visual space from any views
Haotong Lin, Sili Chen, Jun Hao Liew, Donny Y. Chen, Zhenyu Li, Guang Shi, Jiashi Feng, and Bingyi Kang. Depth anything 3: Recovering the visual space from any views. In ICLR, 2026
2026
-
[17]
Decoupled weight decay regularization
Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In ICLR, 2019
2019
-
[18]
Nerf in the wild: Neural radiance fields for unconstrained photo collections
Ricardo Martin-Brualla, Noha Radwan, Mehdi S. M. Sajjadi, Jonathan T. Barron, Alexey Dosovitskiy, and Daniel Duckworth. Nerf in the wild: Neural radiance fields for unconstrained photo collections. In CVPR, pages 7210–7219, 2021
2021
-
[19]
Nerf: Representing scenes as neural radiance fields for view synthesis
Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020
2020
-
[20]
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick La...
2023
-
[21]
Vision transformers for dense prediction
René Ranftl, Alexey Bochkovskiy, and Vladlen Koltun. Vision transformers for dense prediction. In ICCV, 2021
2021
-
[22]
Nerf for outdoor scene relighting
Viktor Rudnev, Mohamed Elgharib, William Smith, Lingjie Liu, Vladislav Golyanik, and Christian Theobalt. Nerf for outdoor scene relighting. In ECCV, 2022
2022
-
[23]
Structure-from-motion revisited
Johannes L. Schonberger and Jan-Michael Frahm. Structure-from-motion revisited. In CVPR, 2016
2016
-
[24]
GLU Variants Improve Transformer
Noam Shazeer. Glu variants improve transformer. arXiv preprint arXiv:2002.05202, 2020
2020
-
[25]
Splatt3r: Zero-shot gaussian splatting from uncalibrated image pairs
Brandon Smart, Chuanxia Zheng, Iro Laina, and Victor Adrian Prisacariu. Splatt3r: Zero-shot gaussian splatting from uncalibrated image pairs. arXiv preprint arXiv:2408.13912, 2024
2024
-
[26]
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su, Yu Lu, Shengfeng Pan, Bo Wen, and Yunfeng Liu. Roformer: Enhanced transformer with rotary position embedding. arXiv preprint arXiv:2104.09864, 2021
2021
-
[27]
Megascenes: Scene-level view synthesis at scale
Joseph Tung, Gene Chou, Ruojin Cai, Guandao Yang, Kai Zhang, Gordon Wetzstein, Bharath Hariharan, and Noah Snavely. Megascenes: Scene-level view synthesis at scale. In ECCV, 2024
2024
-
[28]
Visualizing data using t-sne
Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of Machine Learning Research, 9(86):2579–2605, 2008
2008
-
[29]
Grounding image matching in 3d with mast3r
Vincent Leroy, Yohann Cabon, and Jérôme Revaud. Grounding image matching in 3d with mast3r. arXiv preprint arXiv:2406.09756, 2024
2024
-
[30]
Vggt: Visual geometry grounded transformer
Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny. Vggt: Visual geometry grounded transformer. In CVPR, pages 5294–5306, 2025
2025
-
[31]
Dust3r: Geometric 3d vision made easy
Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, and Jerome Revaud. Dust3r: Geometric 3d vision made easy. In CVPR, pages 20697–20709, 2024
2024
-
[32]
Wild-gs: Real-time novel view synthesis from unconstrained photo collections
Jiacong Xu, Yiqun Mei, and Vishal M. Patel. Wild-gs: Real-time novel view synthesis from unconstrained photo collections. In NeurIPS, 2024
2024
-
[33]
Freesplatter: Pose-free gaussian splatting for sparse-view 3d reconstruction
Jiale Xu, Shenghua Gao, and Ying Shan. Freesplatter: Pose-free gaussian splatting for sparse-view 3d reconstruction. In ICCV, pages 25442–25452, 2025
2025
-
[34]
Cross-ray neural radiance fields for novel-view synthesis from unconstrained image collections
Yifan Yang, Shuhai Zhang, Zixiong Huang, Yubing Zhang, and Mingkui Tan. Cross-ray neural radiance fields for novel-view synthesis from unconstrained image collections. In ICCV, pages 15901–15911, 2023
2023
-
[35]
No pose, no problem: Surprisingly simple 3d gaussian splats from sparse unposed images
Botao Ye, Sifei Liu, Haofei Xu, Xueting Li, Marc Pollefeys, Ming-Hsuan Yang, and Songyou Peng. No pose, no problem: Surprisingly simple 3d gaussian splats from sparse unposed images. In ICLR, 2025
2025
-
[36]
Gaussian in the wild: 3d gaussian splatting for unconstrained image collections
Dongbin Zhang, Chuming Wang, Weitao Wang, Peihao Li, Minghan Qin, and Haoqian Wang. Gaussian in the wild: 3d gaussian splatting for unconstrained image collections. In ECCV, pages 341–359, 2024
2024
-
[37]
The unreasonable effectiveness of deep features as a perceptual metric
Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In CVPR, 2018
2018
-
[38]
Flare: Feed-forward geometry, appearance and camera estimation from uncalibrated sparse views
Shangzhan Zhang, Jianyuan Wang, Yinghao Xu, Nan Xue, Christian Rupprecht, Xiaowei Zhou, Yujun Shen, and Gordon Wetzstein. Flare: Feed-forward geometry, appearance and camera estimation from uncalibrated sparse views. In CVPR, pages 21936–21947, June 2025
2025