WildRelight: A Real-World Benchmark and Physics-Guided Adaptation for Single-Image Relighting
Recognition: 2 theorem links
Pith reviewed 2026-05-13 06:25 UTC · model grok-4.3
The pith
The WildRelight dataset lets synthetic relighting models adapt to real outdoor scenes using only temporal lighting changes as supervision.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
WildRelight supplies strictly aligned temporal sequences of real outdoor scenes under varying natural illumination, each paired with an HDR environment map. This structure enables a physics-guided inference method that integrates Diffusion Posterior Sampling with temporal Sampling-Aware Test-Time Adaptation, letting synthetic models align with real-world lighting statistics on the fly and turning the sim-to-real problem into a tractable self-supervised task.
What carries the argument
Strictly aligned temporal structure of the WildRelight scenes, serving as a self-supervised constraint inside a DPS-plus-temporal-TTA inference framework.
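To make the mechanism concrete, here is a minimal sketch of what such a temporal self-supervision objective could look like, assuming a relighting model callable as `model(image, envmap)`. The cross-frame L1 term and every name in it are illustrative assumptions, not the paper's published loss.

```python
import torch
import torch.nn.functional as F

def temporal_consistency_loss(model, frames, envmaps):
    """Hypothetical self-supervised constraint: the scene is static and only
    illumination varies across the strictly aligned sequence, so relighting
    frame t with frame t+1's HDR environment map should reproduce frame t+1.
    No ground-truth relit images are needed."""
    loss = frames[0].new_zeros(())
    for t in range(len(frames) - 1):
        pred = model(frames[t], envmaps[t + 1])  # relight t under t+1's light
        loss = loss + F.l1_loss(pred, frames[t + 1])
    return loss / (len(frames) - 1)
```

Minimizing such a loss at test time would only be meaningful if pixel differences between frames are attributable to lighting alone, which is exactly the load-bearing premise identified below.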
If this is right
- State-of-the-art synthetic relighting models exhibit severe domain shifts on real-world data.
- The temporal structure enables effective self-supervised adaptation without ground-truth relit images.
- The same dataset functions as a rigorous benchmark for measuring progress in single-image relighting.
- The DPS-TTA combination converts an intractable domain-shift problem into an on-the-fly self-supervised task.
Where Pith is reading between the lines
- Similar temporal or multi-frame constraints could be exploited for domain adaptation in other single-image translation tasks such as dehazing or colorization.
- Training future relighting networks with built-in test-time adaptation modules might reduce the need for separate inference-time tuning.
- Releasing the dataset publicly will let researchers test whether the adaptation generalizes beyond the outdoor scenes used here.
- Applying the same framework to indoor or synthetic-to-real indoor data would test whether the temporal self-supervision depends on natural outdoor light variation.
Keywords
- single-image relighting
- real-world dataset
- domain adaptation
Load-bearing premise
The captured scenes maintain perfect temporal alignment that supplies a reliable self-supervised signal for domain adaptation.
What would settle it
A direct comparison showing that models adapted with the temporal TTA produce lighting that is inconsistent across the sequence, or that mismatches the supplied HDR environment maps, would falsify the adaptation claim.
Original abstract
Recent single-image relighting methods, powered by advanced generative models, have achieved impressive photorealism on synthetic benchmarks. However, their effectiveness in the complex visual landscape of the real world remains largely unverified. A critical gap exists, as current datasets are typically designed for multi-view reconstruction and fail to address the unique challenges of single-image relighting. To bridge this synthetic-to-real gap, we introduce WildRelight, the first in-the-wild dataset specifically created for evaluating single-image relighting models. WildRelight features a diverse collection of high-resolution outdoor scenes, captured under strictly aligned, temporally varying natural illuminations, each paired with a high-dynamic-range environment map. Using this data, we establish a rigorous benchmark revealing that state-of-the-art models trained on synthetic data suffer from severe domain shifts. The strictly aligned temporal structure of WildRelight enables a new paradigm for domain adaptation. We demonstrate this by introducing a physics-guided inference framework that leverages the captured natural light evolution as a self-supervised constraint. By integrating Diffusion Posterior Sampling (DPS) with temporal Sampling-Aware Test-Time Adaptation (TTA), we show that the dataset allows synthetic models to align with real-world statistics on-the-fly, transforming the intractable sim-to-real challenge into a tractable self-supervised task. The dataset and code will be made publicly available to foster robust, physically-grounded relighting research.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces WildRelight, the first in-the-wild dataset for single-image relighting, consisting of high-resolution outdoor scenes captured under strictly aligned, temporally varying natural illuminations, each paired with an HDR environment map. It establishes a benchmark showing severe domain shifts in synthetic-trained SOTA models and proposes a physics-guided inference framework that integrates Diffusion Posterior Sampling (DPS) with temporal Sampling-Aware Test-Time Adaptation (TTA) to enable on-the-fly self-supervised adaptation of synthetic models to real-world statistics, using the temporal structure as a constraint.
Significance. If the adaptation framework holds, the work supplies both a needed real-world benchmark for single-image relighting and a self-supervised adaptation paradigm that could convert the sim-to-real gap into a tractable task without requiring ground-truth relit images, potentially improving generalization of generative relighting methods to outdoor scenes.
major comments (3)
- Abstract and §4 (method description): the central claim that DPS+TTA enables synthetic models to 'align with real-world statistics on-the-fly' and transforms the sim-to-real challenge into a 'tractable self-supervised task' is unsupported by any quantitative metrics, baseline comparisons, ablation studies, or implementation details in the provided text, rendering the claim unverifiable.
- Abstract and §3 (dataset): the adaptation method rests on the assumption that 'strictly aligned temporal structure' supplies a reliable self-supervised constraint with differences due solely to illumination; no analysis, statistics, or validation is supplied to confirm absence of non-illumination dynamics (foliage motion, cloud shadows, specular changes, or sub-pixel drift) that would corrupt the DPS posterior sampling and TTA objective.
- §5 (experiments): without reported numbers on adaptation performance (e.g., PSNR, LPIPS, or perceptual metrics before/after TTA on held-out real frames), it is impossible to assess whether the physics-guided loss actually drives alignment or merely fits to capture artifacts.
minor comments (2)
- Abstract: the phrase 'the dataset allows synthetic models to align...' should be rephrased to clarify that the alignment is demonstrated via the proposed method rather than being an intrinsic property of the data alone.
- Notation: the distinction between 'Sampling-Aware Test-Time Adaptation (TTA)' and standard TTA is introduced without an equation or pseudocode block; a short algorithmic outline would improve clarity. A speculative sketch of one such outline follows.
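For concreteness only, here is one plausible shape of the requested outline, interleaving a DPS guidance step with an adaptation step inside a single reverse-diffusion loop. Everything here is an assumption layered on the abstract: `denoiser`, `theta`, `obs`, the pluggable `temporal_loss`, and the omission of noise re-injection are all placeholders, since the paper text provides no algorithm.

```python
import torch
import torch.nn.functional as F

def dps_tta_sample(denoiser, theta, x, obs, frames, envmaps, temporal_loss,
                   steps=50, eta=1.0, lr=1e-4):
    """Speculative DPS + sampling-aware TTA loop (not the authors' algorithm).
    Per reverse-diffusion step: (1) a DPS-style guidance gradient pulls the
    sample toward the observed real frame `obs`; (2) one optimizer step adapts
    the tunable parameters `theta` against the temporal self-supervision,
    evaluated on the current intermediate estimate rather than only on the
    final output, which is one reading of 'sampling-aware'."""
    opt = torch.optim.AdamW(theta, lr=lr)  # AdamW assumed; the paper cites [21]
    for t in reversed(range(steps)):
        # --- DPS guidance on the sample ---
        x = x.detach().requires_grad_(True)
        x0_hat = denoiser(x, t, envmaps[0])        # predicted clean estimate
        guidance = torch.autograd.grad(F.mse_loss(x0_hat, obs), x)[0]
        x = (x - eta * guidance).detach()          # noise re-injection omitted
        # --- sampling-aware TTA on the model parameters ---
        opt.zero_grad()
        loss = temporal_loss(lambda img, env: denoiser(img, t, env),
                             frames, envmaps)
        loss.backward()
        opt.step()
    return x
```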
Simulated Author's Rebuttal
We thank the referee for the thorough and constructive review. We appreciate the acknowledgment of WildRelight's potential as the first in-the-wild benchmark for single-image relighting and the value of the proposed adaptation paradigm. We address each major comment point by point below and will revise the manuscript accordingly to strengthen the presentation and verifiability of our claims.
Point-by-point responses
Referee: Abstract and §4 (method description): the central claim that DPS+TTA enables synthetic models to 'align with real-world statistics on-the-fly' and transforms the sim-to-real challenge into a 'tractable self-supervised task' is unsupported by any quantitative metrics, baseline comparisons, ablation studies, or implementation details in the provided text, rendering the claim unverifiable.
Authors: We agree that the current version relies on qualitative demonstrations to support the adaptation claims. To address this, the revised manuscript will expand §4 with full implementation details of the DPS+TTA integration, including hyperparameters and sampling procedures. We will also add quantitative results, baseline comparisons, and ablation studies to §5, providing metrics that directly verify the on-the-fly alignment with real-world statistics. revision: yes
Referee: Abstract and §3 (dataset): the adaptation method rests on the assumption that 'strictly aligned temporal structure' supplies a reliable self-supervised constraint with differences due solely to illumination; no analysis, statistics, or validation is supplied to confirm absence of non-illumination dynamics (foliage motion, cloud shadows, specular changes, or sub-pixel drift) that would corrupt the DPS posterior sampling and TTA objective.
Authors: The capture protocol was designed with fixed-camera, short-interval sequences to isolate illumination as the dominant variable. We acknowledge the need for explicit validation. In the revision, §3 will include a new analysis subsection with alignment statistics (e.g., sub-pixel registration error), optical-flow-based quantification of non-illumination motion, and examples demonstrating that such factors remain negligible relative to lighting changes, thereby supporting the self-supervised constraint. revision: yes
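For illustration, the promised optical-flow quantification could be computed along these lines, as a sketch only: dense Farneback flow from OpenCV with an arbitrary 0.5-pixel threshold; the authors' actual protocol, and any exposure normalization it applies, is not specified in the text.

```python
import cv2
import numpy as np

def non_illumination_motion(frame_a, frame_b, thresh_px=0.5):
    """Fraction of pixels whose dense optical-flow magnitude between two
    adjacent captures exceeds `thresh_px`. For a fixed-camera sequence where
    only illumination changes, this should be near zero; large values flag
    non-illumination dynamics (foliage, clouds, drift).
    Caveat: brightness change itself violates the flow's constancy
    assumption, so exposure-normalizing the frames first would be advisable."""
    g_a = cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)
    g_b = cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(g_a, g_b, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2,
                                        flags=0)
    mag = np.linalg.norm(flow, axis=2)
    return float((mag > thresh_px).mean())  # fraction of "dynamic" pixels
```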
Referee: §5 (experiments): without reported numbers on adaptation performance (e.g., PSNR, LPIPS, or perceptual metrics before/after TTA on held-out real frames), it is impossible to assess whether the physics-guided loss actually drives alignment or merely fits to capture artifacts.
Authors: We concur that numerical before/after evaluation is required to substantiate the adaptation efficacy. The current experiments emphasize the benchmark and qualitative results. The revised §5 will report PSNR, LPIPS, and additional perceptual metrics on held-out real frames, comparing performance prior to and after TTA, to demonstrate that the physics-guided objective improves alignment beyond artifact fitting. revision: yes
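A sketch of what such a before/after evaluation could look like, assuming models callable as `model(image, envmap)` and the standard `lpips` package; the pairing of frame t with frame t+1's environment map is our assumption about the held-out protocol, not the paper's stated one.

```python
import torch
import lpips  # pip install lpips

lpips_fn = lpips.LPIPS(net='alex')  # perceptual distance, expects [-1, 1]

def psnr(pred, gt, max_val=1.0):
    """PSNR in dB for image tensors scaled to [0, max_val]."""
    mse = torch.mean((pred - gt) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)

@torch.no_grad()
def before_after_tta(model_before, model_after, frames, envmaps):
    """Hypothetical protocol: relight held-out frame t under frame t+1's
    environment map and score against the actually observed frame t+1,
    once with the unadapted model and once after TTA."""
    rows = []
    for t in range(len(frames) - 1):
        gt = frames[t + 1]
        for name, model in (("before", model_before), ("after", model_after)):
            pred = model(frames[t], envmaps[t + 1])
            d = lpips_fn(pred * 2 - 1, gt * 2 - 1)  # lpips wants [-1, 1]
            rows.append((name, t, psnr(pred, gt).item(), d.item()))
    return rows
```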
Circularity Check
No significant circularity: a new dataset combined with established DPS and TTA adaptation machinery.
full rationale
The paper introduces WildRelight as an independent data collection with strictly aligned temporal frames and HDR maps. It then applies the pre-existing Diffusion Posterior Sampling (DPS) and Sampling-Aware Test-Time Adaptation (TTA) frameworks to leverage that temporal structure as a self-supervised signal. No equations are shown that define a quantity in terms of itself, rename a fitted parameter as a prediction, or reduce the alignment result to a self-citation chain. The temporal stationarity assumption is an external modeling choice about the data, not a definitional loop. The derivation therefore remains self-contained against external benchmarks and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: strictly aligned temporal captures under varying natural illuminations provide a valid self-supervised constraint for domain adaptation.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tagged: unclear
  Note: relation between the paper passage and the cited Recognition theorem.
  Passage: "By integrating Diffusion Posterior Sampling (DPS) with temporal Sampling-Aware Test-Time Adaptation (TTA), we show that the dataset allows synthetic models to align with real-world statistics on-the-fly"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
[1] Aksoy, Y., Kim, C., Kellnhofer, P., Paris, S., Elgharib, M., Pollefeys, M., Matusik, W.: A dataset of flash and ambient illumination pairs from the crowd. In: Proceedings of the European Conference on Computer Vision (ECCV). pp. 634–649 (2018)
[2] Barron, J.T., Malik, J.: Shape, illumination, and reflectance from shading. IEEE Transactions on Pattern Analysis and Machine Intelligence 37(8), 1670–1687 (2014)
[3] Boss, M., Braun, R., Jampani, V., Barron, J.T., Liu, C., Lensch, H.: NeRD: Neural reflectance decomposition from image collections. In: Proceedings of International Conference on Computer Vision (ICCV). pp. 12684–12694 (2021)
[4] Chung, H., Kim, J., Mccann, M.T., Klasky, M.L., Ye, J.C.: Diffusion posterior sampling for general noisy inverse problems. In: The Eleventh International Conference on Learning Representations (2023), https://openreview.net/forum?id=OnD9zGAGT0k
[5] Cook, R.L., Torrance, K.E.: A reflectance model for computer graphics. ACM Transactions on Graphics (ToG) 1(1), 7–24 (1982)
[6] Debevec, P.E., Malik, J.: Recovering high dynamic range radiance maps from photographs. In: Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques. pp. 369–378. SIGGRAPH '97, ACM Press/Addison-Wesley Publishing Co., USA (1997). https://doi.org/10.1145/258734.258884
[7] Debevec, P.E., Taylor, C.J., Malik, J.: Modeling and rendering architecture from photographs: a hybrid geometry- and image-based approach. In: Proceedings of SIGGRAPH '96. pp. 11–20 (1996)
[8] Haque, A., Tancik, M., Efros, A.A., Holynski, A., Kanazawa, A.: Instruct-NeRF2NeRF: Editing 3D scenes with instructions. In: Proceedings of International Conference on Computer Vision (ICCV). pp. 19740–19750 (2023)
[9] Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W., et al.: LoRA: Low-rank adaptation of large language models. ICLR 1(2), 3 (2022)
[10] Jakob, W., Speierer, S., Roussel, N., Nimier-David, M., Vicini, D., Zeltner, T., Nicolet, B., Crespo, M., Leroy, V., Zhang, Z.: Mitsuba 3 renderer (2022), https://mitsuba-renderer.org
[11] Jensen, R., Dahl, A., Vogiatzis, G., Tola, E., Aanæs, H.: Large scale multi-view stereopsis evaluation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 406–413 (2014)
[12] Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3D Gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics 42(4), 139 (July 2023), https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/
[13] Kuang, Z., Zhang, Y., Yu, H.X., Agarwala, S., Wu, E., Wu, J., et al.: Stanford-ORB: a real-world 3D object inverse rendering benchmark. Advances in Neural Information Processing Systems 36, 46938–46957 (2023)
[14] Li, Z., Wang, L., Huang, X., Pan, C., Yang, J.: PhyIR: Physics-based inverse rendering for panoramic indoor images. In: Proceedings of Computer Vision and Pattern Recognition (CVPR). pp. 12713–12723 (2022)
[15] Li, Z., Shi, J., Bi, S., Zhu, R., Sunkavalli, K., Hašan, M., Xu, Z., Ramamoorthi, R., Chandraker, M.: Physically-based editing of indoor scene lighting from a single image. In: European Conference on Computer Vision (ECCV). pp. 555–572. Springer (2022)
[16] Li, Z., Yu, T.W., Sang, S., Wang, S., Song, M., Liu, Y., Yeh, Y.Y., Zhu, R., Gundavarapu, N., Shi, J., Bi, S., Yu, H.X., Xu, Z., Sunkavalli, K., Hasan, M., Ramamoorthi, R., Chandraker, M.: OpenRooms: An open framework for photorealistic indoor scene datasets. In: Proceedings of Computer Vision and Pattern Recognition (CVPR). pp. 7190–... (2021)
[17] Liang, R., Gojcic, Z., Ling, H., Munkberg, J., Hasselgren, J., Lin, C.H., Gao, J., Keller, A., Vijaykumar, N., Fidler, S., et al.: Diffusion renderer: Neural inverse and forward rendering with video diffusion models. In: Proceedings of Computer Vision and Pattern Recognition (CVPR). pp. 26069–26080 (2025)
[18] Liu, I., Chen, L., Fu, Z., Wu, L., Jin, H., Li, Z., Wong, C.M.R., Xu, Y., Ramamoorthi, R., Xu, Z., et al.: OpenIllumination: A multi-illumination dataset for inverse rendering evaluation on real objects. Advances in Neural Information Processing Systems 36, 36951–36962 (2023)
[19] Liu, Y., Wang, P., Lin, C., Long, X., Wang, J., Liu, L., Komura, T., Wang, W.: NeRO: Neural geometry and BRDF reconstruction of reflective objects from multiview images. ACM Transactions on Graphics 42(4), 114 (2023)
[20] Lombardi, S., Nishino, K.: Reflectance and illumination recovery in the wild. IEEE Transactions on Pattern Analysis and Machine Intelligence 38(1), 129–141 (2015)
[21] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: Proceedings of International Conference on Learning Representations (ICLR) (2019)
[22] Luo, J., Ceylan, D., Yoon, J.S., Zhao, N., Philip, J., Frühstück, A., Li, W., Richardt, C., Wang, T.: IntrinsicDiffusion: Joint intrinsic layers from latent diffusion models. In: ACM SIGGRAPH 2024 Conference Papers. pp. 74:1–74:11 (2024)
[23] Matusik, W.: A data-driven reflectance model. Ph.D. thesis, Massachusetts Institute of Technology (2003)
[24] Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM 65(1), 99–106 (2021)
[25] Murmann, L., Gharbi, M., Aittala, M., Durand, F.: A dataset of multi-illumination images in the wild. In: Proceedings of International Conference on Computer Vision (ICCV). pp. 4080–4089 (2019)
[26] Pei, F., Bai, J., Feng, X., Bi, Z., Zhou, K., Wu, H.: OpenSubstance: A high-quality measured dataset of multi-view and -lighting images and shapes. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 5221–5231 (2025)
[27] Reinhard, E., Stark, M., Shirley, P., Ferwerda, J.: Photographic tone reproduction for digital images. ACM Transactions on Graphics 21(3), 267–276 (2002)
[28] Rudnev, V., Elgharib, M., Smith, W., Liu, L., Golyanik, V., Theobalt, C.: NeRF for outdoor scene relighting. In: European Conference on Computer Vision. pp. 615–631. Springer (2022)
[29] Scharstein, D., Szeliski, R.: High-accuracy stereo depth maps using structured light. In: 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Proceedings. vol. 1, pp. I–I. IEEE (2003)
[30] Sengupta, S., Gu, J., Kim, K., Liu, G., Jacobs, D.W., Kautz, J.: Neural inverse rendering of an indoor scene from a single image. In: Proceedings of International Conference on Computer Vision (ICCV). pp. 8598–8607. IEEE (2019)
[31] Shi, B., Wu, Z., Mo, Z., Duan, D., Yeung, S.K., Tan, P.: A benchmark dataset and evaluation for non-Lambertian and uncalibrated photometric stereo. In: Proceedings of Computer Vision and Pattern Recognition (CVPR). pp. 3707–3716 (2016)
[32] Teufel, T., Gera, P., Zhou, X., Iqbal, U., Rao, P., Kautz, J., Golyanik, V., Theobalt, C.: HumanOLAT: A large-scale dataset for full-body human relighting and novel-view synthesis. In: Proceedings of International Conference on Computer Vision (ICCV). pp. 29131–29141 (2025)
[33] Toschi, M., De Matteo, R., Spezialetti, R., De Gregorio, D., Di Stefano, L., Salti, S.: Relight my NeRF: A dataset for novel view synthesis and relighting of real world objects. In: Proceedings of Computer Vision and Pattern Recognition (CVPR). pp. 20762–20772 (2023)
[34] Ummenhofer, B., Agrawal, S., Sepúlveda, R., Lao, Y., Zhang, K., Cheng, T., Richter, S.R., Wang, S., Ros, G.: Objects with lighting: A real-world dataset for evaluating reconstruction and rendering for object relighting. In: 3DV. IEEE (2024)
[35] Walter, B., Marschner, S.R., Li, H., Torrance, K.E.: Microfacet models for refraction through rough surfaces. In: Rendering Techniques 2007 (18th Eurographics Symposium on Rendering) (2007)
[36] Wang, L., Tran, D.M., Cui, R., TG, T., Dahl, A.B., Bigdeli, S.A., Frisvad, J.R., Chandraker, M.: Materialist: Physically based editing using single-image inverse rendering. International Journal of Computer Vision 134(6), 267 (2026). https://doi.org/10.1007/s11263-026-02833-z
[37] Yi, R., Zhu, C., Xu, K.: Weakly-supervised single-view image relighting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 8402–8411 (2023)
[38] Zeng, Z., Deschaintre, V., Georgiev, I., Hold-Geoffroy, Y., Hu, Y., Luan, F., Yan, L.Q., Hašan, M.: RGB↔X: Image decomposition and synthesis using material- and lighting-aware diffusion models. In: ACM SIGGRAPH 2024 Conference Papers. pp. 75:1–75:11. ACM (2024). https://doi.org/10.1145/3641519.3657445
[39] Zhang, K., Luan, F., Wang, Q., Bala, K., Snavely, N.: PhySG: Inverse rendering with spherical Gaussians for physics-based material editing and relighting. In: Proceedings of Computer Vision and Pattern Recognition (CVPR). pp. 5453–5462 (2021)
[40] Zhang, X., Tseng, N., Syed, A., Bhasin, R., Jaipuria, N.: SIMBAR: Single image-based scene relighting for effective data augmentation for automated driving vision tasks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 3718–3728 (2022)
[41] Zhang, X., Srinivasan, P.P., Deng, B., Debevec, P., Freeman, W.T., Barron, J.T.: NeRFactor: Neural factorization of shape and reflectance under an unknown illumination. ACM Transactions on Graphics 40(6), 237 (2021)
[42] Zhang, Y., Sun, J., He, X., Fu, H., Jia, R., Zhou, X.: Modeling indirect illumination for inverse rendering. In: Proceedings of Computer Vision and Pattern Recognition (CVPR). pp. 18643–18652 (2022)
[43] Zhao, X., Srinivasan, P., Verbin, D., Park, K., Martin Brualla, R., Henzler, P.: IllumiNeRF: 3D relighting without inverse rendering. Advances in Neural Information Processing Systems 37, 42593–42617 (2024)
[44] Zhou, X., Chen, J., Rao, P., Teufel, T., Lyu, L., Minasian, T., Sotnychenko, O., Long, X., Habermann, M., Theobalt, C.: OLATverse: A large-scale real-world object dataset with precise lighting control. arXiv preprint arXiv:2511.02483 (2025)
[45] Zhu, R., Li, Z., Matai, J., Porikli, F., Chandraker, M.: IRISformer: Dense vision transformers for single-image inverse rendering in indoor scenes. In: Proceedings of Computer Vision and Pattern Recognition (CVPR). pp. 2822–2831. IEEE (2022)
Supplementary excerpts: alignment validation and evaluation protocol
- [46] Quantitative Validation of Illumination Alignment: "We provide a rigorous quantitative validation based on metadata timestamp statistics and solar angular displacement analysis. This proves that the temporal gap in our acquisition pipeline results in physically negligible illumi..."
- [47] Details of the Baseline Benchmark Setting: "...5-scene hold-out test set. We evaluate performance using three standard image quality metrics: Peak Signal to Noise Ratio (PSNR), the Structural Similarity Index (SSIM), and the Learned Perceptual Image Patch Similarity (LPIPS). Baseline Models: We selected three representative methods that support single-image relig..."
- [48] Advantages of RAW-Based HDR Images: "...reported experiments (including baselines, finetuning, and our method), we adopt a global least-squares alignment strategy. For every predicted image I_pred and its corresponding ground truth I_gt, we solve for an optimal scalar α: α* = argmin_α ‖α · I_pred − I_gt‖² (Eq. 7). Metrics are computed on the aligned prediction I_pred · α*..."
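The minimization in excerpt [48] has a closed form: differentiating ‖α · I_pred − I_gt‖² in α and setting the result to zero gives α* = ⟨I_pred, I_gt⟩ / ⟨I_pred, I_pred⟩. A minimal NumPy sketch of the aligned metric (function and variable names are ours):

```python
import numpy as np

def aligned_psnr(pred, gt, max_val=1.0):
    """Least-squares scale alignment from supplementary Eq. (7):
    alpha* = <pred, gt> / <pred, pred>, then PSNR on the aligned prediction.
    This removes global exposure ambiguity before scoring."""
    p, g = pred.ravel(), gt.ravel()
    alpha = float(p @ g) / float(p @ p)
    mse = np.mean((alpha * pred - gt) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```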
Supplementary excerpts: capture, annotation, and renderer
- [49] Shadow Detail and Color Fidelity: "In low-light environments, the RAW format, with its high bit depth (typically 12 or 14 bits), captures extensive detail in the dark regions. By increasing the exposure in post-processing, the original information can be recovered with minimal loss. Conversely, since this information is already discarded d..."
- [50] Highlight Information Retention: "In highlight regions, while both a RAW-based HDR image and a JPG image may appear as pure white on a Standard Dynamic Range (SDR) display due to exceeding the display's maximum brightness, the amount of information they contain is fundamentally different. The RAW data fully retains the color and tonal information within th..."
- [51] Effective Resolution: "Even with an 8K 360° capture, projecting the image to a standard 40mm field-of-view (FOV) yields an effective resolution significantly lower than that of the 24MP+ full-frame Sony A7 used in our rig. This loss of high-frequency detail would severely compromise the evaluation of texture preservation and generation..."
- [52] Image Quality & Dynamic Range: "Panoramic cameras typically utilize smaller sensors that introduce noise and chromatic aberration. More critically, they lack the dynamic range of the Sony A7's 14-bit RAW optical path. High dynamic range is essential for outdoor relighting tasks to accurately recover information in deep shadows and bright highlights..."
- [53] Methodology for Determining the Nodal Point (No-Parallax Point), Fig. 10: "Side-by-side camera setup: a non-aligned envmap camera will record direct sun light, but the scene camera records a shadow."
- [54] Optical Artifacts: "360° cameras rely on heavy distortion correction and stitching algorithms, which introduce resampling artifacts. Using a dedicated rectilinear lens ensures the benchmark data is free from such algorithmic interference." §5.2 Necessity of Strict Spatial Alignment: "Precise co-location of the environment map camera (Insta360) and the scene ca..."
- [55] Methodology for Determining the Nodal Point, continued: "...the entrance pupil. Adjustments must then be made to the fore-aft position of the camera on the panoramic head, and the rotational test is repeated. The objective is to achieve a state where, upon panning the camera to the left and right, the two reference objects remain in perfect alignment, with..."
- [56] Pairwise Comparison: "For each scene, annotators performed a sequential, pairwise comparison of adjacent time steps (e.g., t0 vs. t1, t1 vs. t2, etc.)."
- [57] Difference Visualization: "To aid the human annotators, we generated absolute pixel-difference images for each pair. This visualization technique effectively accentuates the contours of misaligned objects, where pixel gradients are highest, making the boundaries of dynamic elements more conspicuous."
- [58] Manual Annotation: "Annotators manually painted masks over all identified dynamic regions for each image pair. The primary targets for masking were clouds and moving vegetation (leaves, branches, and grass)."
- [59] Mask Aggregation: "The final mask for the entire scene is generated by computing the union of all pairwise masks. This ensures that any element that moved at any point during the capture sequence is included in the aggregate mask. We explicitly excluded two categories of dynamic effects from masking. First, water surfaces (e.g., lakes and seas) were not annotated du..."
- [60] Differentiable Cook–Torrance Renderer: "To evaluate the physics consistency of predicted G-buffers, we employ a fully differentiable Cook–Torrance microfacet model with split-sum approximation [5]. Let the per-pixel surface properties be defined by base color c_b, normal n, roughness α, and metallicity m, and let the..."
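Excerpt [60] truncates before the renderer's equations. For orientation, here is a generic differentiable Cook–Torrance term with GGX distribution, Smith–Schlick geometry, and Schlick Fresnel, the textbook form of the model the excerpt names; the split-sum environment-lighting approximation the authors cite is a further step not reproduced here.

```python
import torch
import torch.nn.functional as F

def cook_torrance_ggx(n, v, l, base_color, roughness, metallic):
    """Generic differentiable Cook-Torrance per-light term (an assumption
    about the excerpt's renderer, not the paper's exact implementation).
    n, v, l: unit normal, view, and light directions of shape (..., 3);
    base_color (..., 3); roughness and metallic broadcastable to (..., 1)."""
    h = F.normalize(v + l, dim=-1)
    nh = (n * h).sum(-1, keepdim=True).clamp(min=1e-4)
    nv = (n * v).sum(-1, keepdim=True).clamp(min=1e-4)
    nl = (n * l).sum(-1, keepdim=True).clamp(min=1e-4)
    vh = (v * h).sum(-1, keepdim=True).clamp(min=1e-4)
    a2 = (roughness ** 2) ** 2                                 # a = roughness^2
    d = a2 / (torch.pi * (nh ** 2 * (a2 - 1) + 1) ** 2)        # GGX NDF
    k = (roughness + 1) ** 2 / 8
    g = (nv / (nv * (1 - k) + k)) * (nl / (nl * (1 - k) + k))  # Smith-Schlick
    f0 = 0.04 * (1 - metallic) + base_color * metallic
    f = f0 + (1 - f0) * (1 - vh) ** 5                          # Schlick Fresnel
    spec = d * g * f / (4 * nv * nl)
    diffuse = (1 - metallic) * base_color / torch.pi           # Lambert term
    return (diffuse + spec) * nl
```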