GeM-NR: Geometry-Aware Multi-View Editing for Nonrigid Scene Changes
Pith reviewed 2026-06-28 06:27 UTC · model grok-4.3
The pith
GeM-NR aligns depth-derived point clouds to propagate nonrigid edits consistently across multiple scene views.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GeM-NR is a fast and flexible training-free approach for general multi-view consistent image editing, including edits that drastically change the geometry and appearance of the scene. Given an anchor image edited with a chosen backbone editor and a query unedited image, GeM-NR edits the query image consistently with the anchor edit. The method incorporates multiple stages: depth map estimation with a strategy to maximize the alignment between the 3D point clouds of the edited and unedited scenes, projection onto a query viewpoint, and refinement of the obtained image conditioned on the unedited query. The conditioning-based formulation scales well from two to many views of an object.
What carries the argument
Depth map estimation followed by point-cloud alignment that maximizes 3D correspondence between edited and original scenes, followed by projection and conditioned refinement.
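The projection stage can be illustrated with a minimal sketch. It assumes a pinhole camera with known intrinsics `K` and a known relative pose `(R, t)` from the anchor view to the query view. These names, and the functions `unproject` and `project`, are hypothetical and do not come from the paper, which gives no formulas for this step.

```python
import numpy as np

def unproject(depth, K):
    """Lift an H x W depth map to an (H*W, 3) point cloud in camera coordinates."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    rays = pix @ np.linalg.inv(K).T                 # back-project pixels to rays with z = 1
    return rays * depth.reshape(-1, 1)              # scale each ray by its depth

def project(points, K, R, t):
    """Map anchor-frame 3D points to query-view pixel coordinates and depths."""
    cam = points @ R.T + t                          # anchor frame -> query frame (assumed pose)
    z = cam[:, 2]
    uv = (cam @ K.T)[:, :2] / z[:, None]            # perspective division
    return uv, z

# Toy example: a fronto-parallel plane at depth 2 seen by a 4 x 3 pixel camera.
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 1.5],
              [0.0, 0.0, 1.0]])
depth = np.full((3, 4), 2.0)
points = unproject(depth, K)

uv_same, z_same = project(points, K, np.eye(3), np.zeros(3))            # identity pose
uv_shift, _ = project(points, K, np.eye(3), np.array([0.1, 0.0, 0.0]))  # small sideways shift
```

With the identity pose every point lands back on its source pixel. A real pipeline would also need occlusion handling and hole filling, which is presumably where the conditioned refinement stage comes in.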
If this is right
- The method produces consistent edits for tasks that substantially alter scene geometry and appearance, where prior approaches fail.
- Quantitative and qualitative evaluations show state-of-the-art performance on edit quality together with geometric and photometric consistency across views.
- The same pipeline supports generation of 3D representations from the edited multi-view set.
- The conditioning formulation extends naturally from pairs of views to larger numbers of viewpoints without additional training.
Where Pith is reading between the lines
- If the alignment procedure remains stable under larger viewpoint gaps, the same stages could be applied to video sequences with moving cameras.
- Replacing the backbone editor with newer generative models would immediately widen the variety of nonrigid edits the pipeline can accept.
- The point-cloud alignment objective might be reused as a consistency regularizer inside other multi-view reconstruction pipelines.
Load-bearing premise
The point-cloud alignment step recovers accurate 3D correspondences even after nonrigid geometry changes have been applied to the scene.
What would settle it
Multi-view test cases in which a nonrigid edit produces large mismatches between the edited and original point clouds, resulting in visible geometric inconsistencies or failed refinement across views.
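One way to quantify such mismatches is a nearest-neighbour distance between the edited and original point clouds. The sketch below is an illustrative diagnostic, not a metric taken from the paper. The function names, the tolerance `tol`, and the toy data are all assumptions.

```python
import numpy as np

def nearest_distances(src, dst):
    """Distance from each point in src to its nearest neighbour in dst (brute force)."""
    diff = src[:, None, :] - dst[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)

def mismatch_report(edited, original, tol):
    """Summarise how far the edited cloud departs from the original one."""
    d = nearest_distances(edited, original)
    return {
        "mean_distance": float(d.mean()),
        "fraction_over_tol": float((d > tol).mean()),
    }

# Toy data: four coplanar points, one of which the edit lifts off the plane.
original = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [1.0, 1.0, 0.0]])
edited = original.copy()
edited[3, 2] = 1.0  # hypothetical nonrigid change to a single point

report = mismatch_report(edited, original, tol=0.5)
```

A test suite built around this idea would sort edits by such a score and check whether cross-view consistency degrades as the score grows. The brute-force pairwise distance is only practical for small clouds. A spatial index would be needed at realistic resolutions.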
Original abstract
Recent developments in multi-view image editing with generative models have brought us a step closer toward general 3D content generation and customization. Most existing works focus on rigid or appearance-only edits by utilizing the geometry of the unedited scene. This naturally limits these methods to edits that preserve the underlying scene structure. Other approaches are trained for specific image editing tasks, such as object removal and addition. Despite this progress, general nonrigid edits, i.e., edits that substantially change the scene geometry, remain challenging for existing methods. We propose GeM-NR, a fast and flexible training-free approach for general multi-view consistent image editing, including edits that drastically change the geometry and appearance of the scene. Given an anchor image edited with a chosen backbone editor (such as FLUX, Qwen, BrushNet) and a query unedited image, GeM-NR edits the query image consistently with the anchor edit. The method incorporates multiple stages: (i) depth map estimation, where we propose a strategy to maximize the alignment between the 3D point clouds of the edited and unedited scenes, (ii) projection onto a query viewpoint, and (iii) refinement of the obtained image conditioned on the unedited query. The conditioning-based formulation scales well from two to many views of an object. We demonstrate the ability of our method to handle edits with significant changes in geometry and appearance, something that existing methods struggle with. We perform an extensive evaluation showing that our method improves consistency for a wide variety of edit tasks, including generating 3D representations of the edited scene. Both quantitative and qualitative results indicate the state-of-the-art performance of our method in terms of edit quality as well as geometric and photometric consistency across multiple views.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes GeM-NR, a training-free pipeline for multi-view consistent image editing under nonrigid geometric and appearance changes. Given an anchor image edited by an off-the-shelf backbone (FLUX, Qwen, BrushNet) and an unedited query view, the method (i) estimates depth maps and aligns the resulting 3D point clouds, (ii) projects the edited content into the query viewpoint, and (iii) refines the projected image by conditioning a generator on the original query. The authors claim this yields state-of-the-art edit quality together with geometric and photometric consistency across views, including the ability to produce 3D representations of the edited scene.
Significance. A reliable, training-free method that genuinely supports large nonrigid edits while preserving multi-view consistency would be a notable contribution to 3D-aware image editing. The modular design (backbone editor + alignment + conditioning) is attractive for practical use. However, the central technical step—recovering usable correspondences after non-isometric geometry change—remains too vaguely described to assess whether the claimed consistency gains are attributable to the proposed alignment rather than to the backbone or to the choice of test cases.
major comments (2)
- [depth-map estimation stage (§3)] Depth-map estimation stage (abstract and §3): the alignment between edited and unedited point clouds is described only as “a strategy to maximize the alignment.” No objective function, deformation model, rigidity or topology-change handling, optimizer, or convergence criterion is supplied. Because the edit is non-isometric by construction, any rigid or near-rigid alignment will leave large residuals that propagate directly into the projection step; without an explicit formulation it is impossible to verify that the claimed consistency for “significant changes in geometry” is achieved by the method rather than by easy cases or by the later refinement.
- [evaluation section] Evaluation section: the abstract asserts quantitative SOTA results on geometric and photometric consistency, yet the provided description supplies neither the metrics, datasets, number of views, error bars, nor an ablation isolating the contribution of the point-cloud alignment. Without these data the central claim that GeM-NR outperforms prior multi-view editors on nonrigid edits cannot be evaluated.
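The concern in the first major comment can be made concrete with a small experiment. The sketch below uses the standard Kabsch algorithm, a closed-form least-squares rigid fit, as a stand-in baseline. It is not the paper's alignment strategy, which is unspecified, and the synthetic point sets and the stretch deformation are assumptions chosen only for illustration.

```python
import numpy as np

def kabsch(src, dst):
    """Least-squares rigid transform (R, t) mapping src onto dst, rows in correspondence."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)           # 3 x 3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def rms_residual(src, dst, R, t):
    """Root-mean-square distance between transformed src and dst."""
    err = (src @ R.T + t) - dst
    return float(np.sqrt((err ** 2).sum(axis=1).mean()))

rng = np.random.default_rng(0)
src = rng.uniform(-1.0, 1.0, size=(200, 3))

# Case 1: a purely rigid change (30 degree rotation about z plus a translation).
a = np.deg2rad(30.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
rigid_dst = src @ R_true.T + np.array([0.5, -0.2, 1.0])
rigid_res = rms_residual(src, rigid_dst, *kabsch(src, rigid_dst))

# Case 2: a non-isometric change (stretch along z by a factor of 2).
stretched_dst = src * np.array([1.0, 1.0, 2.0])
stretch_res = rms_residual(src, stretched_dst, *kabsch(src, stretched_dst))
```

In the rigid case the residual is at floating-point noise level, while in the stretched case it remains large even though the correspondences are exact. This is the kind of residual the referee expects to propagate into the projection step unless the alignment model can absorb non-isometric change.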
minor comments (2)
- [abstract] The scaling statement “from two to many views” would benefit from an explicit statement of the maximum number of views tested and any degradation observed.
- [method] Notation for the three stages (i)–(iii) is introduced in the abstract but not carried forward with consistent symbols in the method description; adding equation numbers or algorithm pseudocode would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback highlighting areas where the technical description and evaluation require greater clarity. We address each major comment below and will revise the manuscript to incorporate the requested details.
- Referee: [depth-map estimation stage (§3)] Depth-map estimation stage (abstract and §3): the alignment between edited and unedited point clouds is described only as “a strategy to maximize the alignment.” No objective function, deformation model, rigidity or topology-change handling, optimizer, or convergence criterion is supplied. Because the edit is non-isometric by construction, any rigid or near-rigid alignment will leave large residuals that propagate directly into the projection step; without an explicit formulation it is impossible to verify that the claimed consistency for “significant changes in geometry” is achieved by the method rather than by easy cases or by the later refinement.
  Authors: We agree that the alignment procedure in §3 is described at too high a level. The current manuscript refers only to “a strategy to maximize the alignment” without supplying the objective function, deformation model, rigidity assumptions, topology handling, optimizer, or convergence criteria. In the revised manuscript we will expand §3 with the explicit formulation of the point-cloud alignment, including the objective, the deformation model and its capacity to accommodate non-isometric changes, the optimizer, and convergence criterion. This will make it possible to evaluate whether the reported consistency gains for large geometric edits are attributable to the alignment step.
  Revision: yes
- Referee: [evaluation section] Evaluation section: the abstract asserts quantitative SOTA results on geometric and photometric consistency, yet the provided description supplies neither the metrics, datasets, number of views, error bars, nor an ablation isolating the contribution of the point-cloud alignment. Without these data the central claim that GeM-NR outperforms prior multi-view editors on nonrigid edits cannot be evaluated.
  Authors: We acknowledge that the evaluation section must be expanded to support the quantitative claims. The manuscript states that an extensive evaluation was performed, but does not currently enumerate the concrete metrics, datasets, view counts, error bars, or ablations isolating the alignment module. In the revised version we will add these elements: explicit definitions of the geometric and photometric consistency metrics, the datasets and number of views used, error bars on all reported numbers, and an ablation that isolates the contribution of the point-cloud alignment step. This will allow direct assessment of the state-of-the-art claims.
  Revision: yes
Circularity Check
No circularity; pipeline composes external depth estimators and generators without self-referential reductions
The paper presents GeM-NR as a training-free composition of existing depth estimators, point-cloud alignment, projection, and conditioned refinement using backbone editors such as FLUX. No equations, fitted parameters, or predictions are defined in terms of themselves. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The alignment strategy is described at a high level without reducing to a tautology or fitted input. The derivation chain therefore remains self-contained against external components and does not exhibit any of the enumerated circularity patterns.