Reconstruction by Generation: 3D Multi-Object Scene Reconstruction from Sparse Observations
Pith reviewed 2026-05-07 08:24 UTC · model grok-4.3
The pith
RecGen jointly estimates shapes, parts, and poses for multi-object 3D scenes from sparse RGB-D views by training generative models on compositional synthetic scenes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RecGen is a generative framework for probabilistic joint estimation of object and part shapes, as well as their pose under occlusion and partial visibility from one or multiple RGB-D images. By leveraging compositional synthetic scene generation and strong 3D shape priors, RecGen generalizes across diverse object types and real-world environments. It achieves state-of-the-art performance on complex, heavily occluded datasets, robustly handling severe occlusions, symmetric objects, object parts, and intricate geometry and texture.
What carries the argument
Generative model trained on compositionally assembled synthetic scenes to produce transferable 3D shape and pose priors for joint probabilistic inference from sparse RGB-D input.
If this is right
- The method produces usable estimates for object parts and symmetric items that prior techniques handled poorly under occlusion.
- It reaches higher geometric accuracy, texture fidelity, and pose precision than SAM3D while requiring roughly 80 percent fewer training meshes.
- Performance holds across single-view and multi-view inputs on heavily occluded real-world test sets.
Where Pith is reading between the lines
- The data-efficiency result points to structured synthetic composition as a practical route for lowering the cost of building 3D perception systems for new environments.
- Similar generative priors could be tested for extending reconstruction to dynamic or video sequences where temporal information further constrains the possible shapes and motions.
- Robotics applications that need rapid scene models for planning would gain from the reported robustness to partial views and clutter.
Load-bearing premise
Shape priors acquired from synthetic scenes composed of known objects will transfer to real RGB-D images that contain different lighting, textures, and object instances without a large performance penalty.
What would settle it
A clear performance collapse relative to baselines when the same model is evaluated on a fresh set of real multi-object scenes whose object categories or surface appearances were never used in the synthetic training compositions.
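A minimal sketch of how such a check could be scored, assuming per-object scores where higher is better and a known set of training categories. The function names, category labels, and numbers are hypothetical placeholders, not values or code from the paper.

```python
from collections import defaultdict
from statistics import mean


def split_by_novelty(results, training_categories):
    """Average a per-object score separately for seen and unseen categories.

    `results` is a list of (category, score) pairs. Categories found in
    `training_categories` count as "seen"; everything else is "unseen".
    """
    buckets = defaultdict(list)
    for category, score in results:
        key = "seen" if category in training_categories else "unseen"
        buckets[key].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}


def relative_drop(seen_score, unseen_score):
    """Fractional drop from seen to unseen, assuming higher is better."""
    return (seen_score - unseen_score) / seen_score


# Illustrative placeholder data only (not results from the paper).
training_categories = {"mug", "bowl"}
results = [("mug", 0.80), ("bowl", 0.60), ("stapler", 0.35), ("drill", 0.35)]

summary = split_by_novelty(results, training_categories)
drop = relative_drop(summary["seen"], summary["unseen"])
print(summary, f"relative drop: {drop:.0%}")
```

Running the same split for the baselines would show whether a drop on unseen categories is specific to the synthetic-prior approach or shared by all methods.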
Original abstract
Accurately reconstructing complex full multi-object scenes from sparse observations remains a core challenge in computer vision and a key step toward scalable and reliable simulation for robotics. In this work, we introduce RecGen, a generative framework for probabilistic joint estimation of object and part shapes, as well as their pose under occlusion and partial visibility from one or multiple RGB-D images. By leveraging compositional synthetic scene generation and strong 3D shape priors, RecGen generalizes across diverse object types and real-world environments. RecGen achieves state-of-the-art performance on complex, heavily occluded datasets, robustly handling severe occlusions, symmetric objects, object parts, and intricate geometry and texture. Despite using nearly 80% fewer training meshes than the previous state of the art SAM3D, RecGen outperforms it by 30.1% in geometric shape quality, 9.1% in texture reconstruction, and 33.9% in pose estimation.
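The abstract does not say how its percentages are computed. A common convention, assumed here, is relative change with respect to the baseline, with the sign flipped for error-style metrics where lower is better. The sketch below illustrates that convention with placeholder numbers; the function names and inputs are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline, ours, higher_is_better=True):
    """Relative gain of `ours` over `baseline`, as a fraction of the baseline.

    For error-style metrics (lower is better), pass higher_is_better=False
    so that a reduction in error counts as a positive gain.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    delta = ours - baseline if higher_is_better else baseline - ours
    return delta / abs(baseline)


def relative_reduction(baseline_count, our_count):
    """Fraction by which a count (e.g. training meshes) is reduced."""
    if baseline_count <= 0:
        raise ValueError("baseline_count must be positive")
    return (baseline_count - our_count) / baseline_count


# Placeholder inputs chosen only to illustrate the arithmetic.
gain_error = relative_improvement(baseline=10.0, ours=7.0, higher_is_better=False)
gain_score = relative_improvement(baseline=0.50, ours=0.65)
mesh_cut = relative_reduction(baseline_count=1000, our_count=200)

print(f"{gain_error:.1%} {gain_score:.1%} {mesh_cut:.1%}")
```

Under this convention the same absolute change gives different percentages depending on the baseline value, which is one reason the referee asks for explicit metric definitions and mesh counts.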
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces RecGen, a generative framework for probabilistic joint estimation of object and part shapes as well as their poses from sparse RGB-D observations in multi-object scenes. It relies on compositional synthetic scene generation to learn strong 3D shape priors that are claimed to generalize to diverse real-world environments, handling severe occlusions, symmetry, and intricate geometry/texture. The central claim is state-of-the-art performance on complex, heavily occluded datasets, outperforming SAM3D by 30.1% in geometric shape quality, 9.1% in texture reconstruction, and 33.9% in pose estimation while using nearly 80% fewer training meshes.
Significance. If the generalization from synthetic compositional priors to real occluded scenes holds, the work would be significant for scalable robotics simulation by showing that generative 3D priors can deliver substantial gains with far less training data than prior methods. The reconstruction-by-generation paradigm for joint shape-pose inference under partial visibility is a promising direction, and the efficiency claim (80% fewer meshes) would be a notable contribution if supported by rigorous cross-domain validation.
major comments (2)
- [§5] §5 (Experiments): The headline performance gains on real-world heavily occluded datasets are presented without quantitative evidence that the synthetic training distribution closes the domain gap for real textures, lighting, and sensor noise. No real-vs-synthetic performance tables, domain-randomization ablations, or texture distribution statistics are reported, so it is unclear whether the 30.1% shape-quality improvement follows from the method or from unverified transfer assumptions.
- [§4] §4 (Method) and §5.1 (Ablations): The claim that strong shape priors learned from compositional synthetic scenes suffice for real-world generalization is load-bearing for the data-efficiency argument, yet the manuscript provides no controlled experiments isolating the contribution of the generative prior versus potential differences in baseline re-implementations or metric definitions.
minor comments (2)
- The abstract and introduction should explicitly list the exact real-world datasets used for testing and the precise training mesh count for both RecGen and SAM3D to allow direct verification of the 80% reduction claim.
- Figure captions and table footnotes could more clearly indicate whether reported metrics are computed on held-out synthetic scenes or on the real-world test sets.
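The real-versus-synthetic table requested in the first major comment could be assembled along the following lines. This is a sketch under stated assumptions: the metric names, the higher-is-better scale, and all numbers are hypothetical placeholders rather than anything reported in the manuscript.

```python
def domain_gap_table(synthetic, real):
    """Return rows of (metric, synthetic value, real value, relative gap).

    The gap is (real - synthetic) / |synthetic|, so a negative value means
    the model does worse on real data for a higher-is-better metric.
    Only metrics present in both dictionaries are included.
    """
    rows = []
    for metric in sorted(synthetic.keys() & real.keys()):
        s, r = synthetic[metric], real[metric]
        gap = (r - s) / abs(s) if s != 0 else float("nan")
        rows.append((metric, s, r, gap))
    return rows


def format_table(rows):
    """Render the rows as a fixed-width plain-text table."""
    lines = [f"{'metric':<12}{'synthetic':>10}{'real':>10}{'gap':>9}"]
    for metric, s, r, gap in rows:
        lines.append(f"{metric:<12}{s:>10.3f}{r:>10.3f}{gap:>+9.1%}")
    return "\n".join(lines)


# Placeholder scores for illustration only.
synthetic_scores = {"shape": 0.80, "texture": 0.50, "pose": 0.40}
real_scores = {"shape": 0.60, "texture": 0.45, "pose": 0.30}

rows = domain_gap_table(synthetic_scores, real_scores)
table_text = format_table(rows)
print(table_text)
```

Reporting the same gap for each baseline alongside RecGen would help separate gains that come from the method from gains that depend on how well the synthetic data matches the test sets.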
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the potential significance of the reconstruction-by-generation paradigm. We address the major comments point by point below and outline planned revisions to strengthen the empirical support for our claims.
- Referee: [§5] §5 (Experiments): The headline performance gains on real-world heavily occluded datasets are presented without quantitative evidence that the synthetic training distribution closes the domain gap for real textures, lighting, and sensor noise. No real-vs-synthetic performance tables, domain-randomization ablations, or texture distribution statistics are reported, so it is unclear whether the 30.1% shape-quality improvement follows from the method or from unverified transfer assumptions.
  Authors: We agree that explicit quantification of the domain gap would strengthen the presentation. The current results rely on direct evaluation on real datasets as implicit evidence of generalization from the compositional synthetic priors. In the revised manuscript we will add (i) a table comparing reconstruction metrics on held-out synthetic test scenes versus the real evaluation sets, (ii) domain-randomization ablations that vary texture, lighting, and noise parameters during training, and (iii) basic texture-distribution statistics between the synthetic corpus and the real test images. These additions will make the source of the reported gains more transparent.
  Revision: yes
- Referee: [§4] §4 (Method) and §5.1 (Ablations): The claim that strong shape priors learned from compositional synthetic scenes suffice for real-world generalization is load-bearing for the data-efficiency argument, yet the manuscript provides no controlled experiments isolating the contribution of the generative prior versus potential differences in baseline re-implementations or metric definitions.
  Authors: We acknowledge the need for tighter isolation of the generative prior's contribution. Section 5.1 already contains ablations that disable the compositional generation and shape-prior components, showing measurable drops in performance. To address concerns about re-implementation details, the revision will (i) expand the description of our SAM3D re-implementation (including exact mesh counts, training schedules, and metric computation code), (ii) add a controlled experiment that trains RecGen without the generative prior while keeping all other architecture and optimization choices identical, and (iii) include a short appendix clarifying metric definitions. These changes will better separate the effect of the prior from other factors.
  Revision: yes
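For the controlled experiment in the second response, one way to report the effect of the prior is a paired comparison on the same test scenes with a bootstrap confidence interval. The sketch below assumes per-scene scores where higher is better and treats scenes as independent samples. The function name and all numbers are hypothetical, and this is not the authors' evaluation code.

```python
import random
from statistics import mean


def paired_bootstrap_ci(full, ablated, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean per-scene difference full - ablated.

    `full` and `ablated` hold scores for the same scenes in the same order,
    so the comparison is paired and scene difficulty cancels out.
    """
    if len(full) != len(ablated) or not full:
        raise ValueError("inputs must be non-empty and of equal length")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    diffs = [f - a for f, a in zip(full, ablated)]
    n = len(diffs)
    # Resample the per-scene differences with replacement and record each mean.
    resampled_means = sorted(
        mean(rng.choice(diffs) for _ in range(n)) for _ in range(n_resamples)
    )
    lo = resampled_means[int((alpha / 2) * n_resamples)]
    hi = resampled_means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return mean(diffs), lo, hi


# Placeholder per-scene scores for illustration only.
with_prior = [0.70, 0.65, 0.80, 0.75, 0.60, 0.72]
without_prior = [0.60, 0.55, 0.75, 0.70, 0.50, 0.65]

point, lo, hi = paired_bootstrap_ci(with_prior, without_prior)
print(f"mean gain {point:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

An interval that excludes zero would support the claim that the prior itself contributes, provided the two runs differ only in that component.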
Circularity Check
No circularity detected in derivation or performance claims
The manuscript introduces RecGen as a generative model leveraging compositional synthetic scene generation and 3D shape priors to achieve reported gains over the external baseline SAM3D. No equations, self-definitional relations, fitted-input predictions, or load-bearing self-citations are present that reduce the claimed shape/texture/pose metrics or generalization statements to quantities defined by construction within the paper itself. Performance numbers are framed as direct empirical comparisons against an independent prior method on held-out data, rendering the central claims self-contained rather than tautological.
Axiom & Free-Parameter Ledger
free parameters (1)
- shape prior strength
axioms (1)
- domain assumption: Compositional synthetic scene generation produces training distributions sufficiently close to real-world multi-object scenes for the learned priors to transfer.