Physically Grounded 3D Generative Reconstruction under Hand Occlusion using Proprioception and Multi-Contact Touch
Pith reviewed 2026-05-10 18:18 UTC · model grok-4.3
The pith
Proprioception and multi-contact touch enable metric-scale 3D object reconstruction under severe hand occlusion by constraining surfaces with physical signals.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We represent object structure as a pose-aware, camera-aligned signed distance field (SDF) and learn a compact latent space with a Structure-VAE. In this latent space, we train a conditional flow-matching diffusion model, pretraining on vision-only images and finetuning on occluded manipulation scenes while conditioning on visible RGB evidence, occluder/visibility masks, the hand latent representation, and tactile information. Crucially, we incorporate physics-based objectives and differentiable decoder-guidance during finetuning and inference to reduce hand–object interpenetration and to align the reconstructed surface with contact observations.
What carries the argument
A conditional flow-matching diffusion model in the latent space of a Structure-VAE for pose-aware SDFs, conditioned on RGB, masks, hand proprioception, and multi-contact touch, and guided by physics objectives for non-interpenetration and contact matching.
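The review gives no equations, so as a rough sketch: a conditional flow-matching model of this kind typically regresses a velocity field along a linear interpolant between noise and the clean Structure-VAE latent. A minimal PyTorch sketch under that assumption; `velocity_net`, the latent batch `z1`, and the fused conditioning `cond` are hypothetical names, not the paper's interfaces:

```python
import torch

def flow_matching_loss(velocity_net, z1, cond):
    """One conditional flow-matching step (linear-interpolant form, assumed).

    z1   : clean Structure-VAE latents, shape (B, D)     -- hypothetical
    cond : fused RGB / mask / hand / touch conditioning  -- hypothetical
    """
    z0 = torch.randn_like(z1)                        # noise endpoint
    t = torch.rand(z1.shape[0], 1, device=z1.device)
    zt = (1 - t) * z0 + t * z1                       # point on the interpolant
    target_v = z1 - z0                               # constant target velocity
    pred_v = velocity_net(zt, t, cond)               # conditional velocity field
    return ((pred_v - target_v) ** 2).mean()
```

At inference the learned field is integrated from noise to a latent, which the VAE decoder maps back to a pose-aware SDF.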
If this is right
- Substantially improves completion of occluded object parts compared to vision-only baselines.
- Yields physically plausible reconstructions at correct real-world scale.
- Reduces hand-object interpenetration through the added physics objectives.
- Transfers to real humanoid robots even when the end-effector differs from training data.
- Integrates directly into two-stage pipelines where a downstream module can refine geometry and predict appearance.
Where Pith is reading between the lines
- Robots could maintain object models continuously during grasping sequences without pausing for clear views.
- The same conditioning strategy might apply to other occluders if their geometry and contact data are available.
- Real-time versions could support closed-loop control by updating the object estimate as contacts change.
- Reducing dependence on external cameras could simplify workspace setups in cluttered or mobile manipulation scenarios.
Load-bearing premise
That physics-based objectives and differentiable decoder guidance during finetuning and inference will reliably reduce hand-object interpenetration and align surfaces with contacts without introducing artifacts or scale errors.
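To make that premise concrete: under the common convention that an SDF is negative inside a surface, the two physics terms named in the abstract would plausibly look like the sketch below. This illustrates the general pattern, not the paper's formulation; `object_sdf`, the sampled `hand_points`, the touch-derived `contact_points`, and the weights are all assumed:

```python
import torch

def physics_losses(object_sdf, hand_points, contact_points, w_pen=1.0, w_con=1.0):
    """Sketch of interpenetration and contact-alignment penalties (assumed form).

    object_sdf     : callable mapping (N, 3) points to signed distances,
                     negative inside the object, decoded differentiably
    hand_points    : (N, 3) points sampled on or inside the posed hand
    contact_points : (M, 3) contact locations estimated from touch
    """
    # Non-interpenetration: hand points must not lie inside the object,
    # i.e. their signed distance should be non-negative.
    pen = torch.relu(-object_sdf(hand_points)).mean()
    # Contact alignment: the zero level set (the surface) should pass
    # through the observed contact locations.
    con = object_sdf(contact_points).abs().mean()
    return w_pen * pen + w_con * con
```

Whether these terms avoid artifacts or scale drift in practice is exactly what the premise asserts and what the ablations would need to show.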
What would settle it
Reconstructed meshes that still penetrate the hand model or fail to match observed contact locations at test time, or outputs whose real-world scale deviates from ground-truth measurements on the robot.
Original abstract
We propose a multimodal, physically grounded approach for metric-scale amodal object reconstruction and pose estimation under severe hand occlusion. Unlike prior occlusion-aware 3D generation methods that rely only on vision, we leverage physical interaction signals: proprioception provides the posed hand geometry, and multi-contact touch constrains where the object surface must lie, reducing ambiguity in occluded regions. We represent object structure as a pose-aware, camera-aligned signed distance field (SDF) and learn a compact latent space with a Structure-VAE. In this latent space, we train a conditional flow-matching diffusion model, pretraining on vision-only images and finetuning on occluded manipulation scenes while conditioning on visible RGB evidence, occluder/visibility masks, the hand latent representation, and tactile information. Crucially, we incorporate physics-based objectives and differentiable decoder-guidance during finetuning and inference to reduce hand–object interpenetration and to align the reconstructed surface with contact observations. Because our method produces a metric, physically consistent structure estimate, it integrates naturally into existing two-stage reconstruction pipelines, where a downstream module refines geometry and predicts appearance. Experiments in simulation show that adding proprioception and touch substantially improves completion under occlusion and yields physically plausible reconstructions at correct real-world scale compared to vision-only baselines; we further validate transfer by deploying the model on a real humanoid robot with an end-effector different from those used during training.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a multimodal, physically grounded method for metric-scale amodal 3D object reconstruction and pose estimation under severe hand occlusion. It represents objects as pose-aware camera-aligned SDFs via a Structure-VAE, then trains a conditional flow-matching diffusion model on vision, occluder masks, hand latent representations, and tactile signals. Physics-based objectives and differentiable decoder guidance are added during finetuning and inference to enforce physical consistency. Simulation experiments claim substantial gains in occluded completion and scale accuracy over vision-only baselines, with further validation via deployment on a real humanoid robot using a novel end-effector.
Significance. If the results hold, the work meaningfully advances grounded 3D generation for robotics by showing how proprioception and multi-contact touch can resolve visual ambiguity in manipulation scenes while producing metric, non-penetrating reconstructions. The sim-to-real transfer with an unseen end-effector and the integration path into two-stage pipelines are practical strengths that could influence downstream tasks such as grasping and interaction planning.
major comments (2)
- Abstract: the central claim that proprioception and touch 'substantially improves completion under occlusion and yields physically plausible reconstructions at correct real-world scale' is presented without any quantitative metrics, error bars, ablation tables, or statistical tests in the abstract itself; this makes it impossible to judge the magnitude or reliability of the reported gains from the provided text alone.
- Method description: the physics-based objectives and differentiable decoder-guidance are described as key to reducing interpenetration and aligning surfaces with contacts, yet no explicit formulation, weighting schedule, or ablation isolating their contribution is referenced, leaving open whether they reliably avoid artifacts or scale drift as assumed.
minor comments (3)
- Abstract: expand the experimental summary to include at least one key quantitative result (e.g., IoU or Chamfer distance improvement) so readers can immediately gauge the effect size.
- Notation: confirm that 'SDF' is expanded on first use and that all conditioning signals (RGB, masks, hand latent, tactile) are consistently denoted across text and figures.
- Figures: ensure simulation and real-robot result panels are clearly labeled with baseline comparisons and scale references so the physical-consistency claim is visually verifiable.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our work and the constructive feedback. We address each major comment below and have revised the manuscript accordingly to improve clarity and completeness.
Point-by-point responses
Referee: Abstract: the central claim that proprioception and touch 'substantially improves completion under occlusion and yields physically plausible reconstructions at correct real-world scale' is presented without any quantitative metrics, error bars, ablation tables, or statistical tests in the abstract itself; this makes it impossible to judge the magnitude or reliability of the reported gains from the provided text alone.
Authors: We agree that the abstract would benefit from including key quantitative indicators to convey the scale of improvements. We have revised the abstract to reference representative metrics from our simulation experiments (e.g., improvements in completion IoU and scale accuracy relative to vision-only baselines) while preserving brevity, with full details, error bars, and statistical comparisons remaining in the main results section and tables.
Revision: yes
Referee: Method description: the physics-based objectives and differentiable decoder-guidance are described as key to reducing interpenetration and aligning surfaces with contacts, yet no explicit formulation, weighting schedule, or ablation isolating their contribution is referenced, leaving open whether they reliably avoid artifacts or scale drift as assumed.
Authors: The explicit mathematical formulations for the physics-based objectives (interpenetration penalty and contact alignment terms) and the differentiable decoder guidance appear in Section 3.4, with the weighting schedule and training procedure described in the implementation details. To address the concern directly, we have added cross-references to the relevant equations in the method overview and included a new ablation study in the revised manuscript that isolates the contribution of these components, confirming their role in reducing interpenetration and scale drift.
Revision: yes
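For readers without access to Section 3.4, the generic pattern behind differentiable decoder guidance is to bias each integration step of the learned flow toward lower physics loss by backpropagating through the decoder. The sketch below shows that pattern only; the Euler integrator, `decoder`, `physics_loss`, and the guidance scale are assumptions, not the authors' procedure:

```python
import torch

@torch.no_grad()
def guided_sample(velocity_net, decoder, physics_loss, cond,
                  latent_dim, steps=50, guidance_scale=0.1):
    """Euler integration of a learned flow with physics guidance (sketch)."""
    z = torch.randn(1, latent_dim)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((1, 1), i * dt)
        v = velocity_net(z, t, cond)              # learned conditional velocity
        with torch.enable_grad():                 # guidance needs gradients
            z_g = z.detach().requires_grad_(True)
            loss = physics_loss(decoder(z_g))     # differentiate through decoder
            grad = torch.autograd.grad(loss, z_g)[0]
        z = z + dt * v - guidance_scale * grad    # guided Euler step
    return decoder(z)
```

The same guidance gradient can, in principle, be folded into the training loss during finetuning rather than applied along the sampling trajectory.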
Circularity Check
No significant circularity; derivation is self-contained
full rationale
The paper's core pipeline—Structure-VAE for latent representation of pose-aware SDFs, followed by conditional flow-matching diffusion pretrained on vision and finetuned with proprioception/touch conditioning plus physics-based losses—is built from standard generative modeling components. No equation or claim reduces the reconstructed output to a fitted input by construction, nor does any uniqueness theorem or ansatz rely on self-citation chains. Simulation gains and real-robot transfer with a novel end-effector are presented as empirical validation rather than definitional consequences. The approach therefore contains independent content and does not trigger any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
free parameters (2)
- latent space dimension
- conditioning signal weights
axioms (2)
- Domain assumption: Proprioception provides accurate posed hand geometry
- Domain assumption: Multi-contact touch observations constrain object surface location