Recognition: 1 theorem link · Lean Theorem
Human Interaction-Aware 3D Reconstruction from a Single Image
Pith reviewed 2026-05-10 18:52 UTC · model grok-4.3
The pith
HUG3D reconstructs textured 3D models of multiple interacting people from one image by jointly modeling group context and physical contacts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The HUG3D framework first transforms the input image into canonical orthographic space. It then applies Human Group-Instance Multi-View Diffusion (HUG-MVD) to produce complete multi-view normals and images by jointly reasoning over instance and group context, and finally uses Human Group-Instance Geometric Reconstruction (HUG-GR) to optimize geometry with explicit physics-based interaction priors that enforce physical plausibility and accurate inter-human contact. The multi-view images are then fused into a high-fidelity texture.
What carries the argument
Human Group-Instance Multi-View Diffusion (HUG-MVD), which jointly generates multi-view normals and images by modeling both individual and group-level information to resolve occlusions and proximity artifacts.
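The group-instance coupling described above can be illustrated with a toy joint-attention step in which every person's tokens attend over all persons' tokens at once, so each individual's features absorb group context. This is purely a sketch: the excerpt does not specify HUG-MVD's actual conditioning mechanism, and the function name, token shapes, and single-head formulation here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def group_instance_attention(tokens):
    """tokens: (P, T, D) array of per-person token sequences.

    Each token (query) attends over the concatenation of ALL persons'
    tokens (keys/values), so instance features are updated with
    group-level context in one pass. Hypothetical sketch, not the
    paper's architecture.
    """
    P, T, D = tokens.shape
    q = tokens.reshape(P * T, D)          # queries: every token
    kv = tokens.reshape(P * T, D)         # keys/values: the whole group
    attn = softmax(q @ kv.T / np.sqrt(D), axis=-1)  # (P*T, P*T)
    return (attn @ kv).reshape(P, T, D)   # group-aware features
```

Because each output row is a convex combination of all persons' tokens, occluded regions of one instance can, in principle, draw on evidence from neighboring instances.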
If this is right
- Reconstructions of crowded scenes become usable for AR/VR without manual cleanup of overlaps or holes.
- Inter-human contact can be modeled directly from monocular input rather than added as post-processing.
- Multi-view consistency is achieved in one forward pass instead of independent per-person generation.
- Physics priors become part of the core optimization rather than an optional constraint.
- Textured output quality improves because the diffusion stage already produces coherent multi-view appearance.
Where Pith is reading between the lines
- The orthographic canonicalization step could be applied to other multi-object reconstruction tasks where perspective distortion currently limits accuracy.
- Replacing the diffusion backbone with newer video or 3D diffusion models might allow the same group-instance coupling to work on dynamic interactions.
- The physics-prior optimization could be extended to include soft-tissue deformation or clothing dynamics for more lifelike results.
- Quantitative failure cases in extreme occlusion or unusual body configurations would reveal whether the joint diffusion truly generalizes beyond the training distribution.
Load-bearing premise
Transforming the input to canonical orthographic space and then applying joint group-instance diffusion plus explicit physics priors will remove perspective distortions and enforce realistic contacts without introducing new geometric errors or requiring extra input.
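The distortion that orthographic canonicalization targets can be seen with a minimal projection sketch (not the paper's actual transform, which the excerpt does not specify): under a pinhole model, identically sized people at different depths project to different image sizes, while an orthographic projection preserves relative scale.

```python
import numpy as np

def perspective_project(points, f=1.0):
    # Pinhole projection: image coordinates scale with 1/depth, so two
    # equally tall people at different depths get different image heights.
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return np.stack([f * x / z, f * y / z], axis=1)

def orthographic_project(points):
    # Orthographic projection drops depth: no perspective foreshortening.
    return points[:, :2].copy()
```

For a unit-height segment at depth 2 versus depth 4, the perspective heights are 0.5 and 0.25, while the orthographic heights are both 1, which is the kind of depth-dependent scale difference the canonicalization step is meant to remove.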
What would settle it
Quantitative comparison of reconstructed meshes against ground-truth 3D scans of known multi-person contact scenes, measuring interpenetration volume, contact-point error, and surface reconstruction accuracy in occluded regions.
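Two of these metrics can be sketched concretely, assuming a voxel-overlap definition of interpenetration volume and a point-to-point definition of contact error; the paper's exact evaluation protocols are not given in the excerpt.

```python
import numpy as np

def interpenetration_volume(occ_a, occ_b, voxel_size):
    """Approximate interpenetration volume between two bodies given
    boolean occupancy grids on a shared voxel grid (a common proxy;
    assumed here, not taken from the paper)."""
    overlap = np.logical_and(occ_a, occ_b)
    return float(overlap.sum()) * voxel_size ** 3

def contact_point_error(pred_pts, gt_pts):
    """Mean Euclidean error between corresponding predicted and
    ground-truth contact points, each of shape (N, 3)."""
    return float(np.mean(np.linalg.norm(pred_pts - gt_pts, axis=1)))
```

With ground-truth scans of contact scenes, these numbers would directly test whether the physics priors reduce penetrations rather than merely smoothing geometry.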
Original abstract
Reconstructing textured 3D human models from a single image is fundamental for AR/VR and digital human applications. However, existing methods mostly focus on single individuals and thus fail in multi-human scenes, where naive composition of individual reconstructions often leads to artifacts such as unrealistic overlaps, missing geometry in occluded regions, and distorted interactions. These limitations highlight the need for approaches that incorporate group-level context and interaction priors. We introduce a holistic method that explicitly models both group- and instance-level information. To mitigate perspective-induced geometric distortions, we first transform the input into a canonical orthographic space. Our primary component, Human Group-Instance Multi-View Diffusion (HUG-MVD), then generates complete multi-view normals and images by jointly modeling individuals and group context to resolve occlusions and proximity. Subsequently, the Human Group-Instance Geometric Reconstruction (HUG-GR) module optimizes the geometry by leveraging explicit, physics-based interaction priors to enforce physical plausibility and accurately model inter-human contact. Finally, the multi-view images are fused into a high-fidelity texture. Together, these components form our complete framework, HUG3D. Extensive experiments show that HUG3D significantly outperforms both single-human and existing multi-human methods, producing physically plausible, high-fidelity 3D reconstructions of interacting people from a single image. Project page: https://jongheean11.github.io/HUG3D_project
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces HUG3D, a framework for textured 3D reconstruction of multiple interacting humans from a single image. It first maps the input to canonical orthographic space, then applies Human Group-Instance Multi-View Diffusion (HUG-MVD) to jointly generate multi-view normals and images while modeling both instance and group context to address occlusions and proximity artifacts. The Human Group-Instance Geometric Reconstruction (HUG-GR) stage then optimizes geometry using explicit physics-based interaction priors to enforce contact plausibility. Multi-view images are finally fused for texture. The central claim is that this pipeline significantly outperforms prior single-human and multi-human methods in producing high-fidelity, physically plausible results.
Significance. If validated, the work would advance single-image 3D human reconstruction by explicitly handling group interactions, a gap in existing methods that often produce overlaps or distorted contacts. The sequential design—orthographic canonicalization, joint diffusion, and physics priors—offers a structured way to incorporate group-level context, which could benefit AR/VR and digital human applications. The approach credits standard diffusion and optimization tools while adding interaction-specific components; however, its impact hinges on whether the priors reliably correct diffusion-stage errors, as asserted but not quantified in the provided description.
major comments (2)
- [HUG-GR module] The claim that explicit physics-based interaction priors reliably enforce plausible contacts and resolve occlusions assumes the initial geometry from HUG-MVD is sufficiently accurate. If the priors are soft penalty terms, large initial errors (common in single-image diffusion) can trap the optimizer in local minima with residual penetrations or distorted limbs, directly undermining the central claim of physically plausible multi-human reconstructions.
- [Abstract and Experiments] The assertion of 'extensive experiments' showing significant outperformance over single-human and multi-human baselines is load-bearing for the contribution, yet the abstract supplies no metrics, specific baselines, ablation results, or implementation details. This prevents verification that gains are not due to post-hoc selection or that the physics priors contribute measurably beyond the diffusion stage.
minor comments (2)
- [Abstract] The abstract would be strengthened by briefly stating one or two key quantitative improvements (e.g., contact error or IoU metrics) to support the outperformance claim.
- [Method] Notation for the orthographic transform and the precise form of the physics priors (e.g., whether they are hard constraints or weighted penalties) should be defined explicitly in the method section for reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and have revised the paper accordingly to improve clarity and strengthen the presentation of our contributions.
Point-by-point responses
-
Referee: [HUG-GR module] The claim that explicit physics-based interaction priors reliably enforce plausible contacts and resolve occlusions assumes the initial geometry from HUG-MVD is sufficiently accurate. If the priors are soft penalty terms, large initial errors (common in single-image diffusion) can trap the optimizer in local minima with residual penetrations or distorted limbs, directly undermining the central claim of physically plausible multi-human reconstructions.
Authors: We appreciate this important observation on the sensitivity of the optimization. The HUG-GR stage formulates an energy function combining data fidelity terms (from HUG-MVD normals and images) with soft physics-based penalties for inter-human contacts and non-penetration. To reduce the risk of poor local minima, the optimization proceeds in stages: first aligning coarse poses, then refining geometry with the priors. In the revised manuscript we have added quantitative results showing contact accuracy metrics (penetration volume and contact point error) before versus after HUG-GR, as well as an ablation disabling the physics terms. These demonstrate measurable reductions in implausible contacts. We also include a short discussion of remaining failure modes when HUG-MVD produces severe initial errors. While we cannot guarantee perfect recovery from arbitrarily bad initializations, the reported experiments indicate that the diffusion outputs are sufficiently reliable for the evaluated cases, supporting the overall claim. revision: partial
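The coarse-then-refine schedule described in this response can be illustrated on a toy scalar problem: stage one minimizes the data terms only (coarse alignment), stage two adds a soft non-penetration penalty. Every quantity below is invented for illustration; this is not the paper's actual energy or optimizer.

```python
import numpy as np

def grad_descent(grad, x0, steps, lr):
    # Plain gradient descent; stands in for the staged optimizer.
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Toy setup: two 1-D "people" at positions x1, x2. Data terms pull them
# toward observed positions d1, d2 (which nearly overlap); a soft penalty
# pushes them apart when their gap is below a margin.
d1, d2, margin, lam = 0.0, 0.05, 0.2, 10.0

def grad_stage1(x):  # stage 1: data terms only
    x1, x2 = x
    return np.array([x1 - d1, x2 - d2])

def grad_stage2(x):  # stage 2: data terms + soft penalty lam/2*(margin-gap)^2
    x1, x2 = x
    g = grad_stage1(x)
    gap = x2 - x1
    if gap < margin:
        g = g + lam * (margin - gap) * np.array([1.0, -1.0])
    return g

x_coarse = grad_descent(grad_stage1, np.array([1.0, -1.0]), steps=200, lr=0.1)
x = grad_descent(grad_stage2, x_coarse, steps=500, lr=0.02)
```

After stage one the gap collapses to the (physically implausible) observed 0.05; stage two's penalty widens it toward the margin, mirroring how soft priors correct near-overlaps but only from a reasonable initialization.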
-
Referee: [Abstract and Experiments] The assertion of 'extensive experiments' showing significant outperformance over single-human and multi-human baselines is load-bearing for the contribution, yet the abstract supplies no metrics, specific baselines, ablation results, or implementation details. This prevents verification that gains are not due to post-hoc selection or that the physics priors contribute measurably beyond the diffusion stage.
Authors: We agree that the abstract would benefit from concrete numbers to make the claims immediately verifiable. In the revised version we have updated the abstract to report key quantitative gains, including reductions in penetration depth and improvements in normal consistency and perceptual metrics relative to both single-human and multi-human baselines. The experiments section already details all baselines (single-human methods such as PIFuHD and multi-human approaches), the full set of metrics (Chamfer distance, contact plausibility, etc.), component ablations isolating the contribution of the group-instance diffusion and the physics priors, and implementation hyperparameters. We have further clarified the evaluation protocol to confirm that comparisons were performed uniformly without post-hoc selection. These additions directly address the verifiability concern. revision: yes
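For reference, the Chamfer distance named in this response is typically computed as a symmetric mean nearest-neighbor distance between point sets; a minimal version is below (the paper's exact variant, squared or unsquared distances and units, is not specified in the excerpt).

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3):
    mean nearest-neighbor distance from a to b plus from b to a.
    Brute-force O(N*M) pairwise distances, fine for small sets."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())
```

A surface-accuracy metric like this complements contact metrics: a mesh can have a low Chamfer distance and still interpenetrate, which is why both are reported.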
Circularity Check
No significant circularity; pipeline is modular and externally grounded
Full rationale
The provided abstract and description outline a sequential pipeline: canonical orthographic transform, followed by HUG-MVD joint diffusion for multi-view normals/images, then HUG-GR optimization using physics-based priors, and final texture fusion. No equations, fitted parameters, or self-citations are quoted that reduce any output (e.g., reconstructed geometry or contacts) to an input by construction. The central claims rest on empirical comparisons to baselines rather than tautological redefinitions or load-bearing self-references. This matches the default expectation of non-circularity for a methods paper describing standard diffusion-plus-optimization stages.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation · washburn_uniqueness_aczel · tagged unclear
unclear: the relation between the paper passage and the cited Recognition theorem.
HUG-GR optimizes the mesh using group-level, instance-level, and physics-based supervision... $\mathcal{L}_{\mathrm{pen}} = \mathrm{mean}_V\big[\,\xi \ln\big(1 + e^{(\mathrm{tol} - |s_{i,j_1} - s_{i,j_2}|)/\xi}\big)\big]$ ... $\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{normal}} + \lambda_{\mathrm{vis}}\,\mathcal{L}_{\mathrm{vis}} + \lambda_{\mathrm{pen}}\,\mathcal{L}_{\mathrm{pen}}$
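The quoted penalty has the shape of a scaled softplus over signed-distance differences: it is near zero when the two surfaces are well separated and grows smoothly as they approach within the tolerance. A minimal numerical reading of that formula, with the interpretation of the samples s, tol, and the sharpness xi assumed from the excerpt:

```python
import numpy as np

def penetration_penalty(s1, s2, tol=0.0, xi=0.01):
    """L_pen = mean_V[ xi * ln(1 + exp((tol - |s1 - s2|) / xi)) ].

    s1, s2: per-sample values for two interacting people (interpretation
    assumed from the quoted excerpt); tol is a contact tolerance and xi
    controls how sharply the penalty switches on.
    """
    z = (tol - np.abs(s1 - s2)) / xi
    # Stable softplus: ln(1 + e^z) = max(z, 0) + log1p(e^{-|z|})
    softplus = np.maximum(z, 0.0) + np.log1p(np.exp(-np.abs(z)))
    return float(np.mean(xi * softplus))
```

At exact coincidence (|s1 - s2| = 0, tol = 0) the per-sample penalty is xi * ln 2; once the separation exceeds a few multiples of xi it decays essentially to zero, which is the soft-penalty behavior the referee's local-minima concern is about.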
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Fabien Baradel, Matthieu Armando, Salma Galaaoui, Romain Brégier, Philippe Weinzaepfel, Grégory Rogez, and Thomas Lucas. Multi-HMR: Multi-person whole-body human mesh recovery in a single shot. ECCV, 2024.
- [2] Amir Barda, Matheus Gadelha, Vladimir G Kim, Noam Aigerman, Amit H Bermano, and Thibault Groueix. Instant3dit: Multiview inpainting for fast editing of 3D objects. arXiv preprint arXiv:2412.00518, 2024.
- [3] Federica Bogo, Angjoo Kanazawa, Christoph Lassner, Peter Gehler, Javier Romero, and Michael J Black. Keep it SMPL: Automatic estimation of 3D human pose and shape from a single image. ECCV, 2016.
- [4] Chenjie Cao, Chaohui Yu, Fan Wang, Xiangyang Xue, and Yanwei Fu. MVInpainter: Learning multi-view consistent inpainting to bridge 2D and 3D editing. arXiv preprint arXiv:2408.08000, 2024.
- [5] Junuk Cha, Hansol Lee, Jaewon Kim, Nhat Nguyen Bao Truong, Jaeshin Yoon, and Seungryul Baek. 3D reconstruction of interacting multi-person in clothing from a single image. WACV, 2024.
- [6] Kennard Chan, Fayao Liu, Guosheng Lin, Chuan-Sheng Foo, and Weisi Lin. Robust-PIFu: Robust pixel-aligned implicit function for 3D human digitalization from a single image. ICLR, 2024.
- [7] Jiankang Deng, Jia Guo, Evangelos Ververas, Irene Kotsia, and Stefanos Zafeiriou. RetinaFace: Single-shot multi-level face localisation in the wild. CVPR, 2020.
- [8] Mihai Fieraru, Mihai Zanfir, Teodor Szente, Eduard Bazavan, Vlad Olaru, and Cristian Sminchisescu. REMIPS: Physically consistent 3D reconstruction of multiple interacting people under weak supervision. NeurIPS, 2021.
- [9] Mohamed Hassan, Partha Ghosh, Joachim Tesch, Dimitrios Tzionas, and Michael J Black. Populating 3D scenes by learning human-scene interaction. CVPR, 2021.
- [10] Hsuan-I Ho, Lixin Xue, Jie Song, and Otmar Hilliges. Learning locally editable virtual humans. CVPR, 2023.
- [11] I Ho, Jie Song, Otmar Hilliges, et al. SiTH: Single-view textured human reconstruction with image-conditioned diffusion. CVPR, 2024.
- [12] Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. NeurIPS, 2021.
- [13] Buzhen Huang, Chen Li, Chongyang Xu, Liang Pan, Yangang Wang, and Gim Hee Lee. Closely interactive human reconstruction with proxemics and physics-guided adaption. CVPR, 2024.
- [14] Zeren Jiang, Chen Guo, Manuel Kaufmann, Tianjian Jiang, Julien Valentin, Otmar Hilliges, and Jie Song. MultiPly: Reconstruction of multiple people from monocular video in the wild. CVPR, 2024.
- [15] Hanbyul Joo, Tomas Simon, and Yaser Sheikh. Total capture: A 3D deformation model for tracking faces, hands, and bodies. CVPR, 2018.
- [16] Angjoo Kanazawa, Michael J Black, David W Jacobs, and Jitendra Malik. End-to-end recovery of human shape and pose. CVPR, 2018.
- [17] Rahima Khanam and Muhammad Hussain. YOLOv11: An overview of the key architectural enhancements. arXiv preprint arXiv:2410.17725, 2024.
- [18] Rawal Khirodkar, Timur Bagautdinov, Julieta Martinez, Su Zhaoen, Austin James, Peter Selednik, Stuart Anderson, and Shunsuke Saito. Sapiens: Foundation for human vision models. ECCV, 2024.
- [19] Byungjun Kim, Patrick Kwon, Kwangho Lee, Myunggi Lee, Sookwan Han, Daesik Kim, and Hanbyul Joo. Chupa: Carving 3D clothed humans from skinned shape priors using 2D diffusion probabilistic models. ICCV, 2023.
- [20] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014.
- [21] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. ICCV, 2023.
- [22] Muhammed Kocabas, Nikos Athanasiou, and Michael J Black. VIBE: Video inference for human body pose and shape estimation. CVPR, 2020.
- [23] Black Forest Labs. Flux. https://github.com/black-forest-labs/flux, 2024.
- [24] Inhee Lee, Byungjun Kim, and Hanbyul Joo. Guess the unseen: Dynamic 3D scene reconstruction from partial 2D glimpses. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1062–1071.
- [25] Jiefeng Li, Can Wang, Hao Zhu, Yihuan Mao, Hao-Shu Fang, and Cewu Lu. CrowdPose: Efficient crowded scenes pose estimation and a new benchmark. CVPR, 2019.
- [26] Peng Li, Yuan Liu, Xiaoxiao Long, Feihu Zhang, Cheng Lin, Mengfei Li, Xingqun Qi, Shanghang Zhang, Wei Xue, Wenhan Luo, et al. Era3D: High-resolution multiview diffusion using efficient row-wise attention. NeurIPS, 2024.
- [27] Peng Li, Wangguandong Zheng, Yuan Liu, Tao Yu, Yangguang Li, Xingqun Qi, Mengfei Li, Xiaowei Chi, Siyu Xia, Wei Xue, et al. PSHuman: Photorealistic single-view human reconstruction using cross-scale diffusion. arXiv preprint arXiv:2409.10141, 2024.
- [28] Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J Black. SMPL: A skinned multi-person linear model. ACM Transactions on Graphics (TOG).
- [29] Shugao Ma, Tomas Simon, Jason Saragih, Dawei Wang, Yuecheng Li, Fernando De La Torre, and Yaser Sheikh. Pixel codec avatars. CVPR, 2021.
- [30] Lea Müller, Vickie Ye, Georgios Pavlakos, Michael Black, and Angjoo Kanazawa. Generative proxemics: A prior for 3D social interaction from images. CVPR, 2024.
- [31] Armin Mustafa, Akin Caliskan, Lourdes Agapito, and Adrian Hilton. Multi-person implicit reconstruction from a single image. CVPR, 2021.
- [32] Sergio Orts-Escolano, Christoph Rhemann, Sean Fanello, Wayne Chang, Adarsh Kowdle, Yury Degtyarev, David Kim, Philip L Davidson, Sameh Khamis, Mingsong Dou, et al. Holoportation: Virtual 3D teleportation in real-time. ACM Symposium on User Interface Software and Technology, pages 741–754, 2016.
- [33] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. PyTorch: An imperative style, high-performance deep learning library. 2019.
- [34] Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed Osman, Dimitrios Tzionas, and Michael J Black. Expressive body capture: 3D hands, face, and body from a single image. CVPR, 2019.
- [35] Lingteng Qiu, Xiaodong Gu, Peihao Li, Qi Zuo, Weichao Shen, Junfei Zhang, Kejie Qiu, Weihao Yuan, Guanying Chen, Zilong Dong, et al. LHM: Large animatable human reconstruction model from a single image in seconds. arXiv preprint arXiv:2503.10625, 2025.
- [36] Nikhila Ravi, Jeremy Reizenstein, David Novotny, Taylor Gordon, Wan-Yen Lo, Justin Johnson, and Georgia Gkioxari. Accelerating 3D deep learning with PyTorch3D. arXiv preprint arXiv:2007.08501, 2020.
- [37] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. CVPR, 2022.
- [38] Shunsuke Saito, Zeng Huang, Ryota Natsume, Shigeo Morishima, Angjoo Kanazawa, and Hao Li. PIFu: Pixel-aligned implicit function for high-resolution clothed human digitization. ICCV, 2019.
- [39] Shunsuke Saito, Tomas Simon, Jason Saragih, and Hanbyul Joo. PIFuHD: Multi-level pixel-aligned implicit function for high-resolution 3D human digitization. CVPR, 2020.
- [40] Yu Sun, Wu Liu, Qian Bao, Yili Fu, Tao Mei, and Michael J Black. Putting people in their place: Monocular regression of 3D people in depth. CVPR, 2022.
- [41] Patrick von Platen, Luke Lewis, and Thomas Wolf. Diffusers: State-of-the-art diffusion models, 2022.
- [42] Yuhan Wang, Fangzhou Hong, Shuai Yang, Liming Jiang, Wayne Wu, and Chen Change Loy. MEAT: Multiview diffusion model for human generation on megapixels with mesh attention. CVPR, 2025.
- [43] Frank Wilcoxon. Individual comparisons by ranking methods. Biometrics Bulletin, 1(6):80–83, 1945.
- [44] Yuliang Xiu, Jinlong Yang, Dimitrios Tzionas, and Michael J Black. ICON: Implicit clothed humans obtained from normals. CVPR, 2022.
- [45] Yuliang Xiu, Jinlong Yang, Xu Cao, Dimitrios Tzionas, and Michael J Black. ECON: Explicit clothed humans optimized via normal integration. CVPR, 2023.
- [46] Yufei Xu, Jing Zhang, Qiming Zhang, and Dacheng Tao. ViTPose: Simple vision transformer baselines for human pose estimation. NeurIPS, 2022.
- [47] Yifei Yin, Chen Guo, Manuel Kaufmann, Juan Jose Zarate, Jie Song, and Otmar Hilliges. Hi4D: 4D instance segmentation of close human interaction. CVPR, 2023. License: non-commercial academic use only; see https://hi4d.ait.ethz.ch/.
- [48] Tao Yu, Zerong Zheng, Kaiwen Guo, Pengpeng Liu, Qionghai Dai, and Yebin Liu. Function4D: Real-time human volumetric capture from very sparse consumer RGBD sensors. CVPR, 2021.
- [49] Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. ICCV.
- [50] Zechuan Zhang, Zongxin Yang, and Yi Yang. SIFU: Side-view conditioned implicit function for real-world usable clothed human reconstruction. CVPR, 2024.
- [51] Yang Zheng, Ruizhi Shao, Yuxiang Zhang, Tao Yu, Zerong Zheng, Qionghai Dai, and Yebin Liu. DeepMultiCap: Performance capture of multiple characters using sparse multiview cameras. ICCV, 2021.
- [52] Shangchen Zhou, Kelvin Chan, Chongyi Li, and Chen Change Loy. Towards robust blind face restoration with codebook lookup transformer. NeurIPS, 2022.