arxiv: 2605.21001 · v1 · pith:S5ZSPD3Dnew · submitted 2026-05-20 · 💻 cs.CV

DAMA: Disentangled Body-Anchored Gaussians for Controllable Multi-Layered Avatars

Pith reviewed 2026-05-21 05:57 UTC · model grok-4.3

classification 💻 cs.CV
keywords 3D avatar reconstruction · Gaussian splatting · clothed human modeling · layered garment representation · multi-view 3D reconstruction · body-anchored Gaussians · garment disentanglement · SMPL-X parameterization

The pith

DAMA reconstructs 3D avatars as layered Gaussians bound to body model faces, delivering non-penetrating garments and user-controlled stacking order from ordinary multi-view photos.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a representation that anchors 3D Gaussians to the triangular faces of the SMPL-X body model using barycentric coordinates within each face and a small positive offset along the surface normal. This binding turns 2D image segmentations into separate, topologically ordered garment layers that stay physically plausible without intersecting. The method lifts the segmentations into these anchored Gaussians, applies topology-guided refinement, and jointly optimizes geometry and appearance. A reader would care because prior Gaussian and implicit-surface avatar techniques either merge clothing into a single surface or allow garments to intersect, blocking clean separation and any practical control over layering.
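A minimal sketch of the anchoring described above, under stated assumptions: each Gaussian stores three barycentric weights and one offset `delta`, and its center is the barycentric point on the face pushed along the unit face normal. The helper names (`anchored_center`, `unit_normal`) and the toy triangle are hypothetical and are not taken from the paper's code.

```python
import math

def sub(a, b):
    """Component-wise a - b for 3-vectors given as tuples."""
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit_normal(v0, v1, v2):
    """Unit normal of triangle (v0, v1, v2), oriented by vertex winding."""
    n = cross(sub(v1, v0), sub(v2, v0))
    length = math.sqrt(sum(c * c for c in n))
    return tuple(c / length for c in n)

def anchored_center(v0, v1, v2, bary, delta):
    """Barycentric point on the face, pushed out by delta along the normal."""
    w0, w1, w2 = bary
    n = unit_normal(v0, v1, v2)
    return tuple(w0 * a + w1 * b + w2 * c + delta * k
                 for a, b, c, k in zip(v0, v1, v2, n))

# Hypothetical face lying in the XY plane; its normal points along +Z.
face = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
center = anchored_center(*face, bary=(0.25, 0.25, 0.5), delta=0.02)

# Posing the body moves the face; the same (bary, delta) pair follows it.
moved = tuple(tuple(c + 1.0 for c in v) for v in face)
center_moved = anchored_center(*moved, bary=(0.25, 0.25, 0.5), delta=0.02)
```

Because the stored parameters are relative to the face, re-evaluating the same function on a posed mesh is enough to carry the Gaussian along with the body in this sketch.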

Core claim

DAMA is the first Gaussian avatar reconstruction method from multi-view images to achieve physically plausible layering, clean garment separation, and explicit stacking control by binding Gaussians to SMPL-X faces via barycentric in-plane coordinates and a positive normal offset, then lifting 2D segmentations, applying topology-guided correction, and jointly optimizing geometry and appearance.

What carries the argument

Body-anchored Gaussians parameterized by barycentric in-plane coordinates on SMPL-X faces plus a positive normal offset, which enforces layer ordering and non-penetration during reconstruction and editing.
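The material summarized here says only that the offset is positive and that the in-plane coordinates are barycentric; it does not say how those constraints are enforced during optimization. One common way to do it, shown purely as an assumption, is to optimize unconstrained values and map them through softplus (for the offset) and softmax (for the weights). The function names and `min_offset` are hypothetical.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x)); non-negative for all real x."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def positive_offset(raw, min_offset=1e-4):
    """Map an unconstrained value to an offset strictly greater than zero."""
    return min_offset + softplus(raw)

def barycentric_weights(raw):
    """Softmax over three unconstrained values: non-negative, summing to 1."""
    m = max(raw)
    exps = [math.exp(r - m) for r in raw]
    total = sum(exps)
    return tuple(e / total for e in exps)

# Whatever the optimizer does to the raw values, the mapped offset stays
# positive and the mapped weights stay inside the triangle.
offsets = [positive_offset(r) for r in (-50.0, 0.0, 3.0)]
weights = barycentric_weights((2.0, -1.0, 0.5))
```

Under this assumption, a Gaussian cannot cross to the inner side of its own face's plane or slide off the face. It says nothing by itself about the ordering between two garment layers, which is the point the referee report below presses on.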

If this is right

  • Achieves state-of-the-art geometry reconstruction, garment separation, penetration rate, and penetration depth on the full 4D-DRESS dataset (82 scans).
  • Supports immediate user-driven reordering of any garment layer on the finished avatar.
  • Converts body-conforming garment layers into simulation-ready meshes quickly.
  • Maintains high visual fidelity while preserving explicit layer boundaries that prior single-surface or entangled methods lose.
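The reordering consequence can be illustrated with a toy model. The Figure 4 caption says that stacked garments are shifted outward using the offsets of lower layers; the sketch below is one possible reading of that, assuming a single scalar offset per layer on one body face (in millimetres) and simple accumulation from the innermost layer outward. The function `stacked_offsets` and the numbers are hypothetical, not the authors' implementation.

```python
def stacked_offsets(order, own_offset):
    """Accumulate per-layer offsets on one face, innermost layer first.

    order: layer names from innermost to outermost.
    own_offset: layer name -> that layer's own positive offset.
    Returns layer name -> offset from the body surface after stacking.
    """
    stacked = {}
    below = 0.0
    for name in order:
        thickness = own_offset[name]
        if thickness <= 0.0:
            raise ValueError(f"offset for {name!r} must be positive")
        below += thickness          # shift outward by everything underneath
        stacked[name] = below
    return stacked

own = {"shirt": 8.0, "jeans": 12.0}
shirt_inside = stacked_offsets(["shirt", "jeans"], own)  # jeans worn over shirt
jeans_inside = stacked_offsets(["jeans", "shirt"], own)  # shirt worn over jeans
```

In this toy model, reordering is just a different `order` list: the per-layer parameters are untouched, and the outer layer always ends up farther from the body than the inner one.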

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same anchoring could supply clean initial geometry for physics-based cloth simulators that currently struggle with self-intersections.
  • Extending the offset and barycentric binding to time-varying sequences might produce layered 4D avatars ready for animation without post-processing fixes.
  • Because layers remain editable after reconstruction, the approach may reduce the manual cleanup now required for virtual try-on or character customization pipelines.

Load-bearing premise

Anchoring Gaussians to body faces with barycentric coordinates and a positive normal offset will by itself keep garment layers from intersecting and maintain correct topological order without any separate collision handling.

What would settle it

Reconstruct an avatar from multi-view images of a person wearing overlapping garments, reorder the layers in software, and check whether any rendered frame shows visible intersections or incorrect depth ordering between the layers.
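Before rendering anything, a cheaper proxy for this test is to compare offsets directly. The sketch below assumes each Gaussian can be summarized as `(face_id, layer, offset)` and flags faces where an inner-layer Gaussian sits at or beyond an outer-layer Gaussian. It is a simplified stand-in, not the penetration rate or penetration depth metric reported in the paper, and the function name and sample data are hypothetical.

```python
from collections import defaultdict

def ordering_violations(gaussians, order):
    """Return (face_id, inner_offset, outer_offset) for every misordered pair.

    gaussians: iterable of (face_id, layer_name, offset).
    order: layer names from innermost to outermost.
    """
    rank = {name: i for i, name in enumerate(order)}
    per_face = defaultdict(list)
    for face_id, layer, offset in gaussians:
        per_face[face_id].append((rank[layer], offset))

    violations = []
    for face_id, items in per_face.items():
        for rank_in, off_in in items:
            for rank_out, off_out in items:
                # An inner layer must lie strictly closer to the body.
                if rank_in < rank_out and off_in >= off_out:
                    violations.append((face_id, off_in, off_out))
    return violations

sample = [
    (0, "shirt", 8.0), (0, "jacket", 20.0),   # correctly ordered on face 0
    (1, "shirt", 9.0), (1, "jacket", 7.0),    # jacket dips below shirt on face 1
]
found = ordering_violations(sample, ["shirt", "jacket"])
```

A check like this only compares Gaussians anchored to the same face; intersections across neighbouring faces or in high-curvature regions would still need the rendered or mesh-level test described above.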

Figures

Figures reproduced from arXiv: 2605.21001 by Berna Kabadayi, Daniel Eskandar, Garvita Tiwari, Gerard Pons-Moll.

Figure 1: We present DAMA, a method for reconstructing physically plausible multi-layered avatars. (a) From multi-view RGB images and masks, we reconstruct clean, intersection-free layers via body-anchored Gaussians. (b) The layers enable garment composition, stacking, and reordering (e.g., Shirt > Jeans vs. Jeans > Shirt). (c) The garments are animatable and convertible to simulation-ready meshes.
Figure 2: DAMA Overview. Given multi-view images and masks, we reconstruct a layered avatar with clean garment separation and no interpenetration. The method consists of three stages: (1) lifting 2D masks to SMPL-X–anchored Gaussians by optimizing coarse geometry and labels; (2) mapping labels to SMPL-X and refining them using mesh topology; (3) jointly optimizing geometry and appearance for each layer under masked …
Figure 3
Figure 4: Garment Transfer and Stacking. We transfer a garment layer (outer garment here) to a target avatar by recomputing its Gaussian parameters on the target SMPL-X mesh and merging it with the avatar layers. The naive merge creates intersections. Our representation resolves them by reordering layers and shifting the garment outward using the offsets of lower layers. This offset may distort appearance. We theref…
Figure 5: Full-Avatar Reconstruction. GALA shows artifacts from garment–body mesh intersections (left). Disco4D produces noisy boundaries and incorrect lifted regions (right). DAMA reconstructs non-intersecting layered garments with accurate labels.
Figure 6
Figure 7: Garment Stacking and Reordering. DAMA enables garment transfer between avatars, garment stacking with collision resolution, reordering of semantic layers, and SMPL-X-driven animation.
Figure 8: Clothing Simulation. DAMA converts garment geometry to meshes that can be simulated in CLO3D [9]. We show simulation of individual garments (top) and stacked garments (bottom) driven by SMPL-X animation from AMASS [51].
Figure 9: Qualitative Ablation of our Gaussian Representation. Free XYZ causes drifting Gaussians, barycentric with unsigned offset (δ ∈ R) produces artifacts, while our positive offset (δ > 0) keeps Gaussians surface-aligned and stable under animation.
Figure 11: Additional Loss Ablations. Effect of removing La, Ld, and Lr.
Figure 12: Transferring hair from a source subject to a target, along with reordering its layer. Panels: Source Hair, Transferred Hair, Hair Inside Shirt.
Figure 13: SMPL-X–Driven Avatar Animation. We animate the reconstructed avatar with transferred and stacked garments using SMPL-X motion sequences from AMASS [51]. The sequence shows that the layered garments deform consistently with the body while preserving their ordering and separation throughout the motion.
Figure 14: Additional Clothing Simulation Example. We show an additional example with one lower garment and three upper garments. (Left) Simulation-ready meshes extracted from the Gaussian layers. (Right) CLO3D [9] simulation driven by a running-on-spot motion sequence from AMASS [51]. The garments are progressively stacked, showing that the extracted meshes preserve layer ordering and remain stable during simulatio…
original abstract

Existing 3D clothed avatar reconstruction methods achieve high visual fidelity but ignore geometric structure and physical plausibility. They either model clothed humans as a single deformable surface or attempt garment disentanglement without enforcing geometric constraints, resulting in ambiguous garment boundaries and no control over stacking or layer ordering. To address these limitations, we introduce DAMA (Disentangled body-Anchored Gaussians for Controllable Multi-layered Avatars), a 3D avatar reconstruction method that produces physically plausible clothed avatars through a dedicated representation and reconstruction method. At the representation level, we bind Gaussians to SMPL-X faces using barycentric in-plane coordinates and a positive normal offset. Based on this parameterization, the reconstruction method lifts 2D segmentations to body-anchored Gaussians, refines layers using topology-guided correction, and jointly optimizes geometry and appearance. DAMA is the first Gaussian avatar reconstruction method from multi-view images to achieve physically plausible layering, clean garment separation, and explicit stacking control. On the full 4D-DRESS dataset (82 scans), it achieves state-of-the-art performance in geometry reconstruction, garment separation, penetration rate, and penetration depth. The representation further supports user-defined garment reordering and fast conversion of body-conforming garments to simulation-ready meshes. Project Page: https://danieleskandar.github.io/dama/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces DAMA, a 3D avatar reconstruction method from multi-view images that produces controllable multi-layered clothed avatars. It defines a body-anchored Gaussian representation that binds each Gaussian to SMPL-X faces via barycentric in-plane coordinates plus a positive normal offset. The pipeline lifts 2D segmentations to these Gaussians, applies topology-guided correction for layer refinement, and performs joint optimization of geometry and appearance. The work claims to be the first Gaussian-based method to deliver physically plausible layering, clean garment separation, and explicit stacking control, reporting SOTA results on the full 4D-DRESS dataset (82 scans) for geometry reconstruction, garment separation, penetration rate, and penetration depth, plus downstream support for user-defined reordering and conversion to simulation-ready meshes.

Significance. If the binding parameterization and refinement steps reliably enforce non-penetrating, topologically ordered layers, the contribution would be significant for Gaussian avatar modeling. It moves beyond single-surface or unconstrained disentanglement approaches by embedding geometric structure and layer ordering directly into the representation, enabling practical applications such as garment reordering and mesh export for simulation.

major comments (2)
  1. [Abstract] Abstract (representation level): The central claim that the body-anchored Gaussian parameterization achieves physically plausible layering 'by construction' depends on binding via barycentric in-plane coordinates and positive normal offset from SMPL-X. No explicit collision, repulsion, or signed-distance loss term is referenced in the optimization; the topology-guided correction is described only as a refinement step. This leaves open whether inter-layer penetrations are prevented during joint optimization or only mitigated afterward, particularly for loose clothing or high-curvature regions.
  2. [Abstract] Abstract (results): The SOTA claims on geometry, separation, penetration rate, and depth are reported on the full 4D-DRESS dataset, yet the abstract provides no quantitative values, baseline comparisons, or ablation references. Without these details or a table citation, it is difficult to assess whether the reported low penetration metrics stem from the representation itself or from dataset-specific properties.
minor comments (1)
  1. [Abstract] The abstract states that the method 'supports user-defined garment reordering and fast conversion of body-conforming garments to simulation-ready meshes,' but does not indicate the computational cost or quality of the mesh conversion step; a brief complexity or timing reference would clarify the practical utility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below in detail and indicate where revisions will be made to improve clarity without misrepresenting the work.

point-by-point responses
  1. Referee: [Abstract] Abstract (representation level): The central claim that the body-anchored Gaussian parameterization achieves physically plausible layering 'by construction' depends on binding via barycentric in-plane coordinates and positive normal offset from SMPL-X. No explicit collision, repulsion, or signed-distance loss term is referenced in the optimization; the topology-guided correction is described only as a refinement step. This leaves open whether inter-layer penetrations are prevented during joint optimization or only mitigated afterward, particularly for loose clothing or high-curvature regions.

    Authors: The body-anchored representation initializes and constrains each Gaussian via barycentric in-plane coordinates on SMPL-X faces together with a strictly positive normal offset. This design places all Gaussians outside the body surface by construction and prevents body penetration throughout optimization. For garment layers, the topology-guided correction is applied after lifting 2D segmentations and explicitly reorders Gaussians according to the underlying SMPL-X topology before and during joint optimization; this ordering is preserved because subsequent gradient updates operate on the already-corrected layer assignments. While no explicit repulsion or signed-distance loss is added to the objective, the combination of the parameterization and the topology-guided step is what produces the observed low penetration rates. We will revise the abstract to more precisely distinguish the representation-level constraints from the refinement procedure. revision: partial

  2. Referee: [Abstract] Abstract (results): The SOTA claims on geometry, separation, penetration rate, and depth are reported on the full 4D-DRESS dataset, yet the abstract provides no quantitative values, baseline comparisons, or ablation references. Without these details or a table citation, it is difficult to assess whether the reported low penetration metrics stem from the representation itself or from dataset-specific properties.

    Authors: We agree that the abstract would benefit from explicit numerical support for the SOTA claims. In the revised manuscript we will insert the key quantitative results (e.g., penetration rate and depth on the full 82-scan 4D-DRESS set) together with a direct citation to the corresponding table that compares against baselines. This addition will allow readers to immediately evaluate the magnitude of the improvements. revision: yes

Circularity Check

0 steps flagged

No circularity: DAMA parameterization is an explicit design choice evaluated on external data.

full rationale

The paper defines its core representation directly as binding Gaussians to SMPL-X faces via barycentric in-plane coordinates plus positive normal offset, then lifts 2D segmentations and performs joint optimization with topology-guided correction. No equations reduce the claimed physically plausible layering or garment separation to a fitted parameter optimized on the target result, nor to a self-citation chain or imported uniqueness theorem. Performance metrics on the full 4D-DRESS dataset (82 scans) are reported independently, making the derivation self-contained against external benchmarks rather than tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The method rests on the SMPL-X body model as a fixed anchor and on the assumption that 2D segmentations provide reliable layer labels. No new physical constants or particles are introduced.

axioms (1)
  • domain assumption: SMPL-X provides a topologically consistent mesh suitable for barycentric anchoring of Gaussians.
    Invoked in the representation level description.
invented entities (1)
  • Body-anchored Gaussians with barycentric in-plane coordinates and positive normal offset · independent evidence
    purpose: To enforce geometric layering and prevent penetration between garment layers
    Core of the representation; independent evidence would be quantitative penetration metrics on held-out data.

pith-pipeline@v0.9.0 · 5784 in / 1251 out tokens · 25400 ms · 2026-05-21T05:57:55.234233+00:00 · methodology



Reference graph

Works this paper leans on

99 extracted references · 99 canonical work pages · 1 internal anchor

  1. [1]

    Layered-garment net: Generating multiple implicit garment layers from a single image

    Alakh Aggarwal, Jikai Wang, Steven Hogue, Saifeng Ni, Madhukar Budagavi, and Xiaohu Guo. Layered-garment net: Generating multiple implicit garment layers from a single image. In Proceedings of the Asian Conference on Computer Vision (ACCV), 2022.

  2. [2]

    Video based reconstruction of 3d people models

    Thiemo Alldieck, Marcus Magnor, Weipeng Xu, Christian Theobalt, and Gerard Pons-Moll. Video based reconstruction of 3d people models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

  3. [3]

    imghum: Implicit generative models of 3d human shape and articulated pose

    Thiemo Alldieck, Hongyi Xu, and Cristian Sminchisescu. imghum: Implicit generative models of 3d human shape and articulated pose. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 5461–5470, 2021.

  4. [4]

    Close: A 3d clothing segmentation dataset and model

    Dimitrije Antić, Garvita Tiwari, Batuhan Ozcomlekci, Riccardo Marin, and Gerard Pons-Moll. Close: A 3d clothing segmentation dataset and model. In 2024 international conference on 3D vision (3DV), pages 591–601. IEEE, 2024.

  5. [5]

    Multi-garment net: Learning to dress 3d people from images

    Bharat Lal Bhatnagar, Garvita Tiwari, Christian Theobalt, and Gerard Pons-Moll. Multi-garment net: Learning to dress 3d people from images. In Proceedings of the IEEE/CVF international conference on computer vision, pages 5420–5430, 2019.

  6. [6]

    Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

    Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram Voleti, Adam Letts, et al. Stable video diffusion: Scaling latent video diffusion models to large datasets. arXiv preprint arXiv:2311.15127, 2023.

  7. [7]

    Gaussianvton: 3d human virtual try-on via multi-stage gaussian splatting editing with image prompting

    Haodong Chen, Yongle Huang, Haojian Huang, Xiangsheng Ge, and Dian Shao. Gaussianvton: 3d human virtual try-on via multi-stage gaussian splatting editing with image prompting. arXiv preprint arXiv:2405.07472, 2024.

  8. [8]

    Gaussian wardrobe: Compositional 3d gaussian avatars for free-form virtual try-on

    Zhiyi Chen, Hsuan-I Ho, Tianjian Jiang, Jie Song, Manuel Kaufmann, and Chen Guo. Gaussian wardrobe: Compositional 3d gaussian avatars for free-form virtual try-on. In Proceedings of the International Conference on 3D Vision (3DV), 2026.

  9. [9]

    CLO3D (Version 2025.2.368)

    CLO Virtual Fashion. CLO3D (Version 2025.2.368). CLO Virtual Fashion, Seoul, South Korea, 2026. Updated March 19, 2026.

  10. [10]

    Smplicit: Topology-aware generative model for clothed people

    Enric Corona, Albert Pumarola, Guillem Alenya, Gerard Pons-Moll, and Francesc Moreno-Noguer. Smplicit: Topology-aware generative model for clothed people. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11875–11885,

  11. [11]

    Drapenet: Garment generation and self-supervised draping

    Luca De Luigi, Ren Li, Benoît Guillard, Mathieu Salzmann, and Pascal Fua. Drapenet: Garment generation and self-supervised draping. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1451–1460, 2023.

  12. [12]

    Tela: Text to layer-wise 3d clothed human generation

    Junting Dong, Qi Fang, Zehuan Huang, Xudong Xu, Jingbo Wang, Sida Peng, and Bo Dai. Tela: Text to layer-wise 3d clothed human generation. In Computer Vision – ECCV 2024, pages 19–36, Cham, 2025. Springer Nature Switzerland.

  13. [13]

    Moga: 3d generative avatar prior for monocular gaussian avatar reconstruction

    Zijian Dong, Longteng Duan, Jie Song, Michael J. Black, and Andreas Geiger. Moga: 3d generative avatar prior for monocular gaussian avatar reconstruction. In International Conference on Computer Vision (ICCV), 2025.

  14. [14]

    Capturing and animation of body and clothing from monocular video

    Yao Feng, Jinlong Yang, Marc Pollefeys, Michael J Black, and Timo Bolkart. Capturing and animation of body and clothing from monocular video. In SIGGRAPH Asia 2022 Conference Papers, pages 1–9, 2022.

  15. [15]

    Learning disentangled avatars with hybrid 3d representations

    Yao Feng, Weiyang Liu, Timo Bolkart, Jinlong Yang, Marc Pollefeys, and Michael J Black. Learning disentangled avatars with hybrid 3d representations. arXiv preprint arXiv:2309.06441, 2023.

  16. [16]

    Hood: Hierarchical graphs for generalized modelling of clothing dynamics

    Artur Grigorev, Michael J Black, and Otmar Hilliges. Hood: Hierarchical graphs for generalized modelling of clothing dynamics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16965–16974, 2023.

  17. [17]

    ContourCraft: Learning to resolve intersections in neural multi-garment simulations

    Artur Grigorev, Giorgio Becherini, Michael Black, Otmar Hilliges, and Bernhard Thomaszewski. ContourCraft: Learning to resolve intersections in neural multi-garment simulations. In ACM SIGGRAPH 2024 Conference Papers, pages 1–10, 2024.

  18. [18]

    Vid2avatar: 3d avatar reconstruction from videos in the wild via self-supervised scene decomposition

    Chen Guo, Tianjian Jiang, Xu Chen, Jie Song, and Otmar Hilliges. Vid2avatar: 3d avatar reconstruction from videos in the wild via self-supervised scene decomposition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12858–12868, 2023.

  19. [19]

    Reloo: Reconstructing humans dressed in loose garments from monocular video in the wild

    Chen Guo, Tianjian Jiang, Manuel Kaufmann, Chengwei Zheng, Julien Valentin, Jie Song, and Otmar Hilliges. Reloo: Reconstructing humans dressed in loose garments from monocular video in the wild. In European conference on computer vision, pages 21–38. Springer, 2024.

  20. [20]

    Vid2avatar-pro: Authentic avatar from videos in the wild via universal prior

    Chen Guo, Junxuan Li, Yash Kant, Yaser Sheikh, Shunsuke Saito, and Chen Cao. Vid2avatar-pro: Authentic avatar from videos in the wild via universal prior. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 5559–5570, 2025.

  21. [21]

    Pgc: Physics-based gaussian cloth from a single pose

    Michelle Guo, Matt Jen-Yuan Chiang, Igor Santesteban, Nikolaos Sarafianos, Hsiao-yu Chen, Oshri Halimi, Aljaž Božič, Shunsuke Saito, Jiajun Wu, C Karen Liu, et al. Pgc: Physics-based gaussian cloth from a single pose. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 21215–21225, 2025.

  22. [22]

    Livecap: Real-time human performance capture from monocular video

    Marc Habermann, Weipeng Xu, Michael Zollhöfer, Gerard Pons-Moll, and Christian Theobalt. Livecap: Real-time human performance capture from monocular video. ACM Trans. Graph., 38(2), 2019.

  23. [23]

    Deepcap: Monocular human performance capture using weak supervision

    Marc Habermann, Weipeng Xu, Michael Zollhöfer, Gerard Pons-Moll, and Christian Theobalt. Deepcap: Monocular human performance capture using weak supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.

  24. [24]

    Xintong Han, Zuxuan Wu, Zhe Wu, Ruichi Yu, and Larry S. Davis. Viton: An image-based virtual try-on network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

  25. [25]

    Vton 360: High-fidelity virtual try-on from any viewing direction

    Zijian He, Yuwei Ning, Yipeng Qin, Guangrun Wang, Sibei Yang, Liang Lin, and Guanbin Li. Vton 360: High-fidelity virtual try-on from any viewing direction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 26388–26398, 2025.

  26. [26]

    Learning locally editable virtual humans

    Hsuan-I Ho, Lixin Xue, Jie Song, and Otmar Hilliges. Learning locally editable virtual humans. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 21024–21035, 2023.

  27. [27]

    Neural-abc: Neural parametric models for articulated body with clothes

    Chen Honghu, Yao Yuxin, and Juyong Zhang. Neural-abc: Neural parametric models for articulated body with clothes. IEEE Transactions on Visualization and Computer Graphics,

  28. [28]

    Gaussianavatar: Towards realistic human avatar modeling from a single video via animatable 3d gaussians

    Liangxiao Hu, Hongwen Zhang, Yuxiang Zhang, Boyao Zhou, Boning Liu, Shengping Zhang, and Liqiang Nie. Gaussianavatar: Towards realistic human avatar modeling from a single video via animatable 3d gaussians. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

  29. [29]

    Gauhuman: Articulated gaussian splatting from monocular human videos

    Shoukang Hu, Tao Hu, and Ziwei Liu. Gauhuman: Articulated gaussian splatting from monocular human videos. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 20418–20431, 2024.

  30. [30]

    Humanliff: Layer-wise 3d human diffusion model

    Shoukang Hu, Fangzhou Hong, Tao Hu, Liang Pan, Haiyi Mei, Weiye Xiao, Lei Yang, and Ziwei Liu. Humanliff: Layer-wise 3d human diffusion model. Int. J. Comput. Vision, 133(9):5938–5957, 2025.

  31. [31]

    2d gaussian splatting for geometrically accurate radiance fields

    Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2d gaussian splatting for geometrically accurate radiance fields. In SIGGRAPH 2024 Conference Papers. Association for Computing Machinery, 2024.

  32. [32]

    Sith: Single-view textured human reconstruction with image-conditioned diffusion

    Hsuan I Ho, Jie Song, and Otmar Hilliges. Sith: Single-view textured human reconstruction with image-conditioned diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 538–549, 2024.

  33. [33]

    Bcnet: Learning body and cloth shape from a single image

    Boyi Jiang, Juyong Zhang, Yang Hong, Jinhao Luo, Ligang Liu, and Hujun Bao. Bcnet: Learning body and cloth shape from a single image. In European Conference on Computer Vision, pages 18–35. Springer, 2020.

  34. [34]

    Instantavatar: Learning avatars from monocular video in 60 seconds

    Tianjian Jiang, Xu Chen, Jie Song, and Otmar Hilliges. Instantavatar: Learning avatars from monocular video in 60 seconds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16922–16932, 2023.

  35. [35]

    Prioravatar: Efficient and robust avatar creation from monocular video using learned priors

    Tianjian Jiang, Hsuan-I Ho, Manuel Kaufmann, and Jie Song. Prioravatar: Efficient and robust avatar creation from monocular video using learned priors. In Proceedings of the SIGGRAPH Asia 2025 Conference Papers, pages 1–10,

  36. [36]

    Neuman: Neural human radiance field from a single video

    Wei Jiang, Kwang Moo Yi, Golnoosh Samei, Oncel Tuzel, and Anurag Ranjan. Neuman: Neural human radiance field from a single video. In Computer Vision – ECCV 2022, pages 402–418, Cham, 2022. Springer Nature Switzerland.

  37. [37]

    Total capture: A 3d deformation model for tracking faces, hands, and bodies

    Hanbyul Joo, Tomas Simon, and Yaser Sheikh. Total capture: A 3d deformation model for tracking faces, hands, and bodies. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

  38. [38]

    Physhead: Simulation-ready gaussian head avatars

    Berna Kabadayi, Vanessa Sklyarova, Wojciech Zielonka, Justus Thies, and Gerard Pons-Moll. Physhead: Simulation-ready gaussian head avatars, 2026.

  39. [39]

    3d gaussian splatting for real-time radiance field rendering

    Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4), 2023.

  40. [40]

    Gala: Generating animatable layered assets from a single scan

    Taeksoo Kim, Byungjun Kim, Shunsuke Saito, and Hanbyul Joo. Gala: Generating animatable layered assets from a single scan. In CVPR, 2024.

  41. [41]

    Segment anything

    Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. In Proceedings of the IEEE/CVF international conference on computer vision, pages 4015–4026, 2023.

  42. [42]

    Hugs: Human gaussian splats

    Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel, Oncel Tuzel, and Anurag Ranjan. Hugs: Human gaussian splats. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 505–515, 2024.

  43. [43]

    Gart: Gaussian articulated template models

    Jiahui Lei, Yufu Wang, Georgios Pavlakos, Lingjie Liu, and Kostas Daniilidis. Gart: Gaussian articulated template models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 19876–19887,

  44. [44]

    Dig: Draping implicit garment over the human body

    Ren Li, Benoit Guillard, Edoardo Remelli, and Pascal Fua. Dig: Draping implicit garment over the human body. In Proceedings of the Asian Conference on Computer Vision (ACCV), pages 2780–2795, 2022. 3

  45. [45]

    Tianye Li, Timo Bolkart, Michael J. Black, Hao Li, and Javier Romero. Learning a model of facial shape and expression from 4D scans. ACM Transactions on Graphics, (Proc. SIGGRAPH Asia), 36(6):194:1–194:17, 2017. 3

  46. [46]

    Diffavatar: Simulation-ready garment optimization with differentiable simulation

    Yifei Li, Hsiao-yu Chen, Egor Larionov, Nikolaos Sarafianos, Wojciech Matusik, and Tuur Stuyck. Diffavatar: Simulation-ready garment optimization with differentiable simulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4368–4378, 2024. 3

  47. [47]

    Animatable gaussians: Learning pose-dependent gaussian maps for high-fidelity human avatar modeling

    Zhe Li, Zerong Zheng, Lizhen Wang, and Yebin Liu. Animatable gaussians: Learning pose-dependent gaussian maps for high-fidelity human avatar modeling. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 19711–19722, 2024. 1, 2, 3

  48. [48]

    Tingting Liao, Hongwei Yi, Yuliang Xiu, Jiaxiang Tang, Yangyi Huang, Justus Thies, and Michael J. Black. TADA! Text to Animatable Digital Avatars. In International Conference on 3D Vision (3DV), 2024. 3

  49. [49]

    Layga: Layered gaussian avatars for animatable clothing transfer

    Siyou Lin, Zhe Li, Zhaoqi Su, Zerong Zheng, Hongwen Zhang, and Yebin Liu. Layga: Layered gaussian avatars for animatable clothing transfer. In SIGGRAPH Conference Papers, 2024. 2, 3, 5

  50. [50]

    Gas: Generative avatar synthesis from a single image

    Yixing Lu, Junting Dong, Youngjoong Kwon, Qin Zhao, Bo Dai, and Fernando De la Torre. Gas: Generative avatar synthesis from a single image. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12883–12893, 2025. 3

  51. [51]

    AMASS: Archive of motion capture as surface shapes

    Naureen Mahmood, Nima Ghorbani, Nikolaus F. Troje, Gerard Pons-Moll, and Michael J. Black. AMASS: Archive of motion capture as surface shapes. In International Conference on Computer Vision, pages 5442–5451, 2019. 8, 16

  52. [52]

    Occupancy networks: Learning 3d reconstruction in function space

    Lars Mescheder, Michael Oechsle, Michael Niemeyer, Sebastian Nowozin, and Andreas Geiger. Occupancy networks: Learning 3d reconstruction in function space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019. 2

  53. [53]

    Nerf: Representing scenes as neural radiance fields for view synthesis

    Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020. 1, 2

  54. [54]

    3d clothed human reconstruction in the wild

    Gyeongsik Moon, Hyeongjin Nam, Takaaki Shiratori, and Kyoung Mu Lee. 3d clothed human reconstruction in the wild. In European conference on computer vision, pages 184–200. Springer, 2022. 3

  55. [55]

    Expressive whole-body 3d gaussian avatar

    Gyeongsik Moon, Takaaki Shiratori, and Shunsuke Saito. Expressive whole-body 3d gaussian avatar. In European Conference on Computer Vision, pages 19–35. Springer,

  56. [56]

    Instant neural graphics primitives with a multiresolution hash encoding. ACM transactions on graphics (TOG), 41(4):1–15, 2022

    Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM transactions on graphics (TOG), 41(4):1–15, 2022. 1, 2

  57. [57]

    Disco4d: Disentangled 4d human generation and animation from a single image

    Hui En Pang, Shuai Liu, Zhongang Cai, Lei Yang, Tianwei Zhang, and Ziwei Liu. Disco4d: Disentangled 4d human generation and animation from a single image. In CVPR,

  58. [58]

    Deepsdf: Learning continuous signed distance functions for shape representation

    Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019. 2

  59. [59]

    Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, and Michael J. Black. Expressive body capture: 3D hands, face, and body from a single image. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 10975–10985, 2019. 1, 3

  60. [60]

    Pica: Physics-integrated clothed avatar. arXiv preprint arXiv:2407.05324, 2024

    Bo Peng, Yunfan Tao, Haoyu Zhan, Yudong Guo, and Juyong Zhang. Pica: Physics-integrated clothed avatar. arXiv preprint arXiv:2407.05324, 2024. 3

  61. [61]

    Animatable neural radiance fields for modeling dynamic human bodies

    Sida Peng, Junting Dong, Qianqian Wang, Shangzhan Zhang, Qing Shuai, Xiaowei Zhou, and Hujun Bao. Animatable neural radiance fields for modeling dynamic human bodies. In Proceedings of the IEEE/CVF international conference on computer vision, pages 14314–14323, 2021. 2, 3

  62. [62]

    Neural body: Implicit neural representations with structured latent codes for novel view synthesis of dynamic humans

    Sida Peng, Yuanqing Zhang, Yinghao Xu, Qianqian Wang, Qing Shuai, Hujun Bao, and Xiaowei Zhou. Neural body: Implicit neural representations with structured latent codes for novel view synthesis of dynamic humans. In CVPR,

  63. [63]

    Implicit neural representations with structured latent codes for human body modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023

    Sida Peng, Chen Geng, Yuanqing Zhang, Yinghao Xu, Qianqian Wang, Qing Shuai, Xiaowei Zhou, and Hujun Bao. Implicit neural representations with structured latent codes for human body modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023. 1, 2, 3

  64. [64]

    Clothcap: Seamless 4d clothing capture and retargeting. ACM Transactions on Graphics (ToG), 36(4):1–15,

    Gerard Pons-Moll, Sergi Pujades, Sonny Hu, and Michael J Black. Clothcap: Seamless 4d clothing capture and retargeting. ACM Transactions on Graphics (ToG), 36(4):1–15,

  65. [65]

    Gaussianavatars: Photorealistic head avatars with rigged 3d gaussians

    Shenhan Qian, Tobias Kirschstein, Liam Schoneveld, Davide Davoli, Simon Giebenhain, and Matthias Nießner. Gaussianavatars: Photorealistic head avatars with rigged 3d gaussians. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20299–20309,

  66. [66]

    3dgs-avatar: Animatable avatars via deformable 3d gaussian splatting

    Zhiyin Qian, Shaofei Wang, Marko Mihajlovic, Andreas Geiger, and Siyu Tang. 3dgs-avatar: Animatable avatars via deformable 3d gaussian splatting. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5020–5030, 2024. 1, 2

  67. [67]

    Gaussian Garments: Reconstructing simulation-ready clothing with photorealistic appearance from multi-view video

    Boxiang Rong, Artur Grigorev, Wenbo Wang, Michael J. Black, Bernhard Thomaszewski, Christina Tsalicoglou, and Otmar Hilliges. Gaussian Garments: Reconstructing simulation-ready clothing with photorealistic appearance from multi-view video. In International Conference on 3D Vision 2025, 2025. 3

  68. [68]

    Pifu: Pixel-aligned implicit function for high-resolution clothed human digitization

    Shunsuke Saito, Zeng Huang, Ryota Natsume, Shigeo Morishima, Angjoo Kanazawa, and Hao Li. Pifu: Pixel-aligned implicit function for high-resolution clothed human digitization. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019. 3

  69. [69]

    Relightable gaussian codec avatars

    Shunsuke Saito, Gabriel Schwartz, Tomas Simon, Junxuan Li, and Giljoo Nam. Relightable gaussian codec avatars. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 130–141, 2024. 3

  70. [70]

    Snug: Self-supervised neural dynamic garments

    Igor Santesteban, Miguel A. Otaduy, and Dan Casas. Snug: Self-supervised neural dynamic garments. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8140–8150, 2022. 3

  71. [71]

    DiffHuman: Probabilistic Photorealistic 3D Reconstruction of Humans

    Akash Sengupta, Thiemo Alldieck, Nikos Kolotouros, Enric Corona, Andrei Zanfir, and Cristian Sminchisescu. DiffHuman: Probabilistic Photorealistic 3D Reconstruction of Humans. In CVPR, 2024. 3

  72. [72]

    SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting

    Zhijing Shao, Zhaolong Wang, Zhuang Li, Duotun Wang, Xiangru Lin, Yu Zhang, Mingming Fan, and Zeyu Wang. SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024. 2, 3

  73. [73]

    X-avatar: Expressive human avatars

    Kaiyue Shen, Chen Guo, Manuel Kaufmann, Juan Jose Zarate, Julien Valentin, Jie Song, and Otmar Hilliges. X-avatar: Expressive human avatars. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16911–16921, 2023. 3

  74. [74]

    Caphy: Capturing physical properties for animatable human avatars

    Zhaoqi Su, Liangxiao Hu, Siyou Lin, Hongwen Zhang, Shengping Zhang, Justus Thies, and Yebin Liu. Caphy: Capturing physical properties for animatable human avatars. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 14150–14160, 2023. 3

  75. [75]

    Outfitanyone: Ultra-high quality virtual try-on for any clothing and any person. arXiv preprint arXiv:2407.16224, 2024

    Ke Sun, Jian Cao, Qi Wang, Linrui Tian, Xindi Zhang, Lian Zhuo, Bang Zhang, Liefeng Bo, Wenbo Zhou, Weiming Zhang, and Daiheng Gao. Outfitanyone: Ultra-high quality virtual try-on for any clothing and any person. arXiv preprint arXiv:2407.16224, 2024. 3

  76. [76]

    Open-vocabulary semantic part segmentation of 3d human

    Keito Suzuki, Bang Du, Girish Krishnan, Kunyao Chen, Runfa Blark Li, and Truong Nguyen. Open-vocabulary semantic part segmentation of 3d human. In 2025 International Conference on 3D Vision (3DV), pages 1572–1582. IEEE,

  77. [77]

    Dressrecon: Freeform 4d human reconstruction from monocular video

    Jeff Tan, Donglai Xiang, Shubham Tulsiani, Deva Ramanan, and Gengshan Yang. Dressrecon: Freeform 4d human reconstruction from monocular video. In 2025 International Conference on 3D Vision (3DV), pages 250–260. IEEE, 2025. 2

  78. [78]

    Sizer: A dataset and model for parsing 3d clothing and learning size sensitive 3d clothing

    Garvita Tiwari, Bharat Lal Bhatnagar, Tony Tung, and Gerard Pons-Moll. Sizer: A dataset and model for parsing 3d clothing and learning size sensitive 3d clothing. In European Conference on Computer Vision, pages 1–18. Springer, 2020. 2, 3

  79. [79]

    Remu: Reconstructing multi-layer 3d clothed human from images

    Onat Vuran and Hsuan-I Ho. Remu: Reconstructing multi-layer 3d clothed human from images. In British Machine Vision Conference (BMVC), 2025. 3

  80. [80]

    Disentangled clothed avatar generation from text descriptions

    Jionghao Wang, Yuan Liu, Zhiyang Dou, Zhengming Yu, Yongqing Liang, Cheng Lin, Rong Xie, Li Song, Xin Li, and Wenping Wang. Disentangled clothed avatar generation from text descriptions. In European Conference on Computer Vision, pages 381–401. Springer, 2024. 3

Showing first 80 references.