DAMA: Disentangled Body-Anchored Gaussians for Controllable Multi-Layered Avatars

Berna Kabadayi; Daniel Eskandar; Garvita Tiwari; Gerard Pons-Moll

arxiv: 2605.21001 · v1 · pith:S5ZSPD3Dnew · submitted 2026-05-20 · 💻 cs.CV


Pith reviewed 2026-05-21 05:57 UTC · model grok-4.3

classification 💻 cs.CV

keywords: 3D avatar reconstruction · Gaussian splatting · clothed human modeling · layered garment representation · multi-view 3D reconstruction · body-anchored Gaussians · garment disentanglement · SMPL-X parameterization


The pith

DAMA reconstructs 3D avatars as layered Gaussians bound to body model faces, delivering non-penetrating garments and user-controlled stacking order from ordinary multi-view photos.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a representation that anchors 3D Gaussians to the triangular faces of the SMPL-X body model using barycentric coordinates within each face and a small positive offset along the surface normal. This binding turns 2D image segmentations into separate, topologically ordered garment layers that stay physically plausible without intersecting. The method lifts the segmentations into these anchored Gaussians, applies topology-guided refinement, and jointly optimizes geometry and appearance. A reader would care because prior Gaussian and implicit-surface avatar techniques either merge clothing into a single surface or allow garments to intersect, blocking clean separation and any practical control over layering.

Core claim

DAMA is the first Gaussian avatar reconstruction method from multi-view images to achieve physically plausible layering, clean garment separation, and explicit stacking control by binding Gaussians to SMPL-X faces via barycentric in-plane coordinates and a positive normal offset, then lifting 2D segmentations, applying topology-guided correction, and jointly optimizing geometry and appearance.

What carries the argument

Body-anchored Gaussians parameterized by barycentric in-plane coordinates on SMPL-X faces plus a positive normal offset, which enforces layer ordering and non-penetration during reconstruction and editing.
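A minimal sketch of how such a binding can be evaluated, assuming NumPy arrays for the posed SMPL-X mesh. The paper states only that the in-plane coordinates are barycentric and that the normal offset is positive; the softmax and softplus reparameterizations, the `min_offset` floor, and the function name `anchored_centers` are illustrative assumptions, not the authors' code. The property it shows is that any unconstrained parameter values map to a point inside the anchor triangle, lifted to the side its normal points to (outward, assuming outward-consistent vertex winding).

```python
import numpy as np


def softmax(x, axis=-1):
    # Shift by the max for numerical stability; output is positive and sums to 1.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def softplus(x):
    # log(1 + exp(x)), computed stably; never negative.
    return np.logaddexp(0.0, x)


def anchored_centers(vertices, faces, face_idx, bary_logits, offset_raw, min_offset=1e-4):
    """Hypothetical body-anchored Gaussian centers.

    vertices:    (V, 3) posed SMPL-X vertex positions
    faces:       (F, 3) vertex indices of each triangle
    face_idx:    (N,)   index of the face each Gaussian is bound to
    bary_logits: (N, 3) unconstrained parameters for the barycentric weights
    offset_raw:  (N,)   unconstrained parameter for the normal offset
    Returns the (N, 3) centers and the (N,) offsets delta.
    """
    tri = vertices[faces[face_idx]]              # (N, 3, 3) corners of each anchor face
    w = softmax(bary_logits)                     # weights > 0 summing to 1: point inside the triangle
    in_plane = np.einsum("nk,nkd->nd", w, tri)   # barycentric interpolation of the corners
    normal = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    normal /= np.linalg.norm(normal, axis=1, keepdims=True)  # unit face normal from vertex winding
    delta = softplus(offset_raw) + min_offset    # offset is always at least min_offset
    return in_plane + delta[:, None] * normal, delta


# Usage: one triangle in the z = 0 plane, five Gaussians with random parameters.
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
tris = np.array([[0, 1, 2]])
rng = np.random.default_rng(0)
centers, delta = anchored_centers(verts, tris, np.zeros(5, dtype=int),
                                  rng.normal(size=(5, 3)), 3.0 * rng.normal(size=5))
```

Because each center is recomputed from the current vertices, re-posing the body carries every Gaussian along with its face, which is one reason a binding of this kind lends itself to SMPL-X-driven animation.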

If this is right

Produces state-of-the-art results in geometry reconstruction, garment separation, penetration rate, and penetration depth on the full 4D-DRESS dataset of 82 scans.
Supports immediate user-driven reordering of any garment layer on the finished avatar.
Converts body-conforming garment layers into simulation-ready meshes quickly and with little extra processing.
Maintains high visual fidelity while preserving explicit layer boundaries that prior single-surface or entangled methods lose.
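The reordering and stacking described above can be pictured with a small sketch. Figure 4's caption says collisions are resolved "by reordering layers and shifting the garment outward using the offsets of lower layers"; the clamp-to-ceiling rule below is one hypothetical reading of that step (the names `stack_layers` and `margin` are invented here), operating only on per-Gaussian face indices and normal offsets.

```python
import numpy as np


def stack_layers(layers, order, num_faces, margin=1e-3):
    """Hypothetical stacking pass over body-anchored layers.

    layers: dict mapping layer name -> (face_idx, delta), both shape (N,)
    order:  layer names from innermost to outermost
    Every Gaussian is pushed out so that it sits at least `margin` above the
    outermost Gaussian of all lower layers anchored to the same face.
    """
    ceiling = np.zeros(num_faces)  # current outermost offset per face (body surface = 0)
    stacked = {}
    for name in order:
        face_idx, delta = layers[name]
        shifted = np.maximum(delta, ceiling[face_idx] + margin)
        stacked[name] = (face_idx, shifted)
        np.maximum.at(ceiling, face_idx, shifted)  # raise the ceiling for the next layer
    return stacked


# Two toy layers sharing SMPL-X faces 0 and 1.
shirt = (np.array([0, 0, 1]), np.array([0.010, 0.020, 0.010]))
jeans = (np.array([0, 1]), np.array([0.005, 0.030]))
shirt_over_jeans = stack_layers({"shirt": shirt, "jeans": jeans}, ["jeans", "shirt"], num_faces=2)
jeans_over_shirt = stack_layers({"shirt": shirt, "jeans": jeans}, ["shirt", "jeans"], num_faces=2)
```

Swapping the two entries of `order` is all it takes to switch between "Shirt > Jeans" and "Jeans > Shirt"; only the offsets of the layer being pushed outward change, which matches the caption's remark that the shift may distort appearance.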

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same anchoring could supply clean initial geometry for physics-based cloth simulators that currently struggle with self-intersections.
Extending the offset and barycentric binding to time-varying sequences might produce layered 4D avatars ready for animation without post-processing fixes.
Because layers remain editable after reconstruction, the approach may reduce the manual cleanup now required for virtual try-on or character customization pipelines.

Load-bearing premise

Anchoring Gaussians to body faces with barycentric coordinates and a positive normal offset will by itself keep garment layers from intersecting and maintain correct topological order without any separate collision handling.

What would settle it

Reconstruct an avatar from multi-view images of a person wearing overlapping garments, reorder the layers in software, and check whether any rendered frame shows visible intersections or incorrect depth ordering between the layers.
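Part of this check can be automated on the representation itself rather than on rendered frames. The sketch below is a hypothetical proxy, not the paper's penetration-rate or penetration-depth metric: it treats an outer-layer Gaussian as penetrating when its normal offset falls below the outermost inner-layer offset on the same SMPL-X face, and it ignores Gaussian extent and overlap across neighboring faces.

```python
import numpy as np


def layer_penetration(inner, outer, num_faces):
    """Hypothetical per-face penetration proxy between two anchored layers.

    inner, outer: (face_idx, delta) pairs of shape-(N,) arrays, with `outer`
    meant to lie above `inner`. Returns (rate, mean_depth) over outer
    Gaussians whose face is also covered by the inner layer.
    """
    f_in, d_in = inner
    f_out, d_out = outer
    top = np.full(num_faces, -np.inf)
    np.maximum.at(top, f_in, d_in)          # outermost inner-layer offset on each face
    top_out = top[f_out]
    shared = np.isfinite(top_out)           # outer Gaussians on faces the inner layer covers
    if not shared.any():
        return 0.0, 0.0
    gap = d_out[shared] - top_out[shared]   # negative gap = outer sits below inner
    penetrating = gap < 0
    rate = float(penetrating.mean())
    depth = float(-gap[penetrating].mean()) if penetrating.any() else 0.0
    return rate, depth


inner = (np.array([0, 0, 1]), np.array([0.010, 0.020, 0.010]))
outer = (np.array([0, 1, 2]), np.array([0.015, 0.030, 0.050]))
rate, depth = layer_penetration(inner, outer, num_faces=3)
```

In this toy case two outer Gaussians share a face with the inner layer, and the one on face 0 sits 0.005 below it, so the rate is 0.5 and the mean depth 0.005. Running such a proxy after every reordering is cheap, but a visual check of renders is still needed for the cases it ignores.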

Figures

Figures reproduced from arXiv: 2605.21001 by Berna Kabadayi, Daniel Eskandar, Garvita Tiwari, Gerard Pons-Moll.

**Figure 1.** Figure 1: We present DAMA, a method for reconstructing physically plausible multi-layered avatars. (a) From multi-view RGB images and masks, we reconstruct clean, intersection-free layers via body-anchored Gaussians. (b) The layers enable garment composition, stacking, and reordering (e.g., Shirt > Jeans vs. Jeans > Shirt). (c) The garments are animatable and convertible to simulation-ready meshes.

**Figure 2.** Figure 2: DAMA Overview. Given multi-view images and masks, we reconstruct a layered avatar with clean garment separation and no interpenetration. The method consists of three stages: (1) lifting 2D masks to SMPL-X–anchored Gaussians by optimizing coarse geometry and labels; (2) mapping labels to SMPL-X and refining them using mesh topology; (3) jointly optimizing geometry and appearance for each layer under masked …
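Stage (2) in the overview above can be illustrated with a toy version of label mapping and topology-based cleanup. Everything here is an assumption made for illustration, since the caption says only that labels are mapped to SMPL-X and refined using mesh topology: the sketch assigns each face the majority label of its Gaussians, then lets each face vote with its edge-adjacent neighbors (with a small bonus for its current label, so ties keep it), which removes isolated mislabeled faces and fills unlabeled ones. The functions `face_adjacency` and `face_labels` are hypothetical.

```python
from collections import Counter, defaultdict

import numpy as np


def face_adjacency(faces):
    """Edge-adjacent neighbours of every triangle."""
    edge_to_faces = defaultdict(list)
    for f, (a, b, c) in enumerate(faces):
        for u, v in ((a, b), (b, c), (c, a)):
            edge_to_faces[(min(u, v), max(u, v))].append(f)
    neighbours = [set() for _ in range(len(faces))]
    for shared in edge_to_faces.values():
        for f in shared:
            neighbours[f].update(g for g in shared if g != f)
    return neighbours


def face_labels(faces, face_idx, gauss_labels, iters=2, self_weight=1.5):
    """Hypothetical label mapping (majority vote) plus topology-based cleanup.

    Faces without any Gaussian start as -1 and are filled from neighbours.
    """
    votes = [Counter() for _ in range(len(faces))]
    for f, lab in zip(face_idx, gauss_labels):
        votes[f][int(lab)] += 1
    labels = np.array([v.most_common(1)[0][0] if v else -1 for v in votes])
    neighbours = face_adjacency(faces)
    for _ in range(iters):
        new = labels.copy()
        for f, nbrs in enumerate(neighbours):
            score = Counter()
            if labels[f] != -1:
                score[labels[f]] += self_weight   # current label wins ties
            for g in nbrs:
                if labels[g] != -1:
                    score[labels[g]] += 1
            if score:
                new[f] = score.most_common(1)[0][0]
        labels = new
    return labels


strip = np.array([[0, 1, 2], [1, 3, 2], [2, 3, 4], [3, 5, 4]])
# Face 1 receives one noisy Gaussian; face 3 receives none.
refined = face_labels(strip, face_idx=[0, 0, 0, 1, 2, 2], gauss_labels=[1, 1, 1, 2, 1, 1], iters=1)
```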

**Figure 3.** Figure 3 (image only)

**Figure 4.** Figure 4: Garment Transfer and Stacking. We transfer a garment layer (outer garment here) to a target avatar by recomputing its Gaussian parameters on the target SMPL-X mesh and merging it with the avatar layers. The naive merge creates intersections. Our representation resolves them by reordering layers and shifting the garment outward using the offsets of lower layers. This offset may distort appearance. We theref…

**Figure 5.** Figure 5: Full-Avatar Reconstruction. GALA shows artifacts from garment–body mesh intersections (left). Disco4D produces noisy boundaries and incorrect lifted regions (right). DAMA reconstructs non-intersecting layered garments with accurate labels. Panels: DAMA (ours), Disco4D, GALA; labels: Shoes, Upper Garment, Lower Garment, Outer Garment.

**Figure 6.** Figure 6 (image only)

**Figure 7.** Figure 7: Garment Stacking and Reordering. DAMA enables garment transfer between avatars, garment stacking with collision resolution, reordering of semantic layers, and SMPL-X-driven animation. Panels: Gaussian Garments, Extracted Meshes; motions: Jumping Jacks, Punching, One-Leg Jump.

**Figure 8.** Figure 8: Clothing Simulation. DAMA converts garment geometry to meshes that can be simulated in CLO3D [9]. We show simulation of individual garments (top) and stacked garments (bottom) driven by SMPL-X animation from AMASS [51].

**Figure 9.** Figure 9: Qualitative Ablation of our Gaussian Representation. Free XYZ causes drifting Gaussians, barycentric with unsigned offset (δ ∈ ℝ) produces artifacts, while our positive offset (δ > 0) keeps Gaussians surface-aligned and stable under animation. Panels: Free XYZ, Barycentric (…), Barycentric (ours); Posed, Canonical. Paper text, Sec. 4.3 (Applications), "Garment Stacking and Reordering": Our representation enables garment transfer and stacking on existing layers, with collisions resolved by offset or…

**Figure 11.** Figure 11: Additional Loss Ablations. Effect of removing L_a, L_d, and L_r. Paper text, Appendix D (Additional Applications and Results), "Hair Transfer": Our representation naturally extends to hair.

**Figure 12.** Figure 12 illustrates transferring hair from a source subject to a target, along with reordering its layer. Panels: Source Hair, Transferred Hair, Hair Inside Shirt.

**Figure 13.** Figure 13: SMPL-X–Driven Avatar Animation. We animate the reconstructed avatar with transferred and stacked garments using SMPL-X motion sequences from AMASS [51]. The sequence shows that the layered garments deform consistently with the body while preserving their ordering and separation throughout the motion. Panels: Gaussian Garments, Extracted Meshes, Stacking and Simulation (Running on Spot).

**Figure 14.** Figure 14: Additional Clothing Simulation Example. We show an additional example with one lower garment and three upper garments. (Left) Simulation-ready meshes extracted from the Gaussian layers. (Right) CLO3D [9] simulation driven by a running-on-spot motion sequence from AMASS [51]. The garments are progressively stacked, showing that the extracted meshes preserve layer ordering and remain stable during simulatio…

Original abstract

Existing 3D clothed avatar reconstruction methods achieve high visual fidelity but ignore geometric structure and physical plausibility. They either model clothed humans as a single deformable surface or attempt garment disentanglement without enforcing geometric constraints, resulting in ambiguous garment boundaries and no control over stacking or layer ordering. To address these limitations, we introduce DAMA (Disentangled body-Anchored Gaussians for Controllable Multi-layered Avatars), a 3D avatar reconstruction method that produces physically plausible clothed avatars through a dedicated representation and reconstruction method. At the representation level, we bind Gaussians to SMPL-X faces using barycentric in-plane coordinates and a positive normal offset. Based on this parameterization, the reconstruction method lifts 2D segmentations to body-anchored Gaussians, refines layers using topology-guided correction, and jointly optimizes geometry and appearance. DAMA is the first Gaussian avatar reconstruction method from multi-view images to achieve physically plausible layering, clean garment separation, and explicit stacking control. On the full 4D-DRESS dataset (82 scans), it achieves state-of-the-art performance in geometry reconstruction, garment separation, penetration rate, and penetration depth. The representation further supports user-defined garment reordering and fast conversion of body-conforming garments to simulation-ready meshes. Project Page: https://danieleskandar.github.io/dama/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

DAMA anchors Gaussians to SMPL-X with barycentric coords and normal offset to get explicit layer control and low penetration on 4D-DRESS.


The main thing to know about this paper is that it anchors Gaussians to SMPL-X using barycentric in-plane coordinates plus a positive normal offset. This gives a built-in way to separate garments into ordered layers that the user can reorder and that convert to simulation meshes without much extra work. The authors lift 2D segmentations from multi-view images, apply topology-guided correction, and optimize geometry and appearance jointly. On the full 4D-DRESS set of 82 scans, the reported numbers come out ahead of prior Gaussian and implicit methods on geometry, separation, penetration rate, and penetration depth. That combination of anchoring and correction is what lets them claim physically plausible layering where earlier single-surface or unconstrained approaches fell short. The practical extras, such as fast mesh export, make it relevant for animation and virtual try-on pipelines.

The claim to stress-test is the one about penetrations. The method relies on the positivity constraint on the normal offset and on the post-hoc correction step rather than on an explicit collision or repulsion term during optimization. If the full paper has ablations showing that removing the offset constraint or the correction makes the penetration numbers spike, then the representation carries its weight. If the numbers stay low only because of the dataset's clothing styles, the general claim weakens.

Readers working on controllable 3D avatars or layered reconstruction will get the most out of it. The parameterization is concrete, the benchmark results are reported on a standard full dataset, and the contribution is focused enough to go to peer review rather than a desk reject. I would send it to referees for a closer look at the optimization details and failure cases.

Referee Report

2 major / 1 minor

Summary. The paper introduces DAMA, a 3D avatar reconstruction method from multi-view images that produces controllable multi-layered clothed avatars. It defines a body-anchored Gaussian representation that binds each Gaussian to SMPL-X faces via barycentric in-plane coordinates plus a positive normal offset. The pipeline lifts 2D segmentations to these Gaussians, applies topology-guided correction for layer refinement, and performs joint optimization of geometry and appearance. The work claims to be the first Gaussian-based method to deliver physically plausible layering, clean garment separation, and explicit stacking control, reporting SOTA results on the full 4D-DRESS dataset (82 scans) for geometry reconstruction, garment separation, penetration rate, and penetration depth, plus downstream support for user-defined reordering and conversion to simulation-ready meshes.

Significance. If the binding parameterization and refinement steps reliably enforce non-penetrating, topologically ordered layers, the contribution would be significant for Gaussian avatar modeling. It moves beyond single-surface or unconstrained disentanglement approaches by embedding geometric structure and layer ordering directly into the representation, enabling practical applications such as garment reordering and mesh export for simulation.

major comments (2)

[Abstract] Abstract (representation level): The central claim that the body-anchored Gaussian parameterization achieves physically plausible layering 'by construction' depends on binding via barycentric in-plane coordinates and positive normal offset from SMPL-X. No explicit collision, repulsion, or signed-distance loss term is referenced in the optimization; the topology-guided correction is described only as a refinement step. This leaves open whether inter-layer penetrations are prevented during joint optimization or only mitigated afterward, particularly for loose clothing or high-curvature regions.
[Abstract] Abstract (results): The SOTA claims on geometry, separation, penetration rate, and depth are reported on the full 4D-DRESS dataset, yet the abstract provides no quantitative values, baseline comparisons, or ablation references. Without these details or a table citation, it is difficult to assess whether the reported low penetration metrics stem from the representation itself or from dataset-specific properties.

minor comments (1)

[Abstract] The abstract states that the method 'supports user-defined garment reordering and fast conversion of body-conforming garments to simulation-ready meshes,' but does not indicate the computational cost or quality of the mesh conversion step; a brief complexity or timing reference would clarify the practical utility.
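For body-conforming garments, one reason such a conversion could be cheap is that the garment already lives on the SMPL-X triangulation. The following hypothetical sketch (not the paper's procedure; `garment_mesh` and the per-face `face_offset` input are invented for illustration) keeps the faces carrying a given label, reindexes their vertices, and pushes each vertex along its area-weighted normal by the mean offset of its incident faces. Its cost is linear in the number of selected faces; loose garments that do not follow the body surface would need a different approach.

```python
import numpy as np


def garment_mesh(vertices, faces, face_labels, face_offset, label):
    """Hypothetical extraction of a body-conforming garment mesh.

    vertices:    (V, 3) SMPL-X vertices
    faces:       (F, 3) SMPL-X triangles
    face_labels: (F,)   layer label per face
    face_offset: (F,)   representative normal offset per face (e.g. mean of its Gaussians)
    Returns the garment vertices and its re-indexed triangles.
    """
    sel = np.flatnonzero(face_labels == label)
    sub = faces[sel]                                        # (S, 3) original vertex ids
    used, inv = np.unique(sub.ravel(), return_inverse=True)
    new_faces = inv.reshape(-1, 3)                          # compact vertex ids
    tri = vertices[sub]
    face_n = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])  # area-weighted normals
    vert_n = np.zeros((len(used), 3))
    off_sum = np.zeros(len(used))
    count = np.zeros(len(used))
    for k in range(3):                                      # scatter each corner's contribution
        np.add.at(vert_n, new_faces[:, k], face_n)
        np.add.at(off_sum, new_faces[:, k], face_offset[sel])
        np.add.at(count, new_faces[:, k], 1.0)
    vert_n /= np.linalg.norm(vert_n, axis=1, keepdims=True)
    new_vertices = vertices[used] + (off_sum / count)[:, None] * vert_n
    return new_vertices, new_faces


square = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]])
tris = np.array([[0, 1, 2], [1, 3, 2]])
verts_out, faces_out = garment_mesh(square, tris, np.array([1, 0]), np.array([0.02, 0.5]), label=1)
```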

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below in detail and indicate where revisions will be made to improve clarity without misrepresenting the work.


Referee: [Abstract] Abstract (representation level): does the parameterization prevent inter-layer penetration during joint optimization, or only mitigate it afterward, given that no explicit collision, repulsion, or signed-distance loss is referenced?

Authors: The body-anchored representation constrains each Gaussian through barycentric in-plane coordinates on an SMPL-X face together with a strictly positive normal offset. This places every Gaussian on the outer side of its anchor face by construction, which is what prevents body penetration throughout optimization. For garment layers, the topology-guided correction is applied after the 2D segmentations are lifted and before joint optimization: labels are mapped to SMPL-X and refined using the mesh topology, and subsequent gradient updates operate on these corrected layer assignments, so the ordering is preserved. While no explicit repulsion or signed-distance loss is added to the objective, the combination of the parameterization and the topology-guided step is what produces the observed low penetration rates. We will revise the abstract to distinguish more precisely between the representation-level constraints and the refinement procedure. revision: partial
Referee: [Abstract] Abstract (results): the SOTA claims on the full 4D-DRESS dataset come without quantitative values, baseline comparisons, or ablation references, so it is unclear whether the low penetration metrics stem from the representation or from dataset-specific properties.

Authors: We agree that the abstract would benefit from explicit numerical support for the SOTA claims. In the revised manuscript we will insert the key quantitative results (e.g., penetration rate and depth on the full 82-scan 4D-DRESS set) together with a direct citation to the corresponding table that compares against baselines. This addition will allow readers to immediately evaluate the magnitude of the improvements. revision: yes

Circularity Check

0 steps flagged

No circularity: DAMA parameterization is an explicit design choice evaluated on external data.

full rationale

The paper defines its core representation directly as binding Gaussians to SMPL-X faces via barycentric in-plane coordinates plus positive normal offset, then lifts 2D segmentations and performs joint optimization with topology-guided correction. No equations reduce the claimed physically plausible layering or garment separation to a fitted parameter optimized on the target result, nor to a self-citation chain or imported uniqueness theorem. Performance metrics on the full 4D-DRESS dataset (82 scans) are reported independently, making the derivation self-contained against external benchmarks rather than tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The method rests on the SMPL-X body model as a fixed anchor and on the assumption that 2D segmentations provide reliable layer labels. No new physical constants or particles are introduced.

axioms (1)

Domain assumption: SMPL-X provides a topologically consistent mesh suitable for barycentric anchoring of Gaussians.
Invoked in the representation level description.

invented entities (1)

Body-anchored Gaussians with barycentric in-plane coordinates and positive normal offset · independent evidence
purpose: To enforce geometric layering and prevent penetration between garment layers
Core of the representation; independent evidence would be quantitative penetration metrics on held-out data.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Passage: "bind Gaussians to SMPL-X faces using barycentric in-plane coordinates and a positive normal offset... prevents interpenetration with the body and lower layers"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Passage: "DAMA is the first Gaussian avatar reconstruction method... physically plausible layering, clean garment separation"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

99 extracted references

[1] Alakh Aggarwal, Jikai Wang, Steven Hogue, Saifeng Ni, Madhukar Budagavi, and Xiaohu Guo. Layered-garment net: Generating multiple implicit garment layers from a single image. In Proceedings of the Asian Conference on Computer Vision (ACCV), 2022.
[2] Thiemo Alldieck, Marcus Magnor, Weipeng Xu, Christian Theobalt, and Gerard Pons-Moll. Video based reconstruction of 3d people models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[3] Thiemo Alldieck, Hongyi Xu, and Cristian Sminchisescu. imghum: Implicit generative models of 3d human shape and articulated pose. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 5461–5470, 2021.
[4] Dimitrije Antić, Garvita Tiwari, Batuhan Ozcomlekci, Riccardo Marin, and Gerard Pons-Moll. Close: A 3d clothing segmentation dataset and model. In 2024 International Conference on 3D Vision (3DV), pages 591–601. IEEE, 2024.
[5] Bharat Lal Bhatnagar, Garvita Tiwari, Christian Theobalt, and Gerard Pons-Moll. Multi-garment net: Learning to dress 3d people from images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5420–5430, 2019.
[6] Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram Voleti, Adam Letts, et al. Stable video diffusion: Scaling latent video diffusion models to large datasets. arXiv preprint arXiv:2311.15127, 2023.
[7] Haodong Chen, Yongle Huang, Haojian Huang, Xiangsheng Ge, and Dian Shao. Gaussianvton: 3d human virtual try-on via multi-stage gaussian splatting editing with image prompting. arXiv preprint arXiv:2405.07472, 2024.
[8] Zhiyi Chen, Hsuan-I Ho, Tianjian Jiang, Jie Song, Manuel Kaufmann, and Chen Guo. Gaussian wardrobe: Compositional 3d gaussian avatars for free-form virtual try-on. In Proceedings of the International Conference on 3D Vision (3DV), 2026.
[9] CLO Virtual Fashion. CLO3D (Version 2025.2.368). CLO Virtual Fashion, Seoul, South Korea, 2026. Updated March 19, 2026.
[10] Enric Corona, Albert Pumarola, Guillem Alenya, Gerard Pons-Moll, and Francesc Moreno-Noguer. Smplicit: Topology-aware generative model for clothed people. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11875–11885,
[11] Luca De Luigi, Ren Li, Benoît Guillard, Mathieu Salzmann, and Pascal Fua. Drapenet: Garment generation and self-supervised draping. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1451–1460, 2023.
[12] Junting Dong, Qi Fang, Zehuan Huang, Xudong Xu, Jingbo Wang, Sida Peng, and Bo Dai. Tela: Text to layer-wise 3d clothed human generation. In Computer Vision – ECCV 2024, pages 19–36, Cham, 2025. Springer Nature Switzerland.
[13] Zijian Dong, Longteng Duan, Jie Song, Michael J. Black, and Andreas Geiger. Moga: 3d generative avatar prior for monocular gaussian avatar reconstruction. In International Conference on Computer Vision (ICCV), 2025.
[14] Yao Feng, Jinlong Yang, Marc Pollefeys, Michael J Black, and Timo Bolkart. Capturing and animation of body and clothing from monocular video. In SIGGRAPH Asia 2022 Conference Papers, pages 1–9, 2022.
[15] Yao Feng, Weiyang Liu, Timo Bolkart, Jinlong Yang, Marc Pollefeys, and Michael J Black. Learning disentangled avatars with hybrid 3d representations. arXiv preprint arXiv:2309.06441, 2023.
[16] Artur Grigorev, Michael J Black, and Otmar Hilliges. Hood: Hierarchical graphs for generalized modelling of clothing dynamics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16965–16974, 2023.
[17] Artur Grigorev, Giorgio Becherini, Michael Black, Otmar Hilliges, and Bernhard Thomaszewski. ContourCraft: Learning to resolve intersections in neural multi-garment simulations. In ACM SIGGRAPH 2024 Conference Papers, pages 1–10, 2024.
[18] Chen Guo, Tianjian Jiang, Xu Chen, Jie Song, and Otmar Hilliges. Vid2avatar: 3d avatar reconstruction from videos in the wild via self-supervised scene decomposition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12858–12868, 2023.
[19] Chen Guo, Tianjian Jiang, Manuel Kaufmann, Chengwei Zheng, Julien Valentin, Jie Song, and Otmar Hilliges. Reloo: Reconstructing humans dressed in loose garments from monocular video in the wild. In European Conference on Computer Vision, pages 21–38. Springer, 2024.
[20] Chen Guo, Junxuan Li, Yash Kant, Yaser Sheikh, Shunsuke Saito, and Chen Cao. Vid2avatar-pro: Authentic avatar from videos in the wild via universal prior. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 5559–5570, 2025.
[21] Michelle Guo, Matt Jen-Yuan Chiang, Igor Santesteban, Nikolaos Sarafianos, Hsiao-yu Chen, Oshri Halimi, Aljaž Božič, Shunsuke Saito, Jiajun Wu, C Karen Liu, et al. Pgc: Physics-based gaussian cloth from a single pose. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 21215–21225, 2025.
[22] Marc Habermann, Weipeng Xu, Michael Zollhöfer, Gerard Pons-Moll, and Christian Theobalt. Livecap: Real-time human performance capture from monocular video. ACM Trans. Graph., 38(2), 2019.
[23] Marc Habermann, Weipeng Xu, Michael Zollhöfer, Gerard Pons-Moll, and Christian Theobalt. Deepcap: Monocular human performance capture using weak supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[24] Xintong Han, Zuxuan Wu, Zhe Wu, Ruichi Yu, and Larry S. Davis. Viton: An image-based virtual try-on network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[25] Zijian He, Yuwei Ning, Yipeng Qin, Guangrun Wang, Sibei Yang, Liang Lin, and Guanbin Li. Vton 360: High-fidelity virtual try-on from any viewing direction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 26388–26398, 2025.
[26] Hsuan-I Ho, Lixin Xue, Jie Song, and Otmar Hilliges. Learning locally editable virtual humans. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 21024–21035, 2023.
[27] Chen Honghu, Yao Yuxin, and Juyong Zhang. Neural-abc: Neural parametric models for articulated body with clothes. IEEE Transactions on Visualization and Computer Graphics,
[28] Liangxiao Hu, Hongwen Zhang, Yuxiang Zhang, Boyao Zhou, Boning Liu, Shengping Zhang, and Liqiang Nie. Gaussianavatar: Towards realistic human avatar modeling from a single video via animatable 3d gaussians. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[29] Shoukang Hu, Tao Hu, and Ziwei Liu. Gauhuman: Articulated gaussian splatting from monocular human videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20418–20431, 2024.
[30] Shoukang Hu, Fangzhou Hong, Tao Hu, Liang Pan, Haiyi Mei, Weiye Xiao, Lei Yang, and Ziwei Liu. Humanliff: Layer-wise 3d human diffusion model. Int. J. Comput. Vision, 133(9):5938–5957, 2025.
[31]

2d gaussian splatting for geometrically accu- rate radiance fields

Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2d gaussian splatting for geometrically accu- rate radiance fields. InSIGGRAPH 2024 Conference Papers. Association for Computing Machinery, 2024. 1, 2, 3, 5, 6, 14

work page 2024
[32]

Sith: Single- view textured human reconstruction with image-conditioned diffusion

Hsuan I Ho, Jie Song, and Otmar Hilliges. Sith: Single- view textured human reconstruction with image-conditioned diffusion. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 538–549, 2024. 3

work page 2024
[33]

Bcnet: Learning body and cloth shape from a single image

Boyi Jiang, Juyong Zhang, Yang Hong, Jinhao Luo, Ligang Liu, and Hujun Bao. Bcnet: Learning body and cloth shape from a single image. InEuropean Conference on Computer Vision, pages 18–35. Springer, 2020. 2, 3

[34] Tianjian Jiang, Xu Chen, Jie Song, and Otmar Hilliges. Instantavatar: Learning avatars from monocular video in 60 seconds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16922–16932, 2023. 2, 3

[35] Tianjian Jiang, Hsuan-I Ho, Manuel Kaufmann, and Jie Song. Prioravatar: Efficient and robust avatar creation from monocular video using learned priors. In Proceedings of the SIGGRAPH Asia 2025 Conference Papers, pages 1–10, 2025.

[36] Wei Jiang, Kwang Moo Yi, Golnoosh Samei, Oncel Tuzel, and Anurag Ranjan. Neuman: Neural human radiance field from a single video. In Computer Vision – ECCV 2022, pages 402–418, Cham, 2022. Springer Nature Switzerland. 2, 3

[37] Hanbyul Joo, Tomas Simon, and Yaser Sheikh. Total capture: A 3d deformation model for tracking faces, hands, and bodies. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. 2

[38] Berna Kabadayi, Vanessa Sklyarova, Wojciech Zielonka, Justus Thies, and Gerard Pons-Moll. Physhead: Simulation-ready gaussian head avatars, 2026. 3

[39] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4), 2023. 1, 2, 3

[40] Taeksoo Kim, Byungjun Kim, Shunsuke Saito, and Hanbyul Joo. Gala: Generating animatable layered assets from a single scan. In CVPR, 2024. 2, 3, 5, 6, 7

[41] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4015–4026, 2023. 6

[42] Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel, Oncel Tuzel, and Anurag Ranjan. Hugs: Human gaussian splats. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 505–515, 2024. 1, 2

[43] Jiahui Lei, Yufu Wang, Georgios Pavlakos, Lingjie Liu, and Kostas Daniilidis. Gart: Gaussian articulated template models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19876–19887,

[44] Ren Li, Benoit Guillard, Edoardo Remelli, and Pascal Fua. Dig: Draping implicit garment over the human body. In Proceedings of the Asian Conference on Computer Vision (ACCV), pages 2780–2795, 2022. 3

[45] Tianye Li, Timo Bolkart, Michael J. Black, Hao Li, and Javier Romero. Learning a model of facial shape and expression from 4D scans. ACM Transactions on Graphics (Proc. SIGGRAPH Asia), 36(6):194:1–194:17, 2017. 3

[46] Yifei Li, Hsiao-yu Chen, Egor Larionov, Nikolaos Sarafianos, Wojciech Matusik, and Tuur Stuyck. Diffavatar: Simulation-ready garment optimization with differentiable simulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4368–4378, 2024. 3

[47] Zhe Li, Zerong Zheng, Lizhen Wang, and Yebin Liu. Animatable gaussians: Learning pose-dependent gaussian maps for high-fidelity human avatar modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19711–19722, 2024. 1, 2, 3

[48] Tingting Liao, Hongwei Yi, Yuliang Xiu, Jiaxiang Tang, Yangyi Huang, Justus Thies, and Michael J. Black. TADA! Text to Animatable Digital Avatars. In International Conference on 3D Vision (3DV), 2024. 3

[49] Siyou Lin, Zhe Li, Zhaoqi Su, Zerong Zheng, Hongwen Zhang, and Yebin Liu. Layga: Layered gaussian avatars for animatable clothing transfer. In SIGGRAPH Conference Papers, 2024. 2, 3, 5

[50] Yixing Lu, Junting Dong, Youngjoong Kwon, Qin Zhao, Bo Dai, and Fernando De la Torre. Gas: Generative avatar synthesis from a single image. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12883–12893, 2025. 3

[51] Naureen Mahmood, Nima Ghorbani, Nikolaus F. Troje, Gerard Pons-Moll, and Michael J. Black. AMASS: Archive of motion capture as surface shapes. In International Conference on Computer Vision, pages 5442–5451, 2019. 8, 16

[52] Lars Mescheder, Michael Oechsle, Michael Niemeyer, Sebastian Nowozin, and Andreas Geiger. Occupancy networks: Learning 3d reconstruction in function space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019. 2

[53] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020. 1, 2

[54] Gyeongsik Moon, Hyeongjin Nam, Takaaki Shiratori, and Kyoung Mu Lee. 3d clothed human reconstruction in the wild. In European Conference on Computer Vision, pages 184–200. Springer, 2022. 3

[55] Gyeongsik Moon, Takaaki Shiratori, and Shunsuke Saito. Expressive whole-body 3d gaussian avatar. In European Conference on Computer Vision, pages 19–35. Springer,

[56] Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics (TOG), 41(4):1–15, 2022. 1, 2

[57] Hui En Pang, Shuai Liu, Zhongang Cai, Lei Yang, Tianwei Zhang, and Ziwei Liu. Disco4d: Disentangled 4d human generation and animation from a single image. In CVPR,

[58] Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019. 2

[59] Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, and Michael J. Black. Expressive body capture: 3D hands, face, and body from a single image. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 10975–10985, 2019. 1, 3

[60] Bo Peng, Yunfan Tao, Haoyu Zhan, Yudong Guo, and Juyong Zhang. Pica: Physics-integrated clothed avatar. arXiv preprint arXiv:2407.05324, 2024. 3

[61] Sida Peng, Junting Dong, Qianqian Wang, Shangzhan Zhang, Qing Shuai, Xiaowei Zhou, and Hujun Bao. Animatable neural radiance fields for modeling dynamic human bodies. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 14314–14323, 2021. 2, 3

[62] Sida Peng, Yuanqing Zhang, Yinghao Xu, Qianqian Wang, Qing Shuai, Hujun Bao, and Xiaowei Zhou. Neural body: Implicit neural representations with structured latent codes for novel view synthesis of dynamic humans. In CVPR,

[63] Sida Peng, Chen Geng, Yuanqing Zhang, Yinghao Xu, Qianqian Wang, Qing Shuai, Xiaowei Zhou, and Hujun Bao. Implicit neural representations with structured latent codes for human body modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023. 1, 2, 3

[64] Gerard Pons-Moll, Sergi Pujades, Sonny Hu, and Michael J Black. Clothcap: Seamless 4d clothing capture and retargeting. ACM Transactions on Graphics (ToG), 36(4):1–15,

[65] Shenhan Qian, Tobias Kirschstein, Liam Schoneveld, Davide Davoli, Simon Giebenhain, and Matthias Nießner. Gaussianavatars: Photorealistic head avatars with rigged 3d gaussians. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20299–20309,

[66] Zhiyin Qian, Shaofei Wang, Marko Mihajlovic, Andreas Geiger, and Siyu Tang. 3dgs-avatar: Animatable avatars via deformable 3d gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5020–5030, 2024. 1, 2

[67] Boxiang Rong, Artur Grigorev, Wenbo Wang, Michael J. Black, Bernhard Thomaszewski, Christina Tsalicoglou, and Otmar Hilliges. Gaussian Garments: Reconstructing simulation-ready clothing with photorealistic appearance from multi-view video. In International Conference on 3D Vision 2025, 2025. 3

[68] Shunsuke Saito, Zeng Huang, Ryota Natsume, Shigeo Morishima, Angjoo Kanazawa, and Hao Li. Pifu: Pixel-aligned implicit function for high-resolution clothed human digitization. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019. 3

[69] Shunsuke Saito, Gabriel Schwartz, Tomas Simon, Junxuan Li, and Giljoo Nam. Relightable gaussian codec avatars. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 130–141, 2024. 3

[70] Igor Santesteban, Miguel A. Otaduy, and Dan Casas. Snug: Self-supervised neural dynamic garments. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8140–8150, 2022. 3

[71] Akash Sengupta, Thiemo Alldieck, Nikos Kolotouros, Enric Corona, Andrei Zanfir, and Cristian Sminchisescu. DiffHuman: Probabilistic Photorealistic 3D Reconstruction of Humans. In CVPR, 2024. 3

[72] Zhijing Shao, Zhaolong Wang, Zhuang Li, Duotun Wang, Xiangru Lin, Yu Zhang, Mingming Fan, and Zeyu Wang. SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024. 2, 3

[73] Kaiyue Shen, Chen Guo, Manuel Kaufmann, Juan Jose Zarate, Julien Valentin, Jie Song, and Otmar Hilliges. X-avatar: Expressive human avatars. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16911–16921, 2023. 3

[74] Zhaoqi Su, Liangxiao Hu, Siyou Lin, Hongwen Zhang, Shengping Zhang, Justus Thies, and Yebin Liu. Caphy: Capturing physical properties for animatable human avatars. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 14150–14160, 2023. 3

[75] Ke Sun, Jian Cao, Qi Wang, Linrui Tian, Xindi Zhang, Lian Zhuo, Bang Zhang, Liefeng Bo, Wenbo Zhou, Weiming Zhang, and Daiheng Gao. Outfitanyone: Ultra-high quality virtual try-on for any clothing and any person. arXiv preprint arXiv:2407.16224, 2024. 3

[76] Keito Suzuki, Bang Du, Girish Krishnan, Kunyao Chen, Runfa Blark Li, and Truong Nguyen. Open-vocabulary semantic part segmentation of 3d human. In 2025 International Conference on 3D Vision (3DV), pages 1572–1582. IEEE, 2025.

[77] Jeff Tan, Donglai Xiang, Shubham Tulsiani, Deva Ramanan, and Gengshan Yang. Dressrecon: Freeform 4d human reconstruction from monocular video. In 2025 International Conference on 3D Vision (3DV), pages 250–260. IEEE, 2025. 2

[78] Garvita Tiwari, Bharat Lal Bhatnagar, Tony Tung, and Gerard Pons-Moll. Sizer: A dataset and model for parsing 3d clothing and learning size sensitive 3d clothing. In European Conference on Computer Vision, pages 1–18. Springer, 2020. 2, 3

[79] Onat Vuran and Hsuan-I Ho. Remu: Reconstructing multi-layer 3d clothed human from images. In British Machine Vision Conference (BMVC), 2025. 3

[80] Jionghao Wang, Yuan Liu, Zhiyang Dou, Zhengming Yu, Yongqing Liang, Cheng Lin, Rong Xie, Li Song, Xin Li, and Wenping Wang. Disentangled clothed avatar generation from text descriptions. In European Conference on Computer Vision, pages 381–401. Springer, 2024. 3


[28] [28]

Gaussianavatar: Towards realistic human avatar model- ing from a single video via animatable 3d gaussians

Liangxiao Hu, Hongwen Zhang, Yuxiang Zhang, Boyao Zhou, Boning Liu, Shengping Zhang, and Liqiang Nie. Gaussianavatar: Towards realistic human avatar model- ing from a single video via animatable 3d gaussians. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024. 1, 2

work page 2024

[29] [29]

Gauhuman: Articu- lated gaussian splatting from monocular human videos

Shoukang Hu, Tao Hu, and Ziwei Liu. Gauhuman: Articu- lated gaussian splatting from monocular human videos. In Proceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 20418–20431, 2024. 1, 2

work page 2024

[30] [30]

Humanliff: Layer-wise 3d human diffusion model: Humanliff: Layer- wise 3d human diffusion model.Int

Shoukang Hu, Fangzhou Hong, Tao Hu, Liang Pan, Haiyi Mei, Weiye Xiao, Lei Yang, and Ziwei Liu. Humanliff: Layer-wise 3d human diffusion model: Humanliff: Layer- wise 3d human diffusion model.Int. J. Comput. Vision, 133 (9):5938–5957, 2025. 3

work page 2025

[31] [31]

2d gaussian splatting for geometrically accu- rate radiance fields

Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2d gaussian splatting for geometrically accu- rate radiance fields. InSIGGRAPH 2024 Conference Papers. Association for Computing Machinery, 2024. 1, 2, 3, 5, 6, 14

work page 2024

[32] [32]

Sith: Single- view textured human reconstruction with image-conditioned diffusion

Hsuan I Ho, Jie Song, and Otmar Hilliges. Sith: Single- view textured human reconstruction with image-conditioned diffusion. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 538–549, 2024. 3

work page 2024

[33] [33]

Bcnet: Learning body and cloth shape from a single image

Boyi Jiang, Juyong Zhang, Yang Hong, Jinhao Luo, Ligang Liu, and Hujun Bao. Bcnet: Learning body and cloth shape from a single image. InEuropean Conference on Computer Vision, pages 18–35. Springer, 2020. 2, 3

work page 2020

[34] [34]

In- stantavatar: Learning avatars from monocular video in 60 seconds

Tianjian Jiang, Xu Chen, Jie Song, and Otmar Hilliges. In- stantavatar: Learning avatars from monocular video in 60 seconds. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16922– 16932, 2023. 2, 3

work page 2023

[35] [35]

Prioravatar: Efficient and robust avatar creation from monocular video using learned priors

Tianjian Jiang, Hsuan-I Ho, Manuel Kaufmann, and Jie Song. Prioravatar: Efficient and robust avatar creation from monocular video using learned priors. InProceedings of the SIGGRAPH Asia 2025 Conference Papers, pages 1–10,

work page 2025

[36] [36]

Neuman: Neural human radiance field from a single video

Wei Jiang, Kwang Moo Yi, Golnoosh Samei, Oncel Tuzel, and Anurag Ranjan. Neuman: Neural human radiance field from a single video. InComputer Vision – ECCV 2022, pages 402–418, Cham, 2022. Springer Nature Switzerland. 2, 3

work page 2022

[37] [37]

Total cap- ture: A 3d deformation model for tracking faces, hands, and bodies

Hanbyul Joo, Tomas Simon, and Yaser Sheikh. Total cap- ture: A 3d deformation model for tracking faces, hands, and bodies. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. 2

work page 2018

[38] [38]

Physhead: Simulation- ready gaussian head avatars, 2026

Berna Kabadayi, Vanessa Sklyarova, Wojciech Zielonka, Justus Thies, and Gerard Pons-Moll. Physhead: Simulation- ready gaussian head avatars, 2026. 3

work page 2026

[39] [39]

3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42 (4), 2023

Bernhard Kerbl, Georgios Kopanas, Thomas Leimk ¨uhler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42 (4), 2023. 1, 2, 3

work page 2023

[40] [40]

Gala: Generating animatable layered assets from a sin- gle scan

Taeksoo Kim, Byungjun Kim, Shunsuke Saito, and Hanbyul Joo. Gala: Generating animatable layered assets from a sin- gle scan. InCVPR, 2024. 2, 3, 5, 6, 7

work page 2024

[41] [41]

Segment any- thing

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer White- head, Alexander C Berg, Wan-Yen Lo, et al. Segment any- thing. InProceedings of the IEEE/CVF international confer- ence on computer vision, pages 4015–4026, 2023. 6

work page 2023

[42] [42]

Hugs: Human gaussian splats

Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel, Oncel Tuzel, and Anurag Ranjan. Hugs: Human gaussian splats. InProceedings of the IEEE/CVF conference on com- puter vision and pattern recognition, pages 505–515, 2024. 1, 2

work page 2024

[43] [43]

Gart: Gaussian articulated template mod- els

Jiahui Lei, Yufu Wang, Georgios Pavlakos, Lingjie Liu, and Kostas Daniilidis. Gart: Gaussian articulated template mod- els. InProceedings of the IEEE/CVF conference on com- puter vision and pattern recognition, pages 19876–19887,

work page

[44] [44]

Dig: Draping implicit garment over the human body

Ren Li, Benoit Guillard, Edoardo Remelli, and Pascal Fua. Dig: Draping implicit garment over the human body. In Proceedings of the Asian Conference on Computer Vision (ACCV), pages 2780–2795, 2022. 3

work page 2022

[45] [45]

Tianye Li, Timo Bolkart, Michael. J. Black, Hao Li, and Javier Romero. Learning a model of facial shape and ex- pression from 4D scans.ACM Transactions on Graphics, (Proc. SIGGRAPH Asia), 36(6):194:1–194:17, 2017. 3

work page 2017

[46] [46]

Diffavatar: Simulation-ready garment optimization with differentiable simulation

Yifei Li, Hsiao-yu Chen, Egor Larionov, Nikolaos Sarafi- anos, Wojciech Matusik, and Tuur Stuyck. Diffavatar: Simulation-ready garment optimization with differentiable simulation. InProceedings of the IEEE/CVF Conference 10 on Computer Vision and Pattern Recognition (CVPR), pages 4368–4378, 2024. 3

work page 2024

[47] [47]

Ani- matable gaussians: Learning pose-dependent gaussian maps for high-fidelity human avatar modeling

Zhe Li, Zerong Zheng, Lizhen Wang, and Yebin Liu. Ani- matable gaussians: Learning pose-dependent gaussian maps for high-fidelity human avatar modeling. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 19711–19722, 2024. 1, 2, 3

work page 2024

[48] [48]

Tingting Liao, Hongwei Yi, Yuliang Xiu, Jiaxiang Tang, Yangyi Huang, Justus Thies, and Michael J. Black. TADA! Text to Animatable Digital Avatars. InInternational Confer- ence on 3D Vision (3DV), 2024. 3

work page 2024

[49] [49]

Layga: Layered gaussian avatars for animatable clothing transfer

Siyou Lin, Zhe Li, Zhaoqi Su, Zerong Zheng, Hongwen Zhang, and Yebin Liu. Layga: Layered gaussian avatars for animatable clothing transfer. InSIGGRAPH Conference Pa- pers, 2024. 2, 3, 5

work page 2024

[50] [50]

Gas: Generative avatar syn- thesis from a single image

Yixing Lu, Junting Dong, Youngjoong Kwon, Qin Zhao, Bo Dai, and Fernando De la Torre. Gas: Generative avatar syn- thesis from a single image. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 12883– 12893, 2025. 3

work page 2025

[51] [51]

Troje, Ger- ard Pons-Moll, and Michael J

Naureen Mahmood, Nima Ghorbani, Nikolaus F. Troje, Ger- ard Pons-Moll, and Michael J. Black. AMASS: Archive of motion capture as surface shapes. InInternational Confer- ence on Computer Vision, pages 5442–5451, 2019. 8, 16

work page 2019

[52] [52]

Occupancy networks: Learning 3d reconstruction in function space

Lars Mescheder, Michael Oechsle, Michael Niemeyer, Se- bastian Nowozin, and Andreas Geiger. Occupancy networks: Learning 3d reconstruction in function space. InProceedings of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition (CVPR), 2019. 2

work page 2019

[53] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020. 1, 2

[54] Gyeongsik Moon, Hyeongjin Nam, Takaaki Shiratori, and Kyoung Mu Lee. 3d clothed human reconstruction in the wild. In European conference on computer vision, pages 184–200. Springer, 2022. 3

[55] Gyeongsik Moon, Takaaki Shiratori, and Shunsuke Saito. Expressive whole-body 3d gaussian avatar. In European Conference on Computer Vision, pages 19–35. Springer,

[56] Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM transactions on graphics (TOG), 41(4):1–15, 2022. 1, 2

[57] Hui En Pang, Shuai Liu, Zhongang Cai, Lei Yang, Tianwei Zhang, and Ziwei Liu. Disco4d: Disentangled 4d human generation and animation from a single image. In CVPR,

[58] Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019. 2

[59] Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, and Michael J. Black. Expressive body capture: 3D hands, face, and body from a single image. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 10975–10985, 2019. 1, 3

[60] Bo Peng, Yunfan Tao, Haoyu Zhan, Yudong Guo, and Juyong Zhang. Pica: Physics-integrated clothed avatar. arXiv preprint arXiv:2407.05324, 2024. 3

[61] Sida Peng, Junting Dong, Qianqian Wang, Shangzhan Zhang, Qing Shuai, Xiaowei Zhou, and Hujun Bao. Animatable neural radiance fields for modeling dynamic human bodies. In Proceedings of the IEEE/CVF international conference on computer vision, pages 14314–14323, 2021. 2, 3

[62] Sida Peng, Yuanqing Zhang, Yinghao Xu, Qianqian Wang, Qing Shuai, Hujun Bao, and Xiaowei Zhou. Neural body: Implicit neural representations with structured latent codes for novel view synthesis of dynamic humans. In CVPR,

[63] Sida Peng, Chen Geng, Yuanqing Zhang, Yinghao Xu, Qianqian Wang, Qing Shuai, Xiaowei Zhou, and Hujun Bao. Implicit neural representations with structured latent codes for human body modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023. 1, 2, 3

[64] Gerard Pons-Moll, Sergi Pujades, Sonny Hu, and Michael J Black. Clothcap: Seamless 4d clothing capture and retargeting. ACM Transactions on Graphics (ToG), 36(4):1–15,

[65] Shenhan Qian, Tobias Kirschstein, Liam Schoneveld, Davide Davoli, Simon Giebenhain, and Matthias Nießner. Gaussianavatars: Photorealistic head avatars with rigged 3d gaussians. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20299–20309,

[66] Zhiyin Qian, Shaofei Wang, Marko Mihajlovic, Andreas Geiger, and Siyu Tang. 3dgs-avatar: Animatable avatars via deformable 3d gaussian splatting. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5020–5030, 2024. 1, 2

[67] Boxiang Rong, Artur Grigorev, Wenbo Wang, Michael J. Black, Bernhard Thomaszewski, Christina Tsalicoglou, and Otmar Hilliges. Gaussian Garments: Reconstructing simulation-ready clothing with photorealistic appearance from multi-view video. In International Conference on 3D Vision 2025, 2025. 3

[68] Shunsuke Saito, Zeng Huang, Ryota Natsume, Shigeo Morishima, Angjoo Kanazawa, and Hao Li. Pifu: Pixel-aligned implicit function for high-resolution clothed human digitization. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019. 3

[69] Shunsuke Saito, Gabriel Schwartz, Tomas Simon, Junxuan Li, and Giljoo Nam. Relightable gaussian codec avatars. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 130–141, 2024. 3

[70] Igor Santesteban, Miguel A. Otaduy, and Dan Casas. Snug: Self-supervised neural dynamic garments. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8140–8150, 2022. 3

[71] Akash Sengupta, Thiemo Alldieck, Nikos Kolotouros, Enric Corona, Andrei Zanfir, and Cristian Sminchisescu. DiffHuman: Probabilistic Photorealistic 3D Reconstruction of Humans. In CVPR, 2024. 3

[72] Zhijing Shao, Zhaolong Wang, Zhuang Li, Duotun Wang, Xiangru Lin, Yu Zhang, Mingming Fan, and Zeyu Wang. SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024. 2, 3

[73] Kaiyue Shen, Chen Guo, Manuel Kaufmann, Juan Jose Zarate, Julien Valentin, Jie Song, and Otmar Hilliges. X-avatar: Expressive human avatars. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16911–16921, 2023. 3

[74] Zhaoqi Su, Liangxiao Hu, Siyou Lin, Hongwen Zhang, Shengping Zhang, Justus Thies, and Yebin Liu. Caphy: Capturing physical properties for animatable human avatars. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 14150–14160, 2023. 3

[75] Ke Sun, Jian Cao, Qi Wang, Linrui Tian, Xindi Zhang, Lian Zhuo, Bang Zhang, Liefeng Bo, Wenbo Zhou, Weiming Zhang, and Daiheng Gao. Outfitanyone: Ultra-high quality virtual try-on for any clothing and any person. arXiv preprint arXiv:2407.16224, 2024. 3

[76] Keito Suzuki, Bang Du, Girish Krishnan, Kunyao Chen, Runfa Blark Li, and Truong Nguyen. Open-vocabulary semantic part segmentation of 3d human. In 2025 International Conference on 3D Vision (3DV), pages 1572–1582. IEEE,

[77] Jeff Tan, Donglai Xiang, Shubham Tulsiani, Deva Ramanan, and Gengshan Yang. Dressrecon: Freeform 4d human reconstruction from monocular video. In 2025 International Conference on 3D Vision (3DV), pages 250–260. IEEE, 2025. 2

[78] Garvita Tiwari, Bharat Lal Bhatnagar, Tony Tung, and Gerard Pons-Moll. Sizer: A dataset and model for parsing 3d clothing and learning size sensitive 3d clothing. In European Conference on Computer Vision, pages 1–18. Springer, 2020. 2, 3

[79] Onat Vuran and Hsuan-I Ho. Remu: Reconstructing multi-layer 3d clothed human from images. In British Machine Vision Conference (BMVC), 2025. 3

[80] Jionghao Wang, Yuan Liu, Zhiyang Dou, Zhengming Yu, Yongqing Liang, Cheng Lin, Rong Xie, Li Song, Xin Li, and Wenping Wang. Disentangled clothed avatar generation from text descriptions. In European Conference on Computer Vision, pages 381–401. Springer, 2024. 3