
arxiv: 2604.08547 · v1 · submitted 2026-04-09 · 💻 cs.CV · cs.GR

GaussiAnimate: Reconstruct and Rig Animatable Categories with Level of Dynamics

Pith reviewed 2026-05-10 18:18 UTC · model grok-4.3

classification 💻 cs.CV cs.GR
keywords 4D reconstruction, Gaussian splatting, rigging system, non-rigid deformation, skeleton extraction, motion matching, reanimation, deformable models

The pith

Free-form bones bound to an adaptive skeleton via partwise motion matching enable higher-fidelity reanimation of unseen poses than standard skinning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a method to reconstruct 4D shapes and turn them into animatable rigs by compressing their surface dynamics into a controllable structure. It creates free-form bones from consistent Gaussian points to capture non-rigid deformations, derives a category-agnostic skeleton from mean curvature, and binds the two using non-parametric partwise motion matching to synthesize novel movements. This produces compact skelebones that support intuitive control while keeping reconstruction quality high, especially for characters with intricate dynamics. A sympathetic reader would care because it addresses the trade-off between flexible deformation modeling and easy animation control without relying on large training sets.

Core claim

The central claim is that the Scaffold-Skin Rigging System, called Skelebones, compresses the Level of Dynamics of 4D shapes into compact, controllable skelebones. This is achieved by approximating non-rigid deformations with free-form bones from temporally-consistent Gaussians, extracting and temporally refining a mean curvature skeleton for kinematic structure, and binding them through non-parametric partwise motion matching that synthesizes novel bone motions by matching, retrieving, and blending existing ones. The resulting system outperforms Linear Blend Skinning by 17.3 percent PSNR and Bag-of-Bones by 21.7 percent PSNR on unseen poses while maintaining high reconstruction fidelity.

What carries the argument

Scaffold-Skin Rigging System (Skelebones) that creates free-form bones from deformable Gaussians, extracts a temporally refined mean curvature skeleton, and binds them using non-parametric partwise motion matching (PartMM) to synthesize novel motions.
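For orientation, the Linear Blend Skinning baseline named above deforms each surface point as a weight-blended sum of rigid bone transforms. The sketch below is the textbook formulation in NumPy, not the authors' code; the function name `linear_blend_skinning`, the array shapes, and the toy inputs are assumptions made for illustration.

```python
import numpy as np

def linear_blend_skinning(points, weights, rotations, translations):
    """Textbook LBS: p_i' = sum_j w_ij * (R_j @ p_i + t_j).

    points:       (N, 3) rest-pose positions (e.g. Gaussian centers)
    weights:      (N, B) skinning weights, each row sums to 1
    rotations:    (B, 3, 3) per-bone rotation matrices
    translations: (B, 3) per-bone translations
    """
    # Transform every point by every bone: shape (B, N, 3).
    per_bone = np.einsum("bij,nj->bni", rotations, points) + translations[:, None, :]
    # Blend the per-bone results with the skinning weights: shape (N, 3).
    return np.einsum("nb,bni->ni", weights, per_bone)

# Toy example (hypothetical data): two points, two bones.
points = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
weights = np.array([[1.0, 0.0], [0.5, 0.5]])
rot_z90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
rotations = np.stack([np.eye(3), rot_z90])
translations = np.zeros((2, 3))
posed = linear_blend_skinning(points, weights, rotations, translations)
```

The second point is pulled halfway between its rest position and its rotated position. This linear averaging of rigid transforms is what makes LBS compact, and also what limits how well it follows non-rigid surface motion.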

If this is right

  • Reanimation performance improves by 17.3 percent PSNR over Linear Blend Skinning and 21.7 percent over Bag-of-Bones for unseen poses.
  • Reconstruction fidelity stays high for characters with complex non-rigid surface dynamics.
  • PartMM generalizes to both Gaussian and mesh inputs in low-data regimes of roughly 1000 frames, delivering 48.4 percent RMSE improvement over robust Linear Blend Skinning.
  • The method outperforms GRU- and MLP-based learning approaches by more than 20 percent under similar low-data conditions.
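The percentages above are relative gains in PSNR and RMSE. The page does not state how they were computed, so the sketch below shows only the standard metric definitions and one common reading of "percent improvement" (relative change against the baseline). The function names and the `higher_is_better` flag are assumptions for illustration.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))

def rmse(pred, target):
    """Root-mean-square error, e.g. over per-vertex positions."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def relative_gain(ours, baseline, higher_is_better=True):
    """Percentage improvement of `ours` over `baseline` (assumed definition)."""
    if higher_is_better:
        return 100.0 * (ours - baseline) / baseline
    return 100.0 * (baseline - ours) / baseline
```

Because PSNR is a logarithmic quantity, a relative gain in dB is not the same as a relative reduction in pixel error, which is one reason the referee report asks for the underlying protocol.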

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If motion matching succeeds with short sequences, the approach could support rigging directly from brief video captures without needing extensive motion libraries.
  • The category-agnostic skeleton extraction opens the possibility of applying the same pipeline to deformable objects beyond human or animal characters.
  • Integrating the bones-and-skeleton output with other 4D reconstruction pipelines might enable a fully automatic path from raw captures to controllable animated assets.

Load-bearing premise

Partwise motion matching can reliably create accurate new bone motions for unseen poses by retrieving and blending from existing motions without artifacts or loss of fidelity, even in low-data settings.

What would settle it

Apply the system to a test set of poses that are highly dissimilar to the training motions and check whether surface reanimation quality drops below Linear Blend Skinning or shows visible blending artifacts.
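One way to set up such a test is to rank test poses by their distance to the closest pose in the motion library and compare errors only on the most dissimilar ones. The sketch below assumes poses are flattened feature vectors compared with plain Euclidean distance, and that per-pose errors for both methods are already available. The function names and the 0.75 quantile are hypothetical choices, not taken from the paper.

```python
import numpy as np

def nearest_library_distance(test_poses, library_poses):
    """Distance from each test pose to its closest library pose.

    test_poses: (T, D), library_poses: (M, D) flattened pose features.
    Euclidean distance is an assumption made for this sketch.
    """
    diff = test_poses[:, None, :] - library_poses[None, :, :]
    return np.linalg.norm(diff, axis=-1).min(axis=1)

def failure_rate_on_far_poses(test_poses, library_poses, err_ours, err_lbs, quantile=0.75):
    """Fraction of the most dissimilar test poses where our error exceeds the LBS error."""
    dist = nearest_library_distance(test_poses, library_poses)
    far = dist >= np.quantile(dist, quantile)
    return float(np.mean(err_ours[far] > err_lbs[far]))
```

A rate near zero on the far subset would support the generalization claim. A rate near one would indicate that retrieval breaks down once poses leave the library's coverage.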

Figures

Figures reproduced from arXiv: 2604.08547 by Anpei Chen, Cheng Lin, Dongxin Lyu, Jiaxin Wang, Yuliang Xiu, Zeyu Cai, Zhiyang Dou.

Figure 1: GaussiAnimate is designed to: 1) rig diverse animatable entities, typically featuring a soft exterior and rigid core (e.g., clothed humans, quadrupeds, bipeds, birds, and garments), from either reconstructed consistent 4DGS or mesh sequences. This relies on a novel skelebones representation that balances the intuitive control of kinematic skeletons with the deformation fidelity of free-form bones; and 2) ani…
Figure 2: Pipeline Overview. Given a monocular or multi-view video, our method reconstructs a consistent 4DGS, extracts the inner skeleton via curve skeletonization [57] and the outer free-form bones via SSDR [31], together forming “skelebones”, which are then used to build a motion database. Recent works like AP-NeRF [63], SK-GS [64], and Rig-GS [75] show promise by optimizing medial skeletons or constructing kinemati…
Figure 3: Inner Skeleton Initialization. We first extract the curve skeleton (A) of the object in the canonical space. Then we estimate the joint locations on the curve skeleton through skinning analysis. Specifically, we project the skinning weights of the 3D points onto the curve skeleton (B), and then identify positions along the 1D curve where neighboring skinning weights exhibit the highest similarity as potenti…
Figure 4: Partwise Motion Matching (PartMM). Given a novel inner-skeleton pose sequence, we animate skelebones by synthesizing outer-bone motion via part-wise matching. Our method: (a) decomposes the kinematic tree into multiple parts (shown as two parts; user-defined in practice); (b) extracts part-wise motion patches $R^{\mathrm{novel}}_{J}$ from the novel pose sequence; (c) queries these patches against a pre-built motion datab…
Figure 5: Part Alignment. Since perfect skeletal matching is rare, we further compute the optimal rotation to compensate.
where $d_{SO(3)}$ denotes the geodesic distance on the rotation manifold. Notably, the patch matching is performed independently for each kinematic part, allowing for flexible recombination of motion segments across different parts, which is particularly beneficial for handling complex motions that…
Figure 6: Qualitative Comparison on VTO dataset. We visualize the reconstructed meshes, error maps, and ground-truth overlaps (top to bottom). Compared to FullMM, our proposed PartMM not only yields lower reconstruction errors but also exhibits better generalization to unseen motions.
Figure 7: Qualitative Comparison on D4D dataset. Visualizations of the reconstructed meshes, error maps, and ground-truth overlaps, demonstrating that PartMM robustly captures complex skinned animal deformations.
Figure 8: ARAP Ablation. We visualize the effect of ARAP loss and SSDR skinning.
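The captions for Figures 4 and 5 mention part-wise motion patches, a geodesic distance on the rotation manifold, and an optimal compensating rotation. The sketch below illustrates those three ingredients under stated assumptions: joint rotations are stored as 3x3 matrices, patches are fixed-length sliding windows, and the optimal rotation is obtained with the standard Kabsch (SVD) solution. The captions do not say which solver the paper uses, so Kabsch is a substitute chosen here, and all function names are hypothetical.

```python
import numpy as np

def extract_part_patches(joint_rotations, parts, patch_len, stride=1):
    """Cut a pose sequence into overlapping temporal patches, one set per part.

    joint_rotations: (T, J, 3, 3) per-frame, per-joint rotation matrices
    parts:           dict mapping part name -> list of joint indices
    Returns a dict: part name -> array of shape (P, patch_len, len(joints), 3, 3).
    """
    num_frames = joint_rotations.shape[0]
    starts = range(0, num_frames - patch_len + 1, stride)
    patches = {}
    for name, joints in parts.items():
        part_seq = joint_rotations[:, joints]
        patches[name] = np.stack([part_seq[s:s + patch_len] for s in starts])
    return patches

def geodesic_distance(R_a, R_b):
    """Geodesic distance on SO(3): angle of the relative rotation, in radians."""
    cos_theta = (np.trace(R_a.T @ R_b) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

def patch_distance(patch_a, patch_b):
    """Mean geodesic distance over all frames and joints of two equal-shape patches."""
    pairs = zip(patch_a.reshape(-1, 3, 3), patch_b.reshape(-1, 3, 3))
    return float(np.mean([geodesic_distance(a, b) for a, b in pairs]))

def optimal_rotation(source, target):
    """Kabsch solution: rotation R minimizing sum_i ||R @ source_i - target_i||^2.

    source, target: (N, 3) corresponding points, assumed to share an origin.
    """
    H = source.T @ target                      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
```

Matching each part's patches independently, as the Figure 5 text describes, means the nearest neighbour for one part can come from a different library clip than the nearest neighbour for another part.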
Original abstract

Free-form bones that conform closely to the surface can effectively capture non-rigid deformations, but lack the kinematic structure necessary for intuitive control. Thus, we propose a Scaffold-Skin Rigging System, termed "Skelebones", with three key steps: (1) Bones: compress temporally-consistent deformable Gaussians into free-form bones, approximating non-rigid surface deformations; (2) Skeleton: extract a Mean Curvature Skeleton from canonical Gaussians and refine it temporally, ensuring a category-agnostic, motion-adaptive, and topology-correct kinematic structure; (3) Binding: bind the skeleton and bones via non-parametric partwise motion matching (PartMM), synthesizing novel bone motions by matching, retrieving, and blending existing ones. Collectively, these three steps enable us to compress the Level of Dynamics of 4D shapes into compact skelebones that are both controllable and expressive. We validate our approach on both synthetic and real-world datasets, achieving significant improvements in reanimation performance across unseen poses, with 17.3% PSNR gains over Linear Blend Skinning (LBS) and 21.7% over Bag-of-Bones (BoB), while maintaining excellent reconstruction fidelity, particularly for characters exhibiting complex non-rigid surface dynamics. Our Partwise Motion Matching algorithm demonstrates strong generalization to both Gaussian and mesh representations, especially under a low-data regime (~1000 frames), achieving 48.4% RMSE improvement over robust LBS and outperforming GRU- and MLP-based learning methods by >20%. Code will be made publicly available for research purposes at cookmaker.cn/gaussianimate.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces GaussiAnimate for reconstructing and rigging animatable 3D categories from 4D data via a Scaffold-Skin Rigging System ('Skelebones'). It compresses temporally-consistent deformable Gaussians into free-form bones to approximate non-rigid deformations, extracts and refines a mean curvature skeleton for a category-agnostic kinematic structure, and binds them using non-parametric partwise motion matching (PartMM) to synthesize novel bone motions by matching/retrieving/blending from existing data. The method claims to compress 'Level of Dynamics' into controllable yet expressive rigs, with empirical validation on synthetic and real datasets showing 17.3% PSNR gains over Linear Blend Skinning (LBS) and 21.7% over Bag-of-Bones (BoB) for reanimation on unseen poses, plus 48.4% RMSE improvement in low-data (~1000 frames) regimes over LBS and >20% over GRU/MLP baselines, while preserving reconstruction fidelity.

Significance. If the results hold, the work offers a hybrid approach to animatable reconstruction that bridges free-form Gaussian representations with intuitive skeletal control, addressing limitations of both parametric skinning and purely learned models for complex non-rigid dynamics. Strengths include the category-agnostic design, explicit handling of low-data generalization via non-parametric matching, and the commitment to public code release, which supports reproducibility and potential adoption in graphics and vision applications.

major comments (2)
  1. [Binding via non-parametric partwise motion matching (PartMM)] Binding step (PartMM description): The headline reanimation and low-data claims (17.3% PSNR, 48.4% RMSE) depend on PartMM's ability to reliably match, retrieve, and blend bone motions for truly unseen poses from a ~1000-frame library. No quantitative characterization of pose-space coverage, the distance metric for matching, blending weights, or ablation on out-of-distribution articulations is provided, leaving the generalization assumption untested and risking artifacts as noted in the skeptic analysis.
  2. [Validation on synthetic and real-world datasets] Evaluation section: The specific percentage gains (17.3% PSNR over LBS, 21.7% over BoB, 48.4% RMSE) and outperformance over learning baselines are reported without dataset details, test-pose counts, error bars, cross-validation protocols, or implementation specifics (e.g., Gaussian compression parameters or skeleton refinement). This makes it impossible to verify if the improvements are robust or depend on unstated choices, directly undermining the central empirical claims.
minor comments (2)
  1. [Abstract and introduction] The term 'Level of Dynamics' is used in the title and abstract without an explicit definition, quantification, or equation showing how it is compressed into the Skelebones representation.
  2. [Abstract] The abstract states 'Code will be made publicly available' but provides only a placeholder URL; confirming a functional repository with data splits and training scripts would strengthen the reproducibility claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We have reviewed the major comments carefully and provide point-by-point responses below, outlining specific revisions we will make to address the concerns while preserving the core contributions of the work.

  1. Referee: [Binding via non-parametric partwise motion matching (PartMM)] Binding step (PartMM description): The headline reanimation and low-data claims (17.3% PSNR, 48.4% RMSE) depend on PartMM's ability to reliably match, retrieve, and blend bone motions for truly unseen poses from a ~1000-frame library. No quantitative characterization of pose-space coverage, the distance metric for matching, blending weights, or ablation on out-of-distribution articulations is provided, leaving the generalization assumption untested and risking artifacts as noted in the skeptic analysis.

    Authors: We agree that the current description of PartMM would benefit from expanded quantitative details to better substantiate the generalization claims. In the revised manuscript, we will augment the method section with: (1) statistics and visualizations characterizing pose-space coverage in the motion library (e.g., histograms of rotation and translation variances across the ~1000 frames); (2) the explicit distance metric, defined as a weighted combination of Euclidean bone position differences and geodesic angular distances on rotations; (3) the blending procedure, using normalized inverse-distance weights with a similarity threshold for retrieval; and (4) a new ablation study that partitions data into in-distribution and out-of-distribution articulations to quantify performance drops and artifact rates. These additions will directly test and document the robustness of the non-parametric matching approach. revision: yes

  2. Referee: [Validation on synthetic and real-world datasets] Evaluation section: The specific percentage gains (17.3% PSNR over LBS, 21.7% over BoB, 48.4% RMSE) and outperformance over learning baselines are reported without dataset details, test-pose counts, error bars, cross-validation protocols, or implementation specifics (e.g., Gaussian compression parameters or skeleton refinement). This makes it impossible to verify if the improvements are robust or depend on unstated choices, directly undermining the central empirical claims.

    Authors: We concur that greater transparency in the experimental protocol is essential for verifying the reported gains. We will revise the Experiments and Evaluation sections to incorporate: complete dataset specifications (including sequence counts, total frames per category, synthetic vs. real-world splits, and pose variation metrics); precise test-pose counts and selection criteria for unseen reanimation; error bars derived from multiple runs or cross-validation; the full cross-validation protocol; and implementation parameters such as Gaussian count per bone, compression ratios, and skeleton refinement thresholds (e.g., curvature and temporal consistency criteria). These details will be presented in a new table and accompanying text to allow independent verification of the PSNR, RMSE, and baseline comparisons. revision: yes
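The first response above describes a weighted combination of position and rotation distances, followed by retrieval under a similarity threshold and blending with normalized inverse-distance weights. Since this comes from a simulated rebuttal rather than the paper itself, the sketch below should be read as one plausible reading of that description. The names `combined_distance` and `retrieve_and_blend`, the defaults `k=3` and `eps=1e-8`, and the choice to blend plain vectors (for example bone translations) are all assumptions.

```python
import numpy as np

def combined_distance(d_pos, d_rot, w_pos=1.0, w_rot=1.0):
    """Weighted sum of per-entry position distances and rotation (geodesic) distances."""
    return w_pos * np.asarray(d_pos, dtype=float) + w_rot * np.asarray(d_rot, dtype=float)

def retrieve_and_blend(distances, library_values, k=3, max_dist=np.inf, eps=1e-8):
    """Blend the k nearest library entries with normalized inverse-distance weights.

    distances:      (M,) distance from the query to each library entry
    library_values: (M, ...) values to blend, e.g. free-form bone translations
    max_dist:       similarity threshold; entries farther than this are ignored
    Returns None when no entry passes the threshold.
    """
    order = np.argsort(distances)[:k]
    order = order[distances[order] <= max_dist]
    if order.size == 0:
        return None
    weights = 1.0 / (distances[order] + eps)
    weights /= weights.sum()
    # Linear blending is only meaningful for vector-valued data; rotations
    # would need a manifold-aware average instead.
    return np.tensordot(weights, library_values[order], axes=1)
```

Returning `None` when nothing passes the threshold makes the out-of-coverage case explicit, which is the failure mode the referee's first major comment asks the authors to quantify.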

Circularity Check

0 steps flagged

No significant circularity; method relies on standard non-parametric techniques and external benchmarks


The paper describes a Scaffold-Skin Rigging System (Skelebones) with three explicit steps—compressing Gaussians into free-form bones, extracting/refining a Mean Curvature Skeleton, and binding via non-parametric PartMM (matching/retrieving/blending existing motions)—without any equations, derivations, or fitted parameters that reduce to the inputs by construction. Reported gains (e.g., PSNR over LBS/BoB, RMSE over LBS) are empirical validation results on synthetic/real datasets, not predictions forced by self-definition or self-citation chains. PartMM is presented as a non-parametric algorithm whose success depends on motion library coverage, but this is an assumption about data density rather than a circular reduction; no load-bearing premise collapses to a prior self-citation or ansatz smuggled in. The derivation chain is self-contained against external baselines.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented physical entities; the method introduces named components (Skelebones, PartMM) but these are algorithmic rather than postulated entities with independent evidence.



Reference graph

Works this paper leans on

78 extracted references · 78 canonical work pages

  1. [1] Abdrashitov, R., Raichstat, K., Monsen, J., Hill, D.: Robust skin weights transfer via weight inpainting. In: SIGGRAPH Asia 2023 Technical Communications. SA ’23, Association for Computing Machinery, New York, NY, USA (2023). https://doi.org/10.1145/3610543.3626180

  2. [2] Baran, I., Popović, J.: Automatic rigging and animation of 3d characters. In: ACM SIGGRAPH 2007 papers (Jul 2007). https://doi.org/10.1145/1275808.1276467

  3. [3] Blum, H.: A transformation for extracting new descriptions of shape. Models for the perception of speech and visual form pp. 362–380 (1967)

  4. [4] Brunner, D., Brunnett, G.: Mesh segmentation using the object skeleton graph. In: Proc. IASTED International Conf. on Computer Graphics and Imaging. pp. 48–55 (2004)

  5. [5] Cai, Z., Li, Z., Li, X., Li, B., Wang, Z., Zhang, Z., Xiu, Y.: Up2you: Fast reconstruction of yourself from unconstrained photo collections. arXiv preprint arXiv:2509.24817 (2025)

  6. [6] Cai, Z., Wang, D., Liang, Y., Shao, Z., Chen, Y.C., Zhan, X., Wang, Z.: Dreammapping: High-fidelity text-to-3d generation via variational distribution mapping. arXiv preprint arXiv:2409.05099 (2024)

  7. [7] Chen, H., Chen, X., Zhang, Y., Xu, Z., Chen, A.: Motion 3-to-4: 3d motion reconstruction for 4d synthesis. arXiv preprint arXiv:2601.14253 (2026)

  8. [8] Chen, L.H., Zhang, Y., Yin, Z., Dou, Z., Chen, X., Wang, J., Komura, T., Zhang, L.: Motion2motion: Cross-topology motion transfer with sparse correspondence. arXiv preprint arXiv:2508.13139 (2025)

  9. [9] Chen, X., Chen, Y., Xiu, Y., Geiger, A., Chen, A.: Easi3r: Estimating disentangled motion from dust3r without training. arXiv preprint arXiv:2503.24391 (2025)

  10. [10] Chen, Y., Chen, X., Chen, A., Pons-Moll, G., Xiu, Y.: Feat2gs: Probing visual foundation models with gaussian splatting. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 6348–6361 (2025)

  11. [11] Chen, Y., Chen, X., Xue, Y., Chen, A., Xiu, Y., Gerard, P.M.: Human3r: Everyone everywhere all at once. arXiv preprint arXiv:2510.06219 (2025)

  12. [12] Cheng, W., Chen, R., Yin, W., Fan, S., Chen, K., He, H., Luo, H., Cai, Z., Wang, J., Gao, Y., Yu, Z., Lin, Z., Ren, D., Yang, L., Liu, Z., Loy, C.C., Qian, C., Wu, W., Lin, D., Dai, B., Lin, K.Y.: Dna-rendering: A diverse neural actor repository for high-fidelity human-centric rendering. arXiv preprint arXiv:2307.10173 (2023)

  13. [13] Deng, Y., Zhang, Y., Geng, C., Wu, S., Wu, J.: Anymate: A dataset and baselines for learning 3d object rigging. In: Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers. pp. 1–10 (2025)

  14. [14] Dey, T.K., Sun, J.: Defining and computing curve-skeletons with medial geodesic function. In: Symposium on geometry processing. vol. 6, pp. 143–152 (2006)

  15. [15] Dou, Z., Lin, C., Xu, R., Yang, L., Xin, S., Komura, T., Wang, W.: Coverage axis: Inner point selection for 3d shape skeletonization. In: Computer Graphics Forum. vol. 41, pp. 419–432. Wiley Online Library (2022)

  16. [16] Epstein, D., Park, T., Zhang, R., Shechtman, E., Efros, A.A.: Blobgan: Spatially disentangled scene representations. In: European conference on computer vision. pp. 616–635. Springer (2022)

  17. [17] Fang, J., Yi, T., Wang, X., Xie, L., Zhang, X., Liu, W., Nießner, M., Tian, Q.: Fast dynamic radiance fields with time-aware neural voxels. In: SIGGRAPH Asia 2022 Conference Papers (2022)

  18. [18] Granot, N., Feinstein, B., Shocher, A., Bagon, S., Irani, M.: Drop the gan: In defense of patches nearest neighbors as single image generative models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 13460–13469 (2022)

  19. [19] Grigorev, A., Black, M.J., Hilliges, O.: Hood: Hierarchical graphs for generalized modelling of clothing dynamics. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 16965–16974 (2023)

  20. [20] He, G., Geng, C., Wu, S., Wu, J.: Category-agnostic neural object rigging. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 22078–22088 (2025)

  21. [21] He, G., Geng, C., Wu, S., Wu, J.: Category-agnostic neural object rigging (2025), https://arxiv.org/abs/2505.20283

  22. [22] Holden, D., Kanoun, O., Perepichka, M., Popa, T.: Learned motion matching. ACM Trans. Graph. 39(4) (Aug 2020). https://doi.org/10.1145/3386569.3392440

  23. [23] Huang, B., Yu, Z., Chen, A., Geiger, A., Gao, S.: 2d gaussian splatting for geometrically accurate radiance fields. In: SIGGRAPH 2024 Conference Papers. Association for Computing Machinery (2024). https://doi.org/10.1145/3641519.3657428

  24. [24] Huang, Y.H., Sun, Y.T., Yang, Z., Lyu, X., Cao, Y.P., Qi, X.: Sc-gs: Sparse-controlled gaussian splatting for editable dynamic scenes. arXiv preprint arXiv:2312.14937 (2023)

  25. [25] Huang, Z., Tozoni, D.C., Gjoka, A., Ferguson, Z., Schneider, T., Panozzo, D., Zorin, D.: Differentiable solver for time-dependent deformation problems with contact. ACM Transactions on Graphics 43(3), 1–30 (2024)

  26. [26] Işık, M., Rünz, M., Georgopoulos, M., Khakhulin, T., Starck, J., Agapito, L., Nießner, M.: Humanrf: High-fidelity neural radiance fields for humans in motion. ACM Transactions on Graphics (TOG) 42(4), 1–12 (2023). https://doi.org/10.1145/3592415

  27. [27] Jacobson, A., Baran, I., Popović, J., Sorkine, O.: Bounded biharmonic weights for real-time deformation. ACM Transactions on Graphics (proceedings of ACM SIGGRAPH) 30(4), 78:1–78:8 (2011)

  28. [28] James, D.L., Twigg, C.D.: Skinning mesh animations. ACM Transactions on Graphics p. 399–407 (Jul 2005). https://doi.org/10.1145/1073204.1073206

  29. [29] Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3d gaussian splatting for real-time radiance field rendering (Aug 2023)

  30. [30] Kuai, T., Karthikeyan, A., Kant, Y., Mirzaei, A., Gilitschenski, I.: Camm: Building category-agnostic and animatable 3d models from monocular videos (2023), https://arxiv.org/abs/2304.06937

  31. [31] Le, B., Deng, Z.: Smooth skinning decomposition with rigid bones. ACM Trans. Graph. 31(6) (Dec 2012), in press

  32. [32] Le, B.H., Deng, Z.: Robust and accurate skeletal rigging from mesh sequences. ACM Transactions on Graphics p. 1–10 (Jul 2014). https://doi.org/10.1145/2601097.2601161

  33. [33] Lei, J., Wang, Y., Pavlakos, G., Liu, L., Daniilidis, K.: Gart: Gaussian articulated template models (2023), https://arxiv.org/abs/2311.16099

  34. [34] Lewis, J.P., Cordner, M., Fong, N.: Pose space deformation: a unified approach to shape interpolation and skeleton-driven deformation. In: Seminal Graphics Papers: Pushing the Boundaries, Volume 2, pp. 811–818 (2023)

  35. [35] Li, M., Kaufman, D.M., Jiang, C.: Codimensional incremental potential contact. arXiv preprint arXiv:2012.04457 (2020)

  36. [36] Li, P., Wang, B., Sun, F., Guo, X., Zhang, C., Wang, W.: Q-mat. ACM Transactions on Graphics p. 1–16 (Dec 2015). https://doi.org/10.1145/2753755

  37. [37] Li, R., Tanke, J., Vo, M., Zollhofer, M., Gall, J., Kanazawa, A., Lassner, C.: Tava: Template-free animatable volumetric actors (2022)

  38. [38] Li, W., Chen, X., Li, P., Sorkine-Hornung, O., Chen, B.: Example-based motion synthesis via generative motion matching. ACM Transactions on Graphics (TOG) 42(4), 1–12 (2023)

  39. [39] Li, Y., Takehara, H., Taketomi, T., Zheng, B., Nießner, M.: 4dcomplete: Non-rigid motion estimation beyond the observable surface. International Conference on Computer Vision (ICCV) (2021)

  40. [40] Li, Z., Zheng, Z., Wang, L., Liu, Y.: Animatable gaussians: Learning pose-dependent gaussian maps for high-fidelity human avatar modeling. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)

  41. [41] Lin, C., Li, C., Liu, Y., Chen, N., Choi, Y.K., Wang, W.: Point2skeleton: Learning skeletal representations from point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 4277–4286 (June 2021)

  42. [42] Liu, I., Su, H., Wang, X.: Dynamic gaussians mesh: Consistent mesh reconstruction from monocular videos (2024)

  43. [43] Liu, I., Xu, Z., Yifan, W., Tan, H., Xu, Z., Wang, X., Su, H., Shi, Z.: Riganything: Template-free autoregressive rigging for diverse 3d assets. ACM Transactions on Graphics (TOG) 44(4), 1–12 (2025)

  44. [44] Liu, T., Bargteil, A.W., O’Brien, J.F., Kavan, L.: Fast simulation of mass-spring systems. ACM Transactions on Graphics (TOG) 32(6), 1–7 (2013)

  45. [45] Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: Smpl: A skinned multi-person linear model. In: Seminal Graphics Papers: Pushing the Boundaries, Volume 2, pp. 851–866 (2023)

  46. [46] Luiten, J., Kopanas, G., Leibe, B., Ramanan, D.: Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis. In: 3DV (2024)

  47. [47] Miklos, B., Giesen, J., Pauly, M.: Discrete scale axis representations for 3d geometry. In: ACM SIGGRAPH 2010 papers, pp. 1–10 (2010)

  48. [48] Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM 65(1), 99–106 (2021)

  49. [49] Noguchi, A., Iqbal, U., Tremblay, J., Harada, T., Gallo, O.: Watch it move: Unsupervised discovery of 3d joints for re-posing of articulated objects. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (Jun 2022). https://doi.org/10.1109/cvpr52688.2022.00366

  50. [50] Pan, X., Mai, J., Jiang, X., Tang, D., Li, J., Shao, T., Zhou, K., Jin, X., Manocha, D.: Predicting loose-fitting garment deformations using bone-driven motion networks. In: ACM SIGGRAPH 2022 conference proceedings. pp. 1–10 (2022)

  51. [51] Pavlakos, G., Choutas, V., Ghorbani, N., Bolkart, T., Osman, A.A., Tzionas, D., Black, M.J.: Expressive body capture: 3d hands, face, and body from a single image. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 10975–10985 (2019)

  52. [52] Pumarola, A., Corona, E., Pons-Moll, G., Moreno-Noguer, F.: D-NeRF: Neural Radiance Fields for Dynamic Scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020)

  53. [53] Reniers, D., Telea, A.: Part-type segmentation of articulated voxel-shapes using the junction rule. In: Computer Graphics Forum. vol. 27, pp. 1845–1852. Wiley Online Library (2008)

  54. [54] Sabathier, R., Novotny, D., Mitra, N.J., Monnier, T.: Actionmesh: Animated 3d mesh generation with temporal 3d diffusion (2026), https://arxiv.org/abs/2601.16148

  55. [55]

    Schaefer, S., Yuksel, C.: Example-based skeleton extraction. In: Proceedings of the Fifth Eurographics Symposium on Geometry Processing. p. 153–162. SGP ’07, Eurographics Association, Goslar, DEU (2007) 5, 8

  56. [56]

    Sorkine, O., Alexa, M.: As-rigid-as-possible surface modeling. In: Proceedings of the Fifth Eurographics Symposium on Geometry Processing. p. 109–116. SGP ’07, Eurographics Association, Goslar, DEU (2007) 3, 7

  57. [57]

    Tagliasacchi, A., Alhashim, I., Olson, M., Zhang, H.: Mean curvature skeletons. In: Computer Graphics Forum. vol. 31, pp. 1735–1744. Wiley Online Library (2012) 3, 4, 5, 6, 8

  58. [58]

    Tan, J., Xiang, D., Tulsiani, S., Ramanan, D., Yang, G.: Dressrecon: Freeform 4d human reconstruction from monocular video. In: 3DV (2025) 2, 3, 5, 12, 13

  59. [59]

    Thiery, J.M., Guy, E., Boubekeur, T.: Sphere-meshes: Shape approximation using spherical quadric error metrics. ACM Transactions on Graphics (Proc. SIGGRAPH Asia 2013) 32(6), Art. No. 178 (2013) 5

  60. [61]

    Thiery, J.M., Guy, E., Boubekeur, T., Eisemann, E.: Animated mesh approximation with sphere-meshes. ACM Trans. Graph. 35(3), 30:1–30:13 (May 2016). https://doi.org/10.1145/2898350, http://doi.acm.org/10.1145/2898350 5

  61. [62]

    Tierny, J., Vandeborre, J.P., Daoudi, M.: Topology driven 3d mesh hierarchical segmentation. In: IEEE International Conference on Shape Modeling and Applications 2007 (SMI’07). pp. 215–220. IEEE (2007) 4

  62. [63]

    Uzolas, L., Eisemann, E., Kellnhofer, P.: Template-free articulated neural point clouds for reposable view synthesis. Advances in Neural Information Processing Systems 36 (2024) 3, 6, 12, 13

  63. [64]

    Wan, D., Wang, Y., Lu, R., Zeng, G.: Template-free articulated gaussian splatting for real-time reposable dynamic view synthesis (2024), https://arxiv.org/abs/2412.05570 6

  64. [65]

    Wang, D., Meng, H., Cai, Z., Shao, Z., Liu, Q., Wang, L., Fan, M., Zhan, X., Wang, Z.: Headevolver: Text to head avatars via expressive and attribute-preserving mesh deformation. In: International Conference on 3D Vision (3DV) (2025) 5

  65. [66]

    Wang, H.: Gpu-based simulation of cloth wrinkles at submillimeter levels. ACM Transactions on Graphics (TOG) 40(4), 1–14 (2021) 15

  66. [67]

    Wang, Z., Dou, Z., Xu, R., Lin, C., Liu, Y., Long, X., Xin, S., Komura, T., Yuan, X., Wang, W.: Coverage axis++: Efficient inner point selection for 3d shape skeletonization. In: Computer Graphics Forum. vol. 43, p. e15143. Wiley Online Library (2024) 4

  67. [68]

    Wu, G., Yi, T., Fang, J., Xie, L., Zhang, X., Wei, W., Liu, W., Tian, Q., Wang, X.: 4d gaussian splatting for real-time dynamic scene rendering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 20310–20320 (June 2024) 12, 13

  68. [69]

    Xu, Z., Zhou, Y., Kalogerakis, E., Landreth, C., Singh, K.: Rignet. ACM Transactions on Graphics (Aug 2020). https://doi.org/10.1145/3386569.3392379, http://dx.doi.org/10.1145/3386569.3392379 5, 15

  69. [70]

    Xu, Z., Zhou, Y., Yi, L., Kalogerakis, E.: Morig: Motion-aware rigging of character meshes from point clouds. In: SIGGRAPH Asia 2022 Conference Papers (Nov 2022). https://doi.org/10.1145/3550469.3555390, http://dx.doi.org/10.1145/3550469.3555390 5

  70. [71]

    Yang, B., Yao, J., Guo, X.: Dmat: Deformable medial axis transform for animated mesh approximation. Computer Graphics Forum 37(7), 301–311 (2018). https://doi.org/10.1111/cgf.13569, https://onlinelibrary.wiley.com/doi/abs/10.1111/cgf.13569 4

  71. [72]

    Yang, G., Vo, M., Neverova, N., Ramanan, D., Vedaldi, A., Joo, H.: Banmo: Building animatable 3d neural models from many casual videos. In: CVPR (2022) 3, 5

  72. [73]

    Yang, G., Wang, C., Reddy, N.D., Ramanan, D.: Reconstructing animatable categories from videos. In: CVPR (2023) 3, 5

  73. [74]

    Yang, Z., Gao, X., Zhou, W., Jiao, S., Zhang, Y., Jin, X.: Deformable 3d gaussians for high-fidelity monocular dynamic scene reconstruction. arXiv preprint arXiv:2309.13101 (2023) 2

  74. [75]

    Yao, Y., Deng, Z., Hou, J.: Riggs: Rigging of 3d gaussians for modeling articulated objects in videos. In: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (2025) 3, 6, 11, 12, 13, 14, 15

  75. [76]

    Zhan, Y., Zhu, Q., Niu, M., Ma, M., Zhao, J., Zhong, Z., Sun, X., Qiao, Y., Zheng, Y.: Tomie: Towards modular growth in enhanced smpl skeleton for 3d human with animatable garments (2024), https://arxiv.org/abs/2410.08082 5

  76. [77]

    Zhang, H., Luo, J., Wan, B., Zhao, Y., Li, Z., Vasilkovsky, M., Wang, C., Wang, J., Ahuja, N., Zhou, B.: Rigmo: Unifying rig and motion learning for generative animation. arXiv preprint arXiv:2601.06378 (2026) 5, 16

  77. [78]

    Zhang, J.P., Pu, C.F., Guo, M.H., Cao, Y.P., Hu, S.M.: One model to rig them all: Diverse skeleton rigging with unirig. ACM Transactions on Graphics (TOG) 44(4), 1–18 (2025) 5

  78. [79]

    Zuffi, S., Kanazawa, A., Jacobs, D., Black, M.J.: 3D menagerie: Modeling the 3D shape and pose of animals. In: IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) (Jul 2017) 5