arxiv: 2606.08043 · v1 · pith:7DYGIPVBnew · submitted 2026-06-06 · 💻 cs.GR · cs.CV

OmniFaceRig: Fully Automatic Inner-Mouth-Aware Face Rigging Across Diverse 3D Character Topologies

Pith reviewed 2026-06-27 19:01 UTC · model grok-4.3

classification 💻 cs.GR cs.CV
keywords face rigging · FACS blendshapes · 3D character animation · automatic rigging · inner-mouth geometry · procedural modeling · topology adaptation · facial parsing

The pith

OmniFaceRig automatically converts static 3D character meshes into FACS rigs complete with teeth, gums, and tongue across human and animal topologies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper describes an end-to-end pipeline that accepts any surface-only 3D mesh and produces an animation-ready rig with up to 155 blendshapes plus fitted inner-mouth geometry. Manual steps such as landmark placement, template adjustment, and oral cavity modeling are eliminated. The approach handles humans, humanoids, long-muzzled animals, and short-muzzled animals by selecting topology-specific templates and applying collision-aware fitting. A public benchmark of 1,000 rigged characters is released to support evaluation of similar methods.

Core claim

OmniFaceRig is a fully automatic pipeline that turns a static surface-only 3D character mesh into an inner-mouth-aware FACS rig containing up to 155 blendshapes, procedurally placed teeth, gums, and tongue, and repacked UVs, while supporting humans, humanoids, and both long- and short-muzzled animals with no landmarks, templates, or per-asset tuning.
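For readers unfamiliar with what a blendshape rig delivers, the standard delta-blendshape model poses a mesh as the neutral shape plus a weighted sum of per-shape vertex offsets. The sketch below illustrates that general model only; it is not taken from the paper, and the shape names (`jawOpen`, `mouthSmile`) and the two-vertex mesh are hypothetical placeholders.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def evaluate_blendshapes(
    neutral: List[Vec3],
    deltas: Dict[str, List[Vec3]],
    weights: Dict[str, float],
) -> List[Vec3]:
    """Return posed vertices: neutral + sum_i weight_i * delta_i."""
    posed = [list(v) for v in neutral]
    for name, weight in weights.items():
        delta = deltas[name]
        if len(delta) != len(neutral):
            raise ValueError(f"blendshape {name!r} does not match the mesh size")
        for vertex_index, offset in enumerate(delta):
            for axis in range(3):
                posed[vertex_index][axis] += weight * offset[axis]
    return [tuple(v) for v in posed]


# Hypothetical two-vertex mesh with two illustrative shapes.
neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
deltas = {
    "jawOpen": [(0.0, -1.0, 0.0), (0.0, -0.5, 0.0)],
    "mouthSmile": [(0.5, 0.25, 0.0), (0.0, 0.0, 0.0)],
}
posed = evaluate_blendshapes(neutral, deltas, {"jawOpen": 0.5, "mouthSmile": 1.0})
print(posed)
```

A rig with up to 155 such shapes is the same computation with a larger `deltas` table; teeth, gums, and tongue would need their own offsets to follow the jaw.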

What carries the argument

The hybrid pipeline of VLM+CV riggability checking, multi-model face parsing, dense keypoint-driven template registration, procedural inner-mouth construction, and collision-aware blendshape transfer.

If this is right

  • Generated character meshes can move directly into animation without additional rigging labor.
  • Non-human characters receive topology-matched templates and collision-safe inner-mouth placement automatically.
  • The released 1,000-character Omni-Bench dataset supplies standardized test cases for rigging quality across species.
  • UV and texture repacking produces animation-ready assets in one pass.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Animation studios could shorten the gap between asset generation and usable facial animation.
  • The same registration and fitting logic might extend to other facial control systems beyond FACS.
  • Automated inner-mouth construction could reduce the need for separate dental modeling tools.

Load-bearing premise

The hybrid VLM+CV riggability checking and multi-model face parsing ensemble will correctly classify and parse inputs across all screened diverse topologies without manual intervention or failures that break later registration and fitting steps.

What would settle it

A single new character mesh from an unseen topology where the parsing ensemble fails to locate the mouth or misidentifies the species category, causing the inner-mouth fitting step to produce invalid geometry or intersections.

Original abstract

Facial rigging - creating FACS-based blendshapes together with inner-mouth geometry (teeth, gums, and tongue) - remains a major bottleneck in 3D character production. Existing pipelines still require substantial designer effort, especially for manual landmark annotation, per-character template adjustment, and inner-mouth placement. We present OmniFaceRig, a fully automatic end-to-end pipeline that converts a static surface-only 3D character mesh, with no pre-modeled oral cavity, into an inner-mouth-aware FACS rig with up to 155 blendshapes, procedurally fitted teeth, gums, and tongue, and re-packed UV/texture. OmniFaceRig supports diverse topologies - humans, humanoids, long-muzzled animals (e.g., dogs, wolves, foxes), and short-muzzled animals (e.g., cats, bears, rabbits, tigers) - with no manual landmarks, no user-provided templates, and no per-asset setup. The pipeline combines hybrid VLM+CV riggability checking, multi-model face parsing, dense keypoint-driven template registration, procedural inner-mouth construction, and collision-aware blendshape transfer. For non-human characters, OmniFaceRig selects topology-specific face and inner-mouth templates and uses collision-aware inner-mouth fitting to reduce teeth-face intersections without exposing users to category-specific tuning. We also publicly release Omni-Bench, a freely available benchmark dataset of 1,000 biped 3D characters with FACS facial blendshapes and inner-mouth geometry, spanning humans, humanoids, cats, dogs, and other animals. Experiments show high final rigging success on screened Omni-Bench inputs, nearly complete face detection recall from the segmentation ensemble and reliable inner-mouth placement with low penetration. Together, OmniFaceRig provides an automatic path from static generated characters to animation-ready facial rigs across both human and non-human topologies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper presents OmniFaceRig, a fully automatic end-to-end pipeline that takes a static surface-only 3D character mesh (no pre-modeled oral cavity) and outputs an inner-mouth-aware FACS rig with up to 155 blendshapes, procedurally fitted teeth/gums/tongue, and re-packed UV/texture. It supports diverse topologies (humans, humanoids, long-muzzled and short-muzzled animals) with no manual landmarks, templates, or per-asset setup. The pipeline uses hybrid VLM+CV riggability checking, multi-model face parsing, dense keypoint-driven registration, procedural inner-mouth construction, and collision-aware blendshape transfer. The authors release Omni-Bench, a dataset of 1,000 biped characters with FACS blendshapes and inner-mouth geometry. Experiments report high rigging success on screened inputs, near-complete face detection recall, and reliable inner-mouth placement with low penetration.

Significance. If the pipeline reliably achieves the claimed automation across unfiltered diverse topologies, it would substantially reduce manual effort in 3D character production pipelines. The public release of Omni-Bench is a clear positive contribution that enables future benchmarking. The hybrid VLM+CV approach and topology-specific template selection for non-humans represent a practical engineering advance, though the absence of detailed quantitative metrics, ablation studies, or failure-mode analysis in the provided abstract limits assessment of robustness.

major comments (1)
  1. [Abstract / Experiments] Abstract and Experiments section: The central claim of a 'fully automatic' pipeline with 'no per-asset setup' and 'no manual landmarks' is load-bearing on the hybrid VLM+CV riggability checker and multi-model parsing ensemble working without failures across all screened diverse topologies. However, success is reported only on 'screened Omni-Bench inputs' with qualifiers such as 'nearly complete face detection recall' and 'high final rigging success,' without per-category failure rates, confusion matrices, or the fraction of inputs rejected by the checker. This directly affects whether the 'no manual intervention' guarantee holds.
minor comments (2)
  1. [Abstract] Abstract: The description of 'up to 155 blendshapes' and 'reliable inner-mouth placement with low penetration' would benefit from explicit quantitative metrics (e.g., mean/max penetration depth, success rate percentages) rather than qualitative statements.
  2. [Method] The manuscript should clarify the exact composition of the multi-model face parsing ensemble and any fallback mechanisms when individual models disagree on non-human topologies.
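The quantitative penetration metrics requested in minor comment 1 could be reported along the following lines. This is a minimal sketch under stated assumptions, not the paper's evaluation code: it assumes a signed distance from each inner-mouth vertex to the face surface is already available, with negative values meaning the vertex pokes through the face. The function name, sign convention, and sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PenetrationStats:
    fraction_penetrating: float  # share of vertices that intersect the face
    mean_depth: float            # mean depth over penetrating vertices only
    max_depth: float             # worst-case depth


def penetration_stats(signed_distances: Sequence[float]) -> PenetrationStats:
    """Summarize penetration from signed distances (negative = penetrating)."""
    if not signed_distances:
        raise ValueError("need at least one vertex distance")
    depths = [-d for d in signed_distances if d < 0.0]
    if not depths:
        return PenetrationStats(0.0, 0.0, 0.0)
    return PenetrationStats(
        fraction_penetrating=len(depths) / len(signed_distances),
        mean_depth=sum(depths) / len(depths),
        max_depth=max(depths),
    )


# Made-up distances in arbitrary units: two vertices are clear, two penetrate.
stats = penetration_stats([0.5, 0.25, -0.25, -0.75])
print(stats)
```

Reporting the maximum alongside the mean matters because a single deep tooth-through-lip intersection is visible in animation even when the average is small.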

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed feedback. We agree that the current reporting of aggregate metrics on screened inputs leaves the robustness of the riggability checker and parsing ensemble insufficiently quantified, and we will revise the Experiments section to address this directly.

Point-by-point responses
  1. Referee: [Abstract / Experiments] Abstract and Experiments section: The central claim of a 'fully automatic' pipeline with 'no per-asset setup' and 'no manual landmarks' is load-bearing on the hybrid VLM+CV riggability checker and multi-model parsing ensemble working without failures across all screened diverse topologies. However, success is reported only on 'screened Omni-Bench inputs' with qualifiers such as 'nearly complete face detection recall' and 'high final rigging success,' without per-category failure rates, confusion matrices, or the fraction of inputs rejected by the checker. This directly affects whether the 'no manual intervention' guarantee holds.

    Authors: We accept this criticism. The manuscript will be revised to report (1) per-category success/failure rates for the VLM+CV riggability checker and the multi-model parsing ensemble across human, humanoid, long-muzzled, and short-muzzled classes; (2) the exact fraction of Omni-Bench inputs rejected by the checker before any rigging proceeds; and (3) a concise failure-mode summary. These numbers will be added to the Experiments section and the abstract will be updated to remove or qualify the current aggregate phrasing. We already possess the underlying per-asset logs and can compute the requested breakdowns without new experiments.

    revision: yes
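The breakdown promised in the rebuttal amounts to a small aggregation over per-asset logs. The sketch below shows one way it might be computed. The record layout (`category`, `passed_checker`, `rig_ok`) is a hypothetical schema, and the sample records are toy values, not results from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical log record: (category, passed_checker, rig_ok)
Record = Tuple[str, bool, bool]


def summarize(logs: Iterable[Record]) -> Dict[str, Dict[str, Optional[float]]]:
    """Per-category rejection rate and rigging success, screened and unscreened."""
    counts = defaultdict(lambda: {"total": 0, "rejected": 0, "rigged": 0})
    for category, passed_checker, rig_ok in logs:
        c = counts[category]
        c["total"] += 1
        if not passed_checker:
            c["rejected"] += 1  # filtered out before any rigging is attempted
        elif rig_ok:
            c["rigged"] += 1
    summary = {}
    for category, c in counts.items():
        screened = c["total"] - c["rejected"]
        summary[category] = {
            "rejection_rate": c["rejected"] / c["total"],
            # Success among inputs the checker accepted (the "screened" figure).
            "success_rate_screened": c["rigged"] / screened if screened else None,
            # Success over everything submitted, rejections counted as misses.
            "success_rate_all": c["rigged"] / c["total"],
        }
    return summary


# Toy records for illustration only.
logs = [
    ("human", True, True),
    ("human", True, True),
    ("human", True, True),
    ("human", False, False),
    ("long_muzzled", True, True),
    ("long_muzzled", True, False),
]
summary = summarize(logs)
for category, row in summary.items():
    print(category, row)
```

Showing both success rates side by side makes the referee's concern concrete: a high success rate on screened inputs can coexist with a much lower rate over all submitted inputs when the checker rejects a large share of a category.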

Circularity Check

0 steps flagged

No circularity: engineering pipeline with no derivations or fitted predictions

Full rationale

The paper presents an end-to-end engineering system (VLM+CV checker, multi-model parsing, template registration, procedural inner-mouth construction) whose claims rest on empirical success rates rather than on any mathematical derivation chain. The provided text contains no equations, no parameters fitted on a subset and then re-predicted, no uniqueness theorems that rest on self-citation, and no ansatzes. The 'fully automatic' claim is supported by reported recall and success metrics on screened data, not by construction from inputs. This is the normal non-finding for a systems paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The system relies on standard computer vision assumptions for parsing and registration but introduces no new free parameters, axioms beyond domain norms, or invented entities.

axioms (1)
  • domain assumption: Multi-model face parsing and dense keypoint detection will succeed on diverse character topologies including long- and short-muzzled animals.
    Invoked in the description of the hybrid VLM+CV pipeline and template registration steps.

pith-pipeline@v0.9.1-grok · 5923 in / 1264 out tokens · 30973 ms · 2026-06-27T19:01:57.392495+00:00 · methodology

