HairOrbit: Multi-view Aware 3D Hair Modeling from Single Portraits
Pith reviewed 2026-05-13 19:36 UTC · model grok-4.3
The pith
Video generation models convert single-portrait hair reconstruction into a calibrated multi-view task for consistent strand-level results.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that leveraging the 3D priors of video generation models transforms single-view hair reconstruction into a calibrated multi-view task. This reformulation is paired with a neural orientation extractor, trained on sparse real annotations, for full-view direction estimation, and with a two-stage strand-growing algorithm over a hybrid implicit field that synthesizes detailed 3D strand curves. Together, these components are said to deliver state-of-the-art single-view 3D hair strand reconstruction across diverse portraits, in both visible and invisible regions.
What carries the argument
Calibrated multi-view reconstruction derived from video generation model priors, which supplies consistent 3D structure for hair across synthesized views.
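To make the data flow concrete, here is a minimal sketch of the four stages named above. Every function (orbit_video_prior, orientation_extractor, fuse_hybrid_field, grow_strands) is a hypothetical placeholder standing in for a component the paper describes; none of this is the authors' code.

```python
# Schematic sketch of the claimed pipeline. All functions are
# hypothetical placeholders, not the authors' implementation.

import numpy as np

def orbit_video_prior(portrait: np.ndarray):
    """Stage 1 (assumed): a video generation model synthesizes calibrated
    orbit views of the head and returns them with camera poses."""
    raise NotImplementedError

def orientation_extractor(view: np.ndarray) -> np.ndarray:
    """Stage 2 (assumed): a neural extractor trained on sparse real-image
    annotations predicts a per-pixel hair orientation map for one view."""
    raise NotImplementedError

def fuse_hybrid_field(views, cameras, orientation_maps):
    """Stage 3 (assumed): fuse views and orientations into a hybrid
    implicit field (occupancy plus 3D direction, queryable anywhere)."""
    raise NotImplementedError

def grow_strands(field, stage: str, guides=None):
    """Stage 4 (assumed): two-stage strand growing on the field."""
    raise NotImplementedError

def reconstruct_hair(portrait: np.ndarray):
    """Single portrait in, list of 3D strand polylines out."""
    views, cameras = orbit_video_prior(portrait)
    orientation_maps = [orientation_extractor(v) for v in views]
    field = fuse_hybrid_field(views, cameras, orientation_maps)
    guides = grow_strands(field, stage="coarse")              # guide strands
    return grow_strands(field, stage="fine", guides=guides)  # dense detail
```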
If this is right
- Produces state-of-the-art strand-level 3D hair from single images on diverse portraits
- Maintains consistent and realistic attributes in invisible regions
- Achieves better full-view orientation estimation than prior single-view methods
- Balances reconstruction quality with computational speed through the hybrid implicit field
Where Pith is reading between the lines
- The same prior-calibration step could be tested on other thin structures such as eyebrows or individual clothing threads.
- Pairing the output strands with existing single-image face or head geometry would produce complete head avatars from one photo.
- The two-stage growing process might extend naturally to time-varying hair if the video model is conditioned on motion.
Load-bearing premise
The 3D priors inside video generation models can be leveraged and calibrated for hair without creating inconsistencies in regions never seen in the input portrait.
What would settle it
Render the output 3D hair strands from novel viewpoints and compare them directly to real photographs taken from those exact angles on the same subject; systematic mismatch in strand direction or density in the unseen regions would falsify the claim.
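One way to operationalize this test, under simple assumptions: estimate per-pixel 2D hair orientation in both the rendered novel view and the real photograph with a bank of Gabor filters (a standard orientation estimator for hair images), then measure the mean angular mismatch inside the hair mask. The filter parameters and angle convention below are illustrative choices, not the paper's protocol.

```python
# Sketch of the proposed check. Assumptions: grayscale images of the same
# subject from the same viewpoint (one rendered from the reconstructed
# strands, one a real photo), plus a boolean hair mask. Gabor parameters
# are illustrative.

import cv2
import numpy as np

def orientation_map(gray: np.ndarray, n_angles: int = 32) -> np.ndarray:
    """Per-pixel dominant orientation in [0, pi) from a Gabor filter bank."""
    thetas = np.linspace(0.0, np.pi, n_angles, endpoint=False)
    responses = np.stack([
        cv2.filter2D(gray, cv2.CV_32F,
                     cv2.getGaborKernel((17, 17), sigma=3.0, theta=t,
                                        lambd=6.0, gamma=0.5))
        for t in thetas
    ])
    return thetas[np.abs(responses).argmax(axis=0)]

def mean_angular_error(rendered: np.ndarray, photo: np.ndarray,
                       hair_mask: np.ndarray) -> float:
    """Mean orientation mismatch (radians) inside the hair region; angles
    are compared modulo pi because strand direction is unsigned."""
    diff = np.abs(orientation_map(rendered) - orientation_map(photo))
    diff = np.minimum(diff, np.pi - diff)
    return float(diff[hair_mask].mean())
```

A mean error in occluded regions that sits far above the visible-region error, sustained across subjects, would be the falsifying signal.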
Original abstract
Reconstructing strand-level 3D hair from a single-view image is highly challenging, especially when preserving consistent and realistic attributes in unseen regions. Existing methods rely on limited frontal-view cues and small-scale/style-restricted synthetic data, often failing to produce satisfactory results in invisible regions. In this work, we propose a novel framework that leverages the strong 3D priors of video generation models to transform single-view hair reconstruction into a calibrated multi-view reconstruction task. To balance reconstruction quality and efficiency for the reformulated multi-view task, we further introduce a neural orientation extractor trained on sparse real-image annotations for better full-view orientation estimation. In addition, we design a two-stage strand-growing algorithm based on a hybrid implicit field to synthesize the 3D strand curves with fine-grained details at a relatively fast speed. Extensive experiments demonstrate that our method achieves state-of-the-art performance on single-view 3D hair strand reconstruction on a diverse range of hair portraits in both visible and invisible regions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents HairOrbit, a framework for strand-level 3D hair reconstruction from a single portrait. It reformulates the task as calibrated multi-view reconstruction by leveraging 3D priors from video generation models, introduces a neural orientation extractor trained on sparse real-image annotations for full-view orientation estimation, and employs a two-stage strand-growing algorithm based on a hybrid implicit field to synthesize detailed 3D curves. The central claim is that this approach achieves state-of-the-art performance on diverse hair portraits in both visible and invisible regions.
Significance. If the central claim holds, the work would advance single-view 3D hair modeling by demonstrating how pre-trained video models can be calibrated for multi-view consistency without heavy reliance on limited synthetic data. The hybrid implicit strand-growing step and neural orientation extractor could provide an efficient path to fine-grained details in unseen regions, with potential impact on graphics pipelines, animation, and AR/VR applications. The use of real-image annotations for the orientation extractor is a positive step toward reducing domain gaps.
Major comments (3)
- [Abstract] The claim of 'state-of-the-art performance' backed by 'extensive experiments' is unsupported by quantitative metrics, error bars, ablation tables, or baseline comparisons. Without these, the central claim that the video-prior calibration and hybrid strand-growing steps outperform prior art cannot be verified, and that claim is load-bearing for acceptance.
- [Methods] The description of how the neural orientation extractor is calibrated against video-model outputs to enforce multi-view consistency is absent; if this step implicitly depends on fitted components from the strand-growing stage, it risks circularity in the multi-view reformulation.
- [Experiments] No details are provided on the test set composition, on how 'invisible regions' are evaluated (e.g., via synthetic ground truth or perceptual studies), or on the runtime/memory trade-offs of the two-stage strand-growing algorithm, all of which are required to substantiate the efficiency and accuracy claims.
Minor comments (2)
- [Abstract] The phrase 'calibrated multi-view reconstruction task' would benefit from a one-sentence definition or a pointer to the specific calibration loss or procedure.
- [Methods] The hybrid implicit field is introduced without a brief equation or diagram reference; a short definition would help readers unfamiliar with implicit strand representations (a minimal sketch of one plausible reading follows below).
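In the spirit of that minor comment, here is one plausible reading of a hybrid implicit field, sketched as an interface: a scalar occupancy o(x) plus a unit direction d(x) at any query point, with strand growing as integration of d(x). The interface and the 0.5 exit threshold are assumptions for illustration, not the paper's definition.

```python
# Minimal sketch of what a "hybrid implicit field" for hair could expose.
# This interface is an assumption; the paper's field may combine learned
# and explicit components in a different way.

from typing import Protocol
import numpy as np

class HairField(Protocol):
    def occupancy(self, x: np.ndarray) -> float:
        """o(x) in [0, 1]: how likely point x lies inside the hair volume."""
        ...

    def direction(self, x: np.ndarray) -> np.ndarray:
        """d(x): unit strand-growth direction at point x."""
        ...

def grow_strand(field: HairField, root: np.ndarray,
                step: float = 1e-3, max_steps: int = 500) -> np.ndarray:
    """Euler integration of d(x) from a scalp root until leaving the hair."""
    pts = [root]
    for _ in range(max_steps):
        x = pts[-1]
        if field.occupancy(x) < 0.5:   # illustrative exit threshold
            break
        pts.append(x + step * field.direction(x))
    return np.asarray(pts)
```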
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major point below, clarifying aspects of the manuscript and outlining planned revisions to strengthen the presentation of results and methods.
Point-by-point responses
- Referee [Abstract]: The claim of 'state-of-the-art performance' backed by 'extensive experiments' is unsupported by quantitative metrics, error bars, ablation tables, or baseline comparisons. Without these, the central claim that the video-prior calibration and hybrid strand-growing steps outperform prior art cannot be verified, and that claim is load-bearing for acceptance.
Authors: We agree that the abstract's SOTA claim needs explicit quantitative support. The full manuscript includes visual comparisons and qualitative evaluations across diverse portraits, but we will revise the abstract and add a dedicated quantitative results section with metrics (e.g., strand-level accuracy and coverage in invisible regions; one plausible form is sketched after these responses), error bars from multiple runs, ablation tables isolating the video-prior and hybrid-growing components, and direct baseline comparisons. Revision: yes.
- Referee [Methods]: The description of how the neural orientation extractor is calibrated against video-model outputs to enforce multi-view consistency is absent; if this step implicitly depends on fitted components from the strand-growing stage, it risks circularity in the multi-view reformulation.
Authors: The neural orientation extractor is trained independently on sparse real-image annotations prior to any strand growing and produces per-view orientation maps. These maps are then used as conditioning input to the video-prior calibration step, which generates consistent multi-view images; the strand-growing stage operates downstream on the resulting hybrid implicit field. We will expand the methods section with a sequential diagram and an explicit description of this non-circular pipeline to eliminate any ambiguity. Revision: yes.
- Referee [Experiments]: No details are provided on the test set composition, on how 'invisible regions' are evaluated (e.g., via synthetic ground truth or perceptual studies), or on the runtime/memory trade-offs of the two-stage strand-growing algorithm, all of which are required to substantiate the efficiency and accuracy claims.
Authors: We will add a dedicated experimental-setup subsection detailing the test set (composition, number of portraits, diversity of styles and viewpoints), the evaluation protocol for invisible regions (synthetic ground-truth comparisons where available plus user perceptual studies), and quantitative runtime/memory benchmarks for the two-stage strand-growing algorithm versus baselines. These additions will directly support the efficiency and accuracy claims. Revision: yes.
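As referenced in the first response, here is a hedged sketch of one plausible strand-level metric: precision and recall of points sampled along strands, under a distance threshold. The threshold value and the use of a KD-tree are illustrative choices, not the authors' protocol.

```python
# One plausible strand-level accuracy/coverage metric: precision and
# recall of strand sample points under a distance threshold tau.
# tau (in scene units) is illustrative.

import numpy as np
from scipy.spatial import cKDTree

def precision_recall(pred_pts: np.ndarray, gt_pts: np.ndarray,
                     tau: float = 2e-3) -> tuple[float, float]:
    """pred_pts, gt_pts: (N, 3) points sampled along strands.
    Precision: fraction of predicted points within tau of ground truth.
    Recall: fraction of ground-truth points within tau of the prediction."""
    d_pred, _ = cKDTree(gt_pts).query(pred_pts)    # pred -> nearest GT
    d_gt, _ = cKDTree(pred_pts).query(gt_pts)      # GT -> nearest pred
    return float((d_pred < tau).mean()), float((d_gt < tau).mean())
```

Restricting both point sets to the region occluded in the input view would give the coverage-in-invisible-regions number the response mentions.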
Circularity Check
No significant circularity; derivation relies on external priors and independent training
Full rationale
The paper's central pipeline transforms single-view reconstruction via pre-trained video generation models (external), trains a neural orientation extractor on sparse real-image annotations (independent data), and uses a hybrid implicit field for strand growing. No equations or steps in the abstract or described framework reduce by construction to fitted inputs, self-definitions, or self-citation chains. The multi-view calibration is presented as leveraging external 3D priors rather than deriving from the target reconstruction itself. This is the common case of a self-contained method with external dependencies.
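The independence claim hinges on the extractor being trainable from sparse annotations alone. Below is a minimal sketch of how such supervision could look, assuming a PyTorch model that predicts (cos 2θ, sin 2θ) per pixel and a binary mask marking annotated pixels; doubling the angle makes the loss invariant to the unsigned π-ambiguity of strand direction. The authors' actual loss and architecture are not specified in the abstract.

```python
# Hedged sketch: supervise predicted orientation only at annotated pixels.
# The (cos 2θ, sin 2θ) encoding and masked MSE are assumptions for
# illustration, not the paper's loss.

import torch

def sparse_orientation_loss(pred: torch.Tensor, theta_gt: torch.Tensor,
                            mask: torch.Tensor) -> torch.Tensor:
    """pred: (B, 2, H, W) predicted (cos 2θ, sin 2θ); theta_gt: (B, H, W)
    sparse ground-truth angles; mask: (B, H, W), 1 where annotated."""
    target = torch.stack([torch.cos(2 * theta_gt),
                          torch.sin(2 * theta_gt)], dim=1)
    per_pixel = ((pred - target) ** 2).sum(dim=1)        # (B, H, W)
    return (per_pixel * mask).sum() / mask.sum().clamp(min=1)
```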
Reference graph
Works this paper leans on
- [1] Bochkovskii, A., Delaunoy, A., Germain, H., Santos, M., Zhou, Y., Richter, S.R., Koltun, V.: Depth Pro: Sharp monocular metric depth in less than a second. arXiv preprint arXiv:2410.02073 (2024)
- [2] Chai, M., Shao, T., Wu, H., Weng, Y., Zhou, K.: AutoHair: Fully automatic hair modeling from a single image. ACM Transactions on Graphics 35(4) (2016)
- [3] Chai, M., Wang, L., Weng, Y., Jin, X., Zhou, K.: Dynamic hair manipulation in images and videos. ACM Transactions on Graphics (TOG) 32(4), 1–8 (2013)
- [4] Chai, M., Wang, L., Weng, Y., Yu, Y., Guo, B., Zhou, K.: Single-view hair modeling for portrait manipulation. ACM Transactions on Graphics (TOG) 31(4), 1–8 (2012)
- [5] Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W., et al.: LoRA: Low-rank adaptation of large language models. ICLR 1(2), 3 (2022)
- [6] Hu, L., Ma, C., Luo, L., Li, H.: Robust hair capture using simulated examples. ACM Transactions on Graphics (TOG) 33(4), 1–10 (2014)
- [7] Hu, L., Ma, C., Luo, L., Li, H.: Single-view hair modeling using a hairstyle database. ACM Transactions on Graphics (TOG) 34(4), 1–9 (2015)
- [8] Huang, B., Yu, Z., Chen, A., Geiger, A., Gao, S.: 2D Gaussian splatting for geometrically accurate radiance fields. In: ACM SIGGRAPH 2024 Conference Papers. pp. 1–11 (2024)
- [9] Kuang, Z., Chen, Y., Fu, H., Zhou, K., Zheng, Y.: DeepMVSHair: Deep hair modeling from sparse views. In: SIGGRAPH Asia 2022 Conference Papers. pp. 1–8 (2022)
- [10] Labs, B.F.: Flux. https://github.com/black-forest-labs/flux (2024)
- [11] Labs, B.F., Batifol, S., Blattmann, A., Boesel, F., Consul, S., Diagne, C., Dockhorn, T., English, J., English, Z., Esser, P., Kulal, S., Lacey, K., Levi, Y., Li, C., Lorenz, D., Müller, J., Podell, D., Rombach, R., Saini, H., Sauer, A., Smith, L.: FLUX.1 Kontext: Flow matching for in-context image generation and editing in latent space (2025), https://arxiv.org/abs/2...
- [12] Liang, S., Huang, X., Meng, X., Chen, K., Shapiro, L.G., Kemelmacher-Shlizerman, I.: Video to fully automatic 3D hair model. ACM Transactions on Graphics (TOG) 37(6), 1–14 (2018)
- [13] Liu, M., Xu, C., Jin, H., Chen, L., Varma T, M., Xu, Z., Su, H.: One-2-3-45: Any single image to 3D mesh in 45 seconds without per-shape optimization. Advances in Neural Information Processing Systems 36, 22226–22246 (2023)
- [14] Lorensen, W.E., Cline, H.E.: Marching cubes: A high resolution 3D surface construction algorithm. In: Seminal Graphics: Pioneering Efforts that Shaped the Field. pp. 347–353 (1998)
- [15] Luo, L., Li, H., Paris, S., Weise, T., Pauly, M., Rusinkiewicz, S.: Multi-view hair capture using orientation fields. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition. pp. 1490–1497. IEEE (2012)
- [16] Luo, L., Zhang, C., Zhang, Z., Rusinkiewicz, S.: Wide-baseline hair capture using strand-based refinement. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 265–272 (2013)
- [17] Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM 65(1), 99–106 (2021)
- [18] Nam, G., Wu, C., Kim, M.H., Sheikh, Y.: Strand-accurate multi-view hair capture. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 155–164 (2019)
- [19] Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation. In: European Conference on Computer Vision. pp. 483–499. Springer (2016)
- [20] Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., et al.: DINOv2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193 (2023)
- [21] Paris, S., Briceno, H.M., Sillion, F.X.: Capture of hair geometry from multiple images. ACM Transactions on Graphics (TOG) 23(3), 712–719 (2004)
- [22]
- [23] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 10684–10695 (2022)
- [24] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 234–241. Springer (2015)
- [25] Rosu, R.A., Wu, K., Feng, Y., Zheng, Y., Black, M.J.: DiffLocks: Generating 3D hair from a single image using diffusion models. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 10847–10857 (2025)
- [26] Saito, S., Hu, L., Ma, C., Ibayashi, H., Luo, L., Li, H.: 3D hair synthesis using volumetric variational autoencoders. ACM Transactions on Graphics (TOG) 37(6), 1–12 (2018)
- [27] Shen, Y., Zhang, C., Fu, H., Zhou, K., Zheng, Y.: DeepSketchHair: Deep sketch-based 3D hair modeling. IEEE Transactions on Visualization and Computer Graphics 27(7), 3250–3263 (2020)
- [28] Sklyarova, V., Chelishev, J., Dogaru, A., Medvedev, I., Lempitsky, V., Zakharov, E.: Neural Haircut: Prior-guided strand-based hair reconstruction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 19762–19773 (2023)
- [29] Sklyarova, V., Zakharov, E., Hilliges, O., Black, M.J., Thies, J.: Text-conditioned generative model of 3D strand-based human hairstyles. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4703–4712 (2024)
- [30] Sklyarova, V., Zakharov, E., Prinzler, M., Becherini, G., Black, M.J., Thies, J.: Im2Haircut: Single-view strand-based hair reconstruction for human avatars. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 10656–10665 (2025)
- [31] Tang, J., Ren, J., Zhou, H., Liu, Z., Zeng, G.: DreamGaussian: Generative Gaussian splatting for efficient 3D content creation. arXiv preprint arXiv:2309.16653 (2023)
- [32] Tang, Z., Geng, J., Weng, Y., Zheng, Y., Zhou, K.: Single-view 3D hair modeling with clumping optimization. IEEE Transactions on Visualization and Computer Graphics (2025)
- [33] Wan, T., Wang, A., Ai, B., Wen, B., Mao, C., Xie, C.W., Chen, D., Yu, F., Zhao, H., Yang, J., et al.: Wan: Open and advanced large-scale video generative models. arXiv preprint arXiv:2503.20314 (2025)
- [34] Wu, K., Yang, L., Kuang, Z., Feng, Y., Han, X., Shen, Y., Fu, H., Zhou, K., Zheng, Y.: MonoHair: High-fidelity hair modeling from a monocular video. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 24164–24173 (2024)
- [35] Wu, K., Ye, Y., Yang, L., Fu, H., Zhou, K., Zheng, Y.: NeuralHDHair: Automatic high-fidelity hair modeling from a single image using implicit neural representations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1526–1535 (2022)
- [36] Yang, L., Shi, Z., Zheng, Y., Zhou, K.: Dynamic hair modeling from monocular videos using deep neural networks. ACM Transactions on Graphics (TOG) 38(6), 1–12 (2019)
- [37] Zakharov, E., Sklyarova, V., Black, M., Nam, G., Thies, J., Hilliges, O.: Human hair reconstruction with strand-aligned 3D Gaussians. In: European Conference on Computer Vision. pp. 409–425. Springer (2024)
- [38]
- [39] Zhang, M., Wu, P., Wu, H., Weng, Y., Zheng, Y., Zhou, K.: Modeling hair from an RGB-D camera. ACM Transactions on Graphics (TOG) 37(6), 1–10 (2018)
- [40] Zhang, M., Zheng, Y.: Hair-GAN: Recovering 3D hair structure from a single image using generative adversarial networks. Visual Informatics 3(2), 102–112 (2019)
- [41] Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 586–595 (2018)
- [42] Zheng, Y., Jin, Z., Li, M., Huang, H., Ma, C., Cui, S., Han, X.: HairStep: Transfer synthetic to real using strand and depth maps for single-view 3D hair modeling. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12726–12735 (2023)
- [43] Zheng, Y., Qiu, Y., Jin, L., Ma, C., Huang, H., Zhang, D., Wan, P., Han, X.: Towards unified 3D hair reconstruction from single-view portraits. In: SIGGRAPH Asia 2024 Conference Papers. pp. 1–11 (2024)
- [44] Zhou, Y., Hu, L., Xing, J., Chen, W., Kung, H.W., Tong, X., Li, H.: HairNet: Single-view hair reconstruction using convolutional neural networks. In: Proceedings of the European Conference on Computer Vision (ECCV). pp. 235–251 (2018)
- [45] Zhou, Y., Chai, M., Wang, D., Winberg, S., Wood, E., Sarkar, K., Gross, M., Beeler, T.: GroomCap: High-fidelity prior-free hair capture. ACM Transactions on Graphics (TOG) 43(6), 1–15 (2024)