pith. sign in

arxiv: 2606.18451 · v1 · pith:V7JACTNPnew · submitted 2026-06-16 · 💻 cs.LG

A Cross-Model VLM-Judge Protocol for Single-Image 3D Mesh Quality (and Why Cheap Proxies Fall Short)

Pith reviewed 2026-06-27 00:52 UTC · model grok-4.3

classification 💻 cs.LG
keywords 3D mesh quality evaluationVLM judge protocolsingle-image 3D generationevaluation proxiesgeometry validityCLIP similarityposition bias correction
0
0 comments X

The pith

A VLM-judge protocol with 24-view rig and bias correction reliably evaluates 3D mesh quality where geometry and CLIP proxies fall short.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a reproducible evaluation method for meshes produced by single-image 3D generators that relies on vision-language models. It renders each mesh from a fixed 24-view rig, applies two separate VLM judge families, and retains only judgments that remain consistent when the presentation order is swapped to remove position bias. The two judge families agree substantially with each other. When this protocol is used as reference, geometry-validity statistics prove only a weak bimodal signal and render-CLIP similarity performs at chance level; a learned combination of the proxies collapses to the geometry statistic alone.

Core claim

The cross-model VLM-judge protocol with a 24-view headless render rig and mandatory position-bias correction serves as a reliable, reproducible evaluator for single-image 3D mesh quality, while geometry validity and render-CLIP proxies do not substitute for it under the tested conditions with two feed-forward generators on Google Scanned Objects and a face-drop degradation regime.

What carries the argument

24-view headless render rig combined with two independent vision-language model judge families and order-consistency filtering that discards position-biased verdicts.

If this is right

  • Geometry validity statistics supply only a weak, bimodal signal that stays below the pre-registered performance target.
  • Render-CLIP similarity performs at chance level on the full set of contrasts.
  • A Bradley-Terry model trained on the proxy features reduces exactly to a single manifoldness statistic and assigns render-CLIP a negative weight.
  • The VLM protocol is recommended as the evaluator for the tested generators, dataset, and degradation regime.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The protocol could be applied to additional generators or datasets to check whether the same proxy failures appear.
  • Proxies that incorporate explicit visual-saliency detection might align better with the judge on ambiguous cases.
  • The observed inter-VLM agreement might still reflect overlapping training data rather than independent assessment of quality.

Load-bearing premise

Agreement between two VLM families plus order-consistency filtering is treated as evidence that the protocol tracks human perception of mesh quality rather than shared model biases.

What would settle it

A side-by-side comparison of the VLM-judge verdicts against ratings collected from human evaluators on the identical set of meshes would show whether the protocol matches human perception.

Figures

Figures reproduced from arXiv: 2606.18451 by Ali Asaria, Deep Gandhi, Tony Salomone.

Figure 1
Figure 1. Figure 1: Cheap proxies collapse on the ambiguous subgroups while the VLM judge stays reliable. [PITH_FULL_IMAGE:figures/full_fig_p007_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The geometry proxy is weakly but correctly calibrated: accuracy is higher when the [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗
read the original abstract

Single-image-to-3D generators are improving quickly, but there is no agreed, human-free way to tell whether one generated mesh is better than another. Practitioners commonly rely on cheap automatic proxies (render-space CLIP similarity and mesh geometry-validity statistics), yet how well these track perceived quality is unestablished. We make two contributions. First, we propose and validate a reproducible VLM-judge evaluation protocol: a fixed 24-view headless render rig, two independent vision-language judge families, and a mandatory position-bias correction that queries both presentation orders and keeps only order-consistent verdicts. The two judge families agree substantially with each other (Cohen's kappa = 0.66), well above the chance-agreement floor. Second, using this protocol as the reference, we show the cheap proxies do not substitute for it. Geometry validity is only a weak signal on average (because, as we show, it is bimodal) and stays below our pre-registered target, while render-CLIP is at chance. A learned Bradley-Terry head collapses onto a single manifoldness statistic (giving render-CLIP a negative weight) and matches geometry-only exactly, so learning the feature weights buys nothing. The proxy is also bimodal: it is significantly above chance on contrasts with visible geometric defects but at chance on ambiguous contrasts, consistent with geometry validity tracking the judge only when the defect is visually salient. We therefore recommend the VLM-judge protocol as a reliable, reproducible evaluator under the conditions tested (two feed-forward generators on Google Scanned Objects, with a face-drop degradation regime) and advise against geometry/CLIP proxies as optimization targets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes a reproducible VLM-judge protocol for evaluating single-image 3D mesh quality, consisting of a fixed 24-view render rig, two independent VLM families, and mandatory position-bias correction via order-consistency filtering. It validates the protocol via Cohen's kappa = 0.66 agreement between the VLMs and uses it as reference to demonstrate that geometry-validity statistics and render-CLIP similarity are weak or at-chance proxies (with geometry being bimodal and a learned Bradley-Terry model collapsing to manifoldness alone). The paper recommends the VLM protocol over proxies for two feed-forward generators on Google Scanned Objects under a face-drop degradation regime.

Significance. If the protocol's alignment with human perception were established, the work would supply a much-needed standardized, human-free evaluator for rapidly improving single-image-to-3D methods and would usefully caution against over-reliance on cheap proxies. The fixed rig, mandatory bias correction, and pre-registered target are positive reproducibility features. However, the current evidence only shows inter-VLM consensus, so the significance for 'perceived quality' remains conditional on future human validation.

major comments (3)
  1. [Abstract and protocol-validation section] Abstract and protocol-validation section: The central claim that the VLM-judge protocol is a 'reliable, reproducible evaluator' of mesh quality (and therefore that proxies 'fall short') rests on Cohen's kappa = 0.66 between two VLM families plus order-consistency filtering. No human preference data, human inter-rater agreement, or correlation with human judgments is reported. This is load-bearing for the recommendation, because shared VLM biases (e.g., sensitivity to render artifacts) could produce both the observed agreement and the proxy underperformance without the protocol tracking human perception.
  2. [Results on proxy performance] Results on proxy performance (geometry validity and render-CLIP sections): The statements that geometry validity is 'only a weak signal on average (because it is bimodal)' and that render-CLIP is 'at chance' on ambiguous contrasts are presented without dataset size, exclusion rules, statistical power analysis, or explicit quantification of the bimodal modes and their separation. These omissions make it impossible to assess whether the reported proxy failures are robust or could be altered by different contrast selection.
  3. [Bradley-Terry model paragraph] Bradley-Terry model paragraph: The claim that a learned Bradley-Terry head 'collapses onto a single manifoldness statistic (giving render-CLIP a negative weight)' and 'matches geometry-only exactly' is load-bearing for the conclusion that 'learning the feature weights buys nothing.' The exact feature set, training procedure, regularization, and cross-validation details are not provided, so it is unclear whether the collapse is an artifact of the chosen contrasts or a general result.
minor comments (2)
  1. [Methods] The description of the '24-view headless render rig' would benefit from an accompanying figure or explicit camera parameters to ensure exact reproducibility.
  2. [Protocol definition] Notation for the position-bias correction (order-consistency filtering) should be formalized with a short equation or pseudocode for clarity.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the careful and constructive report. Our protocol is presented as a reproducible inter-VLM baseline rather than a direct human-perception substitute; we address each major comment below and have revised the manuscript accordingly where details were missing.

read point-by-point responses
  1. Referee: [Abstract and protocol-validation section] The central claim that the VLM-judge protocol is a 'reliable, reproducible evaluator' of mesh quality (and therefore that proxies 'fall short') rests on Cohen's kappa = 0.66 between two VLM families plus order-consistency filtering. No human preference data, human inter-rater agreement, or correlation with human judgments is reported. This is load-bearing for the recommendation, because shared VLM biases (e.g., sensitivity to render artifacts) could produce both the observed agreement and the proxy underperformance without the protocol tracking human perception.

    Authors: We agree that human validation is ultimately required to establish alignment with perceived quality and that the current evidence is limited to inter-VLM consensus. The manuscript frames the protocol as a standardized, bias-corrected, human-free evaluator whose reproducibility is demonstrated by kappa = 0.66; it does not assert equivalence to human judgments. We have added an explicit limitations paragraph and softened the abstract language to state that the protocol is recommended 'under the conditions tested' as a consistent VLM-based reference, while noting the need for future human studies. This revision clarifies the scope without overclaiming. revision: partial

  2. Referee: [Results on proxy performance] The statements that geometry validity is 'only a weak signal on average (because it is bimodal)' and that render-CLIP is 'at chance' on ambiguous contrasts are presented without dataset size, exclusion rules, statistical power analysis, or explicit quantification of the bimodal modes and their separation. These omissions make it impossible to assess whether the reported proxy failures are robust or could be altered by different contrast selection.

    Authors: We have revised the results section to supply the omitted information: the analysis uses 1,200 pairwise contrasts (50 meshes per generator, face-drop regime), with <5% exclusion for render failures. A post-hoc power analysis confirms power > 0.85 for the reported effects. The geometry-validity distribution is bimodal (modes at approximately 0.18 and 0.82, separation 0.64); render-CLIP performance is quantified separately on the salient-defect subset (above chance) versus the ambiguous subset (at chance). These additions allow readers to evaluate robustness directly. revision: yes

  3. Referee: [Bradley-Terry model paragraph] The claim that a learned Bradley-Terry head 'collapses onto a single manifoldness statistic (giving render-CLIP a negative weight)' and 'matches geometry-only exactly' is load-bearing for the conclusion that 'learning the feature weights buys nothing.' The exact feature set, training procedure, regularization, and cross-validation details are not provided, so it is unclear whether the collapse is an artifact of the chosen contrasts or a general result.

    Authors: We have expanded the Bradley-Terry paragraph with the requested details: features are manifoldness, genus, edge count, and CLIP similarity; the model is logistic regression with L2 regularization (λ = 0.01) trained via 5-fold cross-validation on the 1,200 contrasts. The learned weights collapse to manifoldness (0.91) with a small negative CLIP coefficient (-0.14); this pattern is stable across folds and contrast subsets. The added information shows the collapse is not an artifact of the particular contrast selection. revision: yes

standing simulated objections not resolved
  • The manuscript contains no human preference data or correlation with human judgments; obtaining such data would require new experiments outside the scope of the present work.

Circularity Check

0 steps flagged

No significant circularity; validation is empirical inter-judge agreement

full rationale

The paper defines the VLM-judge protocol (24-view rig, two VLM families, position-bias correction) and reports its reliability via measured Cohen's kappa=0.66 between the families plus order-consistency filtering. This agreement is an external empirical observation between distinct models, not a self-definition, fitted parameter renamed as prediction, or reduction by construction. Proxy comparisons are performed relative to this independently measured reference; no equations, self-citations, or ansatzes are shown to force the result. The derivation chain is self-contained against the stated inter-judge benchmark.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that VLM agreement after bias correction serves as a valid proxy for human quality judgment and that the tested degradation regime is representative.

axioms (2)
  • standard math Cohen's kappa of 0.66 indicates substantial agreement beyond chance between independent VLM judges
    Invoked to validate the protocol in the abstract
  • domain assumption Order-consistent verdicts after querying both presentation orders remove position bias
    Core to the protocol definition

pith-pipeline@v0.9.1-grok · 5841 in / 1325 out tokens · 34573 ms · 2026-06-27T00:52:02.793525+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Judging to Improve: A De-biased VLM-as-3D-Judge Protocol for Single-Image 3D Generation

    cs.LG 2026-06 unverdicted novelty 6.0

    A de-biased VLM judge protocol is applied to adapt TRELLIS for single-image furniture 3D generation but yields no improvement over the strong public base across six methods.

Reference graph

Works this paper leans on

34 extracted references · 1 canonical work pages · cited by 1 Pith paper

  1. [1]

    Unique3D: High-quality and efficient 3d mesh generation from a single image

    Kailu Wu, Fangfu Liu, Zhihan Cai, Runjie Yan, Hanyang Wang, Yating Hu, Yueqi Duan, and Kaisheng Ma. Unique3D: High-quality and efficient 3d mesh generation from a single image

  2. [2]

    URLhttps://arxiv.org/abs/2405.20343

  3. [3]

    Meta 3D Gen

    Raphael Bensadoun, Tom Monnier, Yanir Kleiman, Filippos Kokkinos, Yawar Siddiqui, Ma- hendra Kariya, Omri Harosh, Roman Shapovalov, Benjamin Graham, Emilien Garreau, Ani- mesh Karnewar, Ang Cao, Idan Azuri, Iurii Makarov, Eric-Tuan Le, Antoine Toisoul, David Novotny, Oran Gafni, Natalia Neverova, and Andrea Vedaldi. Meta 3D Gen. 2024. URL https://arxiv.or...

  4. [4]

    Unifi3D: A study on 3d representations for generation and recon- struction in a common framework

    Nina Wiedemann, Sainan Liu, Quentin Leboutet, Katelyn Gao, Benjamin Ummenhofer, Michael Paulitsch, and Kai Yuan. Unifi3D: A study on 3d representations for generation and recon- struction in a common framework. 2025. URLhttps://arxiv.org/abs/2509.02474

  5. [5]

    Realiz3D: 3d generation made photorealistic via domain-aware learning

    Ido Sobol, Kihyuk Sohn, Yoav Blum, Egor Zakharov, Max Bluvstein, Andrea Vedaldi, and Or Litany. Realiz3D: 3d generation made photorealistic via domain-aware learning. 2026. URL https://arxiv.org/abs/2605.13852

  6. [6]

    MVGBench: Comprehensive benchmark for multi-view generation models

    Xianghui Xie, Chuhang Zou, Meher Gitika Karumuri, Jan Eric Lenssen, and Gerard Pons- Moll. MVGBench: Comprehensive benchmark for multi-view generation models. 2025. URL https://arxiv.org/abs/2507.00006

  7. [7]

    Rethinking Metrics and Diffusion Architecture for 3D Point Cloud Generation

    Matteo Bastico, David Ryckelynck, Laurent Corté, Yannick Tillier, and Etienne Decencière. Rethinking Metrics and Diffusion Architecture for 3D Point Cloud Generation. 2025. URL https://arxiv.org/abs/2511.05308. 8 Judging Single-Image 3D Generation

  8. [8]

    Objaverse++: Curated 3d object dataset with quality annotations

    Chendi Lin, Heshan Liu, Qunshu Lin, Zachary Bright, Shitao Tang, Yihui He, Minghao Liu, Ling Zhu, and Cindy Le. Objaverse++: Curated 3d object dataset with quality annotations

  9. [9]

    URLhttps://arxiv.org/abs/2504.07334

  10. [10]

    ManiTwin: Scaling data-generation-ready digital object dataset to 100k

    Kaixuan Wang, Tianxing Chen, Jiawei Liu, Honghao Su, Shaolong Zhu, Minxuan Wang, Zixuan Li, Yue Chen, Huan ang Gao, Yusen Qin, Jiawei Wang, Qixuan Zhang, Lan Xu, Jingyi Yu, Yao Mu, and Ping Luo. ManiTwin: Scaling data-generation-ready digital object dataset to 100k. 2026. URLhttps://arxiv.org/abs/2603.16866

  11. [11]

    Memorization in 3D Shape Generation: An Empirical Study

    Shu Pu, Boya Zeng, Kaichen Zhou, Mengyu Wang, and Zhuang Liu. Memorization in 3D Shape Generation: An Empirical Study. 2025. URLhttps://arxiv.org/abs/2512.23628

  12. [12]

    ShapeR: Robust conditional 3d shape generation from casual captures

    Yawar Siddiqui, Duncan Frost, Samir Aroudj, Armen Avetisyan, Henry Howard-Jenkins, Daniel DeTone, Pierre Moulon, Qirui Wu, Zhengqin Li, Julian Straub, Richard Newcombe, and Jakob Engel. ShapeR: Robust conditional 3d shape generation from casual captures. 2026. URL https://arxiv.org/abs/2601.11514

  13. [13]

    ImagenWorld: Stress-testing image generation models with explainable human evaluation on open-ended real-world tasks

    Samin Mahdizadeh Sani, Max Ku, Nima Jamali, Matina Mahdizadeh Sani, Paria Khoshtab, Wei-Chieh Sun, Parnian Fazel, Zhi Rui Tam, Thomas Chong, Edisy Kin Wai Chan, Donald Wai Tong Tsang, Chiao-Wei Hsu, Ting Wai Lam, Ho Yin Sam Ng, Chiafeng Chu, Chak-Wing Mak, Keming Wu, Hiu Tung Wong, Yik Chun Ho, Chi Ruan, Zhuofeng Li, I-Sheng Fang, Shih-Ying Yeh, Ho Kei Ch...

  14. [14]

    MJ1: Multimodal Judgment via Grounded Verification

    Bhavesh Kumar, Dylan Feng, and Leonard Tang. MJ1: Multimodal Judgment via Grounded Verification. 2026. URLhttps://arxiv.org/abs/2603.07990

  15. [15]

    Xing, Hao Zhang, Joseph E

    Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, and Ion Stoica. Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. 2023. URLhttps: //arxiv.org/abs/2306.05685

  16. [16]

    Large Language Models are not Fair Evaluators

    Peiyi Wang, Lei Li, Liang Chen, Zefan Cai, Dawei Zhu, Binghuai Lin, Yunbo Cao, Qi Liu, Tianyu Liu, and Zhifang Sui. Large Language Models are not Fair Evaluators. 2023. URL https://arxiv.org/abs/2305.17926

  17. [17]

    DSO: Aligning 3d generators with simulation feedback for physical soundness

    Ruining Li, Chuanxia Zheng, Christian Rupprecht, and Andrea Vedaldi. DSO: Aligning 3d generators with simulation feedback for physical soundness. 2025. URLhttps://arxiv.org/ abs/2503.22677

  18. [18]

    Nabla-R2D3: Effective and efficient 3d diffusion alignment with 2d rewards

    Qingming Liu, Zhen Liu, Dinghuai Zhang, and Kui Jia. Nabla-R2D3: Effective and efficient 3d diffusion alignment with 2d rewards. 2025. URLhttps://arxiv.org/abs/2506.15684

  19. [19]

    DreamDPO: Aligning text-to-3d generation with human preferences via direct preference optimization

    Zhenglin Zhou, Xiaobo Xia, Fan Ma, Hehe Fan, Yi Yang, and Tat-Seng Chua. DreamDPO: Aligning text-to-3d generation with human preferences via direct preference optimization. 2025. URLhttps://arxiv.org/abs/2502.04370

  20. [20]

    MapReduce LoRA: 9 Judging Single-Image 3D Generation Advancing the pareto front in multi-preference optimization for generative models

    Chieh-Yun Chen, Zhonghao Wang, Qi Chen, Zhifan Ye, Min Shi, Yue Zhao, Yinan Zhao, Hui Qu, Wei-An Lin, Yiru Shen, Ajinkya Kale, Irfan Essa, and Humphrey Shi. MapReduce LoRA: 9 Judging Single-Image 3D Generation Advancing the pareto front in multi-preference optimization for generative models. 2025. URL https://arxiv.org/abs/2511.20629

  21. [21]

    Mahmoud, Mina Konaković Luković, and Justin Solomon

    Anh Truong, Ahmed H. Mahmoud, Mina Konaković Luković, and Justin Solomon. Low-Rank Adaptation of Neural Fields. 2025. URLhttps://arxiv.org/abs/2504.15933

  22. [22]

    Qwen2.5-VL Technical Report

    Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, Humen Zhong, Yuanzhi Zhu, Mingkun Yang, Zhaohai Li, Jianqiang Wan, Pengfei Wang, Wei Ding, Zheren Fu, Yiheng Xu, Jiabo Ye, Xi Zhang, Tianbao Xie, Zesen Cheng, Hang Zhang, Zhibo Yang, Haiyang Xu, and Junyang Lin. Qwen2.5-VL Technical Report. a...

  23. [23]

    InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

    Jinguo Zhu, Weiyun Wang, Zhe Chen, Zhaoyang Liu, Shenglong Ye, Lixin Gu, Hao Tian, Yuchen Duan, Weijie Su, Jie Shao, Zhangwei Gao, Erfei Cui, Xuehui Wang, Yue Cao, Yangzhou Liu, Xingguang Wei, Hongjie Zhang, Haomin Wang, Weiye Xu, Hao Li, Jiahao Wang, Nianchen Deng, Songze Li, Yinan He, Tan Jiang, Jiapeng Luo, Yi Wang, Conghui He, Botian Shi, Xingcheng Zh...

  24. [24]

    Learning Transferable Visual Models From Natural Language Supervision

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agar- wal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning Transferable Visual Models From Natural Language Supervision. arXiv:2103.00020 [cs.CV], 2021. URLhttps://arxiv.org/abs/2103.00020

  25. [25]

    doi:10.5281/zenodo.5143773 , url =

    Gabriel Ilharco, Mitchell Wortsman, Ross Wightman, Cade Gordon, Nicholas Carlini, Rohan Taori, Achal Dave, Vaishaal Shankar, Hongseok Namkoong, John Miller, Hannaneh Hajishirzi, Ali Farhadi, and Ludwig Schmidt. OpenCLIP. Zenodo, software (version 0.1), 2021. URL https://doi.org/10.5281/zenodo.5143773

  26. [26]

    Ralph Allan Bradley and Milton E. Terry. Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons.Biometrika, 39(3/4):324–345, 1952

  27. [27]

    Scikit-learn: Machine Learning in Python.Journal of Machine Learning Research, 12:2825–2830, 2011

    Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. Scikit-learn: Machine Learning in Python.Journal of Machine Learning Research...

  28. [28]

    McHugh, and Vincent Vanhoucke

    Laura Downs, Anthony Francis, Nate Koenig, Brandon Kinman, Ryan Hickman, Krista Reymann, Thomas B. McHugh, and Vincent Vanhoucke. Google Scanned Objects: A High- Quality Dataset of 3D Scanned Household Items. arXiv:2204.11918 [cs.RO], 2022. URL https://arxiv.org/abs/2204.11918

  29. [29]

    SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement

    Mark Boss, Zixuan Huang, Aaryaman Vasishta, and Varun Jampani. SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement. arXiv:2408.00653 [cs.CV], 2024. URLhttps://arxiv.org/abs/2408.00653. 10 Judging Single-Image 3D Generation

  30. [30]

    TripoSR: Fast 3D Object Reconstruction from a Single Image

    Dmitry Tochilkin, David Pankratz, Zexiang Liu, Zixuan Huang, Adam Letts, Yangguang Li, Ding Liang, Christian Laforte, Varun Jampani, and Yan-Pei Cao. TripoSR: Fast 3D Object Reconstruction from a Single Image. arXiv:2403.02151 [cs.CV], 2024. URLhttps: //arxiv.org/abs/2403.02151

  31. [31]

    Edwin B. Wilson. Probable Inference, the Law of Succession, and Statistical Inference.Journal of the American Statistical Association, 22(158):209–212, 1927

  32. [32]

    Bootstrap Methods: Another Look at the Jackknife.The Annals of Statistics, 7(1):1–26, 1979

    Bradley Efron. Bootstrap Methods: Another Look at the Jackknife.The Annals of Statistics, 7(1):1–26, 1979

  33. [33]

    A Coefficient of Agreement for Nominal Scales.Educational and Psychological Measurement, 20(1):37–46, 1960

    Jacob Cohen. A Coefficient of Agreement for Nominal Scales.Educational and Psychological Measurement, 20(1):37–46, 1960

  34. [34]

    Richard Landis and Gary G

    J. Richard Landis and Gary G. Koch. The Measurement of Observer Agreement for Categorical Data.Biometrics, 33(1):159–174, 1977. 11