Mesh-Aware Epipolar Matching for Multi-View Multi-Person 3D Pose Estimation in Basketball
Pith reviewed 2026-06-29 08:27 UTC · model grok-4.3
The pith
Mesh geometry from monocular recovery enables training-free cross-view association for multi-person 3D pose in basketball.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A monocular 3D mesh recovery frontend supplies dense surface geometry that supports two-stage epipolar matching. The first stage groups candidate detections across views per person, using disjoint-set-union (DSU) clustering on mesh-derived epipolar distances. The second stage triangulates each joint over the resulting view-consistent sets to produce the final 3D poses.
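The first stage can be illustrated with a minimal sketch. It is not the authors' implementation: the one-directional point-to-line distance, the averaging over corresponding projected mesh points, the pixel threshold, and the names `epipolar_distance`, `DisjointSet`, and `cluster_detections` are all assumptions made for illustration.

```python
import math


def epipolar_distance(F, x1, x2):
    """Pixel distance from x2 (view j) to the epipolar line F @ x1 of x1 (view i)."""
    a = F[0][0] * x1[0] + F[0][1] * x1[1] + F[0][2]
    b = F[1][0] * x1[0] + F[1][1] * x1[1] + F[1][2]
    c = F[2][0] * x1[0] + F[2][1] * x1[1] + F[2][2]
    return abs(a * x2[0] + b * x2[1] + c) / math.hypot(a, b)


class DisjointSet:
    """Disjoint-set union (union-find) with path halving."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def union(self, i, j):
        ri, rj = self.find(i), self.find(j)
        if ri != rj:
            self.parent[rj] = ri


def cluster_detections(detections, fundamentals, threshold):
    """Group detections that likely show the same person.

    detections: list of (view_id, points), where points are 2D projections of
        corresponding mesh vertices (same order and non-empty in every detection).
    fundamentals: {(i, j): F} with F mapping points in view i to lines in view j.
    threshold: assumed maximum mean epipolar distance (pixels) for a match.
    """
    dsu = DisjointSet(len(detections))
    for a in range(len(detections)):
        for b in range(a + 1, len(detections)):
            view_a, pts_a = detections[a]
            view_b, pts_b = detections[b]
            # Skip same-view pairs and pairs without a known fundamental matrix.
            if view_a == view_b or (view_a, view_b) not in fundamentals:
                continue
            F = fundamentals[(view_a, view_b)]
            mean_dist = sum(
                epipolar_distance(F, p, q) for p, q in zip(pts_a, pts_b)
            ) / len(pts_a)
            if mean_dist < threshold:
                dsu.union(a, b)
    groups = {}
    for idx in range(len(detections)):
        groups.setdefault(dsu.find(idx), []).append(idx)
    return sorted(groups.values())
```

With two views related by a pure horizontal translation (`F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]`), detections whose projected points share image rows end up in the same group. Note that plain union-find merges transitively, so a real pipeline would likely also need a rule that prevents two detections from the same view landing in one cluster.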
What carries the argument
Two-stage mesh-aware epipolar matching that combines disjoint-set-union clustering of mesh points with per-joint triangulation to link identities across views.
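The per-joint triangulation step can be sketched as a linear least-squares solve over the views in one cluster. This is a generic textbook formulation under assumed inputs (3x4 projection matrices as nested lists, one 2D observation per view), not the paper's exact procedure; `triangulate_joint`, `triangulate_pose`, and the helper names are hypothetical.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def _solve3(M, r):
    """Solve the 3x3 system M x = r with Cramer's rule."""
    d = _det3(M)
    if abs(d) < 1e-12:
        raise ValueError("degenerate camera configuration")
    solution = []
    for c in range(3):
        Mc = [[r[i] if j == c else M[i][j] for j in range(3)] for i in range(3)]
        solution.append(_det3(Mc) / d)
    return solution


def triangulate_joint(projections, points_2d):
    """Least-squares 3D position of one joint from two or more calibrated views.

    projections: list of 3x4 camera matrices (nested lists).
    points_2d: list of (u, v) pixel observations, one per view.
    """
    # Each view contributes two linear constraints on (X, Y, Z, 1):
    # (u * P[2] - P[0]) . Xh = 0 and (v * P[2] - P[1]) . Xh = 0.
    # Accumulate the normal equations M X = r for the inhomogeneous system.
    M = [[0.0] * 3 for _ in range(3)]
    r = [0.0] * 3
    for P, (u, v) in zip(projections, points_2d):
        for coord, row in ((u, P[0]), (v, P[1])):
            a = [coord * P[2][k] - row[k] for k in range(4)]
            for i in range(3):
                r[i] -= a[i] * a[3]
                for j in range(3):
                    M[i][j] += a[i] * a[j]
    return _solve3(M, r)


def triangulate_pose(projections, poses_2d):
    """Triangulate every joint independently; poses_2d[view][joint] = (u, v)."""
    n_joints = len(poses_2d[0])
    return [
        triangulate_joint(projections, [pose[j] for pose in poses_2d])
        for j in range(n_joints)
    ]
```

Solving each joint separately means a joint that is badly estimated in one view does not contaminate the others, which is one plausible reason to triangulate per joint rather than fit a whole body at once.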
If this is right
- Cross-view association no longer depends on 2D keypoint detections alone, which reduces failures caused by occlusion.
- Team-uniform similarity no longer limits identity matching because mesh shape supplies an additional cue.
- No target-domain fine-tuning or multi-view labels are required, so the pipeline applies directly to new courts.
- Indoor and outdoor basketball scenarios both show gains over earlier training-free association methods.
Where Pith is reading between the lines
- The same mesh-to-epipolar pipeline could extend to other team sports that share occlusion and uniform problems.
- Monocular mesh models might serve as drop-in replacements for 2D detectors in any multi-view geometry task.
- If mesh consistency across views can be further enforced, reconstruction accuracy would rise without extra supervision.
Load-bearing premise
The monocular mesh recovery model must produce mesh outputs that remain sufficiently accurate and consistent across different camera views.
What would settle it
A dataset in which the same monocular mesh model yields visibly inconsistent 3D surfaces for the same player across synchronized views, causing the epipolar distances to group wrong players or inflate triangulation error.
Original abstract
Multi-view multi-person 3D pose estimation in team sports scenarios remains challenging due to player occlusions, appearance similarity caused by team uniforms, and the scarcity of annotated multi-view data, all of which limit the effectiveness and generalization capability of learning-based methods. In contrast, the performance of training-free approaches is inherently constrained by the accuracy of 2D keypoint detection and the robustness of cross-view association. To address these challenges, we propose Mesh-Aware Epipolar Matching (MAEM), a training-free framework for multi-view multi-person 3D pose estimation. Our method employs a monocular 3D human mesh recovery model as the frontend and introduces a two-stage epipolar matching strategy based on the recovered mesh outputs. Specifically, the proposed framework combines disjoint-set-union-based clustering with per-joint triangulation to achieve robust cross-view association and accurate 3D pose reconstruction. Experiments on two public multi-view basketball datasets demonstrate that MAEM consistently outperforms existing training-free association baselines while achieving competitive RGB-only performance in both indoor and outdoor basketball scenarios. MAEM achieves MPJPE/PA-MPJPE scores of 59.8/40.7 mm on SportCenter EPFL and 74.0/51.8 mm on Human-M3 Basketball, highlighting the effectiveness of dense mesh geometry for cross-view association without requiring target-domain training or fine-tuning.
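For readers unfamiliar with the headline metric, MPJPE is the mean Euclidean distance between predicted and ground-truth joints. The sketch below assumes joints are given as matched (x, y, z) tuples in millimetres; the function name `mpjpe` is illustrative. PA-MPJPE applies the same formula after a rigid similarity (Procrustes) alignment of the prediction to the ground truth, which is omitted here.

```python
import math


def mpjpe(predicted, ground_truth):
    """Mean per-joint position error, in the units of the inputs.

    predicted, ground_truth: equal-length sequences of (x, y, z) joints,
    listed in the same joint order.
    """
    if len(predicted) != len(ground_truth) or not ground_truth:
        raise ValueError("joint lists must be non-empty and the same length")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(ground_truth)
```

For example, two joints that are off by 30 mm and 40 mm respectively give an MPJPE of 35 mm.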
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Mesh-Aware Epipolar Matching (MAEM), a training-free framework for multi-view multi-person 3D pose estimation in basketball. It uses a monocular 3D human mesh recovery model as frontend, followed by a two-stage epipolar matching strategy that combines disjoint-set-union clustering with per-joint triangulation for cross-view association and 3D reconstruction. Experiments on SportCenter EPFL and Human-M3 Basketball report MPJPE/PA-MPJPE of 59.8/40.7 mm and 74.0/51.8 mm, claiming consistent outperformance over training-free baselines and competitive RGB-only performance.
Significance. If the results hold, this would be a meaningful contribution by demonstrating that dense mesh geometry can improve cross-view association in team sports without target-domain training, addressing challenges of occlusions and uniform appearances. The training-free nature and reliance on standard epipolar geometry plus an external mesh model are strengths for reproducibility and generalization.
major comments (2)
- [Method] Method section: The outperformance claim over 2D-keypoint baselines rests on the precondition that the monocular mesh recovery frontend produces sufficiently accurate and view-consistent 3D meshes across views. No quantitative validation, ablation, or consistency analysis of the mesh outputs is provided on the basketball datasets with occlusions and identical uniforms, making this assumption load-bearing for the central claim.
- [Experiments] Experiments section: The reported MPJPE/PA-MPJPE scores (59.8/40.7 mm and 74.0/51.8 mm) are presented without error bars, details on baseline re-implementations, data splits, or failure-case analysis, which undermines verification of the 'consistently outperforms' claim.
minor comments (1)
- [Abstract] Abstract: Dataset names (SportCenter EPFL, Human-M3 Basketball) appear only in the results sentence; moving them earlier would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the two major comments below and will incorporate revisions to strengthen the manuscript.
- Referee: [Method] Method section: The outperformance claim over 2D-keypoint baselines rests on the precondition that the monocular mesh recovery frontend produces sufficiently accurate and view-consistent 3D meshes across views. No quantitative validation, ablation, or consistency analysis of the mesh outputs is provided on the basketball datasets with occlusions and identical uniforms, making this assumption load-bearing for the central claim.
Authors: We agree this is a valid concern and that the mesh frontend's accuracy is central to the claims. The framework relies on an off-the-shelf monocular mesh model without target-domain fine-tuning. To address the gap, the revised manuscript will include quantitative validation such as 2D reprojection errors of the recovered meshes against detected keypoints, cross-view mesh consistency metrics, and qualitative examples on the basketball datasets.
Revision: yes
- Referee: [Experiments] Experiments section: The reported MPJPE/PA-MPJPE scores (59.8/40.7 mm and 74.0/51.8 mm) are presented without error bars, details on baseline re-implementations, data splits, or failure-case analysis, which undermines verification of the 'consistently outperforms' claim.
Authors: We acknowledge that additional experimental details would improve verifiability. The revised version will add error bars (where multiple evaluations are feasible), explicit descriptions of baseline re-implementations including hyperparameters and code references, data split details, and a brief failure-case analysis to better support the performance comparisons.
Revision: yes
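The 2D reprojection check mentioned in the first response could look roughly like the following. It is a hedged sketch under simple assumptions: a pinhole camera given as a 3x4 projection matrix, mesh-derived 3D joints already expressed in that camera's world frame, and detected 2D keypoints in matching order. The names `project` and `mean_reprojection_error` are hypothetical and do not come from the paper.

```python
import math


def project(P, X):
    """Project a 3D point X = (x, y, z) to pixels with a 3x4 camera matrix P."""
    xh = [sum(P[r][k] * X[k] for k in range(3)) + P[r][3] for r in range(3)]
    return (xh[0] / xh[2], xh[1] / xh[2])


def mean_reprojection_error(P, joints_3d, keypoints_2d):
    """Mean pixel distance between projected 3D joints and detected 2D keypoints."""
    if len(joints_3d) != len(keypoints_2d) or not joints_3d:
        raise ValueError("inputs must be non-empty and the same length")
    errors = [
        math.dist(project(P, X), kp) for X, kp in zip(joints_3d, keypoints_2d)
    ]
    return sum(errors) / len(errors)
```

Reporting this value per view and per player would give a first indication of whether the recovered meshes agree with the image evidence under occlusion, although it does not by itself measure cross-view 3D consistency.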
Circularity Check
No significant circularity in derivation chain
The paper describes a training-free pipeline that invokes an external monocular mesh recovery model as frontend, then applies standard epipolar geometry plus DSU clustering for association. No equations are shown that define any output quantity in terms of itself, no fitted parameters are relabeled as predictions, and no load-bearing uniqueness claims rest on self-citations. Reported MPJPE numbers are empirical results on held-out public datasets rather than quantities forced by the method's own definitions.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: A monocular 3D human mesh recovery model supplies sufficiently accurate 3D geometry from single views to support cross-view matching.
Fig. S1: Detailed flowchart of the MAEM pipeline, where each processing step is described textually. Given multi-view images, Stage 1 recovers per-person 3D mesh...