Bézier Degradation Modeling for LiDAR-based Human Motion Capture

Chen Gong; Jian Yang; Jun Li; Lin Zhao; Xiaoqi An

arxiv: 2605.19620 · v1 · pith:BIULOMJYnew · submitted 2026-05-19 · 💻 cs.CV

Xiaoqi An, Lin Zhao, Jun Li, Chen Gong, Jian Yang

Pith reviewed 2026-05-20 06:48 UTC · model grok-4.3

classification 💻 cs.CV

keywords: LiDAR, human motion capture, Bézier curves, 3D pose estimation, occlusion handling, motion reconstruction, temporal coherence

The pith

Bézier curve compression of motion trajectories enables accurate 3D human pose recovery from occluded and noisy LiDAR data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that human motion can be represented as temporally compressible Bézier curves whose control points are reduced by a trajectory-preserving strategy, creating a coherent representation suitable for learning from partial observations. A progressive reconstruction module then predicts curves at multiple temporal scales with a Time-scale Motion Transformer and fuses them via a Multi-level Motion Aggregator to fill gaps caused by occlusions and noise. This coarse-to-fine process yields more accurate and temporally stable poses than prior methods. The claim matters for applications such as autonomous driving and robotics, where LiDAR inputs are frequently incomplete yet reliable motion capture is required.

Core claim

BMLiCap models motion using temporally compressible Bézier curves. By reducing control points through a trajectory-preserving strategy, it obtains a coherent and learning-friendly motion representation. The progressive motion-reconstruction module introduces a Time-scale Motion Transformer to predict motion curves at multiple temporal scales and a Multi-level Motion Aggregator to adaptively fuse the multi-scale curves, recovering detailed and temporally coherent poses from LiDAR point-cloud cues.
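
For reference, the standard definition of a degree-$n$ Bézier curve, which the claim above relies on but does not spell out, can be written as follows. The symbols ($\mathbf{P}_i$ for control points, $B_{i,n}$ for the Bernstein basis, $t$ for time normalized over a window) are conventional notation assumed here, not notation taken from the paper.

```latex
\[
\mathbf{B}(t) \;=\; \sum_{i=0}^{n} B_{i,n}(t)\,\mathbf{P}_i,
\qquad
B_{i,n}(t) \;=\; \binom{n}{i}\, t^{i}\,(1-t)^{n-i},
\qquad t \in [0,1].
\]
```

Under this reading, a joint trajectory over a window is summarized by $n+1$ control points, and "reducing control points" corresponds to describing the same window with a smaller $n$. Whether the paper uses a single curve per window or a piecewise construction is not stated in the material above.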

What carries the argument

Trajectory-preserving reduction of Bézier control points that produces a compressible yet detail-retaining motion representation, processed by the multi-scale Time-scale Motion Transformer and Multi-level Motion Aggregator for progressive pose recovery.
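
A minimal sketch of what fitting and then reducing Bézier control points can look like, assuming NumPy and a plain least-squares refit. The paper's actual trajectory-preserving strategy is not specified in the material above, so `reduce_control_points` is a hypothetical stand-in, and every function name here is illustrative.

```python
import numpy as np
from math import comb


def bernstein_matrix(num_samples: int, degree: int) -> np.ndarray:
    """Return the (num_samples, degree + 1) Bernstein basis sampled uniformly on [0, 1]."""
    t = np.linspace(0.0, 1.0, num_samples)
    return np.stack(
        [comb(degree, i) * t**i * (1.0 - t) ** (degree - i) for i in range(degree + 1)],
        axis=1,
    )


def fit_bezier(trajectory: np.ndarray, degree: int) -> np.ndarray:
    """Least-squares control points, shape (degree + 1, D), for a (T, D) trajectory."""
    basis = bernstein_matrix(len(trajectory), degree)
    control_points, *_ = np.linalg.lstsq(basis, trajectory, rcond=None)
    return control_points


def evaluate_bezier(control_points: np.ndarray, num_samples: int) -> np.ndarray:
    """Sample the curve defined by control_points at num_samples uniform times."""
    degree = len(control_points) - 1
    return bernstein_matrix(num_samples, degree) @ control_points


def reduce_control_points(
    control_points: np.ndarray, new_degree: int, num_samples: int = 64
) -> np.ndarray:
    """Hypothetical reduction: resample the fine curve, then refit at a lower degree.

    This is an assumed stand-in, not the paper's trajectory-preserving strategy.
    """
    fine_curve = evaluate_bezier(control_points, num_samples)
    return fit_bezier(fine_curve, new_degree)
```

Calling `fit_bezier(joint_xyz, degree=7)` on a `(T, 3)` array of joint positions and then `reduce_control_points(..., new_degree=3)` would give a coarse and a fine description of the same window, which is the kind of multi-level representation the summary describes.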

If this is right

Achieves state-of-the-art accuracy and temporal continuity across the LiDARHuman26M, FreeMotion, NoiseMotion, and SLOPER4D benchmarks.
Compensates for severe occlusions by bridging observation gaps with multi-scale curve prediction and fusion.
Reduces prediction jitter while maintaining motion coherence in complex scenes.
Produces a learning-friendly motion representation that supports stable reconstruction from unstable LiDAR inputs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same control-point reduction idea could be tested on other sparse time-series sensor streams such as radar or event-camera data for motion tracking.
Multi-scale curve fusion might transfer to video-based or RGB-D pose estimation pipelines facing similar temporal dropout.
Fewer control points per trajectory could lower memory and compute costs enough to support onboard processing in mobile robotics.

Load-bearing premise

The trajectory-preserving strategy for reducing Bézier control points retains enough motion detail to allow accurate reconstruction from partial LiDAR observations without introducing systematic bias in pose recovery.

What would settle it

Run the reduced-control-point Bézier model on a synthetic dataset of ground-truth motions known to require high-frequency details that low-order curves cannot preserve; if reconstruction error rises above that of an uncompressed baseline under identical occlusion patterns, the core assumption does not hold.
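
The representation half of this test can be sketched in a few lines. The example below only measures how well a least-squares Bézier fit of a given degree approximates a slow versus a fast synthetic signal; it does not involve LiDAR input, occlusion, or the paper's network. The signal frequencies, window length, and function name are assumptions chosen for illustration.

```python
import numpy as np
from math import comb


def bezier_fit_error(signal: np.ndarray, degree: int) -> float:
    """RMS error of the least-squares Bézier fit of the given degree to a 1-D signal."""
    t = np.linspace(0.0, 1.0, len(signal))
    basis = np.stack(
        [comb(degree, i) * t**i * (1.0 - t) ** (degree - i) for i in range(degree + 1)],
        axis=1,
    )
    coeffs, *_ = np.linalg.lstsq(basis, signal, rcond=None)
    return float(np.sqrt(np.mean((basis @ coeffs - signal) ** 2)))


# Synthetic one-dimensional "joint coordinates" over a 64-frame window.
t = np.linspace(0.0, 1.0, 64)
slow_motion = np.sin(2 * np.pi * 0.5 * t)  # half a cycle per window
fast_motion = np.sin(2 * np.pi * 6.0 * t)  # six cycles per window (assumed "high-frequency")

for degree in (3, 5, 9):
    print(degree, bezier_fit_error(slow_motion, degree), bezier_fit_error(fast_motion, degree))
```

If the fast signal's error stays large at the control-point counts the method actually uses, that would point to the kind of detail loss the premise has to rule out. A full test would still need the occlusion patterns and the uncompressed baseline described above.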

Figures

Figures reproduced from arXiv: 2605.19620 by Chen Gong, Jian Yang, Jun Li, Lin Zhao, Xiaoqi An.

**Figure 1.** BMLiCap models human motion by Bézier curves. Our contributions mainly include: (a) a novel Bézier degradation method for generating easy-to-learn motion representations; (b) a progressive motion reconstruction model conditioned on LiDAR point clouds in a coarse-to-fine manner, compensating for severe input occlusions.

**Figure 2.** Analysis of motion approximation error using B…

**Figure 3.** The pipeline of our proposed BMLiCap framework. During training, we first apply the B…

**Figure 4.** A demonstration of trajectory-aware Bézier degradation: we not only resample the control points but also adjust their lengths to better fit the finest curve. During training, we supervise the network to predict these multi-level motion representations, enabling the model with coarse-to-fine motion reconstruction capability.

**Figure 5.** Sequential visualization. Even under severe occlusion, …

**Figure 6.** Single frame visual comparisons. On samples with special motion or severe …

**Figure 8.** Visualization of the predicted intermediate B…

**Figure 9.** Attention map of different occlusion levels. The atten…

Original abstract

LiDAR-based 3D human motion capture has broad applications in fields such as autonomous driving and robotics, where accurate motion reconstruction is crucial. However, existing methods often struggle with unstable inputs and severe occlusions, leading to jittery or even failed pose predictions. To address these challenges, we propose BMLiCap, a coarse-to-fine framework that models motion using temporally compressible Bézier curves. By reducing control points through a trajectory-preserving strategy, we obtain a coherent and learning-friendly motion representation. To reconstruct human actions from LiDAR point-cloud cues, we design a progressive motion-reconstruction module. Specifically, a Time-scale Motion Transformer (TMT) is introduced to predict motion curves at multiple temporal scales, and a Multi-level Motion Aggregator (MMA) is utilized to adaptively fuse the multi-scale curves to recover detailed, temporally coherent poses, effectively bridging observation gaps caused by occlusions and noise. Across four mainstream benchmarks LiDARHuman26M, FreeMotion, NoiseMotion, and SLOPER4D, BMLiCap achieves state-of-the-art accuracy and temporal continuity in complex scenes, demonstrating its ability to compensate for severe occlusions and reduce prediction jitter.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

BMLiCap uses Bézier curves for a coarse-to-fine LiDAR motion model that targets occlusion and jitter, but the abstract gives no numbers to judge whether it actually delivers.

The paper's main contribution is a framework called BMLiCap that represents motion with temporally compressible Bézier curves for LiDAR-based 3D human motion capture. This helps handle unstable inputs and occlusions that cause jitter in existing methods. They reduce control points using a trajectory-preserving strategy to get a coherent representation. Then a progressive module uses the Time-scale Motion Transformer to predict at multiple scales and the Multi-level Motion Aggregator to fuse them for detailed poses.

This is new as a specific application of Bézier modeling to this sensor domain with those two modules. It targets a real need in robotics and autonomous driving for reliable motion reconstruction. The approach is sensible for creating a learning-friendly input from point clouds and bridging gaps from noise or missing data.

That said, the abstract claims state-of-the-art accuracy and temporal continuity on four benchmarks but shows no quantitative tables or ablations. Without those, it's tough to see the size of the improvement or confirm it reduces jitter effectively. The stress-test point about losing high-frequency motion details is worth checking. Reducing Bézier points could smooth out quick changes even if the overall trajectory is kept, and the multi-scale fusion might not always recover them from partial observations. The paper should demonstrate that this doesn't create bias in pose recovery.

This is relevant for people working on LiDAR perception in dynamic environments. A reader interested in practical improvements to motion capture would get value from seeing how they implement the coarse-to-fine idea. It deserves peer review because the problem is well-motivated and the method is described in enough detail to assess, though the experiments need close scrutiny.

Referee Report

2 major / 2 minor

Summary. The manuscript presents BMLiCap, a coarse-to-fine framework for LiDAR-based 3D human motion capture. It models motion with temporally compressible Bézier curves, applies a trajectory-preserving strategy to reduce control points for a coherent representation, and uses a progressive reconstruction module with a Time-scale Motion Transformer (TMT) to predict multi-scale curves and a Multi-level Motion Aggregator (MMA) to fuse them adaptively. The central claim is that this approach achieves state-of-the-art accuracy and temporal continuity on the LiDARHuman26M, FreeMotion, NoiseMotion, and SLOPER4D benchmarks by compensating for severe occlusions and reducing jitter.

Significance. If the quantitative results and ablations hold, the work would offer a meaningful advance in robust LiDAR mocap by introducing a Bézier-based motion representation that supports multi-scale fusion for handling missing observations. The TMT and MMA components provide a structured way to bridge temporal gaps, which could benefit applications in autonomous driving and robotics. The paper ships a clear high-level architecture and identifies a plausible failure mode (jitter under occlusion) that the method targets.

major comments (2)

[Abstract / Experiments] Abstract and Experiments section: the SOTA claims on four benchmarks are asserted without any quantitative tables, baseline comparisons, ablation studies, or error analysis (e.g., MPJPE, jitter metrics, or occlusion-specific breakdowns), which is load-bearing for the central claim that the method compensates for occlusions and reduces jitter.
[Method (Bézier Degradation Modeling)] Method section on Bézier degradation modeling: the trajectory-preserving strategy for control-point reduction is presented as retaining sufficient motion detail, yet low-order Bézier fits can attenuate high-frequency components (sharp accelerations at foot strikes or hand gestures); this risks systematic bias in pose recovery from partial LiDAR observations that the subsequent TMT+MMA fusion may not fully compensate, directly challenging the occlusion-robustness claim.
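
The metrics named in the first comment can be made concrete with a short sketch. MPJPE below follows its usual definition (mean Euclidean distance per joint). The acceleration-error function is a commonly used proxy for jitter and is an assumption here, since the material above does not say which jitter metric the paper reports. Array shapes and function names are illustrative.

```python
import numpy as np


def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean per-joint position error for (T, J, 3) arrays, in the units of the input."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def accel_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean norm of the difference between second finite differences over time.

    Used here as an assumed jitter proxy; needs at least three frames.
    """
    pred_acc = pred[2:] - 2.0 * pred[1:-1] + pred[:-2]
    gt_acc = gt[2:] - 2.0 * gt[1:-1] + gt[:-2]
    return float(np.linalg.norm(pred_acc - gt_acc, axis=-1).mean())
```

An occlusion-specific breakdown of the kind the comment asks for could reuse the same functions on subsets of frames grouped by an occlusion level, provided such labels are available for the benchmark.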

minor comments (2)

[Method] Clarify the exact mathematical definition of the trajectory-preserving reduction (e.g., how control points are selected or optimized) and the temporal scales used in TMT to aid reproducibility.
[Introduction] Add a short related-work paragraph contrasting the Bézier representation with prior spline or polynomial motion models in human pose estimation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. We have reviewed the comments carefully and provide detailed point-by-point responses below. Where revisions are warranted, we commit to incorporating them to strengthen the manuscript's clarity and support for its claims.

Referee: [Abstract / Experiments] Abstract and Experiments section: the SOTA claims on four benchmarks are asserted without any quantitative tables, baseline comparisons, ablation studies, or error analysis (e.g., MPJPE, jitter metrics, or occlusion-specific breakdowns), which is load-bearing for the central claim that the method compensates for occlusions and reduces jitter.

Authors: We acknowledge the referee's concern regarding the presentation of results. The Experiments section does contain quantitative tables reporting MPJPE and other metrics across the four benchmarks (LiDARHuman26M, FreeMotion, NoiseMotion, SLOPER4D), along with comparisons to baselines and ablations on TMT and MMA. Jitter metrics are included to support temporal continuity claims. However, to make the occlusion-robustness argument more explicit and load-bearing, we will add dedicated occlusion-specific error breakdowns and expanded analysis in the revised Experiments section. This will directly address the central claim without altering the existing results.
revision: partial

Referee: [Method (Bézier Degradation Modeling)] Method section on Bézier degradation modeling: the trajectory-preserving strategy for control-point reduction is presented as retaining sufficient motion detail, yet low-order Bézier fits can attenuate high-frequency components (sharp accelerations at foot strikes or hand gestures); this risks systematic bias in pose recovery from partial LiDAR observations that the subsequent TMT+MMA fusion may not fully compensate, directly challenging the occlusion-robustness claim.

Authors: We appreciate this insightful observation on potential limitations of low-order Bézier representations. While Bézier curves can smooth high-frequency details, the trajectory-preserving strategy prioritizes control points that capture key motion dynamics, and the multi-scale TMT explicitly predicts curves at varying temporal resolutions to recover both low- and high-frequency components. The MMA then performs adaptive fusion to mitigate any residual attenuation. We will add a targeted discussion and supporting ablation on high-frequency motion preservation (e.g., foot-strike and gesture analysis) in the revised Method and Experiments sections to further substantiate that TMT+MMA compensates effectively under occlusion.
revision: partial

Circularity Check

0 steps flagged

No circularity: derivation chain is self-contained and externally grounded

full rationale

The paper introduces a coarse-to-fine Bézier curve modeling framework with a trajectory-preserving control-point reduction, followed by a Time-scale Motion Transformer (TMT) and Multi-level Motion Aggregator (MMA) for pose reconstruction from LiDAR observations. No equations, fitted parameters, or self-citations are shown that reduce any central prediction or uniqueness claim back to the same data or prior author work by construction. The approach is evaluated on four independent external benchmarks (LiDARHuman26M, FreeMotion, NoiseMotion, SLOPER4D), with performance claims resting on those standard datasets rather than internal redefinitions or self-referential fits. This satisfies the criteria for a self-contained derivation without load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 3 invented entities

Paper introduces new modeling strategy and network modules whose internal parameters and design choices are not detailed in the abstract.

invented entities (3)

Bézier degradation modeling (no independent evidence)
purpose: Provide temporally compressible and coherent motion representation
Core modeling choice introduced to address jitter and occlusion

Time-scale Motion Transformer (TMT) (no independent evidence)
purpose: Predict motion curves at multiple temporal scales
New module for progressive reconstruction

Multi-level Motion Aggregator (MMA) (no independent evidence)
purpose: Adaptively fuse multi-scale curves to recover detailed poses
New module for bridging observation gaps

Reference graph

Works this paper leans on

83 extracted references · 83 canonical work pages

[1] Xiaoqi An, Lin Zhao, Chen Gong, Jun Li, and Jian Yang. Pre-training a Density-Aware Pose Transformer for Robust LiDAR-based 3D Human Pose Estimation. AAAI, 39(2):1755–1763, 2025.

[2] Renat Bashirov, Anastasia Ianina, Karim Iskakov, Yevgeniy Kononenko, Valeriya Strizhkova, Victor Lempitsky, and Alexander Vakhitov. Real-Time RGBD-Based Extended Body Pose Estimation. In WACV, pages 2807–2816, 2021.

[3] Federica Bogo, Angjoo Kanazawa, Christoph Lassner, Peter Gehler, Javier Romero, and Michael J. Black. Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image. In ECCV, pages 561–578, 2016.

[4] Zhongang Cai, Liang Pan, Chen Wei, Wanqi Yin, Fangzhou Hong, Mingyuan Zhang, Chen Change Loy, Lei Yang, and Ziwei Liu. PointHPS: Cascaded 3D Human Pose and Shape Estimation from Point Clouds, 2023.

[5] Sihan Chen, Yadang Chen, Yuhui Zheng, Zhi-Xin Yang, and Enhua Wu. A transformer-based adaptive prototype matching network for few-shot semantic segmentation. In IJCAI,

[6] Xiaodong Chen, Wu Liu, Qian Bao, Xinchen Liu, Quanwei Yang, Ruoli Dai, and Tao Mei. Motion Capture from Inertial and Vision Sensors, 2024.

[7] Yudi Dai, Yitai Lin, Chenglu Wen, Siqi Shen, Lan Xu, Jingyi Yu, Yuexin Ma, and Cheng Wang. HSC4D: Human-centered 4D Scene Capture in Large-scale Indoor-outdoor Space Using Wearable IMUs and LiDAR, 2022.

[8] Yudi Dai, Yitai Lin, Xiping Lin, Chenglu Wen, Lan Xu, Hongwei Yi, Siqi Shen, Yuexin Ma, and Cheng Wang. SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments. In CVPR, pages 682–692, 2023.

[9] Yudi Dai, Zhiyong Wang, Xiping Lin, Chenglu Wen, Lan Xu, Siqi Shen, Yuexin Ma, and Cheng Wang. HiSC4D: Human-Centered Interaction and 4D Scene Capture in Large-Scale Space Using Wearable IMUs and LiDAR. IEEE TPAMI, pages 1–18, 2024.

[10] Sai Kumar Dwivedi, Cordelia Schmid, Hongwei Yi, Michael J. Black, and Dimitrios Tzionas. POCO: 3D Pose and Shape Estimation with Confidence. In 3DV, pages 85–95, 2024.
[11] Sai Kumar Dwivedi, Yu Sun, Priyanka Patel, Yao Feng, and Michael J. Black. TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation. In CVPR, pages 1323–1333, 2024.

[12] Bohao Fan, Wenzhao Zheng, Jianjiang Feng, and Jie Zhou. LiDAR-HMR: 3D human mesh recovery from LiDAR. IEEE TMM, 27:6962–6975, 2025.

[13] Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Antonio Agudo, and Francesc Moreno-Noguer. VQ-HPS: Human Pose and Shape Estimation in a Vector-Quantized Latent Space. In ECCV, pages 471–490, 2025.

[14] Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, and Francesc Moreno-Noguer. MEGA: Masked Generative Autoencoder for Human Mesh Recovery. In CVPR, pages 5366–5378, 2025.

[15] Zigang Geng, Chunyu Wang, Yixuan Wei, Ze Liu, Houqiang Li, and Han Hu. Human Pose As Compositional Tokens. In CVPR, pages 660–671, 2023.

[16] Akshat Ghiya, Ali AlShami, and Jugal Kalita. SGNetPose+: Stepwise Goal-Driven Networks with Pose Information for Trajectory Prediction in Autonomous Driving. In WACV, pages 677–685, 2025.

[17] Shubham Goel, Georgios Pavlakos, Jathushan Rajasegaran, Angjoo Kanazawa, and Jitendra Malik. Humans in 4D: Reconstructing and Tracking Humans with Transformers. In ICCV, pages 14783–14794, 2023.

[18] Zhefei Gong, Pengxiang Ding, Shangke Lyu, Siteng Huang, Mingyang Sun, Wei Zhao, Zhaoxin Fan, and Donglin Wang. CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction, 2025.

[19] Riza Alp Guler and Iasonas Kokkinos. HoloPose: Holistic 3D Human Reconstruction In-The-Wild. In CVPR, pages 10884–10894, 2019.

[20] Rıza Alp Güler, Natalia Neverova, and Iasonas Kokkinos. DensePose: Dense Human Pose Estimation in the Wild. In CVPR, pages 7297–7306, 2018.
[21] Chuan Guo, Xinxin Zuo, Sen Wang, and Li Cheng. TM2T: Stochastic and Tokenized Modeling for the Reciprocal Generation of 3D Human Motions and Texts. In ECCV, pages 580–597, 2022.

[22] Chuan Guo, Yuxuan Mu, Muhammad Gohar Javed, Sen Wang, and Li Cheng. MoMask: Generative Masked Modeling of 3D Human Motions. In CVPR, pages 1900–1910,

[23] Haoyu Han, Mengdi Zhang, Min Hou, Fuzheng Zhang, Zhongyuan Wang, Enhong Chen, Hongwei Wang, Jianhui Ma, and Qi Liu. STGCN: A Spatial-Temporal Aware Graph Learning Method for POI Recommendation. In ICDM, pages 1052–1057, 2020.

[24] Yinghao Huang, Manuel Kaufmann, Emre Aksan, Michael J. Black, Otmar Hilliges, and Gerard Pons-Moll. Deep inertial poser: Learning to reconstruct human pose from sparse inertial measurements in real time. ACM TOG, 37(6):185:1–185:15, 2018.

[25] Deok-Kyeong Jang, Dongseok Yang, Deok-Yun Jang, Byeoli Choi, Taeil Jin, and Sung-Hee Lee. MOVIN: Real-time Motion Capture using a Single LiDAR. Computer Graphics Forum, 42(7):e14961, 2023.

[26] Jaewoo Jeong, Daehee Park, and Kuk-Jin Yoon. Multi-agent Long-term 3D Human Pose Forecasting via Interaction-aware Trajectory Conditioning. In CVPR, pages 1617–1628,

[27] Angjoo Kanazawa, Michael J Black, David W Jacobs, and Jitendra Malik. End-to-end recovery of human shape and pose. In CVPR, pages 7122–7131, 2018.

[28] Angjoo Kanazawa, Jason Y. Zhang, Panna Felsen, and Jitendra Malik. Learning 3d human dynamics from video. In CVPR, 2019.

[29] Robert M. Kanko, Elise K. Laende, Elysia M. Davis, W. Scott Selbie, and Kevin J. Deluzio. Concurrent assessment of gait kinematics using marker-based and markerless motion capture. Journal of Biomechanics, 127:110665,

[30] Jeonghwan Kim, Mi-Gyeong Gwon, Hyunwoo Park, Hyukmin Kwon, Gi-Mun Um, and Wonjun Kim. Sampling is matter: Point-guided 3d human mesh reconstruction. In CVPR, pages 12880–12889, 2023.
[31] Muhammed Kocabas, Nikos Athanasiou, and Michael J. Black. VIBE: Video Inference for Human Body Pose and Shape Estimation. In CVPR, pages 5252–5262, 2020.

[32] François-Guillaume Landry and Moulay A. Akhloufi. Predicting Pedestrian Crossing Intention in Autonomous Vehicles: A Review. Neurocomputing, 618:129105, 2025.

[33] Jiefeng Li, Chao Xu, Zhicun Chen, Siyuan Bian, Lixin Yang, and Cewu Lu. HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation. In CVPR, pages 3383–3393, 2021.

[34] Jialian Li, Jingyi Zhang, Zhiyong Wang, Siqi Shen, Chenglu Wen, Yuexin Ma, Lan Xu, Jingyi Yu, and Cheng Wang. LiDARCap: Long-range Markerless 3D Human Motion Capture with LiDAR Point Clouds. In CVPR, pages 20470–20480, 2022.

[35] Jiefeng Li, Siyuan Bian, Qi Liu, Jiasheng Tang, Fan Wang, and Cewu Lu. NIKI: Neural Inverse Kinematics with Invertible Neural Networks for 3D Human Pose and Shape Estimation. In CVPR, pages 12933–12942, 2023.

[36] Jiefeng Li, Siyuan Bian, Chao Xu, Zhicun Chen, Lixin Yang, and Cewu Lu. HybrIK-X: Hybrid Analytical-Neural Inverse Kinematics for Whole-Body Mesh Recovery. IEEE TPAMI, 47(4):2754–2769, 2025.

[37] Zhihao Li, Jianzhuang Liu, Zhensong Zhang, Songcen Xu, and Youliang Yan. CLIFF: Carrying Location Information in Full Frames into Human Pose and Shape Estimation. In ECCV, pages 590–606, 2022.

[38] Jiawei Lian, Xia Du, Jianghua Liu, Le Hui, and Jian Yang. Cross-Modal Driven Object Restoration for 3D Point Cloud Backdoor Defense. IEEE Transactions on Information Forensics and Security, 20:11006–11018, 2025.

[39] Kevin Lin, Lijuan Wang, and Zicheng Liu. End-to-End Human Pose and Mesh Reconstruction with Transformers. In CVPR, pages 1954–1963, 2021.

[40] Xiaotong Lin, Tianming Liang, Jianhuang Lai, and Jian-Fang Hu. Progressive Pretext Task Learning for Human Trajectory Prediction. In ECCV, pages 197–214, 2025.
[41] Guanze Liu, Yu Rong, and Lu Sheng. VoteHMR: Occlusion-Aware Voting Network for Robust 3D Human Mesh Recovery from Partial Point Clouds. In ACM MM, pages 955–964,

[42] Yebin Liu, Juergen Gall, Carsten Stoll, Qionghai Dai, Hans-Peter Seidel, and Christian Theobalt. Markerless motion capture of multiple characters using multiview image segmentation. IEEE TPAMI, 35(11):2720–2735, 2013.

[43] Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J. Black. SMPL: A skinned multi-person linear model. ACM TOG, 34(6):248:1–248:16, 2015.

[44] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization, 2019.

[45] Tiezheng Ma, Yongwei Nie, Chengjiang Long, Qing Zhang, and Guiqing Li. Progressively Generating Better Initial Guesses Towards Next Stages for High-Quality Human Motion Prediction. In CVPR, pages 6437–6446, 2022.

[46] Gyeongsik Moon, Ju Yong Chang, and Kyoung Mu Lee. V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation From a Single Depth Map. In CVPR, pages 5079–5088, 2018.

[47] Barza Nisar and Steven L. Waslander. PSA-SSL: Pose and Size-aware Self-Supervised Learning on LiDAR Point Clouds. In CVPR, pages 6670–6679, 2025.

[48] Ashok Kumar Patil, Adithya Balasubramanyam, Jae Yeong Ryu, Pavan Kumar B N, Bharatesh Chakravarthi, and Young Ho Chai. Fusion of Multiple Lidars and Inertial Sensors for the Real-Time Pose Tracking of Human Motion. Sensors, 20(18):5342, 2020.

[49] Marco Pesavento, Yuanlu Xu, Nikolaos Sarafianos, Robert Maier, Ziyan Wang, Chun-Han Yao, Marco Volino, Edmond Boyer, Adrian Hilton, and Tony Tung. ANIM: Accurate Neural Implicit Model for Human Reconstruction from a single RGB-D Image. In CVPR, pages 5448–5458, 2024.

[50] Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In NeurIPS, 2017.
[51]

A conditional denoising diffusion proba- bilistic model for point cloud upsampling

Wentao Qu, Yuantian Shao, Lingwu Meng, Xiaoshui Huang, and Liang Xiao. A conditional denoising diffusion proba- bilistic model for point cloud upsampling. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20786–20795, 2024. 2

work page 2024
[52]

Robust single-stage fully sparse 3d object detection via detachable latent diffusion.arXiv preprint arXiv:2508.03252, 2025

Wentao Qu, Guofeng Mei, Jing Wang, Yujiao Wu, Xiaoshui Huang, and Liang Xiao. Robust single-stage fully sparse 3d object detection via detachable latent diffusion.arXiv preprint arXiv:2508.03252, 2025. 3

work page arXiv 2025
[53]

An end-to-end robust point cloud semantic segmentation network with single-step conditional diffusion models

Wentao Qu, Jing Wang, YongShun Gong, Xiaoshui Huang, and Liang Xiao. An end-to-end robust point cloud semantic segmentation network with single-step conditional diffusion models. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 27325–27335, 2025. 3

work page 2025
[54]

LiDAR- aid Inertial Poser: Large-scale Human Motion Capture by Sparse Inertial and LiDAR Sensors.IEEE TVCG, 29(5): 2337–2347, 2023

Yiming Ren, Chengfeng Zhao, Yannan He, Peishan Cong, Han Liang, Jingyi Yu, Lan Xu, and Yuexin Ma. LiDAR- aid Inertial Poser: Large-scale Human Motion Capture by Sparse Inertial and LiDAR Sensors.IEEE TVCG, 29(5): 2337–2347, 2023. 1, 2, 5, 6

[55]

Yiming Ren, Xiao Han, Yichen Yao, Xiaoxiao Long, Yujing Sun, and Yuexin Ma. LiveHPS++: Robust and Coherent Motion Capture in Dynamic Free Environment. In ECCV, pages 127–144, 2024. 2, 3, 5, 6, 7

[56]

Yiming Ren, Xiao Han, Chengfeng Zhao, Jingya Wang, Lan Xu, Jingyi Yu, and Yuexin Ma. LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment. In CVPR, pages 1281–1291, 2024. 2, 3, 5, 6, 7

[57]

Helge Rhodin, Christian Richardt, Dan Casas, Eldar Insafutdinov, Mohammad Shafiei, Hans-Peter Seidel, Bernt Schiele, and Christian Theobalt. EgoCap: Egocentric marker-less motion capture with two fisheye cameras. ACM TOG, 35(6):162:1–162:11, 2016. 2

[58]

Srinath Sridhar, Franziska Mueller, Antti Oulasvirta, and Christian Theobalt. Fast and robust hand tracking using detection-guided optimization. In CVPR, pages 3213–3221,

[59]

Justin Studer, Dhruv Agrawal, Dominik Borer, Seyedmorteza Sadat, Robert W. Sumner, Martin Guay, and Jakob Buhmann. Factorized Motion Diffusion for Precise and Character-Agnostic Motion Inbetweening. In ACM TOG, pages 1–10, 2024. 3

[60]

Caiyi Sun, Yujing Sun, Xiao Han, Zemin Yang, Jiawei Liu, Xinge Zhu, Siu Ming Yiu, and Yuexin Ma. HUMOF: Human Motion Forecasting in Interactive Social Scenes, 2025. 3

[61]

Yu Sun, Qian Bao, Wu Liu, Yili Fu, Michael J. Black, and Tao Mei. Monocular, One-Stage, Regression of Multiple 3D People. In ICCV, pages 11179–11188, 2021. 3

[62]

Jiangnan Tang, Jingya Wang, Kaiyang Ji, Lan Xu, Jingyi Yu, and Ye Shi. A Unified Diffusion Framework for Scene-aware Human Motion Estimation from Sparse Signals. In CVPR, pages 21251–21262, 2024. 1

[63]

Guy Tevet, Sigal Raab, Brian Gordon, Yonatan Shafir, Daniel Cohen-Or, and Amit H. Bermano. Human Motion Diffusion Model, 2022. 3

[64]

Keyu Tian, Yi Jiang, Zehuan Yuan, Bingyue Peng, and Liwei Wang. Visual autoregressive modeling: Scalable image generation via next-scale prediction. Adv. Neural Inf. Process. Syst., 37:84839–84865, 2024. 4, 5

[65]

Marcus Valtonen Örnhag, Patrik Persson, Mårten Wadenbäck, Kalle Åström, and Anders Heyden. Trust Your IMU: Consequences of Ignoring the IMU Drift. In CVPR, pages 4467–4476, 2022. 2

[66]

Aaron van den Oord, Oriol Vinyals, and Koray Kavukcuoglu. Neural Discrete Representation Learning, 2018. 3

[67]

Gül Varol, Duygu Ceylan, Bryan Russell, Jimei Yang, Ersin Yumer, Ivan Laptev, and Cordelia Schmid. BodyNet: Volumetric Inference of 3D Human Body Shapes. In ECCV, pages 20–38, 2018. 2

[68]

Edward Vendrow, Duy Tho Le, Jianfei Cai, and Hamid Rezatofighi. JRDB-Pose: A Large-Scale Dataset for Multi-Person Pose Estimation and Tracking. In CVPR, pages 4811–4820, 2023. 1

[69]

Vicon. Vicon motion capture systems. https://www.vicon.com/, 2010. Accessed: 2025-09-08. 2

[70]

Daniel Vlasic, Rolf Adelsberger, Giovanni Vannucci, John Barnwell, Markus Gross, Wojciech Matusik, and Jovan Popović. Practical motion capture in everyday surroundings. ACM TOG, 26(3):35–es, 2007. 2

[71]

Songpengcheng Xia, Yu Zhang, Zhuo Su, Xiaozheng Zheng, Zheng Lv, Guidong Wang, Yongjie Zhang, Qi Wu, Lei Chu, and Ling Pei. EnvPoser: Environment-aware Realistic Human Motion Estimation from Sparse Observations with Uncertainty Modeling. In CVPR, pages 1839–1849, 2025. 1

[72]

Lan Xu, Yebin Liu, Wei Cheng, Kaiwen Guo, Guyue Zhou, Qionghai Dai, and Lu Fang. FlyCap: Markerless Motion Capture Using Multiple Autonomous Flying Cameras. IEEE TVCG, 24(8):2284–2297, 2018. 2

[73]

Ming Yan, Xin Wang, Yudi Dai, Siqi Shen, Chenglu Wen, Lan Xu, Yuexin Ma, and Cheng Wang. CIMI4D: A Large Multimodal Climbing Motion Dataset under Human-scene Interactions. In CVPR, pages 12977–12988, 2023. 3

[74]

Xinyu Yi, Yuxiao Zhou, and Feng Xu. TransPose: Real-time 3D human translation and pose estimation with six inertial sensors. ACM TOG, 40(4):86:1–86:13, 2021. 2

[75]

Wang Zeng, Wanli Ouyang, Ping Luo, Wentao Liu, and Xiaogang Wang. 3d human mesh regression with dense correspondence. In CVPR, 2020. 2

[76]

Hongwen Zhang, Yating Tian, Xinchi Zhou, Wanli Ouyang, Yebin Liu, Limin Wang, and Zhenan Sun. PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop. In ICCV, pages 11426–11436,

[77]

Hongwen Zhang, Jie Cao, Guo Lu, Wanli Ouyang, and Zhenan Sun. Learning 3D Human Shape and Pose From Dense Body Parts. IEEE TPAMI, 44(5):2610–2627, 2022. 2

[78]

Hongliang Zhang, Xiaoqi An, Jiawei Lian, Lei Luo, and Jian Yang. CoMPR: Efficient point cloud dataset condensation via bidirectional matching and point recycling. Pattern Recognition, 172:112494, 2026. 2

[79]

Jingyi Zhang, Qihong Mao, Guosheng Hu, Siqi Shen, and Cheng Wang. Neighborhood-Enhanced 3D Human Pose Estimation with Monocular LiDAR in Long-Range Outdoor Scenes. AAAI, 38(7):7169–7177, 2024. 2, 5, 6

[80]

Jingyi Zhang, Qihong Mao, Siqi Shen, Chenglu Wen, Lan Xu, and Cheng Wang. LiDARCapV2: 3D human pose estimation with human–object interaction from LiDAR point clouds. Pattern Recognition, 156:110848, 2024. 2


