arXiv: 2605.22631 · v1 · pith:K2OHEBOQ (new) · submitted 2026-05-21 · 💻 cs.CV

AtomicMotion: Learning Human Motion From Different Human Parts

Pith reviewed 2026-05-22 06:26 UTC · model grok-4.3

classification 💻 cs.CV
keywords human motion reconstruction · sparse pose estimation · body partitioning · kinematic attention · full-body synthesis · AR/VR telepresence · biomechanical constraints · AMASS dataset

The pith

AtomicMotion reconstructs full-body poses from sparse head and hand data by splitting the skeleton into five functional clusters and embedding kinematic structure into attention.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the problem of turning limited head and hand movement data into accurate full-body poses for AR/VR systems. It claims that treating the whole body as one unit causes accumulated errors and unnatural coordination because it misses local movement goals and body structure. To fix this, AtomicMotion splits the skeleton into five clusters chosen by how body parts are meant to move together, trains the model with masked full-body examples so it learns overall constraints, and replaces standard attention with a version that respects the fixed connections between joints. A reader would care if this makes virtual telepresence feel more natural without needing sensors on every joint.

Core claim

AtomicMotion decouples body dynamics in three steps. First, it decomposes the skeleton into five distinct clusters based on functional intent, so that each cluster keeps its internal joint synergies while isolating local motion primitives. Second, it applies masked full-body pre-conditioning during training to force the model to internalize global skeletal topology and latent kinematic constraints. Third, it replaces vanilla spatial attention with Kinematic Attention, which embeds the classical kinematic tree structure to ensure biological plausibility. The paper reports that this yields higher reconstruction fidelity and superior biomechanical realism on the AMASS dataset.

What carries the argument

AtomicMotion framework built around logical body partitioning into five functional clusters, masked full-body pre-conditioning, and Kinematic Attention that injects the fixed kinematic tree into the attention computation.
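One way to picture "injecting the fixed kinematic tree into the attention computation" is an attention mask derived from parent-child links. The sketch below is an illustration under stated assumptions, not the paper's dual-branch Kinematic Attention: the toy six-joint tree, the `hops` neighbourhood, and the names `tree_mask` and `masked_attention` are all hypothetical.

```python
import numpy as np

# Hypothetical toy skeleton: PARENTS[j] is the parent of joint j, -1 marks the root.
PARENTS = [-1, 0, 1, 1, 0, 4]

def tree_mask(parents, hops=1):
    """Boolean (J, J) mask: True where two joints are within `hops` tree edges."""
    n = len(parents)
    adj = np.eye(n, dtype=bool)  # every joint may attend to itself
    for child, parent in enumerate(parents):
        if parent >= 0:
            adj[child, parent] = adj[parent, child] = True
    reach = adj.copy()
    for _ in range(hops - 1):
        reach = (reach.astype(int) @ adj.astype(int)) > 0
    return reach

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention where masked-out joint pairs get zero weight."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(len(PARENTS), 8))  # one feature vector per joint
mask = tree_mask(PARENTS, hops=1)
out, weights = masked_attention(x, x, x, mask)
```

With `hops=1` each joint attends only to itself, its parent, and its children; raising `hops` widens the neighbourhood along the tree. A hard mask is only one design option, and a learned bias added to the scores would be a softer alternative.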

If this is right

  • Local motion primitives can be learned separately in each cluster without breaking overall coordination when the partitions respect functional intent.
  • Masked pre-conditioning teaches the network to fill in missing body parts from sparse signals by internalizing global constraints.
  • Kinematic Attention prevents generation of physiologically impossible joint configurations by enforcing the fixed tree structure.
  • The combination yields measurably higher fidelity and biomechanical realism than monolithic baselines on standard motion datasets.
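The masked pre-conditioning idea in the second bullet can be sketched as follows. This is a minimal illustration, assuming a 22-joint body in which the head and both wrists are the always-visible sparse inputs; the joint indices, the `mask_ratio` parameter, and the zero-filling of hidden joints are assumptions made for the example, not details taken from the paper.

```python
import numpy as np

NUM_JOINTS = 22
SPARSE_JOINTS = (15, 20, 21)  # assumed head / left wrist / right wrist indices

def mask_full_body(pose, mask_ratio, rng):
    """Hide a random subset of the non-sparse joints.

    pose: (J, D) array of per-joint features.
    Returns the masked pose and a boolean (J,) array, True where a joint was hidden.
    """
    candidates = np.array([j for j in range(pose.shape[0]) if j not in SPARSE_JOINTS])
    n_hide = int(round(mask_ratio * len(candidates)))
    hidden_idx = rng.choice(candidates, size=n_hide, replace=False)
    hidden = np.zeros(pose.shape[0], dtype=bool)
    hidden[hidden_idx] = True
    masked = pose.copy()
    masked[hidden] = 0.0  # assumption: hidden joints are zero-filled
    return masked, hidden

def masked_reconstruction_loss(pred, target, hidden):
    """Mean squared error computed over the hidden joints only."""
    return float(np.mean((pred[hidden] - target[hidden]) ** 2))

rng = np.random.default_rng(0)
pose = rng.normal(size=(NUM_JOINTS, 6))  # e.g. six rotation features per joint
masked, hidden = mask_full_body(pose, mask_ratio=0.5, rng=rng)
loss = masked_reconstruction_loss(masked, pose, hidden)
```

Here the "prediction" is just the masked input, so the loss is non-zero; a trained model would be asked to drive it down by inferring the hidden joints from the visible ones.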

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same cluster-based split could be tested on other sparse-input problems such as full-body prediction from only foot or torso data.
  • If the method scales, it could lower the sensor count needed for convincing VR telepresence sessions.
  • Dynamic re-clustering during inference might further improve results for activities that change which body parts move together.
  • Integration with physics simulators could add collision and balance checks that the current kinematic attention does not explicitly enforce.

Load-bearing premise

That decomposing the skeleton into five clusters based on functional intent preserves internal joint synergies while isolating local motion primitives.
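To make the premise concrete, the sketch below defines one possible five-way split and lists the kinematic-tree edges that cross cluster boundaries, which are the dependencies any global mechanism would have to restore. The cluster membership shown is hypothetical, since the text above does not state the paper's actual assignment; the parent array follows the commonly used ordering of the first 22 SMPL body joints, which is also an assumption here.

```python
# Parent index of each of the 22 SMPL body joints (-1 marks the root).
# Assumed ordering: pelvis, hips, spine1, knees, spine2, ankles, spine3,
# feet, neck, collars, head, shoulders, elbows, wrists.
PARENTS = [-1, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 9, 12, 13, 14, 16, 17, 18, 19]

# Hypothetical five-way split, for illustration only.
CLUSTERS = {
    "trunk_head": [0, 3, 6, 9, 12, 15],
    "left_arm": [13, 16, 18, 20],
    "right_arm": [14, 17, 19, 21],
    "left_leg": [1, 4, 7, 10],
    "right_leg": [2, 5, 8, 11],
}

def check_partition(clusters, num_joints):
    """Return True if every joint belongs to exactly one cluster."""
    members = sorted(j for joints in clusters.values() for j in joints)
    return members == list(range(num_joints))

def cross_cluster_edges(clusters, parents):
    """List (parent, child) tree edges whose endpoints lie in different clusters."""
    owner = {j: name for name, joints in clusters.items() for j in joints}
    return [(p, c) for c, p in enumerate(parents) if p >= 0 and owner[p] != owner[c]]

is_partition = check_partition(CLUSTERS, len(PARENTS))
boundary = cross_cluster_edges(CLUSTERS, PARENTS)
```

For this particular split, `boundary` contains four edges (pelvis to each hip, upper spine to each collar), so only four direct parent-child links are cut. Coordination that spans clusters without a direct link, such as hand-to-head reaching, is not visible in this count.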

What would settle it

If a controlled test on AMASS motions shows that AtomicMotion produces joint angles or coordination patterns that violate known physiological limits more often than a single-model baseline, or if removing the five-cluster split causes no drop in fidelity, the central claim would fail.
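The first half of this test can be framed as a violation-rate comparison between two models. The following is a minimal sketch, assuming joint angles are available in degrees with per-joint lower and upper limits; the limit values and the two small prediction arrays are made up for illustration and are not physiological reference data.

```python
import numpy as np

def violation_rate(angles_deg, lower_deg, upper_deg):
    """Fraction of (frame, joint) angle samples outside [lower, upper].

    angles_deg: (T, J) array of joint angles in degrees.
    lower_deg, upper_deg: (J,) per-joint limits.
    """
    outside = (angles_deg < lower_deg) | (angles_deg > upper_deg)
    return float(outside.mean())

# Hypothetical flexion limits for two joints, for illustration only.
lower = np.array([0.0, 0.0])
upper = np.array([150.0, 145.0])

# Made-up predictions from two models over three frames.
model_a = np.array([[10.0, 20.0], [160.0, 30.0], [40.0, 50.0]])
model_b = np.array([[10.0, 20.0], [140.0, 30.0], [40.0, 50.0]])

rate_a = violation_rate(model_a, lower, upper)  # one of six samples is out of range
rate_b = violation_rate(model_b, lower, upper)  # no sample is out of range
```

A real comparison would run both models on the same held-out sequences and would probably use pose-conditioned limits rather than fixed per-joint bounds.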

Figures

Figures reproduced from arXiv: 2605.22631 by Chuhua Xian, Fa-Ting Hong, Runzhen Liu.

Figure 1: We introduce the Structured Intent Atomization philosophy to conceptualize complex ...
Figure 2: Illustration of our method framework. Masked Pose Modeling Guidance helps the model ...
Figure 3: Illustration of Global Synchronized Modulation and Temporal-Kinematic Block.
Figure 4: Illustration of dual-branch Kinematic Attention.
Figure 5: Visual comparisons on the AMASS-P1 in the standard mode.
Figure 6: Visual comparisons on the AMASS-P1 in the hand tracking mode.
Figure 7: Visual comparison of ablation studies on the AMASS-P1.
Original abstract

Accurately reconstructing full-body poses from sparse head and hand trajectories is a foundational challenge for immersive AR/VR telepresence. Current methods often struggle with error accumulation and unnatural joint coordination, primarily because they treat the human body as a monolithic entity, thereby failing to capture the fine-grained "atomic intents" embedded in subtle signal variations and overlooking the inherent structural topology. To bridge this gap, we present AtomicMotion, a framework designed to decouple and re-integrate body dynamics through three core innovations. First, we introduce a logical body partitioning scheme that decomposes the skeleton into five distinct clusters based on functional intent; this ensures that each partition preserves internal joint synergies while isolating local motion primitives. Second, to robustly map sparse inputs to high-dimensional poses, we employ a masked full-body pre-conditioning strategy during training, forcing the model to internalize global skeletal topology and latent kinematic constraints. Finally, addressing the limitations of vanilla spatial attention, which often ignores fixed physiological connectivity, we propose Kinematic Attention. By embedding the classical kinematic tree structure into the attention mechanism, we ensure biological plausibility in the synthesized motions. Extensive evaluations on the AMASS dataset demonstrate that AtomicMotion significantly outperforms existing baselines, yielding higher reconstruction fidelity and superior biomechanical realism.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents AtomicMotion, a framework for reconstructing full-body human poses from sparse head and hand trajectories. It introduces three innovations: a logical partitioning of the skeleton into five clusters based on functional intent to isolate local motion primitives while preserving internal synergies; a masked full-body pre-conditioning strategy during training to enforce global skeletal topology; and Kinematic Attention that embeds the classical kinematic tree into the attention mechanism for biological plausibility. The authors claim that these yield significantly higher reconstruction fidelity and biomechanical realism than existing baselines on the AMASS dataset.

Significance. If the central claims are substantiated with quantitative evidence, the work could advance AR/VR telepresence by addressing error accumulation and unnatural coordination through explicit incorporation of functional partitioning and kinematic constraints. The Kinematic Attention mechanism represents a concrete attempt to inject domain structure into attention, which is a potential strength. The manuscript does not report machine-checked proofs, reproducible code, or parameter-free derivations.

major comments (2)
  1. [Abstract] Abstract: The claim that AtomicMotion 'significantly outperforms existing baselines, yielding higher reconstruction fidelity and superior biomechanical realism' is stated without any quantitative metrics, ablation results, error bars, comparison tables, or figures. This absence is load-bearing because it prevents any evaluation of the central empirical claim.
  2. [Method overview / logical body partitioning] Logical body partitioning (described in the abstract and method overview): The five-cluster decomposition based on functional intent is introduced without derivation, optimality argument, or empirical test demonstrating that cross-cluster synergies (e.g., coordinated arm-torso rotation or hand-to-head reaching) are preserved. Because masked pre-conditioning and Kinematic Attention are applied only after this fixed partition, any loss of dependencies at this step directly undermines the biomechanical-realism claim.
minor comments (2)
  1. [Abstract] The term 'atomic intents' is used without a precise definition or link to the subsequent clustering.
  2. [Experiments] Ensure that all baseline methods referenced in the (unseen) experimental section are accompanied by explicit citations and implementation details for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment point by point below. Where the comments identify areas needing greater substantiation or clarity, we have revised the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The claim that AtomicMotion 'significantly outperforms existing baselines, yielding higher reconstruction fidelity and superior biomechanical realism' is stated without any quantitative metrics, ablation results, error bars, comparison tables, or figures. This absence is load-bearing because it prevents any evaluation of the central empirical claim.

    Authors: We agree that the abstract would benefit from including concrete quantitative support for the performance claims. The main body of the manuscript already contains detailed comparisons, ablation studies, error bars, and tables on the AMASS dataset. In the revised version we have updated the abstract to report key metrics (e.g., MPJPE reductions relative to baselines) and to explicitly reference the supporting tables and figures. This change makes the central empirical claims directly evaluable from the abstract.

    Revision: yes

  2. Referee: [Method overview / logical body partitioning] Logical body partitioning (described in the abstract and method overview): The five-cluster decomposition based on functional intent is introduced without derivation, optimality argument, or empirical test demonstrating that cross-cluster synergies (e.g., coordinated arm-torso rotation or hand-to-head reaching) are preserved. Because masked pre-conditioning and Kinematic Attention are applied only after this fixed partition, any loss of dependencies at this step directly undermines the biomechanical-realism claim.

    Authors: The partitioning into five clusters is motivated by standard functional and anatomical groupings used in biomechanics and animation literature, chosen to isolate local motion primitives while the subsequent global mechanisms (masked full-body pre-conditioning and Kinematic Attention operating over the full kinematic tree) are intended to restore cross-cluster coordination. We acknowledge that the original text provided limited explicit justification and validation for this choice. In the revision we have added a dedicated paragraph deriving the clusters from functional intent, citing relevant biomechanical references, and presenting an ablation study that compares our partitioning against random and alternative clusterings on metrics of coordinated motion (e.g., arm-torso and hand-head correlations). These additions demonstrate that the chosen partition, combined with the global components, preserves the cited synergies.

    Revision: yes
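The two kinds of metric named in the responses, MPJPE and a correlation-based coordination score, can be sketched as below. The MPJPE function follows the standard definition (mean Euclidean distance between predicted and ground-truth joint positions). The speed-correlation function is only one plausible reading of "hand-head correlations"; the rebuttal does not say how those are computed, so that part is an assumption, as are the function names and the synthetic data.

```python
import numpy as np

def mpjpe(pred, target):
    """Mean per-joint position error: average Euclidean distance over frames and joints.

    pred, target: (T, J, 3) arrays of joint positions in the same units.
    """
    return float(np.linalg.norm(pred - target, axis=-1).mean())

def speed_correlation(traj_a, traj_b):
    """Pearson correlation between the frame-to-frame speeds of two joints.

    traj_a, traj_b: (T, 3) position trajectories.
    """
    speed_a = np.linalg.norm(np.diff(traj_a, axis=0), axis=-1)
    speed_b = np.linalg.norm(np.diff(traj_b, axis=0), axis=-1)
    return float(np.corrcoef(speed_a, speed_b)[0, 1])

# Synthetic check for MPJPE: every joint is offset by 0.03 units along x.
target = np.zeros((2, 3, 3))
pred = target.copy()
pred[..., 0] = 0.03
error = mpjpe(pred, target)

# Synthetic check for coordination: the "head" moves in step with the "hand"
# at half the amplitude, so their speeds are perfectly correlated.
t = np.linspace(0.0, 1.0, 50)[:, None]
hand = np.hstack([t**2, np.zeros_like(t), np.zeros_like(t)])
head = 0.5 * hand
corr = speed_correlation(hand, head)
```

In an ablation of the kind described, the correlation would be computed on both the ground-truth and the reconstructed motion, and the gap between the two compared across partitioning schemes.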

Circularity Check

0 steps flagged

No significant circularity in the claimed derivation chain.

full rationale

The paper introduces AtomicMotion via three explicit design choices—a logical partitioning of the skeleton into five functional clusters, masked full-body pre-conditioning, and Kinematic Attention—each presented as an innovation to address stated limitations of monolithic modeling. Performance claims rest on empirical results on the AMASS dataset rather than any derivation that reduces the reported gains in fidelity or biomechanical realism to a fitted parameter or self-referential definition. No equations appear that equate a prediction to its own input by construction, and no load-bearing premise is justified solely by a self-citation whose content is itself unverified. The central assertions therefore remain independent of the target outcomes.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The framework rests on the domain assumption that functional clustering of joints is meaningful and on the introduction of a new attention variant whose benefits are asserted rather than derived from first principles.

axioms (1)
  • domain assumption: The human skeleton can be decomposed into five distinct clusters based on functional intent that preserve internal joint synergies while isolating local motion primitives.
    Presented as the first core innovation in the abstract.
invented entities (1)
  • Kinematic Attention (no independent evidence)
    purpose: Embedding the classical kinematic tree structure into the attention mechanism to enforce biological plausibility.
    Introduced to address limitations of vanilla spatial attention.

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages

  [1] Jiaxi Jiang, Paul Streli, Huajian Qiu, Andreas Fender, Larissa Laich, Patrick Snape, and Christian Holz. Avatarposer: Articulated full-body pose tracking from sparse motion sensing. In ECCV, 2022.

  [2] Jiaxi Jiang, Paul Streli, Manuel Meier, and Christian Holz. Egoposer: Robust real-time egocentric pose estimation from sparse and intermittent observations everywhere. In ECCV, 2024.

  [3] Xiaozheng Zheng, Zhuo Su, Chao Wen, Zhou Xue, and Xiaojie Jin. Realistic full-body tracking from sparse observations via joint-level modeling. In ICCV, 2023.

  [4] Han Feng, Wenchao Ma, Quankai Gao, Xianwei Zheng, Nan Xue, and Huijuan Xu. Stratified avatar generation from sparse observations. In CVPR, 2024.

  [5] David A Winter. Biomechanics and motor control of human movement. John Wiley & Sons, 2009.

  [6] Roger M Enoka. Neuromechanics of human movement. Human Kinetics, 2008.

  [7] Vladimir M Zatsiorsky. Kinetics of human motion. Human Kinetics, 2002.

  [8] Yong Du, Wei Wang, and Liang Wang. Hierarchical recurrent neural network for skeleton based action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1110–1118, 2015.

  [9] Linjiang Huang, Yan Huang, Wanli Ouyang, and Liang Wang. Part-level graph convolutional network for skeleton-based action recognition. In AAAI, 2020.

  [10] Maosen Li, Siheng Chen, Zijing Zhang, Lingxi Xie, Qi Tian, and Ya Zhang. Skeleton-parted graph scattering networks for 3d human motion prediction. In ECCV, 2022.

  [11] Naureen Mahmood, Nima Ghorbani, Nikolaus F Troje, Gerard Pons-Moll, and Michael J Black. Amass: Archive of motion capture as surface shapes. In ICCV, 2019.

  [12] Andrew Goldenberg, Beno Benhabib, and Robert Fenton. A complete generalized solution to the inverse kinematics of robots. IEEE Journal on Robotics and Automation, 2003.

  [13] RootMotion. Final IK, 2018. URL https://assetstore.unity.com/packages/tools/animation/final-ik-14290

  [14] Mathias Parger, Joerg H Mueller, Dieter Schmalstieg, and Markus Steinberger. Human upper-body inverse kinematics for increased embodiment in consumer-grade virtual reality. In Proceedings of the 24th ACM Symposium on Virtual Reality Software and Technology, 2018.

  [15] Yinghao Huang, Manuel Kaufmann, Emre Aksan, Michael J Black, Otmar Hilliges, and Gerard Pons-Moll. Deep inertial poser: Learning to reconstruct human pose from sparse inertial measurements in real time. TOG, 2018.

  [16] Yifeng Jiang, Yuting Ye, Deepak Gopinath, Jungdam Won, Alexander W Winkler, and C Karen Liu. Transformer inertial poser: Real-time human motion reconstruction from sparse imus with simultaneous terrain generation. In SIGGRAPH Asia, 2022.

  [17] Timo von Marcard, Bodo Rosenhahn, Michael J Black, and Gerard Pons-Moll. Sparse inertial poser: Automatic 3d human pose estimation from sparse imus. In Computer Graphics Forum. Wiley Online Library, 2017.

  [18] Xinyu Yi, Yuxiao Zhou, and Feng Xu. Transpose: Real-time 3d human translation and pose estimation with six inertial sensors. TOG, 2021.

  [19] Xinyu Yi, Yuxiao Zhou, Marc Habermann, Soshi Shimada, Vladislav Golyanik, Christian Theobalt, and Feng Xu. Physical inertial poser (pip): Physics-aware real-time human motion tracking from sparse inertial sensors. In CVPR, 2022.

  [20] Karan Ahuja, Eyal Ofek, Mar Gonzalez-Franco, Christian Holz, and Andrew D Wilson. Coolmoves: User motion accentuation in virtual reality. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2021.

  [21] Dongseok Yang, Doyeon Kim, and Sung-Hee Lee. Lobstr: Real-time lower-body pose prediction from sparse upper-body tracking signals. In Computer Graphics Forum, 2021.

  [22] Andrea Dittadi, Sebastian Dziadzio, Darren Cosker, Ben Lundell, Thomas J Cashman, and Jamie Shotton. Full-body motion from a single head-mounted device: Generating smpl poses from partial observations. In ICCV, 2021.

  [23] Sadegh Aliakbarian, Pashmina Cameron, Federica Bogo, Andrew Fitzgibbon, and Thomas J Cashman. Flag: Flow-based 3d avatar generation from sparse observations. In CVPR, 2022.

  [24] German Barquero, Nadine Bertsch, Manojkumar Marramreddy, Carlos Chacón, Filippo Arcadu, Ferran Rigual, Nicky Sijia He, Cristina Palmero, Sergio Escalera, Yuting Ye, et al. From sparse signal to smooth motion: Real-time motion generation with rolling prediction models. In CVPR, 2025.

  [25] Peng Dai, Yang Zhang, Tao Liu, Zhen Fan, Tianyuan Du, Zhuo Su, Xiaozheng Zheng, and Zeming Li. Hmd-poser: On-device real-time human motion tracking from scalable sparse observations. In CVPR, 2024.

  [26] Alexander Winkler, Jungdam Won, and Yuting Ye. Questsim: Human motion tracking from sparse sensors with simulated avatars. In SIGGRAPH Asia, 2022.

  [27] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. NeurIPS, 2017.

  [28] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. NeurIPS, 2020.

  [29] Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J Black. Smpl: A skinned multi-person linear model. In Seminal Graphics Papers: Pushing the Boundaries, Volume 2, pages 851–866. 2023.

  [30] Yi Zhou, Connelly Barnes, Jingwan Lu, Jimei Yang, and Hao Li. On the continuity of rotation representations in neural networks. In CVPR, 2019.

  [31] Zhan Tong, Yibing Song, Jue Wang, and Limin Wang. Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. NeurIPS, 2022.

  [32] Wentao Zhu, Xiaoxuan Ma, Zhaoyang Liu, Libin Liu, Wayne Wu, and Yizhou Wang. Motionbert: A unified perspective on learning human motion representations. In ICCV, 2023.

  [33] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In ICML, 2008.

  [34] Yuming Du, Robin Kips, Albert Pumarola, Sebastian Starke, Ali Thabet, and Artsiom Sanakoyeu. Avatars grow legs: Generating smooth human motion from sparse tracking inputs with diffusion model. In CVPR, 2023.

  [35] Carnegie Mellon University. CMU MoCap Dataset. URL http://mocap.cs.cmu.edu

  [36] Nikolaus F Troje. Decomposing biological motion: A framework for analysis and synthesis of human gait patterns. Journal of Vision, 2(5):2–2, 2002.

  [37] M Müller, T Röder, M Clausen, B Eberhardt, B Krüger, and A Weber. Documentation mocap database hdm05. Technical Report CG-2007-2, Universität Bonn, 2007.

  [38] Advanced Computing Center for the Arts and Design. ACCAD MoCap Dataset. URL https://accad.osu.edu/research/motion-lab/mocap-system-and-data

  [39] Saeed Ghorbani, Kimia Mahdaviani, Anne Thaler, Konrad Kording, Douglas James Cook, Gunnar Blohm, and Nikolaus F. Troje. MoVi: A large multipurpose motion and video dataset. arXiv preprint arXiv:2003.01888, 2020.

  [40] Christian Mandery, Ömer Terlemez, Martin Do, Nikolaus Vahrenkamp, and Tamim Asfour. The KIT whole-body human motion database. In ICAR, 2015.

  41. [41]

    Unifying rep- resentations and large-scale whole-body motion databases for studying human motion.IEEE Transactions on Robotics, 32(4):796–809, 2016

    Christian Mandery, Ömer Terlemez, Martin Do, Nikolaus Vahrenkamp, and Tamim Asfour. Unifying rep- resentations and large-scale whole-body motion databases for studying human motion.IEEE Transactions on Robotics, 32(4):796–809, 2016

  42. [42]

    The KIT bimanual manipulation dataset

    Franziska Krebs, Andre Meixner, Isabel Patzer, and Tamim Asfour. The KIT bimanual manipulation dataset. InIEEE/RAS International Conference on Humanoid Robots (Humanoids), 2021

  43. [43]

    Eyes JAPAN Co. Ltd. Eyes Japan MoCap Dataset. URLhttp://mocapdata.com

  44. [44]

    Loper, Naureen Mahmood, and Michael J

    Matthew M. Loper, Naureen Mahmood, and Michael J. Black. MoSh: Motion and shape capture from sparse markers.TOG, 2014

  45. [45]

    Ijaz Akhter and Michael J. Black. Pose-conditioned joint angle limits for 3D human pose reconstruction. InCVPR, 2015

  46. [46]

    SFU Motion Capture Database

    Simon Fraser University and National University of Singapore. SFU Motion Capture Database. URL http://mocap.cs.sfu.ca/

  47. [47]

    Total Capture: 3d human pose estimation fusing video and inertial sensors

    Matt Trumble, Andrew Gilbert, Charles Malleson, Adrian Hilton, and John Collomosse. Total Capture: 3d human pose estimation fusing video and inertial sensors. InBMVC, 2017

  48. [48]

    Humaneva: Synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion.International journal of computer vision, 87(1):4–27, 2010

    Leonid Sigal, Alexandru O Balan, and Michael J Black. Humaneva: Synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion.International journal of computer vision, 87(1):4–27, 2010. 12 Appendices A Implementation Details The model stacks 6 TK-Blocks with an embedding dimension of 256 and 8 attention he...