How Noisy Poses Break Inverse Dynamics: Analysis and Mitigation for Video-Based Joint Torque Estimation

Chanyoung Kim; Donghyun Kim; Eunseo Jeong; Seong Jae Hwang; Youngjoong Kwon

arxiv: 2605.24776 · v1 · pith:KBXAY2ILnew · submitted 2026-05-23 · 💻 cs.CV

Pith reviewed 2026-06-30 12:51 UTC · model grok-4.3

classification 💻 cs.CV

keywords inverse dynamics · pose estimation · joint torques · SMPL model · noise amplification · differentiable optimization · video-based estimation

The pith

Pose estimation noise from monocular video is amplified roughly 1000 times when computing joint torques through numerical differentiation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that small errors in 3D body poses estimated from video lead to massive errors in calculated joint torques because differentiation magnifies high-frequency noise. A new differentiable module called SMPL-Dynamics allows the pose estimates to be refined directly to minimize torque error, cutting that error by 93 percent while barely changing the original pose. This matters because accurate torque estimates could support better motion analysis in sports, rehabilitation, and robotics without needing expensive motion capture setups. The work also finds that joints closer to the body center are far more sensitive to noise than those at the limbs, and that simple filtering before differentiation helps a lot.

Core claim

Pose noise is amplified by approximately 1,000x when computing joint torques via numerical differentiation. Proximal joints are up to 10x more sensitive than distal ones. Low-pass filtering before differentiation reduces amplification. SMPL-Dynamics enables end-to-end differentiable inverse dynamics for the SMPL model, and differentiable pose refinement using it reduces torque error by 93% with negligible pose change.
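
As a rough illustration of where an amplification of this order can come from, the sketch below differentiates a noisy synthetic joint angle twice with central differences. The 30 fps frame rate, the 1 Hz sinusoid, and the helper name `second_difference` are assumptions made for illustration, not details from the paper; only σ = 0.05 rad is taken from the Figure 1 caption. For independent per-frame noise, the second difference has standard deviation √6·σ/dt², so the acceleration noise gain is set by the frame rate alone.

```python
import numpy as np

def second_difference(theta, dt):
    """Central second difference: (theta[t+1] - 2*theta[t] + theta[t-1]) / dt**2."""
    return (theta[2:] - 2.0 * theta[1:-1] + theta[:-2]) / dt**2

rng = np.random.default_rng(0)
fps = 30.0                       # assumed video frame rate
dt = 1.0 / fps
sigma = 0.05                     # pose noise in rad (value quoted for Figure 1)
t = np.arange(1800) * dt         # 60 s of frames
clean = 0.5 * np.sin(2 * np.pi * 1.0 * t)            # hypothetical 1 Hz joint angle
noisy = clean + rng.normal(0.0, sigma, size=t.shape)  # i.i.d. Gaussian pose noise

# Error in angular acceleration caused purely by the injected noise.
acc_err = second_difference(noisy, dt) - second_difference(clean, dt)

measured_gain = acc_err.std() / sigma     # (rad/s^2) per rad of pose noise
predicted_gain = np.sqrt(6.0) / dt**2     # closed form for i.i.d. noise

print(f"measured gain  ~ {measured_gain:.0f}")
print(f"predicted gain ~ {predicted_gain:.0f}")
```

At the assumed 30 fps the closed-form gain is √6 × 900, about 2,200 in rad/s² per rad. The paper's ~1,000x figure is a torque-to-pose ratio that also depends on segment inertias, so the two numbers are not directly comparable; the sketch only shows that double differentiation by itself yields a gain of this order.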

What carries the argument

SMPL-Dynamics, a fully differentiable inverse dynamics module for the SMPL body model that supports gradient computation without external simulators.
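
To make the idea of refining a pose through an inverse-dynamics map concrete, here is a minimal one-degree-of-freedom sketch. It is not the SMPL-Dynamics module: the single joint, the constant inertia, the objective (squared torque magnitude plus a pose regularizer), and plain gradient descent in place of Adam are all assumptions chosen to keep the example short. The 200 iterations and the regularizer weight of 5 echo the Figure 3 caption, though the weight's scale relative to the torque term here is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, n = 1.0 / 30.0, 300          # assumed 30 fps, 10 s clip
inertia = 1.0                    # hypothetical 1-DoF segment inertia (kg m^2)
lam = 5.0                        # pose-regularizer weight (scaling is an assumption)
t = np.arange(n) * dt
theta_clean = 0.5 * np.sin(2 * np.pi * t)
theta_noisy = theta_clean + rng.normal(0.0, 0.05, size=n)

# Second-difference operator as a matrix, so torque = inertia * (D @ theta).
D = np.zeros((n - 2, n))
rows = np.arange(n - 2)
D[rows, rows] = 1.0
D[rows, rows + 1] = -2.0
D[rows, rows + 2] = 1.0
D /= dt**2

def torque(theta):
    """Toy inverse dynamics: torque is inertia times angular acceleration."""
    return inertia * (D @ theta)

def loss_grad(theta):
    """Analytic gradient of ||torque||^2 + lam * ||theta - theta_noisy||^2."""
    return 2.0 * inertia * (D.T @ torque(theta)) + 2.0 * lam * (theta - theta_noisy)

# Step size from an upper bound on the largest Hessian eigenvalue (||D|| <= 4/dt^2).
step = 1.0 / (2.0 * (inertia**2 * 16.0 / dt**4 + lam))

theta = theta_noisy.copy()
for _ in range(200):             # plain gradient descent, not Adam
    theta -= step * loss_grad(theta)

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

err_before = rmse(torque(theta_noisy), torque(theta_clean))
err_after = rmse(torque(theta), torque(theta_clean))
pose_change = rmse(theta, theta_noisy)
print(err_before, err_after, pose_change)
```

Because the torque map is differentiable, the gradient step mostly removes the high-frequency pose components that dominate the torque error, while the regularizer term ties the result to the noisy estimate. In this toy the pose moves by roughly the injected noise level; whether that matches the paper's "negligible pose change" depends on its actual loss, which the reviewed text does not spell out.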

If this is right

Low-pass filtering before numerical differentiation substantially reduces torque error from pose noise.
Proximal joints like the spine and hips require more careful handling than distal joints like wrists.
Differentiable pose refinement can recover accurate torques from noisy video-based poses.
End-to-end optimization becomes possible for tasks involving both pose and dynamics.
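
The first point above can be sketched numerically. The example below low-pass filters a noisy synthetic joint angle before double differentiation and compares the acceleration error with and without filtering. The filter is a stand-in: a zero-phase FFT-domain gain with a Butterworth-shaped magnitude, not necessarily the filter used in the paper, and the 6 Hz cutoff, 30 fps frame rate, and function names are assumptions.

```python
import numpy as np

def lowpass_fft(x, dt, cutoff_hz, order=4):
    """Zero-phase low-pass: scale each FFT bin by a Butterworth-shaped gain."""
    freqs = np.fft.rfftfreq(x.size, d=dt)
    gain = 1.0 / np.sqrt(1.0 + (freqs / cutoff_hz) ** (2 * order))
    return np.fft.irfft(np.fft.rfft(x) * gain, n=x.size)

def second_difference(theta, dt):
    """Central second difference of a uniformly sampled signal."""
    return (theta[2:] - 2.0 * theta[1:-1] + theta[:-2]) / dt**2

rng = np.random.default_rng(0)
dt = 1.0 / 30.0                              # assumed 30 fps
t = np.arange(1800) * dt                     # 60 s, a whole number of 1 Hz cycles
clean = 0.5 * np.sin(2 * np.pi * 1.0 * t)    # hypothetical joint angle
noisy = clean + rng.normal(0.0, 0.05, size=t.shape)

ref = second_difference(clean, dt)
acc_raw = second_difference(noisy, dt)
acc_filt = second_difference(lowpass_fft(noisy, dt, cutoff_hz=6.0), dt)

err_raw = float(np.sqrt(np.mean((acc_raw - ref) ** 2)))
err_filt = float(np.sqrt(np.mean((acc_filt - ref) ** 2)))
print(f"acceleration RMSE without filter: {err_raw:.1f} rad/s^2")
print(f"acceleration RMSE with filter:    {err_filt:.1f} rad/s^2")
```

The cutoff is the trade-off noted in the Figure 2 caption: a lower cutoff removes more noise but starts to attenuate real motion once it approaches the signal's own bandwidth. The clip length is chosen so the test signal is periodic, which avoids edge artifacts from the FFT; real clips would need padding or a time-domain filter.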

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Real-world applications like gait analysis or exoskeleton control could use this refinement to get reliable torque data from ordinary video.
Similar noise amplification issues likely appear in other inverse dynamics pipelines that rely on differentiation of kinematic estimates.
Testing the method on datasets with ground-truth force plate measurements would confirm whether the torque improvements hold in practice.

Load-bearing premise

The paper assumes that the dominant source of error is additive noise on estimated joint positions, and that the SMPL model with standard inverse dynamics equations matches real human motion without external forces or detailed muscle effects.

What would settle it

Compare torque estimates from noisy poses against direct measurements from force plates or torque sensors on the same motions; if the 1000x amplification and 93% reduction do not appear, the claims are falsified.

Figures

Figures reproduced from arXiv: 2605.24776 by Chanyoung Kim, Donghyun Kim, Eunseo Jeong, Seong Jae Hwang, Youngjoong Kwon.

**Figure 1.** Noise amplification in inverse dynamics. Joint torques visualized as color-coded spheres on the SMPL skeleton during walking (green = low, red = high torque magnitude). Top: Clean motion produces uniformly low, physically plausible torques. Bottom: Adding typical video-based pose noise (σ=0.05 rad) causes torques to explode at proximal joints (spine, hips), exceeding 150 Nm, while distal joints remain larg…

**Figure 2.** Noise amplification analysis. (a) Torque error grows approximately linearly with pose noise, with a ∼1,000× amplification factor. The blue band indicates the typical video-based estimation error range (σ ≈ 0.03–0.08 rad). (b) Per-joint sensitivity ranking: proximal joints (spine, hips) are most sensitive due to subtree mass accumulation. (c) Low-pass filter cutoff trade-off: lower cutoffs remove more noise…
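
The "subtree mass accumulation" in panel (b) can be illustrated with a toy chain. The sketch below computes, for each joint of a straight planar chain, the moment of inertia of everything distal to it; the same acceleration noise then produces a larger torque error at joints nearer the root. The three identical links, the point masses at link ends, and the noise level are made-up values for illustration and are not meant to reproduce the paper's 10x ratio.

```python
import numpy as np

# Hypothetical planar chain listed root to tip; each link carries a point mass
# at its far end. Identical links, so any trend comes from chain position only.
link_lengths = np.array([0.3, 0.3, 0.3])   # m
link_masses = np.array([2.0, 2.0, 2.0])    # kg

def subtree_inertia(lengths, masses):
    """Inertia about each joint of all masses distal to it (chain held straight)."""
    n = len(lengths)
    out = np.zeros(n)
    for i in range(n):
        dist = np.cumsum(lengths[i:])          # distance from joint i to each distal mass
        out[i] = np.sum(masses[i:] * dist**2)  # sum of m * r^2 over the subtree
    return out

inertia = subtree_inertia(link_lengths, link_masses)

acc_noise = 100.0                    # assumed acceleration noise std, rad/s^2
torque_noise = inertia * acc_noise   # torque noise if only that joint's acceleration is noisy

for i, (I, tau) in enumerate(zip(inertia, torque_noise)):
    print(f"joint {i}: subtree inertia {I:.2f} kg m^2, torque noise {tau:.0f} Nm")
```

This ignores coupling between joints, gravity, and velocity-dependent terms, all of which a full recursive inverse-dynamics pass would include. It isolates one mechanism: a proximal joint carries the inertia of its whole subtree, so identical kinematic noise is scaled by a larger factor there.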

**Figure 3.** Left hip torque before and after refinement. The optimized trajectory (blue) closely tracks the clean reference (green), while the noisy input (red) oscillates wildly due to noise amplification through inverse dynamics. • Pose regularizer: λ_r‖θ̂ − θ_noisy‖ keeps the refined pose close to the original estimate, preventing large kinematic deviations (λ_r = 5). Using Adam optimization for 200 iterations, torque…

Original abstract

Recent advances in monocular 3D human pose estimation enable accurate body tracking from video. However, translating these kinematic estimates into physical quantities, such as joint torques, remains challenging due to noise amplification through inverse dynamics. In this work, we provide a systematic analysis of how pose estimation noise propagates through the inverse dynamics pipeline. We present three key findings: (1) pose noise is amplified by approximately 1,000x when computing joint torques via numerical differentiation, (2) proximal joints (spine, hips) are up to 10x more sensitive to noise than distal joints (wrists, hands), and (3) low-pass filtering before differentiation substantially reduces this amplification. To enable this analysis, we develop SMPL-Dynamics, a fully differentiable inverse dynamics module for the SMPL body model that requires no external physics simulators. Our module supports end-to-end gradient computation, and we demonstrate this through differentiable pose refinement, which reduces torque error by 93% with negligible change in pose.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows pose noise amplifies ~1000x into torques via differentiation and supplies a differentiable SMPL inverse-dynamics module that cuts error 93% in controlled tests, but the numbers rest on Gaussian noise injection without contacts.

The main thing to know is that this work measures how small errors in 3D joint positions explode when you compute torques through numerical differentiation, and it supplies a fully differentiable SMPL-based inverse-dynamics layer that lets you refine the input poses to recover most of the torque accuracy.

The SMPL-Dynamics module is the concrete new piece. It runs inverse dynamics inside the SMPL model without calling an external simulator, which makes end-to-end gradient flow straightforward. That is useful infrastructure for anyone who wants to optimize poses for dynamic consistency rather than just kinematic fit. The breakdown of proximal versus distal joint sensitivity and the benefit of low-pass filtering before differentiation are also cleanly quantified in their controlled setting.

The limitation is that every reported number comes from adding Gaussian-like noise directly to joint positions and running the dynamics with no ground reaction forces or external contacts. Real monocular pose estimators produce structured, temporally correlated errors, and actual human motion involves contacts, so the 1000x factor and the 93% recovery may not translate one-to-one outside the simulation. The paper is transparent about the setup, but the headline claims sit inside that controlled world.
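
The concern about temporally correlated errors can be made concrete with a small sketch. It compares the double-differentiation gain for independent per-frame noise with the gain for AR(1) noise of the same marginal standard deviation. The AR(1) model, the correlation value 0.9, and the 30 fps frame rate are assumptions used only to illustrate the point; they are not error statistics of any particular pose estimator. For AR(1) noise with lag-1 correlation ρ, the second difference has variance σ²(6 − 8ρ + 2ρ²)/dt⁴, which reduces to the white-noise case at ρ = 0.

```python
import numpy as np

def second_difference(x, dt):
    """Central second difference of a uniformly sampled signal."""
    return (x[2:] - 2.0 * x[1:-1] + x[:-2]) / dt**2

def ar1_noise(n, sigma, rho, rng):
    """Stationary AR(1) noise with marginal std sigma and lag-1 correlation rho."""
    e = rng.normal(0.0, sigma, size=n)
    x = np.empty(n)
    x[0] = e[0]
    scale = np.sqrt(1.0 - rho**2)   # keeps the marginal variance at sigma^2
    for i in range(1, n):
        x[i] = rho * x[i - 1] + scale * e[i]
    return x

rng = np.random.default_rng(0)
dt, sigma, n = 1.0 / 30.0, 0.05, 20000   # assumed 30 fps; sigma from Figure 1

gains = {}
for rho in (0.0, 0.9):                   # 0.0 = white noise, 0.9 = hypothetical correlated noise
    noise = ar1_noise(n, sigma, rho, rng)
    measured = second_difference(noise, dt).std() / sigma
    predicted = np.sqrt(6.0 - 8.0 * rho + 2.0 * rho**2) / dt**2
    gains[rho] = (measured, predicted)
    print(f"rho={rho}: measured gain {measured:.0f}, predicted {predicted:.0f}")
```

Under these assumptions the correlated noise is amplified noticeably less than white noise of equal magnitude, because more of its energy sits at low frequencies. That is one reason a factor measured with Gaussian injection may not carry over unchanged to real estimator errors, in either direction, since biased or jittery estimators could behave differently again.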

This is worth sending to peer review. The module itself is reproducible and addresses a practical bottleneck for vision-to-dynamics pipelines in robotics and animation. Readers who already work with SMPL and need torque estimates will get immediate value from trying the layer, even if broader validation on real video data would be the natural next step.

Referee Report

2 major / 0 minor

Summary. The paper analyzes noise propagation from monocular 3D pose estimates through inverse dynamics to compute joint torques on the SMPL body model. It reports three findings: (1) pose noise amplifies by ~1000x via numerical differentiation, (2) proximal joints (spine, hips) are up to 10x more sensitive than distal ones, (3) low-pass filtering reduces amplification; it introduces the fully differentiable SMPL-Dynamics module (no external simulator) and shows differentiable pose refinement cuts torque error by 93% with negligible pose change.

Significance. If the reported amplification factors and mitigation hold under realistic conditions, the work provides a concrete, quantitative diagnosis of a key barrier in video-based dynamics estimation and a practical tool (differentiable inverse dynamics) for end-to-end refinement. The explicit credit for the parameter-free differentiable module and the empirical sensitivity analysis (proximal/distal, low-pass) strengthens the contribution for biomechanics and robotics applications.

major comments (2)

[Abstract (Findings 1–3)] The 1000x amplification, proximal/distal sensitivity, and low-pass benefit are obtained by injecting additive Gaussian-like noise directly onto 3D joint positions inside the contact-free SMPL-Dynamics pipeline; because real monocular pose-estimator errors are typically structured, biased, and temporally correlated, and because the model omits ground-reaction and contact forces, the quantitative claims do not yet demonstrate applicability to actual video-based torque estimation.
[Abstract (SMPL-Dynamics description)] The 93% torque-error reduction via differentiable refinement is measured inside the same contact-free, rigid-body inverse-dynamics formulation; without external validation against measured ground-reaction forces or muscle-actuated dynamics, the improvement remains internal to the simulation and does not yet support the headline claim for real human torque estimation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments regarding the scope and validation of our claims. We address each point by clarifying the controlled nature of the experiments while preserving the core analysis of noise propagation and the utility of the differentiable module.

Referee: [Abstract (Findings 1–3)] The 1000x amplification, proximal/distal sensitivity, and low-pass benefit are obtained by injecting additive Gaussian-like noise directly onto 3D joint positions inside the contact-free SMPL-Dynamics pipeline; because real monocular pose-estimator errors are typically structured, biased, and temporally correlated, and because the model omits ground-reaction and contact forces, the quantitative claims do not yet demonstrate applicability to actual video-based torque estimation.

Authors: We agree that the noise model is simplified (additive Gaussian) and the pipeline omits contacts and ground-reaction forces. The quantitative findings isolate the amplification arising from numerical differentiation and the SMPL dynamics equations themselves; the proximal/distal sensitivity difference and low-pass mitigation are direct consequences of the second-derivative nature of torque computation. These properties hold independently of the specific noise distribution and would affect any video-based pipeline. We will revise the abstract to explicitly state that the results are obtained under controlled synthetic perturbations in the contact-free model, and we will add a dedicated limitations paragraph discussing structured real-world errors and contact forces. This keeps the claims accurate while acknowledging the gap to full real-world applicability. revision: partial
Referee: [Abstract (SMPL-Dynamics description)] The 93% torque-error reduction via differentiable refinement is measured inside the same contact-free, rigid-body inverse-dynamics formulation; without external validation against measured ground-reaction forces or muscle-actuated dynamics, the improvement remains internal to the simulation and does not yet support the headline claim for real human torque estimation.

Authors: The 93% reduction is indeed measured internally by comparing torque consistency before and after gradient-based pose refinement within SMPL-Dynamics. It serves as a proof-of-concept that end-to-end differentiability can be leveraged to mitigate the identified noise amplification. We acknowledge the absence of external validation against measured forces. We will revise the abstract to describe the result as an internal improvement demonstrating the module’s utility for mitigation, rather than a direct claim of real human torque estimation accuracy, and we will expand the discussion to note the need for future validation with ground-truth dynamics data. revision: yes

Circularity Check

0 steps flagged

No circularity: amplification factor and refinement gain are direct empirical measurements on injected noise through explicit differentiation and new differentiable module

The paper's central results (1000x amplification, proximal sensitivity, low-pass benefit, 93% error reduction) are obtained by running numerical differentiation on SMPL joint positions with added Gaussian noise and by introducing a new SMPL-Dynamics module for end-to-end gradients. These steps are self-contained computations and experimental demonstrations; they do not reduce to any fitted parameter renamed as a prediction, any self-citation chain, or any definitional equivalence between input and output. The derivation chain therefore contains no load-bearing circular reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The paper relies on standard numerical differentiation and rigid-body inverse dynamics assumptions for the SMPL model; no new free parameters, ad-hoc axioms, or invented physical entities are introduced beyond the new software module itself.

axioms (2)

standard math: Numerical differentiation of pose sequences produces velocities and accelerations that can be fed into standard rigid-body inverse dynamics equations.
Invoked when stating the 1000x amplification result.
domain assumption: The SMPL body model plus its kinematic tree provides a sufficient kinematic and inertial description for torque computation without external contacts.
Required for the SMPL-Dynamics module to be used without additional physics engines.

invented entities (1)

SMPL-Dynamics module (no independent evidence)
purpose: Fully differentiable inverse-dynamics computation on SMPL parameters that supports gradient-based pose refinement.
New software component introduced to enable the 93% error reduction demonstration; no independent physical evidence supplied beyond the module's construction.

pith-pipeline@v0.9.1-grok · 5721 in / 1508 out tokens · 38662 ms · 2026-06-30T12:51:38.882457+00:00 · methodology

Reference graph

Works this paper leans on

15 extracted references

[1] Chen Chen and Weifeng Su. Towards human inverse dynamics from real images: A dataset and benchmark for joint torque estimation. bioRxiv preprint, 2025.
[2] Paolo de Leva. Adjustments to Zatsiorsky-Seluyanov’s segment inertia parameters. Journal of Biomechanics, 29(9):1223–1230, 1996.
[3] Scott L. Delp, Frank C. Anderson, Allison S. Arnold, Peter Loan, Ayman Habib, Chand T. John, Eran Guendelman, and Darryl G. Thelen. OpenSim: Open-source software to create and analyze dynamic simulations of movement. IEEE Transactions on Biomedical Engineering, 54(11):1940–1950, 2007.
[4] Sai Kumar Dwivedi, Yu Sun, Priyanka Patel, Yao Feng, and Michael J. Black. TokenHMR: Advancing human mesh recovery with a tokenized pose representation. In CVPR, 2024.
[5] Roy Featherstone. Rigid Body Dynamics Algorithms. Springer, 2008.
[6] Shubham Goel, Georgios Pavlakos, Jathushan Rajasegaran, Angjoo Kanazawa, and Jitendra Malik. Humans in 4D: Reconstructing and tracking humans with transformers. In ICLR, 2024.
[7] Cuong Le, Viktor Johansson, Manon Kok, and Bastian Wandt. Optimal-state dynamics estimation for physics-based human motion capture from videos. In NeurIPS, 2024.
[8] Xinpeng Liu, Junxuan Liang, Zili Lin, Haowen Hou, Yong-Lu Li, and Cewu Lu. ImDy: Human inverse dynamics from imitated observations. In ICLR, 2025.
[9] Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J. Black. SMPL: A skinned multi-person linear model. In ACM TOG (SIGGRAPH Asia), 2015.
[10] Soshi Shimada, Vladislav Golyanik, Weipeng Xu, Patrick Pérez, and Christian Theobalt. PhysCap: Physically plausible monocular 3D motion capture in real time. In ACM TOG (SIGGRAPH Asia), 2020.
[11] Soyong Shin, Juyong Kim, Eni Halilaj, and Michael J. Black. WHAM: Reconstructing world-grounded humans with accurate 3d motion. In CVPR, 2024.
[12] Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A physics engine for model-based control. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2012.
[13] Thomas K. Uchida and Ajay Seth. Conclusion or illusion: Quantifying uncertainty in inverse analyses from marker-based motion capture due to errors in marker registration and model scaling. Frontiers in Bioengineering and Biotechnology, 10:874725, 2022.
[14] Scott D. Uhlrich, Antoine Falisse, Łukasz Kidzinski, Julie Muccini, Michael Ko, Akshay S. Chaudhari, Jennifer L. Hicks, and Scott L. Delp. OpenCap: Human movement dynamics from smartphone videos. PLoS Computational Biology, 19(10), 2023.
[15] Yufei Zhang, Jeffrey O. Kephart, Zijun Cui, and Qiang Ji. PhysPT: Physics-aware pretrained transformer for estimating human dynamics from monocular videos. In CVPR, 2024.
