arxiv: 2605.00018 · v1 · submitted 2026-04-19 · 💻 cs.LG · eess.SP


What Physics do Data-Driven MoCap-to-Radar Models Learn?

Anish Arora, Kenneth W. Parker, Kevin Chen


Pith reviewed 2026-05-10 05:59 UTC · model grok-4.3


keywords: MoCap-to-radar · micro-Doppler spectrograms · physics interpretability · Doppler frequency alignment · velocity intervention · temporal attention · data-driven radar models


The pith

Data-driven MoCap-to-radar models with low reconstruction error do not necessarily learn the underlying radar physics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates whether models that convert motion capture sequences into radar micro-Doppler spectrograms actually internalize the physical relationships of Doppler shifts rather than merely producing plausible images. It defines two metrics that operate without real radar recordings: one checks whether predicted frequencies align with those expected from the input motion via physics formulas, and the other checks whether the model maintains the correct frequency shift when the input velocity is deliberately altered. Experiments on multiple architectures show that low pixel-wise reconstruction error does not ensure good performance on either metric, and that temporal attention is required for transformer models to pass the physics checks.

Core claim

Low reconstruction error does not guarantee physical consistency in MoCap-to-radar models. Across several architectures, some low-error models nevertheless score poorly on the two physics-based metrics, while others score well on both. Further analysis shows that temporal attention is critical for transformer-based models to learn the underlying physics.

What carries the argument

A physics-based interpretability framework with two metrics: alignment between model predictions and physics-derived Doppler frequency, plus preservation of the velocity-frequency relationship under explicit velocity intervention on the input.
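For orientation, the standard monostatic Doppler relation that a framework of this kind would presumably rest on can be written as below. The notation is introduced here and is not taken from the paper: $f_c$ is the carrier frequency, $\lambda = c / f_c$ the wavelength, $\mathbf{p}_k(t)$ the position of scatterer $k$, $\mathbf{r}$ a fixed radar position, and $\alpha$ the velocity scaling factor. The paper's exact scattering model and radar parameters are not stated in this summary.

```latex
% Monostatic Doppler shift of scatterer k (positive when approaching the radar)
f_{D,k}(t) = \frac{2\,v_{r,k}(t)}{\lambda} = \frac{2 f_c}{c}\,v_{r,k}(t),
\qquad
v_{r,k}(t) = -\frac{\mathrm{d}}{\mathrm{d}t}\,\lVert \mathbf{p}_k(t) - \mathbf{r} \rVert

% Velocity intervention: scaling the radial velocity by \alpha scales the shift by \alpha
v_{r,k} \mapsto \alpha\, v_{r,k}
\quad\Longrightarrow\quad
f_{D,k} \mapsto \alpha\, f_{D,k}
```

As a rough scale check under an assumed 24 GHz carrier ($\lambda \approx 12.5$ mm), a scatterer approaching at 1 m/s produces a shift of about 160 Hz.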

If this is right

Reconstruction error alone is insufficient to certify that a model has captured radar physics.
Temporal attention enables transformer architectures to satisfy the velocity-frequency relationship that simpler models miss.
The metrics permit detection of physically inconsistent models without any measured radar ground truth.
Different architectures reach comparable reconstruction accuracy while differing sharply in physical fidelity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same metric pair could be applied to other sensor-to-sensor translation tasks to distinguish pattern matching from physical understanding.
Models that pass the velocity-intervention test are likely to produce more reliable outputs when input speeds fall outside the training distribution.
Adding the metrics as auxiliary training losses could steer future models toward physically consistent behavior without extra labeled radar data.

Load-bearing premise

The two proposed metrics, derived only from motion capture and model outputs, are sufficient and accurate proxies for whether the model has learned the true underlying radar physics.

What would settle it

Collect real radar measurements for the same motion-capture sequences and test whether models that score high on the two metrics produce spectrograms measurably closer to the recorded data than models that score low.

Figures

Figures reproduced from arXiv: 2605.00018 by Anish Arora, Kenneth W. Parker, Kevin Chen.

**Figure 2.** Overview of the MoCap2Radar backbone used in this …

**Figure 3.** Ground-truth spectrogram (30s) with the physics …

Original abstract

Data-driven MoCap-to-radar models generate plausible micro-Doppler spectrograms, but do they actually learn the underlying physics? We introduce a physics-based interpretability framework to answer this question via two proposed complementary metrics: one measures alignment between model predictions and the physics-derived Doppler frequency, while the other tests whether predictions preserve the velocity-frequency relationship under velocity intervention. Both metrics require only MoCap input and model predictions, without access to measured radar data. Experiments across several model architectures reveal that low reconstruction error does not guarantee physical consistency: some, but not all, models achieve low error yet perform poorly on the two physics-based metrics. Further analysis shows that temporal attention is critical for transformer-based models to learn the underlying physics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Low reconstruction error does not guarantee that MoCap-to-radar models have learned the underlying Doppler physics, and the two proposed metrics make that gap visible.


The paper's main contribution is a pair of metrics that check whether generative models for radar spectrograms actually respect the velocity-to-frequency relationship, rather than just producing outputs that look close to the targets. One metric measures how well the model's predicted frequencies align with Doppler frequencies computed directly from the MoCap velocities. The other intervenes on the input velocities and checks whether the output frequencies shift accordingly. Both run without any real radar measurements, which keeps the evaluation simple and self-contained using only the model and the motion data.
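The first metric described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes point scatterers located at the MoCap markers, a fixed radar position, a user-supplied carrier frequency, the power-weighted spectral centroid as the summary of a predicted spectrogram, and Pearson correlation as the alignment measure. All function names are hypothetical.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s


def radial_velocity(positions, radar_pos, fs):
    """Approach speed of each marker toward the radar, in m/s.

    positions: array of shape (T, K, 3), marker coordinates in metres.
    radar_pos: array of shape (3,), assumed fixed radar location.
    fs: MoCap frame rate in Hz.
    """
    ranges = np.linalg.norm(positions - radar_pos, axis=-1)  # shape (T, K)
    # Range decreases when a marker approaches, so negate the derivative.
    return -np.gradient(ranges, 1.0 / fs, axis=0)


def doppler_frequency(v_radial, carrier_hz):
    """Monostatic Doppler shift f_D = 2 * v_r * f_c / c, in Hz."""
    return 2.0 * v_radial * carrier_hz / C


def spectral_centroid(spectrogram, freqs):
    """Power-weighted mean frequency per frame; spectrogram has shape (F, T)."""
    power = np.clip(spectrogram, 0.0, None)
    return (freqs[:, None] * power).sum(axis=0) / (power.sum(axis=0) + 1e-12)


def alignment_score(pred_spectrogram, freqs, reference_hz):
    """Pearson correlation between the predicted centroid and a reference track."""
    centroid = spectral_centroid(pred_spectrogram, freqs)
    return float(np.corrcoef(centroid, reference_hz)[0, 1])
```

A reference track for `alignment_score` could be, for example, the Doppler frequency of a single torso marker or an average over markers; which aggregation the paper uses is not stated in this summary.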

Referee Report

2 major / 2 minor

Summary. The paper introduces a physics-based interpretability framework for evaluating data-driven MoCap-to-radar models via two metrics: alignment of model predictions with physics-derived Doppler frequencies from MoCap velocities, and preservation of the velocity-frequency relationship under velocity interventions on inputs. Experiments across architectures show that low reconstruction error does not guarantee physical consistency on these metrics, with temporal attention being critical for transformer-based models to learn the physics.

Significance. If the metrics prove reliable, the work offers a useful approach to assessing physical consistency in generative radar models beyond standard reconstruction losses, highlighting limitations of purely data-driven evaluation and the role of attention mechanisms. The intervention-based metric provides a falsifiable test of learned relationships using only MoCap data.

major comments (2)

Abstract: The central claim that low reconstruction error fails to guarantee physical consistency rests on the two metrics being faithful proxies, yet the abstract states they are computed without measured radar data and provides no quantitative validation (e.g., comparison of derived Doppler frequencies to real spectrograms) that the underlying Doppler derivation matches the actual radar formation process.
Abstract and experimental sections: No error bars, statistical tests, data exclusion criteria, or ablation details are mentioned for the reported findings on reconstruction error versus physics metrics or the importance of temporal attention, which is load-bearing for the claim that some models achieve low error yet perform poorly on the metrics.

minor comments (2)

The exact formulas and assumptions (radial velocity projection, scattering model, radar parameters) used to derive Doppler frequencies from MoCap should be stated explicitly in the main text or appendix for reproducibility.
Clarify the precise implementation of the velocity intervention in the second metric, including how interventions are applied and how preservation is quantified.
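One way the velocity intervention and its preservation score might be made concrete is sketched below. It is an assumption-laden illustration, not the authors' procedure: velocities are scaled by multiplying each marker's displacement from its first frame by `alpha` (which distorts poses slightly; time resampling would be an alternative), and preservation is scored as the relative deviation of a least-squares frequency gain from `alpha`. The callable `predict_track` is hypothetical and stands in for a model plus a peak or centroid extractor.

```python
import numpy as np


def scale_velocity(positions, alpha):
    """Scale every marker's velocity by alpha at a fixed frame rate.

    positions: array of shape (T, K, 3). Displacements from the first frame
    are multiplied by alpha, so finite-difference velocities scale by alpha.
    """
    start = positions[:1]
    return start + alpha * (positions - start)


def frequency_gain(track_base, track_scaled):
    """Least-squares gain g such that track_scaled ~= g * track_base.

    Assumes track_base is not identically zero.
    """
    denom = float(np.dot(track_base, track_base))
    return float(np.dot(track_base, track_scaled)) / denom


def intervention_error(predict_track, positions, alpha):
    """Relative deviation of the observed frequency gain from alpha.

    predict_track: hypothetical callable mapping (T, K, 3) positions to a
    length-T array of dominant predicted Doppler frequencies in Hz.
    Returns 0.0 when output frequencies scale exactly with input velocity.
    """
    base = predict_track(positions)
    scaled = predict_track(scale_velocity(positions, alpha))
    return abs(frequency_gain(base, scaled) / alpha - 1.0)
```

Under this scoring, a model whose output ignores the intervention has a gain of 1 and therefore an error of `abs(1 / alpha - 1)`, for instance 0.5 when `alpha` is 2.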

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for the thorough review and valuable suggestions. Below we respond to each major comment and outline the revisions we will make to the manuscript.


Referee: Abstract: The central claim that low reconstruction error fails to guarantee physical consistency rests on the two metrics being faithful proxies, yet the abstract states they are computed without measured radar data and provides no quantitative validation (e.g., comparison of derived Doppler frequencies to real spectrograms) that the underlying Doppler derivation matches the actual radar formation process.

Authors: The framework is explicitly designed to operate without measured radar data to enable evaluation of physical consistency in purely data-driven settings. The Doppler frequency is computed using the established formula relating radial velocity to frequency shift, which is the core of radar micro-Doppler formation. We will revise the abstract to better emphasize this and add supporting references in the methods section. A direct comparison to real spectrograms is outside the scope as it would require paired data not central to the contribution, but the velocity intervention metric offers a direct test of the learned physics relationship.

revision: partial

Referee: Abstract and experimental sections: No error bars, statistical tests, data exclusion criteria, or ablation details are mentioned for the reported findings on reconstruction error versus physics metrics or the importance of temporal attention, which is load-bearing for the claim that some models achieve low error yet perform poorly on the metrics.

Authors: We agree that including these elements will improve the reliability of the reported results. In the revised manuscript, we will add error bars to all quantitative results, include statistical tests for differences between models, specify any data preprocessing or exclusion criteria, and provide detailed ablations on the temporal attention mechanism with supporting figures and tables.

revision: yes
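As an illustration of the kind of error bar the referee asks for, a paired percentile bootstrap over test sequences is one simple option. The sketch below is generic and not taken from the paper: the function name, the choice of a paired design, and the default resample count are all assumptions.

```python
import numpy as np


def paired_bootstrap_ci(scores_a, scores_b, n_boot=10_000, level=0.95, seed=0):
    """Percentile bootstrap interval for mean(scores_a - scores_b).

    scores_a, scores_b: per-sequence metric values for two models evaluated
    on the same test sequences (paired, equal length).
    Returns (mean_difference, lower_bound, upper_bound).
    """
    diffs = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample sequences with replacement, n_boot times.
    idx = rng.integers(0, diffs.size, size=(n_boot, diffs.size))
    boot_means = diffs[idx].mean(axis=1)
    tail = (1.0 - level) / 2.0
    low, high = np.quantile(boot_means, [tail, 1.0 - tail])
    return float(diffs.mean()), float(low), float(high)
```

If the resulting interval for a physics metric excludes zero while the interval for reconstruction error includes it, that would support the claim that two models differ in physical fidelity but not in reconstruction accuracy.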

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper defines its two physics-based metrics directly from external Doppler frequency formulas and velocity intervention rules applied to MoCap inputs, then compares model outputs against these independent references and against reconstruction error. No step reduces a claimed prediction or result to a fitted parameter, self-defined quantity, or self-citation chain by construction; the metrics are not derived from the models' own outputs or training losses. The central claim (low reconstruction error does not guarantee physical consistency) therefore rests on an external benchmark rather than tautological re-expression of the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that physics-derived Doppler frequencies constitute the correct reference for evaluating model outputs, with no free parameters, invented entities, or additional axioms introduced in the abstract.

axioms (1)

Domain assumption: Physics-derived Doppler frequency from velocity is the appropriate ground-truth reference for assessing model predictions.
Invoked to define the alignment metric and the intervention test.


