pith. machine review for the scientific record.

arxiv: 2605.09604 · v1 · submitted 2026-05-10 · 💻 cs.CV


DAP: Doppler-aware Point Network for Heterogeneous mmWave Action Recognition

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 02:29 UTC · model grok-4.3

classification 💻 cs.CV
keywords mmWave radar · human action recognition · point cloud · Doppler patterns · heterogeneous sources · cross-source generalization · cross-modal alignment

The pith

DAP-Net aligns mmWave radar point clouds across devices by treating Doppler patterns as invariant anchors and adding text-based semantic guidance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that real-world mmWave radar action recognition fails when devices or frequency bands differ, because each source produces its own distribution of point clouds. To fix this, the authors release UniMM-HAR, a dataset that records the same actions with three distinct radar configurations, and introduce DAP-Net. The network first uses a Dual-space Doppler Reparameterization module to densify sparse point clouds and recalibrate features around the velocity signatures that stay consistent for each action. It then applies a Text Alignment Module to pull representations toward a stable language space. If these steps succeed, the same model can recognize actions reliably even when the radar hardware changes, without retraining from scratch.
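The two-stage mechanism can be sketched end to end. Everything below is an illustrative assumption (function names, shapes, the tanh gating, the mean-pool), not the paper's implementation: a clip arrives as a [T, P, 5] array of (x, y, z, Doppler, intensity), D2R densifies it and re-weights motion-heavy points, and TAM scores the pooled embedding against a text-space anchor.

```python
import numpy as np

def d2r_sketch(clip, k=2):
    """Toy stand-in for D2R: densify by replicating points with a
    Doppler-scaled coordinate offset, then re-weight by motion strength."""
    T, P, C = clip.shape                      # e.g. (32, 64, 5)
    doppler = clip[..., 3:4]                  # per-point radial velocity
    # "densification": k copies of each point, offset by its Doppler value
    copies = [clip + 0.05 * i * np.concatenate(
        [np.repeat(doppler, 3, axis=-1), np.zeros_like(clip[..., 3:])], -1)
        for i in range(k)]
    dense = np.concatenate(copies, axis=1)    # (T, k*P, C)
    # "recalibration": points with stronger motion get larger weight
    w = 1.0 + np.tanh(np.abs(dense[..., 3:4]))
    return dense * w

def tam_sketch(embedding, text_anchor):
    """Cosine similarity to a fixed text-space anchor (CLIP-style)."""
    e = embedding / (np.linalg.norm(embedding) + 1e-8)
    t = text_anchor / (np.linalg.norm(text_anchor) + 1e-8)
    return float(e @ t)

clip = np.random.default_rng(0).normal(size=(32, 64, 5))
dense = d2r_sketch(clip)
pooled = dense.mean(axis=(0, 1))              # crude global embedding, (5,)
score = tam_sketch(pooled, np.ones(5))
```

The real D2R is sample-adaptive and learned; the fixed offsets and tanh gate above only mimic the shape of the computation.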

Core claim

Leveraging action-consistent spatio-temporal Doppler patterns as anchors, the Dual-space Doppler Reparameterization (D2R) module performs sample-adaptive geometric densification and Doppler-guided feature recalibration, while the Text Alignment Module (TAM) provides stable semantic anchors via a pretrained textual space, enabling DAP-Net to learn source-invariant action semantics and achieve state-of-the-art accuracy with strong cross-source robustness on heterogeneous mmWave settings.

What carries the argument

Dual-space Doppler Reparameterization (D2R) module that uses spatio-temporal Doppler patterns for sample-adaptive geometric densification and feature recalibration, paired with the Text Alignment Module (TAM) that supplies semantic guidance from a pretrained textual space.
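A common way such text-space guidance is realized, assumed here as a stand-in for TAM's actual loss, is to keep frozen text embeddings of the class names as anchors and classify a radar embedding by nearest cosine similarity:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical frozen text embeddings for three action classes; in practice
# these would come from a pretrained text encoder, not random draws.
text_anchors = {name: rng.normal(size=16) for name in ("squat", "jump", "bow")}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def classify(radar_embedding):
    """Zero-shot-style prediction: nearest text anchor by cosine similarity."""
    return max(text_anchors, key=lambda k: cosine(radar_embedding, text_anchors[k]))

# A radar embedding that training has pulled close to the "jump" anchor
emb = text_anchors["jump"] + 0.01 * rng.normal(size=16)
pred = classify(emb)
```

Because the anchors live in a pretrained language space that never sees radar data, they stay fixed while radar sources shift, which is the stabilizing property the paper attributes to TAM.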

If this is right

  • Point-cloud sparsity in mmWave data can be mitigated by reparameterizing geometry around Doppler signatures rather than raw spatial coordinates alone.
  • Cross-modal alignment with pretrained text embeddings supplies semantic regularization that improves robustness when visual or radar features shift between sources.
  • A single trained model can generalize across radar sources without requiring source-specific fine-tuning or large new labeled sets for each device.
  • Standardized multi-source datasets like UniMM-HAR become necessary benchmarks for measuring real deployment robustness instead of single-device accuracy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same Doppler-anchor idea could transfer to other velocity-sensitive sensors such as lidar or acoustic arrays where motion signatures are more stable than raw spatial patterns.
  • If text alignment proves reliable, the approach might support few-shot or zero-shot recognition of previously unseen action classes by relying on language descriptions rather than radar examples.
  • Future data collection efforts should prioritize recording the same actions across multiple hardware variants early, because single-source scaling appears less effective for generalization.

Load-bearing premise

Spatio-temporal Doppler patterns remain consistent enough for the same human action no matter which radar device or frequency band records the data.

What would settle it

Collect a fresh set of action recordings with a fourth mmWave radar configuration whose frequency band and hardware lie outside the three used in UniMM-HAR; if accuracy on this held-out source drops sharply below the reported cross-source numbers, the invariance claim does not hold.
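That falsification test can be phrased as a one-line protocol check; the accuracies and the 10-point tolerance below are made-up illustrations, not reported numbers:

```python
# Hypothetical accuracies: three in-benchmark sources plus a held-out fourth
# radar configuration outside UniMM-HAR. All values are invented.
cross_source_acc = {"src_A": 0.87, "src_B": 0.84, "src_C": 0.86}
held_out_acc = 0.58

def invariance_holds(reported, held_out, tolerance=0.10):
    """The invariance claim survives if the held-out source stays within
    `tolerance` of the mean reported cross-source accuracy."""
    mean_acc = sum(reported.values()) / len(reported)
    return held_out >= mean_acc - tolerance

verdict = invariance_holds(cross_source_acc, held_out_acc)
```

With these invented numbers the held-out source falls far below the reported mean, so the check fails, which is exactly the outcome that would refute the load-bearing premise.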

Figures

Figures reproduced from arXiv: 2605.09604 by Can Wang, Jiaying Lin, Jinfu Liu, Mengyuan Liu, Shiman Wu.

Figure 1
Figure 1: UniMM-HAR is currently the largest mmWave point cloud human action recognition dataset and the first unified benchmark for heterogeneous multi-source distributions, providing Cross-Subject and Cross-Set evaluation protocols and covering both daily and rehabilitation actions. The samples in UniMM-HAR are collected from radar devices with different models and operating frequencies.
Figure 2
Figure 2: Visualization of different action samples in UniMM-HAR. Actions (a)–(h) are: squat, left lunge, jump, bow, left front lunge, throw left, stretch, and kick right. In each subfigure, the 1st and 2nd columns show the RGB video frame and mmWave point cloud, respectively. The 3rd and 4th columns show the Doppler heatmap in BEV and front view. The 5th column shows the Doppler distribution for the action.
Figure 3
Figure 3: Action distribution across sources and types. Representation standardization: temporal–point normalization converts each clip to a fixed shape [T, P, C] = [32, 64, 5], with channels x, y, z, Doppler, and intensity. When the original sequence length exceeds T, uniform temporal downsampling is applied; otherwise, zero-padding is used. When the number of points per frame exceeds P, Farthest Point Sampling (FPS) is used.
Figure 4
Figure 4: Overview of DAP-Net. First, the Dual-space Doppler Reparameterization (D²R) converts mmWave point cloud sequences into Doppler-guided dense representations. Within D²R, Doppler-guided Geometry Reparameterization (DGR) performs geometric densification, while Motion-aware Feature Recalibration (MFR) enhances motion-sensitive feature modeling to produce point embeddings.
Figure 5
Figure 5: Accuracy–cost trade-off across different backbones (DAP-Net, i.e. PointMLP + DAP; PointMLP; PST-Transformer + DAP; PST-Transformer) on Micro Acc, Macro Acc, *Centroid Dist, Cross-Source Acc, and *CORAL. Metrics marked with * are reported as reciprocals since smaller is better; red metrics measure cross-source generalization.
Figure 7
Figure 7: Number of samples per action class, shown as a stacked bar chart to indicate the contributions of different data sources.
Figure 8
Figure 8: Sample visualizations of different action categories in the UniMM-HAR dataset. Each sample includes 5 visualizations.
Figure 9
Figure 9: Distribution of frame-wise point counts across heterogeneous data sources. (a) Box plot of point count distributions. (b) Histogram of frame-wise point counts aggregated over all frames. To reduce the influence of extreme outliers, the axis range is restricted to the 5th–95th percentile of the distribution.
Figure 10
Figure 10: Heatmap visualization of accumulated point cloud projections on the X–Y plane for different data sources. Due to noticeable outliers in RadHAR, two visualizations are provided: (a) RadHAR – Full Range, including all points to illustrate the presence of sparse outliers; (b) RadHAR – Main Range, highlighting the primary spatial distribution after removing extreme outliers.
Figure 11
Figure 11: Feature distribution comparison across heterogeneous data sources (X, Y, Z, Doppler, and Intensity).
Figure 12
Figure 12: Visualization of Doppler characteristics for different action categories in UniMM-HAR. (a) Box plot showing the distribution of Doppler values. (b) Temporal evolution of Doppler over time. (c) Kernel density estimation (KDE) of Doppler magnitudes.
Figure 13
Figure 13: Comparison of cross-source performance visualizations. Left: PointMLP baseline improvement; right: PST-Transformer baseline improvement.
Figure 14
Figure 14: Top 10 action categories in UniMM-HAR C-Sub where DAP-Net outperforms the baseline (PointMLP). In the accompanying cross-dataset setup, nine action classes (A001–A003 and A009–A014) are selected, MM-Fi samples are used as the test set, and only RadHAR and mRI samples are used for training, so the model must generalize to a completely unseen cross-source distribution.
Figure 15
Figure 15: Visualization of point cloud attention. For each action, from left to right: RGB image, input point cloud colored by Doppler velocity, attention weights from the baseline model, and attention weights from DAP-Net. The black box indicates the ground-truth motion region; the red box highlights where DAP-Net produces higher activations than the baseline, capturing more discriminative motion cues.
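The representation standardization described in the Figure 3 caption (fixed [T, P, C] = [32, 64, 5], uniform temporal downsampling or zero-padding, farthest point sampling when a frame is too dense) can be sketched as follows; the greedy FPS here is the common textbook variant, assumed rather than taken from the paper:

```python
import numpy as np

T_FIX, P_FIX, C = 32, 64, 5   # channels: x, y, z, Doppler, intensity

def fps(points, p):
    """Greedy farthest point sampling on the xyz coordinates."""
    chosen = [0]
    d = np.linalg.norm(points[:, :3] - points[0, :3], axis=1)
    while len(chosen) < p:
        nxt = int(np.argmax(d))
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(points[:, :3] - points[nxt, :3], axis=1))
    return points[chosen]

def standardize(frames):
    """frames: list of (n_i, C) arrays -> fixed (T_FIX, P_FIX, C) clip."""
    # temporal: uniform downsample if too long; short clips pad with zeros
    if len(frames) > T_FIX:
        idx = np.linspace(0, len(frames) - 1, T_FIX).astype(int)
        frames = [frames[i] for i in idx]
    out = np.zeros((T_FIX, P_FIX, C))
    for t, f in enumerate(frames):
        if len(f) > P_FIX:
            f = fps(f, P_FIX)
        out[t, :len(f)] = f        # zero-pad sparse frames
    return out

rng = np.random.default_rng(0)
clip = standardize([rng.normal(size=(rng.integers(5, 200), C)) for _ in range(50)])
```

Fixing the tensor shape this way is what lets a single network consume frames whose raw point counts differ wildly between radar sources (see Figure 9).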
read the original abstract

Millimeter-wave (mmWave) radar provides privacy-preserving sensing and is valuable for human action recognition (HAR). Existing mmWave point cloud datasets are limited in scale and mostly collected under homogeneous single-source settings, preventing current methods from handling real-world distribution shifts caused by heterogeneous radar sources, such as different devices and frequency bands. To address this, we introduce UniMM-HAR, the largest and first mmWave point cloud HAR dataset for heterogeneous multi-source scenarios, standardizing three distinct radar configurations to realistically evaluate cross-source generalization. We further propose the Doppler-aware Point Cloud Network (DAP-Net) to tackle heterogeneity challenges. DAP-Net enhances intra-modal representations and performs cross-modal alignment to learn source-invariant action semantics. Leveraging action-consistent spatio-temporal Doppler patterns as anchors, the Dual-space Doppler Reparameterization (D2R) module performs sample-adaptive geometric densification and Doppler-guided feature recalibration, while the Text Alignment Module (TAM) provides stable semantic anchors via a pretrained textual space. Experiments show that DAP-Net significantly outperforms existing methods under heterogeneous radar settings, achieving state-of-the-art accuracy and strong cross-source robustness.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces UniMM-HAR, the largest mmWave point cloud HAR dataset, standardized across three heterogeneous radar configurations (different devices and frequency bands) to study cross-source distribution shifts. It also proposes DAP-Net, which pairs a Dual-space Doppler Reparameterization (D2R) module, using action-consistent spatio-temporal Doppler patterns for sample-adaptive geometric densification and feature recalibration, with a Text Alignment Module (TAM) that leverages pretrained textual embeddings for semantic guidance, and claims that this yields state-of-the-art accuracy and strong cross-source robustness over existing methods.

Significance. If the empirical claims hold, the work would be significant for privacy-preserving mmWave sensing: the new multi-source dataset fills a clear gap left by prior homogeneous collections, and the architectural use of Doppler anchors plus textual semantics offers a concrete route to source-invariant representations. The dataset release itself would be a lasting contribution regardless of the precise performance numbers.

major comments (2)
  1. [Section 3.2 (D2R module description)] The central cross-source robustness claim rests on the D2R module's premise that spatio-temporal Doppler patterns remain sufficiently action-consistent across the three radar configurations despite differences in velocity resolution, aliasing, and noise. No quantitative verification of this invariance (e.g., inter-source Doppler similarity scores, ablation on pattern distortion, or failure-case analysis) appears in the method or experiments sections; without it the sample-adaptive reparameterization risks becoming source-dependent rather than invariant.
  2. [Experiments section, Table 2] Table 2 (or equivalent results table) reports overall accuracy gains but does not include per-source breakdowns, statistical significance tests, or error bars for the cross-source splits; this makes it difficult to assess whether the claimed robustness is uniform or driven by a subset of easier source pairs.
minor comments (2)
  1. [Abstract] The abstract asserts SOTA performance without any numerical values or baseline names; moving one or two headline numbers into the abstract would improve readability.
  2. [Section 3.2] Notation for the Doppler reparameterization (e.g., the exact form of the sample-adaptive scaling in Eq. (X)) should be cross-referenced explicitly when first introduced in the text.
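The inter-source similarity scores requested in major comment 1 could be computed along these lines; the data is synthetic and the histogram-cosine metric is this review's assumption, not the paper's:

```python
import numpy as np

def doppler_profile(samples, bins=16, v_range=(-3.0, 3.0)):
    """Normalized Doppler histogram for one (source, action) sample set."""
    h, _ = np.histogram(samples, bins=bins, range=v_range, density=True)
    return h / (np.linalg.norm(h) + 1e-8)

def inter_source_similarity(per_source_samples):
    """Mean pairwise cosine similarity of Doppler profiles across sources."""
    profiles = [doppler_profile(s) for s in per_source_samples]
    sims = [float(profiles[i] @ profiles[j])
            for i in range(len(profiles)) for j in range(i + 1, len(profiles))]
    return sum(sims) / len(sims)

rng = np.random.default_rng(0)
# Same action recorded by three synthetic "sources": a shared bimodal
# velocity signature with independent noise per source.
sources = [np.concatenate([rng.normal(-1.2, 0.3, 500), rng.normal(1.0, 0.3, 500)])
           for _ in range(3)]
sim = inter_source_similarity(sources)
```

A score near 1 across real source pairs would support the invariance premise; a low score for some pair would localize exactly where the D2R anchors break down.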

Simulated Author's Rebuttal

2 responses · 0 unresolved

We sincerely thank the referee for the constructive and insightful comments, which help us strengthen the presentation of our cross-source robustness claims. We address each major comment point by point below and will incorporate the suggested additions in the revised manuscript.

read point-by-point responses
  1. Referee: [Section 3.2 (D2R module description)] The central cross-source robustness claim rests on the D2R module's premise that spatio-temporal Doppler patterns remain sufficiently action-consistent across the three radar configurations despite differences in velocity resolution, aliasing, and noise. No quantitative verification of this invariance (e.g., inter-source Doppler similarity scores, ablation on pattern distortion, or failure-case analysis) appears in the method or experiments sections; without it the sample-adaptive reparameterization risks becoming source-dependent rather than invariant.

    Authors: We appreciate the referee's emphasis on this foundational assumption. The D2R module design in Section 3.2 is grounded in the physical property that Doppler velocity signatures for a given action remain largely consistent in their spatio-temporal structure across radar configurations, even as absolute velocity resolution and aliasing vary; this is why we use them as anchors for sample-adaptive densification and recalibration. However, we agree that the current manuscript lacks explicit quantitative verification of this invariance. In the revised version, we will add: (i) inter-source cosine similarity scores on normalized Doppler features for identical action classes, (ii) an ablation quantifying performance sensitivity to controlled Doppler pattern distortion, and (iii) a brief failure-case analysis highlighting source pairs where invariance is weakest. These additions will empirically substantiate the premise without altering the method itself. revision: yes

  2. Referee: [Experiments section, Table 2] Table 2 (or equivalent results table) reports overall accuracy gains but does not include per-source breakdowns, statistical significance tests, or error bars for the cross-source splits; this makes it difficult to assess whether the claimed robustness is uniform or driven by a subset of easier source pairs.

    Authors: We concur that the current Table 2 aggregates results in a manner that obscures per-pair variability. To enable a more rigorous evaluation of uniformity, the revised manuscript will expand the experimental results with a detailed per-source-pair accuracy table for all cross-source protocols. We will also report standard deviations as error bars from multiple independent runs (different random seeds) and include paired statistical significance tests (e.g., t-tests with p-values) against the strongest baselines. This will demonstrate whether gains are consistent across all source combinations or concentrated in particular pairs. revision: yes
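The promised significance testing could take several forms; the sketch below uses a paired sign-flip permutation test rather than the t-test the authors name, with made-up per-seed accuracies:

```python
import numpy as np

def paired_sign_flip_test(acc_new, acc_base, n_perm=5000, seed=0):
    """Paired permutation (sign-flip) test on per-seed accuracy differences.
    Returns (mean difference, one-sided p-value for 'new > base')."""
    d = np.asarray(acc_new) - np.asarray(acc_base)
    obs = d.mean()
    rng = np.random.default_rng(seed)
    flips = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    null = (flips * d).mean(axis=1)       # null distribution of the mean diff
    p = float((null >= obs).mean())
    return float(obs), p

# Invented per-seed accuracies for five independent runs, for illustration only
dap  = [0.86, 0.87, 0.85, 0.88, 0.86]
base = [0.81, 0.83, 0.80, 0.84, 0.82]
diff, p = paired_sign_flip_test(dap, base)
```

With only five seeds the sign-flip null has 2^5 = 32 distinct outcomes, so the smallest attainable one-sided p is about 0.031; more seeds would be needed for stronger claims.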

Circularity Check

0 steps flagged

No circularity in architectural proposal or empirical claims

full rationale

The paper introduces the UniMM-HAR dataset and the DAP-Net architecture (with the D2R module using Doppler patterns as anchors and TAM for textual alignment) as an empirical solution for heterogeneous mmWave HAR. No equations, derivations, or parameter-fitting steps are described that reduce any claimed prediction or result to the inputs by construction. Performance claims rest on experimental comparisons rather than self-referential math or load-bearing self-citations. The assumption of action-consistent Doppler patterns is a modeling premise, not a circular reduction; the claims remain independently testable against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract introduces no new physical axioms, free parameters, or invented entities; the work rests on standard deep-learning assumptions and a pretrained text encoder.

pith-pipeline@v0.9.0 · 5504 in / 1033 out tokens · 29180 ms · 2026-05-12T02:29:04.342503+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

81 extracted references · 81 canonical work pages · 1 internal anchor

  1. [1]

    NeurIPS35, 27414–27426 (2022)

    An, S., Li, Y., Ogras, U.: mri: Multi-modal 3d human pose estimation dataset using mmwave, rgb-d, and inertial sensors. NeurIPS35, 27414–27426 (2022)

  2. [2]

    ACM Transactions on Embedded Computing Systems20(5s), 1–22 (2021)

    An, S., Ogras, U.Y.: Mars: mmwave-based assistive rehabilitation system for smart healthcare. ACM Transactions on Embedded Computing Systems20(5s), 1–22 (2021)

  3. [3]

    In: CVPR

    Ben-Shabat, Y., Shrout, O., Gould, S.: 3dinaction: Understanding human actions in 3d point clouds. In: CVPR. pp. 19978–19987 (2024)

  4. [4]

    Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation

    Bengio, Y., Léonard, N., Courville, A.: Estimating or propagating gradi- ents through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432 (2013)

  5. [5]

    IEEE Transactions on Radar Systems2, 484–497 (2024)

    Biswas, S., Manavi Alam, A., Gurbuz, A.C.: Hrspecnet: A deep learning-based high-resolution radar micro-doppler signature reconstruction for improved har clas- sification. IEEE Transactions on Radar Systems2, 484–497 (2024)

  6. [6]

    IEEE TPAMI 45(3), 3522–3538 (2022)

    Bruce, X., Liu, Y., Zhang, X., Zhong, S.h., Chan, K.C.: Mmnet: A model-based multimodal network for human action recognition in rgb-d videos. IEEE TPAMI 45(3), 3522–3538 (2022)

  7. [7]

    In: ICCV

    Chae, Y., Park, H., Kim, H., Yoon, K.J.: Doppler-aware lidar-radar fusion for weather-robust 3d detection. In: ICCV. pp. 27197–27208 (2025)

  8. [8]

    In: ICCV

    Chen, Y., Zhang, Z., Yuan, C., Li, B., Deng, Y., Hu, W.: Channel-wise topology refinement graph convolution for skeleton-based action recognition. In: ICCV. pp. 13359–13368 (2021)

  9. [9]

    In: CVPR

    Choi, J., Hor, S., Yang, S., Arbabian, A.: Mvdoppler-pose: Multi-modal multi-view mmwave sensing for long-distance self-occluded human walking pose estimation. In: CVPR. pp. 27750–27759 (2025)

  10. [10]

    and Bengio, Y

    Courbariaux,M.,Hubara,I.,Soudry,D.,El-Yaniv,R.,Bengio,Y.:Binarizedneural networks: Training deep neural networks with weights and activations constrained to+ 1 or-1. arXiv preprint arXiv:1602.02830 (2016)

  11. [11]

    NeurIPS36, 62713–62726 (2023)

    Cui, H., Zhong, S., Wu, J., Shen, Z., Dahnoun, N., Zhao, Y.: Milipoint: A point cloud dataset for mmwave radar. NeurIPS36, 62713–62726 (2023)

  12. [12]

    Applied Sciences14(16), 7253 (2024)

    Dang, X., Fan, K., Li, F., Tang, Y., Gao, Y., Wang, Y.: Multi-person action recog- nition based on millimeter-wave radar point cloud. Applied Sciences14(16), 7253 (2024)

  13. [13]

    In: ICRA

    Deng, Z., Li, X., Li, X., Tong, Y., Zhao, S., Liu, M.: Vg4d: Vision-language model goes 4d video recognition. In: ICRA. pp. 5014–5020. IEEE (2024)

  14. [14]

    In: ECCV

    Ding, F., Luo, Z., Zhao, P., Lu, C.X.: milliflow: Scene flow estimation on mmwave radar point cloud for human motion sensing. In: ECCV. pp. 202–221. Springer (2024)

  15. [15]

    In: CVPR

    Duan, H., Zhao, Y., Chen, K., Lin, D., Dai, B.: Revisiting skeleton-based action recognition. In: CVPR. pp. 2969–2978 (2022)

  16. [16]

    IEEE TPAMI45(2), 2181–2192 (2022)

    Fan, H., Yang, Y., Kankanhalli, M.: Point spatio-temporal transformer networks for point cloud video modeling. IEEE TPAMI45(2), 2181–2192 (2022)

  17. [17]

    M4human: A large-scale mul- timodal mmwave radar benchmark for human mesh reconstruction.arXiv preprint arXiv:2512.12378, 2025

    Fan, J., Zhou, Y., Yang, Y., Cui, X., Zhang, J., Xie, L., Yang, J., Lu, C.X., Ding, F.: M4human: A large-scale multimodal mmwave radar benchmark for human mesh reconstruction. arXiv preprint arXiv:2512.12378 (2025)

  18. [18]

    In: ICCV

    Feichtenhofer, C., Fan, H., Malik, J., He, K.: Slowfast networks for video recogni- tion. In: ICCV. pp. 6202–6211 (2019)

  19. [19]

    IEEE Sensors Letters3(12), 1–4 (2019) 16 J

    Fhager, L.O., Heunisch, S., Dahlberg, H., Evertsson, A., Wernersson, L.E.: Pulsed millimeter wave radar for hand gesture sensing and classification. IEEE Sensors Letters3(12), 1–4 (2019) 16 J. Lin et al

  20. [20]

    In: Youth Academic Annual Conference of Chinese Association of Automation

    Gao, G., Liu, Q., Wang, W., Yu, Z., Liu, X.: Human activity recognition based on 4d millimeter-wave radar. In: Youth Academic Annual Conference of Chinese Association of Automation. pp. 1822–1828. IEEE (2025)

  21. [21]

    arXiv preprint arXiv:2405.01882 (2024)

    Gu, Z., He, X., Fang, G., Xu, C., Xia, F., Jia, W.: Millimeter wave radar- based human activity recognition for healthcare monitoring robot. arXiv preprint arXiv:2405.01882 (2024)

  22. [22]

    In: ICCV

    Haitman, Y., Bialer, O.: Doppdrive: Doppler-driven temporal aggregation for im- proved radar object detection. In: ICCV. pp. 26085–26094 (2025)

  23. [23]

    In: ECCV

    Hinojosa, C., Marquez, M., Arguello, H., Adeli, E., Fei-Fei, L., Niebles, J.C.: Privhar: Recognizing human actions from privacy-preserving lens. In: ECCV. pp. 314–332. Springer (2022)

  24. [24]

    NeurIPS36, 58064– 58074 (2023)

    Hor, S., Yang, S., Choi, J., Arbabian, A.: Mvdoppler: Unleashing the power of multi-view doppler for micromotion-based gait classification. NeurIPS36, 58064– 58074 (2023)

  25. [25]

    In: ICCV

    Huang, T., Dong, B., Yang, Y., Huang, X., Lau, R.W., Ouyang, W., Zuo, W.: Clip2point: Transfer clip to point cloud classification with image-depth pre- training. In: ICCV. pp. 22157–22167 (2023)

  26. [26]

    In: ICCV

    Kim, S., Lee, S., Hwang, D., Lee, J., Hwang, S.J., Kim, H.J.: Point cloud augmen- tation with weighted local transformations. In: ICCV. pp. 548–557 (2021)

  27. [27]

    In: WACV

    Lee, S.P., Kini, N.P., Peng, W.H., Ma, C.W., Hwang, J.N.: Hupr: A benchmark for human pose estimation using millimeter wave radar. In: WACV. pp. 5715–5724 (2023)

  28. [28]

    In: ICCV

    Li, P., Wang, Z., Yuan, Y., Liu, H., Meng, X., Yuan, J., Liu, M.: Ust-ssm: Unified spatio-temporal state space models for point cloud video modeling. In: ICCV. pp. 6738–6747 (2025)

  29. [29]

    In: CVPR

    Liu, H., Liu, Y., Ren, M., Wang, H., Wang, Y., Sun, Z.: Revealing key details to see differences: A novel prototypical perspective for skeleton-based action recognition. In: CVPR. pp. 29248–29257 (2025)

  30. [30]

    IEEE TMM26, 811–823 (2024)

    Liu, J., Wang, X., Wang, C., Gao, Y., Liu, M.: Temporal decoupling graph convo- lutional network for skeleton-based gesture recognition. IEEE TMM26, 811–823 (2024)

  31. [31]

    In: CVPR

    Liu, J., Han, J., Liu, L., Aviles-Rivero, A.I., Jiang, C., Liu, Z., Wang, H.: Mamba4d: Efficient 4d point cloud video understanding with disentangled spatial-temporal state space models. In: CVPR. pp. 17626–17636 (2025)

  32. [32]

    PR68, 346–362 (2017)

    Liu, M., Liu, H., Chen, C.: Enhanced skeleton visualization for view invariant human action recognition. PR68, 346–362 (2017)

  33. [33]

    IEEE TPAMI48(3), 3726–3743 (2026)

    Liu, M., Liu, J., Jiang, Y., He, B.: Heatmap pooling network for action recognition from rgb videos. IEEE TPAMI48(3), 3726–3743 (2026)

  34. [34]

    In: CVPR

    Liu, Y., Zhou, S., Liu, X., Hao, C., Fan, B., Tian, J.: Unbiased faster r-cnn for single-source domain generalized object detection. In: CVPR. pp. 28838–28847 (2024)

  35. [35]

    NeurIPS36, 53964–53982 (2023)

    Liu, Y., Wang, F., Wang, N., Zhang, Z.X.: Echoes beyond points: Unleashing the power of raw radar data in multi-modality fusion. NeurIPS36, 53964–53982 (2023)

  36. [36]

    IEEE Transactions on Mobile Computing23(5), 5479–5493 (2023)

    Luo,F.,Khan,S.,Li,A.,Huang,Y.,Wu,K.:Edgeactnet:Edgeintelligence-enabled human activity recognition using radar point cloud. IEEE Transactions on Mobile Computing23(5), 5479–5493 (2023)

  37. [37]

    arXiv preprint arXiv:2202.07123 , year=

    Ma, X., Qin, C., You, H., Ran, H., Fu, Y.: Rethinking network design and lo- cal geometry in point cloud: A simple residual mlp framework. arXiv preprint arXiv:2202.07123 (2022)

  38. [38]

    In: ICCV

    Mahmood, N., Ghorbani, N., Troje, N.F., Pons-Moll, G., Black, M.J.: Amass: Archive of motion capture as surface shapes. In: ICCV. pp. 5442–5451 (2019) DAP 17

  39. [39]

    In: AAAI

    Meng, Z., Fu, S., Yan, J., Liang, H., Zhou, A., Zhu, S., Ma, H., Liu, J., Yang, N.: Gait recognition for co-existing multiple people using millimeter wave sensing. In: AAAI. vol. 34, pp. 849–856 (2020)

  40. [40]

    IMWUT5(1), 1–27 (2021)

    Palipana, S., Salami, D., Leiva, L.A., Sigg, S.: Pantomime: Mid-air gesture recog- nition with sparse millimeter-wave radar point clouds. IMWUT5(1), 1–27 (2021)

  41. [41]

    In: AAAI

    Perez, E., Strub, F., De Vries, H., Dumoulin, V., Courville, A.: Film: Visual rea- soning with a general conditioning layer. In: AAAI. vol. 32 (2018)

  42. [42]

    Neural Networks108, 533–543 (2018)

    Phan, A.V., Le Nguyen, M., Nguyen, Y.L.H., Bui, L.T.: Dgcnn: A convolutional neural network over large-scale labeled graphs. Neural Networks108, 533–543 (2018)

  43. [43]

    In: ICRA

    Prabhakara, A., Jin, T., Das, A., Bhatt, G., Kumari, L., Soltanaghai, E., Bilmes, J., Kumar, S., Rowe, A.: High resolution point clouds from mmwave radar. In: ICRA. pp. 4135–4142. IEEE (2023)

  44. [44]

    In: CVPR

    Qi, C.R., Su, H., Mo, K., Guibas, L.J.: Pointnet: Deep learning on point sets for 3d classification and segmentation. In: CVPR. pp. 652–660 (2017)

  45. [45]

    NeurIPS30(2017)

    Qi, C.R., Yi, L., Su, H., Guibas, L.J.: Pointnet++: Deep hierarchical feature learn- ing on point sets in a metric space. NeurIPS30(2017)

  46. [46]

    In: ICML

    Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: ICML. pp. 8748–8763. PmLR (2021)

  47. [47]

    In: EMNLP

    Reimers, N., Gurevych, I.: Sentence-bert: Sentence embeddings using siamese bert- networks. In: EMNLP. pp. 3982–3992 (2019)

  48. [48]

    IEEE Transactions on Mobile Computing22(8), 4946–4960 (2022)

    Salami, D., Hasibi, R., Palipana, S., Popovski, P., Michoel, T., Sigg, S.: Tesla- rapture: A lightweight gesture recognition system from mmwave radar sparse point clouds. IEEE Transactions on Mobile Computing22(8), 4946–4960 (2022)

  49. [49]

    IEEE Transactions on Neu- ral Networks and Learning Systems34(11), 8418–8429 (2022)

    Sengupta, A., Cao, S.: mmpose-nlp: A natural language processing approach to precise skeletal pose estimation using mmwave radars. IEEE Transactions on Neu- ral Networks and Learning Systems34(11), 8418–8429 (2022)

  50. [50]

    IEEE Sensors Journal pp

    Seo, H.I., Bae, J.W., Seo, D.H.: Radar-based human activity recognition using adaptive range selection and deep neural network. IEEE Sensors Journal pp. 1–1 (2026)

  51. [52]

    In: CIKM

    Shao, T., Du, Z., Li, C., Wu, T., Wang, M.: Fast human action recognition via millimeter wave radar point cloud sequences learning. In: CIKM. pp. 2024–2033 (2024)

  52. [53]

    In: Pro- ceedings of the 3rd ACM Workshop on Millimeter-wave Networks and Sensing Systems

    Singh, A.D., Sandha, S.S., Garcia, L., Srivastava, M.: Radhar: Human activity recognition from point clouds generated through a millimeter-wave radar. In: Pro- ceedings of the 3rd ACM Workshop on Millimeter-wave Networks and Sensing Systems. pp. 51–56 (2019)

  53. [54]

    Sensors24(8) (2024)

    Tan, T.H., Tian, J.H., Sharma, A.K., Liu, S.H., Huang, Y.F.: Human activity recognition based on deep learning and micro-doppler radar data. Sensors24(8) (2024)

  54. [55]

    Texas Instruments: IWR1443BOOST single-chip 77- and 79-ghz mmwave sensor evaluation module.https://www.ti.com/tool/IWR1443BOOST(2014), accessed: 2020-09-29

  55. [56]

    Texas Instruments: IWR6843 single-chip 60-ghz mmwave radar sensor.https: //www.ti.com/product/IWR6843(2019), accessed: 2020-09-29

  56. [57]

    Lin et al

    Tian, J., Zou, Y., Lai, J.: From range-angle maps to poses: Human skeleton esti- mation from mmwave radar fmcw signal (2025) 18 J. Lin et al

  57. [58]

    In: CVPR

    Torralba, A., Efros, A.A.: Unbiased look at dataset bias. In: CVPR. pp. 1521–1528. IEEE (2011)

  58. [59]

    In: International Conference on AI in Healthcare

    Tunau, M., Zakka, V.G., Dai, Z.: Enhanced sparse point cloud data processing for privacy-aware human action recognition. In: International Conference on AI in Healthcare. pp. 142–155. Springer (2025)

  59. [60]

    In: EMBC

    Wan, Q., Li, Y., Li, C., Pal, R.: Gesture recognition for smart home applications using portable radar sensors. In: EMBC. pp. 6414–6417. IEEE (2014)

  60. [61]

    In: UIST

    Wang, S., Song, J., Lien, J., Poupyrev, I., Hilliges, O.: Interacting with soli: Ex- ploring fine-grained dynamic gesture recognition in the radio-frequency spectrum. In: UIST. pp. 851–860 (2016)

  61. [62]

    IMWUT7(1), 1–22 (2023)

    Wang, S., Cao, D., Liu, R., Jiang, W., Yao, T., Lu, C.X.: Human parsing with joint learning for dynamic mmwave radar point cloud. IMWUT7(1), 1–22 (2023)

  62. [63]

    In: ICASSP

    Wang, Y., Liu, H., Cui, K., Zhou, A., Li, W., Ma, H.: m-activity: Accurate and real-time human activity recognition via millimeter wave radar. In: ICASSP. pp. 8298–8302. IEEE (2021)

  63. [64]

    arXiv preprint arXiv:2503.02300 (2025)

    Wu, R., Li, Z., Wang, J., Xu, X., Zheng, Z., Huang, K., Lu, G.: Diffusion-based mmwave radar point cloud enhancement driven by range images. arXiv preprint arXiv:2503.02300 (2025)

  65.

    Wu, Y., Fioranelli, F., Gao, C.: Radmamba: Efficient human activity recognition through a radar-based micro-doppler-oriented mamba state-space model. IEEE Transactions on Radar Systems 4, 261–272 (2025)

  66.

    Xia, S., Chu, L., Pei, L., Yang, J., Yu, W., Qiu, R.C.: Timestamp-supervised wearable-based activity segmentation and recognition with contrastive learning and order-preserving optimal transport. IEEE Transactions on Mobile Computing 23(12), 10734–10751 (2024)

  67.

    Xu, K.: AI-driven personalized fall prevention for older adults. In: AAAI. vol. 39, pp. 29610–29612 (2025)

  68.

    Xu, R., Wang, X., Wang, T., Chen, Y., Pang, J., Lin, D.: Pointllm: Empowering large language models to understand point clouds. In: ECCV. pp. 131–147. Springer (2024)

  69.

    Xue, H., Ju, Y., Miao, C., Wang, Y., Wang, S., Zhang, A., Su, L.: mmmesh: Towards 3d real-time dynamic human mesh construction using millimeter-wave. In: MobiSys. pp. 269–282 (2021)

  70.

    Yan, J., Xu, C., Liu, D.: Og-pcl: Efficient sparse point cloud processing for human activity recognition. arXiv preprint arXiv:2511.08910 (2025)

  71.

    Yan, S., Xiong, Y., Lin, D.: Spatial temporal graph convolutional networks for skeleton-based action recognition. In: AAAI. vol. 32 (2018)

  72.

    Yang, J., Huang, H., Zhou, Y., Chen, X., Xu, Y., Yuan, S., Zou, H., Lu, C.X., Xie, L.: Mm-fi: Multi-modal non-intrusive 4d human dataset for versatile wireless sensing. NeurIPS 36, 18756–18768 (2023)

  73.

    Yu, C., Xu, Z., Yan, K., Chien, Y.R., Fang, S.H., Wu, H.C.: Noninvasive human activity recognition using millimeter-wave radar. IEEE Systems Journal 16(2), 3036–3047 (2022)

  74.

    Yu, J.T., Yen, L., Tseng, P.H.: mmwave radar-based hand gesture recognition using range-angle image. In: 2020 IEEE 91st Vehicular Technology Conference. pp. 1–5 (2020)

  75.

    Zeng, X., Shi, Y., Zhou, A.: Multi-har: Human activity recognition in multi-person scenes based on mmwave sensing. In: International Conference on Computer and Communications. pp. 1789–1793. IEEE (2022)

  76.

    Zhang, R., Guo, Z., Zhang, W., Li, K., Miao, X., Cui, B., Qiao, Y., Gao, P., Li, H.: Pointclip: Point cloud understanding by clip. In: CVPR. pp. 8552–8562 (2022)

  77.

    Zhang, R., Cao, S.: Real-time human motion behavior detection via cnn using mmwave radar. IEEE Sensors Letters 3(2), 1–4 (2018)

  78.

    Zhao, M., Tian, Y., Zhao, H., Alsheikh, M.A., Li, T., Hristov, R., Kabelac, Z., Katabi, D., Torralba, A.: Rf-based 3d skeletons. In: SIGCOMM. pp. 267–281 (2018)

  79.

    Zhao, P., Lu, C.X., Wang, B., Trigoni, N., Markham, A.: Cubelearn: End-to-end learning for human motion recognition from raw mmwave radar signals. IEEE Internet of Things Journal 10(12), 10236–10249 (2023)

  80.

    Zhou, K., Liu, Z., Qiao, Y., Xiang, T., Loy, C.C.: Domain generalization: A survey. IEEE TPAMI 45(4), 4396–4415 (2022)

  81.

    Zhou, R., Li, S., Zhang, H., Liu, C., Sun, J.: mmmulti: Multi-person action recognition based on multi-task learning using millimeter waves. IMWUT 9(2), 1–25 (2025)

Supplementary Material

In this supplementary material, we provide a comprehensive overview of the UniMM-HAR dataset, an analysis of its heterogeneous multi-source characteris...
