pith.

arxiv: 2606.23186 · v1 · pith:MHQNJCBO · submitted 2026-06-22 · 💻 cs.CV

StreamPPG: Low-Latency rPPG Estimation via Consistent Privileged Learning

Pith reviewed 2026-06-26 09:12 UTC · model grok-4.3

classification 💻 cs.CV
keywords: remote photoplethysmography, rPPG, privileged learning, low-latency estimation, frame-wise processing, blood volume pulse, edge computing, real-time monitoring

The pith

StreamPPG enables frame-by-frame rPPG estimation from video with accuracy matching clip-wise methods by using ground-truth signals only during training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the latency-accuracy trade-off in remote photoplethysmography, where clip-wise methods require over one hundred frames and introduce multi-second delays while frame-wise methods lose periodic signal features. StreamPPG introduces a unified architecture trained via consistent privileged learning that incorporates ground-truth rPPG signals exclusively at training time. This produces representations capable of accurate single-frame inference at test time. The result is state-of-the-art accuracy on standard datasets together with real-time throughput on edge hardware, supporting contact-free vital-sign monitoring without waiting for long video buffers.

Core claim

StreamPPG is a unified architecture that enables low-latency frame-wise physiological signal estimation while achieving competitive accuracy compared with clip-wise approaches. It is trained under a consistent privileged learning strategy that leverages ground-truth rPPG signals as privileged information to enhance the model's representation capability.
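The practical difference between frame-wise and clip-wise estimation is the streaming contract: one frame in, one pulse sample out, with bounded state and no clip buffer. The class below is a hypothetical sketch of that contract only. It is not StreamPPG's network, which this page does not specify. It assumes each frame has already been reduced to one scalar (for example a mean skin-pixel intensity) and uses a difference of two exponential moving averages (EMAs) as a crude causal band-pass filter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamingBandpass:
    """Hypothetical causal per-frame estimator (not StreamPPG itself).

    The fast EMA follows pulse-rate variation, the slow EMA follows baseline
    drift (illumination, slow motion); their difference is a crude band-pass.
    """

    fast_alpha: float = 0.5   # assumed smoothing factor of the fast EMA
    slow_alpha: float = 0.05  # assumed smoothing factor of the slow EMA
    fast: Optional[float] = None
    slow: Optional[float] = None

    def step(self, frame_value: float) -> float:
        """Consume one frame's scalar and emit one pulse sample immediately."""
        if self.fast is None or self.slow is None:
            # Initialise both averages on the first frame, so output starts at 0.
            self.fast = frame_value
            self.slow = frame_value
        else:
            self.fast += self.fast_alpha * (frame_value - self.fast)
            self.slow += self.slow_alpha * (frame_value - self.slow)
        return self.fast - self.slow


if __name__ == "__main__":
    import math

    est = StreamingBandpass()
    fps = 30.0  # assumed camera frame rate
    # Synthetic 1.2 Hz "pulse" riding on a constant offset of 100.
    for n in range(5):
        sample = est.step(100.0 + math.sin(2 * math.pi * 1.2 * n / fps))
        print(f"frame {n}: {sample:+.4f}")
```

Each call returns as soon as its frame arrives, so waiting time is bounded by one frame interval plus compute. A clip-wise model instead cannot emit anything until its whole input window has been captured.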

What carries the argument

Consistent privileged learning (CPL) strategy that supplies ground-truth rPPG signals exclusively during training to strengthen single-frame inference representations.
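To make the training-only role of the privileged signal concrete, here is a minimal sketch of a privileged-learning objective. Every detail is an assumption rather than the paper's formulation. A teacher branch that sees the ground-truth BVP `s_gt` and a video-only student branch both regress `s_gt`, and a consistency term weighted by a hypothetical `lam` pulls the student's features toward the teacher's. Figure 9 reports that λ = 0.8 works best for the paper's own loss, but it is not established that λ plays the same role as `lam` here.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length, non-empty sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def privileged_training_loss(
    student_pred: Sequence[float],  # BVP predicted from video only
    teacher_pred: Sequence[float],  # BVP predicted with privileged s_gt input
    student_feat: Sequence[float],  # student (video-only) representation
    teacher_feat: Sequence[float],  # teacher (privileged) representation
    s_gt: Sequence[float],          # ground-truth BVP, available only in training
    lam: float,                     # hypothetical consistency weight
) -> float:
    """Hypothetical training objective; not the paper's exact CPL loss."""
    # Supervised terms: both branches are trained to reproduce s_gt.
    task = mse(student_pred, s_gt) + mse(teacher_pred, s_gt)
    # Consistency term: align video-only features with privileged ones.
    # In an autodiff framework the teacher features would usually be detached,
    # so that only the student is pulled toward the teacher.
    consistency = mse(student_feat, teacher_feat)
    return task + lam * consistency
```

At test time only the student path would run, so nothing downstream depends on `s_gt`. This is the property the circularity audit relies on when it calls the generalization claim falsifiable.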

If this is right

  • StreamPPG achieves state-of-the-art accuracy across multiple datasets.
  • It maintains real-time throughput on edge devices.
  • It removes the multi-second delay inherent in clip-wise rPPG while avoiding the accuracy drop of conventional frame-wise methods.
  • The same training approach supports continuous contact-free health monitoring on resource-constrained hardware.
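The latency argument above comes down to simple arithmetic. The helpers below count only the time spent waiting for input frames, and check whether per-frame compute fits inside one frame interval. The 30 fps frame rate and the 128-frame clip are assumed example values; the source says only "over one hundred frames".

```python
def buffer_latency_s(frames_needed: int, fps: float) -> float:
    """Seconds of video that must be captured before the first estimate.

    Ignores model compute time; it only counts waiting for input frames.
    """
    if frames_needed < 1 or fps <= 0:
        raise ValueError("frames_needed must be >= 1 and fps > 0")
    return frames_needed / fps


def keeps_up(per_frame_compute_ms: float, fps: float) -> bool:
    """True if per-frame compute fits inside one frame interval at this fps."""
    return per_frame_compute_ms <= 1000.0 / fps


if __name__ == "__main__":
    FPS = 30.0         # assumed camera frame rate
    CLIP_FRAMES = 128  # assumed clip length ("over one hundred frames")
    print(f"clip-wise buffer: {buffer_latency_s(CLIP_FRAMES, FPS):.2f} s")
    print(f"frame-wise buffer: {buffer_latency_s(1, FPS) * 1000:.1f} ms")
```

With these assumed values, the script prints a clip-wise wait of 4.27 s against 33.3 ms for a single frame. `keeps_up` expresses "real-time throughput" as a per-frame compute budget of about 33 ms at 30 fps.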

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The CPL pattern could be tested on other periodic video signals such as respiration rate estimation.
  • Mobile health applications could incorporate the model to deliver continuous vital-sign feedback without specialized sensors.
  • Further experiments on skin-tone diversity would clarify whether the learned representations remain stable across demographic groups.
  • The approach might reduce buffer requirements in other real-time video analysis pipelines that currently rely on batch processing.

Load-bearing premise

Ground-truth rPPG signals used only at training time will produce features that generalize to new videos without those signals and without overfitting to the training distribution.

What would settle it

A cross-dataset evaluation in which StreamPPG error exceeds that of a standard clip-wise baseline under changed lighting, camera, or subject demographics.
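The settling test can be phrased as a small protocol: train on one dataset, test on each of the others, and flag every pair where the frame-wise model's error exceeds the clip-wise baseline's. The sketch below is hypothetical. It uses the four datasets named in Figure 7 and assumes MAE as the error measure. The MAE dictionaries must be filled from actual experiments; no results are implied.

```python
from itertools import permutations
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]  # (train dataset, test dataset)

# Datasets named in Figure 7; using all of them for transfer is an assumption.
DATASETS = ("PURE", "UBFC", "COHFACE", "MMPD")


def cross_dataset_pairs(datasets: Iterable[str] = DATASETS) -> List[Pair]:
    """All (train, test) pairs whose training and test datasets differ."""
    return list(permutations(datasets, 2))


def falsifying_pairs(stream_mae: Dict[Pair, float],
                     clip_mae: Dict[Pair, float]) -> List[Pair]:
    """Pairs where the frame-wise model's MAE exceeds the clip-wise baseline's."""
    shared = stream_mae.keys() & clip_mae.keys()
    return sorted(p for p in shared if stream_mae[p] > clip_mae[p])
```

With four datasets this yields 12 transfer settings. Any non-empty result from `falsifying_pairs` would be evidence against the load-bearing premise.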

Figures

Figures reproduced from arXiv: 2606.23186 by Hui-Liang Shen, Si-Yuan Cao, Xiaohan Zhang, Xiaokai Bai, Yihan Yang, Yiming Li, Yuanhui Hu, Yuguang Chu, Zhe Wu.

Figure 1. Comparison between clip-wise rPPG inference and the proposed …
Figure 2. Overview of the StreamPPG inference pipeline. The model receives the …
Figure 3. Overview of StreamPPG training with the consistent privileged learning (CPL) strategy. During training, privileged information is introduced through …
Figure 4. Visualization of facial attention maps. (a) The model trained with the …
Figure 6. Bland-Altman plots and scatter plots. (a) Training with PURE and …
Figure 7. Representative rPPG signal visualizations from intra-dataset evaluation on the PURE, UBFC, COHFACE and MMPD datasets.
Figure 8. HR estimation errors on the UBFC dataset under different video …
Figure 9. Effect of λ on the MMPD dataset. Performance peaks at λ = 0.8, which provides the best balance between privileged supervision and consistency regularization.

TABLE VI. Ablation study on the encoder of StreamPPG on the MMPD dataset. "CA" denotes channel attention.

  Encoder   MAE    RMSE   MAPE   ρ
  w/o CA    3.77   9.35   3.92   0.74
  w/ CA     3.39   8.35   3.50   0.78

… and poor generalization in signal-free inference, confirming the exis…
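Table VI reports MAE, RMSE, MAPE and Pearson's ρ, and Figure 6 uses Bland-Altman plots. The sketch below computes these quantities from paired per-video heart rates in BPM using the common textbook definitions: MAPE in percent, and 95% limits of agreement taken as the bias ± 1.96 sample standard deviations of the differences. The paper's own evaluation code may differ, for example in how HR windows are formed.

```python
import math
from statistics import fmean, stdev
from typing import Dict, Sequence, Tuple


def hr_metrics(pred: Sequence[float], gt: Sequence[float]) -> Dict[str, float]:
    """MAE and RMSE in BPM, MAPE in percent, and Pearson's rho."""
    if len(pred) != len(gt) or len(pred) < 2:
        raise ValueError("need at least two paired HR values")
    err = [p - g for p, g in zip(pred, gt)]
    mae = fmean(abs(e) for e in err)
    rmse = math.sqrt(fmean(e * e for e in err))
    mape = 100.0 * fmean(abs(e) / abs(g) for e, g in zip(err, gt))  # assumes gt != 0
    mp, mg = fmean(pred), fmean(gt)
    cov = sum((p - mp) * (g - mg) for p, g in zip(pred, gt))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_g = sum((g - mg) ** 2 for g in gt)
    # Pearson's rho is undefined when either series is constant.
    rho = cov / math.sqrt(var_p * var_g) if var_p > 0 and var_g > 0 else float("nan")
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "rho": rho}


def bland_altman(pred: Sequence[float], gt: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower, upper): bias and 95% limits of agreement."""
    if len(pred) != len(gt) or len(pred) < 2:
        raise ValueError("need at least two paired HR values")
    diffs = [p - g for p, g in zip(pred, gt)]
    bias = fmean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

In a Bland-Altman plot, each point is (mean of prediction and reference, their difference). The three returned values are the horizontal lines usually drawn across it.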
Original abstract

Remote photoplethysmography (rPPG) estimates the blood volume pulse (BVP) signal from facial videos, enabling contact-free health monitoring. Conventional clip-wise approaches, which use video clips as input, require capturing over one hundred frames before inference, thus introducing several seconds of delay and hindering real-time use. Meanwhile, frame-wise approaches struggle to capture long-range temporal and periodic features of physiological rhythms, and therefore lead to reduced estimation accuracy. To overcome these issues, we propose StreamPPG, a unified architecture that enables low-latency frame-wise physiological signal estimation while achieving competitive accuracy compared with clip-wise approaches. StreamPPG is trained under a consistent privileged learning (CPL) strategy, which leverages ground-truth rPPG signals as privileged information to enhance the model's representation capability. Extensive experiments demonstrate that StreamPPG achieves state-of-the-art accuracy across multiple datasets while maintaining real-time throughput on edge devices.
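Heart rate is usually read off an estimated BVP signal as its dominant frequency within a physiologically plausible band. The minimal sketch below assumes a 0.7–3.0 Hz band (42–180 BPM) and a 0.01 Hz search grid; neither value comes from the paper. It evaluates the discrete-time Fourier transform directly on that grid, which is slow but needs no dependencies.

```python
import math
from typing import Sequence


def estimate_hr_bpm(bvp: Sequence[float], fs: float,
                    f_lo: float = 0.7, f_hi: float = 3.0,
                    step_hz: float = 0.01) -> float:
    """Return the heart rate (BPM) at the strongest spectral peak in [f_lo, f_hi].

    Band limits and grid step are assumptions, not values from the paper.
    """
    if fs <= 0 or len(bvp) < 2 or not 0 < f_lo < f_hi:
        raise ValueError("need fs > 0, at least two samples and 0 < f_lo < f_hi")
    mean = sum(bvp) / len(bvp)
    x = [v - mean for v in bvp]  # remove DC so its leakage cannot dominate
    best_f, best_power = f_lo, -1.0
    n_steps = int(round((f_hi - f_lo) / step_hz))
    for k in range(n_steps + 1):
        f = f_lo + k * step_hz
        w = 2.0 * math.pi * f / fs  # angular frequency in radians per sample
        re = sum(v * math.cos(w * n) for n, v in enumerate(x))
        im = sum(v * math.sin(w * n) for n, v in enumerate(x))
        power = re * re + im * im
        if power > best_power:
            best_f, best_power = f, power
    return 60.0 * best_f


if __name__ == "__main__":
    fs = 30.0  # assumed frame rate
    # 10 s of a synthetic 1.2 Hz pulse.
    signal = [math.sin(2 * math.pi * 1.2 * n / fs) for n in range(300)]
    print(round(estimate_hr_bpm(signal, fs)))  # 72
```

A frame-wise estimator still needs a window of BVP samples to produce a heart-rate number. The latency gain lies in emitting the BVP waveform itself sample by sample.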

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes StreamPPG, a unified frame-wise architecture for remote photoplethysmography (rPPG) that uses consistent privileged learning (CPL) to incorporate ground-truth BVP signals only at training time. This enables low-latency inference from individual frames while claiming to match or exceed the accuracy of conventional clip-wise methods. The abstract states that extensive experiments show state-of-the-art accuracy across multiple datasets together with real-time throughput on edge devices.

Significance. If the central claim holds, StreamPPG would address a practical bottleneck in contact-free physiological monitoring by delivering both low latency and competitive accuracy, potentially enabling real-time applications on resource-constrained devices. The CPL strategy is a standard privileged-information technique; its value here would lie in an empirical demonstration that video-only features learned under this regime generalize without overfitting to training-domain video-BVP correlations.

major comments (2)
  1. [Experiments] Experiments section (and any associated tables/figures): the manuscript claims SOTA accuracy and real-time performance but the provided abstract supplies no quantitative numbers, baseline comparisons, error bars, dataset details, or ablation studies. Without these, the central claim that CPL produces video-only features whose accuracy matches clip-wise baselines cannot be evaluated.
  2. [Method] § on method / CPL strategy: the assumption that ground-truth rPPG signals used only at training produce representations that generalize to unseen test videos is load-bearing, yet no cross-dataset transfer results or ablation removing the privileged branch are described. If dataset-specific correlations between video and BVP are learned, the reported performance would not transfer.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, clarifying the content of the Experiments and Method sections while outlining targeted revisions for improved clarity.

Point-by-point responses
  1. Referee: [Experiments] Experiments section (and any associated tables/figures): the manuscript claims SOTA accuracy and real-time performance but the provided abstract supplies no quantitative numbers, baseline comparisons, error bars, dataset details, or ablation studies. Without these, the central claim that CPL produces video-only features whose accuracy matches clip-wise baselines cannot be evaluated.

    Authors: We agree that the abstract, as a concise summary, does not include specific quantitative numbers. However, the Experiments section of the manuscript contains detailed tables and figures reporting SOTA comparisons against multiple baselines, error bars across runs, full dataset specifications, and ablation studies on the CPL components. These results directly support that the video-only inference matches clip-wise accuracy. In the revision we will add a brief summary of key metrics (e.g., MAE and throughput) to the abstract for immediate accessibility. revision: partial

  2. Referee: [Method] § on method / CPL strategy: the assumption that ground-truth rPPG signals used only at training produce representations that generalize to unseen test videos is load-bearing, yet no cross-dataset transfer results or ablation removing the privileged branch are described. If dataset-specific correlations between video and BVP are learned, the reported performance would not transfer.

    Authors: The manuscript already reports cross-dataset transfer experiments (training on one dataset and testing on others) as well as ablations that disable the privileged BVP branch at training time. These appear in the Experiments section and confirm that the learned video representations generalize without overfitting to training-domain video-BVP correlations. We will add explicit forward references from the Method section to these results to make the generalization evidence more prominent. revision: no

Circularity Check

0 steps flagged

No circularity: the derivation is a self-contained empirical method.

Full rationale

The paper describes a standard neural architecture for rPPG estimation trained with a privileged branch that receives ground-truth BVP signals exclusively at training time. The central claim is an empirical performance result (SOTA accuracy + real-time inference) obtained via supervised optimization on labeled datasets; no equation or derivation is shown to reduce by construction to its own fitted parameters or to a self-citation chain. The inference procedure is explicitly defined to operate without the privileged signal, making the generalization claim falsifiable by cross-dataset or ablation experiments rather than tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are identifiable from the abstract alone; the method description does not introduce new physical quantities or unstated mathematical assumptions beyond standard supervised learning.

pith-pipeline@v0.9.1-grok · 5718 in / 1065 out tokens · 18983 ms · 2026-06-26T09:12:47.970372+00:00 · methodology


Reference graph

Works this paper leans on

50 extracted references · 1 linked inside Pith

  1. [1]

    Peter L Bartlett and Shahar Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3(Nov):463–482, 2002

  2. [2]

    Serge Bobbia, Richard Macwan, Yannick Benezeth, Alamin Mansouri, and Julien Dubois. Unsupervised skin tissue segmentation for remote photoplethysmography. Pattern Recognition Letters, 124:82–90, 2019

  3. [3]

    Björn Braun, Daniel McDuff, and Christian Holz. How suboptimal is training rPPG models with videos and targets from different body sites? In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 410–418, 2024

  4. [4]

    Constantino Alvarez Casado and Miguel Bordallo López. Face2PPG: An unsupervised pipeline for blood volume pulse extraction from faces. IEEE Journal of Biomedical and Health Informatics, 27(11):5530–5541, 2023

  5. [5]

    Weixuan Chen and Daniel McDuff. DeepPhys: Video-based physiological measurement using convolutional attention networks. In Proceedings of the European Conference on Computer Vision, pages 349–365, 2018

  6. [6]

    Juan Cheng, Ping Wang, Rencheng Song, Yu Liu, Chang Li, Yong Liu, and Xun Chen. Remote heart rate measurement from near-infrared videos based on joint blind source separation with delay-coordinate transformation. IEEE Transactions on Instrumentation and Measurement, 70:1–13, 2020

  7. [7]

    Tri Dao and Albert Gu. Transformers are SSMs: generalized models and efficient algorithms through structured state space duality. Proceedings of Machine Learning Research, 235:10041–10071, 2024

  8. [8]

    Gerard De Haan and Vincent Jeanne. Robust pulse rate from chrominance-based rPPG. IEEE Transactions on Biomedical Engineering, 60(10):2878–2886, 2013

  9. [9]

    Gerard De Haan and Arno Van Leest. Improved motion robustness of remote-PPG by using the blood volume pulse signature. Physiological Measurement, 35(9):1913, 2014

  10. [10]

    Guillaume Heusch, André Anjos, and Sébastien Marcel. A reproducible study on remote heart rate measurement. arXiv preprint arXiv:1709.00962, 2017

  11. [11]

    Min Hu, Fei Qian, Dong Guo, Xiaohua Wang, Lei He, and Fuji Ren. ETA-rPPGNet: Effective time-domain attention network for remote heart rate measurement. IEEE Transactions on Instrumentation and Measurement, 70:1–12, 2021

  12. [12]

    Titong Jiang, Yuan Ma, Jiaqi Li, Qing Dong, Xuewu Ji, and Yahui Liu. LSTS: Periodicity learning via long short-term temporal shift for remote physiological measurement. IEEE Transactions on Circuits and Systems for Video Technology, 35(7):6452–6465, 2025

  13. [13]

    Jitesh Joshi, Sos S Agaian, and Youngjun Cho. Factorizephys: Matrix factorization for multidimensional attention in remote physiological sensing. arXiv preprint arXiv:2411.01542, 2024

  14. [14]

    Jianwei Li, Zitong Yu, and Jingang Shi. Learning motion-robust remote photoplethysmography through arbitrary resolution videos. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 1334–1342, 2023

  15. [15]

    Zhihua Li and Lijun Yin. Contactless pulse estimation leveraging pseudo labels and self-supervision. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 20588–20597, 2023

  16. [16]

    Mingxuan Liu, Jiankai Tang, Yongli Chen, Haoxiang Li, Jiahao Qi, Siwei Li, Kegang Wang, Jie Gan, Yuntao Wang, and Hong Chen. Spiking-physformer: Camera-based remote photoplethysmography with parallel spike-driven transformer. Neural Networks, 185:107128, 2025

  17. [17]

    Si-Qi Liu and Pong C Yuen. A general remote photoplethysmography estimator with spatiotemporal convolutional network. In Proceedings of the International Conference on Automatic Face and Gesture Recognition, pages 481–488. IEEE, 2020

  18. [18]

    Xin Liu, Josh Fromm, Shwetak Patel, and Daniel McDuff. Multi-task temporal shift attention networks for on-device contactless vitals measurement. Advances in Neural Information Processing Systems, 33:19400–19411, 2020

  19. [19]

    Xin Liu, Brian Hill, Ziheng Jiang, Shwetak Patel, and Daniel McDuff. Efficientphys: Enabling simple, fast and accurate camera-based cardiac measurement. In Proceedings of the Winter Conference on Applications of Computer Vision, pages 5008–5017, 2023

  20. [20]

    Xin Liu, Girish Narayanswamy, Akshay Paruchuri, Xiaoyu Zhang, Jiankai Tang, Yuzhe Zhang, Roni Sengupta, Shwetak Patel, Yuntao Wang, and Daniel McDuff. rppg-toolbox: Deep remote ppg toolbox. Advances in Neural Information Processing Systems, 36:68485–68510, 2023

  21. [21]

    Hao Lu, Hu Han, and S Kevin Zhou. Dual-gan: Joint BVP and noise modeling for remote physiological measurement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12404–12413, 2021

  22. [22]

    Andreas Maurer. A vector-contraction inequality for Rademacher complexities. In Proceedings of the International Conference on Algorithmic Learning Theory, pages 3–17. Springer, 2016

  23. [23]

    Xuesong Niu, Zitong Yu, Hu Han, Xiaobai Li, Shiguang Shan, and Guoying Zhao. Video-based remote physiological measurement via cross-verified feature disentangling. In Proceedings of the European Conference on Computer Vision, pages 295–310. Springer, 2020

  24. [24]

    Dmitry Pechyony and Vladimir Vapnik. On the theory of learning with privileged information. Advances in Neural Information Processing Systems, 23, 2010

  25. [25]

    Christian S Pilz, Sebastian Zaunseder, Jarek Krajewski, and Vladimir Blazek. Local group invariance for heart rate estimation from face videos in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 1254–1262, 2018

  26. [26]

    Ming-Zher Poh, Daniel J McDuff, and Rosalind W Picard. Non-contact, automated cardiac pulse measurements using video imaging and blind source separation. Optics Express, 18(10):10762–10774, 2010

  27. [27]

    Marko Savic and Guoying Zhao. Rs+rppg: Robust strongly self-supervised learning for rppg. IEEE Transactions on Circuits and Systems for Video Technology, 2025

  28. [28]

    Hang Shao, Lei Luo, Jianjun Qian, Shuo Chen, Chuanfei Hu, and Jian Yang. Tranphys: Spatiotemporal masked transformer steered remote photoplethysmography estimation. IEEE Transactions on Circuits and Systems for Video Technology, 34(4):3030–3042, 2023

  29. [29]

    Rencheng Song, Huan Chen, Juan Cheng, Chang Li, Yu Liu, and Xun Chen. PulseGAN: Learning to generate realistic pulse waveforms in remote photoplethysmography. IEEE Journal of Biomedical and Health Informatics, 25(5):1373–1384, 2021

  30. [30]

    Jeremy Speth, Nathan Vance, Patrick Flynn, and Adam Czajka. Non-contrastive unsupervised learning of physiological signals from video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14464–14474, 2023

  31. [31]

    Radim Špetlík, Vojtech Franc, and Jirí Matas. Visual heart rate estimation with convolutional neural network. In Proceedings of the British Machine Vision Conference, pages 3–6, 2018

  32. [32]

    Ronny Stricker, Steffen Müller, and Horst-Michael Gross. Non-contact video-based pulse rate measurement on a mobile service robot. In The IEEE International Symposium on Robot and Human Interactive Communication, pages 1056–1062. IEEE, 2014

  33. [33]

    Yu Sun and Nitish Thakor. Photoplethysmography revisited: from contact to noncontact, from point to imaging. IEEE Transactions on Biomedical Engineering, 63(3):463–477, 2015

  34. [34]

    Z. Sun and X. Li. Contrast-Phys+: Unsupervised and weakly-supervised video-based remote physiological measurement via spatiotemporal contrast. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1–18, 2024

  35. [35]

    Jiankai Tang, Kequan Chen, Yuntao Wang, Yuanchun Shi, Shwetak Patel, Daniel McDuff, and Xin Liu. MMPD: Multi-domain mobile video physiology dataset. In Proceedings of the International Conference of the IEEE Engineering in Medicine & Biology Society, pages 1–5, 2023

  36. [36]

    Vladimir Vapnik, Rauf Izmailov, et al. Learning using privileged information: Similarity control and knowledge transfer. Journal of Machine Learning Research, 16(1):2023–2049, 2015

  37. [37]

    Vladimir Vapnik and Akshay Vashist. A new learning paradigm: Learning using privileged information. Neural Networks, 22(5-6):544–557, 2009

  38. [38]

    Wim Verkruysse, Lars O Svaasand, and J Stuart Nelson. Remote plethysmographic imaging using ambient light. Optics Express, 16(26):21434–21445, 2008

  39. [39]

    Wenjin Wang, Albertus C Den Brinker, Sander Stuijk, and Gerard De Haan. Algorithmic principles of remote PPG. IEEE Transactions on Biomedical Engineering, 64(7):1479–1491, 2016

  40. [40]

    Bryan P Yan, William HS Lai, Christy KY Chan, Stephen Chun-Hin Chan, Lok-Hei Chan, Ka-Ming Lam, Ho-Wang Lau, Chak-Ming Ng, Lok-Yin Tai, Kin-Wai Yip, et al. Contact-free screening of atrial fibrillation by a smartphone using facial pulsatile photoplethysmographic signals. Journal of the American Heart Association, 7(8):e008585, 2018

  41. [41]

    Zitong Yu, Xiaobai Li, and Guoying Zhao. Remote photoplethysmograph signal measurement from facial videos using spatio-temporal networks. In Proceedings of the British Machine Vision Conference, 2019

  42. [42]

    Zitong Yu, Xiaobai Li, and Guoying Zhao. Facial-video-based physiological signal measurement: Recent advances and affective applications. IEEE Signal Processing Magazine, 38(6):50–58, 2021

  43. [43]

    Zitong Yu, Wei Peng, Xiaobai Li, Xiaopeng Hong, and Guoying Zhao. Remote heart rate measurement from highly compressed facial videos: an end-to-end deep learning solution with video enhancement. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 151–160, 2019

  44. [44]

    Zitong Yu, Yuming Shen, Jingang Shi, Hengshuang Zhao, Yawen Cui, Jiehua Zhang, Philip Torr, and Guoying Zhao. Physformer++: Facial video-based physiological measurement with slowfast temporal difference transformer. International Journal of Computer Vision, 131(6):1307–1330, 2023

  45. [45]

    Zitong Yu, Yuming Shen, Jingang Shi, Hengshuang Zhao, Philip HS Torr, and Guoying Zhao. Physformer: Facial video-based physiological measurement with temporal difference transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4186–4196, 2022

  46. [46]

    Dezhao Zhai, Wei Chen, Yinghao Ding, Ming Yu, Qinwei Li, and Hang Wu. Research on robust measurement method of heart rate using remote photoplethysmography based on adversarial learning network with high and low frequency features. IEEE Transactions on Circuits and Systems for Video Technology, 35(6):5208–5222, 2025

  47. [47]

    Yizhu Zhang, Jingang Shi, Jiayin Wang, Yuan Zong, Wenming Zheng, and Guoying Zhao. MaskFusionNet: A dual-stream fusion model with masked pre-training mechanism for rPPG measurement. IEEE Transactions on Circuits and Systems for Video Technology, 34(11):11521–11534, 2024

  48. [48]

    Changchen Zhao, Hongsheng Wang, Huiling Chen, Weiwei Shi, and Yuanjing Feng. JAMSNet: A remote pulse extraction network based on joint attention and multi-scale fusion. IEEE Transactions on Circuits and Systems for Video Technology, 33(6):2783–2797, 2022

  49. [49]

    Bochao Zou, Zizheng Guo, Jiansheng Chen, Junbao Zhuo, Weiran Huang, and Huimin Ma. RhythmFormer: Extracting patterned rPPG signals based on periodic sparse attention. Pattern Recognition, 164:111511, 2025

  50. [50]

    Bochao Zou, Zizheng Guo, Xiaocheng Hu, and Huimin Ma. RhythmMamba: Fast, Lightweight, and Accurate Remote Physiological Measurement. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 11077–11085, 2025

Appendix A. Consistent Privileged Learning Strategy

A. Notation and Definitions. V, z and s_gt denote the input video, privile...