
arxiv: 2606.20715 · v1 · pith:MNLLENXGnew · submitted 2026-06-16 · 💻 cs.CV

CDER-SME: A Cross-Device Event-RGB Micro-Expression Dataset under Multi-Level Stress Induction

Pith reviewed 2026-06-27 01:23 UTC · model grok-4.3

classification 💻 cs.CV
keywords micro-expression recognition · event camera · RGB dataset · cross-device alignment · stress induction · multimodal fusion · spontaneous expressions · action units

The pith

CDER-SME supplies a cross-device Event-RGB dataset of 1,963 micro-expression samples gathered under cognitive and social stress, along with an alignment method that works with independent sensors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CDER-SME to address the gap between controlled lab benchmarks and realistic micro-expression recognition, which needs fine time resolution and natural triggers. It collects data from 92 subjects using separate event and RGB cameras while applying multi-level stress tasks to draw out spontaneous expressions, then supplies a pipeline that aligns the two streams in time and space without matched hardware. A baseline experiment indicates that combining the two modalities yields higher recognition rates than either source alone. This matters for moving micro-expression work toward practical settings where sensors cannot share a common optical path. The work centers on reproducible acquisition and annotation: of the 1,963 annotated samples, 790 are Event-RGB pairs, and 210 of those pairs are retained as high-fidelity aligned pairs.

Core claim

We introduce CDER-SME, a cross-device Event-RGB dataset collected under a multi-level stress induction framework (cognitive and social) to elicit spontaneous emotional leakage. To enable reproducible acquisition with independent, decoupled sensors, we provide a hardware-agnostic alignment pipeline for temporal synchronization and landmark-guided spatial registration. CDER-SME adopts a three-tier structure with 92 subjects and 1,963 expert-annotated samples (Action Units and emotions), including 790 Event-RGB pairs and 210 high-fidelity aligned pairs. We further report a reproducible multimodal baseline, where cross-modal fusion improves performance over single-modality counterparts, supporting the complementarity of event dynamics and RGB cues.

What carries the argument

The hardware-agnostic alignment pipeline that performs temporal synchronization and landmark-guided spatial registration between independent event and RGB sensors.
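
The temporal half of this pipeline can be sketched as follows. This is a minimal illustration, assuming the offset convention ΔT = T_ev − T_rgb that appears in the pipeline figure labels, integer microsecond timestamps, and a sorted event timestamp array. The function names and the fixed time window are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def temporal_offset_us(t_ev_us, t_rgb_us):
    """Offset between the two clocks, following Delta T = T_ev - T_rgb.

    t_ev_us and t_rgb_us are the timestamps (in microseconds) that each
    device assigns to the same physical instant, e.g. a shared visual cue.
    """
    return int(t_ev_us) - int(t_rgb_us)


def event_slice_for_frame(event_ts_us, frame_ts_us, delta_t_us, window_us):
    """Return (lo, hi) so that event_ts_us[lo:hi] covers one RGB frame.

    Assumption: event_ts_us is sorted in ascending order. The RGB frame
    time is mapped onto the event clock by adding delta_t_us.
    """
    event_ts_us = np.asarray(event_ts_us, dtype=np.int64)
    start = int(frame_ts_us) + int(delta_t_us)
    lo = int(np.searchsorted(event_ts_us, start, side="left"))
    hi = int(np.searchsorted(event_ts_us, start + int(window_us), side="left"))
    return lo, hi
```

With the slice indices in hand, the events belonging to each RGB frame can be accumulated into whatever event representation a downstream model expects.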

If this is right

  • Cross-modal fusion of event dynamics and RGB cues raises micro-expression recognition performance above either modality used alone.
  • The dataset removes the requirement for coaxial calibration, allowing practical cross-device Event-RGB setups.
  • The 790 Event-RGB pairs and 210 high-fidelity aligned pairs supply test material for alignment and fusion methods.
  • Expert annotations of action units and emotions across 1,963 samples support reproducible multimodal experiments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The alignment pipeline could be tested on other decoupled sensor pairs such as event plus depth cameras for different recognition tasks.
  • The cognitive and social stress protocol might be reused or modified to study other low-intensity behaviors in controlled psychology settings.
  • Real-world deployment of the dataset approach could support applications like driver monitoring or interview analysis where natural micro-expressions appear.

Load-bearing premise

The multi-level stress induction framework reliably produces spontaneous micro-expressions that experts can consistently identify and annotate across the collected samples.

What would settle it

A controlled test in which models using fused event-RGB input show no accuracy gain over single-modality baselines on held-out samples, or in which the alignment pipeline produces visibly mismatched event-frame pairs on new independent hardware.
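
One way to make "visibly mismatched" measurable, assuming matched facial landmarks are available in both views, is to fit a 2D transform between the landmark sets and report the mean residual. The sketch below uses a least-squares similarity transform (scale, rotation, translation). That model choice and all function names are assumptions for illustration; the paper states only that the registration is landmark-guided.

```python
import numpy as np


def fit_similarity(src, dst):
    """Least-squares 2D similarity transform mapping src landmarks to dst.

    Model: x' = a*x - b*y + tx,  y' = b*x + a*y + ty,
    where a = s*cos(theta) and b = s*sin(theta).
    Returns a 2x3 matrix [[a, -b, tx], [b, a, ty]].
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]
    A = np.zeros((2 * n, 4))
    # Even rows encode the x' equations, odd rows the y' equations.
    A[0::2] = np.column_stack([src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)])
    A[1::2] = np.column_stack([src[:, 1], src[:, 0], np.zeros(n), np.ones(n)])
    rhs = dst.reshape(-1)  # interleaved [x0', y0', x1', y1', ...]
    a, b, tx, ty = np.linalg.lstsq(A, rhs, rcond=None)[0]
    return np.array([[a, -b, tx], [b, a, ty]])


def apply_transform(M, pts):
    """Apply a 2x3 transform to an (n, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    return pts @ M[:, :2].T + M[:, 2]


def mean_landmark_error(M, src, dst):
    """Mean Euclidean distance between transformed src and dst landmarks."""
    diff = apply_transform(M, src) - np.asarray(dst, dtype=float)
    return float(np.linalg.norm(diff, axis=1).mean())
```

A large residual on new hardware would be one concrete signal that the registration step has failed for that device pair.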

Figures

Figures reproduced from arXiv: 2606.20715 by Hui Sha, Jingting Li, Su-Jing Wang.

Figure 1: Workflow for the cognitive stress module.
Adjacent text: "… recorded, removing time-constrained pressure. (3) No Stress: Only two congruent colors (red/green) were used as a baseline. Rationale for MEs. This paradigm systematically induces cognitive overload, leading to micro-fluctuations in the periocular and brow regions, and requiring high-temporal event sensing to capture dynamics which might be blurred in 30fps sequences.…"
Figure 2: Significant AU occurrences under the two multi-level stress induction paradigms.
Diagram labels spilled in from the alignment pipeline figure: Temporal Synchronization via Neuromorphic Ventral Stream; Cross-Modal Spatial Registration & Normalization; first digit state transition; ORB localization + OCR on chronometer; chronometer ROI; T_rgb; T_ev; ΔT = T_ev − T_rgb; microsecond-level synchronization; aligned RGB-Event pairs (210 samples); RGB cropped face sequence; event tensor; "RGB Onse…"
Figure 3: The proposed cross-device alignment pipeline.
Adjacent text (Section 4, Cross-Device Spatial and Temporal Alignment): "While our multi-pressure paradigm yielded a total of 1,963 spontaneous expression samples in the RGB modality, hardware constraints limited the final dataset to 790 samples with concurrent RGB-Event recordings. Meantime, high-fidelity cross-modal ME analysis requires reliable temporal synchronization and spatial cor…"
Figure 4: Spatial quality comparison. Top and bottom rows show RGB images and event-based frames, respectively.
Adjacent text: "… marker occlusion/offset (n=121), and acquisition failures (n=189) (see Supp. for a detailed list). The remaining 210 pairs form a high-fidelity “gold standard”, while the full 790-pair set is released as a challenging testbed for future robust alignment research. Representativeness Analysis. While the alignm…"
Figure 5: Overall architecture of the baseline method for the multi-modal MER task.
Adjacent text: "… confirms that the most intense motion moments are well-synchronized. (Detailed per-sample metrics are provided in Supp.) 2. Spatial Alignment Validity. Due to the fundamental differences in imaging mechanisms between RGB intensity and asynchronous event triggering, standard pixel-level metrics (e.g., SSIM, PSNR) are inherently low and un…"
Figure 6: Data collection setup with reduced illumination variation, suppressed event noise, and reproducible recording conditions across subjects and sessions. Importantly, consistent with the main paper, the RGB and event sensors were deployed as independent, decoupled devices rather than as a rigidly coaxial or pre-calibrated hardware pair, so that the resulting dataset better reflects practical cross-device Eve…
Figure 7: Emotion class distributions of the 790 raw Event–RGB pairs and the 210 aligned Event–RGB pairs. Emotion labels are derived from AU-rule-based objective inference.
Adjacent text: "… pair pool that still reflects realistic cross-device acquisition noise. The resulting sample accounting is summarized in Tab. 8. Among the 790 raw pairs, 210 samples were retained as valid aligned pairs, while the remaining 580 samples were exclu…"
Figure 8: Confusion matrices of the released multimodal reference baseline on the overall set, MaE subset, and ME subset.
Adjacent text: "A similar trend is observed on the event branch. Our method achieves the strongest event-only performance, with an ACC of 0.5619, a UF1 of 0.3180, and a UAR of 0.3167, outperforming both Voxel + CNN (0.3667/0.3232/0.3093) and Event Frame + CNN (0.3524/0.2486/0.2301). Notably, while Voxel + CNN at…"
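
The three scores quoted next to Figure 8 (ACC, UF1, UAR) can all be read off a confusion matrix. The sketch below assumes the definitions that are usual in micro-expression benchmarks: UF1 as the unweighted mean of per-class F1 and UAR as the unweighted mean of per-class recall. Whether the paper uses exactly these definitions is not confirmed by the excerpt, and the function name is hypothetical.

```python
import numpy as np


def mer_metrics(confusion):
    """Return (ACC, UF1, UAR) from a square confusion matrix.

    Rows are true classes, columns are predicted classes.
    Classes with an empty denominator contribute 0 to the class average.
    """
    cm = np.asarray(confusion, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as class c but actually another class
    fn = cm.sum(axis=1) - tp  # actually class c but predicted as another class

    acc = tp.sum() / cm.sum()

    f1_den = 2 * tp + fp + fn
    f1 = np.divide(2 * tp, f1_den, out=np.zeros_like(f1_den), where=f1_den > 0)

    support = tp + fn
    recall = np.divide(tp, support, out=np.zeros_like(support), where=support > 0)

    return float(acc), float(f1.mean()), float(recall.mean())
```

Because UF1 and UAR weight every class equally, they can sit well below ACC when the class distribution is skewed, which is consistent with the gap between ACC and the other two scores in the quoted event-only numbers.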
Original abstract

Micro-expression recognition (MER) in realistic scenarios demands high temporal sensitivity and ecological validity, yet existing benchmarks are largely constrained to laboratory-controlled settings and rigid hardware-coupled sensing. We introduce CDER-SME, a cross-device Event-RGB dataset collected under a multi-level stress induction framework (cognitive and social) to elicit spontaneous emotional leakage. To enable reproducible acquisition with independent, decoupled sensors, we provide a hardware-agnostic alignment pipeline for temporal synchronization and landmark-guided spatial registration. CDER-SME adopts a three-tier structure with 92 subjects and 1,963 expert-annotated samples (Action Units and emotions), including 790 Event-RGB pairs and 210 high-fidelity aligned pairs. We further report a reproducible multimodal baseline, where cross-modal fusion improves performance over single-modality counterparts, supporting the complementarity of event dynamics and RGB cues. By removing the need for coaxial calibration, CDER-SME offers a practical benchmark for cross-device alignment and deployable Event-RGB MER in real-world affective intelligence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces CDER-SME, a cross-device Event-RGB micro-expression dataset collected from 92 subjects under a multi-level stress induction framework (cognitive and social) to elicit spontaneous emotional leakage. It comprises 1,963 expert-annotated samples (Action Units and emotions), including 790 Event-RGB pairs and 210 high-fidelity aligned pairs. The authors provide a hardware-agnostic alignment pipeline for temporal synchronization and landmark-guided spatial registration, and report a reproducible multimodal baseline in which cross-modal fusion improves performance over single-modality counterparts.

Significance. If the stress induction, synchronization, and annotation procedures are shown to be reliable, the dataset would offer a valuable benchmark for Event-RGB MER that removes the requirement for coaxial hardware calibration. This could support more ecologically valid and deployable affective computing systems. The alignment pipeline and baseline results, if quantitatively detailed, would strengthen reproducibility claims in the field.

major comments (3)
  1. [Dataset Collection and Stress Induction] The central claim that the multi-level stress induction reliably elicits spontaneous micro-expressions across the 1,963 samples is load-bearing for ecological validity, yet the manuscript provides no details on the specific task protocols, induction durations, or any corroborating measures (e.g., physiological signals or self-report validation) to support this.
  2. [Baseline Evaluation] The assertion that cross-modal fusion improves performance is central to the complementarity argument, but the provided text contains no quantitative baseline metrics (accuracy, F1, or statistical comparisons), tables, or error analysis, preventing evaluation of the reported gains.
  3. [Annotation Process] Expert annotation of Action Units and emotions for 1,963 samples requires reporting of inter-annotator agreement (e.g., Cohen's kappa or percentage agreement) to establish reliability; this is not addressed.
minor comments (2)
  1. The abstract would be strengthened by inclusion of at least one key quantitative result from the baseline (e.g., fusion accuracy delta).
  2. Clarify the exact number of subjects per stress level and any exclusion criteria applied to the 92 participants.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of clarity, reproducibility, and validation that we will address in the revision. Below we respond point-by-point to the major comments.

  1. Referee: [Dataset Collection and Stress Induction] The central claim that the multi-level stress induction reliably elicits spontaneous micro-expressions across the 1,963 samples is load-bearing for ecological validity, yet the manuscript provides no details on the specific task protocols, induction durations, or any corroborating measures (e.g., physiological signals or self-report validation) to support this.

    Authors: We agree that additional protocol details are necessary to substantiate the ecological validity of the stress induction. The current manuscript provides a high-level description of the cognitive and social stress framework but lacks granular information. In the revised version, we will expand the Dataset Collection section with specific task protocols, induction durations, and self-report validation measures where available. Physiological signals were not collected, as the experimental design prioritized decoupled cross-device Event-RGB capture without additional sensors; we will explicitly state this design choice and its implications as a limitation.
    revision: yes

  2. Referee: [Baseline Evaluation] The assertion that cross-modal fusion improves performance is central to the complementarity argument, but the provided text contains no quantitative baseline metrics (accuracy, F1, or statistical comparisons), tables, or error analysis, preventing evaluation of the reported gains.

    Authors: We acknowledge the referee's point that quantitative results are essential for evaluating the fusion gains. Although the abstract references the multimodal baseline, the main text does not present the metrics in sufficient detail. We will revise the Experiments section to include a dedicated table reporting accuracy, F1 scores, statistical comparisons (e.g., significance tests), and error analysis comparing single-modality and cross-modal fusion approaches.
    revision: yes

  3. Referee: [Annotation Process] Expert annotation of Action Units and emotions for 1,963 samples requires reporting of inter-annotator agreement (e.g., Cohen's kappa or percentage agreement) to establish reliability; this is not addressed.

    Authors: We agree that reporting inter-annotator agreement is required to demonstrate annotation reliability. The 1,963 samples were annotated by multiple experts following a standardized protocol. In the revision, we will add a subsection under Annotation Process that reports the agreement metrics, including Cohen's kappa for both Action Units and emotion labels.
    revision: yes
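
For reference, the agreement statistic requested in the third comment can be computed as below. This is a minimal sketch of Cohen's kappa for two annotators assigning one label per sample. Applying it per Action Unit as a present/absent label is an assumption about how multi-label AU coding would be scored, and the function name is hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same samples.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label frequencies.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both annotators used one identical label throughout.
        # Kappa is undefined here; returning 1.0 is a convention of this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

Kappa corrects raw percentage agreement for chance, which matters when one label (for example "AU absent") dominates the samples.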

Circularity Check

0 steps flagged

No significant circularity


The paper is a dataset introduction and baseline report with no mathematical derivations, fitted parameters, predictions, or self-citation chains. The central claims concern data collection under stress induction, a hardware-agnostic alignment pipeline, and empirical multimodal fusion results. These are presented as empirical contributions without any reduction to self-defined inputs or prior author work invoked as uniqueness theorems. The work is self-contained against external benchmarks as a new resource release.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The contribution rests on standard practices for data collection, expert annotation, and sensor alignment rather than new mathematical axioms, free parameters, or invented entities.

pith-pipeline@v0.9.1-grok · 5713 in / 1148 out tokens · 49357 ms · 2026-06-27T01:23:04.187965+00:00 · methodology



Supplementary Materials (excerpt)

A Dataset Release Protocol and Data Card

To facilitate commun...

Release contents listed in the excerpt:

  • raw RGB videos and raw event streams in their original formats
  • temporal annotations including onset, apex, and offset
  • AU annotations and AU laterality labels
  • derived objective emotion labels based on AU rules
  • stress-condition labels indicating the corresponding induction setting
  • alignment metadata including estimated cross-device temporal offset, face-crop information
  • official split files, evaluation scripts, and reference-baseline code

This release design follows the benchmark plan stated in the main paper and aims to support transparent and reproducible downstream comparisons. In practice, we recommend Tier-1 for large-scale RGB-only learning, Tier-2 for robust cross-device or weakly synchronized multimodal research...