AffectAI-Capture: A Reproducible Multimodal Protocol for Small-Group Meeting Research
Pith reviewed 2026-05-20 04:13 UTC · model grok-4.3
The pith
A reproducible protocol collects synchronized eye-tracking, physiology, audio, and video data from small-group meetings.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central contribution is a protocol architecture that links task design, instrumentation, timing provenance, and data packaging to support reproducible collection of affective, behavioral, and meeting-analytics data in controlled small-group settings.
What carries the argument
A single authoritative event timeline that coordinates data acquisition and post-processing and anchors synchronization across all modalities.
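One way to picture how a device stream could be placed on such a timeline is a linear clock map fitted from shared sync events. The sketch below is illustrative only and is not taken from the paper: the function names, the sync-event values, and the assumption that each device clock differs from the timeline by a constant offset plus linear drift are all hypothetical.

```python
from statistics import mean


def fit_clock_map(device_times, timeline_times):
    """Least-squares fit of: timeline = slope * device + intercept.

    Assumption (not from the paper): clock error is an offset plus linear drift.
    """
    if len(device_times) != len(timeline_times) or len(device_times) < 2:
        raise ValueError("need at least two paired sync events")
    mx, my = mean(device_times), mean(timeline_times)
    sxx = sum((x - mx) ** 2 for x in device_times)
    if sxx == 0:
        raise ValueError("sync events must have distinct device times")
    sxy = sum((x - mx) * (y - my) for x, y in zip(device_times, timeline_times))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def to_timeline(device_time, slope, intercept):
    """Convert one device-local timestamp to timeline seconds."""
    return slope * device_time + intercept


# Hypothetical sync events: the same four markers as seen by a device clock
# and by the authoritative timeline (seconds).
device = [0.0, 10.0, 20.0, 30.0]
timeline = [5.0, 15.001, 25.002, 35.003]
slope, intercept = fit_clock_map(device, timeline)
```

With more than two sync events, the residuals of the fit give a rough indication of how well the linear-drift assumption holds for a given device.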
If this is right
- Standardized outputs allow different research groups to produce comparable datasets.
- Fixed task blocks based on group interaction paradigms provide consistent contexts for data collection.
- Timing provenance from the central timeline supports accurate analysis of events and responses.
- Practical trade-offs in instrumentation help balance data quality with feasibility in lab settings.
Where Pith is reading between the lines
- Adapting this protocol for virtual meetings could extend its use beyond in-person setups.
- High-quality synchronized data might enable more accurate machine learning models for detecting group affect.
- Combining this with existing meeting analytics tools could reveal new patterns in collaboration.
Load-bearing premise
That using fixed task blocks and a single event timeline will yield synchronized, high-quality data ready for affective and behavioral analysis.
What would settle it
Running the full protocol with participants and finding that the collected data streams do not align temporally or suffer from quality issues that prevent reliable analysis.
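A temporal-alignment check of this kind could be sketched as a cross-correlation lag estimate between two streams that recorded the same sync pulse. This is a minimal illustration under stated assumptions, not the paper's validation procedure: both streams are assumed to be resampled to a common rate, and the function name and pulse data are hypothetical.

```python
def best_lag(reference, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `reference`.

    A positive result means `other` is delayed relative to `reference`.
    Uses raw (unnormalized) cross-correlation, which is adequate for
    pulse-like sync signals; real signals would usually be mean-centered first.
    """
    def score(lag):
        total = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(other):
                total += r * other[j]
        return total

    return max(range(-max_lag, max_lag + 1), key=score)


# Hypothetical sync pulse seen in two streams; the second is 3 samples late.
stream_a = [0, 0, 1, 0, 0, 0, 0, 0]
stream_b = [0, 0, 0, 0, 0, 1, 0, 0]
lag = best_lag(stream_a, stream_b, max_lag=4)
```

Dividing the lag by the common sampling rate gives an offset in seconds that can be compared against whatever tolerance a given analysis requires.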
Original abstract
We present AffectAI-Capture, a protocol for collecting synchronized multimodal data in four-person meeting-like interactions, combining eye tracking, wearable physiology, close-talk and room audio, multi-view video, event logging, and structured self-report. Sessions use fixed task blocks grounded in established group-interaction paradigms, while acquisition and post-processing are organized around a single authoritative event timeline and standardized outputs. We describe the experimental rationale, synchronization philosophy, data organization, and practical trade-offs. Pilot-level validation of audio quality and video synchronization has been conducted using controlled bench tests; full protocol sessions with participants remain ongoing work. The contribution is a reproducible protocol architecture linking task design, instrumentation, timing provenance, and data packaging for affective, behavioral, and meeting-analytics research.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents AffectAI-Capture, a protocol for collecting synchronized multimodal data during four-person meeting-like interactions. It combines eye tracking, wearable physiology sensors, close-talk and room audio, multi-view video, event logging, and structured self-reports. Sessions rely on fixed task blocks drawn from established group-interaction paradigms, with acquisition and post-processing organized around a single authoritative event timeline and standardized output formats. Pilot bench tests for audio quality and video synchronization are reported; full participant sessions are described as ongoing work. The stated contribution is the reproducible protocol architecture linking task design, instrumentation, timing provenance, and data packaging.
Significance. If the protocol is adopted and extended, it could help standardize multimodal data collection for affective computing, behavioral analysis, and meeting research, addressing common issues of synchronization and comparability across studies. The explicit focus on a single authoritative timeline and standardized outputs is a constructive step toward enabling reproducible downstream analyses, and the detailed description of trade-offs adds practical value for researchers planning similar setups.
minor comments (3)
- [Pilot Validation] The pilot bench-test description would benefit from explicit quantitative metrics (e.g., measured audio SNR values or video frame-offset statistics) even if only summary statistics are provided; this would make the supporting evidence for synchronization claims more concrete without altering the protocol focus.
- [Data Organization and Packaging] Clarify how the standardized output formats map to common analysis libraries or file standards (e.g., specific CSV schemas, video container choices, or metadata conventions) to strengthen the reproducibility claim.
- [Experimental Rationale and Task Blocks] The rationale for selecting the particular established group-interaction paradigms could be expanded with one or two concrete references or brief task descriptions to help readers assess generalizability.
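The summary statistics requested in the first comment could be computed along the following lines. This is a hedged sketch rather than the authors' bench-test code: it assumes SNR is taken as the ratio of mean power in a speech segment to mean power in a noise-only segment, and the function names and sample values are hypothetical.

```python
import math
from statistics import mean, median, pstdev


def snr_db(signal_samples, noise_samples):
    """SNR in dB from mean power of a signal segment and a noise-only segment."""
    p_signal = mean([s * s for s in signal_samples])
    p_noise = mean([n * n for n in noise_samples])
    if p_noise == 0:
        raise ValueError("noise segment has zero power")
    return 10.0 * math.log10(p_signal / p_noise)


def offset_summary(offsets_ms):
    """Summary statistics for per-event video frame offsets in milliseconds."""
    return {
        "n": len(offsets_ms),
        "mean_ms": mean(offsets_ms),
        "median_ms": median(offsets_ms),
        "sd_ms": pstdev(offsets_ms),
        "max_abs_ms": max(abs(o) for o in offsets_ms),
    }


# Hypothetical values for illustration only; these are not measurements.
example_snr = snr_db([1.0, -1.0, 1.0, -1.0], [0.1, -0.1, 0.1, -0.1])
example_offsets = offset_summary([-2.0, 0.0, 2.0, 4.0])
```

Reporting the maximum absolute offset alongside the mean and standard deviation helps distinguish a constant bias from occasional large misalignments.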
Simulated Author's Rebuttal
We thank the referee for the positive and constructive summary of our manuscript on the AffectAI-Capture protocol. The recommendation for minor revision is noted, and we value the recognition of the protocol's focus on synchronization, reproducibility, and standardized outputs for multimodal meeting research. No specific major comments were listed under the MAJOR COMMENTS section of the report. We will therefore incorporate minor clarifications in the revised version, particularly around the status of ongoing participant sessions and practical implementation details.
Circularity Check
No significant circularity detected in protocol description
The manuscript is a descriptive protocol paper whose central claim is the specification of a reproducible architecture linking task design, instrumentation, timing provenance, and data packaging. No mathematical derivations, predictions, fitted parameters, or equations appear in the provided text or abstract. The contribution rests on the protocol specification itself rather than any reduction to self-referential definitions, self-citations, or renamed empirical patterns. Pilot bench tests for audio and video are presented as preliminary validation steps, not as inputs that force downstream claims. This is a self-contained methodological contribution with no load-bearing circular steps.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Synchronized multimodal data from eye tracking, physiology, audio, and video improves research on affective and behavioral processes in group interactions.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "Sessions use fixed task blocks grounded in established group-interaction paradigms, while acquisition and post-processing are organized around a single authoritative event timeline and standardized outputs."
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean, LogicNat induction and embed_strictMono (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "Synchronization is treated as an explicit protocol concern... Authoritative event spine... Redundant timing anchors."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.