
arxiv: 2605.19794 · v1 · pith:OOXU6NCJ · new · submitted 2026-05-19 · 💻 cs.HC · cs.AI · cs.DB

AffectAI-Capture: A Reproducible Multimodal Protocol for Small-Group Meeting Research

Pith reviewed 2026-05-20 04:13 UTC · model grok-4.3

classification 💻 cs.HC · cs.AI · cs.DB
keywords multimodal data · small-group meetings · affective computing · reproducible protocol · eye tracking · wearable sensors · video synchronization · meeting research

The pith

A reproducible protocol collects synchronized eye-tracking, physiology, audio, and video data from small-group meetings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces AffectAI-Capture, a protocol for capturing multimodal data during four-person, meeting-like interactions. It combines eye tracking, wearable physiological sensors, close-talk and room audio, multi-view video, event logs, and participant self-reports. Sessions follow fixed task blocks, and all streams are organized around a single event timeline so that they stay synchronized. The goal is data that researchers can use reliably to study emotion, behavior, and group dynamics in meetings. If the protocol works as designed, such studies would become easier to repeat and to compare across research teams.

Core claim

The central contribution is a protocol architecture that links task design, instrumentation, timing provenance, and data packaging to support reproducible collection of affective, behavioral, and meeting data in controlled small-group settings.

What carries the argument

A single authoritative event timeline that coordinates data acquisition and post-processing and keeps all modalities synchronized.
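The summary does not say how device clocks are mapped onto this timeline. A common approach, assumed here rather than taken from the paper, is to record shared sync events (for example, a light flash or trigger pulse) on every device and fit a per-device linear clock model, master time = drift × device time + offset. The function names and numbers below are hypothetical; this is a minimal sketch of that idea.

```python
import numpy as np


def fit_clock_model(device_ts, master_ts):
    """Least-squares fit of master = drift * device + offset.

    device_ts: times of shared sync events on one device's clock (s)
    master_ts: times of the same events on the master timeline (s)
    """
    drift, offset = np.polyfit(np.asarray(device_ts, dtype=float),
                               np.asarray(master_ts, dtype=float), deg=1)
    return drift, offset


def to_master(device_times, drift, offset):
    """Map any device timestamps onto the master timeline."""
    return drift * np.asarray(device_times, dtype=float) + offset


# Hypothetical eye-tracker clock: started 2.5 s after the master clock
# and runs 50 ppm fast.
master_sync = np.array([0.0, 60.0, 120.0, 180.0, 240.0])
tracker_sync = (master_sync - 2.5) * (1 + 50e-6)

drift, offset = fit_clock_model(tracker_sync, master_sync)
residuals = to_master(tracker_sync, drift, offset) - master_sync
print(f"drift={drift:.6f}, offset={offset:.3f} s, "
      f"max residual={np.abs(residuals).max():.2e} s")
```

The fitted drift and offset, together with the residuals at the sync events, are the kind of timing provenance one could store alongside each stream so that later analyses can audit the alignment.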

If this is right

  • Standardized outputs allow different research groups to produce comparable datasets.
  • Fixed task blocks based on group interaction paradigms provide consistent contexts for data collection.
  • Timing provenance from the central timeline supports accurate analysis of events and responses.
  • Practical trade-offs in instrumentation help balance data quality with feasibility in lab settings.
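To illustrate the third point: once every stream shares the master timeline, responses to logged events can be cut out as fixed windows around each event onset. The sketch below assumes sorted timestamps already mapped to the master clock; the function name, the 4 Hz electrodermal-activity stream, and the event times are hypothetical.

```python
import numpy as np


def extract_epochs(times, values, event_times, pre=1.0, post=3.0):
    """Cut [event - pre, event + post) windows out of one stream.

    times: sample timestamps already on the master timeline (s), ascending
    values: samples aligned with `times`
    Returns a list of (relative_times, samples) pairs, one per event.
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values)
    epochs = []
    for t in event_times:
        lo = np.searchsorted(times, t - pre, side="left")
        hi = np.searchsorted(times, t + post, side="left")
        epochs.append((times[lo:hi] - t, values[lo:hi]))
    return epochs


# Hypothetical 4 Hz electrodermal-activity stream over 10 s and two
# task-block onsets taken from the event log.
eda_times = np.arange(0.0, 10.0, 0.25)
eda_values = np.sin(eda_times)
block_onsets = [2.0, 6.0]

epochs = extract_epochs(eda_times, eda_values, block_onsets, pre=1.0, post=2.0)
for (rel_t, samples), onset in zip(epochs, block_onsets):
    print(f"event at {onset:.1f} s: {len(samples)} samples from {rel_t[0]:+.2f} s")
```

Because the windows are defined on the shared timeline, the same event times can be reused unchanged for gaze, audio, or video features.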

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Adapting this protocol for virtual meetings could extend its use beyond in-person setups.
  • High-quality synchronized data might enable more accurate machine learning models for detecting group affect.
  • Combining this with existing meeting analytics tools could reveal new patterns in collaboration.

Load-bearing premise

That using fixed task blocks and a single event timeline will yield synchronized, high-quality data ready for affective and behavioral analysis.

What would settle it

Running the full protocol with participants and finding that the collected data streams do not align temporally or suffer from quality issues that prevent reliable analysis.
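One way to run that test on bench recordings, not necessarily the authors' procedure, is to estimate the residual lag between two streams that capture the same sound, such as a close-talk and a room microphone, from the peak of their cross-correlation. The sketch assumes both signals share one sampling rate; the function name, the signals, and the 37-sample delay are synthetic.

```python
import numpy as np


def estimate_lag(reference, other, fs):
    """Delay (s) of `other` relative to `reference`, from the cross-correlation peak.

    Both signals must share the sampling rate `fs`; a positive value means
    `other` lags behind `reference`.
    """
    reference = np.asarray(reference, dtype=float)
    other = np.asarray(other, dtype=float)
    xcorr = np.correlate(other, reference, mode="full")
    # Index i of the "full" output corresponds to a shift of i - (len(reference) - 1).
    lag_samples = int(np.argmax(xcorr)) - (len(reference) - 1)
    return lag_samples / fs


# Hypothetical bench test: a broadband signal captured by a close-talk
# microphone and, 37 samples later, by a room microphone.
fs = 16000
rng = np.random.default_rng(0)
close_talk = rng.standard_normal(4000)
room = np.concatenate([np.zeros(37), close_talk[:-37]])

lag = estimate_lag(close_talk, room, fs)
print(f"estimated lag: {lag * 1000:.2f} ms")
```

In a real room the acoustic path adds roughly 2.9 ms per metre of path-length difference between the microphones, so the expected lag is not zero. For long recordings, `scipy.signal.correlate` with `method="fft"` computes the same correlation much faster than `np.correlate`.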

Figures

Figures reproduced from arXiv: 2605.19794 by Alice Modica, Andrew Burke Dittberner, Anna Obara, Fabricio Batista Narcizo, Jesper Bünsow Boldt, Meisam Jamshidi Seikavandi, Paolo Burelli, Tanya Ignatenko, Ted Vucurevich.

Figure 1. Audio validation using headset microphones on Head and Torso Simulators (HATS).

Abstract

We present AffectAI-Capture, a protocol for collecting synchronized multimodal data in four-person meeting-like interactions, combining eye tracking, wearable physiology, close-talk and room audio, multi-view video, event logging, and structured self-report. Sessions use fixed task blocks grounded in established group-interaction paradigms, while acquisition and post-processing are organized around a single authoritative event timeline and standardized outputs. We describe the experimental rationale, synchronization philosophy, data organization, and practical trade-offs. Pilot-level validation of audio quality and video synchronization has been conducted using controlled bench tests; full protocol sessions with participants remain ongoing work. The contribution is a reproducible protocol architecture linking task design, instrumentation, timing provenance, and data packaging for affective, behavioral, and meeting-analytics research.
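The abstract mentions standardized outputs but, in this summary, not their format. Since the reference list includes BIDS and EEG-BIDS, one plausible shape is a BIDS-inspired session folder: a tab-separated events file with `onset`, `duration`, and `trial_type` columns (the BIDS events convention) plus a JSON sidecar recording how each stream was aligned. The folder layout, the `write_session` helper, the `_timing.json` file name, and its fields are hypothetical, a sketch rather than the authors' format.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_session(root, session_id, events, provenance):
    """Write a hypothetical BIDS-inspired session folder and return the two file paths.

    events: iterable of (onset_s, duration_s, label) on the master timeline
    provenance: dict describing how each stream was aligned to that timeline
    """
    sess_dir = Path(root) / f"ses-{session_id}"
    sess_dir.mkdir(parents=True, exist_ok=True)

    # BIDS-style events table: tab-separated, onset and duration in seconds.
    events_path = sess_dir / f"ses-{session_id}_events.tsv"
    with events_path.open("w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(["onset", "duration", "trial_type"])
        for onset, duration, label in events:
            writer.writerow([f"{onset:.3f}", f"{duration:.3f}", label])

    # Hypothetical sidecar holding per-stream timing provenance.
    timing_path = sess_dir / f"ses-{session_id}_timing.json"
    timing_path.write_text(json.dumps(provenance, indent=2))
    return events_path, timing_path


# Usage with made-up task blocks and clock parameters.
with tempfile.TemporaryDirectory() as tmp:
    events_path, timing_path = write_session(
        tmp,
        "01",
        events=[(0.0, 600.0, "task_block_1"), (600.0, 480.0, "task_block_2")],
        provenance={"eyetracker_p1": {"drift": 0.99995, "offset_s": 2.5,
                                      "n_sync_events": 5}},
    )
    rows = events_path.read_text().splitlines()
    timing = json.loads(timing_path.read_text())

print(rows)
print(timing)
```

Keeping timing provenance in a machine-readable sidecar, rather than only in a methods section, is one way to make the alignment reproducible by other groups.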

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript presents AffectAI-Capture, a protocol for collecting synchronized multimodal data during four-person meeting-like interactions. It combines eye tracking, wearable physiology sensors, close-talk and room audio, multi-view video, event logging, and structured self-reports. Sessions rely on fixed task blocks drawn from established group-interaction paradigms, with acquisition and post-processing organized around a single authoritative event timeline and standardized output formats. Pilot bench tests for audio quality and video synchronization are reported; full participant sessions are described as ongoing work. The stated contribution is the reproducible protocol architecture linking task design, instrumentation, timing provenance, and data packaging.

Significance. If adopted and extended, the protocol could help standardize multimodal data collection for affective computing, behavioral analysis, and meeting research, addressing the recurring problems of synchronization and cross-study comparability. The explicit focus on a single authoritative timeline and standardized outputs is a constructive step toward reproducible downstream analyses, and the detailed discussion of trade-offs is useful to researchers planning similar setups.

minor comments (3)
  1. [Pilot Validation] The pilot bench-test description would benefit from explicit quantitative metrics (e.g., measured audio SNR values or video frame-offset statistics) even if only summary statistics are provided; this would make the supporting evidence for synchronization claims more concrete without altering the protocol focus.
  2. [Data Organization and Packaging] Clarify how the standardized output formats map to common analysis libraries or file standards (e.g., specific CSV schemas, video container choices, or metadata conventions) to strengthen the reproducibility claim.
  3. [Experimental Rationale and Task Blocks] The rationale for selecting the particular established group-interaction paradigms could be expanded with one or two concrete references or brief task descriptions to help readers assess generalizability.
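As an illustration of the summary statistics requested in comment 1, the sketch below estimates SNR from a speech segment and a noise-only segment, and summarizes per-event video frame offsets. The formulas are standard, but the function names, the synthetic signals, and the offset values are hypothetical and are not results from the paper.

```python
import numpy as np


def snr_db(speech_segment, noise_segment):
    """Estimate SNR (dB) from a speech segment and a noise-only segment.

    The speech segment contains speech plus noise, so speech power is
    estimated as P(speech segment) - P(noise segment), assuming the two
    are uncorrelated.
    """
    p_mix = np.mean(np.square(np.asarray(speech_segment, dtype=float)))
    p_noise = np.mean(np.square(np.asarray(noise_segment, dtype=float)))
    if p_mix <= p_noise:
        raise ValueError("speech segment is not louder than the noise floor")
    return 10.0 * np.log10((p_mix - p_noise) / p_noise)


def offset_summary(offsets_ms):
    """Summary statistics for per-event frame offsets (ms) across video views."""
    o = np.asarray(offsets_ms, dtype=float)
    return {
        "n": int(o.size),
        "mean_ms": float(o.mean()),
        "sd_ms": float(o.std(ddof=1)),
        "max_abs_ms": float(np.abs(o).max()),
    }


# Hypothetical bench data: a 200 Hz tone standing in for speech, plus noise.
fs = 16000
rng = np.random.default_rng(1)
t = np.arange(fs) / fs
noise_only = 0.01 * rng.standard_normal(fs)
speech = np.sin(2 * np.pi * 200 * t) + 0.01 * rng.standard_normal(fs)
snr = snr_db(speech, noise_only)
print(f"SNR: {snr:.1f} dB")

# Hypothetical offsets (ms) between a logged sync flash and the frame where it appears.
stats = offset_summary([0.0, 33.3, -33.3, 0.0, 16.7])
print(stats)
```

For example, at 30 frames per second one frame lasts about 33.3 ms, so reporting offsets in both milliseconds and frames makes the result easy to interpret.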

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and constructive summary of our manuscript on the AffectAI-Capture protocol. We note the recommendation for minor revision and appreciate the recognition of the protocol's focus on synchronization, reproducibility, and standardized outputs for multimodal meeting research. The report lists no major comments, so we will incorporate minor clarifications in the revised version, particularly regarding the status of ongoing participant sessions and practical implementation details.

Circularity Check

0 steps flagged

No significant circularity detected in protocol description

full rationale

The manuscript is a descriptive protocol paper whose central claim is the specification of a reproducible architecture linking task design, instrumentation, timing provenance, and data packaging. No mathematical derivations, predictions, fitted parameters, or equations appear in the provided text or abstract. The contribution rests on the protocol specification itself rather than any reduction to self-referential definitions, self-citations, or renamed empirical patterns. Pilot bench tests for audio and video are presented as preliminary validation steps, not as inputs that force downstream claims. This is a self-contained methodological contribution with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The protocol rests on the domain assumption that synchronized multimodal streams add value for affective and behavioral analysis in meetings; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption: Synchronized multimodal data from eye tracking, physiology, audio, and video improves research on affective and behavioral processes in group interactions.
    Implicit foundation for the protocol design and data packaging choices.

pith-pipeline@v0.9.0 · 5704 in / 1284 out tokens · 42947 ms · 2026-05-20T04:13:42.405775+00:00 · methodology



Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages

  1. Anguera, X., Bozonnet, S., Evans, N., Fredouille, C., Friedland, G., Vinyals, O.: Speaker diarization: A review of recent research. IEEE Transactions on Audio, Speech, and Language Processing 20(2), 356–370 (2012). https://doi.org/10.1109/TASL.2011.2125954

  2. Baker, C.: Regulators and turn-taking in American Sign Language discourse. In: Friedman, L.A. (ed.) On the Other Hand: New Perspectives on American Sign Language, pp. 215–236. Academic Press, New York (1977)

  3. Bradley, M.M., Lang, P.J.: Measuring emotion: The self-assessment manikin and the semantic differential. Journal of Behavior Therapy and Experimental Psychiatry 25(1), 49–59 (1994). https://doi.org/10.1016/0005-7916(94)90063-9

  4. Camgöz, N.C., Hadfield, S., Koller, O., Ney, H., Bowden, R.: Neural sign language translation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7784–7793 (2018)

  5. Carletta, J., Ashby, S., Bourban, S., Flynn, M., Guillemot, M., Hain, T., Kadlec, J., Karaiskos, V., Kraaij, W., Kronenthal, M., Lathoud, G., Lincoln, M., Lisowska, A., McCowan, I., Post, W., Reidsma, D., Wellner, P.: The AMI meeting corpus: A pre-announcement. In: Renals, S., Bengio, S. (eds.) Machine Learning for Multimodal Interaction. Lecture Notes i...

  6. Chaudhuri, A.: Sustaining cooperation in laboratory public goods experiments: A selective survey of the literature. Experimental Economics 14(1), 47–83 (2011). https://doi.org/10.1007/s10683-010-9257-1

  7. Fehr, E., Gächter, S.: Cooperation and punishment in public goods experiments. American Economic Review 90(4), 980–994 (2000). https://doi.org/10.1257/aer.90.4.980

  8. Gorgolewski, K.J., Auer, T., Calhoun, V.D., Craddock, R.C., Das, S., Duff, E.P., Flandin, G., Ghosh, S., Glatard, T., Halchenko, Y.O., Handwerker, D.A., Hanke, M., Keator, D., Li, X., Michael, Z., Maumet, C., Nichols, B.N., Nichols, T.E., Pellman, J., Poline, J.B., Rokem, A., Schaefer, G., Sochat, V., Triplett, W., Turner, J.A., Varoquaux, G., Poldrack, R...

  9. Janin, A., Baron, D., Edwards, J., Ellis, D., Gelbart, D., Morgan, N., Peskin, B., Pfau, T., Shriberg, E., Stolcke, A., Wooters, C.: The ICSI meeting corpus. In: 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’03), vol. 1, pp. I–364–I–367. IEEE (2003). https://doi.org/10.1109/ICASSP.2003.1198793, available at https:...

  10. Kothe, C., Shirazi, S.Y., Stenner, T., Medine, D., Boulay, C., Grivich, M.I., Artoni, F., Mullen, T., Delorme, A., Makeig, S.: The lab streaming layer for synchronized multimodal recording. Imaging Neuroscience 3, IMAG.a.136 (2025). https://doi.org/10.1162/IMAG.a.136

  11. Lu, L., Yuan, Y.C., McLeod, P.L.: Twenty-five years of hidden profiles in group decision making: A meta-analysis. Personality and Social Psychology Review 16(1), 54–75 (2012). https://doi.org/10.1177/1088868311417243

  12. Mathôt, S.: Pupillometry: Psychology, physiology, and function. Journal of Cognition 1(1), 16 (2018). https://doi.org/10.5334/joc.18

  13. Miranda-Correa, J.A., Abadi, M.K., Sebe, N., Patras, I.: AMIGOS: A dataset for affect, personality and mood research on individuals and groups. IEEE Transactions on Affective Computing 12(2), 479–493 (2021). https://doi.org/10.1109/TAFFC.2018.2884461

  14. Park, C.Y., Cha, N., Kang, S., Kim, A., Khandoker, A.H., Hadjileontiadis, L., Oh, A., Jeong, Y., Lee, U.: K-emocon, a multimodal sensor dataset for continuous emotion recognition in naturalistic conversations. Scientific Data 7(1), 293 (2020)

  15. Pernet, C.R., Appelhoff, S., Gorgolewski, K.J., Flandin, G., Phillips, C., Delorme, A., Oostenveld, R.: EEG-BIDS, an extension to the brain imaging data structure for electroencephalography. Scientific Data 6, 103 (2019). https://doi.org/10.1038/s41597-019-0104-8

  16. Posada-Quintero, H.F., Chon, K.H.: Innovations in electrodermal activity data collection and signal processing: A systematic review. Sensors 20(2), 479 (2020). https://doi.org/10.3390/s20020479

  17. Rietzschel, E.F., Nijstad, B.A., Stroebe, W.: The selection of creative ideas after individual idea generation: Choosing between creativity and impact. British Journal of Psychology 101(1), 47–68 (2010)

  18. Sandler, W., Lillo-Martin, D.: Sign Language and Linguistic Universals. Cambridge University Press (2006)

  19. Seikavandi, M.J., Barrett, M., Burelli, P.: Modeling face emotion perception from naturalistic face viewing: Insights from fixational events and gaze strategies. In: Recent Advances in Deep Learning Applications: New Techniques and Practical Examples. Taylor & Francis (2024)

  20. Seikavandi, M.J., Barrett, M.J.: Gaze reveals emotion perception: Insights from modelling naturalistic face viewing. In: 2023 International Conference on Machine Learning and Applications (ICMLA), pp. 2022–2025. IEEE (2023)

  21. Seikavandi, M.J., Fimland, J., Barrett, M., Burelli, P.: Modelling emotions in face-to-face setting: The interplay of eye-tracking, personality, and temporal dynamics. arXiv preprint arXiv:2503.16532 (2025)

  22. Seikavandi, M.J., Narcizo, F.B., Vucurevich, T., Dittberner, A.B., Burelli, P.: MuMTAffect: A multimodal multitask affective framework for personality and emotion recognition from physiological signals. In: Proceedings of the 3rd International Workshop on Multimodal and Responsible Affective Computing, pp. 100–108 (2025)

  23. Seikavandi, M.J., Dixen, L., Fimland, J., Desu, S.K., Zserai, A.B., Lee, Y.S., Barrett, M., Burelli, P.: Advancing face-to-face emotion communication: A multimodal dataset (AFFEC). arXiv preprint arXiv:2504.18969 (2025)

  24. Stasser, G., Titus, W.: Pooling of unshared information in group decision making: Biased information sampling during discussion. Journal of Personality and Social Psychology 48(6), 1467–1478 (1985). https://doi.org/10.1037/0022-3514.48.6.1467

  25. Van Kleef, G.A., De Dreu, C.K.W., Manstead, A.S.R.: The interpersonal effects of emotions in negotiations: A motivated information processing approach. Journal of Personality and Social Psychology 87(4), 510–528 (2004). https://doi.org/10.1037/0022-3514.87.4.510

  26. Wilkinson, M.D., Dumontier, M., Aalbersberg, I.J.J., et al.: The FAIR guiding principles for scientific data management and stewardship. Scientific Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18