BEACON: A Multimodal Dataset for Learning Behavioral Fingerprints from Gameplay Data
Pith reviewed 2026-05-12 03:37 UTC · model grok-4.3
The pith
The BEACON dataset supplies over 100 hours of synchronized multimodal data from Valorant gameplay to support research on behavioral biometrics for continuous authentication.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BEACON is a multimodal dataset comprising approximately 430 GB of synchronized data from 79 sessions across 28 distinct players, totaling an estimated 102.51 hours of active Valorant gameplay. It includes high-frequency mouse dynamics, keystroke events, network packet captures, screen recordings, hardware metadata, and in-game configuration context. The dataset is designed as a rigorous stress test for behavioral biometrics, because tactical shooters demand high-precision motor skills under high cognitive load. It is intended to support studies of continuous authentication, behavioral profiling, user drift, and multimodal representation learning.
What carries the argument
The synchronized collection of high-frequency behavioral signals from competitive esports gameplay, which provides the multimodal context for learning user-specific fingerprints.
If this is right
- The dataset enables the study of continuous authentication using gameplay behavior under realistic conditions.
- It facilitates research on how behavioral patterns may drift over time in high-stakes settings.
- Researchers can explore multimodal fusion techniques for more robust user identification.
- It creates a benchmark for evaluating security models in an esports environment with diverse player skill levels.
Where Pith is reading between the lines
- This approach might extend to other domains requiring fine motor control and decision-making, such as professional software development or vehicle operation.
- If models from this data generalize well, they could be adapted for real-time monitoring in online gaming platforms to detect account sharing or unauthorized access.
- The emphasis on synchronized modalities could encourage similar collection efforts in non-gaming high-cognitive-load tasks to test biometric robustness.
Load-bearing premise
That the specific demands of Valorant gameplay provide a sufficiently demanding and representative test for the effectiveness of behavioral biometrics derived from these modalities.
What would settle it
If authentication models built using BEACON data perform no better than those from smaller or unimodal datasets when tested on new sessions or different players, the value of this large-scale multimodal collection would be called into question.
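The test described above depends on evaluating models on sessions they have not seen. The following is a minimal sketch of a session-held-out split, assuming each recording is tagged with hypothetical `player_id` and `session_id` fields; the dataset's actual schema is not described on this page.

```python
from collections import defaultdict

def split_by_session(records, holdout_per_player=1):
    """Hold out the last `holdout_per_player` sessions of each player.

    `records` is an iterable of (player_id, session_id) pairs. Both field
    names are hypothetical placeholders for whatever the dataset provides.
    Players with too few sessions to spare contribute to training only.
    """
    sessions = defaultdict(list)
    for player_id, session_id in records:
        if session_id not in sessions[player_id]:
            sessions[player_id].append(session_id)

    train, test = [], []
    for player_id, ids in sessions.items():
        ids = sorted(ids)
        if len(ids) <= holdout_per_player:
            # Not enough sessions to hold one out for this player.
            train.extend((player_id, s) for s in ids)
            continue
        cut = len(ids) - holdout_per_player
        train.extend((player_id, s) for s in ids[:cut])
        test.extend((player_id, s) for s in ids[cut:])
    return train, test
```

A player-held-out variant would instead assign whole players to the test side, which probes generalization to unseen users rather than to new sessions of known users.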
Original abstract
Continuous authentication in high-stakes digital environments requires datasets with fine-grained behavioral signals under realistic cognitive and motor demands. But current benchmarks are often limited by small scale, unimodal sensing or lack of synchronised environmental context. To address this gap, this paper introduces BEACON (Behavioral Engine for Authentication & Continuous Monitoring), a large-scale multimodal dataset that captures diverse skill tiers in competitive Valorant gameplay. BEACON contains approximately 430 GB of synchronised modality data (461 GB total on-disk including auxiliary Valorant configuration captures) from 79 sessions across 28 distinct players, estimated at 102.51 hours of active gameplay, including high-frequency mouse dynamics, keystroke events, network packet captures, screen recordings, hardware metadata, and in-game configuration context. BEACON leverages the high precision motor skills and high cognitive load that are inherent to tactical shooters, making it a rigorous stress test for the robustness of behavioral biometrics. The dataset allows for the study of continuous authentication, behavioral profiling, user drift and multimodal representation learning in a high-fidelity esports setting. The authors release the dataset and code on Hugging Face and GitHub to create a reproducible benchmark for evaluating next-generation behavioral fingerprinting and security models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces BEACON, a multimodal dataset collected from 79 sessions of competitive Valorant gameplay involving 28 distinct players. It reports approximately 430 GB of synchronized data (461 GB on-disk) spanning 102.51 hours, encompassing high-frequency mouse dynamics, keystroke events, network packet captures, screen recordings, hardware metadata, and in-game configuration context. The dataset is positioned as a rigorous stress test for behavioral biometrics due to the high motor precision and cognitive load of tactical shooters, enabling research on continuous authentication, behavioral profiling, user drift, and multimodal representation learning. The authors release the data and associated code on Hugging Face and GitHub.
Significance. If the synchronization accuracy, data quality, and collection protocols are rigorously validated and documented, BEACON could serve as a valuable large-scale benchmark for behavioral fingerprinting and security research. It addresses limitations in existing datasets by providing synchronized multimodal streams under realistic high-stakes conditions, potentially advancing continuous authentication systems and multimodal learning models. The public release supports reproducibility and community benchmarking.
major comments (1)
- Abstract and Data Collection section: The central claim that BEACON supplies 'synchronised modality data' suitable for multimodal representation learning and continuous authentication is load-bearing, yet the manuscript provides no quantitative validation. There are no reported details on the synchronization protocol, measured inter-modality latency or jitter, maximum temporal offsets (e.g., between mouse events and video frames), or validation metrics such as cross-modal alignment error bounds. Without these, the assumption that the streams are sufficiently aligned for downstream analyses cannot be assessed.
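The validation metrics requested in the comment above could be computed along the following lines. This is a sketch only: it assumes the same physical events (hypothetical sync markers, for example a click that is visible both to the input hook and in a video frame) have been timestamped in two streams, which the manuscript does not say was done.

```python
from statistics import mean, pstdev

def alignment_stats(ts_a, ts_b):
    """Summarize offsets between paired sync-marker timestamps (seconds).

    ts_a and ts_b are equal-length lists holding the times at which the
    same events were observed in two streams. The pairing itself is an
    assumption of this sketch, not something the dataset is known to ship.
    """
    if len(ts_a) != len(ts_b) or not ts_a:
        raise ValueError("need two non-empty lists of equal length")
    offsets = [b - a for a, b in zip(ts_a, ts_b)]
    return {
        "mean_offset": mean(offsets),      # systematic latency of b vs. a
        "jitter": pstdev(offsets),         # spread around that latency
        "max_abs_offset": max(abs(o) for o in offsets),
    }
```

Reporting these three numbers per modality pair would give readers the latency, jitter, and maximum-offset figures the referee asks for.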
minor comments (2)
- Abstract: The active gameplay duration is stated as 'estimated at 102.51 hours'; provide the exact method of estimation and any exclusion criteria for inactive periods in the methods or dataset description section.
- Abstract: Clarify whether the 28 players are distinct across all 79 sessions or if some players contributed multiple sessions, and report any per-player session distribution to support claims of diversity.
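Both minor comments ask for quantities that are simple to compute once session metadata and input timestamps are available. The sketch below shows one plausible approach. The mapping from session to player and the idle threshold are assumptions made for illustration, and the idle-gap rule is not the authors' stated estimation method.

```python
from collections import Counter

def sessions_per_player(session_owner):
    """Count sessions per player, given a {session_id: player_id} mapping."""
    return Counter(session_owner.values())

def active_seconds(event_times, idle_gap=30.0):
    """Estimate active time from input-event timestamps (seconds).

    Sums the gaps between consecutive events and skips any gap longer
    than `idle_gap`, treating it as an inactive period. The 30-second
    default is an assumed threshold, not a value reported in the paper.
    """
    times = sorted(event_times)
    return sum(b - a for a, b in zip(times, times[1:]) if b - a <= idle_gap)
```

For example, events at 0, 10, 20, 100, and 105 seconds yield 25 active seconds, because the 80-second gap exceeds the threshold and is excluded.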
Simulated Author's Rebuttal
We thank the referee for their constructive review and for identifying the need for greater transparency on synchronization. We address the major comment below and will revise the manuscript to incorporate additional details.
- Referee: Abstract and Data Collection section: The central claim that BEACON supplies 'synchronised modality data' suitable for multimodal representation learning and continuous authentication is load-bearing, yet the manuscript provides no quantitative validation. There are no reported details on the synchronization protocol, measured inter-modality latency or jitter, maximum temporal offsets (e.g., between mouse events and video frames), or validation metrics such as cross-modal alignment error bounds. Without these, the assumption that the streams are sufficiently aligned for downstream analyses cannot be assessed.
Authors: We acknowledge that the current manuscript does not include quantitative validation of inter-modality alignment or a detailed description of the synchronization protocol. Data streams were captured with native high-resolution timestamps (mouse/keyboard events via low-level input hooks, network packets via libpcap, video frames via recording metadata) and aligned post hoc to a shared system clock started at the beginning of each session. However, explicit measurements of latency, jitter, or alignment error bounds are absent. In the revised manuscript we will add a dedicated subsection to the Data Collection section that (i) fully specifies the synchronization protocol and hardware/software configuration, (ii) reports any empirical offset measurements obtainable from the collection logs, and (iii) provides conservative error bounds derived from device sampling rates and known capture latencies. This addition will allow readers to evaluate the dataset's suitability for multimodal and continuous-authentication tasks.
Revision: yes
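The post hoc alignment and the conservative bound mentioned in the response can be illustrated as follows. This is a sketch under a stated model, not the authors' procedure: it assumes each stream's timestamp lags the true event by less than one sampling period plus a fixed, known capture latency. The function names and the example rates are hypothetical.

```python
def to_session_clock(timestamps, session_start):
    """Shift absolute timestamps (seconds) so the session starts at 0."""
    return [t - session_start for t in timestamps]

def worst_case_offset(rate_a_hz, rate_b_hz, latency_a=0.0, latency_b=0.0):
    """Upper bound on the timestamp disagreement between two streams.

    Assumed model: stream X records a true event time t as
    t + d_x + latency_x, with quantization delay 0 <= d_x < 1 / rate_x.
    The two delays differ by less than the longer sampling period, and
    the fixed latencies add their absolute difference.
    """
    if rate_a_hz <= 0 or rate_b_hz <= 0:
        raise ValueError("sampling rates must be positive")
    longest_period = max(1.0 / rate_a_hz, 1.0 / rate_b_hz)
    return longest_period + abs(latency_a - latency_b)
```

With an assumed 1000 Hz mouse stream and 60 fps video and no latency difference, the bound is one video frame period, roughly 16.7 ms. Measured offsets from sync markers would tighten or contradict this bound, which is why the referee asks for empirical numbers as well.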
Circularity Check
Dataset release with no derivations, predictions or fitted parameters
The manuscript is a dataset paper whose sole contribution is the description and release of BEACON (approximately 430 GB of synchronized multimodal Valorant gameplay recordings). It contains no equations, no claimed predictions, no parameter fitting, and no derivation chain. The abstract and full text limit themselves to data-collection details, modality lists, and release statements; none of these steps reduce to self-definition, self-citation load-bearing, or renaming of prior results. External verifiability is provided by the public Hugging Face/GitHub release, satisfying the criterion for a self-contained, non-circular contribution.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Valorant gameplay involves high-precision motor skills and high cognitive load that make it a rigorous stress test for behavioral biometrics.