pith. machine review for the scientific record.

arxiv: 2605.02596 · v1 · submitted 2026-05-04 · 💻 cs.LG

Recognition: 2 theorem links · Lean Theorem

HARMES: A Multi-Modal Dataset for Wearable Human Activity Recognition with Motion, Environmental Sensing and Sound

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 18:29 UTC · model grok-4.3

classification 💻 cs.LG
keywords human activity recognition · wearable sensors · multi-modal dataset · IMU · audio sensing · environmental sensors · activities of daily living · wrist-worn devices

The pith

HARMES supplies the first large wrist-worn dataset that records motion, atmospheric conditions, and sound together during real household activities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces HARMES as a multi-modal wearable dataset collected from the wrist with three sensor types: an inertial measurement unit for motion, atmospheric sensors for humidity, temperature and pressure, and a microphone for audio. Twenty participants performed fifteen activities of daily living in their own homes, producing over eighty hours of recordings and roughly three hours of labeled data per person. The work evaluates how these modalities interact through cross-subject tests and an ablation study that isolates each sensor's contribution. Results indicate that environmental and audio signals add value mainly for activities whose motion patterns overlap. The dataset is released publicly with loading and training code to let others build on the combined signals.
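Because the three streams run at very different rates, the first practical step for anyone using the release is cutting them into aligned fixed-length windows. A minimal sketch of that step, assuming hypothetical sampling rates and in-memory arrays rather than the dataset's actual file schema; the 5 s window length matches the benchmark figures below.

```python
# A windowing sketch for synchronized multi-modal streams. Rates, channel
# counts, and array layout are illustrative assumptions, not HARMES's schema.
import numpy as np

def sliding_windows(signal: np.ndarray, rate_hz: float,
                    win_s: float = 5.0, hop_s: float = 5.0) -> np.ndarray:
    """Cut a (samples, channels) array into fixed-length windows."""
    win, hop = int(win_s * rate_hz), int(hop_s * rate_hz)
    return np.stack([signal[s:s + win]
                     for s in range(0, len(signal) - win + 1, hop)])

rng = np.random.default_rng(0)
imu = rng.standard_normal((60 * 50, 6))       # 60 s of 6-axis IMU at 50 Hz
env = rng.standard_normal((60 * 1, 3))        # humidity/temperature/pressure at 1 Hz
audio = rng.standard_normal((60 * 16000, 1))  # mono audio at 16 kHz

windows = {name: sliding_windows(sig, rate)
           for name, sig, rate in [("imu", imu, 50), ("env", env, 1),
                                   ("audio", audio, 16000)]}
for name, w in windows.items():
    print(name, w.shape)  # every modality yields the same 12 windows
```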

Core claim

HARMES is the first dataset to combine wrist IMU, atmospheric environmental sensors, and audio for human activity recognition. It contains over eighty hours of data from twenty participants across fifteen ADL classes in real home settings and is nearly six times larger than the prior largest wrist-inertial-acoustic collection. Benchmarks demonstrate that modality contributions vary by activity and that adding environmental and sound data improves recognition when motion alone is ambiguous.

What carries the argument

The HARMES dataset, which synchronously records wrist inertial motion, atmospheric conditions, and audio during natural household tasks to support multi-modal activity recognition.

If this is right

  • Environmental and audio modalities supply complementary value for activities that produce similar motion signals.
  • Cross-subject generalization can be measured on a scale six times larger than earlier wrist-inertial-acoustic datasets.
  • Ablation results indicate that fusion strategies should be activity-dependent rather than uniform across all classes (a per-activity comparison is sketched after this list).
  • The real-home collection captures natural variability in execution and sensor orientation that lab protocols often miss.
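The ablation point above reduces to comparing per-class scores across modality subsets instead of a single aggregate. A minimal sketch with synthetic labels and hypothetical subset names; none of the numbers reflect the paper's results.

```python
# Per-activity modality ablation sketch: which classes does each added
# modality help? Labels and predictions here are synthetic placeholders.
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n_windows, n_classes = 2000, 15
y_true = rng.integers(0, n_classes, n_windows)

# Hypothetical per-window predictions from three ablation variants.
preds = {
    "imu_only": rng.integers(0, n_classes, n_windows),
    "imu+env": rng.integers(0, n_classes, n_windows),
    "imu+env+audio": rng.integers(0, n_classes, n_windows),
}

for name, y_pred in preds.items():
    # average=None returns one F1 per activity class, exposing where a
    # modality subset is weak rather than hiding it in a macro average.
    per_class = f1_score(y_true, y_pred, average=None,
                         labels=list(range(n_classes)))
    print(f"{name}: macro F1 {per_class.mean():.3f}, "
          f"weakest class {per_class.argmin()}")
```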

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The three-modality combination may allow detection of context shifts, such as the same arm motion occurring while standing versus seated.
  • The dataset size supports experiments that test whether environmental pressure changes can flag vertical movements like stair climbing without additional sensors (see the sketch after this list).
  • Public release with example code lowers the barrier for groups to train and compare fusion models on the same wrist data.
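The stair-climbing conjecture in the second bullet is easy to prototype: near sea level, barometric pressure falls by roughly 0.12 hPa per meter of ascent, so a one-floor climb is visible even to a slow atmospheric sensor. A minimal sketch on simulated data; the window length, threshold, rate, and gradient constant are illustrative assumptions, not values from the paper.

```python
# Pressure-only vertical-movement flagging. The 0.12 hPa/m gradient is the
# standard near-sea-level approximation; all other numbers are illustrative.
import numpy as np

def flag_vertical_motion(pressure_hpa: np.ndarray, rate_hz: float = 1.0,
                         win_s: float = 10.0, thresh_m: float = 1.0) -> np.ndarray:
    """Mark windows whose barometric-altitude swing exceeds thresh_m."""
    altitude_m = -(pressure_hpa - pressure_hpa[0]) / 0.12
    win = int(win_s * rate_hz)
    segs = [altitude_m[s:s + win]
            for s in range(0, len(altitude_m) - win + 1, win)]
    return np.array([seg.max() - seg.min() > thresh_m for seg in segs])

# Simulated minute at 1 Hz: a ~3 m climb between t = 10 s and t = 20 s.
rng = np.random.default_rng(0)
t = np.arange(60.0)
climb_m = np.clip((t - 10.0) * 0.3, 0.0, 3.0)
pressure = 1013.25 - 0.12 * climb_m + rng.normal(0.0, 0.01, t.size)
print(flag_vertical_motion(pressure))  # only the climb window is flagged
```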

Load-bearing premise

The activity labels provided by twenty participants performing tasks in their own homes are accurate and the recorded signals are representative without major placement or labeling errors.

What would settle it

An independent review that compares the provided labels against synchronized video or observer notes on a held-out subset and finds substantial mismatches would show the labels cannot be trusted for training or evaluation.
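That audit is mechanically simple once an independent annotation pass exists: compute raw and chance-corrected agreement between the shipped labels and the observer labels on the held-out subset. A minimal sketch with synthetic labels; Cohen's kappa is a standard choice here, not something the paper specifies.

```python
# Label-audit sketch: agreement between released labels and an independent
# re-annotation. Labels are synthetic; only the procedure is the point.
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(1)
n, n_classes = 500, 15
dataset_labels = rng.integers(0, n_classes, n)   # labels shipped with the data
observer_labels = dataset_labels.copy()          # independent annotation pass
flip = rng.random(n) < 0.05                      # inject 5% disagreement
observer_labels[flip] = rng.integers(0, n_classes, int(flip.sum()))

agreement = (dataset_labels == observer_labels).mean()
kappa = cohen_kappa_score(dataset_labels, observer_labels)
print(f"raw agreement {agreement:.1%}, Cohen's kappa {kappa:.3f}")
# A low kappa on such an audit would be the "substantial mismatch"
# that undermines both training and evaluation.
```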

Figures

Figures reproduced from arXiv: 2605.02596 by Juergen Gall, Kristof Van Laerhoven, Marius Bock, Pascal-André Brückner, Robin Burchard.

Figure 1: Timeline over all labeled activities, performed by all participants. The three recordings per participant are horizontally …
Figure 2: Overview over activity durations in our dataset. Left: Boxplots displaying the distribution of the duration of individual …
Figure 3: Exemplary multi-modal data plot, showcasing two instances of …
Figure 4: Lineplots per activity class, showing the mean response (solid line) and …
Figure 5: Our multi-modal HAR model. It consists of four branches: The first branch is the audio branch, taking as input …
Figure 6: Confusion matrices for the activity classification tasks (5 s windows), for the combination of IMU sensors and audio …
Figure 7: Barplot showing per-participant macro F1-score results of the LOPO cross-validation (5 s windows), for IMU sensor …
Figure 8: Recording duration per participant, split into the combined duration of the three supervised sessions (labeled) and …
Figure 9: Confusion matrices for models trained with humidity as the only modality.
original abstract

With each sensing modality exhibiting inherent strengths and limitations, multi-modal approaches for wearable Human Activity Recognition (HAR) are becoming increasingly relevant -- particularly for recognizing Activities of Daily Living (ADLs), where individual modalities often produce ambiguous signals for similar or complex activities. This work introduces HARMES, a multi-modal wearable dataset combining three wrist-recorded modalities: motion sensing via an Inertial Measurement Unit (IMU), atmospheric environmental sensors (humidity, temperature, and pressure), and audio. Collected from 20 participants performing household activities in their own homes, HARMES totals over 80 hours of recorded data, with approximately three hours of labeled activity data per participant across 15 ADL classes. To the best of our knowledge, HARMES is the first dataset to combine this particular sensor trio, and it is nearly six times larger than the previously largest wrist-inertial-acoustic HAR dataset. In an extensive benchmark, we evaluate cross-subject generalization and conduct an ablation study revealing that modality contributions are activity-dependent and can provide complementary value, particularly for activities that are ambiguous from motion data alone. HARMES is freely available at Zenodo, alongside example code for loading the dataset and training models on GitHub.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper introduces HARMES, a multi-modal wearable dataset for Human Activity Recognition combining wrist IMU motion sensing, atmospheric environmental sensors (humidity, temperature, pressure), and audio. Collected from 20 participants performing 15 ADLs in their own homes, it totals over 80 hours with ~3 hours labeled per participant. The authors claim it is the first dataset with this sensor trio and nearly six times larger than the prior largest wrist-inertial-acoustic HAR dataset; they report cross-subject generalization benchmarks and a modality ablation study indicating activity-dependent complementarity, particularly for motion-ambiguous activities. The dataset is released on Zenodo with loading and training code on GitHub.

Significance. If the ground-truth labels prove accurate and home recordings representative, HARMES would provide a valuable, large-scale resource for multi-modal HAR research on real-world ADLs, where single modalities are often insufficient. The open availability, size, and empirical benchmarks on modality fusion are strengths that could support follow-on work in sensor complementarity.

major comments (3)
  1. [Data Collection] Data Collection section: the labeling protocol for the 15 ADL classes (self-report, video synchronization, post-hoc review, or otherwise) is not described, which is load-bearing for assessing label noise in motion-ambiguous activities and thus for the validity of the cross-subject benchmarks and ablation results.
  2. [Methods] Methods/Experimental Setup: details on synchronization across IMU, environmental, and audio streams are absent, as are any metrics quantifying inter-participant variability in wrist sensor placement, orientation, or home ambient conditions; without these the ablation study's assumption of representative signals cannot be evaluated.
  3. [Results] Results/Evaluation: the reported cross-subject and ablation benchmarks lack statistical significance tests, confidence intervals, or per-activity variance measures, weakening the claim that modalities provide complementary value.
minor comments (1)
  1. [Abstract] Abstract: the 'nearly six times larger' claim would benefit from an explicit citation to the prior dataset size for immediate verifiability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thoughtful and constructive review. We have carefully considered each major comment and provide point-by-point responses below. Where the original manuscript was incomplete, we will revise the text to incorporate the requested details and analyses.

point-by-point responses
  1. Referee: [Data Collection] Data Collection section: the labeling protocol for the 15 ADL classes (self-report, video synchronization, post-hoc review, or otherwise) is not described, which is load-bearing for assessing label noise in motion-ambiguous activities and thus for the validity of the cross-subject benchmarks and ablation results.

    Authors: We agree that a clear description of the labeling protocol is essential for evaluating label quality and the reliability of the reported benchmarks. The original manuscript omitted these procedural details due to length constraints. In the revised version we will add a dedicated subsection under Data Collection that specifies the full protocol: participants used a custom smartphone app to self-report activity start and end times in real time; these timestamps were synchronized to the sensor streams via a shared clock; continuous video from a head-mounted camera provided independent verification; and two independent annotators performed post-hoc review to resolve any discrepancies or ambiguous segments. This addition will allow readers to assess potential label noise, particularly for motion-ambiguous classes. revision: yes

  2. Referee: [Methods] Methods/Experimental Setup: details on synchronization across IMU, environmental, and audio streams are absent, as are any metrics quantifying inter-participant variability in wrist sensor placement, orientation, or home ambient conditions; without these the ablation study's assumption of representative signals cannot be evaluated.

    Authors: We acknowledge that synchronization and variability information were not reported. In the revised Methods section we will describe the synchronization approach in detail: all streams were aligned using a combination of hardware-triggered timestamps at recording start, NTP-based clock synchronization across devices, and post-hoc cross-correlation refinement on the audio and IMU channels to correct for any residual drift. For inter-participant variability we will add quantitative metrics derived from the collected data, including average and range of wrist placement offsets (estimated from participant photographs and self-reported band tightness), orientation deviations (computed from gravity vector statistics during quiet periods), and summary statistics on home ambient conditions (temperature, humidity, and pressure ranges across households). These additions will support evaluation of the ablation study's assumptions (a cross-correlation sketch follows these responses). revision: yes

  3. Referee: [Results] Results/Evaluation: the reported cross-subject and ablation benchmarks lack statistical significance tests, confidence intervals, or per-activity variance measures, weakening the claim that modalities provide complementary value.

    Authors: We agree that the absence of statistical tests and uncertainty measures limits the strength of the complementarity claims. In the revised Results section we will augment all cross-subject and ablation tables/figures with: (i) paired statistical significance tests (McNemar's test for accuracy differences between modality combinations and Wilcoxon signed-rank tests for F1-score comparisons), (ii) 95% confidence intervals obtained via participant-level bootstrapping, and (iii) per-activity standard deviations and variance across the 20 subjects. These additions will provide a more rigorous basis for asserting activity-dependent modality complementarity (a statistics sketch also follows these responses). revision: yes
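Response 2's cross-correlation refinement is standard practice and easy to sketch: derive an audio energy envelope and an IMU acceleration magnitude at a shared rate, then read the residual clock offset off the peak of their cross-correlation. The signals, rates, and 0.7 s drift below are synthetic stand-ins, not the authors' actual pipeline.

```python
# Residual-offset estimation between two resampled envelopes via the peak
# of their full cross-correlation. All inputs here are synthetic.
import numpy as np

def offset_seconds(ref: np.ndarray, sig: np.ndarray, rate_hz: float) -> float:
    """Seconds by which `sig` lags `ref` (positive = sig is late)."""
    a = (ref - ref.mean()) / ref.std()
    b = (sig - sig.mean()) / sig.std()
    lag = np.argmax(np.correlate(a, b, mode="full")) - (len(b) - 1)
    return -lag / rate_hz

rate = 10.0                                 # shared envelope rate (illustrative)
rng = np.random.default_rng(2)
audio_env = rng.random(600)                 # 60 s audio energy envelope
imu_mag = np.roll(audio_env, 7) + rng.normal(0, 0.05, 600)  # delayed 0.7 s

print(f"estimated residual offset: "
      f"{offset_seconds(audio_env, imu_mag, rate):+.2f} s")  # ~ +0.70 s
```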
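Response 3's promised statistics can likewise be prototyped at the participant level: a paired Wilcoxon signed-rank test across the 20 leave-one-person-out folds plus a percentile bootstrap for the mean. The F1 scores below are synthetic; only the procedure is the point.

```python
# Participant-level uncertainty reporting: bootstrap CI plus a paired
# Wilcoxon signed-rank test. The 20 F1 scores are synthetic placeholders.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(3)
f1_imu = rng.uniform(0.55, 0.75, 20)            # hypothetical LOPO F1, IMU only
f1_all = np.clip(f1_imu + rng.normal(0.05, 0.03, 20), 0, 1)

def bootstrap_ci(scores: np.ndarray, n_boot: int = 10_000, alpha: float = 0.05):
    """Percentile bootstrap CI for the mean, resampling participants."""
    means = np.array([rng.choice(scores, scores.size, replace=True).mean()
                      for _ in range(n_boot)])
    return np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])

stat, p = wilcoxon(f1_all, f1_imu)              # paired across 20 participants
lo, hi = bootstrap_ci(f1_all)
print(f"all-modalities mean F1 95% CI: [{lo:.3f}, {hi:.3f}], "
      f"Wilcoxon p = {p:.4f}")
```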

Circularity Check

0 steps flagged

No circularity: empirical dataset collection with direct benchmarks

full rationale

The paper is a dataset release describing collection of 80+ hours of wrist IMU + environmental + audio recordings from 20 participants performing 15 ADLs in home settings, followed by standard cross-subject generalization and modality-ablation benchmarks. No equations, fitted parameters, predictions derived from inputs, or load-bearing self-citations appear in the abstract or described content. Novelty and size claims are external comparisons to prior datasets; benchmarks are direct evaluations on the collected data. The work is self-contained with no derivation chain that reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests primarily on the existence and properties of the collected dataset rather than new mathematical derivations; standard domain assumptions about sensor data validity and labeling accuracy are invoked without independent verification in the abstract.

axioms (1)
  • domain assumption: Activity labels provided by participants or annotators accurately reflect the performed household tasks.
    The 15 ADL classes and per-participant labeled hours depend on this assumption for all downstream benchmarks.

pith-pipeline@v0.9.0 · 5535 in / 1160 out tokens · 48507 ms · 2026-05-08T18:29:52.789903+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

61 extracted references · 55 canonical work pages

  1. [1]

    Riku Arakawa, Hiromu Yakura, Vimal Mollyn, Suzanne Nie, Emma Russell, Dustin P. DeMeo, Haarika A. Reddy, Alexander K. Maytin, Bryan T. Carroll, Jill Fain Lehman, and Mayank Goel. 2022. PrISM-Tracker: A Framework for Multimodal Procedure Tracking Using Wearable Sensors and State Transition Information with User-Driven Handling of Errors and Uncertainty. Pro...

  2. [2]

    Arnab Barna, Abdul Kadar Muhammad Masum, Mohammad Emdad Hossain, Erfanul Hoque Bahadur, and Mohammed Shamsul Alam

  3. [3]

    A Study on Human Activity Recognition Using Gyroscope, Accelerometer, Temperature and Humidity Data. In 2019 International Conference on Electrical, Computer and Communication Engineering (ECCE). IEEE, Cox's Bazar, Bangladesh, 1–6. doi:10.1109/ECACE.2019.8679226

  4. [4]

    Vincent Becker, Linus Fessler, and Gábor Sörös. 2019. GestEar: Combining Audio and Motion Sensing for Gesture Recognition on Smartwatches. In Proceedings of the 23rd International Symposium on Wearable Computers. ACM, London, United Kingdom, 10–19. doi:10.1145/3341163.3347735

  5. [5]

    Sarnab Bhattacharya, Rebecca Adaimi, and Edison Thomaz. 2022. Leveraging Sound and Wrist Motion to Detect Activities of Daily Living with Commodity Smartwatches. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 6, 2 (July 2022), 42:1–42:28. doi:10.1145/3534582

  6. [6]

    Marius Bock, Alexander Hölzemann, Michael Moeller, and Kristof Van Laerhoven. 2021. Improving Deep Learning for HAR with Shallow LSTMs. In Proceedings of the 2021 ACM International Symposium on Wearable Computers (ISWC ’21). Association for Computing Machinery, New York, NY, USA, 7–12. doi:10.1145/3460421.3480419

  7. [7]

    Marius Bock, Hilde Kuehne, Kristof Van Laerhoven, and Michael Moeller. 2024. WEAR: An Outdoor Sports Dataset for Wearable and Egocentric Activity Recognition. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, 4 (Nov. 2024), 1–21. doi:10.1145/3699776

  8. [8]

    Andreas Bulling, Ulf Blanke, and Bernt Schiele. 2014. A Tutorial on Human Activity Recognition Using Body-Worn Inertial Sensors. ACM Comput. Surv. 46, 3 (Jan. 2014), 33:1–33:33. doi:10.1145/2499621

  9. [9]

    Robin Burchard, Hurriat Ali, and Kristof Van Laerhoven. 2026. Improved Strategies for Multi-modal Atmospheric Sensing to Augment Wearable IMU-Based Hand Washing Detection. In Sensor-Based Activity Recognition and Artificial Intelligence, Özlem Durmaz Incel, Jingwen Qin, Gerald Bieber, and Arjan Kuijper (Eds.). Vol. 16292. Springer Nature Switzerland, Cham,...

  10. [10]

    Robin Burchard, Pascal-André Brückner, Marius Bock, and Kristof Van Laerhoven. 2026. HARMES: A Multi-Modal Dataset for Wearable Human Activity Recognition with Motion, Environmental Sensing and Sound. doi:10.5281/zenodo.19425719

  11. [11]

    Robin Burchard and Kristof Van Laerhoven. 2025. Multi-Modal Atmospheric Sensing to Augment Wearable IMU-Based Hand Washing Detection. In Sensor-Based Activity Recognition and Artificial Intelligence, Orhan Konak, Bert Arnrich, Gerald Bieber, Arjan Kuijper, and Sebastian Fudickar (Eds.). Springer Nature Switzerland, Cham, 55–68. doi:10.1007/978-3-031-80856-2_4

  12. [12]

    Yetong Cao, Fan Li, Huijie Chen, Xiaochen Liu, Song Yang, and Yu Wang. 2023. Leveraging Wearables for Assisting the Elderly With Dementia in Handwashing. IEEE Transactions on Mobile Computing 22, 11 (Nov. 2023), 6554–6570. doi:10.1109/TMC.2022.3193615

  13. [13]

    Stefania Cristina, Vladimir Despotovic, Rodrigo Pérez-Rodríguez, and Slavisa Aleksic. 2024. Audio- and Video-Based Human Activity Recognition Systems in Healthcare. IEEE Access 12 (2024), 8230–8245. doi:10.1109/ACCESS.2024.3353138

  14. [14]

    Federico Cruciani, Anastasios Vafeiadis, Chris Nugent, Ian Cleland, Paul McCullagh, Konstantinos Votis, Dimitrios Giakoumis, Dimitrios Tzovaras, Liming Chen, and Raouf Hamzaoui. 2020. Feature Learning for Human Activity Recognition Using Convolutional Neural Networks: A Case Study for Inertial Measurement Unit and Audio Data. CCF Transactions on Pervasive ...

  15. [15]

    Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Antonino Furnari, Evangelos Kazakos, Jian Ma, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, and Michael Wray. 2022. Rescaling Egocentric Vision: Collection, Pipeline and Challenges for EPIC-KITCHENS-100. International Journal of Computer Vision 130, 1 (Jan. 2022), 33–55. doi:10.1007/s11263-...

  16. [16]

    Debraj De, Pratool Bharti, Sajal K. Das, and Sriram Chellappan. 2015. Multimodal Wearable Sensing for Fine-Grained Activity Recognition in Healthcare. IEEE Internet Computing 19, 5 (Sept. 2015), 26–35. doi:10.1109/MIC.2015.72

  17. [17]

    Fernando De la Torre, Jessica Hodgins, Adam Bargteil, Xavier Martin, Justin Macey, Alex Collado, and Pep Beltran. 2008. Guide to the Carnegie Mellon University Multimodal Activity (CMU-MMAC) Database.

  18. [18]

    Joseph DelPreto, Chao Liu, Yiyue Luo, Michael Foshey, Yunzhu Li, Antonio Torralba, Wojciech Matusik, and Daniela Rus. 2022. ActionSense: A Multimodal Dataset and Recording Framework for Human Activities Using Wearable Sensors in a Kitchen Environment

  19. [19]

    Alexander Diete, Timo Sztyler, and Heiner Stuckenschmidt. 2019. Vision and Acceleration Modalities: Partners for Recognizing Complex Activities. In 2019 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops). 101–106. doi:10.1109/PERCOMW.2019.8730690

  20. [20]

    Enrique Garcia-Ceja, Carlos E. Galván-Tejada, and Ramon Brena. 2018. Multi-View Stacking for Activity Recognition with Sound and Accelerometer Data. Information Fusion 40 (March 2018), 45–56. doi:10.1016/j.inffus.2017.06.004

  21. [21]

    Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Carti...

  22. [22]

    Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Moha...

  23. [23]

    Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Seattle, WA, USA, 19383–19400. doi:10.1109/CVPR52733.2024.01834

  24. [24]

    Haodong Guo, Ling Chen, Liangying Peng, and Gencai Chen. 2016. Wearable Sensor Based Multimodal Human Activity Recognition Exploiting the Diversity of Classifier Ensemble. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp ’16). Association for Computing Machinery, New York, NY, USA, 1112–1123. doi:...

  25. [25]

    Tomoki Hayashi, Masafumi Nishida, Norihide Kitaoka, and Kazuya Takeda. 2015. Daily Activity Recognition Based on DNN Using Environmental Sound and Acceleration Signals. In 2015 23rd European Signal Processing Conference (EUSIPCO). 2306–2310. doi:10.1109/EUSIPCO.2015.7362796

  26. [26]

    Alexander Hoelzemann, Julia Lee Romero, Marius Bock, Kristof Van Laerhoven, and Qin Lv. 2023. Hang-Time HAR: A Benchmark Dataset for Basketball Activity Recognition Using Wrist-Worn Inertial Sensors. Sensors 23, 13 (June 2023), 5879. doi:10.3390/s23135879

  27. [27]

    Jaesung Huh, Jacob Chalk, Evangelos Kazakos, Dima Damen, and Andrew Zisserman. 2025. Epic-Sounds: A Large-scale Dataset of Actions That Sound. arXiv:2302.00646 [cs] doi:10.48550/arXiv.2302.00646

  28. [28]

    Yasha Iravantchi, Karan Ahuja, Mayank Goel, Chris Harrison, and Alanson Sample. 2021. PrivacyMic: Utilizing Inaudible Frequencies for Privacy Preserving Daily Activity Recognition. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (CHI ’21). Association for Computing Machinery, New York, NY, USA, 1–13. doi:10.1145/3411764.3445169

  29. [29]

    Minhyuk Jung and Seokho Chi. 2020. Human Activity Classification Based on Sound Recognition and Residual Convolutional Neural Network. Automation in Construction 114 (June 2020), 103177. doi:10.1016/j.autcon.2020.103177

  30. [30]

    Hrishikesh Balkrishna Karande, Ravikiran Arasur Thippeswamy Shivalingappa, Abdelhafid Nassim Yaici, Iman Haghbin, Niravkumar Bavadiya, Robin Burchard, and Kristof Van Laerhoven. 2025. Raising the Bar(Ometer): Identifying a User’s Stair and Lift Usage Through Wearable Sensor Data Analysis. In Sensor-Based Activity Recognition and Artificial Intelligence, Or...

  31. [31]

    Kristina Kirsten, Robin Burchard, Olesya Bauer, Marcel Miché, Philipp Scholl, Karina Wahl, Roselind Lieb, Kristof Van Laerhoven, and Bert Arnrich. 2025. The Supervised Learning Dilemma: Lessons Learned from a Study in-the-Wild. In Sensor-Based Activity Recognition and Artificial Intelligence, Orhan Konak, Bert Arnrich, Gerald Bieber, Arjan Kuijper, and Seb...

  32. [32]

    Kristina Kirsten, Robin Burchard, Amatya Mackintosh, Marcel Miché, Dorothée Bentz, Klaus Bader, Jannis Behr, Roselind Lieb, Kristof Van Laerhoven, Bert Arnrich, and Karina Wahl. 2025. Exploring Wearable-Based Detection of Compulsive Handwashing in a Non-Controlled Setting: A Case Study. In Proceedings of the 2025 ACM International Symposium on Wearable Co...

  33. [33]

    Dawei Liang, Guihong Li, Rebecca Adaimi, Radu Marculescu, and Edison Thomaz. 2022. AudioIMU: Enhancing Inertial Sensing-Based Activity Recognition with Acoustic Models. In Proceedings of the 2022 ACM International Symposium on Wearable Computers. ACM, Cambridge, United Kingdom, 44–48. doi:10.1145/3544794.3558471

  34. [34]

    Dawei Liang, Wenting Song, and Edison Thomaz. 2020. Characterizing the Effect of Audio Degradation on Privacy Perception And Inference Performance in Audio-Based Human Activity Recognition. In 22nd International Conference on Human-Computer Interaction with Mobile Devices and Services. ACM, Oldenburg, Germany, 1–10. doi:10.1145/3379503.3403551

  35. [35]

    Hui Liu, Yale Hartmann, and Tanja Schultz. 2021. CSL-SHARE: A Multimodal Wearable Sensor-Based Human Activity Dataset. Frontiers in Computer Science 3 (2021). doi:10.3389/fcomp.2021.759136

  36. [36]

    Roya Lotfi, George Tzanetakis, Rasit Eskicioglu, and Pourang Irani. 2020. A Comparison between Audio and IMU Data to Detect Chewing Events Based on an Earable Device. In Proceedings of the 11th Augmented Human International Conference (AH ’20). Association for Computing Machinery, New York, NY, USA, 1–8. doi:10.1145/3396339.3396362

  37. [37]

    Adria Mallol-Ragolta, Anastasia Semertzidou, Maria Pateraki, and Björn Schuller. 2021. harAGE: A Novel Multimodal Smartwatch-based Dataset for Human Activity Recognition. In 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021). 01–07. doi:10.1109/FG52635.2021.9666947

  38. [38]

    Lourdes Martínez-Villaseñor, Hiram Ponce, Jorge Brieva, Ernesto Moya-Albor, José Núñez-Martínez, and Carlos Peñafort-Asturiano

  39. [39]

    UP-Fall Detection Dataset: A Multimodal Approach. Sensors 19, 9 (April 2019), 1988. doi:10.3390/s19091988

  40. [40]

    Brian McFee, Matt McVicar, Daniel Faronbi, Iran Roman, Matan Gover, Stefan Balke, Scott Seyfarth, Ayoub Malek, Colin Raffel, Vincent Lostanlen, Benjamin van Niekirk, Dana Lee, Frank Cwitkowitz, Frank Zalkow, Oriol Nieto, Dan Ellis, Jack Mason, Kyungyun Lee, Bea Steers, Emily Halvachs, Carl Thomé, Fabian Robert-Stöter, Rachel Bittner, Ziyao Wei, Adam Weiss...

  41. [41]

    D. Minnen, T. Starner, J.A. Ward, P. Lukowicz, and G. Troster. 2005. Recognizing and Discovering Human Actions from On-Body Sensor Data. In 2005 IEEE International Conference on Multimedia and Expo. 1545–1548. doi:10.1109/ICME.2005.1521728

  42. [42]

    Vimal Mollyn, Karan Ahuja, Dhruv Verma, Chris Harrison, and Mayank Goel. 2022. SAMoSA: Sensing Activities with Motion and Subsampled Audio. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 6, 3 (Sept. 2022), 1–19. doi:10.1145/3550284

  43. [43]

    Md Abu Sayeed Mondol and John A. Stankovic. 2015. Harmony: A Hand Wash Monitoring and Reminder System Using Smart Watches. In Proceedings of the 12th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services. ACM, Coimbra, Portugal. doi:10.4108/eai.22-7-2015.2260042

  44. [44]

    Sebastian Münzner, Philip Schmidt, Attila Reiss, Michael Hanselmann, Rainer Stiefelhagen, and Robert Dürichen. 2017. CNN-based Sensor Fusion Techniques for Multimodal Human Activity Recognition. In Proceedings of the 2017 ACM International Symposium on Wearable Computers (ISWC ’17). Association for Computing Machinery, New York, NY, USA, 158–165. doi:10.11...

  45. [45]

    Natalia Neverova, Christian Wolf, Griffin Lacey, Lex Fridman, Deepak Chandra, Brandon Barbello, and Graham Taylor. 2016. Learning Human Identity from Motion Patterns. arXiv:1511.03908 [cs] doi:10.48550/arXiv.1511.03908

  46. [46]

    Jianyuan Ni, Hao Tang, Syed Tousiful Haque, Yan Yan, and Anne H. H. Ngu. 2024. A Survey on Multimodal Wearable Sensor-based Human Action Recognition. arXiv:2404.15349 [eess] doi:10.48550/arXiv.2404.15349

  47. [47]

    Matthias Norden, Oliver T. Wolf, Lennart Lehmann, Katja Langer, Christoph Lippert, and Hanna Drimalla. 2022. Automatic Detection of Subjective, Annotated and Physiological Stress Responses from Video Data. In 2022 10th International Conference on Affective Computing and Intelligent Interaction (ACII). IEEE, Nara, Japan, 1–8. doi:10.1109/ACII55700.2022.9953894

  48. [48]

    Ferda Ofli, Rizwan Chaudhry, Gregorij Kurillo, René Vidal, and Ruzena Bajcsy. 2013. Berkeley MHAD: A Comprehensive Multimodal Human Action Database. In 2013 IEEE Workshop on Applications of Computer Vision (WACV). 53–60. doi:10.1109/WACV.2013.6474999

  49. [49]

    Francisco Ordóñez and Daniel Roggen. 2016. Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition. Sensors 16, 1 (Jan. 2016), 115. doi:10.3390/s16010115

  50. [50]

    Marietta Papadatou-Pastou, Eleni Ntolka, Judith Schmitz, Maryanne Martin, Marcus R. Munafò, Sebastian Ocklenburg, and Silvia Paracchini. 2020. Human Handedness: A Meta-Analysis. Psychological Bulletin 146, 6 (2020), 481–524. doi:10.1037/bul0000229

  51. [51]

    Sirat Samyoun, Sudipta Saha Shubha, Md Abu Sayeed Mondol, and John A. Stankovic. 2021. iWash: A Smartwatch Handwashing Quality Assessment and Reminder System with Real-Time Feedback in the Context of Infectious Disease. Smart Health (Amsterdam, Netherlands) 19 (March 2021), 100171. doi:10.1016/j.smhl.2020.100171

  52. [52]

    Nabeel Siddiqui and Rosa H M Chan. [n. d.]. Multimodal Hand Gesture Recognition Using Single IMU and Acoustic Measurements at Wrist. ([n. d.])

  53. [53]

    Karina Wahl, Philipp M. Scholl, Marcel Miché, Silvan Wirth, Robin Burchard, and Roselind Lieb. 2023. Real-Time Detection of Obsessive-Compulsive Hand Washing with Wearables: Research Procedure, Usefulness and Discriminative Performance. Journal of Obsessive-Compulsive and Related Disorders 39 (Oct. 2023), 100845. doi:10.1016/j.jocrd.2023.100845

  54. [54]

    Yan Wang, Shuang Cang, and Hongnian Yu. 2019. A Survey on Wearable Sensor Modality Centred Human Activity Recognition in Health Care. Expert Systems with Applications 137 (Dec. 2019), 167–190. doi:10.1016/j.eswa.2019.04.057

  55. [55]

    J.A. Ward, P. Lukowicz, G. Troster, and T.E. Starner. 2006. Activity Recognition of Assembly Tasks Using Body-Worn Microphones and Accelerometers. IEEE Transactions on Pattern Analysis and Machine Intelligence 28, 10 (Oct. 2006), 1553–1567. doi:10.1109/TPAMI.2006.197

  56. [56]

    Anjana Wijekoon, Nirmalie Wiratunga, and Kay Cooper. 2019. MEx: Multi-modal Exercises Dataset for Human Activity Recognition. arXiv:1908.08992 [cs] doi:10.48550/arXiv.1908.08992

  57. [57]

    Zhe Yang, Ying Zhang, Yanjun Li, Linchong Huang, Ping Hu, and Yuexiang Lin. 2025. Fusion of Inertial and High-Resolution Acoustic Data for Privacy-Preserving Human Activity Recognition. IEEE Transactions on Instrumentation and Measurement 74 (2025), 1–20. doi:10.1109/TIM.2025.3565250

  58. [58]

    Taeyoung Yeon, Vasco Xu, Henry Hoffmann, and Karan Ahuja. 2025. WatchHAR: Real-time On-device Human Activity Recognition System for Smartwatches. In Proceedings of the 27th International Conference on Multimodal Interaction. 387–394. arXiv:2509.04736 [cs] doi:10.1145/3716553.3750775

  59. [59]

    Xin Zhang, Karteek Kadimisetty, Kun Yin, Carlos Ruiz, Michael G. Mauk, and Changchun Liu. 2019. Smart Ring: A Wearable Device for Hand Hygiene Compliance Monitoring at the Point-of-Need. Microsystem Technologies 25, 8 (Aug. 2019), 3105–3110. doi:10.1007/s00542-018-4268-5

  60. [60]

    Haoyu Zhuang, Liqiang Xu, Yuuki Nishiyama, and Kaoru Sezaki. 2023. Detecting Hand Hygienic Behaviors In-the-Wild Using a Microphone and Motion Sensor on a Smartwatch. In Distributed, Ambient and Pervasive Interactions (Lecture Notes in Computer Science), Norbert A. Streitz and Shin’ichi Konomi (Eds.). Springer Nature Switzerland, Cham, 470–483. doi:10.1007...

  61. [61]

    Zhendong Zhuang and Yang Xue. 2019. Sport-Related Human Activity Detection and Recognition Using a Smartwatch. Sensors 19, 22 (Jan. 2019), 5001. doi:10.3390/s19225001