arxiv: 2604.09451 · v1 · submitted 2026-04-10 · 🧬 q-bio.QM · cs.LG

Recognition: unknown

An Open-Source, Open Data Approach to Activity Classification from Triaxial Accelerometry in an Ambulatory Setting

Alex Hall, David W. Wright, Edward Tian, Gari D. Clifford, Harrison Hoffman, J. Lucas McKay, Matthew Parks, Sepideh Nikookar, Tommy T. Thomas, Yashar Kiarashi

Pith reviewed 2026-05-10 16:22 UTC · model grok-4.3

keywords activity classification · triaxial accelerometry · convolutional neural network · open data · open source · ambulatory monitoring · human activity recognition · wearable sensors

The pith

A convolutional neural network classifies five activities from ambulatory triaxial accelerometry data with an F1 score of 0.83.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops an open dataset and accompanying open-source code for processing 50 Hz tri-axial accelerometer signals collected in an ambulatory setting to classify five activities. Data came from 23 healthy subjects who followed a standardized routine of lying, sitting, standing, walking, and jogging while wearing the device for an average of 26 minutes each. A binary high-versus-low activity classifier reached an F1 score of 0.79, while the multi-class CNN reached 0.83. The work releases both the raw data and the analysis code to allow others to add behavioral context to standard health metrics from wearables.
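The summary above quotes single F1 values for each classifier without saying how the five per-class scores are combined. As a rough illustration only, the sketch below computes per-class F1 and an unweighted (macro) average in plain Python. The macro averaging, the function names, and the toy labels are all assumptions made for this example; they are not taken from the paper or its released code.

```python
from collections import Counter

# The five activity classes named in the paper.
ACTIVITIES = ["lying", "sitting", "standing", "walking", "jogging"]


def per_class_f1(y_true, y_pred, labels):
    """Return {label: F1}, treating each label one-vs-rest."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores (an assumed averaging choice)."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


# Toy labels for illustration only, not data from the study.
y_true = ["lying", "sitting", "standing", "walking", "jogging", "sitting"]
y_pred = ["lying", "standing", "standing", "walking", "jogging", "sitting"]

print(per_class_f1(y_true, y_pred, ACTIVITIES))
print(round(macro_f1(y_true, y_pred, ACTIVITIES), 4))
```

A weighted or micro average would give a different number on imbalanced classes, which is one reason the averaging choice is worth stating alongside any reported F1.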

Core claim

The authors collected synchronized triaxial accelerometer and ECG data from 23 healthy adults during a standardized activity protocol and showed that a convolutional neural network can classify the five activities (lying, sitting, standing, walking, jogging) with an F1 score of 0.83, while releasing the full dataset and processing code under an open-source license to support further development of context-aware monitoring tools.

What carries the argument

The multi-class convolutional neural network trained on raw 50 Hz tri-axial acceleration signals to output class probabilities for lying, sitting, standing, walking, and jogging.
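A network that consumes raw acceleration typically sees the signal as fixed-length windows. The sketch below shows one way to cut a 50 Hz triaxial stream into 5-second windows (the window size mentioned in the Figure 5 caption). The 50% overlap, the function name, and the synthetic signal are assumptions for illustration; the summary does not state the overlap or normalization the authors used.

```python
FS_HZ = 50      # sampling rate stated in the paper
WINDOW_S = 5    # window length mentioned in the Figure 5 caption
OVERLAP = 0.5   # assumption: the overlap is not stated in this summary


def sliding_windows(samples, fs_hz=FS_HZ, window_s=WINDOW_S, overlap=OVERLAP):
    """Split a sequence of (x, y, z) samples into fixed-length windows.

    A trailing partial window is dropped.
    """
    size = int(fs_hz * window_s)
    step = max(1, int(size * (1 - overlap)))
    return [
        samples[start:start + size]
        for start in range(0, len(samples) - size + 1, step)
    ]


# 20 seconds of synthetic samples (device at rest, 1 g on the z axis).
signal = [(0.0, 0.0, 1.0)] * (FS_HZ * 20)
windows = sliding_windows(signal)
print(len(windows), len(windows[0]))
```

Each window here holds 250 samples per axis; a CNN input tensor would then have shape (number of windows, 250, 3) or a transposed equivalent, depending on the framework.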

If this is right

Activity labels supply context for interpreting traditional health metrics such as energy expenditure estimates.
The classification supports development of clinical decision-making tools for patient monitoring.
Contextual activity information enables predictive analytics and personalized health interventions.
Public release of the dataset and code allows other researchers to validate or extend the models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The approach could be tested on patient populations with irregular movement patterns to check whether controlled-routine training generalizes.
Pairing the activity outputs with the paper's synchronous ECG channel might reveal activity-specific cardiac signatures not visible in averaged metrics.
Widespread adoption of the open data could speed creation of real-time wearable apps that guide activity levels during rehabilitation.

Load-bearing premise

The standardized sequence of five activities performed by healthy subjects in a controlled ambulatory routine adequately represents the variety and transitions of natural free-living movement patterns.

What would settle it

Testing the trained CNN on continuous, unscripted triaxial accelerometer recordings from participants moving freely in their daily environments without following any fixed activity sequence.

Figures

Figures reproduced from arXiv: 2604.09451 by Alex Hall, David W. Wright, Edward Tian, Gari D. Clifford, Harrison Hoffman, J. Lucas McKay, Matthew Parks, Sepideh Nikookar, Tommy T. Thomas, Yashar Kiarashi.

**Figure 2.** Architecture of the deep learning model for the activity classification. Model performance was evaluated using Leave-One-Out Cross-Validation (LOOCV). In this setup, data from one participant was held out for testing while the …

**Figure 3.** Histogram of Activity Count Values Separated by Activity.

**Figure 4.** Confusion Matrix for Active vs. Inactive Labels. Section 3.1.3, Comparison with MAD-Based Cut-Point: To assess how existing methods generalize to the current dataset, the Mean Amplitude Deviation (MAD)-based cutpoint proposed by Luckhurst et al. (2024) was applied. Their study identified an optimal MAD threshold of 47.73 mG (0.04773 g) for classifying sedentary and ambulatory activity using the same VivaLink ECG …

**Figure 5.** Box Plots of Different Metrics Across Different Folds for 5s Window Size.

**Figure 7.** Classification performance across different high-frequency cutoff values in the Butterworth filter.

**Figure 8.** Heatmap of Different Metric vs. Window Size and Sampling Rate.

**Figure 9.** Illustrates this relationship, where each point represents a 5-second window during an active period.
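The Figure 4 caption excerpt refers to a Mean Amplitude Deviation (MAD) cut-point of 47.73 mG. As a minimal sketch, MAD is commonly defined as the mean absolute deviation of the acceleration vector magnitude from its mean within an epoch. The epoch length (5 seconds here), the strict "greater than" comparison, and the function names are assumptions for illustration, not details confirmed by the paper or by Luckhurst et al.

```python
import math

# Threshold quoted in the Figure 4 caption excerpt: 47.73 mG = 0.04773 g.
MAD_CUTPOINT_G = 0.04773


def mean_amplitude_deviation(window):
    """MAD of one window of (x, y, z) samples in g: mean of |r_i - mean(r)|."""
    magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in window]
    mean_mag = sum(magnitudes) / len(magnitudes)
    return sum(abs(r - mean_mag) for r in magnitudes) / len(magnitudes)


def is_ambulatory(window, cutpoint_g=MAD_CUTPOINT_G):
    """Label a window ambulatory when its MAD exceeds the cut-point (assumed rule)."""
    return mean_amplitude_deviation(window) > cutpoint_g


# Synthetic 5-second windows at 50 Hz, for illustration only.
still = [(0.0, 0.0, 1.0)] * 250
moving = [
    (0.0, 0.0, 1.0 + 0.3 * math.sin(2 * math.pi * 2 * i / 50))
    for i in range(250)
]

print(is_ambulatory(still), is_ambulatory(moving))
```

Because MAD subtracts the window mean, the constant 1 g gravity component drops out, so a motionless device scores zero regardless of orientation.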

Original abstract

The accelerometer has become an almost ubiquitous device, providing enormous opportunities in healthcare monitoring beyond step counting or other average energy estimates in 15-60 second epochs.

Objective: To develop an open data set with associated open-source code for processing 50 Hz tri-axial accelerometry-based to classify patient activity levels and natural types of movement.

Approach: Data were collected from 23 healthy subjects (16 males and seven females) aged between 23 and 62 years using an ambulatory device, which included a triaxial accelerometer and synchronous lead II equivalent ECG for an average of 26 minutes each. Participants followed a standardized activity routine involving five distinct activities: lying, sitting, standing, walking, and jogging. Two classifiers were constructed: a signal processing technique to distinguish between high and low activity levels and a convolutional neural network (CNN)-based approach to classify each of the five activities.

Main results: The binary (high/low) activity classifier exhibited an F1 score of 0.79. The multi-class CNN-based classifier provided an F1 score of 0.83. The code for this analysis has been made available under an open-source license together with the data on which the classifiers were trained and tested.

Significance: The classification of behavioral activity, as demonstrated in this study, offers valuable context for interpreting traditional health metrics and may provide contextual information to support the future development of clinical decision-making tools for patient monitoring, predictive analytics, and personalized health interventions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper's main value is the release of an open dataset of 50 Hz triaxial accelerometer recordings from 23 healthy subjects in a scripted five-activity routine, plus the code for a CNN that reaches 0.83 F1.

The main contribution is a new open dataset of 50 Hz triaxial accelerometer recordings from 23 healthy adults (ages 23-62), collected during a standardized ambulatory routine of lying, sitting, standing, walking, and jogging, with recordings averaging 26 minutes per subject. It comes with open code that trains a CNN to 0.83 F1 on the five-class task and a simpler signal-processing method that reaches 0.79 F1 for high versus low activity. They also recorded synchronous ECG, which is a practical extra for anyone wanting multimodal context later. Both the raw data and the processing scripts are released under open licenses, so the numbers are directly checkable and reusable.

That open release is the part that actually moves the needle for the field. It gives people working on wearable health monitoring a concrete starting benchmark without having to collect their own labeled ambulatory data from scratch. The fixed protocol and reported F1 scores make it straightforward to compare new methods against this baseline.

The soft spots are mostly about scope rather than execution. The subjects are all healthy and the activities follow a controlled sequence, so the data do not capture the variability, transitions, or irregular patterns of true free-living behavior or clinical populations. With only 23 participants the scale remains modest, and the abstract leaves out explicit details on train-test splits or cross-validation, though the released code lets anyone verify the pipeline. No load-bearing circularity or invented claims appear in the reported results.

This is the sort of paper that matters for researchers building activity-aware tools in digital health or ambulatory monitoring who need public data to test ideas. It will not rewrite the literature on its own, but the open resources make it worth citing and extending. I would send it to peer review so the methods get proper scrutiny and the dataset gets properly indexed.

Referee Report

2 major / 3 minor

Summary. The paper collects 50 Hz triaxial accelerometer (plus ECG) data from 23 healthy subjects (ages 23-62) performing a standardized sequence of five activities (lying, sitting, standing, walking, jogging) in an ambulatory setting, with recordings averaging 26 minutes per subject. It releases the full dataset and processing code under open licenses, implements a signal-processing binary high/low activity classifier (F1 = 0.79) and a CNN multi-class classifier (F1 = 0.83), and positions the resource as context for health monitoring metrics.

Significance. If the numerical results hold under proper validation, the primary value lies in the openly released dataset and code, which directly supports reproducibility and extension by the community. This addresses a genuine gap in accessible ambulatory accelerometry benchmarks and provides a concrete baseline for activity classification that can be used to contextualize other physiological signals.

major comments (2)

[Methods] Methods section (classifier training and evaluation): the manuscript reports concrete F1 scores but does not describe the data partitioning strategy (subject-wise vs. pooled), cross-validation procedure, hyperparameter selection method, or any statistical testing/confidence intervals around the F1 values of 0.79 and 0.83. These omissions are load-bearing for assessing whether the reported performance supports the central empirical claim.
[Results] Results and Discussion: the evaluation is performed exclusively on a controlled, scripted sequence performed by healthy volunteers; the paper does not quantify how well the five-class model handles natural transitions, variable durations, or inter-subject differences that would be expected in free-living ambulatory data, limiting the strength of the claim that the approach is ready for patient-monitoring applications.
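The first major comment turns on whether evaluation was subject-wise or pooled; the Figure 2 caption excerpt mentions leave-one-out cross-validation with one participant held out per fold. The sketch below shows what such a subject-wise split could look like in plain Python. The function name and the subject identifiers are hypothetical, and this is not the authors' released code.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out, train_idx, test_idx), holding each subject out once.

    subject_ids[i] is the subject that window i came from, so no subject
    ever contributes windows to both the training and the test side of a fold.
    """
    for held_out in sorted(set(subject_ids)):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical example: three subjects with two windows each.
window_subjects = ["s01", "s01", "s02", "s02", "s03", "s03"]
folds = list(leave_one_subject_out(window_subjects))

for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

A pooled split, by contrast, would shuffle windows across subjects before dividing them, letting the model see the test subject's movement style during training and usually inflating the score.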

minor comments (3)

[Abstract] Abstract: the phrase 'an average of 26 minutes each' should be accompanied by the range or standard deviation of recording durations across the 23 subjects.
[Methods] Figure captions and Methods: the CNN architecture diagram and layer specifications are referenced but the exact input window length, overlap, and normalization steps are not stated explicitly in the text, forcing readers to consult the released code.
[Introduction] References: several standard papers on accelerometry-based activity recognition (e.g., on subject-independent validation) are not cited, which would help situate the contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive review and recommendation for minor revision. We address the major comments point by point below, with revisions planned to improve clarity and acknowledge limitations.

Referee: [Methods] Methods section (classifier training and evaluation): the manuscript reports concrete F1 scores but does not describe the data partitioning strategy (subject-wise vs. pooled), cross-validation procedure, hyperparameter selection method, or any statistical testing/confidence intervals around the F1 values of 0.79 and 0.83. These omissions are load-bearing for assessing whether the reported performance supports the central empirical claim.

Authors: We agree that these methodological details are essential for evaluating the reported F1 scores. The revised manuscript will expand the Methods section to fully describe the data partitioning strategy used, the cross-validation procedure, the approach to hyperparameter selection, and the statistical testing performed, including confidence intervals around the F1 values of 0.79 and 0.83.
revision: yes

Referee: [Results] Results and Discussion: the evaluation is performed exclusively on a controlled, scripted sequence performed by healthy volunteers; the paper does not quantify how well the five-class model handles natural transitions, variable durations, or inter-subject differences that would be expected in free-living ambulatory data, limiting the strength of the claim that the approach is ready for patient-monitoring applications.

Authors: We acknowledge the limitation of evaluating on a controlled, scripted protocol in healthy subjects, which does not capture free-living variability such as natural transitions or inter-subject differences in patient populations. In the revised Discussion, we will add explicit text noting this scope and clarifying that the work establishes baseline performance and an open resource for controlled ambulatory settings, while tempering claims about immediate readiness for patient-monitoring applications and outlining the need for future free-living validation.
revision: partial
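The first response promises confidence intervals around the F1 values. One common way to obtain them from per-fold scores is a percentile bootstrap, sketched below in plain Python. The per-fold numbers are made up for illustration and are not results from the paper; the function name, resample count, and 95% level are likewise assumptions.

```python
import random


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of per-fold scores."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(values)
    means = sorted(
        sum(rng.choice(values) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical per-subject F1 scores, NOT taken from the paper.
fold_f1 = [0.91, 0.78, 0.85, 0.80, 0.88, 0.74, 0.86, 0.83]
mean_f1 = sum(fold_f1) / len(fold_f1)
low, high = bootstrap_ci(fold_f1)

print(f"mean F1 = {mean_f1:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

Resampling whole folds (subjects) rather than individual windows keeps the interval honest about between-subject variability, which is the dominant source of uncertainty with 23 participants.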

Circularity Check

0 steps flagged

No significant circularity detected

The paper presents an empirical machine-learning study: 23 subjects performed a fixed sequence of five labeled activities while wearing a triaxial accelerometer; a binary signal-processing classifier and a CNN multi-class classifier were trained on the resulting data and evaluated via standard F1 metrics. No derivation chain exists that reduces any claimed result to its own inputs by construction, no parameters are fitted and then re-labeled as independent predictions, and no self-citations supply load-bearing uniqueness theorems or ansatzes. The reported F1 scores are direct performance measures on the authors' collected and released dataset, making the central claims externally verifiable rather than self-referential.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claims rest on the assumption that the five scripted activities are representative of real-world movement and that standard CNN training procedures generalize to new subjects; no new physical entities are introduced.

free parameters (1)

CNN architecture and training hyperparameters
The network depth, filter sizes, learning rate, and regularization choices are selected or tuned to achieve the reported F1 score.

axioms (1)

domain assumption: The five scripted activities (lying, sitting, standing, walking, jogging) are distinct and representative of ambulatory behavior
Invoked by the data-collection protocol and the choice of classification targets.

pith-pipeline@v0.9.0 · 5611 in / 1290 out tokens · 38851 ms · 2026-05-10T16:22:33.094231+00:00 · methodology

Reference graph

Works this paper leans on

17 extracted references · 2 canonical work pages

[1] Albahri, A. S., Albahri, O. S., Zaidan, A., Zaidan, B., Hashim, M., Alsalem, M., Mohsin, A. H., Mohammed, K., Alamoodi, A. H., Enaizan, O. et al. (2019). Based multiple heterogeneous wearable sensors: A smart real-time health monitoring structured for hospitals distributor, IEEE Access 7: 37269–37323.
Antonsson, E. K. and Mann, R. W. (1985). The frequency c...

[2] Balachandran, A. T., Vigotsky, A. D., Quiles, N., Mokkink, L. B., Belio, M. A. and Glenn, J. M. (2021). Validity, reliability, and measurement error of a sit-to-stand power test in older adults: A pre-registered study, Experimental Gerontology 145: 111202. URL: https://www.sciencedirect.com/science/article/pii/S0531556520305507
Bao, L. and Intille, S. S. (...

[3] Graña Possamai, C., Ravaud, P., Ghosn, L. and Tran, V.-T. (2020). Use of wearable biometric monitoring devices to measure outcomes in randomized clinical trials: a methodological systematic review, BMC Medicine 18: 1–11.
Greiwe, J. and Nyenhuis, S. (2020). Wearable technology and how this can be implemented into clinical practice, Current Allergy and Asthm...

[4] Janidarmian, M., Roshan Fekr, A., Radecka, K. and Zilic, Z. (2017). A comprehensive analysis on wearable acceleration sensors in human activity recognition, Sensors 17(3):

[5] Jat, A. S. and Grønli, T.-M. (2022). Smart watch for smart health monitoring: a literature review, International Work-Conference on Bioinformatics and Biomedical Engineering, Springer, pp. 256–268.
Joyner, M., Hsu, S.-H., Martin, S., Dwyer, J., Chen, D. F., Sameni, R., Waters, S. H., Borodin, K., Clifford, G. D., Levey, A. I. et al. (2024). Using a standal...

[6] Kuster, R. P., Grooten, W. J., Blom, V., Baumgartner, D., Hagströmer, M. and Ekblom, Ö. (2020). Is sitting always inactive and standing always active? a simultaneous free-living activpal and actigraph analysis, International journal of environmental research and public health 17(23):

[7] Li, C., Lubecke, V. M., Boric-Lubecke, O. and Lin, J. (2013). A review on recent advances in doppler radar sensors for noncontact healthcare monitoring, IEEE Transactions on Microwave Theory and Techniques 61(5): 2046–2060.
Luckhurst, J., Hughes, C. and Shelley, B. (2024). Classifying physical activity levels using mean amplitude deviation in adults using a...

[8] Manullang, M. C. T., Lin, Y.-H., Lai, S.-J. and Chou, N.-K. (2021). Implementation of thermal camera for non-contact physiological measurement: A systematic review, Sensors 21(23):

[9] Mekruksavanich, S. and Jitpattanakul, A. (2021). LSTM networks using smartphone data for sensor-based human activity recognition in smart homes, Sensors 21(5):

[10] Nikookar, S., Tian, E., Hoffman, H. and Clifford, G. D. (2025). Code and data for activity classification from triaxial accelerometry. Available online: https://github.com/cliffordlab/Accel_Activity_DB (Accessed on 11 February 2025).
Nino, V., Claudio, D., Schiel, C. and Bellows, B. (2020). Coupling wearable devices and decision theory in the united states...

[11] Pawar, T., Anantakrishnan, N., Chaudhuri, S. and Duttagupta, S. P. (2007). Impact analysis of body movement in ambulatory ECG, 2007 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, IEEE, pp. 5453–

[12] Ren, Y. (2023). Human activity recognition acc electrocardiogram, figshare. Dataset. URL: https://doi.org/10.6084/m9.figshare.24132510.v3
Ren, Y., Liu, M., Yang, Y., Mao, L. and Chen, K. (2024). Clinical human activity recognition based on a wearable patch of combined tri-axial acc and ecg sensors, Digital Health 10: 20552076231223804.
Reyes-Ortiz, J., Angu...

[13] Stevens, G., Larmuseau, M., Damme, A. V., Vanoverschelde, H., Heerman, J. and Verdonck, P. (2024). Feasibility study of the use of a wearable vital sign patch in an intensive care unit setting, Journal of Clinical Monitoring and Computing pp. 1–12.
Sweeney, K., Leamy, D., Ward, T. and McLoone, S. (2010). Intelligent artifact classification for ambulatory ph...

[14] Twomey, N., Diethe, T., Fafoutis, X., Elsts, A., McConville, R., Flach, P. and Craddock, I. (2018). A comprehensive study of activity recognition using accelerometers, Informatics, Vol. 5, MDPI, p

[15] Wang, Z., Yang, Z. and Dong, T. (2017). A review of wearable technologies for elderly care that can accurately track indoor position, recognize physical activities and monitor vital signs in real time, Sensors 17(2):

[16] Waters, S., Berent, J., Saini, P. and Clifford, G. (2024). Domain adaptation using large scale databases for sleep staging from a novel in-ear sensor.
Yuan, L., Andrews, J., Mu, H., Vakil, A., Ewing, R., Blasch, E. and Li, J. (2022). Interpretable passive multi-modal sensor fusion for human identification and activity recognition, Sensors 22(15):

[17] Zhu, Q., Chen, Z. and Soh, Y. C. (2018). A novel semisupervised deep learning method for human activity recognition, IEEE Transactions on Industrial Informatics 15(7): 3821–3830