arxiv: 2605.03462 · v1 · submitted 2026-05-05 · 💻 cs.LG · cs.AI

Recognition: unknown

Learning Generalizable Action Representations via Pre-training AEMG

Zhenghao Huang , Huilin Yao , Kaikai Wang , Lin Shu

Authors on Pith no claims yet

Pith reviewed 2026-05-07 17:08 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords electromyographyself-supervised learningaction representationgeneralizationphysiological languagepre-trainingmotor intent

0 comments

The pith

AEMG pre-trains EMG signals as a cross-device physiological language using a contraction tokenizer to improve generalization in motor intent decoding.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces AEMG, the first large-scale self-supervised framework for EMG signals that treats neuromuscular dynamics as linguistic structures. It deploys the Neuromuscular Contraction Tokenizer to convert discrete muscle contractions into words and temporal patterns into sentences while compiling the largest cross-device EMG vocabulary to date. This pre-training enables representations that transfer across subjects, devices, and tasks. Experiments show consistent gains in zero-shot and few-shot settings for decoding human motor intent.

Core claim

By treating EMG signals linguistically and pre-training on a massive cross-device dataset, AEMG learns representations that generalize across subjects, devices, and tasks, achieving 5.79-9.25% higher zero-shot leave-one-subject-out accuracy than existing methods and over 90% performance in few-shot settings using only 5% of target data.

What carries the argument

The Neuromuscular Contraction Tokenizer (NCT), which converts discrete muscle contractions into structural words and temporal activation patterns into coherent sentences to support linguistic-style pre-training on EMG data.

If this is right

Zero-shot leave-one-subject-out accuracy improves by 5.79-9.25% over six state-of-the-art baselines.
Few-shot adaptation reaches more than 90% accuracy using only 5% of target user data.
Seamless transfer occurs across arbitrary channel topologies and sampling rates.
A single pre-trained model can serve as a foundation for multiple EMG applications without repeated per-user training.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The linguistic treatment of EMG may extend to other time-series biosignals to create unified foundation models.
Prosthetic and human-computer interface systems could reduce per-user calibration time substantially.
Scaling the cross-device vocabulary further might yield additional gains in rare or complex action classes.

Load-bearing premise

That EMG signals contain consistent linguistic structures across subjects and devices that can be tokenized without losing information needed to distinguish different actions.

What would settle it

If a pre-trained AEMG model shows no accuracy gain or a loss relative to non-pretrained baselines when tested on a completely new device or subject cohort, the claimed generalization benefit would not hold.

Figures

Figures reproduced from arXiv: 2605.03462 by Huilin Yao, Kaikai Wang, Lin Shu, Zhenghao Huang.

**Figure 1.** Figure 1: Corresponding electromyography, gesture, and anatom view at source ↗

**Figure 2.** Figure 2: Framework of AEMG The Exploration Stage of General-Purpose Model Paradigms Driven by Ultra-Large-Scale Data. It is pioneered by leading teams such as Meta’s CTRL-labs. The core idea borrows from the concept of foundation models in the field of Natural Language Processing: a model’s powerful generalization capability ultimately stems from pretraining on massive and diverse data [20]. The field of EMG dec… view at source ↗

**Figure 3.** Figure 3: Illustration of the Neural EMG Vocabulary. Due to space constraints, only a representative subset is displayed here. The comprehensive gesture modes and corresponding illustrations are detailed in the Appendix. Left: Morphological distinctiveness of the EMG vocabulary. Distinct muscle contraction morphologies reliably convey different semantic expressions. Right: Context-dependent semantic polysemy in EMG … view at source ↗

read the original abstract

A fundamental role in decoding human motor intent and enabling intuitive human-computer interaction is played by electromyography (EMG). However, its generalization capability across subjects, devices, and tasks remains substantially limited by data heterogeneity, label scarcity, and the lack of a unified representational framework. To bridge this gap, we propose Any Electromyography (AEMG), the first large-scale, self-supervised representation learning framework for EMG. AEMG reconceptualizes neuromuscular dynamics linguistically, utilizing a novel Neuromuscular Contraction Tokenizer (NCT) to translate discrete muscle contractions into structural words and temporal activation patterns into coherent sentences. Furthermore, we compile the largest cross-device EMG signal vocabulary to date, enabling seamless transfer across arbitrary channel topologies and sampling rates. Experiments demonstrate that AEMG improves the zero-shot leave-one-subject-out (LOSO) accuracy by 5.79-9.25% compared to six state-of-the-art baselines, and achieves more than 90% few-shot adaptation performance with only 5% of target user data. Our work has proposed the concept of EMG signals as a cross-device physiological language, learned their grammar from massive amounts of data, and laid the groundwork for a single-training, universally applicable EMG foundation model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes AEMG, the first large-scale self-supervised representation learning framework for EMG signals. It reconceptualizes neuromuscular dynamics linguistically via a novel Neuromuscular Contraction Tokenizer (NCT) that discretizes muscle contractions into structural words and temporal activation patterns into sentences. A large cross-device EMG signal vocabulary is compiled to support transfer across arbitrary channel topologies and sampling rates. Experiments are reported to show 5.79-9.25% gains in zero-shot leave-one-subject-out (LOSO) accuracy over six baselines and >90% few-shot adaptation performance using only 5% of target user data.

Significance. If the reported gains hold and are attributable to the linguistic modeling rather than dataset scale alone, the work has high significance for EMG-based motor intent decoding and human-computer interaction. The compilation of the largest cross-device EMG vocabulary to date and the self-supervised pre-training approach directly address label scarcity and heterogeneity; these are concrete strengths that could support future foundation models. The linguistic analogy provides a fresh conceptual lens even if the empirical validation requires strengthening.

major comments (2)

[Abstract] Abstract: The headline claims of 5.79-9.25% zero-shot LOSO accuracy improvement and >90% few-shot performance with 5% data are stated without any reference to experimental protocol, dataset details (subjects, devices, tasks), statistical tests, or ablation results. This absence is load-bearing for the central generalization claim.
[NCT description] Section describing the Neuromuscular Contraction Tokenizer (NCT): The premise that NCT produces a lossless, subject- and device-invariant linguistic representation (words from contractions, sentences from patterns) is central to attributing gains to the proposed grammar rather than other pre-training choices, yet no analysis of information loss from discretization, fixed thresholds, or quantization, nor ablations against non-linguistic baselines, is supplied.

minor comments (1)

[Abstract] The abstract uses 'AEMG' both for the framework and implicitly for the signals; a brief clarification of acronym scope would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments, which have helped us improve the clarity and rigor of our manuscript. We address each major comment in detail below, providing clarifications and indicating the revisions made to the paper.

read point-by-point responses

Referee: [Abstract] Abstract: The headline claims of 5.79-9.25% zero-shot LOSO accuracy improvement and >90% few-shot performance with 5% data are stated without any reference to experimental protocol, dataset details (subjects, devices, tasks), statistical tests, or ablation results. This absence is load-bearing for the central generalization claim.

Authors: We acknowledge the referee's point that the abstract lacks specific references to the experimental details. The manuscript body provides comprehensive descriptions of the datasets (including subject numbers, device types, and task specifications), the leave-one-subject-out protocol, and comparisons with baselines. Statistical tests (paired t-tests) were used to validate the improvements. To address this, we have revised the abstract to briefly mention the key aspects of the evaluation protocol and datasets, ensuring the claims are better contextualized without exceeding length limits. revision: yes
Referee: [NCT description] Section describing the Neuromuscular Contraction Tokenizer (NCT): The premise that NCT produces a lossless, subject- and device-invariant linguistic representation (words from contractions, sentences from patterns) is central to attributing gains to the proposed grammar rather than other pre-training choices, yet no analysis of information loss from discretization, fixed thresholds, or quantization, nor ablations against non-linguistic baselines, is supplied.

Authors: We agree that additional analysis would strengthen the attribution of gains to the linguistic modeling. The NCT uses fixed thresholds derived from neuromuscular physiology to ensure invariance, and the cross-device vocabulary addresses heterogeneity in channel topologies and sampling rates. However, explicit quantification of information loss due to discretization and ablations against non-linguistic baselines were not included. We will incorporate a new analysis section quantifying reconstruction error from the tokenizer and an ablation comparing NCT to a non-linguistic baseline (e.g., direct feature extraction without tokenization) in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical gains rest on external baselines, not self-referential definitions or fitted inputs.

full rationale

The paper presents AEMG as a self-supervised framework that tokenizes EMG via NCT into words/sentences and pretrains on a compiled cross-device vocabulary. Its strongest claims are zero-shot LOSO accuracy improvements (5.79-9.25%) and few-shot results (>90% with 5% data) measured against six independent state-of-the-art baselines. No equations, parameter-fitting steps, or self-citations are shown that reduce any reported prediction or generalization result to a quantity defined in terms of itself. The NCT discretization and vocabulary construction are introduced as novel design choices whose validity is tested by downstream performance rather than assumed by construction. This is the common honest case of a self-contained empirical paper whose central results do not collapse to tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

Based solely on abstract; the central approach rests on the linguistic reconceptualization of EMG and the feasibility of a universal cross-device vocabulary. No specific numerical free parameters are stated.

axioms (1)

domain assumption EMG neuromuscular dynamics can be tokenized into structural words (discrete contractions) and coherent sentences (temporal activation patterns) without critical information loss.
Invoked as the foundation for the NCT and self-supervised pre-training described in the abstract.

invented entities (2)

Neuromuscular Contraction Tokenizer (NCT) no independent evidence
purpose: Translate discrete muscle contractions into structural words and temporal activation patterns into sentences for representation learning.
Novel component introduced to enable the linguistic framing of EMG signals.
Cross-device EMG signal vocabulary no independent evidence
purpose: Enable seamless transfer across arbitrary channel topologies and sampling rates.
Compiled as the largest such collection to support the unified framework.

pith-pipeline@v0.9.0 · 5516 in / 1445 out tokens · 55606 ms · 2026-05-07T17:08:50.895608+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Reference graph

Works this paper leans on

42 extracted references · 3 canonical work pages

[1]

Electromyography data for non-invasive naturally-controlled robotic hand prostheses.Scientific Data, 1(1):1–13, 2014

Manfredo Atzori, Arjan Gijsberts, Claudio Castellini, Bar- bara Caputo, Anne-Gabrielle Mittaz Hager, Simone Elsig, Giorgio Giatsidis, Franco Bassetto, and Henning M ¨uller. Electromyography data for non-invasive naturally-controlled robotic hand prostheses.Scientific Data, 1(1):1–13, 2014. 1, 2

2014
[2]

L. I. Barona L ´opez, F. M. Ferri, J. Zea, ´A. L. Val- divieso Caraguay, and M. E. Benalc´azar. Cnn-lstm and post- processing for emg-based hand gesture recognition.Intelli- gent Systems with Applications, 22:200352, 2024. 1, 2

2024
[3]

Benalcazar, L

M.E. Benalcazar, L. Barona, L. Valdivieso, X. Aguas, and J. Zea. Emg-epn-612 dataset, 2020. Zenodo. 6, 8

2020
[4]

Campanini, C

I. Campanini, C. Disselhorst-Klug, W. Z. Rymer, and R. Merletti. Surface emg in clinical assessment and neuroreha- bilitation: barriers limiting its use.Frontiers in Neurology, 11:934, 2020. 1

2020
[5]

Chapelle and A

O. Chapelle and A. Zien. Semi-supervised classification by low density separation. InInternational Workshop on Arti- ficial Intelligence and Statistics, pages 57–64. PMLR, 2005. 2

2005
[6]

Ulysse C ˆot´e-Allard, Cheikh Latyr Fall, Alexandre Drouin, Alexandre Campeau-Lecours, Cl ´ement Gosselin, Kyrre Glette, Franc ¸ois Laviolette, and Benoit Gosselin. Deep learn- ing for electromyographic hand gesture signal classification using transfer learning.IEEE Transactions on Neural Sys- tems and Rehabilitation Engineering, 27(4):760–771, 2019. 2, 3

2019
[7]

Cote-Allard, G

U. Cote-Allard, G. Gagnon-Turcotte, A. Phinyomark, K. Glette, E. J. Scheme, F. Laviolette, and B. Gos- selin. Unsupervised domain adversarial self-calibration for electromyography-based gesture recognition.IEEE Access, 8:177941–177955, 2020. 2

2020
[8]

A. D. Degenhart, W. E. Bishop, E. R. Oby, E. C. Tyler- Kabara, S. M. Chase, A. P. Batista, and B. M. Yu. Stabi- lization of a brain–computer interface via the alignment of low-dimensional spaces of neural activity.Nature Biomedi- cal Engineering, 4(7):672–685, 2020. 4

2020
[9]

Z. Deng, Y . Luo, and J. Zhu. Cluster alignment with a teacher for unsupervised domain adaptation. InICCV, pages 9944– 9953, 2019. 2

2019
[10]

M. D. Dere and B. Lee. A novel approach to surface emg- based gesture classification using a vision transformer inte- grated with convolutive blind source separation.IEEE Jour- nal of Biomedical and Health Informatics, 2023. 2

2023
[11]

N. A. Dimitrova and G. V . Dimitrov. Interpretation of emg changes with fatigue: facts, pitfalls, and fallacies.Journal of Electromyography and Kinesiology, 13(1):13–36, 2003. 2

2003
[12]

Surface emg-based intersession gesture recognition enhanced by deep domain adaptation.Sensors, 17(3):458,

Yu Du, Wenguang Jin, Wentao Wei, Yu Hu, and Weidong Geng. Surface emg-based intersession gesture recognition enhanced by deep domain adaptation.Sensors, 17(3):458,
[13]

Y . Du, Y . Chen, F. Cui, X. Zhang, and C. Wang. Cross- domain error minimization for unsupervised domain adapta- tion. InDatabase Systems for Advanced Applications: 26th International Conference, DASFAA 2021, Proceedings, Part II, pages 429–448. Springer, 2021. 2, 6

2021
[14]

Gesture recognition by instantaneous surface emg images.Scientific Reports, 6(1):36571, 2016

Weidong Geng, Yu Du, Wenguang Jin, Wentao Wei, Yu Hu, and Jiajun Li. Gesture recognition by instantaneous surface emg images.Scientific Reports, 6(1):36571, 2016. 6

2016
[15]

Hou, Y .-H

C.-A. Hou, Y .-H. H. Tsai, Y .-R. Yeh, and Y .-C. F. Wang. Un- supervised domain adaptation with label and structural con- sistency.IEEE TIP, 25(12):5552–5562, 2016. 2, 8

2016
[16]

N. M. Hye, U. Hany, S. Chakravarty, L. Akter, and I. Ahmed. Artificial intelligence for semg-based muscular movement recognition for hand prosthesis.IEEE Access, 2023. 2

2023
[17]

Large brain model for learning generic representations with tremendous EEG data in BCI

Wei-Bang Jiang, Li-Ming Zhao, and Bao-Liang Lu. Large brain model for learning generic representations with tremendous EEG data in BCI. InThe Twelfth International Conference on Learning Representations, 2024. 4, 5

2024
[18]

Xinyu Jiang, Xiangyu Liu, Jiahao Fan, Xinming Ye, Chenyun Dai, Edward A Clancy, Metin Akay, and Wei Chen. Open access dataset, toolbox and benchmark process- ing results of high-density surface electromyogram record- ings.IEEE Transactions on Neural Systems and Rehabilita- tion Engineering, 29:1035–1046, 2021. 3

2021
[19]

J. Jin, H. Wang, H. Li, J. Li, J. Pan, and S. Hong. Reading your heart: Learning ecg words and sentences via pre-training ecg language model.arXiv preprint arXiv:2502.10707, 2025. 1, 4, 5

work page arXiv 2025
[20]

Kaifosh, T

P. Kaifosh, T. R. Reardon, and CTRL-labs at Reality Labs. A generic non-invasive neuromotor interface for human- computer interaction.Nature, pages 1–10, 2025. 1, 3, 5

2025
[21]

E. R. Kandel, J. H. Schwartz, T. M. Jessell, S. A. Siegel- baum, and A. J. Hudspeth.Principles of Neural Science, Fifth Edition. McGraw-Hill Medical, 2000. 1

2000
[22]

Krilova, I

N. Krilova, I. Kastalskiy, V . Kazantsev, V . A. Makarov, and S. Lobov. Emg data for gestures. UCI Machine Learn- ing Repository, 2019. DOI:https://doi.org/10. 24432/C5ZP5C. 3

2019
[23]

Y . Liu, X. Peng, Y . Tan, T. T. Oyemakinde, M. Wang, G. Li, and X. Li. A novel unsupervised dynamic feature domain adaptation strategy for cross-individual myoelectric gesture recognition.Journal of Neural Engineering, 20(6):066044,
[24]

Merletti and D

R. Merletti and D. Farina.Surface Electromyography: Phys- iology, Engineering, and Applications. John Wiley & Sons,
[25]

Training lan- guage models to follow instructions with human feedback

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Car- roll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training lan- guage models to follow instructions with human feedback. NeurIPS, 35:27730–27744, 2022. 1

2022
[26]

Dataset for multichannel surface electromyo- graphy (semg) signals of hand gestures.Data in Brief, 41: 107921, 2022

Mehmet Akif Ozdemir, Deniz Hande Kisa, Onan Guren, and Aydin Akan. Dataset for multichannel surface electromyo- graphy (semg) signals of hand gestures.Data in Brief, 41: 107921, 2022. 3

2022
[27]

M. A. Ozdemir, D. H. Kisa, O. Guren, and A. Akan. Hand gesture classification using time-frequency images and trans- fer learning based on cnn.Biomedical Signal Processing and Control, 77:103787, 2022. 1

2022
[28]

Trans- former convolutional neural networks for automated artifact detection in scalp eeg

Wei Yan Peh, Yuanyuan Yao, and Justin Dauwels. Trans- former convolutional neural networks for automated artifact detection in scalp eeg. In2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pages 3599–3602. IEEE, 2022. 1

2022
[29]

Phinyomark, P

A. Phinyomark, P. Phukpattaranont, and C. Limsakul. Fea- ture reduction and selection for emg signal classification.Ex- pert Systems with Applications, 39(8):7420–7431, 2012. 2

2012
[30]

Comparison of six electromyography acquisition setups on hand movement classification tasks.PLoS ONE, 12(10): e0186132, 2017

Stefano Pizzolato, Luca Tagliapietra, Matteo Cognolato, Monica Reggiani, Henning M ¨uller, and Manfredo Atzori. Comparison of six electromyography acquisition setups on hand movement classification tasks.PLoS ONE, 12(10): e0186132, 2017. 6, 8

2017
[31]

R. Shu, H. H. Bui, H. Narui, and S. Ermon. A dirt-t approach to unsupervised domain adaptation.arXiv preprint, 2018. arXiv:1802.08735. 2, 6

work page arXiv 2018
[32]

Simar, M

C. Simar, M. Colot, A.-M. Cebolla, M. Petieau, G. Cheron, and G. Bontempi. Machine learning for hand pose classifica- tion from phasic and tonic emg signals during bimanual ac- tivities in virtual reality.Front. Neurosci., 18:1329411, 2024. 6, 8

2024
[33]

A systematic review on surface electromyography-based clas- sification system for identifying hand and finger movements

Afroza Sultana, Farruk Ahmed, and Md Shafiul Alam. A systematic review on surface electromyography-based clas- sification system for identifying hand and finger movements. Healthcare Analytics, 3:100126, 2023. 6

2023
[34]

H. Tang, K. Chen, and K. Jia. Unsupervised domain adap- tation via structurally regularized deep clustering. InCVPR, pages 8725–8735, 2020. 2, 8

2020
[35]

Toro-Ossaba, J

A. Toro-Ossaba, J. Jaramillo-Tigreros, J.C. Tejada, A. Pe ˜na, A. L´opez-Gonz´alez, and R.A. Castanho. Lstm recurrent neu- ral network for hand gesture recognition using emg signals. Appl. Sci., 12(19):9700, 2022. 6, 8

2022
[36]

Vaswani, N

A. Vaswani, N. Shazeer, N. Parmar, et al. Attention is all you need.NeurIPS, 30, 2017. 4

2017
[37]

L. Wang, X. Li, Z. Chen, Z. Sun, J. Xue, W. Sun, and S. Zhang. A novel hybrid unsupervised domain adaptation method for cross-subject joint angle estimation from surface electromyography.IEEE Transactions on Neural Systems and Rehabilitation Engineering, 29:1451–1461, 2021. 2

2021
[38]

Z. Wang, H. Wan, L. Meng, Z. Zeng, M. Akay, C. Chen, and W. Chen. Optimization of inter-subject semg-based hand gesture recognition tasks using unsupervised domain adapta- tion techniques.Biomedical Signal Processing and Control, 92:106086, 2024. 6

2024
[39]

M. Xu, X. Chen, Y . Ruan, and X. Zhang. Cross-user elec- tromyography pattern recognition based on a novel spatial- temporal graph convolutional network.IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2023. 2

2023
[40]

J. Yang, M. Soh, D. J. Weber, V . Lieu, and Z. Erickson. Emgbench: Benchmarking out-of-distribution generaliza- tion and adaptation for electromyography.arXiv preprint,
[41]

arXiv:2410.23625. 2, 5

work page arXiv
[42]

Zhang, T

Y . Zhang, T. Liu, M. Long, and M. Jordan. Bridging the- ory and algorithm for domain adaptation. InInternational Conference on Machine Learning, pages 7404–7413. PMLR,