arxiv: 2507.04465 · v4 · submitted 2025-07-06 · 💻 cs.CV

Visual Hand Gesture Recognition with Deep Learning: A Comprehensive Review of Methods, Datasets, Challenges and Future Research Directions

Pith reviewed 2026-05-19 05:53 UTC · model grok-4.3

classification 💻 cs.CV
keywords visual hand gesture recognition · deep learning · survey · static gesture recognition · dynamic gesture recognition · continuous gesture recognition · datasets · challenges

The pith

This review organizes deep learning methods for visual hand gesture recognition into static, isolated dynamic, and continuous tasks as a guideline for researchers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a comprehensive overview of visual hand gesture recognition using deep learning to fill the gap left by the absence of a structured survey amid hundreds of papers. It uses a systematic research methodology to locate literature and presents methods, datasets, and metrics in a taxonomy-based format. The review focuses on four questions covering main aspects, current state-of-the-art methods grouped by task, comparative insights, and challenges that shape future work. A sympathetic reader would care because the overview supports proposing improvements in applications such as sign language understanding and human-computer interaction.

Core claim

Starting with the methodology used to locate the related literature, the survey identifies and organizes the key VHGR approaches in a taxonomy-based format. The SOTA methods are grouped across three primary VHGR tasks: static, isolated dynamic and continuous gesture recognition. For each task, the architectural trends and learning strategies are listed. To support the experimental evaluation of future methods in the field, the study reviews commonly used datasets and presents the standard performance metrics. The survey concludes by identifying the major challenges in VHGR, including both general computer vision issues and domain-specific obstacles, and outlines promising directions for future research.

What carries the argument

Taxonomy-based format that organizes approaches and groups SOTA methods across the three primary tasks of static, isolated dynamic, and continuous gesture recognition.

If this is right

  • Researchers can identify current state-of-the-art methods more quickly for each of the three gesture recognition tasks.
  • Reviewed datasets and standard metrics enable consistent evaluation when testing new methods.
  • Listed challenges point to specific obstacles that new work should target to advance the field.
  • Outlined future directions suggest concrete paths for improving architectures and strategies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The task-based grouping could support cross-task transfer of techniques between static and dynamic recognition settings.
  • The survey structure might extend to related vision problems such as full-body action recognition.
  • Future updates could add benchmarks for real-time deployment on edge devices.

Load-bearing premise

The systematic research methodology used to locate and organize the related literature has captured the key VHGR approaches, SOTA methods, datasets, and challenges without significant omissions or selection bias.

What would settle it

A search that uncovers a major deep learning paper on visual hand gesture recognition from the covered period that falls outside the survey's taxonomy, task groupings, or listed datasets.
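
As a rough illustration of this test, the check can be sketched as a lookup against the three task groups named in the paper. This is a minimal sketch, not the survey's procedure: the `Task` enum, the `classify` and `find_uncovered` helpers, and the sample records are all hypothetical, and matching on a free-text task label is an assumption made only for the example.

```python
from enum import Enum
from typing import Optional


class Task(Enum):
    """The three primary VHGR tasks named in the survey."""
    STATIC = "static"
    ISOLATED_DYNAMIC = "isolated dynamic"
    CONTINUOUS = "continuous"


def classify(task_label: str) -> Optional[Task]:
    """Map a free-text task label to a taxonomy task, or None if it does not fit."""
    normalized = task_label.strip().lower()
    for task in Task:
        if task.value == normalized:
            return task
    return None


def find_uncovered(papers: list[dict]) -> list[str]:
    """Return titles of papers whose task label falls outside the three tasks."""
    return [p["title"] for p in papers if classify(p["task"]) is None]


# Hypothetical records for illustration only; not taken from the survey.
papers = [
    {"title": "Paper A", "task": "Static"},
    {"title": "Paper B", "task": "continuous"},
    {"title": "Paper C", "task": "gesture generation"},
]

uncovered = find_uncovered(papers)
```

A non-empty `uncovered` list would only flag candidates for manual inspection; deciding whether a flagged paper is "major" and within the covered period remains a human judgment.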

Original abstract

The rapid evolution of deep learning (DL) models and the ever-increasing size of available datasets have raised the interest of the research community in the always-important field of visual hand gesture recognition (VHGR), and delivered a wide range of applications, such as sign language understanding and human-computer interaction. Despite the large volume of research works in the field, a structured and complete survey on VHGR is still missing, leaving researchers to navigate through hundreds of papers in order to find the current state-of-the-art (SOTA). The current survey aims to fill this gap by presenting a comprehensive overview of this computer vision field. With a systematic research methodology and a structured presentation of the various methods, datasets, and evaluation metrics, this review aims to constitute a useful guideline for researchers, helping them to propose improvements. Specifically, this survey focuses on four fundamental questions: what are the main VHGR aspects, what are the current SOTA methods, what comparative insights can be drawn across methods and tasks, and which challenges shape future research. Starting with the methodology used to locate the related literature, the survey identifies and organizes the key VHGR approaches in a taxonomy-based format. The SOTA methods are grouped across three primary VHGR tasks: static, isolated dynamic and continuous gesture recognition. For each task, the architectural trends and learning strategies are listed. To support the experimental evaluation of future methods in the field, the study reviews commonly used datasets and presents the standard performance metrics. Our survey concludes by identifying the major challenges in VHGR, including both general computer vision issues and domain-specific obstacles, and outlines promising directions for future research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper presents a comprehensive survey on visual hand gesture recognition (VHGR) with deep learning. It begins with a systematic research methodology to locate and organize related literature into a taxonomy-based format, groups current SOTA methods across static, isolated dynamic, and continuous gesture recognition tasks with architectural trends and learning strategies, reviews commonly used datasets and standard performance metrics to support future evaluations, and concludes by identifying major challenges (general CV and domain-specific) along with promising future research directions.

Significance. If the coverage proves complete and free of significant selection bias, the survey would provide a structured guideline for VHGR researchers by consolidating SOTA comparisons across tasks, standard datasets/metrics, and challenge identification, thereby helping to direct improvements in this active computer vision subfield.

major comments (1)
  1. [§2] §2 (Methodology): The systematic research methodology is described at a high level without providing concrete, reproducible elements such as exact search strings/Boolean queries, specific databases queried (e.g., Google Scholar, IEEE Xplore, Scopus), explicit publication date range, inclusion/exclusion criteria, or a quantitative flow diagram/summary of papers screened versus included. This directly affects the central claim that the taxonomy, SOTA groupings, and challenge identification constitute a comprehensive, unbiased overview suitable as a researcher guideline.
minor comments (2)
  1. [Abstract] The abstract states the survey focuses on 'four fundamental questions' but enumerates them narratively rather than as a numbered list, which could improve immediate clarity for readers.
  2. [Throughout] Ensure consistent first-use definitions for acronyms (VHGR, SOTA, DL) and that all cited works in the taxonomy and SOTA sections receive full bibliographic entries.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address the major comment below and will revise the manuscript accordingly to improve reproducibility while preserving the integrity of our survey.

point-by-point responses
  1. Referee: [§2] §2 (Methodology): The systematic research methodology is described at a high level without providing concrete, reproducible elements such as exact search strings/Boolean queries, specific databases queried (e.g., Google Scholar, IEEE Xplore, Scopus), explicit publication date range, inclusion/exclusion criteria, or a quantitative flow diagram/summary of papers screened versus included. This directly affects the central claim that the taxonomy, SOTA groupings, and challenge identification constitute a comprehensive, unbiased overview suitable as a researcher guideline.

    Authors: We acknowledge that Section 2 currently presents the methodology at a high level. We agree this limits full reproducibility and will revise the section in the next version to include the specific Boolean search strings used, the databases queried (Google Scholar, IEEE Xplore, Scopus, and arXiv), the explicit publication date range (2014–2024), detailed inclusion/exclusion criteria, and a PRISMA-style flow diagram with quantitative screening statistics. These additions are based on the original search process we conducted and will directly support the claim of a comprehensive, unbiased overview suitable as a researcher guideline. revision: yes
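
The kind of quantitative screening summary requested by the referee can be sketched as a small filtering pipeline that records how many papers survive each stage. This is a minimal sketch under stated assumptions: the `Record` type, the `screen` function, the stage names, and the sample records are hypothetical, and title-substring matching stands in for the Boolean queries, which the page does not give. Only the 2014–2024 range and the database names come from the simulated rebuttal above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    title: str
    year: int
    source: str  # database the record was retrieved from


def screen(records, year_from=2014, year_to=2024, required_terms=("gesture",)):
    """Return (included records, per-stage counts) for a PRISMA-style summary."""
    counts = {"identified": len(records)}

    # Stage 1: remove duplicates retrieved from several databases,
    # keyed on the normalized title (an assumption for this sketch).
    seen = set()
    unique = []
    for record in records:
        key = record.title.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(record)
    counts["after_deduplication"] = len(unique)

    # Stage 2: keep only records inside the publication date range.
    in_range = [r for r in unique if year_from <= r.year <= year_to]
    counts["in_date_range"] = len(in_range)

    # Stage 3: apply a simple inclusion criterion on the title.
    included = [
        r for r in in_range
        if all(term in r.title.lower() for term in required_terms)
    ]
    counts["included"] = len(included)
    return included, counts


# Hypothetical records for illustration only.
records = [
    Record("Hand Gesture Recognition with CNNs", 2019, "IEEE Xplore"),
    Record("hand gesture recognition with cnns", 2019, "Scopus"),
    Record("Gesture Spotting in Video", 2012, "Google Scholar"),
    Record("Image Classification Basics", 2020, "arXiv"),
]

included, counts = screen(records)
```

Reporting the `counts` dictionary alongside the exact queries would let a reader see how many papers were dropped at each stage, which is the information a flow diagram conveys.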

Circularity Check

0 steps flagged

No circularity: survey organizes existing literature without self-referential derivations or load-bearing self-citations

full rationale

This is a review paper whose central contribution is a taxonomy and overview of prior VHGR work rather than any new derivation, fitted parameter, or prediction. The abstract and methodology description frame the effort as a systematic literature search followed by structured grouping of methods, datasets, and challenges; no equations, ansatzes, or uniqueness theorems are introduced that could reduce to the paper's own inputs. Self-citations, if present, serve only as normal bibliographic references and do not carry the load of the completeness claim. The survey is therefore self-contained against external benchmarks (the cited primary literature) and receives the default low score for honest non-findings.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

As a survey the paper does not introduce new free parameters, invented entities, or ad-hoc axioms beyond the standard assumption that its literature search is systematic and representative.

axioms (1)
  • domain assumption: A systematic literature search can identify and organize the key aspects, SOTA methods, datasets, and challenges in VHGR without major omissions.
    Invoked in the abstract as the basis for the survey's methodology and claim to fill the gap.

pith-pipeline@v0.9.0 · 5876 in / 1152 out tokens · 28864 ms · 2026-05-19T05:53:01.947947+00:00 · methodology

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Prompt-to-Gesture: Measuring the Capabilities of Image-to-Video Deictic Gesture Generation

    cs.CV · 2026-04 · unverdicted · novelty 5.0

    Prompt-driven image-to-video generation produces deictic gestures that match real data visually, add useful variety, and improve downstream recognition models when mixed with human recordings.

  2. Interactive Augmented Reality-enabled Outdoor Scene Visualization For Enhanced Real-time Disaster Response

    cs.HC · 2026-02 · unverdicted · novelty 5.0

    An AR system using 3D Gaussian Splatting, WIM navigation, and semantic POIs enables real-time disaster scene visualization with high usability and acceptance in preliminary user tests.

Reference graph

Works this paper leans on

217 extracted references · 217 canonical work pages · cited by 2 Pith papers

  1. [1]

    Expert Systems with Applications 248:123258

    Abdullahi SB, Chamnongthai K, Bolon-Canedo V, et al (2024) Spatial–temporal feature-based end-to-end fourier network for 3d sign language recognition. Expert Systems with Applications 248:123258. doi:https://doi.org/10.1016/j.eswa.2024.123258, ://www.sciencedirect.com/science/article/pii/S0957417424001234

  2. [2]

    IEEE Transactions on Multimedia 24:1750--1762

    Adaloglou N, Chatzis T, Papastratis I, et al (2022) A comprehensive study on deep learning-based methods for sign language recognition. IEEE Transactions on Multimedia 24:1750--1762. doi:10.1109/TMM.2021.3070438

  3. [3]

    Intelligent Systems with Applications 12:200056

    Adeyanju IA, Bello OO, Adegboye MA (2021) Machine learning methods for sign language recognition: A critical review and analysis. Intelligent Systems with Applications 12:200056

  4. [4]

    URL http://dx.doi.org/10.1109/ICASSP48485.2024.10447579

    Ahn J, Jang Y, Chung JS (2024) Slowfast network for continuous sign language recognition. In: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp 3920--3924, doi:10.1109/ICASSP48485.2024.10445841

  5. [5]

    In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV)

    Aich S, Ruiz-Santaquiteria J, Lu Z, et al (2023) Data-free class-incremental hand gesture recognition. In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pp 20901--20910, doi:10.1109/ICCV51070.2023.01916

  6. [6]

    IEEE Access

    Al Abdullah B, Amoudi G, Alghamdi H (2024) Advancements in sign language recognition: A comprehensive review and future prospects. IEEE Access

  7. [7]

    IEEE Access 9:126917--126951

    Al-Qurishi M, Khalid T, Souissi R (2021) Deep learning for sign language recognition: Current techniques, benchmarks, and open issues. IEEE Access 9:126917--126951

  8. [8]

    Neural Computing and Applications 36(14):7609--7624

    Alaftekin M, Pacal I, Cicek K (2024) Real-time sign language recognition based on yolo algorithm. Neural Computing and Applications 36(14):7609--7624

  9. [9]

    Viana, Maurício Santos, Dulce Freire, Patrícia Abrantes, and Jorge Rocha

    Alam MM, Islam MT, Rahman SM (2022) Unified learning approach for egocentric hand gesture recognition and fingertip detection. Pattern Recognition 121:108200. doi:https://doi.org/10.1016/j.patcog.2021.108200, ://www.sciencedirect.com/science/article/pii/S0031320321003824

  10. [10]

    IEEE Access 12:77019--77040

    Alamri FS, Bala Abdullahi S, Khan AR, et al (2024) Enhanced weak spatial modeling through cnn-based deep sign language skeletal feature transformation. IEEE Access 12:77019--77040. doi:10.1109/ACCESS.2024.3405341

  11. [11]

    In: Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XI 16, Springer, pp 35--53

    Albanie S, Varol G, Momeni L, et al (2020) Bsl-1k: Scaling up co-articulated sign language recognition using mouthing cues. In: Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XI 16, Springer, pp 35--53

  12. [12]

    ://arxiv.org/abs/2111.03635, https://arxiv.org/abs/2111.03635 arXiv:2111.03635

    Albanie S, Varol G, Momeni L, et al (2021) Bbc-oxford british sign language dataset. ://arxiv.org/abs/2111.03635, https://arxiv.org/abs/2111.03635 arXiv:2111.03635

  13. [13]

    Artificial Intelligence Review 58(4):1--55

    Alimisis P, Mademlis I, Radoglou-Grammatikis P, et al (2025) Advances in diffusion models for image data augmentation: A review of methods, models, evaluation metrics and future research directions. Artificial Intelligence Review 58(4):1--55

  14. [14]

    arXiv preprint arXiv:250401666

    Alyami S, Luqman H (2025 a ) Clip-sla: Parameter-efficient clip adaptation for continuous sign language recognition. arXiv preprint arXiv:250401666

  15. [15]

    Neurocomputing 617:129015

    Alyami S, Luqman H (2025 b ) Swin-mstp: Swin transformer with multi-scale temporal perception for continuous sign language recognition. Neurocomputing 617:129015

  16. [16]

    IEEE Transactions on Multimedia 21(1):234--245

    Avola D, Bernardi M, Cinque L, et al (2019) Exploiting recurrent neural networks and leap motion controller for the recognition of sign language and semaphoric hand gestures. IEEE Transactions on Multimedia 21(1):234--245. doi:10.1109/TMM.2018.2856094

  17. [17]

    Engineering Applications of Artificial Intelligence 132:108443

    Bamani E, Nissinman E, Meir I, et al (2024) Ultra-range gesture recognition using a web-camera in human–robot interaction. Engineering Applications of Artificial Intelligence 132:108443. doi:https://doi.org/10.1016/j.engappai.2024.108443, ://www.sciencedirect.com/science/article/pii/S0952197624006018

  18. [18]

    arXiv preprint arXiv:241118413

    Beeri EB, Nissinman E, Sintov A (2024) Robust dynamic gesture recognition at ultra-long distances. arXiv preprint arXiv:241118413

  19. [19]

    , author Weiss, Y

    Benitez-Garcia G, Olivares-Mercado J, Sanchez-Perez G, et al (2021) Ipn hand: A video dataset and benchmark for real-time continuous hand gesture recognition. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp 4340--4347, doi:10.1109/ICPR48806.2021.9412317

  20. [20]

    Neural Computing and Applications 35(28):21165--21180

    Bharti S, Balmik A, Nandy A (2023) Novel error correction-based key frame extraction technique for dynamic hand gesture recognition. Neural Computing and Applications 35(28):21165--21180

  21. [21]

    Multimedia Tools and Applications 83(14):41805--41822

    Bhaumik G, Govil MC (2024) Spatnet: A spatial feature attention network for hand gesture recognition. Multimedia Tools and Applications 83(14):41805--41822

  22. [22]

    IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1):1217--1232

    Bilge YC, Cinbis RG, Ikizler-Cinbis N (2023) Towards zero-shot sign language recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1):1217--1232. doi:10.1109/TPAMI.2022.3143074

  23. [24]

    Pattern Recognition 151:110374

    Bilge YC, Ikizler-Cinbis N, Cinbis RG (2024 b ) Cross-lingual few-shot sign language recognition. Pattern Recognition 151:110374. doi:https://doi.org/10.1016/j.patcog.2024.110374, ://www.sciencedirect.com/science/article/pii/S0031320324001250

  24. [25]

    In: 2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG), pp 1--6, doi:10.1109/FG57933.2023.10042643

    Bohacek M, Hrúz M (2023) Learning from what is already out there: Few-shot sign language recognition with online dictionaries. In: 2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG), pp 1--6, doi:10.1109/FG57933.2023.10042544

  25. [26]

    In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 7784--7793, doi:10.1109/CVPR.2018.00812

    Camgoz NC, Hadfield S, Koller O, et al (2018) Neural sign language translation. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 7784--7793, doi:10.1109/CVPR.2018.00812

  26. [27]

    In: Koyejo S, Mohamed S, Agarwal A, et al (eds) Advances in Neural Information Processing Systems, vol 35

    Chen Y, Zuo R, Wei F, et al (2022) Two-stream network for sign language recognition and translation. In: Koyejo S, Mohamed S, Agarwal A, et al (eds) Advances in Neural Information Processing Systems, vol 35. Curran Associates, Inc., pp 17043--17056, ://proceedings.neurips.cc/paper_files/paper/2022/file/6cd3ac24cdb789beeaa9f7145670fcae-Paper-Conference.pdf

  27. [28]

    IEEE Transactions on Multimedia 26:3951--3961

    Cheng J, Shi D, Li C, et al (2024) Skeleton-based gesture recognition with learnable paths and signature features. IEEE Transactions on Multimedia 26:3951--3961. doi:10.1109/TMM.2023.3318242

  28. [29]

    In: Karlinsky L, Michaeli T, Nishino K (eds) Computer Vision -- ECCV 2022 Workshops

    Constantin S, Eyiokur FI, Yaman D, et al (2023) Interactive multimodal robot dialog using pointing gesture recognition. In: Karlinsky L, Michaeli T, Nishino K (eds) Computer Vision -- ECCV 2022 Workshops. Springer Nature Switzerland, Cham, pp 640--657

  29. [30]

    arXiv preprint arXiv:250112086

    Cui H, Huang R, Zhang R, et al (2025) Dstsa-gcn: Advancing skeleton-based gesture recognition with semantic-aware spatio-temporal topology modeling. arXiv preprint arXiv:250112086

  30. [31]

    IEEE Transactions on Multimedia 21(7):1880--1891

    Cui R, Liu H, Zhang C (2019) A deep neural framework for continuous sign language recognition by iterative training. IEEE Transactions on Multimedia 21(7):1880--1891. doi:10.1109/TMM.2018.2889563

  31. [32]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops

    Cunico F, Girella F, Avogaro A, et al (2023) Oo-dmvmt: A deep multi-view multi-task classification framework for real-time 3d hand gesture classification and segmentation. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp 2745--2754, doi:10.1109/CVPRW59228.2023.00275

  32. [33]

    Neural Computing and Applications 34(24):21849--21862

    Dang TL, Nguyen HT, Dao DM, et al (2022) Shape: a dataset for hand gesture recognition. Neural Computing and Applications 34(24):21849--21862

  33. [34]

    Multimedia Tools and Applications 82(18):28569--28587

    Dang TL, Pham TH, Dang QM, et al (2023) A lightweight architecture for hand gesture recognition. Multimedia Tools and Applications 82(18):28569--28587

  34. [35]

    Neural Computing and Applications 36(28):17311--17325

    Dang TL, Pham TH, Dao DM, et al (2024) Date: a video dataset and benchmark for dynamic hand gesture recognition. Neural Computing and Applications 36(28):17311--17325

  35. [36]

    In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

    De Coster M, Van Herreweghe M, Dambre J (2021) Isolated sign recognition from rgb video using pose flow and self-attention. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp 3436--3445, doi:10.1109/CVPRW53098.2021.00383

  36. [37]

    In: 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp 1206--1214, doi:10.1109/CVPRW.2016.153

    De Smedt Q, Wannous H, Vandeborre JP (2016) Skeleton-based dynamic hand gesture recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp 1206--1214, doi:10.1109/CVPRW.2016.153

  37. [38]

    Neurocomputing 572:127194

    Deng Z, Leng Y, Chen J, et al (2024) Tms-net: A multi-feature multi-stream multi-level information sharing network for skeleton-based sign language recognition. Neurocomputing 572:127194. doi:https://doi.org/10.1016/j.neucom.2023.127194, ://www.sciencedirect.com/science/article/pii/S0925231223013176

  38. [39]

    IEEE Access 11:140382--140394

    Dozdor Z, Kalafatic Z, Ban Z, et al (2023) Ty-net: Transforming yolo for hand gesture recognition. IEEE Access 11:140382--140394. doi:10.1109/ACCESS.2023.3341702

  39. [40]

    Neurocomputing 500:115--123

    Du Y, Xie P, Wang M, et al (2022) Full transformer network with masking future for word-level sign language recognition. Neurocomputing 500:115--123. doi:https://doi.org/10.1016/j.neucom.2022.05.051, ://www.sciencedirect.com/science/article/pii/S0925231222006178

  40. [41]

    Derf: Decomposed radiance fields,

    Duarte A, Palaskar S, Ventura L, et al (2021) How2sign: A large-scale multimodal dataset for continuous american sign language. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 2734--2743, doi:10.1109/CVPR46437.2021.00276

  41. [42]

    In: Proceedings of the 11th international conference on language resources and evaluation (LREC) 2018, The European Language Resources Association (ELRA)

    Ebling S, Camg \"o z NC, Braem PB, et al (2018) Smile swiss german sign language dataset. In: Proceedings of the 11th international conference on language resources and evaluation (LREC) 2018, The European Language Resources Association (ELRA)

  42. [43]

    Engineering Applications of Artificial Intelligence 114:105198

    El-Alfy ESM, Luqman H (2022) A comprehensive survey and taxonomy of sign language research. Engineering Applications of Artificial Intelligence 114:105198

  43. [44]

    p 1911 – 1916, ://www.scopus.com/inward/record.uri?eid=2-s2.0-85020237430&partnerID=40&md5=e17a51f131e02531fa45248c5bd212bc, cited by: 130

    Forster J, Schmidt C, Koller O, et al (2014) Extensions of the sign language recognition and translation corpus rwth-phoenix-weather. p 1911 – 1916, ://www.scopus.com/inward/record.uri?eid=2-s2.0-85020237430&partnerID=40&md5=e17a51f131e02531fa45248c5bd212bc, cited by: 130

  44. [45]

    IEEE Transactions on Intelligent Transportation Systems 24(9):9518--9531

    Fu Z, Chen J, Jiang K, et al (2023) Traffic police 3d gesture recognition based on spatial–temporal fully adaptive graph convolutional network. IEEE Transactions on Intelligent Transportation Systems 24(9):9518--9531. doi:10.1109/TITS.2023.3276345

  45. [46]

    IEEE Robotics and Automation Letters 8(11):7807--7814

    Gan M, Liu J, He Y, et al (2023) Keyframe selection via deep reinforcement learning for skeleton-based gesture recognition. IEEE Robotics and Automation Letters 8(11):7807--7814. doi:10.1109/LRA.2023.3322645

  46. [47]

    Emogen: Emotional image content generation with text-to-image diffusion models,

    Gan S, Yin Y, Jiang Z, et al (2024) Signgraph: A sign sequence is worth graphs of nodes. In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 13470--13479, doi:10.1109/CVPR52733.2024.01279

  47. [48]

    IEEE Transactions on Circuits and Systems for Video Technology 34(11):11724--11738

    Gao L, Lyu F, Shi P, et al (2024) Overcoming modality bias in question-driven sign language video translation. IEEE Transactions on Circuits and Systems for Video Technology 34(11):11724--11738. doi:10.1109/TCSVT.2024.3419089

  48. [49]

    IEEE Sensors Journal 22(18):17421--17430

    Gao Q, Chen Y, Ju Z, et al (2022) Dynamic hand gesture recognition based on 3d hand pose estimation for human–robot interaction. IEEE Sensors Journal 22(18):17421--17430. doi:10.1109/JSEN.2021.3059685

  49. [50]

    In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 409--419, doi:10.1109/CVPR.2018.00050

    Garcia-Hernando G, Yuan S, Baek S, et al (2018) First-person hand action benchmark with rgb-d videos and 3d hand pose annotations. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 409--419, doi:10.1109/CVPR.2018.00050

  50. [51]

    In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Work- shops (CVPR W), pp

    Garg M, Ghosh D, Pradhan PM (2024 a ) Gestformer: Multiscale wavelet pooling transformer network for dynamic hand gesture recognition. In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp 2473--2483, doi:10.1109/CVPRW63382.2024.00254

  51. [52]

    ://arxiv.org/abs/2409.03890, https://arxiv.org/abs/2409.03890 arXiv:2409.03890

    Garg M, Ghosh D, Pradhan PM (2024 b ) Mvtn: A multiscale video transformer network for hand gesture recognition. ://arxiv.org/abs/2409.03890, https://arxiv.org/abs/2409.03890 arXiv:2409.03890

  52. [53]

    In2025 IEEE/CVF Winter Conference on Applications of Computer Vision (W ACV)

    Garg M, Ghosh D, Pradhan PM (2025) Convmixformer- a resource-efficient convolution mixer for transformer-based dynamic hand gesture recognition. In: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp 6156--6166, doi:10.1109/WACV61041.2025.00600

  53. [54]

    Neural Computing and Applications 35(28):20889--20907

    Ghorai A, Nandi U, Changdar C, et al (2023) Indian sign language recognition system using network deconvolution and spatial transformer network. Neural Computing and Applications 35(28):20889--20907

  54. [55]

    In: Proceedings of the 23rd international conference on Machine learning, pp 369--376

    Graves A, Fern \'a ndez S, Gomez F, et al (2006) Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd international conference on Machine learning, pp 369--376

  55. [56]

    doi:https://doi.org/10.1016/j.patcog.2025.111602, ://www.sciencedirect.com/science/article/pii/S0031320325002626

    Guan M, Wang Y, Ma G, et al (2025) Mska: Multi-stream keypoint attention network for sign language recognition and translation. doi:https://doi.org/10.1016/j.patcog.2025.111602, ://www.sciencedirect.com/science/article/pii/S0031320325002626

  56. [57]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

    Guo L, Xue W, Guo Q, et al (2023) Distilling cross-temporal contexts for continuous sign language recognition. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 10771--10780, doi:10.1109/CVPR52729.2023.01037

  57. [58]

    IEEE Transactions on Image Processing 33:3486--3495

    Guo L, Xue W, Liu B, et al (2024 a ) Gloss prior guided visual feature learning for continuous sign language recognition. IEEE Transactions on Image Processing 33:3486--3495. doi:10.1109/TIP.2024.3404869

  58. [59]

    IEEE Transactions on Intelligent Transportation Systems 25(10):14031--14039

    Guo X, Zhu Q, Wang Y, et al (2024 b ) Mg-gct: A motion-guided graph convolutional transformer for traffic gesture recognition. IEEE Transactions on Intelligent Transportation Systems 25(10):14031--14039. doi:10.1109/TITS.2024.3394911

  59. [60]

    IEEE Access 11:34094--34103

    Hampiholi B, Jarvers C, Mader W, et al (2023) Convolutional transformer fusion blocks for multi-modal gesture recognition. IEEE Access 11:34094--34103. doi:10.1109/ACCESS.2023.3263812

  60. [61]

    IEEE Access 12:67005--67012

    Han W, Hao M, Yuan Y, et al (2024) Fusion enhancement of yolov5 and copula bayesian classifier for hand gesture recognition in smart sports venues. IEEE Access 12:67005--67012. doi:10.1109/ACCESS.2024.3398142

  61. [62]

    Walk in the cloud: Learning curves for point clouds shape analysis, pp

    Hao A, Min Y, Chen X (2021) Self-mutual distillation learning for continuous sign language recognition. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp 11283--11292, doi:10.1109/ICCV48922.2021.01111

  62. [63]

    IEEE Sensors Journal 23(23):29227--29234

    Hao S, Fu M, Liu X, et al (2023) Dynamic gesture recognition based on two-scale 3-d-convnext. IEEE Sensors Journal 23(23):29227--29234. doi:10.1109/JSEN.2023.3324479

  63. [64]

    IEEE Access

    Hashi AO, Hashim SZM, Asamah AB (2024) A systematic review of hand gesture recognition: An update from 2018 to 2024. IEEE Access

  64. [65]

    IEEE Access 12:28761--28774

    Hax DRT, Penava P, Krodel S, et al (2024) A novel hybrid deep learning architecture for dynamic hand gesture recognition. IEEE Access 12:28761--28774. doi:10.1109/ACCESS.2024.3365274

  65. [66]

    He J, Zhang C, He X, et al (2020) Visual recognition of traffic police gestures with convolutional pose machine and handcrafted features. Neurocomputing 390:248–259. doi:10.1016/j.neucom.2019.07.103, https://www.sciencedirect.com/science/article/pii/S0925231219314420

  66. [67]

    Hsiao YS, Sanchez-Riera J, Lim T, et al (2014) Lared: a large rgb-d extensible hand gesture dataset. In: Proceedings of the 5th ACM Multimedia Systems Conference. Association for Computing Machinery, New York, NY, USA, MMSys '14, p 53–58, doi:10.1145/2557642.2563669, https://doi.org/10.1145/2557642.2563669

  67. [68]

    Hu H, Zhao W, Zhou W, et al (2021a) Signbert: Pre-training of hand-model-aware representation for sign language recognition. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp 11067–11076, doi:10.1109/ICCV48922.2021.01090

  68. [69]

    Hu H, Zhou W, Pu J, et al (2021b) Global-local enhancement network for nmf-aware sign language recognition. ACM Trans Multimedia Comput Commun Appl 17(3). doi:10.1145/3436754, https://doi.org/10.1145/3436754

  69. [70]

    Hu H, Pu J, Zhou W, et al (2023a) Collaborative multilingual continuous sign language recognition: A unified framework. IEEE Transactions on Multimedia 25:7559–7570. doi:10.1109/TMM.2022.3223260

  70. [71]

    Hu H, Zhao W, Zhou W, et al (2023b) Signbert+: Hand-model-aware self-supervised pre-training for sign language understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9):11221–11239. doi:10.1109/TPAMI.2023.3269220

  71. [72]

    Hu H, Pu J, Zhou W, et al (2024a) Prior-aware cross modality augmentation learning for continuous sign language recognition. IEEE Transactions on Multimedia 26:593–606. doi:10.1109/TMM.2023.3268368

  72. [73]

    Hu L, Gao L, Liu Z, et al (2022) Temporal lift pooling for continuous sign language recognition. In: European conference on computer vision, Springer, pp 511–527

  73. [74]

    Hu L, Gao L, Liu Z, et al (2023c) Continuous sign language recognition with correlation network. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 2529–2539, doi:10.1109/CVPR52729.2023.00249

  74. [75]

    Hu L, Gao L, Liu Z, et al (2023d) Self-emphasizing network for continuous sign language recognition. Proceedings of the AAAI Conference on Artificial Intelligence 37(1):854–862. doi:10.1609/aaai.v37i1.25164, https://ojs.aaai.org/index.php/AAAI/article/view/25164

  75. [76]

    Hu L, Feng W, Gao L, et al (2024b) Corrnet+: Sign language recognition and translation via spatial-temporal correlation. https://arxiv.org/abs/2404.11111, arXiv:2404.11111

  76. [77]

    Hu L, Gao L, Liu Z, et al (2024c) Dynamic spatial-temporal aggregation for skeleton-aware sign language recognition. arXiv preprint arXiv:2403.12519

  77. [78]

    Huang J, Zhou W, Li H, et al (2019) Attention-based 3d-cnns for large-vocabulary sign language recognition. IEEE Transactions on Circuits and Systems for Video Technology 29(9):2822–2832. doi:10.1109/TCSVT.2018.2870740

  78. [79]

    Ikne O, Allaert B, Wannous H (2024) Skeleton-based self-supervised feature extraction for improved dynamic hand gesture recognition. In: 2024 IEEE 18th International Conference on Automatic Face and Gesture Recognition (FG), pp 1–10, doi:10.1109/FG59268.2024.10581975

  79. [80]

    Jafari F, Basu A (2023) Two-dimensional parallel spatio-temporal pyramid pooling for hand gesture recognition. IEEE Access 11:133755–133766. doi:10.1109/ACCESS.2023.3336591

  80. [81]

    Jiang S, Sun B, Wang L, et al (2021a) Sign language recognition via skeleton-aware multi-model ensemble. arXiv preprint arXiv:2110.06161
