Visual Hand Gesture Recognition with Deep Learning: A Comprehensive Review of Methods, Datasets, Challenges and Future Research Directions

Georgios Th. Papadopoulos; Iraklis Varlamis; Konstantinos Foteinos; Manousos Linardakis; Panagiotis Radoglou-Grammatikis; Panagiotis Sarigiannidis; Vasileios Argyriou

arxiv: 2507.04465 · v4 · submitted 2025-07-06 · 💻 cs.CV


Pith reviewed 2026-05-19 05:53 UTC · model grok-4.3

classification 💻 cs.CV

keywords visual hand gesture recognition · deep learning · survey · static gesture recognition · dynamic gesture recognition · continuous gesture recognition · datasets · challenges


The pith

This review organizes deep learning methods for visual hand gesture recognition into static, isolated dynamic, and continuous tasks as a guideline for researchers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a comprehensive overview of visual hand gesture recognition using deep learning to fill the gap left by the absence of a structured survey amid hundreds of papers. It uses a systematic research methodology to locate literature and presents methods, datasets, and metrics in a taxonomy-based format. The review focuses on four questions covering main aspects, current state-of-the-art methods grouped by task, comparative insights, and challenges that shape future work. A sympathetic reader would care because the overview supports proposing improvements in applications such as sign language understanding and human-computer interaction.

Core claim

Starting with the methodology used to locate the related literature, the survey identifies and organizes the key VHGR approaches in a taxonomy-based format. The SOTA methods are grouped across three primary VHGR tasks: static, isolated dynamic and continuous gesture recognition. For each task, the architectural trends and learning strategies are listed. To support the experimental evaluation of future methods in the field, the study reviews commonly used datasets and presents the standard performance metrics. The survey concludes by identifying the major challenges in VHGR, including both general computer vision issues and domain-specific obstacles, and outlines promising directions for future research.

What carries the argument

Taxonomy-based format that organizes approaches and groups SOTA methods across the three primary tasks of static, isolated dynamic, and continuous gesture recognition.
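
As a rough illustration of what a task-based grouping amounts to in practice, the sketch below files entries under the three tasks named above. It is not taken from the paper: the `Entry` type, the `group_by_task` helper, the `method_*` labels, and the pairing of architecture labels with tasks are all hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Task(Enum):
    """The three VHGR tasks the survey uses as its top-level grouping."""
    STATIC = "static"
    ISOLATED_DYNAMIC = "isolated dynamic"
    CONTINUOUS = "continuous"


@dataclass(frozen=True)
class Entry:
    name: str          # hypothetical method label, not a real paper
    task: Task
    architecture: str  # free-text family label, assumed for illustration


def group_by_task(entries):
    """Return {Task: [entry names]}, keeping input order within each task."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry.task].append(entry.name)
    return dict(groups)


# Placeholder entries; the task/architecture pairings are illustrative only.
entries = [
    Entry("method_a", Task.STATIC, "spatial"),
    Entry("method_b", Task.ISOLATED_DYNAMIC, "spatiotemporal"),
    Entry("method_c", Task.CONTINUOUS, "multi-stream fusion"),
    Entry("method_d", Task.ISOLATED_DYNAMIC, "GCN+Transformer"),
]
taxonomy = group_by_task(entries)
```

Keeping the task as an enum rather than a free string makes it harder for an entry to fall outside the three groupings unnoticed, which is the failure case the "What would settle it" test looks for.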

If this is right

Researchers can identify current state-of-the-art methods more quickly for each of the three gesture recognition tasks.
Reviewed datasets and standard metrics enable consistent evaluation when testing new methods.
Listed challenges point to specific obstacles that new work should target to advance the field.
Outlined future directions suggest concrete paths for improving architectures and strategies.
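
This page does not list which metrics the survey treats as standard. As a minimal sketch, assuming top-1 accuracy for static and isolated recognition and word error rate (WER) for continuous recognition, two metrics in common use for classification and sequence recognition, consistent evaluation could look like the following. The function names and sample labels are hypothetical.

```python
def accuracy(predicted, reference):
    """Fraction of samples whose predicted label equals the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("reference must not be empty")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


def word_error_rate(hypothesis, reference):
    """WER = (substitutions + deletions + insertions) / len(reference).

    Computed as the Levenshtein distance between the two token sequences,
    using a rolling row to keep memory linear in len(hypothesis).
    """
    if not reference:
        raise ValueError("reference must not be empty")
    prev = list(range(len(hypothesis) + 1))  # distances for an empty reference
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]  # i deletions turn the first i reference tokens into nothing
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(reference)


# Isolated recognition: one label per clip (labels are made up).
acc = accuracy(["wave", "stop", "ok", "fist"], ["wave", "stop", "point", "fist"])

# Continuous recognition: one gloss sequence per video (glosses are made up).
wer = word_error_rate(["A", "X", "C", "D"], ["A", "B", "C"])
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, so it should not be read as a percentage of correct tokens.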

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The task-based grouping could support cross-task transfer of techniques between static and dynamic recognition settings.
The survey structure might extend to related vision problems such as full-body action recognition.
Future updates could add benchmarks for real-time deployment on edge devices.

Load-bearing premise

The systematic research methodology used to locate and organize the related literature has captured the key VHGR approaches, SOTA methods, datasets, and challenges without significant omissions or selection bias.

What would settle it

A search that uncovers a major deep learning paper on visual hand gesture recognition from the covered period that falls outside the survey's taxonomy, task groupings, or listed datasets.

Original abstract

The rapid evolution of deep learning (DL) models and the ever-increasing size of available datasets have raised the interest of the research community in the always-important field of visual hand gesture recognition (VHGR), and delivered a wide range of applications, such as sign language understanding and human-computer interaction. Despite the large volume of research works in the field, a structured and complete survey on VHGR is still missing, leaving researchers to navigate through hundreds of papers in order to find the current state-of-the-art (SOTA). The current survey aims to fill this gap by presenting a comprehensive overview of this computer vision field. With a systematic research methodology and a structured presentation of the various methods, datasets, and evaluation metrics, this review aims to constitute a useful guideline for researchers, helping them to propose improvements. Specifically, this survey focuses on four fundamental questions: what are the main VHGR aspects, what are the current SOTA methods, what comparative insights can be drawn across methods and tasks, and which challenges shape future research. Starting with the methodology used to locate the related literature, the survey identifies and organizes the key VHGR approaches in a taxonomy-based format. The SOTA methods are grouped across three primary VHGR tasks: static, isolated dynamic and continuous gesture recognition. For each task, the architectural trends and learning strategies are listed. To support the experimental evaluation of future methods in the field, the study reviews commonly used datasets and presents the standard performance metrics. Our survey concludes by identifying the major challenges in VHGR, including both general computer vision issues and domain-specific obstacles, and outlines promising directions for future research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This is a standard organizing survey on visual hand gesture recognition whose value depends on whether the literature search was detailed enough to avoid gaps.


Hi, the main thing here is a review that groups deep learning work on visual hand gesture recognition into static, isolated dynamic, and continuous tasks, while also covering datasets, metrics, and open challenges. It aims to answer four practical questions about aspects, current methods, comparative insights, and future directions, which gives the structure some clarity for people who need a reference map in this area.

That organization is the useful part, especially for applications like sign language or HCI where researchers might want a quick way to locate SOTA approaches and common evaluation setups. The taxonomy and trend summaries across learning strategies seem like a reasonable way to present the material without introducing new experiments.

The soft spot is the methodology. The abstract claims a systematic approach, but if the methods section lacks concrete details such as exact search strings, specific databases, date ranges, inclusion criteria, or a flow of screened versus included papers, then it is harder to confirm the coverage is complete and unbiased. That would make the guideline claim rest on trust rather than verifiable steps. No obvious internal contradictions show up in the abstract, and the paper does not appear to overclaim new results.

This is for computer vision researchers who are either new to gesture recognition or looking for a consolidated starting point on datasets and challenges. It is not the kind of work that changes core methods, but it could help someone avoid reinventing basic literature searches. I would send it to peer review so a referee can verify the search process and suggest any missing recent papers if needed.

Referee Report

1 major / 2 minor

Summary. The paper presents a comprehensive survey on visual hand gesture recognition (VHGR) with deep learning. It begins with a systematic research methodology to locate and organize related literature into a taxonomy-based format, groups current SOTA methods across static, isolated dynamic, and continuous gesture recognition tasks with architectural trends and learning strategies, reviews commonly used datasets and standard performance metrics to support future evaluations, and concludes by identifying major challenges (general CV and domain-specific) along with promising future research directions.

Significance. If the coverage proves complete and free of significant selection bias, the survey would provide a structured guideline for VHGR researchers by consolidating SOTA comparisons across tasks, standard datasets/metrics, and challenge identification, thereby helping to direct improvements in this active computer vision subfield.

major comments (1)

[§2] §2 (Methodology): The systematic research methodology is described at a high level without providing concrete, reproducible elements such as exact search strings/Boolean queries, specific databases queried (e.g., Google Scholar, IEEE Xplore, Scopus), explicit publication date range, inclusion/exclusion criteria, or a quantitative flow diagram/summary of papers screened versus included. This directly affects the central claim that the taxonomy, SOTA groupings, and challenge identification constitute a comprehensive, unbiased overview suitable as a researcher guideline.

minor comments (2)

[Abstract] The abstract states that the survey focuses on 'four fundamental questions' but enumerates them narratively; presenting them as a numbered list would improve immediate clarity for readers.
[Throughout] Ensure consistent first-use definitions for acronyms (VHGR, SOTA, DL) and that all cited works in the taxonomy and SOTA sections receive full bibliographic entries.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address the major comment below and will revise the manuscript accordingly to improve reproducibility while preserving the integrity of our survey.


Referee: [§2] §2 (Methodology): The systematic research methodology is described at a high level without providing concrete, reproducible elements such as exact search strings/Boolean queries, specific databases queried (e.g., Google Scholar, IEEE Xplore, Scopus), explicit publication date range, inclusion/exclusion criteria, or a quantitative flow diagram/summary of papers screened versus included. This directly affects the central claim that the taxonomy, SOTA groupings, and challenge identification constitute a comprehensive, unbiased overview suitable as a researcher guideline.

Authors: We acknowledge that Section 2 currently presents the methodology at a high level. We agree this limits full reproducibility and will revise the section in the next version to include the specific Boolean search strings used, the databases queried (Google Scholar, IEEE Xplore, Scopus, and arXiv), the explicit publication date range (2014–2024), detailed inclusion/exclusion criteria, and a PRISMA-style flow diagram with quantitative screening statistics. These additions are based on the original search process we conducted and will directly support the claim of a comprehensive, unbiased overview suitable as a researcher guideline.

revision: yes
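
To make the requested reproducibility elements concrete, here is a minimal sketch of how a search protocol and a PRISMA-style screening tally could be recorded. The databases and date range are the ones named in the simulated response above; the search terms, the `boolean_query` helper, the `ScreeningFlow` type, and every count are hypothetical placeholders, not figures from the paper.

```python
from dataclasses import dataclass


def boolean_query(term_groups):
    """AND together groups of OR-ed, double-quoted search terms."""
    clauses = []
    for group in term_groups:
        quoted = " OR ".join(f'"{term}"' for term in group)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)


@dataclass(frozen=True)
class ScreeningFlow:
    """Counts for a PRISMA-style flow; each stage narrows the previous one."""
    identified: int
    duplicates_removed: int
    excluded_on_screening: int
    excluded_on_full_text: int

    def __post_init__(self):
        counts = (self.identified, self.duplicates_removed,
                  self.excluded_on_screening, self.excluded_on_full_text)
        if min(counts) < 0 or self.included < 0:
            raise ValueError("screening counts are inconsistent")

    @property
    def screened(self):
        return self.identified - self.duplicates_removed

    @property
    def included(self):
        return (self.screened - self.excluded_on_screening
                - self.excluded_on_full_text)


# Databases and years as stated in the simulated response; terms are assumed.
protocol = {
    "databases": ["Google Scholar", "IEEE Xplore", "Scopus", "arXiv"],
    "years": (2014, 2024),
    "query": boolean_query([
        ["hand gesture recognition", "sign language recognition"],
        ["deep learning"],
    ]),
}

# Placeholder counts chosen only to exercise the arithmetic.
flow = ScreeningFlow(identified=100, duplicates_removed=20,
                     excluded_on_screening=50, excluded_on_full_text=10)
```

Publishing a record of this kind alongside the survey would let a reader rerun the query and check that the included count follows from the stated exclusions.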

Circularity Check

0 steps flagged

No circularity: survey organizes existing literature without self-referential derivations or load-bearing self-citations


This is a review paper whose central contribution is a taxonomy and overview of prior VHGR work rather than any new derivation, fitted parameter, or prediction. The abstract and methodology description frame the effort as a systematic literature search followed by structured grouping of methods, datasets, and challenges; no equations, ansatzes, or uniqueness theorems are introduced that could reduce to the paper's own inputs. Self-citations, if present, serve only as normal bibliographic references and do not carry the load of the completeness claim. The survey is therefore self-contained against external benchmarks (the cited primary literature) and receives the default low score for honest non-findings.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

As a survey the paper does not introduce new free parameters, invented entities, or ad-hoc axioms beyond the standard assumption that its literature search is systematic and representative.

axioms (1)

domain assumption: A systematic literature search can identify and organize the key aspects, SOTA methods, datasets, and challenges in VHGR without major omissions.
Invoked in the abstract as the basis for the survey's methodology and claim to fill the gap.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Sections 4–6 group SOTA methods by architecture (spatial, spatiotemporal, GCN+Transformer, multi-stream fusion) and report accuracies on IsoGD, WLASL, etc.; no cost or ratio-symmetric constructions appear.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Taxonomy in §3 and methodology in §2 contain no 8-tick clocks, φ-ladder spacings, or distinction-to-spacetime forcing.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Prompt-to-Gesture: Measuring the Capabilities of Image-to-Video Deictic Gesture Generation
cs.CV 2026-04 unverdicted novelty 5.0

Prompt-driven image-to-video generation produces deictic gestures that match real data visually, add useful variety, and improve downstream recognition models when mixed with human recordings.

Interactive Augmented Reality-enabled Outdoor Scene Visualization For Enhanced Real-time Disaster Response
cs.HC 2026-02 unverdicted novelty 5.0

An AR system using 3D Gaussian Splatting, WIM navigation, and semantic POIs enables real-time disaster scene visualization with high usability and acceptance in preliminary user tests.

Reference graph

Works this paper leans on

217 extracted references · 217 canonical work pages · cited by 2 Pith papers

[1]

Expert Systems with Applications 248:123258

Abdullahi SB, Chamnongthai K, Bolon-Canedo V, et al (2024) Spatial–temporal feature-based end-to-end fourier network for 3d sign language recognition. Expert Systems with Applications 248:123258. doi:https://doi.org/10.1016/j.eswa.2024.123258, ://www.sciencedirect.com/science/article/pii/S0957417424001234

work page doi:10.1016/j.eswa.2024.123258 2024
[2]

IEEE Transactions on Multimedia 24:1750--1762

Adaloglou N, Chatzis T, Papastratis I, et al (2022) A comprehensive study on deep learning-based methods for sign language recognition. IEEE Transactions on Multimedia 24:1750--1762. doi:10.1109/TMM.2021.3070438

work page doi:10.1109/tmm.2021.3070438 2022
[3]

Intelligent Systems with Applications 12:200056

Adeyanju IA, Bello OO, Adegboye MA (2021) Machine learning methods for sign language recognition: A critical review and analysis. Intelligent Systems with Applications 12:200056

work page 2021
[4]

URL http://dx.doi.org/10.1109/ICASSP48485.2024.10447579

Ahn J, Jang Y, Chung JS (2024) Slowfast network for continuous sign language recognition. In: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp 3920--3924, doi:10.1109/ICASSP48485.2024.10445841

work page doi:10.1109/icassp48485.2024.10445841 2024
[5]

In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV)

Aich S, Ruiz-Santaquiteria J, Lu Z, et al (2023) Data-free class-incremental hand gesture recognition. In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pp 20901--20910, doi:10.1109/ICCV51070.2023.01916

work page doi:10.1109/iccv51070.2023.01916 2023
[6]

IEEE Access

Al Abdullah B, Amoudi G, Alghamdi H (2024) Advancements in sign language recognition: A comprehensive review and future prospects. IEEE Access

work page 2024
[7]

IEEE Access 9:126917--126951

Al-Qurishi M, Khalid T, Souissi R (2021) Deep learning for sign language recognition: Current techniques, benchmarks, and open issues. IEEE Access 9:126917--126951

work page 2021
[8]

Neural Computing and Applications 36(14):7609--7624

Alaftekin M, Pacal I, Cicek K (2024) Real-time sign language recognition based on yolo algorithm. Neural Computing and Applications 36(14):7609--7624

work page 2024
[9]

Viana, Maurício Santos, Dulce Freire, Patrícia Abrantes, and Jorge Rocha

Alam MM, Islam MT, Rahman SM (2022) Unified learning approach for egocentric hand gesture recognition and fingertip detection. Pattern Recognition 121:108200. doi:https://doi.org/10.1016/j.patcog.2021.108200, ://www.sciencedirect.com/science/article/pii/S0031320321003824

work page doi:10.1016/j.patcog.2021.108200 2022
[10]

IEEE Access 12:77019--77040

Alamri FS, Bala Abdullahi S, Khan AR, et al (2024) Enhanced weak spatial modeling through cnn-based deep sign language skeletal feature transformation. IEEE Access 12:77019--77040. doi:10.1109/ACCESS.2024.3405341

work page doi:10.1109/access.2024.3405341 2024
[11]

In: Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XI 16, Springer, pp 35--53

Albanie S, Varol G, Momeni L, et al (2020) Bsl-1k: Scaling up co-articulated sign language recognition using mouthing cues. In: Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XI 16, Springer, pp 35--53

work page 2020
[12]

://arxiv.org/abs/2111.03635, https://arxiv.org/abs/2111.03635 arXiv:2111.03635

Albanie S, Varol G, Momeni L, et al (2021) Bbc-oxford british sign language dataset. ://arxiv.org/abs/2111.03635, https://arxiv.org/abs/2111.03635 arXiv:2111.03635

work page arXiv 2021
[13]

Artificial Intelligence Review 58(4):1--55

Alimisis P, Mademlis I, Radoglou-Grammatikis P, et al (2025) Advances in diffusion models for image data augmentation: A review of methods, models, evaluation metrics and future research directions. Artificial Intelligence Review 58(4):1--55

work page 2025
[14]

arXiv preprint arXiv:250401666

Alyami S, Luqman H (2025 a ) Clip-sla: Parameter-efficient clip adaptation for continuous sign language recognition. arXiv preprint arXiv:250401666

work page 2025
[15]

Neurocomputing 617:129015

Alyami S, Luqman H (2025 b ) Swin-mstp: Swin transformer with multi-scale temporal perception for continuous sign language recognition. Neurocomputing 617:129015

work page 2025
[16]

IEEE Transactions on Multimedia 21(1):234--245

Avola D, Bernardi M, Cinque L, et al (2019) Exploiting recurrent neural networks and leap motion controller for the recognition of sign language and semaphoric hand gestures. IEEE Transactions on Multimedia 21(1):234--245. doi:10.1109/TMM.2018.2856094

work page doi:10.1109/tmm.2018.2856094 2019
[17]

Engineering Applications of Artificial Intelligence 132:108443

Bamani E, Nissinman E, Meir I, et al (2024) Ultra-range gesture recognition using a web-camera in human–robot interaction. Engineering Applications of Artificial Intelligence 132:108443. doi:https://doi.org/10.1016/j.engappai.2024.108443, ://www.sciencedirect.com/science/article/pii/S0952197624006018

work page doi:10.1016/j.engappai.2024.108443 2024
[18]

arXiv preprint arXiv:241118413

Beeri EB, Nissinman E, Sintov A (2024) Robust dynamic gesture recognition at ultra-long distances. arXiv preprint arXiv:241118413

work page 2024
[19]

, author Weiss, Y

Benitez-Garcia G, Olivares-Mercado J, Sanchez-Perez G, et al (2021) Ipn hand: A video dataset and benchmark for real-time continuous hand gesture recognition. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp 4340--4347, doi:10.1109/ICPR48806.2021.9412317

work page doi:10.1109/icpr48806.2021.9412317 2021
[20]

Neural Computing and Applications 35(28):21165--21180

Bharti S, Balmik A, Nandy A (2023) Novel error correction-based key frame extraction technique for dynamic hand gesture recognition. Neural Computing and Applications 35(28):21165--21180

work page 2023
[21]

Multimedia Tools and Applications 83(14):41805--41822

Bhaumik G, Govil MC (2024) Spatnet: A spatial feature attention network for hand gesture recognition. Multimedia Tools and Applications 83(14):41805--41822

work page 2024
[22]

IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1):1217--1232

Bilge YC, Cinbis RG, Ikizler-Cinbis N (2023) Towards zero-shot sign language recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1):1217--1232. doi:10.1109/TPAMI.2022.3143074

work page doi:10.1109/tpami.2022.3143074 2023
[24]

Pattern Recognition 151:110374

Bilge YC, Ikizler-Cinbis N, Cinbis RG (2024 b ) Cross-lingual few-shot sign language recognition. Pattern Recognition 151:110374. doi:https://doi.org/10.1016/j.patcog.2024.110374, ://www.sciencedirect.com/science/article/pii/S0031320324001250

work page doi:10.1016/j.patcog.2024.110374 2024
[25]

In: 2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG), pp 1--6, doi:10.1109/FG57933.2023.10042643

Bohacek M, Hrúz M (2023) Learning from what is already out there: Few-shot sign language recognition with online dictionaries. In: 2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG), pp 1--6, doi:10.1109/FG57933.2023.10042544

work page doi:10.1109/fg57933.2023.10042544 2023
[26]

In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 7784--7793, doi:10.1109/CVPR.2018.00812

Camgoz NC, Hadfield S, Koller O, et al (2018) Neural sign language translation. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 7784--7793, doi:10.1109/CVPR.2018.00812

work page doi:10.1109/cvpr.2018.00812 2018
[27]

In: Koyejo S, Mohamed S, Agarwal A, et al (eds) Advances in Neural Information Processing Systems, vol 35

Chen Y, Zuo R, Wei F, et al (2022) Two-stream network for sign language recognition and translation. In: Koyejo S, Mohamed S, Agarwal A, et al (eds) Advances in Neural Information Processing Systems, vol 35. Curran Associates, Inc., pp 17043--17056, ://proceedings.neurips.cc/paper_files/paper/2022/file/6cd3ac24cdb789beeaa9f7145670fcae-Paper-Conference.pdf

work page 2022
[28]

IEEE Transactions on Multimedia 26:3951--3961

Cheng J, Shi D, Li C, et al (2024) Skeleton-based gesture recognition with learnable paths and signature features. IEEE Transactions on Multimedia 26:3951--3961. doi:10.1109/TMM.2023.3318242

work page doi:10.1109/tmm.2023.3318242 2024
[29]

In: Karlinsky L, Michaeli T, Nishino K (eds) Computer Vision -- ECCV 2022 Workshops

Constantin S, Eyiokur FI, Yaman D, et al (2023) Interactive multimodal robot dialog using pointing gesture recognition. In: Karlinsky L, Michaeli T, Nishino K (eds) Computer Vision -- ECCV 2022 Workshops. Springer Nature Switzerland, Cham, pp 640--657

work page 2023
[30]

arXiv preprint arXiv:250112086

Cui H, Huang R, Zhang R, et al (2025) Dstsa-gcn: Advancing skeleton-based gesture recognition with semantic-aware spatio-temporal topology modeling. arXiv preprint arXiv:250112086

work page 2025
[31]

IEEE Transactions on Multimedia 21(7):1880--1891

Cui R, Liu H, Zhang C (2019) A deep neural framework for continuous sign language recognition by iterative training. IEEE Transactions on Multimedia 21(7):1880--1891. doi:10.1109/TMM.2018.2889563

work page doi:10.1109/tmm.2018.2889563 2019
[32]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops

Cunico F, Girella F, Avogaro A, et al (2023) Oo-dmvmt: A deep multi-view multi-task classification framework for real-time 3d hand gesture classification and segmentation. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp 2745--2754, doi:10.1109/CVPRW59228.2023.00275

work page doi:10.1109/cvprw59228.2023.00275 2023
[33]

Neural Computing and Applications 34(24):21849--21862

Dang TL, Nguyen HT, Dao DM, et al (2022) Shape: a dataset for hand gesture recognition. Neural Computing and Applications 34(24):21849--21862

work page 2022
[34]

Multimedia Tools and Applications 82(18):28569--28587

Dang TL, Pham TH, Dang QM, et al (2023) A lightweight architecture for hand gesture recognition. Multimedia Tools and Applications 82(18):28569--28587

work page 2023
[35]

Neural Computing and Applications 36(28):17311--17325

Dang TL, Pham TH, Dao DM, et al (2024) Date: a video dataset and benchmark for dynamic hand gesture recognition. Neural Computing and Applications 36(28):17311--17325

work page 2024
[36]

In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

De Coster M, Van Herreweghe M, Dambre J (2021) Isolated sign recognition from rgb video using pose flow and self-attention. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp 3436--3445, doi:10.1109/CVPRW53098.2021.00383

work page doi:10.1109/cvprw53098.2021.00383 2021
[37]

In: 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp 1206--1214, doi:10.1109/CVPRW.2016.153

De Smedt Q, Wannous H, Vandeborre JP (2016) Skeleton-based dynamic hand gesture recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp 1206--1214, doi:10.1109/CVPRW.2016.153

work page doi:10.1109/cvprw.2016.153 2016
[38]

Neurocomputing 572:127194

Deng Z, Leng Y, Chen J, et al (2024) Tms-net: A multi-feature multi-stream multi-level information sharing network for skeleton-based sign language recognition. Neurocomputing 572:127194. doi:https://doi.org/10.1016/j.neucom.2023.127194, ://www.sciencedirect.com/science/article/pii/S0925231223013176

work page doi:10.1016/j.neucom.2023.127194 2024
[39]

IEEE Access 11:140382--140394

Dozdor Z, Kalafatic Z, Ban Z, et al (2023) Ty-net: Transforming yolo for hand gesture recognition. IEEE Access 11:140382--140394. doi:10.1109/ACCESS.2023.3341702

work page doi:10.1109/access.2023.3341702 2023
[40]

Neurocomputing 500:115--123

Du Y, Xie P, Wang M, et al (2022) Full transformer network with masking future for word-level sign language recognition. Neurocomputing 500:115--123. doi:https://doi.org/10.1016/j.neucom.2022.05.051, ://www.sciencedirect.com/science/article/pii/S0925231222006178

work page doi:10.1016/j.neucom.2022.05.051 2022
[41]

Derf: Decomposed radiance fields,

Duarte A, Palaskar S, Ventura L, et al (2021) How2sign: A large-scale multimodal dataset for continuous american sign language. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 2734--2743, doi:10.1109/CVPR46437.2021.00276

work page doi:10.1109/cvpr46437.2021.00276 2021
[42]

In: Proceedings of the 11th international conference on language resources and evaluation (LREC) 2018, The European Language Resources Association (ELRA)

Ebling S, Camg \"o z NC, Braem PB, et al (2018) Smile swiss german sign language dataset. In: Proceedings of the 11th international conference on language resources and evaluation (LREC) 2018, The European Language Resources Association (ELRA)

work page 2018
[43]

Engineering Applications of Artificial Intelligence 114:105198

El-Alfy ESM, Luqman H (2022) A comprehensive survey and taxonomy of sign language research. Engineering Applications of Artificial Intelligence 114:105198

work page 2022
[44]

p 1911 – 1916, ://www.scopus.com/inward/record.uri?eid=2-s2.0-85020237430&partnerID=40&md5=e17a51f131e02531fa45248c5bd212bc, cited by: 130

Forster J, Schmidt C, Koller O, et al (2014) Extensions of the sign language recognition and translation corpus rwth-phoenix-weather. p 1911 – 1916, ://www.scopus.com/inward/record.uri?eid=2-s2.0-85020237430&partnerID=40&md5=e17a51f131e02531fa45248c5bd212bc, cited by: 130

work page 2014
[45]

IEEE Transactions on Intelligent Transportation Systems 24(9):9518--9531

Fu Z, Chen J, Jiang K, et al (2023) Traffic police 3d gesture recognition based on spatial–temporal fully adaptive graph convolutional network. IEEE Transactions on Intelligent Transportation Systems 24(9):9518--9531. doi:10.1109/TITS.2023.3276345

work page doi:10.1109/tits.2023.3276345 2023
[46]

IEEE Robotics and Automation Letters 8(11):7807--7814

Gan M, Liu J, He Y, et al (2023) Keyframe selection via deep reinforcement learning for skeleton-based gesture recognition. IEEE Robotics and Automation Letters 8(11):7807--7814. doi:10.1109/LRA.2023.3322645

work page doi:10.1109/lra.2023.3322645 2023
[47]

Emogen: Emotional image content generation with text-to-image diffusion models,

Gan S, Yin Y, Jiang Z, et al (2024) Signgraph: A sign sequence is worth graphs of nodes. In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 13470--13479, doi:10.1109/CVPR52733.2024.01279

work page doi:10.1109/cvpr52733.2024.01279 2024
[48]

IEEE Transactions on Circuits and Systems for Video Technology 34(11):11724--11738

Gao L, Lyu F, Shi P, et al (2024) Overcoming modality bias in question-driven sign language video translation. IEEE Transactions on Circuits and Systems for Video Technology 34(11):11724--11738. doi:10.1109/TCSVT.2024.3419089

work page doi:10.1109/tcsvt.2024.3419089 2024
[49]

IEEE Sensors Journal 22(18):17421--17430

Gao Q, Chen Y, Ju Z, et al (2022) Dynamic hand gesture recognition based on 3d hand pose estimation for human–robot interaction. IEEE Sensors Journal 22(18):17421--17430. doi:10.1109/JSEN.2021.3059685

work page doi:10.1109/jsen.2021.3059685 2022
[50]

In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 409--419, doi:10.1109/CVPR.2018.00050

Garcia-Hernando G, Yuan S, Baek S, et al (2018) First-person hand action benchmark with rgb-d videos and 3d hand pose annotations. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 409--419, doi:10.1109/CVPR.2018.00050

work page doi:10.1109/cvpr.2018.00050 2018
[51]

In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Work- shops (CVPR W), pp

Garg M, Ghosh D, Pradhan PM (2024 a ) Gestformer: Multiscale wavelet pooling transformer network for dynamic hand gesture recognition. In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp 2473--2483, doi:10.1109/CVPRW63382.2024.00254

[52]

Garg M, Ghosh D, Pradhan PM (2024b) Mvtn: A multiscale video transformer network for hand gesture recognition. https://arxiv.org/abs/2409.03890, arXiv:2409.03890

[53]

Garg M, Ghosh D, Pradhan PM (2025) Convmixformer- a resource-efficient convolution mixer for transformer-based dynamic hand gesture recognition. In: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp 6156–6166, doi:10.1109/WACV61041.2025.00600

[54]

Ghorai A, Nandi U, Changdar C, et al (2023) Indian sign language recognition system using network deconvolution and spatial transformer network. Neural Computing and Applications 35(28):20889–20907

[55]

Graves A, Fernández S, Gomez F, et al (2006) Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd international conference on Machine learning, pp 369–376

[56]

Guan M, Wang Y, Ma G, et al (2025) Mska: Multi-stream keypoint attention network for sign language recognition and translation. doi:10.1016/j.patcog.2025.111602, https://www.sciencedirect.com/science/article/pii/S0031320325002626

[57]

Guo L, Xue W, Guo Q, et al (2023) Distilling cross-temporal contexts for continuous sign language recognition. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 10771–10780, doi:10.1109/CVPR52729.2023.01037

[58]

Guo L, Xue W, Liu B, et al (2024a) Gloss prior guided visual feature learning for continuous sign language recognition. IEEE Transactions on Image Processing 33:3486–3495. doi:10.1109/TIP.2024.3404869

[59]

Guo X, Zhu Q, Wang Y, et al (2024b) Mg-gct: A motion-guided graph convolutional transformer for traffic gesture recognition. IEEE Transactions on Intelligent Transportation Systems 25(10):14031–14039. doi:10.1109/TITS.2024.3394911

[60]

Hampiholi B, Jarvers C, Mader W, et al (2023) Convolutional transformer fusion blocks for multi-modal gesture recognition. IEEE Access 11:34094–34103. doi:10.1109/ACCESS.2023.3263812

[61]

Han W, Hao M, Yuan Y, et al (2024) Fusion enhancement of yolov5 and copula bayesian classifier for hand gesture recognition in smart sports venues. IEEE Access 12:67005–67012. doi:10.1109/ACCESS.2024.3398142

[62]

Hao A, Min Y, Chen X (2021) Self-mutual distillation learning for continuous sign language recognition. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp 11283–11292, doi:10.1109/ICCV48922.2021.01111

[63]

Hao S, Fu M, Liu X, et al (2023) Dynamic gesture recognition based on two-scale 3-d-convnext. IEEE Sensors Journal 23(23):29227–29234. doi:10.1109/JSEN.2023.3324479

[64]

Hashi AO, Hashim SZM, Asamah AB (2024) A systematic review of hand gesture recognition: An update from 2018 to 2024. IEEE Access

[65]

Hax DRT, Penava P, Krodel S, et al (2024) A novel hybrid deep learning architecture for dynamic hand gesture recognition. IEEE Access 12:28761–28774. doi:10.1109/ACCESS.2024.3365274

[66]

He J, Zhang C, He X, et al (2020) Visual recognition of traffic police gestures with convolutional pose machine and handcrafted features. Neurocomputing 390:248–259. doi:10.1016/j.neucom.2019.07.103, https://www.sciencedirect.com/science/article/pii/S0925231219314420

[67]

Hsiao YS, Sanchez-Riera J, Lim T, et al (2014) Lared: a large rgb-d extensible hand gesture dataset. In: Proceedings of the 5th ACM Multimedia Systems Conference. Association for Computing Machinery, New York, NY, USA, MMSys '14, p 53–58, doi:10.1145/2557642.2563669, https://doi.org/10.1145/2557642.2563669

[68]

Hu H, Zhao W, Zhou W, et al (2021a) Signbert: Pre-training of hand-model-aware representation for sign language recognition. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp 11067–11076, doi:10.1109/ICCV48922.2021.01090

[69]

Hu H, Zhou W, Pu J, et al (2021b) Global-local enhancement network for nmf-aware sign language recognition. ACM Trans Multimedia Comput Commun Appl 17(3). doi:10.1145/3436754, https://doi.org/10.1145/3436754

[70]

Hu H, Pu J, Zhou W, et al (2023a) Collaborative multilingual continuous sign language recognition: A unified framework. IEEE Transactions on Multimedia 25:7559–7570. doi:10.1109/TMM.2022.3223260

[71]

Hu H, Zhao W, Zhou W, et al (2023b) Signbert+: Hand-model-aware self-supervised pre-training for sign language understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9):11221–11239. doi:10.1109/TPAMI.2023.3269220

[72]

Hu H, Pu J, Zhou W, et al (2024a) Prior-aware cross modality augmentation learning for continuous sign language recognition. IEEE Transactions on Multimedia 26:593–606. doi:10.1109/TMM.2023.3268368

[73]

Hu L, Gao L, Liu Z, et al (2022) Temporal lift pooling for continuous sign language recognition. In: European conference on computer vision, Springer, pp 511–527

[74]

Hu L, Gao L, Liu Z, et al (2023c) Continuous sign language recognition with correlation network. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 2529–2539, doi:10.1109/CVPR52729.2023.00249

[75]

Hu L, Gao L, Liu Z, et al (2023d) Self-emphasizing network for continuous sign language recognition. Proceedings of the AAAI Conference on Artificial Intelligence 37(1):854–862. doi:10.1609/aaai.v37i1.25164, https://ojs.aaai.org/index.php/AAAI/article/view/25164

[76]

Hu L, Feng W, Gao L, et al (2024b) Corrnet+: Sign language recognition and translation via spatial-temporal correlation. https://arxiv.org/abs/2404.11111, arXiv:2404.11111

[77]

Hu L, Gao L, Liu Z, et al (2024c) Dynamic spatial-temporal aggregation for skeleton-aware sign language recognition. arXiv preprint arXiv:2403.12519

[78]

Huang J, Zhou W, Li H, et al (2019) Attention-based 3d-cnns for large-vocabulary sign language recognition. IEEE Transactions on Circuits and Systems for Video Technology 29(9):2822–2832. doi:10.1109/TCSVT.2018.2870740

[79]

Ikne O, Allaert B, Wannous H (2024) Skeleton-based self-supervised feature extraction for improved dynamic hand gesture recognition. In: 2024 IEEE 18th International Conference on Automatic Face and Gesture Recognition (FG), pp 1–10, doi:10.1109/FG59268.2024.10581975

[80]

Jafari F, Basu A (2023) Two-dimensional parallel spatio-temporal pyramid pooling for hand gesture recognition. IEEE Access 11:133755–133766. doi:10.1109/ACCESS.2023.3336591

[81]

Jiang S, Sun B, Wang L, et al (2021a) Sign language recognition via skeleton-aware multi-model ensemble. arXiv preprint arXiv:2110.06161


[1] [1]

Expert Systems with Applications 248:123258

Abdullahi SB, Chamnongthai K, Bolon-Canedo V, et al (2024) Spatial–temporal feature-based end-to-end fourier network for 3d sign language recognition. Expert Systems with Applications 248:123258. doi:https://doi.org/10.1016/j.eswa.2024.123258, ://www.sciencedirect.com/science/article/pii/S0957417424001234

work page doi:10.1016/j.eswa.2024.123258 2024

[2] [2]

IEEE Transactions on Multimedia 24:1750--1762

Adaloglou N, Chatzis T, Papastratis I, et al (2022) A comprehensive study on deep learning-based methods for sign language recognition. IEEE Transactions on Multimedia 24:1750--1762. doi:10.1109/TMM.2021.3070438

work page doi:10.1109/tmm.2021.3070438 2022

[3] [3]

Intelligent Systems with Applications 12:200056

Adeyanju IA, Bello OO, Adegboye MA (2021) Machine learning methods for sign language recognition: A critical review and analysis. Intelligent Systems with Applications 12:200056

work page 2021

[4] [4]

URL http://dx.doi.org/10.1109/ICASSP48485.2024.10447579

Ahn J, Jang Y, Chung JS (2024) Slowfast network for continuous sign language recognition. In: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp 3920--3924, doi:10.1109/ICASSP48485.2024.10445841

work page doi:10.1109/icassp48485.2024.10445841 2024

[5] [5]

In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV)

Aich S, Ruiz-Santaquiteria J, Lu Z, et al (2023) Data-free class-incremental hand gesture recognition. In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pp 20901--20910, doi:10.1109/ICCV51070.2023.01916

work page doi:10.1109/iccv51070.2023.01916 2023

[6] [6]

IEEE Access

Al Abdullah B, Amoudi G, Alghamdi H (2024) Advancements in sign language recognition: A comprehensive review and future prospects. IEEE Access

work page 2024

[7] [7]

IEEE Access 9:126917--126951

Al-Qurishi M, Khalid T, Souissi R (2021) Deep learning for sign language recognition: Current techniques, benchmarks, and open issues. IEEE Access 9:126917--126951

work page 2021

[8] [8]

Neural Computing and Applications 36(14):7609--7624

Alaftekin M, Pacal I, Cicek K (2024) Real-time sign language recognition based on yolo algorithm. Neural Computing and Applications 36(14):7609--7624

work page 2024

[9] [9]

Viana, Maurício Santos, Dulce Freire, Patrícia Abrantes, and Jorge Rocha

Alam MM, Islam MT, Rahman SM (2022) Unified learning approach for egocentric hand gesture recognition and fingertip detection. Pattern Recognition 121:108200. doi:https://doi.org/10.1016/j.patcog.2021.108200, ://www.sciencedirect.com/science/article/pii/S0031320321003824

work page doi:10.1016/j.patcog.2021.108200 2022

[10] [10]

IEEE Access 12:77019--77040

Alamri FS, Bala Abdullahi S, Khan AR, et al (2024) Enhanced weak spatial modeling through cnn-based deep sign language skeletal feature transformation. IEEE Access 12:77019--77040. doi:10.1109/ACCESS.2024.3405341

work page doi:10.1109/access.2024.3405341 2024

[11] [11]

In: Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XI 16, Springer, pp 35--53

Albanie S, Varol G, Momeni L, et al (2020) Bsl-1k: Scaling up co-articulated sign language recognition using mouthing cues. In: Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XI 16, Springer, pp 35--53

work page 2020

[12] [12]

://arxiv.org/abs/2111.03635, https://arxiv.org/abs/2111.03635 arXiv:2111.03635

Albanie S, Varol G, Momeni L, et al (2021) Bbc-oxford british sign language dataset. ://arxiv.org/abs/2111.03635, https://arxiv.org/abs/2111.03635 arXiv:2111.03635

work page arXiv 2021

[13] [13]

Artificial Intelligence Review 58(4):1--55

Alimisis P, Mademlis I, Radoglou-Grammatikis P, et al (2025) Advances in diffusion models for image data augmentation: A review of methods, models, evaluation metrics and future research directions. Artificial Intelligence Review 58(4):1--55

work page 2025

[14] [14]

arXiv preprint arXiv:250401666

Alyami S, Luqman H (2025 a ) Clip-sla: Parameter-efficient clip adaptation for continuous sign language recognition. arXiv preprint arXiv:250401666

work page 2025

[15] [15]

Neurocomputing 617:129015

Alyami S, Luqman H (2025 b ) Swin-mstp: Swin transformer with multi-scale temporal perception for continuous sign language recognition. Neurocomputing 617:129015

work page 2025

[16] [16]

IEEE Transactions on Multimedia 21(1):234--245

Avola D, Bernardi M, Cinque L, et al (2019) Exploiting recurrent neural networks and leap motion controller for the recognition of sign language and semaphoric hand gestures. IEEE Transactions on Multimedia 21(1):234--245. doi:10.1109/TMM.2018.2856094

work page doi:10.1109/tmm.2018.2856094 2019

[17] [17]

Engineering Applications of Artificial Intelligence 132:108443

Bamani E, Nissinman E, Meir I, et al (2024) Ultra-range gesture recognition using a web-camera in human–robot interaction. Engineering Applications of Artificial Intelligence 132:108443. doi:https://doi.org/10.1016/j.engappai.2024.108443, ://www.sciencedirect.com/science/article/pii/S0952197624006018

work page doi:10.1016/j.engappai.2024.108443 2024

[18] [18]

arXiv preprint arXiv:241118413

Beeri EB, Nissinman E, Sintov A (2024) Robust dynamic gesture recognition at ultra-long distances. arXiv preprint arXiv:241118413

work page 2024

[19] [19]

, author Weiss, Y

Benitez-Garcia G, Olivares-Mercado J, Sanchez-Perez G, et al (2021) Ipn hand: A video dataset and benchmark for real-time continuous hand gesture recognition. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp 4340--4347, doi:10.1109/ICPR48806.2021.9412317

work page doi:10.1109/icpr48806.2021.9412317 2021

[20] [20]

Neural Computing and Applications 35(28):21165--21180

Bharti S, Balmik A, Nandy A (2023) Novel error correction-based key frame extraction technique for dynamic hand gesture recognition. Neural Computing and Applications 35(28):21165--21180

work page 2023

[21] [21]

Multimedia Tools and Applications 83(14):41805--41822

Bhaumik G, Govil MC (2024) Spatnet: A spatial feature attention network for hand gesture recognition. Multimedia Tools and Applications 83(14):41805--41822

work page 2024

[22] [22]

IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1):1217--1232

Bilge YC, Cinbis RG, Ikizler-Cinbis N (2023) Towards zero-shot sign language recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1):1217--1232. doi:10.1109/TPAMI.2022.3143074

work page doi:10.1109/tpami.2022.3143074 2023

[23] [24]

Pattern Recognition 151:110374

Bilge YC, Ikizler-Cinbis N, Cinbis RG (2024 b ) Cross-lingual few-shot sign language recognition. Pattern Recognition 151:110374. doi:https://doi.org/10.1016/j.patcog.2024.110374, ://www.sciencedirect.com/science/article/pii/S0031320324001250

work page doi:10.1016/j.patcog.2024.110374 2024

[24] [25]

In: 2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG), pp 1--6, doi:10.1109/FG57933.2023.10042643

Bohacek M, Hrúz M (2023) Learning from what is already out there: Few-shot sign language recognition with online dictionaries. In: 2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG), pp 1--6, doi:10.1109/FG57933.2023.10042544

work page doi:10.1109/fg57933.2023.10042544 2023

[25] [26]

In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 7784--7793, doi:10.1109/CVPR.2018.00812

Camgoz NC, Hadfield S, Koller O, et al (2018) Neural sign language translation. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 7784--7793, doi:10.1109/CVPR.2018.00812

work page doi:10.1109/cvpr.2018.00812 2018

[26] [27]

In: Koyejo S, Mohamed S, Agarwal A, et al (eds) Advances in Neural Information Processing Systems, vol 35

Chen Y, Zuo R, Wei F, et al (2022) Two-stream network for sign language recognition and translation. In: Koyejo S, Mohamed S, Agarwal A, et al (eds) Advances in Neural Information Processing Systems, vol 35. Curran Associates, Inc., pp 17043--17056, ://proceedings.neurips.cc/paper_files/paper/2022/file/6cd3ac24cdb789beeaa9f7145670fcae-Paper-Conference.pdf

work page 2022

[27] [28]

IEEE Transactions on Multimedia 26:3951--3961

Cheng J, Shi D, Li C, et al (2024) Skeleton-based gesture recognition with learnable paths and signature features. IEEE Transactions on Multimedia 26:3951--3961. doi:10.1109/TMM.2023.3318242

work page doi:10.1109/tmm.2023.3318242 2024

[28] [29]

In: Karlinsky L, Michaeli T, Nishino K (eds) Computer Vision -- ECCV 2022 Workshops

Constantin S, Eyiokur FI, Yaman D, et al (2023) Interactive multimodal robot dialog using pointing gesture recognition. In: Karlinsky L, Michaeli T, Nishino K (eds) Computer Vision -- ECCV 2022 Workshops. Springer Nature Switzerland, Cham, pp 640--657

work page 2023

[29] [30]

arXiv preprint arXiv:250112086

Cui H, Huang R, Zhang R, et al (2025) Dstsa-gcn: Advancing skeleton-based gesture recognition with semantic-aware spatio-temporal topology modeling. arXiv preprint arXiv:250112086

work page 2025

[30] [31]

IEEE Transactions on Multimedia 21(7):1880--1891

Cui R, Liu H, Zhang C (2019) A deep neural framework for continuous sign language recognition by iterative training. IEEE Transactions on Multimedia 21(7):1880--1891. doi:10.1109/TMM.2018.2889563

work page doi:10.1109/tmm.2018.2889563 2019

[31] [32]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops

Cunico F, Girella F, Avogaro A, et al (2023) Oo-dmvmt: A deep multi-view multi-task classification framework for real-time 3d hand gesture classification and segmentation. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp 2745--2754, doi:10.1109/CVPRW59228.2023.00275

work page doi:10.1109/cvprw59228.2023.00275 2023

[32] [33]

Neural Computing and Applications 34(24):21849--21862

Dang TL, Nguyen HT, Dao DM, et al (2022) Shape: a dataset for hand gesture recognition. Neural Computing and Applications 34(24):21849--21862

work page 2022

[33] [34]

Multimedia Tools and Applications 82(18):28569--28587

Dang TL, Pham TH, Dang QM, et al (2023) A lightweight architecture for hand gesture recognition. Multimedia Tools and Applications 82(18):28569--28587

work page 2023

[34] [35]

Neural Computing and Applications 36(28):17311--17325

Dang TL, Pham TH, Dao DM, et al (2024) Date: a video dataset and benchmark for dynamic hand gesture recognition. Neural Computing and Applications 36(28):17311--17325

work page 2024

[35] [36]

In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

De Coster M, Van Herreweghe M, Dambre J (2021) Isolated sign recognition from rgb video using pose flow and self-attention. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp 3436--3445, doi:10.1109/CVPRW53098.2021.00383

work page doi:10.1109/cvprw53098.2021.00383 2021

[36] [37]

In: 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp 1206--1214, doi:10.1109/CVPRW.2016.153

De Smedt Q, Wannous H, Vandeborre JP (2016) Skeleton-based dynamic hand gesture recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp 1206--1214, doi:10.1109/CVPRW.2016.153

work page doi:10.1109/cvprw.2016.153 2016

[37] [38]

Neurocomputing 572:127194

Deng Z, Leng Y, Chen J, et al (2024) Tms-net: A multi-feature multi-stream multi-level information sharing network for skeleton-based sign language recognition. Neurocomputing 572:127194. doi:https://doi.org/10.1016/j.neucom.2023.127194, ://www.sciencedirect.com/science/article/pii/S0925231223013176

work page doi:10.1016/j.neucom.2023.127194 2024

[38] [39]

IEEE Access 11:140382--140394

Dozdor Z, Kalafatic Z, Ban Z, et al (2023) Ty-net: Transforming yolo for hand gesture recognition. IEEE Access 11:140382--140394. doi:10.1109/ACCESS.2023.3341702

work page doi:10.1109/access.2023.3341702 2023

[39] [40]

Neurocomputing 500:115--123

Du Y, Xie P, Wang M, et al (2022) Full transformer network with masking future for word-level sign language recognition. Neurocomputing 500:115--123. doi:https://doi.org/10.1016/j.neucom.2022.05.051, ://www.sciencedirect.com/science/article/pii/S0925231222006178

work page doi:10.1016/j.neucom.2022.05.051 2022

[40] [41]

Derf: Decomposed radiance fields,

Duarte A, Palaskar S, Ventura L, et al (2021) How2sign: A large-scale multimodal dataset for continuous american sign language. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 2734--2743, doi:10.1109/CVPR46437.2021.00276

work page doi:10.1109/cvpr46437.2021.00276 2021

[41] [42]

In: Proceedings of the 11th international conference on language resources and evaluation (LREC) 2018, The European Language Resources Association (ELRA)

Ebling S, Camg \"o z NC, Braem PB, et al (2018) Smile swiss german sign language dataset. In: Proceedings of the 11th international conference on language resources and evaluation (LREC) 2018, The European Language Resources Association (ELRA)

work page 2018

[42] [43]

Engineering Applications of Artificial Intelligence 114:105198

El-Alfy ESM, Luqman H (2022) A comprehensive survey and taxonomy of sign language research. Engineering Applications of Artificial Intelligence 114:105198

work page 2022

[43] [44]

p 1911 – 1916, ://www.scopus.com/inward/record.uri?eid=2-s2.0-85020237430&partnerID=40&md5=e17a51f131e02531fa45248c5bd212bc, cited by: 130

Forster J, Schmidt C, Koller O, et al (2014) Extensions of the sign language recognition and translation corpus rwth-phoenix-weather. p 1911 – 1916, ://www.scopus.com/inward/record.uri?eid=2-s2.0-85020237430&partnerID=40&md5=e17a51f131e02531fa45248c5bd212bc, cited by: 130

work page 2014

[44] [45]

IEEE Transactions on Intelligent Transportation Systems 24(9):9518--9531

Fu Z, Chen J, Jiang K, et al (2023) Traffic police 3d gesture recognition based on spatial–temporal fully adaptive graph convolutional network. IEEE Transactions on Intelligent Transportation Systems 24(9):9518--9531. doi:10.1109/TITS.2023.3276345

work page doi:10.1109/tits.2023.3276345 2023

[45] [46]

IEEE Robotics and Automation Letters 8(11):7807--7814

Gan M, Liu J, He Y, et al (2023) Keyframe selection via deep reinforcement learning for skeleton-based gesture recognition. IEEE Robotics and Automation Letters 8(11):7807--7814. doi:10.1109/LRA.2023.3322645

work page doi:10.1109/lra.2023.3322645 2023

[46] [47]

Emogen: Emotional image content generation with text-to-image diffusion models,

Gan S, Yin Y, Jiang Z, et al (2024) Signgraph: A sign sequence is worth graphs of nodes. In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 13470--13479, doi:10.1109/CVPR52733.2024.01279

work page doi:10.1109/cvpr52733.2024.01279 2024

[47] [48]

IEEE Transactions on Circuits and Systems for Video Technology 34(11):11724--11738

Gao L, Lyu F, Shi P, et al (2024) Overcoming modality bias in question-driven sign language video translation. IEEE Transactions on Circuits and Systems for Video Technology 34(11):11724--11738. doi:10.1109/TCSVT.2024.3419089

work page doi:10.1109/tcsvt.2024.3419089 2024

[48] [49]

IEEE Sensors Journal 22(18):17421--17430

Gao Q, Chen Y, Ju Z, et al (2022) Dynamic hand gesture recognition based on 3d hand pose estimation for human–robot interaction. IEEE Sensors Journal 22(18):17421--17430. doi:10.1109/JSEN.2021.3059685

work page doi:10.1109/jsen.2021.3059685 2022

[49] [50]

In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 409--419, doi:10.1109/CVPR.2018.00050

Garcia-Hernando G, Yuan S, Baek S, et al (2018) First-person hand action benchmark with rgb-d videos and 3d hand pose annotations. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 409--419, doi:10.1109/CVPR.2018.00050

work page doi:10.1109/cvpr.2018.00050 2018

[50] [51]

In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Work- shops (CVPR W), pp

Garg M, Ghosh D, Pradhan PM (2024 a ) Gestformer: Multiscale wavelet pooling transformer network for dynamic hand gesture recognition. In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp 2473--2483, doi:10.1109/CVPRW63382.2024.00254

work page doi:10.1109/cvprw63382.2024.00254 2024

[51] [52]

://arxiv.org/abs/2409.03890, https://arxiv.org/abs/2409.03890 arXiv:2409.03890

Garg M, Ghosh D, Pradhan PM (2024 b ) Mvtn: A multiscale video transformer network for hand gesture recognition. ://arxiv.org/abs/2409.03890, https://arxiv.org/abs/2409.03890 arXiv:2409.03890

work page arXiv 2024

[52] [53]

In2025 IEEE/CVF Winter Conference on Applications of Computer Vision (W ACV)

Garg M, Ghosh D, Pradhan PM (2025) Convmixformer- a resource-efficient convolution mixer for transformer-based dynamic hand gesture recognition. In: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp 6156--6166, doi:10.1109/WACV61041.2025.00600

work page doi:10.1109/wacv61041.2025.00600 2025

[53] [54]

Neural Computing and Applications 35(28):20889--20907

Ghorai A, Nandi U, Changdar C, et al (2023) Indian sign language recognition system using network deconvolution and spatial transformer network. Neural Computing and Applications 35(28):20889--20907

work page 2023

[54] [55]

In: Proceedings of the 23rd international conference on Machine learning, pp 369--376

Graves A, Fern \'a ndez S, Gomez F, et al (2006) Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd international conference on Machine learning, pp 369--376

work page 2006

[55] [56]

doi:https://doi.org/10.1016/j.patcog.2025.111602, ://www.sciencedirect.com/science/article/pii/S0031320325002626

Guan M, Wang Y, Ma G, et al (2025) Mska: Multi-stream keypoint attention network for sign language recognition and translation. doi:https://doi.org/10.1016/j.patcog.2025.111602, ://www.sciencedirect.com/science/article/pii/S0031320325002626

work page doi:10.1016/j.patcog.2025.111602 2025

[56] [57]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

Guo L, Xue W, Guo Q, et al (2023) Distilling cross-temporal contexts for continuous sign language recognition. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 10771--10780, doi:10.1109/CVPR52729.2023.01037

work page doi:10.1109/cvpr52729.2023.01037 2023

[57] [58]

IEEE Transactions on Image Processing 33:3486--3495

Guo L, Xue W, Liu B, et al (2024 a ) Gloss prior guided visual feature learning for continuous sign language recognition. IEEE Transactions on Image Processing 33:3486--3495. doi:10.1109/TIP.2024.3404869

work page doi:10.1109/tip.2024.3404869 2024

[58] [59]

IEEE Transactions on Intelligent Transportation Systems 25(10):14031--14039

Guo X, Zhu Q, Wang Y, et al (2024 b ) Mg-gct: A motion-guided graph convolutional transformer for traffic gesture recognition. IEEE Transactions on Intelligent Transportation Systems 25(10):14031--14039. doi:10.1109/TITS.2024.3394911

work page doi:10.1109/tits.2024.3394911 2024

[59] [60]

IEEE Access 11:34094--34103

Hampiholi B, Jarvers C, Mader W, et al (2023) Convolutional transformer fusion blocks for multi-modal gesture recognition. IEEE Access 11:34094--34103. doi:10.1109/ACCESS.2023.3263812

work page doi:10.1109/access.2023.3263812 2023

[60] [61]

IEEE Access 12:67005--67012

Han W, Hao M, Yuan Y, et al (2024) Fusion enhancement of yolov5 and copula bayesian classifier for hand gesture recognition in smart sports venues. IEEE Access 12:67005--67012. doi:10.1109/ACCESS.2024.3398142

work page doi:10.1109/access.2024.3398142 2024

[61] [62]

Walk in the cloud: Learning curves for point clouds shape analysis, pp

Hao A, Min Y, Chen X (2021) Self-mutual distillation learning for continuous sign language recognition. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp 11283--11292, doi:10.1109/ICCV48922.2021.01111

work page doi:10.1109/iccv48922.2021.01111 2021

[62] [63]

IEEE Sensors Journal 23(23):29227--29234

Hao S, Fu M, Liu X, et al (2023) Dynamic gesture recognition based on two-scale 3-d-convnext. IEEE Sensors Journal 23(23):29227--29234. doi:10.1109/JSEN.2023.3324479

work page doi:10.1109/jsen.2023.3324479 2023

[63] [64]

IEEE Access

Hashi AO, Hashim SZM, Asamah AB (2024) A systematic review of hand gesture recognition: An update from 2018 to 2024. IEEE Access

work page 2024

[64] [65]

IEEE Access 12:28761--28774

Hax DRT, Penava P, Krodel S, et al (2024) A novel hybrid deep learning architecture for dynamic hand gesture recognition. IEEE Access 12:28761--28774. doi:10.1109/ACCESS.2024.3365274

work page doi:10.1109/access.2024.3365274 2024

[65] [66]

Neurocomputing 390:248--259

He J, Zhang C, He X, et al (2020) Visual recognition of traffic police gestures with convolutional pose machine and handcrafted features. Neurocomputing 390:248--259. doi:https://doi.org/10.1016/j.neucom.2019.07.103, ://www.sciencedirect.com/science/article/pii/S0925231219314420

work page doi:10.1016/j.neucom.2019.07.103 2020

[66] [67]

In: Proceedings of the 5th ACM Multimedia Systems Conference

Hsiao YS, Sanchez-Riera J, Lim T, et al (2014) Lared: a large rgb-d extensible hand gesture dataset. In: Proceedings of the 5th ACM Multimedia Systems Conference. Association for Computing Machinery, New York, NY, USA, MMSys '14, p 53–58, doi:10.1145/2557642.2563669, ://doi.org/10.1145/2557642.2563669

work page doi:10.1145/2557642.2563669 2014

[67] [68]

Walk in the cloud: Learning curves for point clouds shape analysis, pp

Hu H, Zhao W, Zhou W, et al (2021 a ) Signbert: Pre-training of hand-model-aware representation for sign language recognition. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp 11067--11076, doi:10.1109/ICCV48922.2021.01090

work page doi:10.1109/iccv48922.2021.01090 2021

[68] [69]

ACM Trans Multimedia Comput Commun Appl 17(3)

Hu H, Zhou W, Pu J, et al (2021 b ) Global-local enhancement network for nmf-aware sign language recognition. ACM Trans Multimedia Comput Commun Appl 17(3). doi:10.1145/3436754, ://doi.org/10.1145/3436754

Hu H, Pu J, Zhou W, et al (2023a) Collaborative multilingual continuous sign language recognition: A unified framework. IEEE Transactions on Multimedia 25:7559--7570. doi:10.1109/TMM.2022.3223260

Hu H, Zhao W, Zhou W, et al (2023b) Signbert+: Hand-model-aware self-supervised pre-training for sign language understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9):11221--11239. doi:10.1109/TPAMI.2023.3269220

Hu H, Pu J, Zhou W, et al (2024a) Prior-aware cross modality augmentation learning for continuous sign language recognition. IEEE Transactions on Multimedia 26:593--606. doi:10.1109/TMM.2023.3268368

Hu L, Gao L, Liu Z, et al (2022) Temporal lift pooling for continuous sign language recognition. In: European conference on computer vision, Springer, pp 511--527

Hu L, Gao L, Liu Z, et al (2023c) Continuous sign language recognition with correlation network. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 2529--2539, doi:10.1109/CVPR52729.2023.00249

Hu L, Gao L, Liu Z, et al (2023d) Self-emphasizing network for continuous sign language recognition. Proceedings of the AAAI Conference on Artificial Intelligence 37(1):854--862. doi:10.1609/aaai.v37i1.25164, https://ojs.aaai.org/index.php/AAAI/article/view/25164

Hu L, Feng W, Gao L, et al (2024b) Corrnet+: Sign language recognition and translation via spatial-temporal correlation. arXiv:2404.11111, https://arxiv.org/abs/2404.11111

Hu L, Gao L, Liu Z, et al (2024c) Dynamic spatial-temporal aggregation for skeleton-aware sign language recognition. arXiv preprint arXiv:2403.12519

Huang J, Zhou W, Li H, et al (2019) Attention-based 3d-cnns for large-vocabulary sign language recognition. IEEE Transactions on Circuits and Systems for Video Technology 29(9):2822--2832. doi:10.1109/TCSVT.2018.2870740

Ikne O, Allaert B, Wannous H (2024) Skeleton-based self-supervised feature extraction for improved dynamic hand gesture recognition. In: 2024 IEEE 18th International Conference on Automatic Face and Gesture Recognition (FG), pp 1--10, doi:10.1109/FG59268.2024.10581975

Jafari F, Basu A (2023) Two-dimensional parallel spatio-temporal pyramid pooling for hand gesture recognition. IEEE Access 11:133755--133766. doi:10.1109/ACCESS.2023.3336591

Jiang S, Sun B, Wang L, et al (2021a) Sign language recognition via skeleton-aware multi-model ensemble. arXiv preprint arXiv:2110.06161
