Beyond Binary Contrast: Modeling Continuous Skeleton Action Spaces with Transitional Anchors
Pith reviewed 2026-05-10 04:28 UTC · model grok-4.3
The pith
TranCLR replaces binary contrastive objectives with transitional anchors to model the continuous geometry of skeleton actions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TranCLR captures the continuous geometry of the action space through Action Transitional Anchor Construction, which models transitional states, and Multi-Level Geometric Manifold Calibration, which adaptively adjusts the manifold across continuity levels, yielding superior accuracy and calibration on NTU RGB+D, NTU RGB+D 120, and PKU-MMD.
What carries the argument
Action Transitional Anchor Construction (ATAC) that explicitly builds geometric transitional states, paired with Multi-Level Geometric Manifold Calibration (MGMC) that performs adaptive calibration of the action manifold at multiple continuity levels.
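To make the idea of a transitional anchor concrete, here is a minimal sketch. It assumes an anchor is a normalized linear blend of two action embeddings at several mixing ratios. That rule, the function names, and the `num_levels` parameter are illustrative assumptions; the summary above does not specify how ATAC actually builds its anchors.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def transitional_anchors(z_a, z_b, num_levels=3):
    """Hypothetical anchor construction (an assumption, not the paper's ATAC).

    Blends two embeddings at evenly spaced ratios strictly between 0 and 1
    and projects each blend back onto the unit sphere.
    Returns a list of (mixing_ratio, anchor) pairs.
    """
    if len(z_a) != len(z_b):
        raise ValueError("embeddings must have the same dimension")
    if num_levels < 1:
        raise ValueError("num_levels must be at least 1")
    anchors = []
    for k in range(1, num_levels + 1):
        lam = k / (num_levels + 1)
        mixed = [(1.0 - lam) * a + lam * b for a, b in zip(z_a, z_b)]
        anchors.append((lam, l2_normalize(mixed)))
    return anchors


# Toy usage with two orthogonal 2-D embeddings (made-up values).
for lam, anchor in transitional_anchors([1.0, 0.0], [0.0, 1.0], num_levels=3):
    print(round(lam, 2), [round(x, 3) for x in anchor])
```

Each mixing ratio here plays the role of one "continuity level"; a multi-level calibration step could then treat anchors closer to an endpoint as more similar to that action. Exactly antipodal inputs blended at ratio 0.5 produce a zero vector, which the sketch rejects rather than handles.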
If this is right
- Representations become smoother and better reflect gradual motion transitions rather than discrete clusters.
- Accuracy and calibration metrics improve on the NTU RGB+D, NTU RGB+D 120, and PKU-MMD benchmarks.
- The learned features carry explicit uncertainty information useful for sequences containing transitional movements.
- The framework produces more discriminative embeddings by preserving the underlying geometry of action space.
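Calibration claims like the ones above are commonly checked with expected calibration error (ECE), which compares average confidence with accuracy inside confidence bins. The sketch below is a generic ECE implementation with equal-width bins. Whether the paper reports ECE specifically, and with which binning, is an assumption here.

```python
def expected_calibration_error(confidences, correct, num_bins=10):
    """Generic equal-width-bin ECE.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct: booleans, True where the prediction matched the label.
    Bins are half-open [lo, hi), except the last one, which includes 1.0.
    """
    if len(confidences) != len(correct):
        raise ValueError("inputs must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")
    bins = [[] for _ in range(num_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * num_bins), num_bins - 1)
        bins[idx].append((conf, 1.0 if hit else 0.0))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of samples.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Toy usage (made-up predictions): 95% confident, but right 3 times out of 4.
print(expected_calibration_error([0.95] * 4, [True, True, True, False]))
```

A lower value means confidence tracks accuracy more closely; a perfectly calibrated model scores 0.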
Where Pith is reading between the lines
- The same anchor-and-calibration pattern could be tested on video or sensor streams where actions also blend continuously.
- Ablation studies that remove only the transitional anchors would isolate whether continuity modeling drives the reported gains.
- If successful, the approach suggests a route to reduce reliance on hard class boundaries in any self-supervised setting with ordered data.
- Downstream tasks such as action forecasting may benefit because the manifold already encodes transitional states.
Load-bearing premise
That constructing explicit transitional anchors and applying multi-level manifold calibration will reliably capture motion continuity and outperform binary contrastive objectives.
What would settle it
Head-to-head experiments on the NTU RGB+D dataset in which TranCLR fails to exceed the accuracy or calibration scores of standard binary contrastive baselines would refute the claim.
Original abstract
Self-supervised contrastive learning has emerged as a powerful paradigm for skeleton-based action recognition by enforcing consistency in the embedding space. However, existing methods rely on binary contrastive objectives that overlook the intrinsic continuity of human motion, resulting in fragmented feature clusters and rigid class boundaries. To address these limitations, we propose TranCLR, a Transitional anchor-based Contrastive Learning framework that captures the continuous geometry of the action space. Specifically, the proposed Action Transitional Anchor Construction (ATAC) explicitly models the geometric structure of transitional states to enhance the model's perception of motion continuity. Building upon these anchors, a Multi-Level Geometric Manifold Calibration (MGMC) mechanism is introduced to adaptively calibrate the action manifold across multiple levels of continuity, yielding a smoother and more discriminative representation space. Extensive experiments on the NTU RGB+D, NTU RGB+D 120 and PKU-MMD datasets demonstrate that TranCLR achieves superior accuracy and calibration performance, effectively learning continuous and uncertainty-aware skeleton representations. The code is available at https://github.com/Philchieh/TranCLR.
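For readers unfamiliar with the "binary contrastive objectives" the abstract criticizes, the sketch below shows an InfoNCE-style loss, where exactly one candidate counts as positive and every other candidate as negative. It is followed by a soft-target variant that spreads the target mass over several candidates. The soft-target function is only an illustration of what a non-binary objective can look like; it is not the paper's MGMC loss, and all names and values are assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def info_nce(query, candidates, positive_index, temperature=0.1):
    """Binary objective: one positive candidate, all others negative."""
    logits = [dot(query, c) / temperature for c in candidates]
    return -log_softmax(logits)[positive_index]


def soft_target_contrastive(query, candidates, targets, temperature=0.1):
    """Illustrative non-binary variant: cross-entropy against a soft
    target distribution over the candidates (targets must sum to 1)."""
    if len(targets) != len(candidates):
        raise ValueError("need one target weight per candidate")
    if abs(sum(targets) - 1.0) > 1e-9:
        raise ValueError("targets must sum to 1")
    logits = [dot(query, c) / temperature for c in candidates]
    return -sum(t * lp for t, lp in zip(targets, log_softmax(logits)))


# Toy usage with made-up 2-D embeddings.
query = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
print(info_nce(query, candidates, positive_index=0))
print(soft_target_contrastive(query, candidates, targets=[0.7, 0.3, 0.0]))
```

With a one-hot target vector the soft-target loss reduces to the binary InfoNCE loss, which is the sense in which a graded objective generalizes the binary one.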
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces TranCLR, a transitional anchor-based contrastive learning framework for skeleton-based action recognition. It proposes Action Transitional Anchor Construction (ATAC) to model geometric transitional states between poses for capturing motion continuity, and Multi-Level Geometric Manifold Calibration (MGMC) to adaptively calibrate the action manifold across continuity levels. Experiments on NTU RGB+D, NTU RGB+D 120, and PKU-MMD datasets report superior accuracy and calibration metrics compared to binary contrastive baselines, with ablations supporting the contributions; code is released.
Significance. If the reported gains hold, the work meaningfully extends contrastive learning by addressing the continuity of human motion, yielding smoother and more uncertainty-aware representations. The direct, falsifiable extension of standard objectives, combined with consistent dataset results and ablations, positions it as a useful advance for skeleton action recognition. Code availability strengthens the contribution.
minor comments (3)
- [§3.1] The ATAC construction from pose sequences is described at a high level; adding a short algorithmic outline or pseudocode would improve reproducibility of the anchor sampling process.
- [Tables 2-3] While gains are shown, the manuscript would benefit from reporting standard deviations across multiple runs to quantify variability in the accuracy and calibration improvements.
- [§4.3] The ablation on MGMC levels is informative, but the interaction between the number of levels and dataset characteristics could be discussed more explicitly to clarify generalizability.
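The request for standard deviations across runs amounts to a small amount of bookkeeping. A minimal sketch, using Python's standard `statistics` module, is below; the accuracy values are made-up placeholders, not results from the paper.

```python
import statistics


def summarize_runs(scores):
    """Return (mean, sample standard deviation) over repeated runs."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate a deviation")
    return statistics.mean(scores), statistics.stdev(scores)


def format_mean_std(scores, digits=2):
    """Format run scores as 'mean +/- std' for a results table."""
    mean, std = summarize_runs(scores)
    return f"{mean:.{digits}f} +/- {std:.{digits}f}"


# Placeholder accuracies from three hypothetical seeds.
print(format_mean_std([80.0, 82.0, 84.0]))
```

`statistics.stdev` computes the sample (n - 1) deviation, which is the usual choice when a handful of seeds stands in for the full distribution of training runs.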
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our manuscript and the recommendation for minor revision. The summary accurately reflects the core contributions of TranCLR, including the Action Transitional Anchor Construction (ATAC) for modeling geometric transitional states and the Multi-Level Geometric Manifold Calibration (MGMC) for adaptive manifold calibration across continuity levels. We appreciate the recognition that these elements yield smoother, more uncertainty-aware representations compared to binary contrastive baselines, supported by results on NTU RGB+D, NTU RGB+D 120, and PKU-MMD, along with ablations and code release. Since the report lists no specific major comments, we have no individual points requiring detailed rebuttal or changes at this stage.
Circularity Check
No significant circularity; derivation is self-contained
The paper proposes TranCLR as a direct extension of standard contrastive learning via two explicitly constructed components: ATAC (Action Transitional Anchor Construction) to model transitional states in pose sequences, and MGMC (Multi-Level Geometric Manifold Calibration) to adaptively adjust the action manifold. These are introduced as novel mechanisms without any equations, fitted parameters, or predictions that reduce by construction to the inputs or to prior self-citations. Validation relies on independent experiments across NTU RGB+D, NTU RGB+D 120, and PKU-MMD datasets showing measurable gains in accuracy and calibration metrics over binary baselines, making the central claims externally falsifiable rather than tautological.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Human motion possesses intrinsic continuity that binary contrastive objectives overlook, resulting in fragmented feature clusters.
invented entities (1)
- Action Transitional Anchor: no independent evidence